Commit Graph

269 Commits

Author SHA1 Message Date
Giovanni Petrantoni f8f0bc1bd8
Gossip improvements (#476)
* add more traces, remove async from rebalance

* more traces

* avoid computng scores when weight is 0.0

* debug colocation, fix an indent in unsubpeer (minor)

* add full ValidationResult coverage

* store in cache only after validation

* gossip 1.0 fixes

* fix typo

* gossip 10 internal test fixes

* test fixing

* refactor peerstats usages

* populate tables if missing when scoring
2020-12-15 10:25:22 +09:00
Jacek Sieka 6f1ecc8df7
streamline socket read/write hot path (#473)
* streamline socket read/write hot path

This avoids some unnecessary memory copying on the hot path of noise /
mplex, as well as getting rid of a few futures - profiling shows that
this is one of the main culprits of small memory allocations, which
makes sense - this is where gossip fan-out happens.

* fewer futures (and corresponding closures) when sending lpchannel
messages
* avoid data copies when encrypting and framing noise messages
* avoid copying tuple when reading noise data (poor c codegen)
* fix setting eof flag in secure read

* write noise frames in one go

...and closing secure socket once is enough
2020-12-09 08:56:40 -06:00
Jacek Sieka 1befeb8c2e
clean up peerid (#470)
* fix dangling cstring on error return
* remove some useless inlines
* less mallocs in shortlog
* proc -> func
* rename test
2020-12-03 13:53:16 -06:00
Dmitriy Ryajov d1c689e5ab
adding libp2p tag to logScope (#465) 2020-12-01 11:34:27 -06:00
Giovanni Petrantoni e1648d4404 fix mcache logic check in gossipsub 2020-12-01 23:55:51 +09:00
Giovanni Petrantoni b4738d723c
Some gossip fixes (#467)
* fix some missing rpc in rebalanceMesh

* clarify some variable names and lifetime

* further improvements
2020-12-01 11:44:09 +01:00
Dmitriy Ryajov 18443dafc1
rework peer event to take an initiator flag (#456)
* rework peer event to take an initiator flag

* use correct direction for initiator
2020-11-28 10:59:47 -06:00
Dmitriy Ryajov 3d44fcb8b3
use cancelAndAwait to mitigate further hangs (#459) 2020-11-28 09:48:06 -06:00
Giovanni Petrantoni 02b20440f2
Limit ihave emission (#462)
* add some limits to ihave emission, go has them, rust does not actually

* restore shuffling of IDs

* add some context
2020-11-28 16:27:39 +09:00
Giovanni Petrantoni 12db9a4cf2 TopicParams validation tuning 2020-11-28 00:23:09 +09:00
Giovanni Petrantoni 809df8d04d add some extra gossip metrics 2020-11-26 16:20:34 +09:00
Giovanni Petrantoni 6c7f2766fe
expose seenTTL parameters (#457) 2020-11-26 14:45:10 +09:00
Dmitriy Ryajov 7b1e652224
Allow custom identify agent string (#451)
* allow custom agent version string

* rework tests and add test for custom agent version
2020-11-25 07:42:02 -06:00
Dmitriy Ryajov 164892776b
get rid of hangs cleanup (#453) 2020-11-25 07:35:25 -06:00
Dmitriy Ryajov 21110636cb
fixing log level to avoid sacring users (#452) 2020-11-24 12:07:27 -06:00
Dmitriy Ryajov 69ae24dc8d
less leak prone cleanup (#447)
* less leak prone cleanup

* fix double allFinished
2020-11-23 18:22:15 -06:00
Dmitriy Ryajov 6cc3f4283a
update conn peerinfo instead of replacing (#445)
* update conn peerinfo instead of replacing

* remove unnecesary peerid var
2020-11-23 15:15:55 -06:00
Dmitriy Ryajov 034a1e8b1b
small cleanups from tcp-limits2 (#446) 2020-11-23 15:02:23 -06:00
Giovanni Petrantoni 93b6c4dc52
Gossip runtime params (#437)
* move gossip parameters to runtime

* internal test fixes

* add missing params

* restore const parameters are soldi base and use them in init

* more constants tuning
2020-11-19 16:48:17 +09:00
Giovanni Petrantoni a9948b0b05
clarify validation messages (#431)
* clarify validation messages

* add codecov threshold
2020-11-12 01:42:12 +09:00
Dmitriy Ryajov 90921bff09
move some importance trace logs to debug (#428) 2020-11-09 22:14:46 -06:00
Dmitriy Ryajov 3956f3fd69
make sure all streams are tracked (#422)
* make sure all streams are tracked

* revert unnecesary change
2020-11-04 21:52:54 -06:00
Giovanni Petrantoni e496802943
Least expensive metrics (#421)
* add more general and useful metrics

* fix gossipsub peers metrics in heartbeat
2020-11-04 15:18:00 +01:00
Dmitriy Ryajov 7b5259dbc7
Move triggers (#416)
* move event triggers to connmanager

* use base error type

* avoid deadlocks

* handle eof and closed when identifying incoming

* use `closeWait`
2020-11-02 14:35:26 -06:00
Dmitriy Ryajov 43a77e60a1
split stream counts by direction (#418) 2020-11-01 16:23:26 -06:00
Jacek Sieka 03639f1446
Revert "Channel leaks (#413)" (#417)
This reverts commit 1de1d49223.
2020-11-01 14:49:25 -06:00
Giovanni Petrantoni 75b023c9e5
gossipsub audit fixes (#412)
* [SEC] gossipsub - rebalanceMesh grafts peers giving preference to low scores #405

* comment score choices

* compiler warning fixes/bug fixes (unsubscribe)

* rebalanceMesh does not enforce D_out quota

* fix outbound grafting

* fight the nim compiler

* fix closure capture bs...

* another closure fix

* #403 rebalance prune fixes

* more test fixing

* #403 fixes

* #402 avoid removing scores on unsub

* #401 handleGraft improvements

* [SEC] handleIHAVE/handleIWANT recommendations

* add a note about peer exchange handling
2020-10-30 21:49:54 +09:00
Dmitriy Ryajov 1de1d49223
Channel leaks (#413)
* break stream tracking by type

* use closeWithEOF to await wrapped stream

* fix cancelation leaks

* fix channel leaks

* logging

* use close monitor and always call closeUnderlying

* don't use closeWithEOF

* removing close monitor

* logging
2020-10-27 11:21:03 -06:00
Giovanni Petrantoni 462da1f7a8
gossip MessageID as seq[byte] (#391)
* gossip MessageID as seq[byte]

* combina hashes in defaultMsgIdProvider

* wip

* fix defaultMsgIdProvider
2020-10-21 12:26:04 +09:00
Giovanni Petrantoni 27b9bf436e
fix validation according to specification (#410) 2020-10-21 12:25:42 +09:00
Giovanni Petrantoni 5c19668b2d
avoid verbose EOF messages in readOnce(secure) (#411)
* avoid verbose EOF messages in readOnce(secure)

* shorten azure tests further
2020-10-21 10:08:24 +09:00
Giovanni Petrantoni 32623b930e
handle secure errors in readOnce (secure) (#397)
* handle secure errors in readOnce(secure)

* small synthax fix

* fix mistake in readOnce's isNil
2020-10-19 14:13:14 +09:00
Giovanni Petrantoni 556213abf4
Extended validators (#395)
* gossip extended validation

* fix flood tests

* fix gossip 1.0 tests

* synthax consistency
2020-10-12 16:56:00 +09:00
Giovanni Petrantoni e3bdb9eb13
decode properly ControlPrune (#392) 2020-10-09 09:12:38 +09:00
Giovanni Petrantoni 0f2435f551
better opportunistic grafting score (when score is disabled) (#389) 2020-10-03 09:26:45 +09:00
Giovanni Petrantoni 4a98a8af5a
gossip pruning fixes related to #371 (#385)
* gossip pruning fixes related to #371

* better trace for grafted/pruned

* shorted azure testing again
2020-10-02 13:09:31 +09:00
Mamy Ratsimbazafy 03f5bbba6d
saner logging (#381) 2020-09-29 09:40:06 -06:00
Giovanni Petrantoni 98d0cc3a16
defaultMsgIdProvider alternative/test anonymize (#379)
* defaultMsgIdProvider alternative/test anonymize

* avoid freeze during flood tests

* avoid `empty message, skipping` situation

* test observers

* avoid double initPubSub

* fix gossip testing (specially when anonymize is on)

* make azure tests shorter
2020-09-28 09:11:18 +02:00
Jacek Sieka 8ecef46738
reencode gossipsub messages with anonymization (#378)
This helps protect against clients sending more data than they should
and thus getting penalized on topics that require anonymity
2020-09-25 18:39:34 +02:00
Jacek Sieka 17e00e642a
limit write queue length (#376)
To break a potential read/write deadlock, gossipsub uses an unbounded
queue for writes - when peers are too slow to process this queue, it may
end up growing without bounds causing high memory usage.

Here, we introduce a maximum write queue length after which the peer is
disconnected - the queue is generous enough that any "normal" usage
should be fine - writes that are `await`:ed are not affected, only
writes that are launched in an `asyncSpawn` task or similar.

* avoid unnecessary copy of message when there are no send observers
* release message memory earlier in gossipsub
* simplify pubsubpeer logging
2020-09-24 18:43:20 +02:00
Jacek Sieka 25bd0a18f4
small fixes (#374)
* add helper to read EOF marker after closing stream (else stream stay
alive until timeout/reset)
* don't assert on empty channel message
* don't loop when writing to chronos (no need)
2020-09-24 07:30:19 +02:00
Giovanni Petrantoni ec322124ac
allow to omit peerId and seqno (#372)
* allow to omit peerId and seqno

* small refactor

* wip

* fix message encoding

* improve rpc signature logic

* remove peerid from verify

* trace fixes

* fix message test

* fix gossip 1.0 tests
2020-09-23 17:56:33 +02:00
Jacek Sieka 471e5906f6
fix gossipsub memory leak on disconnected peer (#371)
When messages can't be sent to peer, we try to establish a send
connection - this causes messages to stack up as more and more unsent
messages are blocked on the dial lock.

* remove dial lock
* run reconnection loop in background task
2020-09-22 09:05:53 +02:00
Giovanni Petrantoni b99d2039a8
Gossip one one (#240)
* allow multiple codecs per protocol (without breaking things)

* add 1.1 protocol to gossip

* explicit peering part 1

* explicit peering part 2

* explicit peering part 3

* PeerInfo and ControlPrune protocols

* fix encodePrune

* validated always, even explicit peers

* prune by score (score is stub still)

* add a way to pass parameters to gossip

* standard setup fixes

* take into account explicit direct peers in publish

* add floodPublish logic

* small fixes, publish still half broken

* make sure to waitsub in sparse test

* use var semantics to optimize table access

* wip... lvalues don't work properly sadly...

* big publish refactor, replenish and balance

* fix internal tests

* use g.peers for fanout (todo: don't include flood peers)

* exclude non gossip from fanout

* internal test fixes

* fix flood tests

* fix test's trypublish

* test interop fixes

* make sure to not remove peers from gossip table

* restore old replenishFanout

* cleanups

* restore utility module import

* restore trace vs debug in gossip

* improve fanout replenish behavior further

* triage publish nil peers (issue is on master too but just hidden behind a if/in)

* getGossipPeers fixes

* remove topics from pubsubpeer (was unused)

* simplify rebalanceMesh (following spec) and make it finally reach D_high

* better diagnostics

* merge new pubsubpeer, copy 1.1 to new module

* fix up merge

* conditional enable gossip11 module

* add back topics in peers, re-enable flood publish

* add more heartbeat locking to prevent races

* actually lock the heartbeat

* minor fixes

* with sugar

* merge 1.0

* remove assertion in publish

* fix multistream 1.1 multi proto

* Fix merge oops

* wip

* fix gossip 11 upstream

* gossipsub11 -> gossipsub

* support interop testing

* tests fixing

* fix directchat build

* control prune updates (pb)

* wip parameters

* gossip internal tests fixes

* parameters wip

* finishup with params

* cleanups/wip

* small sugar

* grafted and pruned procs

* wip updateScores

* wip

* fix logging issue

* pubsubpeer, chronicles explicit override

* fix internal gossip tests

* wip

* tables troubleshooting

* score wip

* score wip

* fixes

* fix test utils generateNodes

* don't delete while iterating in score update

* fix grafted defect

* add a handleConnect in subscribeTopic

* pruning improvements

* wip

* score fixes

* post merge - builds gossip tests

* further merge fixes

* rebalance improvements and opportunistic grafting

* fix test for now

* restore explicit peering

* implement peer exchange graft message

* add an hard cap to PX

* backoff time management

* IWANT cap/budget

* Adaptive gossip dissemination

* outbound mesh quota, internal tests fixing

* oversub prune score based, finish outbound quota

* finishup with score and ihave budget

* use go daemon 0.3.0

* import fixes

* byScore cleanup score sorting

* remove pointless scaling in `/` Duration operator

* revert using libp2p org for daemon

* interop fixes

* fixes and cleanup

* remove heartbeat assertion, minor debug fixes

* logging improvements and cleaning up

* (to revert) add some traces

* add explicit topic to gossip rpcs

* pubsub merge fixes and type fix in switch

* Revert "(to revert) add some traces"

This reverts commit 4663eaab6cc336c81cee50bc54025cf0b7bcbd99.

* cleanup some now irrelevant todo

* shuffle peers anyway as score might be disabled

* add missing shuffle

* old merge fix

* more merge fixes

* debug improvements

* re-enable gossip internal tests

* add gossip10 fallback (dormant but tested)

* split gossipsub internal tests into 1.0 and 1.1

Co-authored-by: Dmitriy Ryajov <dryajov@gmail.com>
2020-09-21 11:16:29 +02:00
Jacek Sieka b7e5d1122c
cleanups (#366)
* reuse connection timeout for noise handshake (avoid extra timer)
* enforce nbytes > 0 for readOnce
* avoid some unnecessary memory zeroing
* simplify noise
* fix dumping when noise splits message
2020-09-16 11:55:25 +02:00
Dmitriy Ryajov b0d86b95dd
add peer lifecycle events (#357)
* add peer lifecycle events

* rework peer events to not use connection events

* don't use result in pubsub and switch init

* wip

* use ordered hashes and remove logscope

* logging

* add missing test

* small fixes
2020-09-15 14:19:22 -06:00
Giovanni Petrantoni a6007be428
avoid sending empty seqno and/or fromPeer (gossip rpc) (#364) 2020-09-15 12:33:18 +02:00
Eugene Kabanov e2d46c6b53
Add `libp2p_dump` and `libp2p_dump_dir` compiler time options. (#359)
* Add `libp2p_dump` and `libp2p_dump_dir` compiler time options.

* Add tools/pbcap_parser.

* Add mplex control messages decoding.
2020-09-15 11:16:43 +02:00
Oskar Thorén 5e66f6fbd8
Add logScope to connmanager and pubsubprotobuf (#363) 2020-09-15 08:03:53 +02:00
Jacek Sieka 96d4c44fec
refactor bufferstream to use a queue (#346)
This change modifies how the backpressure algorithm in bufferstream
works - in particular, instead of working byte-by-byte, it will now work
seq-by-seq.

When data arrives, it usually does so in packets - in the current
bufferstream, the packet is read then split into bytes which are fed one
by one to the bufferstream. On the reading side, the bytes are popped of
the bufferstream, again byte by byte, to satisfy `readOnce` requests -
this introduces a lot of synchronization traffic because the checks for
full buffer and for async event handling must be done for every byte.

In this PR, a queue of length 1 is used instead - this means there will
at most exist one "packet" in `pushTo`, one in the queue and one in the
slush buffer that is used to store incomplete reads.

* avoid byte-by-byte copy to buffer, with synchronization in-between
* reuse AsyncQueue synchronization logic instead of rolling own
* avoid writeHandler callback - implement `write` method instead
* simplify EOF signalling by only setting EOF flag in queue reader (and
reset)
* remove BufferStream pipes (unused)
* fixes drainBuffer deadlock when drain is called from within read loop
and thus blocks draining
* fix lpchannel init order
2020-09-10 08:19:13 +02:00