* gossipsub: unsubscribe fixes
* fix KeyError when updating metric of unsubscribed topic
* fix unsubscribe message not being sent to all peers causing them to
keep thinking we're still subscribed
* release memory earlier in a few places
* floodsub fix
In `async` functions, a closure environment is created for variables
that cross an await boundary. This closure environment is kept in
memory for the lifetime of the associated future, which means that
although _some_ variables are no longer used, they still take up memory
for a long time.
In Nimbus, message validation is processed in batches, meaning the future
of an incoming gossip message stays around for quite a while - this
leads to memory consumption peaks of 100-200 MB when there are many
attestations in the pipeline.
To avoid excessive memory usage, it's generally better to move non-async
code into procs such that the variables therein can be released earlier
- this includes the many hidden variables introduced by macro and
template expansion (e.g. chronicles, which does expensive exception
handling)
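The effect described above can be sketched outside of Nim. The following is a minimal Python `asyncio` analogy, not the library's code: `Payload`, `handle_inline`, `handle_split` and `summarize` are hypothetical names, and the measurement relies on CPython's reference counting. A local that lives in the coroutine stays alive while the coroutine is suspended; moving the work into a plain function lets it be released before the `await`.

```python
import asyncio
import weakref


class Payload:
    """Stand-in for a large decoded message (hypothetical)."""
    def __init__(self, size):
        self.data = bytearray(size)


live = weakref.WeakSet()  # tracks Payload objects that are still alive


def make_payload(size):
    payload = Payload(size)
    live.add(payload)
    return payload


async def handle_inline(gate, size):
    # `payload` is a local of the coroutine, so it is kept alive for as
    # long as the coroutine is suspended, even though only `length` is
    # needed after the await.
    payload = make_payload(size)
    length = len(payload.data)
    await gate.wait()
    return length


def summarize(size):
    # Plain function: `payload` is released as soon as it returns.
    payload = make_payload(size)
    return len(payload.data)


async def handle_split(gate, size):
    length = summarize(size)
    await gate.wait()
    return length


async def live_while_suspended(handler, size=1024):
    """Count live payloads while `handler` is parked on its await."""
    gate = asyncio.Event()
    task = asyncio.create_task(handler(gate, size))
    await asyncio.sleep(0)  # let the handler run up to its await
    count = len(live)
    gate.set()
    await task
    return count
```

With this sketch, `asyncio.run(live_while_suspended(handle_inline))` observes one live payload during the suspension, while the `handle_split` variant observes none.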
* move seen table salt to floodsub, use there as well
* shorten seen table salt to size of hash
* avoid unnecessary memory allocations and copies in a few places
* factor out message scoring
* avoid reencoding outgoing message for every peer
* keep checking validators until reject (in case there's both reject and
ignore)
* `readOnce` avoids `readExactly` overhead for single-byte read
* genericAssign -> assign2
* properly propagate initiator information for gossipsub
* Fix pubsubpeer lifetime management
* restore old behavior
* tests fixing
* clamp backoff time value received
* fix member name collisions
* internal test fixes
* better names and explanation of the importance of transport direction
* fixes
* Refactor gossipsub into multiple modules
* split up gossipsub further
* move more mesh related stuff to behavior
* fix internal tests
* fix PubSubPeer.outbound flag, make it more reliable
* use discard rather than _
* salt ids in seen table
* add subscription validation callback and avoid processing topics we don't care about
* apply penalty on bad subscription
* fix IHave handling IDs
* reduce indenting, add some comments
* fix gossip randombytes generation
* do not descore unwanted topics (might happen, due to timing, needs improvements)
* cleaning up and added tests
* validate subscriptions only when subscribing
* set notice level for failed publish
* fix floodsub behavior
* break stream tracking by type
* use closeWithEOF to await wrapped stream
* fix cancelation leaks
* fix channel leaks
* logging
* use close monitor and always call closeUnderlying
* don't use closeWithEOF
* removing close monitor
* logging
* add helper to read EOF marker after closing stream (otherwise the stream
stays alive until timeout/reset)
* don't assert on empty channel message
* don't loop when writing to chronos (no need)
When messages can't be sent to a peer, we try to establish a send
connection - this causes messages to stack up as more and more unsent
messages are blocked on the dial lock.
* remove dial lock
* run reconnection loop in background task
* add peer lifecycle events
* rework peer events to not use connection events
* don't use result in pubsub and switch init
* wip
* use ordered hashes and remove logscope
* logging
* add missing test
* small fixes
* mcache fixes
* remove timed cache - the window shifting already removes old messages
* ref -> object
* avoid unnecessary allocations with `[]` operator
* simplify init
* fix several gossipsub/floodsub issues
* floodsub, gossipsub: don't rebroadcast messages that fail validation
(!)
* floodsub, gossipsub: don't crash when unsubscribing from unknown
topics (!)
* gossipsub: don't send message to peers that are not interested in the
topic, when messages don't share topic list
* floodsub: don't repeat all messages for each message when
rebroadcasting
* floodsub: allow sending empty data
* floodsub: fix inefficient unsubscribe
* sync floodsub/gossipsub logging
* gossipsub: include incoming messages in mcache (!)
* gossipsub: don't rebroadcast already-seen messages (!)
* pubsubpeer: remove incoming/outgoing seen caches - these are already
handled in gossipsub, floodsub and will cause trouble when peers try to
resubscribe / regraft topics (because control messages will have same
digest)
* timedcache: reimplement without timers (fixes timer leaks and extreme
inefficiency due to per-message closures, futures etc)
* timedcache: ref -> obj
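A timer-free timed cache of the kind mentioned above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the actual implementation: the class shape, the `put`/`expire` names and the explicit `now` argument are all hypothetical. Expired entries are dropped lazily on each insert instead of scheduling one timer per message.

```python
from collections import OrderedDict


class TimedCache:
    """Hypothetical timer-free cache: entries expire lazily on access."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.entries = OrderedDict()  # key -> expiry time, oldest first

    def expire(self, now):
        # Entries are kept in insertion order and `now` is assumed to be
        # non-decreasing, so expired entries are always at the front.
        while self.entries:
            key, expires_at = next(iter(self.entries.items()))
            if expires_at > now:
                break
            del self.entries[key]

    def put(self, key, now):
        """Insert `key`; return True if it was already present and unexpired."""
        self.expire(now)
        if key in self.entries:
            return True
        self.entries[key] = now + self.timeout
        return False
```

For example, with `cache = TimedCache(timeout=10)`, `cache.put("a", now=0)` returns `False`, `cache.put("a", now=5)` returns `True`, and `cache.put("a", now=10)` returns `False` again because the first entry has expired by then.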
* remove send lock
When mplex receives data it will block until a reader has processed the
data. Thus, when a large message is received, such as a gossipsub
subscription table, all of mplex will be blocked until all reading is
finished.
However, if at the same time a `dial` to establish a gossipsub send
connection is ongoing, that `dial` will be blocked because mplex is no
longer reading data - specifically, it might indeed be the connection
that's processing the previous data that is waiting for a send
connection.
There are other problems with the current code:
* If an exception is raised, it is not necessarily raised for the same
connection as `p.sendConn`, so resetting `p.sendConn` in the exception
handling is wrong
* `p.isConnected` is checked before taking the lock - thus, if it
returns false, a new dial will be started. If a new task enters `send`
before dial is finished, it will also determine `p.isConnected` is
false, then get stuck on the lock - when the previous task finishes and
releases the lock, the new task will _also_ dial and thus reset
`p.sendConn` causing a leak.
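The check-before-lock race in the last point can be reproduced in a small Python `asyncio` sketch. It is only an analogy under stated assumptions: `Peer`, `send_racy`, `send_checked` and `dial` are hypothetical stand-ins, and re-checking under the lock is shown purely to contrast with the racy flow, not as the approach taken here (which removes the lock).

```python
import asyncio


class Peer:
    """Hypothetical stand-in for a pubsub peer with one send connection."""

    def __init__(self):
        self.send_conn = None
        self.lock = asyncio.Lock()
        self.dials = 0

    async def dial(self):
        self.dials += 1
        await asyncio.sleep(0)  # stands in for network latency
        return object()

    async def send_racy(self):
        # The connection is checked *before* taking the lock, so a second
        # task that arrives while the first dial is in flight also decides
        # to dial, then overwrites the first connection.
        if self.send_conn is None:
            async with self.lock:
                self.send_conn = await self.dial()

    async def send_checked(self):
        # Checking under the lock lets the second task see the connection
        # established by the first one.
        async with self.lock:
            if self.send_conn is None:
                self.send_conn = await self.dial()


async def count_dials(method_name, senders=2):
    peer = Peer()
    await asyncio.gather(*(getattr(peer, method_name)() for _ in range(senders)))
    return peer.dials
```

Running `asyncio.run(count_dials("send_racy"))` performs two dials for two concurrent senders, whereas `asyncio.run(count_dials("send_checked"))` performs one.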
* prefer existing connection
simplifies flow
* move pubsub off of switch, pass switch into pubsub
* use join on lpstreams
* properly clean up failed peers
* fix tests
* fix peertable hasPeerId
* fix tests
* rework sending, remove helpers from pubsubpeer, unify in broadcast
* further split broadcast into send
* use send where appropriate
* use formatIt
* improve trace
Co-authored-by: Giovanni Petrantoni <giovanni@fragcolor.xyz>
* prefer PeerID in switch api
This avoids ref issues like ref identity and nil
* use existing peerinfo instance if possible
* remove secureCodec
there may be multiple connections per peerinfo with different codecs
* avoid some extra async
* add finegrained timeouts to pubsub
* use 10 millis timeout in tests
* finalization
* revert timeouts
* use `atEof` for reads
* adjust timeouts and use atEof for reads
* use atEof for reads
* set isEof flag
* no backoff for pubsub streams
* temp timer increase, make macos finalize
* don't call `subscribePeer` in libp2p anymore
* more traces
* leak tests
* lower timeouts
* handle exceptions in control message
* don't use `cancelAndWait`
* handle exceptions in helpers
* wip
* don't send empty messages
* check for leaks properly
* don't use cancelAndWait
* don't await subscription sends
* remove subscribePeer calls from switch
* trying without the hooks again
* Fix gossip messages seqno according to spec
* Add peers back to gossipsub table, slow down heartbeat
* Revert "Add peers back to gossipsub table, slow down heartbeat"
This reverts commit 01e2e62172.
* make seqno a threadvar, remove from peerinfo
* seqno refactor, into pubsub