In `async` functions, a closure environment is created for variables
that cross an await boundary - this closure environment is kept in
memory for the lifetime of the associated future - this means that
although _some_ variables are no longer used, they still take up memory
for a long time.
In Nimbus, message validation is processed in batches meaning the future
of an incoming gossip message stays around for quite a while - this
leads to memory consumption peaks of 100-200 mb when there are many
attestations in the pipeline.
To avoid excessive memory usage, it's generally better to move non-async
code into proc's such that the variables therein can be released earlier
- this includes the many hidden variables introduced by macro and
template expansion (ie chronicles that does expensive exception
handling)
* move seen table salt to floodsub, use there as well
* shorten seen table salt to size of hash
* avoid unnecessary memory allocations and copies in a few places
* factor out message scoring
* avoid reencoding outgoing message for every peer
* keep checking validators until reject (in case there's both reject and
ignore)
* `readOnce` avoids `readExactly` overhead for single-byte read
* genericAssign -> assign2
* Refactor gossipsub into multiple modules
* splitup further gossipsub
* move more mesh related stuff to behavior
* fix internal tests
* fix PubSubPeer.outbound flag, make it more reliable
* use discard rather then _
* Remove unused connections in pubsubpeer, also removed wrong usages, add a disconnect bad peers parameter
* handle exceptions in disconnectPeer
* small fix
* use the proper disconnection procedure for gossip peers
* fixes, more metrics add test about disconnection
* hot fix possible null pointers in switch
* silly isnil sugar
* Fix and test gossip directPeer connections
* salt ids in seen table
* add subscription validation callback and avoid processing topics we don't care of
* apply penalty on bad subscription
* fix IHave handling IDs
* reduce indenting, add some comments
* fix gossip randombytes generation
* do not descore unwanted topics (might happen, due to timing, needs improvements)
* cleaning up and added tests
* validate subscriptions only when subscribing
* set notice level for failed publish
* fix floodsub behavior
* add more traces, remove async from rebalance
* more traces
* avoid computng scores when weight is 0.0
* debug colocation, fix an indent in unsubpeer (minor)
* add full ValidationResult coverage
* store in cache only after validation
* gossip 1.0 fixes
* fix typo
* gossip 10 internal test fixes
* test fixing
* refactor peerstats usages
* populate tables if missing when scoring
* move gossip parameters to runtime
* internal test fixes
* add missing params
* restore const parameters are soldi base and use them in init
* more constants tuning
To break a potential read/write deadlock, gossipsub uses an unbounded
queue for writes - when peers are too slow to process this queue, it may
end up growing without bounds causing high memory usage.
Here, we introduce a maximum write queue length after which the peer is
disconnected - the queue is generous enough that any "normal" usage
should be fine - writes that are `await`:ed are not affected, only
writes that are launched in an `asyncSpawn` task or similar.
* avoid unnecessary copy of message when there are no send observers
* release message memory earlier in gossipsub
* simplify pubsubpeer logging
When messages can't be sent to peer, we try to establish a send
connection - this causes messages to stack up as more and more unsent
messages are blocked on the dial lock.
* remove dial lock
* run reconnection loop in background task
* mcache fixes
* remove timed cache - the window shifting already removes old messages
* ref -> object
* avoid unnecessary allocations with `[]` operator
* simplify init
* fix several gossipsub/floodsub issues
* floodsub, gossipsub: don't rebroadcast messages that fail validation
(!)
* floodsub, gossipsub: don't crash when unsubscribing from unknown
topics (!)
* gossipsub: don't send message to peers that are not interested in the
topic, when messages don't share topic list
* floodsub: don't repeat all messages for each message when
rebroadcasting
* floodsub: allow sending empty data
* floodsub: fix inefficient unsubscribe
* sync floodsub/gossipsub logging
* gossipsub: include incoming messages in mcache (!)
* gossipsub: don't rebroadcast already-seen messages (!)
* pubsubpeer: remove incoming/outgoing seen caches - these are already
handled in gossipsub, floodsub and will cause trouble when peers try to
resubscribe / regraft topics (because control messages will have same
digest)
* timedcache: reimplement without timers (fixes timer leaks and extreme
inefficiency due to per-message closures, futures etc)
* timedcache: ref -> obj
* move pubsub of off switch, pass switch into pubsub
* use join on lpstreams
* properly cleanup up failed peers
* fix tests
* fix peertable hasPeerId
* fix tests
* rework sending, remove helpers from pubsubpeer, unify in broadcast
* further split broadcast into send
* use send where appropriate
* use formatIt
* improve trace
Co-authored-by: Giovanni Petrantoni <giovanni@fragcolor.xyz>
* add finegrained timeouts to pubsub
* use 10 millis timeout in tests
* finalization
* revert timeouts
* use `atEof` for reads
* adjust timeouts and use atEof for reads
* use atEof for reads
* set isEof flag
* no backoff for pubsub streams
* temp timer increase, make macos finalize
* don't call `subscribePeer` in libp2p anymore
* more traces
* leak tests
* lower timeouts
* handle exceptions in control message
* don't use `cancelAndWait`
* handle exceptions in helpers
* wip
* don't send empty messages
* check for leaks properly
* don't use cancelAndWait
* don't await subscribption sends
* remove subscrivePeer calls from switch
* trying without the hooks again
* Fix gossip messages seqno according to spec
* Add peers back to gossipsub table, slow down heartbeat
* Revert "Add peers back to gossipsub table, slow down heartbeat"
This reverts commit 01e2e62172.
* make seqno a threadvar, remove from peerinfo
* seqno refactor, into pubsub
* more cleanup
* fix tests
* merging master
* remove `withLock` as it conflicts with stdlib
* wip
* more fanout ttl
Co-authored-by: Giovanni Petrantoni <giovanni@fragcolor.xyz>
* gossipsub is a function of subscription messages only
* graft/prune work with mesh, get filled up from gossipsub
* fix race conditions with await
* fix exception unsafety when grafting/pruning
* fix allowing up to DHi peers in mesh on incoming graft
* fix metrics in several places
* Peer resultification and defect only
* Fixing some tests
* test fixes
* Rename peer into peerid
* better result error message in identify
* further merge fixes
* don't send public key in message when not signing (information leak)
* don't run rebalance if there are peers in gossip (see #242)
* don't crash randomly on bad peer id from remote
* Remove noise padding payload (spec removed it)
* add log scope in secure
* avoid defect array out of range in switch secure when "na"
* improve identify traces
* wip noise fixes
* noise protobuf adjustments (trying)
* add more debugging messages/traces, improve their actual contents
* re-enable ID check in noise
* bump go daemon tag version
* bump go daemon tag version
* enable noise in daemonapi
* interop testing, (both secio and noise will be tested)
* azure cache bump (p2pd)
* CI changes
- Travis: use Go 1.14
- azure-pipelines.yml: big cleanup
- Azure: bump cache keys
- build 64-bit p2pd on 32-bit Windows
- install both Mingw-w64 architectures
* noise logging fixes
* alternate testing between noise and secio
* increase timeout to avoid VM errors in CI (multistream tests)
* refactor heartbeat management in gossipsub
* remove locking within heartbeat
* refactor heartbeat management in gossipsub
* remove locking within heartbeat
Co-authored-by: Ștefan Talpalaru <stefantalpalaru@yahoo.com>
* count published messages
* don't call `switch.dial` in `subscribeToPeer`
* don't use delegation in connection
* move connection out to own file
* don't breakout on reset
* make sure to call close on secured conn
* add lpstream tracing
* don't breackdown by conn id
* fix import
* remove unused lable
* reset connection on exception
* add additional metrics for skipped messages
* check for nil in secure.close
* Start adding some metrics to pubsub
In order to visualize it's functionality
Still WIP
* more metrics
* add per topic metrics
* finishup with requested metrics
* add a metrisServer define to start local server
* PR fixes and cleanup
This means we can use it from other protocols that inherit GossipSub. Otherwise,
a lot of internal state (heartbeat lock etc) doesn't get initialized properly.
* call write until all is written out
* wip: rework with proper half-closed
* add eof and closed handling
* wip
* close connection on chronos close
* don't use read
* make noise work again
* don't reraise just yet
* fixes after backporting
* remove on transport close cleanup
* revert back allread
* rust interop fixes
* read from stream
* inc count before closing
* rebasing master
* store incomming connections
* fix merge
* remove unneeded changes
* use internal close flag to indicate disposal
* call write until all is written out
* add comments to lpchannel fields
* add an eof flag to signal which end closed
* wip: rework with proper half-closed
* add eof and closed handling
* propagate closes to piped
* call parent close
* moving bufferstream trackers out
* move writeLock to bufferstream
* move writeLock out
* remove unused call
* wip
* rebasing master
* fix mplex tests
* wip
* fix bufferstream after backport
* wip
* rename to differentiate from chronos tracker
* close connection on chronos close
* make reset request asyncCheck
* fix channel cleanup
* misc
* don't use read
* fix backports
* make noise work again
* proper exception handling
* don't reraise just yet
* add convenience templates
* dont double wrap
* use async pragma
* fixes after backporting
* muxer owns connection
* remove on transport close cleanup
* revert back allread
* adding some todos
* read from stream
* inc count before closing
* rebasing master
* rebase master
* use correct exception type
* use try/finally insted of defer
* fix compile in trace mode
* reset channels on mplex close