In `async` functions, a closure environment is created for variables
that cross an await boundary - this closure environment is kept in
memory for the lifetime of the associated future - this means that
although _some_ variables are no longer used, they still take up memory
for a long time.
In Nimbus, message validation is processed in batches meaning the future
of an incoming gossip message stays around for quite a while - this
leads to memory consumption peaks of 100-200 mb when there are many
attestations in the pipeline.
To avoid excessive memory usage, it's generally better to move non-async
code into proc's such that the variables therein can be released earlier
- this includes the many hidden variables introduced by macro and
template expansion (ie chronicles that does expensive exception
handling)
* move seen table salt to floodsub, use there as well
* shorten seen table salt to size of hash
* avoid unnecessary memory allocations and copies in a few places
* factor out message scoring
* avoid reencoding outgoing message for every peer
* keep checking validators until reject (in case there's both reject and
ignore)
* `readOnce` avoids `readExactly` overhead for single-byte read
* genericAssign -> assign2
To break a potential read/write deadlock, gossipsub uses an unbounded
queue for writes - when peers are too slow to process this queue, it may
end up growing without bounds causing high memory usage.
Here, we introduce a maximum write queue length after which the peer is
disconnected - the queue is generous enough that any "normal" usage
should be fine - writes that are `await`:ed are not affected, only
writes that are launched in an `asyncSpawn` task or similar.
* avoid unnecessary copy of message when there are no send observers
* release message memory earlier in gossipsub
* simplify pubsubpeer logging
if the connection is already closed (because the remote closes during
identfiy for example), an exception would be raised which would leave
the connection in limbo, beacuse it would not go through the rest of
internalConnect.
Also, if the connection is already closed, the disconnect event would be
scheduled before the connect event :/
* move pubsub of off switch, pass switch into pubsub
* use join on lpstreams
* properly cleanup up failed peers
* fix tests
* fix peertable hasPeerId
* fix tests
* rework sending, remove helpers from pubsubpeer, unify in broadcast
* further split broadcast into send
* use send where appropriate
* use formatIt
* improve trace
Co-authored-by: Giovanni Petrantoni <giovanni@fragcolor.xyz>
* Fix gossip messages seqno according to spec
* Add peers back to gossipsub table, slow down heartbeat
* Revert "Add peers back to gossipsub table, slow down heartbeat"
This reverts commit 01e2e62172a7793bb17f0eb8314e2faeb2682173.
* make seqno a threadvar, remove from peerinfo
* seqno refactor, into pubsub
* Minprotobuf initial commit
* Fix noise.
* Add signed integers support.
Add checks for field number value.
Remove some casts.
* Fix compile errors.
* Fix comments and constants.
* more cleanup
* fix tests
* merging master
* remove `withLock` as it conflicts with stdlib
* wip
* more fanout ttl
Co-authored-by: Giovanni Petrantoni <giovanni@fragcolor.xyz>
* Peer resultification and defect only
* Fixing some tests
* test fixes
* Rename peer into peerid
* better result error message in identify
* further merge fixes
* don't send public key in message when not signing (information leak)
* don't run rebalance if there are peers in gossip (see #242)
* don't crash randomly on bad peer id from remote
* Start adding some metrics to pubsub
In order to visualize it's functionality
Still WIP
* more metrics
* add per topic metrics
* finishup with requested metrics
* add a metrisServer define to start local server
* PR fixes and cleanup
* add verify signature flag
* add sign flag to enable/disable msg signing
* moving internal tests out to their own file
* cleanup nimble file
* remove unneeded tests
* move pubsub tests out
* fix tests
* Add chronos trackers and used them to sanitize resource disposal
* Chronos trackers for transport tests wip
* No more chronos leaks in testtransport
* Make tcp transport and test more robust when closing
* Test async leaking tracking wip
* Fix a regression in wire connect
* Add chronos trackers to more tests and sanitize resource closure
* Wip fixing floodsub tests
* Floodsub wip
* Made floodsub basically deterministic, hit a nim bug with captures tho
* Wrap up floodsub tests refactor
* Wrapping up
* Add allFuturesThrowing utility
* Fix missing allFuturesThrowing in noise tests!
* Make tests green
* attempt fixing gossipsub failing cases
* Make sure to check also fanout in waitSub
* More verbose traces
* Gossipsub test improvments
* Refactor TcpTransport remove asyncCheck
* Add Connection trackers
* Add stricter connection tracking, wip mplex fix
* More asynccheck removal, in order to avoid connection leaks
* bump chronicles requirement
* Enable tracker dump to check CI output
* Wait for more futures in testmplex
* Remove tracker dump messages
* add tryAndWarn utility, fix mplex issue with go interop
* All allFuturesThrowing to directchat too
* make sure to cleanup on transport close
* Make traces less verbose with shortHexDump utility
* Rename shortHexDump into shortLog
* Improve shortLog, add shortLog for crypto keys
* Add proper shortLog implementations in messages
* Fix signed varints.
Add tests for signed varints.
Remove some casts to allow usage at compile time.
* Fix vsizeof() on 32bit platforms.
* Add `hint` and `zint` types for proper signed integer encoding.
* Fix varint related bugs.
* Update requirements.
* Fix interop tests because of fixed readLine.
* Add putVarint, getVarint and tests.
* fix: don't allow replacing pubkey
* fix: several small improvements
* removing pubkey setter
* improove error handling
* remove the use of Option[T] if not needed
* don't use optional
* fix-ci: temporarily pin p2pd to a working tag
* fix example to comply with latest changes
* bumping p2pd again to a higher version
* Bugfix: Dialing an already connected peer may lead to crash
* Introduced a standard_setup module allowing to instantiate
the `Switch` object in an easier manner.
* Added `Switch.disconnect(peer)`
* Trailing space removed (sorry about polluting the diff)
* extract public and private keys fields from peerid
* allow assigning a public key
* cleaned up TODOs
* make pubsub prefix a const
* public key should be an `Option`
This adds gossipsub and floodsub, as well as basic interop testing with the go libp2p daemon.
* add close event
* wip: gossipsub
* splitting rpc message
* making message handling more consistent
* initial gossipsub implementation
* feat: nim 1.0 cleanup
* wip: gossipsub protobuf
* adding encoding/decoding of gossipsub messages
* add disconnect handler
* add proper gossipsub msg handling
* misc: cleanup for nim 1.0
* splitting floodsub and gossipsub tests
* feat: add mesh rebalansing
* test pubsub
* add mesh rebalansing tests
* testing mesh maintenance
* finishing mcache implementatin
* wip: commenting out broken tests
* wip: don't run heartbeat for now
* switchout debug for trace logging
* testing gossip peer selection algorithm
* test stream piping
* more work around message amplification
* get the peerid from message
* use timed cache as backing store
* allow setting timeout in constructor
* several changes to improve performance
* more through testing of msg amplification
* prevent gc issues
* allow piping to self and prevent deadlocks
* improove floodsub
* allow running hook on cache eviction
* prevent race conditions
* prevent race conditions and improove tests
* use hashes as cache keys
* removing useless file
* don't create a new seq
* re-enable pubsub tests
* fix imports
* reduce number of runs to speed up tests
* break out control message processing
* normalize sleeps between steps
* implement proper transport filtering
* initial interop testing
* clean up floodsub publish logic
* allow dialing without a protocol
* adding multiple reads/writes
* use protobuf varint in mplex
* don't loose conn's peerInfo
* initial interop pubsub tests
* don't duplicate connections/peers
* bring back interop tests
* wip: interop
* re-enable interop and daemon tests
* add multiple read write tests from handlers
* don't cleanup channel prematurely
* use correct channel to send/receive msgs
* adjust tests with latest changes
* include interop tests
* remove temp logging output
* fix ci
* use correct public key serialization
* additional tests for pubsub interop