* add `{.raises: [Defect].}` annotations across the codebase
* use unittest2
* add windows deps caching
* update mingw link
* die on failed peerinfo initialization
* use result.expect instead of get
* use expect more consistently and rework inits
* use expect more consistently
* throw on missing public key
* remove unused closure annotation
* merge master
* add floodPublish test
* test delivery via control IWant/IHave mechanics
* fix issues in control, and add testing
* fix possible backoff issue with the prune routine overriding it
In `async` functions, a closure environment is created for variables
that cross an `await` boundary. This environment is kept in memory for
the lifetime of the associated future, which means that variables that
are no longer used still take up memory for a long time.

In Nimbus, message validation is processed in batches, meaning the
future of an incoming gossip message stays around for quite a while -
this leads to memory consumption peaks of 100-200 MB when there are
many attestations in the pipeline.

To avoid excessive memory usage, it's generally better to move
non-async code into procs so that their variables can be released
earlier - this includes the many hidden variables introduced by macro
and template expansion (e.g. chronicles, which does expensive exception
handling).
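A minimal sketch of that pattern, assuming chronos; `decodeAndValidate`
and `handleMessage` are hypothetical names used only to illustrate
where the async boundary goes:

```nim
import chronos

proc decodeAndValidate(data: seq[byte]): bool =
  # hypothetical helper: temporaries created here are released when the
  # proc returns instead of being captured in the async closure environment
  let decoded = data            # stand-in for an expensive decode step
  decoded.len > 0

proc handleMessage(data: seq[byte]) {.async.} =
  # only the small `ok` flag crosses the await boundary and therefore
  # ends up in the future's environment
  let ok = decodeAndValidate(data)
  if ok:
    await sleepAsync(1.milliseconds)  # stand-in for further async work
```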
* move seen table salt to floodsub and use it there as well
* shorten seen table salt to the size of the hash
* avoid unnecessary memory allocations and copies in a few places
* factor out message scoring
* avoid reencoding outgoing message for every peer
* keep checking validators until reject (in case there's both reject and
ignore)
* `readOnce` avoids `readExactly` overhead for single-byte read
* genericAssign -> assign2
* properly propagate initiator information for gossipsub
* Fix pubsubpeer lifetime management
* restore old behavior
* fix tests
* clamp the received backoff time value
* fix member name collisions
* internal test fixes
* better names and explaining of the importance of transport direction
* fixes
* master merge
* wip
* avoid deadlocks
* tcp limits
* expose client field in chronosstream
* limit incoming connections
* update with new listen api
* fix release
* don't override peerinfo in connection
* rework transport with accept
* use semaphore to track resource usage
* rework with new transport accept api
* move events to conn manager (#373)
* use semaphore to track resource usage
* merge master
* expose api to acquire conn slots
* don't fail expensive metrics
* allow tracking and updating connections
* set global connection limits to 80
* add per peer connection limits
* make sure conn is closed if tracking failed
* more descriptive naming for handle
* rework with new transport accept api
* add `getStream` hide `selectConn`
* add TransportClosedError
* make nil explicit
* don't make unnecessary copies of message
* logging
* error handling
* cleanup semaphore
* track connections properly
* throw `TooManyConnections` when tracking outgoing connections
* use proper exception handling conventions
* check onCloseHandle for nil
* revert internalConnect changes
* add upgraded flag
* await stream before closing
* simplify tracking
* wip
* logging
* split connection limits into incoming and outgoing
* further streamline connection limits split counts
* don't use closeWithEOF
* move peer and conn event triggers from switch
* wip
* wip
* wip
* merge master
* handle nil connections properly
* add clarifying comment
* don't raise exc on nil
* no finally
* add proper min/max connections logic
* rebase master
* merge master
* master merge
* remove request timeout
should be addressed in a separate PR
* merge master
* share semaphore when in/out limits aren't enforced
* merge master
* use import
* pass semaphore to trackConn
* don't close last conn
* use storeConn
* merge master
* use storeConn
* Remove unused connections in pubsubpeer, remove wrong usages, and add a disconnect-bad-peers parameter
* handle exceptions in disconnectPeer
* small fix
* use the proper disconnection procedure for gossip peers
* fixes, more metrics, add test about disconnection
* hot fix possible null pointers in switch
* silly isnil sugar
* Fix and test gossip directPeer connections
* salt ids in seen table
* add subscription validation callback and avoid processing topics we don't care about
* apply penalty on bad subscription
* fix handling of IHave message IDs
* reduce indenting, add some comments
* fix gossip randombytes generation
* do not descore unwanted topics (might happen due to timing; needs improvement)
* clean up and add tests
* validate subscriptions only when subscribing
* set notice level for failed publish
* fix floodsub behavior
* add more traces, remove async from rebalance
* more traces
* avoid computing scores when weight is 0.0
* debug colocation, fix an indent in unsubpeer (minor)
* add full ValidationResult coverage
* store in cache only after validation
* gossip 1.0 fixes
* fix typo
* gossip 1.0 internal test fixes
* fix tests
* refactor peerstats usages
* populate tables if missing when scoring
* move gossip parameters to runtime
* internal test fixes
* add missing params
* restore const parameters as a solid base and use them in init
* more constants tuning
When messages can't be sent to a peer, we try to establish a send
connection - this causes messages to stack up as more and more unsent
messages are blocked on the dial lock.
* remove dial lock
* run reconnection loop in a background task (sketched below)
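A minimal sketch of the background-reconnection idea above, assuming
chronos's `asyncSpawn`; `Peer`, `connectOnce` and `send` are
hypothetical stand-ins rather than the library's API:

```nim
import chronos

type
  Peer = ref object
    connected: bool
    reconnecting: bool

proc connectOnce(p: Peer) {.async.} =
  # hypothetical stand-in for dialing and setting up a new send connection
  await sleepAsync(100.milliseconds)
  p.connected = true

proc reconnectLoop(p: Peer) {.async.} =
  p.reconnecting = true
  try:
    while not p.connected:
      try:
        await p.connectOnce()
      except CatchableError:
        await sleepAsync(1.seconds)   # back off and try again
  finally:
    p.reconnecting = false

proc send(p: Peer, msg: seq[byte]) =
  if not p.connected:
    # don't block the caller on a dial - kick off reconnection in the
    # background and let the message be dropped (or queued) instead
    if not p.reconnecting:
      asyncSpawn p.reconnectLoop()
    return
  discard msg   # ... write msg on the established send connection ...
```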
* channel close race and deadlock fixes
* remove send lock, write chunks in one go
* push some of half-closed implementation to BufferStream
* fix some hangs where LPChannel readers and writers would not always
wake up
* simplify lazy channels
* fix close happening more than once in some orderings
* reenable connection tracking tests
* close channels first on mplex close such that consumers can read bytes
A notable difference is that BufferStream is no longer considered to be
at EOF until someone has actually read the EOF marker.
* docs, simplification
* add peer lifecycle events
* rework peer events to not use connection events
* don't use result in pubsub and switch init
* wip
* use ordered hashes and remove logscope
* logging
* add missing test
* small fixes
* mcache fixes
* remove timed cache - the window shifting already removes old messages
* ref -> object
* avoid unnecessary allocations with `[]` operator
* simplify init
* fix several gossipsub/floodsub issues
* floodsub, gossipsub: don't rebroadcast messages that fail validation
(!)
* floodsub, gossipsub: don't crash when unsubscribing from unknown
topics (!)
* gossipsub: don't send messages to peers that are not interested in
the topic when the messages don't share a topic list
* floodsub: don't repeat all messages for each message when
rebroadcasting
* floodsub: allow sending empty data
* floodsub: fix inefficient unsubscribe
* sync floodsub/gossipsub logging
* gossipsub: include incoming messages in mcache (!)
* gossipsub: don't rebroadcast already-seen messages (!)
* pubsubpeer: remove incoming/outgoing seen caches - these are already
handled in gossipsub and floodsub, and will cause trouble when peers
try to resubscribe / regraft topics (because control messages will have
the same digest)
* timedcache: reimplement without timers (fixes timer leaks and extreme
inefficiency due to per-message closures, futures etc) - see the sketch
after this list
* timedcache: ref -> obj
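A minimal sketch of a timer-free timed cache along those lines: entries
carry their insertion time and are expired lazily whenever the cache is
touched, so no per-message timer, closure or future is needed. All
names here are illustrative, not the library's API:

```nim
import std/[tables, deques, monotimes, times]

type
  TimedCache*[K] = object
    entries: Table[K, MonoTime]     # key -> insertion time
    order: Deque[(MonoTime, K)]     # insertion order, oldest first
    timeout: Duration

proc init*[K](T: type TimedCache[K], timeout: Duration): T =
  T(entries: initTable[K, MonoTime](),
    order: initDeque[(MonoTime, K)](),
    timeout: timeout)

proc expire[K](c: var TimedCache[K], now: MonoTime) =
  # lazily drop anything older than `timeout` - no timers or futures needed
  while c.order.len > 0 and now - c.order[0][0] > c.timeout:
    let (_, key) = c.order.popFirst()
    c.entries.del(key)

proc put*[K](c: var TimedCache[K], k: K): bool =
  ## returns true if `k` was already seen within the window
  let now = getMonoTime()
  c.expire(now)
  result = k in c.entries
  if not result:
    c.entries[k] = now
    c.order.addLast((now, k))

when isMainModule:
  var seen = TimedCache[string].init(initDuration(seconds = 120))
  doAssert not seen.put("msg-id")   # first sighting
  doAssert seen.put("msg-id")       # duplicate within the window
```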
* remove send lock
When mplex receives data it will block until a reader has processed the
data. Thus, when a large message is received, such as a gossipsub
subscription table, all of mplex will be blocked until all reading is
finished.
However, if at the same time a `dial` to establish a gossipsub send
connection is ongoing, that `dial` will be blocked because mplex is no
longer reading data - specifically, it might indeed be the connection
that's processing the previous data that is waiting for a send
connection.
There are other problems with the current code:
* If an exception is raised, it is not necessarily raised for the same
connection as `p.sendConn`, so resetting `p.sendConn` in the exception
handling is wrong
* `p.isConnected` is checked before taking the lock - thus, if it
returns false, a new dial will be started. If a new task enters `send`
before the dial is finished, it will also see `p.isConnected` as false
and get stuck on the lock - when the previous task finishes and
releases the lock, the new task will _also_ dial and thus reset
`p.sendConn`, causing a leak (sketched below).
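A minimal sketch of the race described in the last point, and of the
usual way to avoid the double dial by re-checking the state after
acquiring the lock; `Conn`, `Peer` and `dialSendConn` are hypothetical,
and the actual commits went further by preferring the existing
connection and later removing the dial lock altogether:

```nim
import chronos

type
  Conn = ref object
    closed: bool
  Peer = ref object
    sendConn: Conn
    dialLock: AsyncLock

proc newPeer(): Peer =
  Peer(dialLock: newAsyncLock())

proc dialSendConn(p: Peer): Future[Conn] {.async.} =
  # hypothetical stand-in for establishing a new send connection
  await sleepAsync(10.milliseconds)
  return Conn()

proc isConnected(p: Peer): bool =
  p.sendConn != nil and not p.sendConn.closed

proc getSendConn(p: Peer): Future[Conn] {.async.} =
  if p.isConnected:                 # fast path, no lock taken
    return p.sendConn
  await p.dialLock.acquire()
  try:
    # re-check after acquiring the lock: another task may have completed
    # a dial while this one was blocked on the lock, and dialing again
    # here would overwrite - and leak - that connection
    if not p.isConnected:
      p.sendConn = await p.dialSendConn()
    return p.sendConn
  finally:
    p.dialLock.release()
```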
* prefer existing connection
simplifies flow
* move pubsub off of switch, pass switch into pubsub
* use join on lpstreams
* properly clean up failed peers
* fix tests
* fix peertable hasPeerId
* fix tests
* rework sending, remove helpers from pubsubpeer, unify in broadcast
* further split broadcast into send
* use send where appropriate
* use formatIt
* improve trace
Co-authored-by: Giovanni Petrantoni <giovanni@fragcolor.xyz>
* add finegrained timeouts to pubsub
* use 10 millis timeout in tests
* finalization
* revert timeouts
* use `atEof` for reads
* adjust timeouts and use atEof for reads
* use atEof for reads
* set isEof flag
* no backoff for pubsub streams
* temp timer increase, make macos finalize
* don't call `subscribePeer` in libp2p anymore
* more traces
* leak tests
* lower timeouts
* handle exceptions in control message
* don't use `cancelAndWait`
* handle exceptions in helpers
* wip
* don't send empty messages
* check for leaks properly
* don't use cancelAndWait
* don't await subscription sends
* remove subscribePeer calls from switch
* trying without the hooks again
* Fix gossip message seqno according to spec
* Add peers back to gossipsub table, slow down heartbeat
* Revert "Add peers back to gossipsub table, slow down heartbeat"
This reverts commit 01e2e62172.
* make seqno a threadvar, remove from peerinfo
* seqno refactor, into pubsub
* gossipsub is a function of subscription messages only
* graft/prune work with mesh, get filled up from gossipsub
* fix race conditions with await
* fix exception unsafety when grafting/pruning
* fix allowing up to DHi peers in mesh on incoming graft
* fix metrics in several places
* reuse single RNG instance for all crypto key generation
* use foolproof rng
* initRng -> newRng (because it's ref)
* fix test
* imports/exports, chat fix
* fix rsa
* imports and exports
* work around threadvar issue
* fixup
* mac workaround test
* Peer resultification and defect only
* Fixing some tests
* test fixes
* Rename peer to peerid
* better result error message in identify
* further merge fixes