To break a potential read/write deadlock, gossipsub uses an unbounded
queue for writes - when peers are too slow to process this queue, it may
end up growing without bounds causing high memory usage.
Here, we introduce a maximum write queue length after which the peer is
disconnected - the queue is generous enough that any "normal" usage
should be fine - writes that are `await`:ed are not affected, only
writes that are launched in an `asyncSpawn` task or similar.
* avoid unnecessary copy of message when there are no send observers
* release message memory earlier in gossipsub
* simplify pubsubpeer logging
When messages can't be sent to peer, we try to establish a send
connection - this causes messages to stack up as more and more unsent
messages are blocked on the dial lock.
* remove dial lock
* run reconnection loop in background task
* mcache fixes
* remove timed cache - the window shifting already removes old messages
* ref -> object
* avoid unnecessary allocations with `[]` operator
* simplify init
* fix several gossipsub/floodsub issues
* floodsub, gossipsub: don't rebroadcast messages that fail validation
(!)
* floodsub, gossipsub: don't crash when unsubscribing from unknown
topics (!)
* gossipsub: don't send message to peers that are not interested in the
topic, when messages don't share topic list
* floodsub: don't repeat all messages for each message when
rebroadcasting
* floodsub: allow sending empty data
* floodsub: fix inefficient unsubscribe
* sync floodsub/gossipsub logging
* gossipsub: include incoming messages in mcache (!)
* gossipsub: don't rebroadcast already-seen messages (!)
* pubsubpeer: remove incoming/outgoing seen caches - these are already
handled in gossipsub, floodsub and will cause trouble when peers try to
resubscribe / regraft topics (because control messages will have same
digest)
* timedcache: reimplement without timers (fixes timer leaks and extreme
inefficiency due to per-message closures, futures etc)
* timedcache: ref -> obj
* move pubsub of off switch, pass switch into pubsub
* use join on lpstreams
* properly cleanup up failed peers
* fix tests
* fix peertable hasPeerId
* fix tests
* rework sending, remove helpers from pubsubpeer, unify in broadcast
* further split broadcast into send
* use send where appropriate
* use formatIt
* improve trace
Co-authored-by: Giovanni Petrantoni <giovanni@fragcolor.xyz>
* add finegrained timeouts to pubsub
* use 10 millis timeout in tests
* finalization
* revert timeouts
* use `atEof` for reads
* adjust timeouts and use atEof for reads
* use atEof for reads
* set isEof flag
* no backoff for pubsub streams
* temp timer increase, make macos finalize
* don't call `subscribePeer` in libp2p anymore
* more traces
* leak tests
* lower timeouts
* handle exceptions in control message
* don't use `cancelAndWait`
* handle exceptions in helpers
* wip
* don't send empty messages
* check for leaks properly
* don't use cancelAndWait
* don't await subscribption sends
* remove subscrivePeer calls from switch
* trying without the hooks again
* Fix gossip messages seqno according to spec
* Add peers back to gossipsub table, slow down heartbeat
* Revert "Add peers back to gossipsub table, slow down heartbeat"
This reverts commit 01e2e62172a7793bb17f0eb8314e2faeb2682173.
* make seqno a threadvar, remove from peerinfo
* seqno refactor, into pubsub
* more cleanup
* fix tests
* merging master
* remove `withLock` as it conflicts with stdlib
* wip
* more fanout ttl
Co-authored-by: Giovanni Petrantoni <giovanni@fragcolor.xyz>
* gossipsub is a function of subscription messages only
* graft/prune work with mesh, get filled up from gossipsub
* fix race conditions with await
* fix exception unsafety when grafting/pruning
* fix allowing up to DHi peers in mesh on incoming graft
* fix metrics in several places
* Peer resultification and defect only
* Fixing some tests
* test fixes
* Rename peer into peerid
* better result error message in identify
* further merge fixes
* don't send public key in message when not signing (information leak)
* don't run rebalance if there are peers in gossip (see #242)
* don't crash randomly on bad peer id from remote
* Remove noise padding payload (spec removed it)
* add log scope in secure
* avoid defect array out of range in switch secure when "na"
* improve identify traces
* wip noise fixes
* noise protobuf adjustments (trying)
* add more debugging messages/traces, improve their actual contents
* re-enable ID check in noise
* bump go daemon tag version
* bump go daemon tag version
* enable noise in daemonapi
* interop testing, (both secio and noise will be tested)
* azure cache bump (p2pd)
* CI changes
- Travis: use Go 1.14
- azure-pipelines.yml: big cleanup
- Azure: bump cache keys
- build 64-bit p2pd on 32-bit Windows
- install both Mingw-w64 architectures
* noise logging fixes
* alternate testing between noise and secio
* increase timeout to avoid VM errors in CI (multistream tests)
* refactor heartbeat management in gossipsub
* remove locking within heartbeat
* refactor heartbeat management in gossipsub
* remove locking within heartbeat
Co-authored-by: Ștefan Talpalaru <stefantalpalaru@yahoo.com>
* count published messages
* don't call `switch.dial` in `subscribeToPeer`
* don't use delegation in connection
* move connection out to own file
* don't breakout on reset
* make sure to call close on secured conn
* add lpstream tracing
* don't breackdown by conn id
* fix import
* remove unused lable
* reset connection on exception
* add additional metrics for skipped messages
* check for nil in secure.close