* remove send lock
When mplex receives data it will block until a reader has processed the
data. Thus, when a large message is received, such as a gossipsub
subscription table, all of mplex will be blocked until all reading is
finished.
However, if at the same time a `dial` to establish a gossipsub send
connection is ongoing, that `dial` will be blocked because mplex is no
longer reading data - specifically, it might indeed be the connection
that's processing the previous data that is waiting for a send
connection.
There are other problems with the current code:
* If an exception is raised, it is not necessarily raised for the same
connection as `p.sendConn`, so resetting `p.sendConn` in the exception
handling is wrong
* `p.isConnected` is checked before taking the lock - thus, if it
returns false, a new dial will be started. If a new task enters `send`
before dial is finished, it will also determine `p.isConnected` is
false, then get stuck on the lock - when the previous task finishes and
releases the lock, the new task will _also_ dial and thus reset
`p.sendConn` causing a leak.
* prefer existing connection
simplifies flow
* move pubsub of off switch, pass switch into pubsub
* use join on lpstreams
* properly cleanup up failed peers
* fix tests
* fix peertable hasPeerId
* fix tests
* rework sending, remove helpers from pubsubpeer, unify in broadcast
* further split broadcast into send
* use send where appropriate
* use formatIt
* improve trace
Co-authored-by: Giovanni Petrantoni <giovanni@fragcolor.xyz>
Every Chronicles log record has an existing `msg` property matching
the static string supplied in the log statement. Thus, it's currently
not possible to use `msg` as the name of a user property:
https://github.com/status-im/nim-chronicles/issues/86
* prefer PeerID in switch api
This avoids ref issues like ref identity and nil
* use existing peerinfo instance if possible
* remove secureCodec
there may be multiple connections per peerinfo with different codecs
* avoid some extra async::
* add finegrained timeouts to pubsub
* use 10 millis timeout in tests
* finalization
* revert timeouts
* use `atEof` for reads
* adjust timeouts and use atEof for reads
* use atEof for reads
* set isEof flag
* no backoff for pubsub streams
* temp timer increase, make macos finalize
* don't call `subscribePeer` in libp2p anymore
* more traces
* leak tests
* lower timeouts
* handle exceptions in control message
* don't use `cancelAndWait`
* handle exceptions in helpers
* wip
* don't send empty messages
* check for leaks properly
* don't use cancelAndWait
* don't await subscribption sends
* remove subscrivePeer calls from switch
* trying without the hooks again
* Fix gossip messages seqno according to spec
* Add peers back to gossipsub table, slow down heartbeat
* Revert "Add peers back to gossipsub table, slow down heartbeat"
This reverts commit 01e2e62172a7793bb17f0eb8314e2faeb2682173.
* make seqno a threadvar, remove from peerinfo
* seqno refactor, into pubsub
* Minprotobuf initial commit
* Fix noise.
* Add signed integers support.
Add checks for field number value.
Remove some casts.
* Fix compile errors.
* Fix comments and constants.
* more cleanup
* fix tests
* merging master
* remove `withLock` as it conflicts with stdlib
* wip
* more fanout ttl
Co-authored-by: Giovanni Petrantoni <giovanni@fragcolor.xyz>
* gossipsub is a function of subscription messages only
* graft/prune work with mesh, get filled up from gossipsub
* fix race conditions with await
* fix exception unsafety when grafting/pruning
* fix allowing up to DHi peers in mesh on incoming graft
* fix metrics in several places
* Peer resultification and defect only
* Fixing some tests
* test fixes
* Rename peer into peerid
* better result error message in identify
* further merge fixes
* don't send public key in message when not signing (information leak)
* don't run rebalance if there are peers in gossip (see #242)
* don't crash randomly on bad peer id from remote
* count published messages
* don't call `switch.dial` in `subscribeToPeer`
* add secureconn constructor
* close in the correct order
* concurent dial lock and track in/out conns better
* make tests pass
* add todo comment
* disconect peers that open too many connections
* wip
* do connection and muxer tracking in one place
* prevent nil pointer in observers
* drop connections when peers is over max
* prevent channel leaks
* don't use closure to handle channel
* Remove noise padding payload (spec removed it)
* add log scope in secure
* avoid defect array out of range in switch secure when "na"
* improve identify traces
* wip noise fixes
* noise protobuf adjustments (trying)
* add more debugging messages/traces, improve their actual contents
* re-enable ID check in noise
* bump go daemon tag version
* bump go daemon tag version
* enable noise in daemonapi
* interop testing, (both secio and noise will be tested)
* azure cache bump (p2pd)
* CI changes
- Travis: use Go 1.14
- azure-pipelines.yml: big cleanup
- Azure: bump cache keys
- build 64-bit p2pd on 32-bit Windows
- install both Mingw-w64 architectures
* noise logging fixes
* alternate testing between noise and secio
* increase timeout to avoid VM errors in CI (multistream tests)
* refactor heartbeat management in gossipsub
* remove locking within heartbeat
* refactor heartbeat management in gossipsub
* remove locking within heartbeat
Co-authored-by: Ștefan Talpalaru <stefantalpalaru@yahoo.com>
* count published messages
* don't call `switch.dial` in `subscribeToPeer`
* don't use delegation in connection
* move connection out to own file
* don't breakout on reset
* make sure to call close on secured conn
* add lpstream tracing
* don't breackdown by conn id
* fix import
* remove unused lable
* reset connection on exception
* add additional metrics for skipped messages
* check for nil in secure.close
* Start adding some metrics to pubsub
In order to visualize it's functionality
Still WIP
* more metrics
* add per topic metrics
* finishup with requested metrics
* add a metrisServer define to start local server
* PR fixes and cleanup
This means we can use it from other protocols that inherit GossipSub. Otherwise,
a lot of internal state (heartbeat lock etc) doesn't get initialized properly.
* call write until all is written out
* wip: rework with proper half-closed
* add eof and closed handling
* wip
* close connection on chronos close
* don't use read
* make noise work again
* don't reraise just yet
* fixes after backporting
* remove on transport close cleanup
* revert back allread
* rust interop fixes
* read from stream
* inc count before closing
* rebasing master
* store incomming connections
* fix merge
* remove unneeded changes
* use internal close flag to indicate disposal
* call write until all is written out
* add comments to lpchannel fields
* add an eof flag to signal which end closed
* wip: rework with proper half-closed
* add eof and closed handling
* propagate closes to piped
* call parent close
* moving bufferstream trackers out
* move writeLock to bufferstream
* move writeLock out
* remove unused call
* wip
* rebasing master
* fix mplex tests
* wip
* fix bufferstream after backport
* wip
* rename to differentiate from chronos tracker
* close connection on chronos close
* make reset request asyncCheck
* fix channel cleanup
* misc
* don't use read
* fix backports
* make noise work again
* proper exception handling
* don't reraise just yet
* add convenience templates
* dont double wrap
* use async pragma
* fixes after backporting
* muxer owns connection
* remove on transport close cleanup
* revert back allread
* adding some todos
* read from stream
* inc count before closing
* rebasing master
* rebase master
* use correct exception type
* use try/finally insted of defer
* fix compile in trace mode
* reset channels on mplex close
* handle a few exceptions
Some of these are maybe too aggressive, but in return, they'll log
their exception - more refactoring needed to sort this out - right now
we get crashes on unhandled exceptions of unknown origin
* during connection setup
* while closing channels
* while processing pubsubs
* catch exceptions that are raised and don't try to catch exceptions that are not raised
* propagate cancellederror
* one more
* more
* more
* make interop tests less fragile
* Raise expiration time in gossipsub fanout test for slow CI
Co-authored-by: Dmitriy Ryajov <dryajov@gmail.com>
Co-authored-by: Giovanni Petrantoni <giovanni@fragcolor.xyz>
* add verify signature flag
* add sign flag to enable/disable msg signing
* moving internal tests out to their own file
* cleanup nimble file
* remove unneeded tests
* move pubsub tests out
* fix tests
* Add chronos trackers and used them to sanitize resource disposal
* Chronos trackers for transport tests wip
* No more chronos leaks in testtransport
* Make tcp transport and test more robust when closing
* Test async leaking tracking wip
* Fix a regression in wire connect
* Add chronos trackers to more tests and sanitize resource closure
* Wip fixing floodsub tests
* Floodsub wip
* Made floodsub basically deterministic, hit a nim bug with captures tho
* Wrap up floodsub tests refactor
* Wrapping up
* Add allFuturesThrowing utility
* Fix missing allFuturesThrowing in noise tests!
* Make tests green
* attempt fixing gossipsub failing cases
* Make sure to check also fanout in waitSub
* More verbose traces
* Gossipsub test improvments
* Refactor TcpTransport remove asyncCheck
* Add Connection trackers
* Add stricter connection tracking, wip mplex fix
* More asynccheck removal, in order to avoid connection leaks
* bump chronicles requirement
* Enable tracker dump to check CI output
* Wait for more futures in testmplex
* Remove tracker dump messages
* add tryAndWarn utility, fix mplex issue with go interop
* All allFuturesThrowing to directchat too
* make sure to cleanup on transport close
* Start removing allFutures
* More allfutures removal
* Complete allFutures removal except legacy and tests
* Introduce table values copies to prevent error
* Switch to allFinished
* Resolve TODOs in flood/gossip
* muxer handler, log and re-raise
* Add a common and flexible way to check multiple futures
* Make traces less verbose with shortHexDump utility
* Rename shortHexDump into shortLog
* Improve shortLog, add shortLog for crypto keys
* Add proper shortLog implementations in messages
* Fix signed varints.
Add tests for signed varints.
Remove some casts to allow usage at compile time.
* Fix vsizeof() on 32bit platforms.
* Add `hint` and `zint` types for proper signed integer encoding.
* Fix varint related bugs.
* Update requirements.
* Fix interop tests because of fixed readLine.
* Add putVarint, getVarint and tests.
* fix: don't allow replacing pubkey
* fix: several small improvements
* removing pubkey setter
* improove error handling
* remove the use of Option[T] if not needed
* don't use optional
* fix-ci: temporarily pin p2pd to a working tag
* fix example to comply with latest changes
* bumping p2pd again to a higher version
* Bugfix: Dialing an already connected peer may lead to crash
* Introduced a standard_setup module allowing to instantiate
the `Switch` object in an easier manner.
* Added `Switch.disconnect(peer)`
* Trailing space removed (sorry about polluting the diff)
* extract public and private keys fields from peerid
* allow assigning a public key
* cleaned up TODOs
* make pubsub prefix a const
* public key should be an `Option`
This adds gossipsub and floodsub, as well as basic interop testing with the go libp2p daemon.
* add close event
* wip: gossipsub
* splitting rpc message
* making message handling more consistent
* initial gossipsub implementation
* feat: nim 1.0 cleanup
* wip: gossipsub protobuf
* adding encoding/decoding of gossipsub messages
* add disconnect handler
* add proper gossipsub msg handling
* misc: cleanup for nim 1.0
* splitting floodsub and gossipsub tests
* feat: add mesh rebalansing
* test pubsub
* add mesh rebalansing tests
* testing mesh maintenance
* finishing mcache implementatin
* wip: commenting out broken tests
* wip: don't run heartbeat for now
* switchout debug for trace logging
* testing gossip peer selection algorithm
* test stream piping
* more work around message amplification
* get the peerid from message
* use timed cache as backing store
* allow setting timeout in constructor
* several changes to improve performance
* more through testing of msg amplification
* prevent gc issues
* allow piping to self and prevent deadlocks
* improove floodsub
* allow running hook on cache eviction
* prevent race conditions
* prevent race conditions and improove tests
* use hashes as cache keys
* removing useless file
* don't create a new seq
* re-enable pubsub tests
* fix imports
* reduce number of runs to speed up tests
* break out control message processing
* normalize sleeps between steps
* implement proper transport filtering
* initial interop testing
* clean up floodsub publish logic
* allow dialing without a protocol
* adding multiple reads/writes
* use protobuf varint in mplex
* don't loose conn's peerInfo
* initial interop pubsub tests
* don't duplicate connections/peers
* bring back interop tests
* wip: interop
* re-enable interop and daemon tests
* add multiple read write tests from handlers
* don't cleanup channel prematurely
* use correct channel to send/receive msgs
* adjust tests with latest changes
* include interop tests
* remove temp logging output
* fix ci
* use correct public key serialization
* additional tests for pubsub interop