* adding raises defect across the codebase
* use unittest2
* add windows deps caching
* update mingw link
* die on failed peerinfo initialization
* use result.expect instead of get
* use expect more consistently and rework inits
* use expect more consistently
* throw on missing public key
* remove unused closure annotation
* merge master
* gossipsub: unsubscribe fixes
* fix KeyError when updating metric of unsubscribed topic
* fix unsubscribe message not being sent to all peers causing them to
keep thinking we're still subscribed
* release memory earlier in a few places
* floodsub fix
* add floodPublish test
* test delivery via control Iwant/have mechanics
* fix issues in control, and add testing
* fix possible backoff issue with pruned routine overriding it
In `async` functions, a closure environment is created for variables
that cross an await boundary - this closure environment is kept in
memory for the lifetime of the associated future - this means that
although _some_ variables are no longer used, they still take up memory
for a long time.
In Nimbus, message validation is processed in batches meaning the future
of an incoming gossip message stays around for quite a while - this
leads to memory consumption peaks of 100-200 mb when there are many
attestations in the pipeline.
To avoid excessive memory usage, it's generally better to move non-async
code into proc's such that the variables therein can be released earlier
- this includes the many hidden variables introduced by macro and
template expansion (ie chronicles that does expensive exception
handling)
* move seen table salt to floodsub, use there as well
* shorten seen table salt to size of hash
* avoid unnecessary memory allocations and copies in a few places
* factor out message scoring
* avoid reencoding outgoing message for every peer
* keep checking validators until reject (in case there's both reject and
ignore)
* `readOnce` avoids `readExactly` overhead for single-byte read
* genericAssign -> assign2
* properly propagate initiator information for gossipsub
* Fix pubsubpeer lifetime management
* restore old behavior
* tests fixing
* clamp backoff time value received
* fix member name collisions
* internal test fixes
* better names and explaining of the importance of transport direction
* fixes
* Address Book POC implementation (#499)
* Address Book POC implementation
* Feat/peerstore impl (#505)
Co-authored-by: Hanno Cornelius <68783915+jm-clius@users.noreply.github.com>
* master merge
* wip
* avoid deadlocks
* tcp limits
* expose client field in chronosstream
* limit incoming connections
* update with new listen api
* fix release
* don't override peerinfo in connection
* rework transport with accept
* use semaphore to track resource ussage
* rework with new transport accept api
* move events to conn manager (#373)
* use semaphore to track resource ussage
* merge master
* expose api to acquire conn slots
* don't fail expensive metrics
* allow tracking and updating connections
* set global connection limits to 80
* add per peer connection limits
* make sure conn is closed if tracking failed
* more descriptive naming for handle
* rework with new transport accept api
* add `getStream` hide `selectConn`
* add TransportClosedError
* make nil explicit
* don't make unnecessary copies of message
* logging
* error handling
* cleanup semaphore
* track connections properly
* throw `TooManyConnections` when tracking outgoing
* use proper exception and handle conventions
* check onCloseHandle for nil
* revert internalConnect changes
* adding upgraded flag
* await stream before closing
* simplify tracking
* wip
* logging
* split connection limits into incoming and outgoing
* further streamline connection limits split counts
* don't use closeWithEOF
* move peer and conn event triggers from switch
* wip
* wip
* wip
* merge master
* handle nil connections properly
* add clarifying comment
* don't raise exc on nil
* no finally
* add proper min/max connections logic
* rebase master
* merge master
* master merge
* remove request timeout
should be addressed in separate PR
* merge master
* share semaphore when in/out limits arent enforced
* merge master
* use import
* pass semaphore to trackConn
* don't close last conn
* use storeConn
* merge master
* use storeConn
* Remove unused connections in pubsubpeer, also removed wrong usages, add a disconnect bad peers parameter
* handle exceptions in disconnectPeer
* small fix
* use the proper disconnection procedure for gossip peers
* fixes, more metrics add test about disconnection
* hot fix possible null pointers in switch
* silly isnil sugar
* Fix and test gossip directPeer connections
* add proper cancelation handling
* remove cancelled futures explicitly
* use fifo to keep proper order
* add out of order cancelations test
* make count public
* use `new` instead of `init`
* remove private `queue` from tests
* expose count as a readonly prop
* use `delete()` to preserve seq order
* salt ids in seen table
* add subscription validation callback and avoid processing topics we don't care of
* apply penalty on bad subscription
* fix IHave handling IDs
* reduce indenting, add some comments
* fix gossip randombytes generation
* do not descore unwanted topics (might happen, due to timing, needs improvements)
* cleaning up and added tests
* validate subscriptions only when subscribing
* set notice level for failed publish
* fix floodsub behavior
* adding an upgraded event to conn
* set stopped flag asap
* trigger upgradded event on conn
* set concurrency limit for accepts
* backporting semaphore from tcp-limits2
* export unittests module
* make params explicit
* tone down debug logs
* adding semaphore tests
* use semaphore to throttle concurent upgrades
* add libp2p scope
* trigger upgraded event before any other events
* add event handler for connection upgrade
* cleanup upgraded event on conn close
* make upgrades slot release rebust
* dont forget to release slot on nil connection
* misc
* make sure semaphore is always released
* minor improvements and a nil check
* removing unneeded comment
* make upgradeMonitor a non-closure proc
* make sure the `upgraded` event is initialized
* handle exceptions in accepts when stopping
* don't leak exceptions when stopping accept loops
* add more traces, remove async from rebalance
* more traces
* avoid computng scores when weight is 0.0
* debug colocation, fix an indent in unsubpeer (minor)
* add full ValidationResult coverage
* store in cache only after validation
* gossip 1.0 fixes
* fix typo
* gossip 10 internal test fixes
* test fixing
* refactor peerstats usages
* populate tables if missing when scoring
* check that connection is not closed or eof
* don't release connection lock prematurely
* test that only valid connections can be added
* correct exception type on closed connection
* add clarifying comment
* use closeWithEOF for more stable test
* misc comments
* log stream id in buffestream asserts
* use closeWithEOF to prevent races in tests
* give some time to the remote handler to trigger
* adding more tests to make codecov happy
* handle resets properly with/without pushes/reads
* add clarifying comments
* pushEof should also not be concurrent
* move channel reset to bufferstream
this is where the action happens - lpchannel merely redefines how close
is done
Co-authored-by: Jacek Sieka <jacek@status.im>
* move gossip parameters to runtime
* internal test fixes
* add missing params
* restore const parameters are soldi base and use them in init
* more constants tuning
* rework transport to use the new accept api
* use the new chronos primits
* fixup tests to use the new transport api
* handle all exceptions in upgradeIncoming
* master merge
* add multiaddress exception type
* raise appropriate exception on invalida address
* allow retrying on TransportTooManyError
* adding TODO
* wip
* merge master
* add sleep if nil is returned
* accept loop handles all exceptions
* avoid issues with tray/except/finally
* make consistent with master
* cleanup accept loop
* logging
* Update libp2p/transports/tcptransport.nim
Co-authored-by: Jacek Sieka <jacek@status.im>
* use Direction enum instead of initiator flag
* use consistent import style
* remove experimental `closeWithEOF()`
Co-authored-by: Jacek Sieka <jacek@status.im>
* fix channels not being reset
silly for loop..
* allow only one concurrent read
* fix mplex test race condition
* add some bufferstream eof tests
* deadlock, lost data and hung channel fixes
* prevent concurrent `reset` calls
* reset LPChannel when read is cancelled (since data is lost)
* ensure there's one, and one only, 0-byte readOnce on EOF
* ensure that all data is returned before EOF is returned
* keep running activity monitor for half-closed channels (or they never
get closed)
When messages can't be sent to peer, we try to establish a send
connection - this causes messages to stack up as more and more unsent
messages are blocked on the dial lock.
* remove dial lock
* run reconnection loop in background task
* channel close race and deadlock fixes
* remove send lock, write chunks in one go
* push some of half-closed implementation to BufferStream
* fix some hangs where LPChannel readers and writers would not always
wake up
* simplify lazy channels
* fix close happening more than once in some orderings
* reenable connection tracking tests
* close channels first on mplex close such that consumers can read bytes
A notable difference is that BufferedStream is no longer considered EOF
until someone has actually read the EOF marker.
* docs, simplification
* add peer lifecycle events
* rework peer events to not use connection events
* don't use result in pubsub and switch init
* wip
* use ordered hashes and remove logscope
* logging
* add missing test
* small fixes
* remove almost-empty types module
* lock when writing message (that's the only place the lock matters, and
only when the message is > max msg size)
* logging updates (log in consistent order, makes reading logs easier)
* raise EOF from readExactly only if no bytes have been read (to signal
that _no_ bytes were lost)
This change modifies how the backpressure algorithm in bufferstream
works - in particular, instead of working byte-by-byte, it will now work
seq-by-seq.
When data arrives, it usually does so in packets - in the current
bufferstream, the packet is read then split into bytes which are fed one
by one to the bufferstream. On the reading side, the bytes are popped of
the bufferstream, again byte by byte, to satisfy `readOnce` requests -
this introduces a lot of synchronization traffic because the checks for
full buffer and for async event handling must be done for every byte.
In this PR, a queue of length 1 is used instead - this means there will
at most exist one "packet" in `pushTo`, one in the queue and one in the
slush buffer that is used to store incomplete reads.
* avoid byte-by-byte copy to buffer, with synchronization in-between
* reuse AsyncQueue synchronization logic instead of rolling own
* avoid writeHandler callback - implement `write` method instead
* simplify EOF signalling by only setting EOF flag in queue reader (and
reset)
* remove BufferStream pipes (unused)
* fixes drainBuffer deadlock when drain is called from within read loop
and thus blocks draining
* fix lpchannel init order
* mcache fixes
* remove timed cache - the window shifting already removes old messages
* ref -> object
* avoid unnecessary allocations with `[]` operator
* simplify init
* fix several gossipsub/floodsub issues
* floodsub, gossipsub: don't rebroadcast messages that fail validation
(!)
* floodsub, gossipsub: don't crash when unsubscribing from unknown
topics (!)
* gossipsub: don't send message to peers that are not interested in the
topic, when messages don't share topic list
* floodsub: don't repeat all messages for each message when
rebroadcasting
* floodsub: allow sending empty data
* floodsub: fix inefficient unsubscribe
* sync floodsub/gossipsub logging
* gossipsub: include incoming messages in mcache (!)
* gossipsub: don't rebroadcast already-seen messages (!)
* pubsubpeer: remove incoming/outgoing seen caches - these are already
handled in gossipsub, floodsub and will cause trouble when peers try to
resubscribe / regraft topics (because control messages will have same
digest)
* timedcache: reimplement without timers (fixes timer leaks and extreme
inefficiency due to per-message closures, futures etc)
* timedcache: ref -> obj
* remove send lock
When mplex receives data it will block until a reader has processed the
data. Thus, when a large message is received, such as a gossipsub
subscription table, all of mplex will be blocked until all reading is
finished.
However, if at the same time a `dial` to establish a gossipsub send
connection is ongoing, that `dial` will be blocked because mplex is no
longer reading data - specifically, it might indeed be the connection
that's processing the previous data that is waiting for a send
connection.
There are other problems with the current code:
* If an exception is raised, it is not necessarily raised for the same
connection as `p.sendConn`, so resetting `p.sendConn` in the exception
handling is wrong
* `p.isConnected` is checked before taking the lock - thus, if it
returns false, a new dial will be started. If a new task enters `send`
before dial is finished, it will also determine `p.isConnected` is
false, then get stuck on the lock - when the previous task finishes and
releases the lock, the new task will _also_ dial and thus reset
`p.sendConn` causing a leak.
* prefer existing connection
simplifies flow
* move pubsub of off switch, pass switch into pubsub
* use join on lpstreams
* properly cleanup up failed peers
* fix tests
* fix peertable hasPeerId
* fix tests
* rework sending, remove helpers from pubsubpeer, unify in broadcast
* further split broadcast into send
* use send where appropriate
* use formatIt
* improve trace
Co-authored-by: Giovanni Petrantoni <giovanni@fragcolor.xyz>
* peer hooks -> events
* peerinfo -> peerid
* include connection direction in event
* check connection status after event
* lock connmanager lookup also when dialling peer
* clean up un-upgraded connection when upgrade fails
* await peer eventing
* remove join/lifetime future from peerinfo
Peerinfo instances are not unique per peer so the lifetime future is
misleading - it fires when a random connection is closed, not the "last"
one
* document switch values
* naming
* peerevent->conneevent
* add finegrained timeouts to pubsub
* use 10 millis timeout in tests
* finalization
* revert timeouts
* use `atEof` for reads
* adjust timeouts and use atEof for reads
* use atEof for reads
* set isEof flag
* no backoff for pubsub streams
* temp timer increase, make macos finalize
* don't call `subscribePeer` in libp2p anymore
* more traces
* leak tests
* lower timeouts
* handle exceptions in control message
* don't use `cancelAndWait`
* handle exceptions in helpers
* wip
* don't send empty messages
* check for leaks properly
* don't use cancelAndWait
* don't await subscribption sends
* remove subscrivePeer calls from switch
* trying without the hooks again
* Initial commit.
* Workaround nim's bug and add some other compilation error fixes.
* Rename to libp2p_pki_schemes.
Fix secio.
Add tests.
* Attempt to fix command line.
* Fix command line.
Show status in tests.
* add support for channel timeouts
* tests for channel timeout
* add timeouts to standard switch
* fix mplex init
* cleanup timer on stream close
* add comment for `isConnected`
* move cleanup event
* Fix security issue #266.
* Add more tests.
* Fix PeerID tests should not use RSA-512 keys.
* Fix crypto tests to use vectors with 2048+ bits.
* Disable 4096bit RSA key generation for CI debug runs.
* Fix gossip messages seqno according to spec
* Add peers back to gossipsub table, slow down heartbeat
* Revert "Add peers back to gossipsub table, slow down heartbeat"
This reverts commit 01e2e62172.
* make seqno a threadvar, remove from peerinfo
* seqno refactor, into pubsub
* Minprotobuf initial commit
* Fix noise.
* Add signed integers support.
Add checks for field number value.
Remove some casts.
* Fix compile errors.
* Fix comments and constants.
* gossipsub is a function of subscription messages only
* graft/prune work with mesh, get filled up from gossipsub
* fix race conditions with await
* fix exception unsafety when grafting/pruning
* fix allowing up to DHi peers in mesh on incoming graft
* fix metrics in several places
* reuse single RNG instance for all crypto key generation
* use foolproof rng
* initRng -> newRng (because it's ref)
* fix test
* imports/exports, chat fix
* fix rsa
* imports and exports
* work around threadvar issue
* fixup
* mac workaround test
* Peer resultification and defect only
* Fixing some tests
* test fixes
* Rename peer into peerid
* better result error message in identify
* further merge fixes
* don't send public key in message when not signing (information leak)
* don't run rebalance if there are peers in gossip (see #242)
* don't crash randomly on bad peer id from remote
* consolidate reading in lpstream
* remove debug echo
* throw if not enough bytes where read
* tune log level
* set eof flag
* test readExactly to fail on not enough bytes
* count published messages
* don't call `switch.dial` in `subscribeToPeer`
* add secureconn constructor
* close in the correct order
* concurent dial lock and track in/out conns better
* make tests pass
* add todo comment
* disconect peers that open too many connections
* wip
* do connection and muxer tracking in one place
* prevent nil pointer in observers
* drop connections when peers is over max
* prevent channel leaks
* don't use closure to handle channel
* multistream select make sure to not report NA but rather empty string if all fails
Also re-enable tests
* avoid using bad constructs, make multistream.select flow crystal clear
* Remove noise padding payload (spec removed it)
* add log scope in secure
* avoid defect array out of range in switch secure when "na"
* improve identify traces
* wip noise fixes
* noise protobuf adjustments (trying)
* add more debugging messages/traces, improve their actual contents
* re-enable ID check in noise
* bump go daemon tag version
* bump go daemon tag version
* enable noise in daemonapi
* interop testing, (both secio and noise will be tested)
* azure cache bump (p2pd)
* CI changes
- Travis: use Go 1.14
- azure-pipelines.yml: big cleanup
- Azure: bump cache keys
- build 64-bit p2pd on 32-bit Windows
- install both Mingw-w64 architectures
* noise logging fixes
* alternate testing between noise and secio
* increase timeout to avoid VM errors in CI (multistream tests)
* refactor heartbeat management in gossipsub
* remove locking within heartbeat
* refactor heartbeat management in gossipsub
* remove locking within heartbeat
Co-authored-by: Ștefan Talpalaru <stefantalpalaru@yahoo.com>
* count published messages
* don't call `switch.dial` in `subscribeToPeer`
* don't use delegation in connection
* move connection out to own file
* don't breakout on reset
* make sure to call close on secured conn
* add lpstream tracing
* don't breackdown by conn id
* fix import
* remove unused lable
* reset connection on exception
* add additional metrics for skipped messages
* check for nil in secure.close
* Less exceptions more results
* Fix daemonapi and interop tests
* Add multibase
* wip multiaddress
* fix the build, consuming new result types
* fix standard setup
* Simplify match, rename into MaError, add more exaustive err text
* Fix the CI issues
* Fix directchat build
* daemon api fixes
* better err messages formatting
Co-authored-by: Zahary Karadjov <zahary@gmail.com>
* call write until all is written out
* wip: rework with proper half-closed
* add eof and closed handling
* wip
* close connection on chronos close
* don't use read
* make noise work again
* don't reraise just yet
* fixes after backporting
* remove on transport close cleanup
* revert back allread
* rust interop fixes
* read from stream
* inc count before closing
* rebasing master
* store incomming connections
* fix merge
* remove unneeded changes
* use internal close flag to indicate disposal
* call write until all is written out
* add comments to lpchannel fields
* add an eof flag to signal which end closed
* wip: rework with proper half-closed
* add eof and closed handling
* propagate closes to piped
* call parent close
* moving bufferstream trackers out
* move writeLock to bufferstream
* move writeLock out
* remove unused call
* wip
* rebasing master
* fix mplex tests
* wip
* fix bufferstream after backport
* wip
* rename to differentiate from chronos tracker
* close connection on chronos close
* make reset request asyncCheck
* fix channel cleanup
* misc
* don't use read
* fix backports
* make noise work again
* proper exception handling
* don't reraise just yet
* add convenience templates
* dont double wrap
* use async pragma
* fixes after backporting
* muxer owns connection
* remove on transport close cleanup
* revert back allread
* adding some todos
* read from stream
* inc count before closing
* rebasing master
* rebase master
* use correct exception type
* use try/finally insted of defer
* fix compile in trace mode
* reset channels on mplex close
* make async for proper exception handling
* tryAndWarn msg messes up Exception msg
* misc: comment out tracker dumps
* cleanup mplex tests
* more informative errors
* give CI time to run
* revert change, bacause it causes races
* handle a few exceptions
Some of these are maybe too aggressive, but in return, they'll log
their exception - more refactoring needed to sort this out - right now
we get crashes on unhandled exceptions of unknown origin
* during connection setup
* while closing channels
* while processing pubsubs
* catch exceptions that are raised and don't try to catch exceptions that are not raised
* propagate cancellederror
* one more
* more
* more
* make interop tests less fragile
* Raise expiration time in gossipsub fanout test for slow CI
Co-authored-by: Dmitriy Ryajov <dryajov@gmail.com>
Co-authored-by: Giovanni Petrantoni <giovanni@fragcolor.xyz>
* use stream directly in chronosstream
for now, chronos.AsyncStream is not used to provide any features on top
of chronos.Stream, so in order to simplify the code, chronosstream can
be used directly.
In particular, the exception handling is broken in the current
chronosstream - opening and closing the stream is simplified this way as
well.
A future implementation that actually takes advantage of the AsyncStream
features would wrap AsyncStream instead as a separate lpstream
implementation, leaving this one as-is.
* work around chronos exception type issue
* add verify signature flag
* add sign flag to enable/disable msg signing
* moving internal tests out to their own file
* cleanup nimble file
* remove unneeded tests
* move pubsub tests out
* fix tests
* Add chronos trackers and used them to sanitize resource disposal
* Chronos trackers for transport tests wip
* No more chronos leaks in testtransport
* Make tcp transport and test more robust when closing
* Test async leaking tracking wip
* Fix a regression in wire connect
* Add chronos trackers to more tests and sanitize resource closure
* Wip fixing floodsub tests
* Floodsub wip
* Made floodsub basically deterministic, hit a nim bug with captures tho
* Wrap up floodsub tests refactor
* Wrapping up
* Add allFuturesThrowing utility
* Fix missing allFuturesThrowing in noise tests!
* Make tests green
* attempt fixing gossipsub failing cases
* Make sure to check also fanout in waitSub
* More verbose traces
* Gossipsub test improvments
* Refactor TcpTransport remove asyncCheck
* Add Connection trackers
* Add stricter connection tracking, wip mplex fix
* More asynccheck removal, in order to avoid connection leaks
* bump chronicles requirement
* Enable tracker dump to check CI output
* Wait for more futures in testmplex
* Remove tracker dump messages
* add tryAndWarn utility, fix mplex issue with go interop
* All allFuturesThrowing to directchat too
* make sure to cleanup on transport close
* only check for payload size
* only subscribe if connection succeeded
* fix failing test
* check that the strem is active before openning
* msg type should not be > than 0x7
* fix tests
* check max against enum val
* Start ChaCha20Poly1305 integration (BearSSL)
* Add Curve25519 (BearSSL) required operations for noise
* Fix curve mulgen iterate/derive
* Fix misleading header
* Add chachapoly proper test
* Curve25519 integration tests (failing, something is wrong)
* Add few converters, finish c25519 integration tests
* Remove implicit converters
* removed internal globals
* Start noise implementation
* Fix public() using proper bear mulgen
* Noise protocol WIP
* Noise progress
* Add a quick nim version of HKDF
* Converted hkdf to iterator, useful for noise
* Noise protocol implementation progress
* Noise progress
* XX handshake almost there
* noise progress
* Noise, testing handshake with test vectors
* Noise handshake progress, still wrong somewhere!
* Noise handshake success!
* Full verified noise XX handshake completed
* Fix and rewrite test to be similar to switch one
* Start with connection upgrade
* Switch chachapoly to CT implementations
* Improve HKDF implementation
* Use a type insted of tuple for HandshakeResult
* Remove unnecessary Let
* More cosmetic fixes
* Properly check randomBytes result
* Fix chachapoly signature
* Noise full circle (altho dispatcher is nil cursed)
* Allow nil aads in chachapoly routines
* Noise implementation up to running full test
* Use bearssl HKDF as well
* Directly use bearssl rng for curve25519 keys
* Add a (disabled/no CI) noise interop test server
* WIP on fixing interop issues
* More fixes in noise implementation for interop
* bump chronos requirement (nimble)
* Add a chachapoly test for very small size payloads
* Noise, more tracing
* Add 2 properly working noise tests
* Fix payload packing, following the spec properly (and not go version but
rather rust)
* Sanity, replace discard with asyncCheck
* Small fixes and optimization
* Use stew endian2 rather then system endian module
* Update nimble deps (chronos)
* Minor cosmetic/code sanity fixes
* Noise, handle Nonce max
* Noise tests, make sure to close secured conns
* More polish, improve code readability too
* More polish and testing again which test fails
* Further polishing
* Restore noise tests
* Remove useless Future[void]
* Remove useless CipherState initializer
* add a proper read wait future in second noise test
* Remove noise generic secure implementation for now
* Few fixes to run eth2 sim
* Add more debug info in noise traces
* Merge size + payload write in sendEncryptedMessage
* Revert secure interface, add outgoing property directly in newNoise
* remove sendEncrypted and receiveEncrypted
* Use openarray in chachapoly and curve25519 helpers
* Fix signed varints.
Add tests for signed varints.
Remove some casts to allow usage at compile time.
* Fix vsizeof() on 32bit platforms.
* Add `hint` and `zint` types for proper signed integer encoding.
* Fix varint related bugs.
* Update requirements.
* Fix interop tests because of fixed readLine.
* Add putVarint, getVarint and tests.
* Add peer lifetime feature for PeerInfo.
Refactor peerinfo to use openarrays instead of sequences.
Fix tests and examples to use arrays instead of sequences.
* Add access to lifetime Future[T] itself.
* Implemented lazy stream opening for mplex connections
* Properly fix newStream usage
* Make lazy channel open optional
* Add Lazy channel test
* Cleanup mplex test
* Move lazyness properly into LPChannel
* Connection writeLp back to proc