* gossipsub: adding duplicate arrival metrics
Adding counters for received deduplicated messages and for
duplicates recognized by the seen cache. Note that duplicates that
are not recognized (arrive after seenTTL) are not counted as
duplicates here either.
* gossipsub: adding mcache (message cache for responding IWANT) stats
It is generally assumed that IWANT messages arrive when mcache still
has the message. These stats are to verify this assumption.
* libp2p: adding internal TX queuing stats
Messages are queued in TX before getting written on the stream,
but we have no statistics about these queues. This patch adds
some queue length and queuing time related statistics.
* adding Grafana libp2p dashboard
Adding Grafana dashboard with newly exposed metrics.
* enable libp2p_mplex_metrics in nimble test
Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
* Signed envelopes and routing records
* Send signed peer record as part of identify (#649)
* Add SPR from identify to new peer book (#657)
* Send & receive gossipsub PX
* Add Signed Payload
Co-authored-by: Hanno Cornelius <68783915+jm-clius@users.noreply.github.com>
* feat: allow msgIdProvider to fail
Closes: #642.
Changes the return type of the msgIdProvider to `Result[MessageID, string]` so that message id generation can fail.
String error type was chosen as this `msgIdProvider` mainly because the failed message id generation drops the message and logs the error provided. Because `msgIdProvider` can be externally provided by library consumers, an enum didn’t make sense and a object seemed to be overkill. Exceptions could have been used as well, however, in this case, Result ergonomics were warranted and prevented wrapping quite a large block of code in try/except.
The `defaultMsgIdProvider` function previously allowed message id generation to fail silently for use in the tests: when seqno or source peerid were not valid, the message id generated was based on a hash of the message data and topic ids. The silent failing was moved to the `defaultMsgIdProvider` used only in the tests so that it could not fail silently in applications.
Unit tests were added for the `defaultMsgIdProvider`.
* Change MsgIdProvider error type to ValidationResult
Currently, `ecnist`'s `toBytes` and `getBytes` methods operate only on
properly initialized keys. If an un-initialized key is given, an
`IndexError` may be raised if the key's `xlen` / `qlen` property is
larger than the maximum buffer size. This patch hardens those functions
to report a proper error in that case.
Note that the library functions called by `init` and `initRaw` already
reject data that does not have the expected length, so these new checks
should not be reachable in practice.
* fix: remove returned Futures from switch.start
The proc `start` returned a seq of futures that was mean to be awaited by the caller. However, the start proc itself awaited each Future before returning it, so the ceremony requiring the caller to await the Future, and returning the Futures themselves was just used to handle errors. But we'll give a better way to handle errors in a future revision
Remove `switch.start` return type (implicit `Future[void]`)
Update tutorials and examples to reflect the change.
* Raise error during failed transport
Replaces logging of error, and adds comment that it should be replaced with a callback in a future PR.
* add test for multiple local addresses
* allow transports to listen on multiple addrs
* fix tcp transport accept
* check switch addrs are correct
* switch test to port 0
* close accepted peers on close
* ignore CancelledError in transport accept
* test ci
* only accept in accept loop
* avoid accept greedyness
* close acceptedPeers
* accept doesn't crash on cancelled fut
* add common transport test
* close conn on handling failure
* close accepted peers in two steps
* test for macos
* revert accept greedyness
* fix dialing cancel
* test chronos fix
* add ws
* ws cancellation
* small fix
* remove chronos blocked test
* fix testping
* Fix transport's switch start (like #609)
* bump chronos
* Websocket: handle both ws & wss
Co-authored-by: Tanguy Cizain <tanguycizain@gmail.com>
Co-authored-by: Tanguy <tanguy@status.im>
* add 'dns' multiaddr protocol
* multiaddr: isWire is true for DNS protocols
* resolve dns on connect
* fix typo
* add dns test
* update resolveDns error handling
* handle multiple dns entries
* start of new resolver
* working dns resolver
* use the DnsResolver
* fix json logs
* small overhaul
* fix dns implem in lp2p
* update dnsclient repo
* add dns test to testnative
* dummy dns server for ut
* better mocked
* moved resolving to transport
* moved mockresolver to libp2p
* test resolve in switch test
* try multiple txt & track leaks
* raise e
* catchable error instead of exception
* save failed dns server
* moved resolve back to dialer
* remove nameresolver from dialer
* start of websocket transport
* more ws tests
* switch to common test
* add close to wsstream
* update ws & chronicles version
* cleanup
* removed multicodec
* clean ws outgoing connections
* renamed to websock
* removed stream from logs
* renamed ws to websock
* add connection closing test to common transport
* close incoming connection on ws stop
* renamed testwebsocket.nim -> testwstransport.nim
* removed raise todo
* split out/in connections
* add wss to tests
* Fix tls (#608)
* change log level
* fixed issue related to stopping
some cosmetic cleanup
* use `allFutures` to stop/close things
Prevent potential race conditions when stopping two or more transports
* misc
* point websock to server-case-object branch
* interop test with go
* removed websock version specification
* add daemon -> native ws test
* fix & test closed read/write
* update readOnce, thanks jangko
Co-authored-by: Dmitriy Ryajov <dryajov@gmail.com>
* little transport cleanup
* rename TcpTransport.init -> TcpTransport.new
* moved transport e2e to common file
* remove localAddress
* rename testtransport -> testtcptransport
* add checktrackers to commontransports
* removed multicodec from transports
* Connect & Peer event handlers now receive a peerinfo
* small peerstore refacto
* implement peerstore in switch
* changed PeerStore to final ref object
* revert libp2p/builders.nim
* Revisit Floodsub (#543)
Fixes#525
add coverage to unsubscribeAll and testing
* add mounted protos to identify message (#546)
* add stable/unstable auto bumps
* fix auto-bump CI
* merge nbc auto bump with CI in order to bump only on CI success
* put conditional locks on nbc bump (#549)
* Fix minor exception issues (#550)
Makes code compatible with
https://github.com/status-im/nim-chronos/pull/166 without requiring it.
* fix nimbus ref for auto-bump stable's PR
* use a builder pattern to build the switch (#551)
* use a builder pattern to build the switch
* with with
* more refs
* builders (#559)
* More builders (#560)
* address some issues pointed out in review
* re-add to prevent breaking other projects
* mem usage cleanups for pubsub (#564)
In `async` functions, a closure environment is created for variables
that cross an await boundary - this closure environment is kept in
memory for the lifetime of the associated future - this means that
although _some_ variables are no longer used, they still take up memory
for a long time.
In Nimbus, message validation is processed in batches meaning the future
of an incoming gossip message stays around for quite a while - this
leads to memory consumption peaks of 100-200 mb when there are many
attestations in the pipeline.
To avoid excessive memory usage, it's generally better to move non-async
code into proc's such that the variables therein can be released earlier
- this includes the many hidden variables introduced by macro and
template expansion (ie chronicles that does expensive exception
handling)
* move seen table salt to floodsub, use there as well
* shorten seen table salt to size of hash
* avoid unnecessary memory allocations and copies in a few places
* factor out message scoring
* avoid reencoding outgoing message for every peer
* keep checking validators until reject (in case there's both reject and
ignore)
* `readOnce` avoids `readExactly` overhead for single-byte read
* genericAssign -> assign2
* More gossip coverage (#553)
* add floodPublish test
* test delivery via control Iwant/have mechanics
* fix issues in control, and add testing
* fix possible backoff issue with pruned routine overriding it
* fix control messages (#566)
* remove unused control graft check in handleControl
* avoid sending empty Iwant messages
* Split dialer (#542)
* extracting dialing logic to dialer
* exposing upgrade methods on transport
* cleanup
* fixing tests to use new interfaces
* add comments
* add base exception class and fix hierarchy
* fix imports
* Merge master (#555)
* Revisit Floodsub (#543)
Fixes#525
add coverage to unsubscribeAll and testing
* add mounted protos to identify message (#546)
* add stable/unstable auto bumps
* fix auto-bump CI
* merge nbc auto bump with CI in order to bump only on CI success
* put conditional locks on nbc bump (#549)
* Fix minor exception issues (#550)
Makes code compatible with
https://github.com/status-im/nim-chronos/pull/166 without requiring it.
* fix nimbus ref for auto-bump stable's PR
* Split dialer (#542)
* extracting dialing logic to dialer
* exposing upgrade methods on transport
* cleanup
* fixing tests to use new interfaces
* add comments
* add base exception class and fix hierarchy
* fix imports
* `doAssert` is `ValueError` not `AssertionError`?
* revert back to `AssertionError`
Co-authored-by: Giovanni Petrantoni <7008900+sinkingsugar@users.noreply.github.com>
Co-authored-by: Jacek Sieka <jacek@status.im>
* Builders (#558)
* use a builder pattern to build the switch (#551)
* use a builder pattern to build the switch
* with with
* more refs
* Merge master (#555)
* Revisit Floodsub (#543)
Fixes#525
add coverage to unsubscribeAll and testing
* add mounted protos to identify message (#546)
* add stable/unstable auto bumps
* fix auto-bump CI
* merge nbc auto bump with CI in order to bump only on CI success
* put conditional locks on nbc bump (#549)
* Fix minor exception issues (#550)
Makes code compatible with
https://github.com/status-im/nim-chronos/pull/166 without requiring it.
* fix nimbus ref for auto-bump stable's PR
* Split dialer (#542)
* extracting dialing logic to dialer
* exposing upgrade methods on transport
* cleanup
* fixing tests to use new interfaces
* add comments
* add base exception class and fix hierarchy
* fix imports
* `doAssert` is `ValueError` not `AssertionError`?
* revert back to `AssertionError`
Co-authored-by: Giovanni Petrantoni <7008900+sinkingsugar@users.noreply.github.com>
Co-authored-by: Jacek Sieka <jacek@status.im>
* `doAssert` is `ValueError` not `AssertionError`?
* revert back to `AssertionError`
* fix builders
* more builder stuff
* more builders
Co-authored-by: Giovanni Petrantoni <7008900+sinkingsugar@users.noreply.github.com>
Co-authored-by: Jacek Sieka <jacek@status.im>
Co-authored-by: Giovanni Petrantoni <7008900+sinkingsugar@users.noreply.github.com>
Co-authored-by: Jacek Sieka <jacek@status.im>
* builders (#559)
* More builders (#560)
* address some issues pointed out in review
* re-add to prevent breaking other projects
* Merge master (#555)
* Revisit Floodsub (#543)
Fixes#525
add coverage to unsubscribeAll and testing
* add mounted protos to identify message (#546)
* add stable/unstable auto bumps
* fix auto-bump CI
* merge nbc auto bump with CI in order to bump only on CI success
* put conditional locks on nbc bump (#549)
* Fix minor exception issues (#550)
Makes code compatible with
https://github.com/status-im/nim-chronos/pull/166 without requiring it.
* fix nimbus ref for auto-bump stable's PR
* Split dialer (#542)
* extracting dialing logic to dialer
* exposing upgrade methods on transport
* cleanup
* fixing tests to use new interfaces
* add comments
* add base exception class and fix hierarchy
* fix imports
* `doAssert` is `ValueError` not `AssertionError`?
* revert back to `AssertionError`
Co-authored-by: Giovanni Petrantoni <7008900+sinkingsugar@users.noreply.github.com>
Co-authored-by: Jacek Sieka <jacek@status.im>
Co-authored-by: Giovanni Petrantoni <7008900+sinkingsugar@users.noreply.github.com>
Co-authored-by: Jacek Sieka <jacek@status.im>
* use a builder pattern to build the switch (#551)
* use a builder pattern to build the switch
* with with
* more refs
* Merge master (#555)
* Revisit Floodsub (#543)
Fixes#525
add coverage to unsubscribeAll and testing
* add mounted protos to identify message (#546)
* add stable/unstable auto bumps
* fix auto-bump CI
* merge nbc auto bump with CI in order to bump only on CI success
* put conditional locks on nbc bump (#549)
* Fix minor exception issues (#550)
Makes code compatible with
https://github.com/status-im/nim-chronos/pull/166 without requiring it.
* fix nimbus ref for auto-bump stable's PR
* Split dialer (#542)
* extracting dialing logic to dialer
* exposing upgrade methods on transport
* cleanup
* fixing tests to use new interfaces
* add comments
* add base exception class and fix hierarchy
* fix imports
* `doAssert` is `ValueError` not `AssertionError`?
* revert back to `AssertionError`
Co-authored-by: Giovanni Petrantoni <7008900+sinkingsugar@users.noreply.github.com>
Co-authored-by: Jacek Sieka <jacek@status.im>
* `doAssert` is `ValueError` not `AssertionError`?
* revert back to `AssertionError`
* fix builders
* more builder stuff
* more builders
Co-authored-by: Giovanni Petrantoni <7008900+sinkingsugar@users.noreply.github.com>
Co-authored-by: Jacek Sieka <jacek@status.im>
* adding raises defect across the codebase
* use unittest2
* add windows deps caching
* update mingw link
* die on failed peerinfo initialization
* use result.expect instead of get
* use expect more consistently and rework inits
* use expect more consistently
* throw on missing public key
* remove unused closure annotation
* merge master
* gossipsub: unsubscribe fixes
* fix KeyError when updating metric of unsubscribed topic
* fix unsubscribe message not being sent to all peers causing them to
keep thinking we're still subscribed
* release memory earlier in a few places
* floodsub fix
* add floodPublish test
* test delivery via control Iwant/have mechanics
* fix issues in control, and add testing
* fix possible backoff issue with pruned routine overriding it
In `async` functions, a closure environment is created for variables
that cross an await boundary - this closure environment is kept in
memory for the lifetime of the associated future - this means that
although _some_ variables are no longer used, they still take up memory
for a long time.
In Nimbus, message validation is processed in batches meaning the future
of an incoming gossip message stays around for quite a while - this
leads to memory consumption peaks of 100-200 mb when there are many
attestations in the pipeline.
To avoid excessive memory usage, it's generally better to move non-async
code into proc's such that the variables therein can be released earlier
- this includes the many hidden variables introduced by macro and
template expansion (ie chronicles that does expensive exception
handling)
* move seen table salt to floodsub, use there as well
* shorten seen table salt to size of hash
* avoid unnecessary memory allocations and copies in a few places
* factor out message scoring
* avoid reencoding outgoing message for every peer
* keep checking validators until reject (in case there's both reject and
ignore)
* `readOnce` avoids `readExactly` overhead for single-byte read
* genericAssign -> assign2
* properly propagate initiator information for gossipsub
* Fix pubsubpeer lifetime management
* restore old behavior
* tests fixing
* clamp backoff time value received
* fix member name collisions
* internal test fixes
* better names and explaining of the importance of transport direction
* fixes
* Address Book POC implementation (#499)
* Address Book POC implementation
* Feat/peerstore impl (#505)
Co-authored-by: Hanno Cornelius <68783915+jm-clius@users.noreply.github.com>
* Refactor gossipsub into multiple modules
* splitup further gossipsub
* move more mesh related stuff to behavior
* fix internal tests
* fix PubSubPeer.outbound flag, make it more reliable
* use discard rather then _
* master merge
* wip
* avoid deadlocks
* tcp limits
* expose client field in chronosstream
* limit incoming connections
* update with new listen api
* fix release
* don't override peerinfo in connection
* rework transport with accept
* use semaphore to track resource ussage
* rework with new transport accept api
* move events to conn manager (#373)
* use semaphore to track resource ussage
* merge master
* expose api to acquire conn slots
* don't fail expensive metrics
* allow tracking and updating connections
* set global connection limits to 80
* add per peer connection limits
* make sure conn is closed if tracking failed
* more descriptive naming for handle
* rework with new transport accept api
* add `getStream` hide `selectConn`
* add TransportClosedError
* make nil explicit
* don't make unnecessary copies of message
* logging
* error handling
* cleanup semaphore
* track connections properly
* throw `TooManyConnections` when tracking outgoing
* use proper exception and handle conventions
* check onCloseHandle for nil
* revert internalConnect changes
* adding upgraded flag
* await stream before closing
* simplify tracking
* wip
* logging
* split connection limits into incoming and outgoing
* further streamline connection limits split counts
* don't use closeWithEOF
* move peer and conn event triggers from switch
* wip
* wip
* wip
* merge master
* handle nil connections properly
* add clarifying comment
* don't raise exc on nil
* no finally
* add proper min/max connections logic
* rebase master
* merge master
* master merge
* remove request timeout
should be addressed in separate PR
* merge master
* share semaphore when in/out limits arent enforced
* merge master
* use import
* pass semaphore to trackConn
* don't close last conn
* use storeConn
* merge master
* use storeConn
* Remove unused connections in pubsubpeer, also removed wrong usages, add a disconnect bad peers parameter
* handle exceptions in disconnectPeer
* small fix
* use the proper disconnection procedure for gossip peers
* fixes, more metrics add test about disconnection
* hot fix possible null pointers in switch
* silly isnil sugar
* Fix and test gossip directPeer connections
* add proper cancelation handling
* remove cancelled futures explicitly
* use fifo to keep proper order
* add out of order cancelations test
* make count public
* use `new` instead of `init`
* remove private `queue` from tests
* expose count as a readonly prop
* use `delete()` to preserve seq order
* salt ids in seen table
* add subscription validation callback and avoid processing topics we don't care of
* apply penalty on bad subscription
* fix IHave handling IDs
* reduce indenting, add some comments
* fix gossip randombytes generation
* do not descore unwanted topics (might happen, due to timing, needs improvements)
* cleaning up and added tests
* validate subscriptions only when subscribing
* set notice level for failed publish
* fix floodsub behavior
* adding an upgraded event to conn
* set stopped flag asap
* trigger upgradded event on conn
* set concurrency limit for accepts
* backporting semaphore from tcp-limits2
* export unittests module
* make params explicit
* tone down debug logs
* adding semaphore tests
* use semaphore to throttle concurent upgrades
* add libp2p scope
* trigger upgraded event before any other events
* add event handler for connection upgrade
* cleanup upgraded event on conn close
* make upgrades slot release rebust
* dont forget to release slot on nil connection
* misc
* make sure semaphore is always released
* minor improvements and a nil check
* removing unneeded comment
* make upgradeMonitor a non-closure proc
* make sure the `upgraded` event is initialized
* handle exceptions in accepts when stopping
* don't leak exceptions when stopping accept loops
* add more traces, remove async from rebalance
* more traces
* avoid computng scores when weight is 0.0
* debug colocation, fix an indent in unsubpeer (minor)
* add full ValidationResult coverage
* store in cache only after validation
* gossip 1.0 fixes
* fix typo
* gossip 10 internal test fixes
* test fixing
* refactor peerstats usages
* populate tables if missing when scoring
* streamline socket read/write hot path
This avoids some unnecessary memory copying on the hot path of noise /
mplex, as well as getting rid of a few futures - profiling shows that
this is one of the main culprits of small memory allocations, which
makes sense - this is where gossip fan-out happens.
* fewer futures (and corresponding closures) when sending lpchannel
messages
* avoid data copies when encrypting and framing noise messages
* avoid copying tuple when reading noise data (poor c codegen)
* fix setting eof flag in secure read
* write noise frames in one go
...and closing secure socket once is enough
* check that connection is not closed or eof
* don't release connection lock prematurely
* test that only valid connections can be added
* correct exception type on closed connection
* add clarifying comment
* use closeWithEOF for more stable test
* misc comments
* log stream id in buffestream asserts
* use closeWithEOF to prevent races in tests
* give some time to the remote handler to trigger
* adding more tests to make codecov happy