Commit Graph

104 Commits

Author SHA1 Message Date
Tanguy 1a97d0a2f5
Validate pubsub subscriptions (#627)
* Check topic before subscribing
* Block subscribe to invalid topics
2022-01-14 12:40:30 -06:00
Tanguy df566e69db
Fixes for style check (#676) 2021-12-16 11:05:20 +01:00
Tanguy 5885e03680
Add maxMessageSize option to pubsub (#634)
* Add maxMessageSize option to pubsub
* Switched default to 1mb
2021-10-25 12:58:38 +02:00
Tanguy 846baf3853
Various cleanups part 1 (#632)
* raise -> raise exc
* replace stdlib random with bearssl
* object init -> new
* Remove deprecated procs
* getMandatoryField
2021-10-25 10:26:32 +02:00
Menduist d02735dc46
Remove peer info (#610)
Peer Info is now for local peer data only.
For other peers info, use the peer store.

Previous reference to peer info are replaced with the peerid
2021-09-08 11:07:46 +02:00
Tanguy Cizain 93156447ba
Peer Store implement part II (#586)
* Connect & Peer event handlers now receive a peerinfo

* small peerstore refacto

* implement peerstore in switch

* changed PeerStore to final ref object

* revert libp2p/builders.nim
2021-06-08 18:55:24 +02:00
Tanguy Cizain caac8191d2
Change newXXXX procs to XXXX.new (#585)
* newBufferStream -> BufferStream.new

* newMultistream -> MultistreamSelect.new

* newSecio -> Secio.new

* newNoise -> Noise.new

* newPlainText -> PlainText.new

* newPubSubPeer -> PubSubPeer.new

* newIdentify -> Identify.new

* newMuxerProvider -> MuxerProvider.new
2021-06-07 09:32:08 +02:00
Dmitriy Ryajov ac4e060e1a
adding raises defect across the codebase (#572)
* adding raises defect across the codebase

* use unittest2

* add windows deps caching

* update mingw link

* die on failed peerinfo initialization

* use result.expect instead of get

* use expect more consistently and rework inits

* use expect more consistently

* throw on missing public key

* remove unused closure annotation

* merge master
2021-05-21 10:27:01 -06:00
Jacek Sieka 83a20a992a
gossipsub: unsubscribe fixes (#569)
* gossipsub: unsubscribe fixes

* fix KeyError when updating metric of unsubscribed topic
* fix unsubscribe message not being sent to all peers causing them to
keep thinking we're still subscribed
* release memory earlier in a few places

* floodsub fix
2021-05-07 00:43:45 +02:00
Jacek Sieka e285d8bbf4
mem usage cleanups for pubsub (#564)
In `async` functions, a closure environment is created for variables
that cross an await boundary - this closure environment is kept in
memory for the lifetime of the associated future - this means that
although _some_ variables are no longer used, they still take up memory
for a long time.

In Nimbus, message validation is processed in batches meaning the future
of an incoming gossip message stays around for quite a while - this
leads to memory consumption peaks of 100-200 mb when there are many
attestations in the pipeline.

To avoid excessive memory usage, it's generally better to move non-async
code into proc's such that the variables therein can be released earlier
- this includes the many hidden variables introduced by macro and
template expansion (ie chronicles that does expensive exception
handling)

* move seen table salt to floodsub, use there as well
* shorten seen table salt to size of hash
* avoid unnecessary memory allocations and copies in a few places
* factor out message scoring
* avoid reencoding outgoing message for every peer
* keep checking validators until reject (in case there's both reject and
ignore)
* `readOnce` avoids `readExactly` overhead for single-byte read
* genericAssign -> assign2
2021-04-18 10:08:33 +02:00
Jacek Sieka 70deac9e0d
fix peer score accumulation (#541)
* fix accumulating peer score
* fix missing exception handling
* remove unnecessary initHashSet/initTable calls
* simplify peer stats management
* clean up tests a little
* fix some missing raises annotations
2021-03-09 13:22:52 +01:00
Giovanni Petrantoni 02ad017107
Gossipsub fixes and Initiator flagging fixes (#539)
* properly propagate initiator information for gossipsub

* Fix pubsubpeer lifetime management

* restore old behavior

* tests fixing

* clamp backoff time value received

* fix member name collisions

* internal test fixes

* better names and explaining of the importance of transport direction

* fixes
2021-03-03 08:23:40 +09:00
Giovanni Petrantoni e124e342b0
n subscription limits (#528)
* subscription high water, cleanups

* subscription limits test

* newline
2021-02-12 12:27:26 +09:00
Giovanni Petrantoni fd73cf9f9d
Refactor gossipsub into multiple modules (#515)
* Refactor gossipsub into multiple modules

* splitup further gossipsub

* move more mesh related stuff to behavior

* fix internal tests

* fix PubSubPeer.outbound flag, make it more reliable

* use discard rather then _
2021-02-06 09:13:04 +09:00
Giovanni Petrantoni dc48170b0d
Gossip subscription improvements (#497)
* salt ids in seen table

* add subscription validation callback and avoid processing topics we don't care of

* apply penalty on bad subscription

* fix IHave handling IDs

* reduce indenting, add some comments

* fix gossip randombytes generation

* do not descore unwanted topics (might happen, due to timing, needs improvements)

* cleaning up and added tests

* validate subscriptions only when subscribing

* set notice level for failed publish

* fix floodsub behavior
2021-01-13 23:49:44 +09:00
Giovanni Petrantoni b902c030a0
add metrics into chronosstream to identify peers agents (#458)
* add metrics into chronosstream to identify peers agents

* avoid too many agent strings

* use gauge instead of counter for stream metrics

* filter identity on /

* also track bytes traffic

* fix identity tracking closeimpl call

* add gossip rpc metrics

* fix missing metrics inclusions

* metrics fixes and additions

* add a KnownLibP2PAgents strdefine

* enforse toLowerAscii to agent names (metrics)

* incoming rpc metrics

* fix silly mistake in rpc metrics

* fix agent metrics logic

* libp2p_gossipsub_failed_publish metric

* message ids metrics

* libp2p_pubsub_broadcast_ihave metric improvement

* refactor expensive gossip metrics

* more detailed metrics

* metrics improvements

* remove generic metrics for `set` users

* small fixes, add debug counters

* fix counter and add missing subs metrics!

* agent metrics behind -d:libp2p_agents_metrics

* remove testing related code from this PR

* small rebroadcast metric fix

* fix small mistake

* add some guide to the readme in order to use new metrics

* add libp2p_gossipsub_peers_scores metric

* add protobuf metrics to understand bytes traffic precisely

* refactor gossipsub metrics

* remove unused variable

* add more metrics, refactor rebalance metrics

* avoid bad metric concurrent states

* use a stack structure for gossip mesh metrics

* refine sub metrics

* add received subs metrics fixes

* measure handlers of known topics

* sub/unsub counter

* unsubscribeAll log unknown topics

* expose a way to specify known topics at runtime
2021-01-08 14:21:24 +09:00
Giovanni Petrantoni 05e789a34f
Gossipsub refactor (#490)
* refactor peerStats, re-enable scores for testing

* remove gossip 1.0

* cleanup

* codecov matrix fixes

* restore previous score on onNewPeer

* fix coverage n checks

* unsubscribeAll gossipsub fixes

* refactor unsub/sub

* refactor onNewPeer and fix score flow

* disable scores by default (change in tests later)

* fix tests, enable scores in tests

* fix wrongly merged test

* ensure topic removal from topics table

* small typo fix

* testinterop fixes
2020-12-19 15:43:32 +01:00
Giovanni Petrantoni 6c2e743ff3
Race condition in pubsub #469 (#471)
* Race condition in pubsub #469

* use allFinished

* improve cancellation handling
2020-12-19 00:56:46 +09:00
Dmitriy Ryajov d1c689e5ab
adding libp2p tag to logScope (#465) 2020-12-01 11:34:27 -06:00
Dmitriy Ryajov 18443dafc1
rework peer event to take an initiator flag (#456)
* rework peer event to take an initiator flag

* use correct direction for initiator
2020-11-28 10:59:47 -06:00
Giovanni Petrantoni 809df8d04d add some extra gossip metrics 2020-11-26 16:20:34 +09:00
Dmitriy Ryajov 90921bff09
move some importance trace logs to debug (#428) 2020-11-09 22:14:46 -06:00
Jacek Sieka 03639f1446
Revert "Channel leaks (#413)" (#417)
This reverts commit 1de1d49223.
2020-11-01 14:49:25 -06:00
Giovanni Petrantoni 75b023c9e5
gossipsub audit fixes (#412)
* [SEC] gossipsub - rebalanceMesh grafts peers giving preference to low scores #405

* comment score choices

* compiler warning fixes/bug fixes (unsubscribe)

* rebalanceMesh does not enforce D_out quota

* fix outbound grafting

* fight the nim compiler

* fix closure capture bs...

* another closure fix

* #403 rebalance prune fixes

* more test fixing

* #403 fixes

* #402 avoid removing scores on unsub

* #401 handleGraft improvements

* [SEC] handleIHAVE/handleIWANT recommendations

* add a note about peer exchange handling
2020-10-30 21:49:54 +09:00
Dmitriy Ryajov 1de1d49223
Channel leaks (#413)
* break stream tracking by type

* use closeWithEOF to await wrapped stream

* fix cancelation leaks

* fix channel leaks

* logging

* use close monitor and always call closeUnderlying

* don't use closeWithEOF

* removing close monitor

* logging
2020-10-27 11:21:03 -06:00
Giovanni Petrantoni 462da1f7a8
gossip MessageID as seq[byte] (#391)
* gossip MessageID as seq[byte]

* combina hashes in defaultMsgIdProvider

* wip

* fix defaultMsgIdProvider
2020-10-21 12:26:04 +09:00
Giovanni Petrantoni 556213abf4
Extended validators (#395)
* gossip extended validation

* fix flood tests

* fix gossip 1.0 tests

* synthax consistency
2020-10-12 16:56:00 +09:00
Giovanni Petrantoni 98d0cc3a16
defaultMsgIdProvider alternative/test anonymize (#379)
* defaultMsgIdProvider alternative/test anonymize

* avoid freeze during flood tests

* avoid `empty message, skipping` situation

* test observers

* avoid double initPubSub

* fix gossip testing (specially when anonymize is on)

* make azure tests shorter
2020-09-28 09:11:18 +02:00
Jacek Sieka 8ecef46738
reencode gossipsub messages with anonymization (#378)
This helps protect against clients sending more data than they should
and thus getting penalized on topics that require anonymity
2020-09-25 18:39:34 +02:00
Jacek Sieka 25bd0a18f4
small fixes (#374)
* add helper to read EOF marker after closing stream (else stream stay
alive until timeout/reset)
* don't assert on empty channel message
* don't loop when writing to chronos (no need)
2020-09-24 07:30:19 +02:00
Giovanni Petrantoni ec322124ac
allow to omit peerId and seqno (#372)
* allow to omit peerId and seqno

* small refactor

* wip

* fix message encoding

* improve rpc signature logic

* remove peerid from verify

* trace fixes

* fix message test

* fix gossip 1.0 tests
2020-09-23 17:56:33 +02:00
Jacek Sieka 471e5906f6
fix gossipsub memory leak on disconnected peer (#371)
When messages can't be sent to peer, we try to establish a send
connection - this causes messages to stack up as more and more unsent
messages are blocked on the dial lock.

* remove dial lock
* run reconnection loop in background task
2020-09-22 09:05:53 +02:00
Giovanni Petrantoni b99d2039a8
Gossip one one (#240)
* allow multiple codecs per protocol (without breaking things)

* add 1.1 protocol to gossip

* explicit peering part 1

* explicit peering part 2

* explicit peering part 3

* PeerInfo and ControlPrune protocols

* fix encodePrune

* validated always, even explicit peers

* prune by score (score is stub still)

* add a way to pass parameters to gossip

* standard setup fixes

* take into account explicit direct peers in publish

* add floodPublish logic

* small fixes, publish still half broken

* make sure to waitsub in sparse test

* use var semantics to optimize table access

* wip... lvalues don't work properly sadly...

* big publish refactor, replenish and balance

* fix internal tests

* use g.peers for fanout (todo: don't include flood peers)

* exclude non gossip from fanout

* internal test fixes

* fix flood tests

* fix test's trypublish

* test interop fixes

* make sure to not remove peers from gossip table

* restore old replenishFanout

* cleanups

* restore utility module import

* restore trace vs debug in gossip

* improve fanout replenish behavior further

* triage publish nil peers (issue is on master too but just hidden behind a if/in)

* getGossipPeers fixes

* remove topics from pubsubpeer (was unused)

* simplify rebalanceMesh (following spec) and make it finally reach D_high

* better diagnostics

* merge new pubsubpeer, copy 1.1 to new module

* fix up merge

* conditional enable gossip11 module

* add back topics in peers, re-enable flood publish

* add more heartbeat locking to prevent races

* actually lock the heartbeat

* minor fixes

* with sugar

* merge 1.0

* remove assertion in publish

* fix multistream 1.1 multi proto

* Fix merge oops

* wip

* fix gossip 11 upstream

* gossipsub11 -> gossipsub

* support interop testing

* tests fixing

* fix directchat build

* control prune updates (pb)

* wip parameters

* gossip internal tests fixes

* parameters wip

* finishup with params

* cleanups/wip

* small sugar

* grafted and pruned procs

* wip updateScores

* wip

* fix logging issue

* pubsubpeer, chronicles explicit override

* fix internal gossip tests

* wip

* tables troubleshooting

* score wip

* score wip

* fixes

* fix test utils generateNodes

* don't delete while iterating in score update

* fix grafted defect

* add a handleConnect in subscribeTopic

* pruning improvements

* wip

* score fixes

* post merge - builds gossip tests

* further merge fixes

* rebalance improvements and opportunistic grafting

* fix test for now

* restore explicit peering

* implement peer exchange graft message

* add an hard cap to PX

* backoff time management

* IWANT cap/budget

* Adaptive gossip dissemination

* outbound mesh quota, internal tests fixing

* oversub prune score based, finish outbound quota

* finishup with score and ihave budget

* use go daemon 0.3.0

* import fixes

* byScore cleanup score sorting

* remove pointless scaling in `/` Duration operator

* revert using libp2p org for daemon

* interop fixes

* fixes and cleanup

* remove heartbeat assertion, minor debug fixes

* logging improvements and cleaning up

* (to revert) add some traces

* add explicit topic to gossip rpcs

* pubsub merge fixes and type fix in switch

* Revert "(to revert) add some traces"

This reverts commit 4663eaab6cc336c81cee50bc54025cf0b7bcbd99.

* cleanup some now irrelevant todo

* shuffle peers anyway as score might be disabled

* add missing shuffle

* old merge fix

* more merge fixes

* debug improvements

* re-enable gossip internal tests

* add gossip10 fallback (dormant but tested)

* split gossipsub internal tests into 1.0 and 1.1

Co-authored-by: Dmitriy Ryajov <dryajov@gmail.com>
2020-09-21 11:16:29 +02:00
Dmitriy Ryajov b0d86b95dd
add peer lifecycle events (#357)
* add peer lifecycle events

* rework peer events to not use connection events

* don't use result in pubsub and switch init

* wip

* use ordered hashes and remove logscope

* logging

* add missing test

* small fixes
2020-09-15 14:19:22 -06:00
Jacek Sieka c1856fda53
simplify and unify logging (#353)
* use short format for logging peerid
* log peerid:oid for connections
2020-09-06 10:31:47 +02:00
Jacek Sieka 6d91d61844
small cleanups & docs (#347)
* simplify gossipsub heartbeat start / stop
* avoid alloc in peerid check
* stop iterating over seq after unsubscribing item (could crash)
* don't crash on missing private key with enabled sigs (shouldn't happen
but...)
2020-09-04 18:31:43 +02:00
Jacek Sieka 5819c6a9a7
gossipsub / floodsub fixes (#348)
* mcache fixes

* remove timed cache - the window shifting already removes old messages
* ref -> object
* avoid unnecessary allocations with `[]` operator

* simplify init

* fix several gossipsub/floodsub issues

* floodsub, gossipsub: don't rebroadcast messages that fail validation
(!)
* floodsub, gossipsub: don't crash when unsubscribing from unknown
topics (!)
* gossipsub: don't send message to peers that are not interested in the
topic, when messages don't share topic list
* floodsub: don't repeat all messages for each message when
rebroadcasting
* floodsub: allow sending empty data
* floodsub: fix inefficient unsubscribe
* sync floodsub/gossipsub logging
* gossipsub: include incoming messages in mcache (!)
* gossipsub: don't rebroadcast already-seen messages (!)
* pubsubpeer: remove incoming/outgoing seen caches - these are already
handled in gossipsub, floodsub and will cause trouble when peers try to
resubscribe / regraft topics (because control messages will have same
digest)
* timedcache: reimplement without timers (fixes timer leaks and extreme
inefficiency due to per-message closures, futures etc)
* timedcache: ref -> obj
2020-09-04 08:10:32 +02:00
Jacek Sieka cd1c68dbc5
avoid send deadlock by not allowing send to block (#342)
* avoid send deadlock by not allowing send to block

* handle message issues more consistently
2020-09-01 09:33:03 +02:00
Dmitriy Ryajov d3182c4dba
No raise send (#339)
* dont raise in send

* check that the lock is acquire on release
2020-08-20 20:50:33 -06:00
Jacek Sieka eb13845f65 work around send that may raise
`send` can raise exceptions that together with asyncCheck will
crash NBC
2020-08-19 14:25:30 +03:00
Jacek Sieka f46bf0faa4
remove send lock (#334)
* remove send lock

When mplex receives data it will block until a reader has processed the
data. Thus, when a large message is received, such as a gossipsub
subscription table, all of mplex will be blocked until all reading is
finished.

However, if at the same time a `dial` to establish a gossipsub send
connection is ongoing, that `dial` will be blocked because mplex is no
longer reading data - specifically, it might indeed be the connection
that's processing the previous data that is waiting for a send
connection.

There are other problems with the current code:
* If an exception is raised, it is not necessarily raised for the same
connection as `p.sendConn`, so resetting `p.sendConn` in the exception
handling is wrong
* `p.isConnected` is checked before taking the lock - thus, if it
returns false, a new dial will be started. If a new task enters `send`
before dial is finished, it will also determine `p.isConnected` is
false, then get stuck on the lock - when the previous task finishes and
releases the lock, the new task will _also_ dial and thus reset
`p.sendConn` causing a leak.

* prefer existing connection

simplifies flow
2020-08-17 12:38:27 +02:00
Jacek Sieka b12145dff7
avoid crash when subscribe is received (#333)
...by making subscribeTopic synchronous, avoiding a peer table lookup
completely.

rebalanceMesh will be called a second later - it's fine
2020-08-17 12:10:22 +02:00
Jacek Sieka ab864fc747
logging cleanups and small fixes (#331) 2020-08-15 21:50:31 +02:00
Dmitriy Ryajov b76b3e0e9b
Rework pubsub (#322)
* move pubsub of off switch, pass switch into pubsub

* use join on lpstreams

* properly cleanup up failed peers

* fix tests

* fix peertable hasPeerId

* fix tests

* rework sending, remove helpers from pubsubpeer, unify in broadcast

* further split broadcast into send

* use send where appropriate

* use formatIt

* improve trace

Co-authored-by: Giovanni Petrantoni <giovanni@fragcolor.xyz>
2020-08-11 18:05:49 -06:00
Jacek Sieka c6c0c152c0
Dial peerid (#308)
* prefer PeerID in switch api

This avoids ref issues like ref identity and nil

* use existing peerinfo instance if possible

* remove secureCodec

there may be multiple connections per peerinfo with different codecs

* avoid some extra async::
2020-08-06 09:29:27 +02:00
Giovanni Petrantoni 9bbe5e4841
Fix subclass calls to handleDisconnect (#314)
* Fix subclass calls to handleDisconnect

* add peer ID to nil peer debug message
2020-08-06 11:12:52 +09:00
Ștefan Talpalaru 843d32f8db
put expensive metrics under a Nim define (#310) 2020-08-04 17:27:59 -06:00
Dmitriy Ryajov 980764774e
pubsub timeouts tuning (#295)
* add finegrained timeouts to pubsub

* use 10 millis timeout in tests

* finalization

* revert timeouts

* use `atEof` for reads

* adjust timeouts and use atEof for reads

* use atEof for reads

* set isEof flag

* no backoff for pubsub streams

* temp timer increase, make macos finalize

* don't call `subscribePeer` in libp2p anymore

* more traces

* leak tests

* lower timeouts

* handle exceptions in control message

* don't use `cancelAndWait`

* handle exceptions in helpers

* wip

* don't send empty messages

* check for leaks properly

* don't use cancelAndWait

* don't await subscribption sends

* remove subscrivePeer calls from switch

* trying without the hooks again
2020-08-02 23:20:11 -06:00
Jacek Sieka e655a510cd
misc cleanups (#303) 2020-08-02 12:22:49 +02:00
Dmitriy Ryajov f7fdf31365
Pubsub lifetime (#284)
* lifecycle hooks

* tests

* move trace after closed check

* restore 1 second heartbeat

* await close event

* fix tests

* print direction string

* more trace logging

* add pubsub monitor

* add log scope

* adjust idle timeout

* add exc.msg to trace
2020-07-27 13:33:51 -06:00