Giovanni Petrantoni
b902c030a0
add metrics into chronosstream to identify peers agents ( #458 )
...
* add metrics into chronosstream to identify peers agents
* avoid too many agent strings
* use gauge instead of counter for stream metrics
* filter identity on /
* also track bytes traffic
* fix identity tracking closeimpl call
* add gossip rpc metrics
* fix missing metrics inclusions
* metrics fixes and additions
* add a KnownLibP2PAgents strdefine
* enforce toLowerAscii on agent names (metrics)
* incoming rpc metrics
* fix silly mistake in rpc metrics
* fix agent metrics logic
* libp2p_gossipsub_failed_publish metric
* message ids metrics
* libp2p_pubsub_broadcast_ihave metric improvement
* refactor expensive gossip metrics
* more detailed metrics
* metrics improvements
* remove generic metrics for `set` users
* small fixes, add debug counters
* fix counter and add missing subs metrics!
* agent metrics behind -d:libp2p_agents_metrics
* remove testing related code from this PR
* small rebroadcast metric fix
* fix small mistake
* add some guide to the readme in order to use new metrics
* add libp2p_gossipsub_peers_scores metric
* add protobuf metrics to understand bytes traffic precisely
* refactor gossipsub metrics
* remove unused variable
* add more metrics, refactor rebalance metrics
* avoid bad metric concurrent states
* use a stack structure for gossip mesh metrics
* refine sub metrics
* add received subs metrics fixes
* measure handlers of known topics
* sub/unsub counter
* unsubscribeAll log unknown topics
* expose a way to specify known topics at runtime
2021-01-08 14:21:24 +09:00
Giovanni Petrantoni
05e789a34f
Gossipsub refactor ( #490 )
...
* refactor peerStats, re-enable scores for testing
* remove gossip 1.0
* cleanup
* codecov matrix fixes
* restore previous score on onNewPeer
* fix coverage n checks
* unsubscribeAll gossipsub fixes
* refactor unsub/sub
* refactor onNewPeer and fix score flow
* disable scores by default (change in tests later)
* fix tests, enable scores in tests
* fix wrongly merged test
* ensure topic removal from topics table
* small typo fix
* testinterop fixes
2020-12-19 15:43:32 +01:00
Giovanni Petrantoni
6c2e743ff3
Race condition in pubsub #469 ( #471 )
...
* Race condition in pubsub #469
* use allFinished
* improve cancellation handling
2020-12-19 00:56:46 +09:00
Dmitriy Ryajov
d1c689e5ab
adding libp2p tag to logScope ( #465 )
2020-12-01 11:34:27 -06:00
Dmitriy Ryajov
18443dafc1
rework peer event to take an initiator flag ( #456 )
...
* rework peer event to take an initiator flag
* use correct direction for initiator
2020-11-28 10:59:47 -06:00
Giovanni Petrantoni
809df8d04d
add some extra gossip metrics
2020-11-26 16:20:34 +09:00
Dmitriy Ryajov
90921bff09
move some important trace logs to debug ( #428 )
2020-11-09 22:14:46 -06:00
Jacek Sieka
03639f1446
Revert "Channel leaks ( #413 )" ( #417 )
...
This reverts commit 1de1d49223.
2020-11-01 14:49:25 -06:00
Giovanni Petrantoni
75b023c9e5
gossipsub audit fixes ( #412 )
...
* [SEC] gossipsub - rebalanceMesh grafts peers giving preference to low scores #405
* comment score choices
* compiler warning fixes/bug fixes (unsubscribe)
* rebalanceMesh does not enforce D_out quota
* fix outbound grafting
* fight the nim compiler
* fix closure capture bs...
* another closure fix
* #403 rebalance prune fixes
* more test fixing
* #403 fixes
* #402 avoid removing scores on unsub
* #401 handleGraft improvements
* [SEC] handleIHAVE/handleIWANT recommendations
* add a note about peer exchange handling
2020-10-30 21:49:54 +09:00
Dmitriy Ryajov
1de1d49223
Channel leaks ( #413 )
...
* break stream tracking by type
* use closeWithEOF to await wrapped stream
* fix cancelation leaks
* fix channel leaks
* logging
* use close monitor and always call closeUnderlying
* don't use closeWithEOF
* removing close monitor
* logging
2020-10-27 11:21:03 -06:00
Giovanni Petrantoni
462da1f7a8
gossip MessageID as seq[byte] ( #391 )
...
* gossip MessageID as seq[byte]
* combine hashes in defaultMsgIdProvider
* wip
* fix defaultMsgIdProvider
2020-10-21 12:26:04 +09:00
Giovanni Petrantoni
556213abf4
Extended validators ( #395 )
...
* gossip extended validation
* fix flood tests
* fix gossip 1.0 tests
* syntax consistency
2020-10-12 16:56:00 +09:00
Giovanni Petrantoni
98d0cc3a16
defaultMsgIdProvider alternative/test anonymize ( #379 )
...
* defaultMsgIdProvider alternative/test anonymize
* avoid freeze during flood tests
* avoid `empty message, skipping` situation
* test observers
* avoid double initPubSub
* fix gossip testing (especially when anonymize is on)
* make azure tests shorter
2020-09-28 09:11:18 +02:00
Jacek Sieka
8ecef46738
reencode gossipsub messages with anonymization ( #378 )
...
This helps protect against clients sending more data than they should
and thus getting penalized on topics that require anonymity
2020-09-25 18:39:34 +02:00
Jacek Sieka
25bd0a18f4
small fixes ( #374 )
...
* add helper to read EOF marker after closing stream (else the stream stays
alive until timeout/reset)
* don't assert on empty channel message
* don't loop when writing to chronos (no need)
2020-09-24 07:30:19 +02:00
Giovanni Petrantoni
ec322124ac
allow to omit peerId and seqno ( #372 )
...
* allow to omit peerId and seqno
* small refactor
* wip
* fix message encoding
* improve rpc signature logic
* remove peerid from verify
* trace fixes
* fix message test
* fix gossip 1.0 tests
2020-09-23 17:56:33 +02:00
Jacek Sieka
471e5906f6
fix gossipsub memory leak on disconnected peer ( #371 )
...
When messages can't be sent to peer, we try to establish a send
connection - this causes messages to stack up as more and more unsent
messages are blocked on the dial lock.
* remove dial lock
* run reconnection loop in background task
2020-09-22 09:05:53 +02:00
Giovanni Petrantoni
b99d2039a8
Gossip one one ( #240 )
...
* allow multiple codecs per protocol (without breaking things)
* add 1.1 protocol to gossip
* explicit peering part 1
* explicit peering part 2
* explicit peering part 3
* PeerInfo and ControlPrune protocols
* fix encodePrune
* validated always, even explicit peers
* prune by score (score is stub still)
* add a way to pass parameters to gossip
* standard setup fixes
* take into account explicit direct peers in publish
* add floodPublish logic
* small fixes, publish still half broken
* make sure to waitsub in sparse test
* use var semantics to optimize table access
* wip... lvalues don't work properly sadly...
* big publish refactor, replenish and balance
* fix internal tests
* use g.peers for fanout (todo: don't include flood peers)
* exclude non gossip from fanout
* internal test fixes
* fix flood tests
* fix test's trypublish
* test interop fixes
* make sure to not remove peers from gossip table
* restore old replenishFanout
* cleanups
* restore utility module import
* restore trace vs debug in gossip
* improve fanout replenish behavior further
* triage publish nil peers (issue is on master too but just hidden behind an if/in)
* getGossipPeers fixes
* remove topics from pubsubpeer (was unused)
* simplify rebalanceMesh (following spec) and make it finally reach D_high
* better diagnostics
* merge new pubsubpeer, copy 1.1 to new module
* fix up merge
* conditional enable gossip11 module
* add back topics in peers, re-enable flood publish
* add more heartbeat locking to prevent races
* actually lock the heartbeat
* minor fixes
* with sugar
* merge 1.0
* remove assertion in publish
* fix multistream 1.1 multi proto
* Fix merge oops
* wip
* fix gossip 11 upstream
* gossipsub11 -> gossipsub
* support interop testing
* tests fixing
* fix directchat build
* control prune updates (pb)
* wip parameters
* gossip internal tests fixes
* parameters wip
* finishup with params
* cleanups/wip
* small sugar
* grafted and pruned procs
* wip updateScores
* wip
* fix logging issue
* pubsubpeer, chronicles explicit override
* fix internal gossip tests
* wip
* tables troubleshooting
* score wip
* score wip
* fixes
* fix test utils generateNodes
* don't delete while iterating in score update
* fix grafted defect
* add a handleConnect in subscribeTopic
* pruning improvements
* wip
* score fixes
* post merge - builds gossip tests
* further merge fixes
* rebalance improvements and opportunistic grafting
* fix test for now
* restore explicit peering
* implement peer exchange graft message
* add a hard cap to PX
* backoff time management
* IWANT cap/budget
* Adaptive gossip dissemination
* outbound mesh quota, internal tests fixing
* oversub prune score based, finish outbound quota
* finishup with score and ihave budget
* use go daemon 0.3.0
* import fixes
* byScore cleanup score sorting
* remove pointless scaling in `/` Duration operator
* revert using libp2p org for daemon
* interop fixes
* fixes and cleanup
* remove heartbeat assertion, minor debug fixes
* logging improvements and cleaning up
* (to revert) add some traces
* add explicit topic to gossip rpcs
* pubsub merge fixes and type fix in switch
* Revert "(to revert) add some traces"
This reverts commit 4663eaab6c.
* cleanup some now irrelevant todo
* shuffle peers anyway as score might be disabled
* add missing shuffle
* old merge fix
* more merge fixes
* debug improvements
* re-enable gossip internal tests
* add gossip10 fallback (dormant but tested)
* split gossipsub internal tests into 1.0 and 1.1
Co-authored-by: Dmitriy Ryajov <dryajov@gmail.com>
2020-09-21 11:16:29 +02:00
Dmitriy Ryajov
b0d86b95dd
add peer lifecycle events ( #357 )
...
* add peer lifecycle events
* rework peer events to not use connection events
* don't use result in pubsub and switch init
* wip
* use ordered hashes and remove logscope
* logging
* add missing test
* small fixes
2020-09-15 14:19:22 -06:00
Jacek Sieka
c1856fda53
simplify and unify logging ( #353 )
...
* use short format for logging peerid
* log peerid:oid for connections
2020-09-06 10:31:47 +02:00
Jacek Sieka
6d91d61844
small cleanups & docs ( #347 )
...
* simplify gossipsub heartbeat start / stop
* avoid alloc in peerid check
* stop iterating over seq after unsubscribing item (could crash)
* don't crash on missing private key with enabled sigs (shouldn't happen
but...)
2020-09-04 18:31:43 +02:00
Jacek Sieka
5819c6a9a7
gossipsub / floodsub fixes ( #348 )
...
* mcache fixes
* remove timed cache - the window shifting already removes old messages
* ref -> object
* avoid unnecessary allocations with `[]` operator
* simplify init
* fix several gossipsub/floodsub issues
* floodsub, gossipsub: don't rebroadcast messages that fail validation
(!)
* floodsub, gossipsub: don't crash when unsubscribing from unknown
topics (!)
* gossipsub: don't send message to peers that are not interested in the
topic, when messages don't share topic list
* floodsub: don't repeat all messages for each message when
rebroadcasting
* floodsub: allow sending empty data
* floodsub: fix inefficient unsubscribe
* sync floodsub/gossipsub logging
* gossipsub: include incoming messages in mcache (!)
* gossipsub: don't rebroadcast already-seen messages (!)
* pubsubpeer: remove incoming/outgoing seen caches - these are already
handled in gossipsub, floodsub and will cause trouble when peers try to
resubscribe / regraft topics (because control messages will have same
digest)
* timedcache: reimplement without timers (fixes timer leaks and extreme
inefficiency due to per-message closures, futures etc)
* timedcache: ref -> obj
2020-09-04 08:10:32 +02:00
Jacek Sieka
cd1c68dbc5
avoid send deadlock by not allowing send to block ( #342 )
...
* avoid send deadlock by not allowing send to block
* handle message issues more consistently
2020-09-01 09:33:03 +02:00
Dmitriy Ryajov
d3182c4dba
No raise send ( #339 )
...
* dont raise in send
* check that the lock is acquired on release
2020-08-20 20:50:33 -06:00
Jacek Sieka
eb13845f65
work around send that may raise
...
`send` can raise exceptions that together with asyncCheck will
crash NBC
2020-08-19 14:25:30 +03:00
Jacek Sieka
f46bf0faa4
remove send lock ( #334 )
...
* remove send lock
When mplex receives data it will block until a reader has processed the
data. Thus, when a large message is received, such as a gossipsub
subscription table, all of mplex will be blocked until all reading is
finished.
However, if at the same time a `dial` to establish a gossipsub send
connection is ongoing, that `dial` will be blocked because mplex is no
longer reading data - specifically, it might indeed be the connection
that's processing the previous data that is waiting for a send
connection.
There are other problems with the current code:
* If an exception is raised, it is not necessarily raised for the same
connection as `p.sendConn`, so resetting `p.sendConn` in the exception
handling is wrong
* `p.isConnected` is checked before taking the lock - thus, if it
returns false, a new dial will be started. If a new task enters `send`
before dial is finished, it will also determine `p.isConnected` is
false, then get stuck on the lock - when the previous task finishes and
releases the lock, the new task will _also_ dial and thus reset
`p.sendConn` causing a leak.
* prefer existing connection
simplifies flow
2020-08-17 12:38:27 +02:00
Jacek Sieka
b12145dff7
avoid crash when subscribe is received ( #333 )
...
...by making subscribeTopic synchronous, avoiding a peer table lookup
completely.
rebalanceMesh will be called a second later - it's fine
2020-08-17 12:10:22 +02:00
Jacek Sieka
ab864fc747
logging cleanups and small fixes ( #331 )
2020-08-15 21:50:31 +02:00
Dmitriy Ryajov
b76b3e0e9b
Rework pubsub ( #322 )
...
* move pubsub off of switch, pass switch into pubsub
* use join on lpstreams
* properly clean up failed peers
* fix tests
* fix peertable hasPeerId
* fix tests
* rework sending, remove helpers from pubsubpeer, unify in broadcast
* further split broadcast into send
* use send where appropriate
* use formatIt
* improve trace
Co-authored-by: Giovanni Petrantoni <giovanni@fragcolor.xyz>
2020-08-11 18:05:49 -06:00
Jacek Sieka
c6c0c152c0
Dial peerid ( #308 )
...
* prefer PeerID in switch api
This avoids ref issues like ref identity and nil
* use existing peerinfo instance if possible
* remove secureCodec
there may be multiple connections per peerinfo with different codecs
* avoid some extra async::
2020-08-06 09:29:27 +02:00
Giovanni Petrantoni
9bbe5e4841
Fix subclass calls to handleDisconnect ( #314 )
...
* Fix subclass calls to handleDisconnect
* add peer ID to nil peer debug message
2020-08-06 11:12:52 +09:00
Ștefan Talpalaru
843d32f8db
put expensive metrics under a Nim define ( #310 )
2020-08-04 17:27:59 -06:00
Dmitriy Ryajov
980764774e
pubsub timeouts tuning ( #295 )
...
* add finegrained timeouts to pubsub
* use 10 millis timeout in tests
* finalization
* revert timeouts
* use `atEof` for reads
* adjust timeouts and use atEof for reads
* use atEof for reads
* set isEof flag
* no backoff for pubsub streams
* temp timer increase, make macos finalize
* don't call `subscribePeer` in libp2p anymore
* more traces
* leak tests
* lower timeouts
* handle exceptions in control message
* don't use `cancelAndWait`
* handle exceptions in helpers
* wip
* don't send empty messages
* check for leaks properly
* don't use cancelAndWait
* don't await subscription sends
* remove subscribePeer calls from switch
* trying without the hooks again
2020-08-02 23:20:11 -06:00
Jacek Sieka
e655a510cd
misc cleanups ( #303 )
2020-08-02 12:22:49 +02:00
Dmitriy Ryajov
f7fdf31365
Pubsub lifetime ( #284 )
...
* lifecycle hooks
* tests
* move trace after closed check
* restore 1 second heartbeat
* await close event
* fix tests
* print direction string
* more trace logging
* add pubsub monitor
* add log scope
* adjust idle timeout
* add exc.msg to trace
2020-07-27 13:33:51 -06:00
Giovanni Petrantoni
3b088f8980
Fix some unsubscribe issues and add unsubscribeAll helper ( #282 )
...
* Fix some unsub issues and add unsuball helper
* batch sendprune in unsubscribe methods
* add unsubscribeAll for floodsub
2020-07-20 10:16:13 -06:00
Dmitriy Ryajov
94196fee71
Connections and pubsub peers cleanup ( #279 )
...
* better peer tracking and cleanup
* check if peer and conn is nil
* test name
* make timeout more aggressive
* rename method for better clarity
2020-07-17 13:46:24 -06:00
Dmitriy Ryajov
0348773ec9
Connection manager ( #277 )
...
* splitting out connection management
* wip
* wip conn mngr tests
* set peerinfo in constructor
* comments and documentation
* tests
* wip
* add `None` to detect untagged connections
* use `PeerID` to index connections
* fix tests
* remove useless equality
2020-07-17 09:36:48 -06:00
Jacek Sieka
170685f9c6
gossipsub fixes ( #276 )
...
* graft up to D peers
* fix logging so it's clear who is grafting/pruning who
* clear fanout when grafting
2020-07-16 21:26:57 +02:00
Jacek Sieka
c76152f2c1
Simplify send ( #271 )
...
* PubSubPeer.send single message
* gossipsub: simplify send further
2020-07-16 12:06:57 +02:00
Dmitriy Ryajov
f35b8999b3
some light cleanup for pub/gossip sub ( #273 )
...
* move peer table out to its own file
* move peer table
* cleanup `==` and add one to peerinfo
* add peertable
* missed equality check
2020-07-15 13:18:55 -06:00
Giovanni Petrantoni
d7bab37119
Fix gossip messages seqno according to spec ( #253 )
...
* Fix gossip messages seqno according to spec
* Add peers back to gossipsub table, slow down heartbeat
* Revert "Add peers back to gossipsub table, slow down heartbeat"
This reverts commit 01e2e62172.
* make seqno a threadvar, remove from peerinfo
* seqno refactor, into pubsub
2020-07-14 21:51:33 -06:00
Jacek Sieka
87e58c1c8d
metrics: one more pubsub peers fix
2020-07-13 16:16:46 +02:00
Jacek Sieka
c7895ccc52
metrics: fix pubsub_peers add metric
2020-07-13 16:15:27 +02:00
Giovanni Petrantoni
fcda0f6ce1
PubSubPeer tables refactor ( #263 )
...
* refactor peer tables
* tests fixing
* override PubSubPeer equality
* fix pubsubpeer comparison
2020-07-13 15:32:38 +02:00
Dmitriy Ryajov
4c815d75e7
More gossip cleanup ( #257 )
...
* more cleanup
* correct pubsub peer count
* close the stream first
* handle cancelation
* fix tests
* fix fanout ttl
* merging master
* remove `withLock` as it conflicts with stdlib
* fix trace build
Co-authored-by: Giovanni Petrantoni <giovanni@fragcolor.xyz>
2020-07-09 14:21:47 -06:00
Giovanni Petrantoni
f9e0a1f069
CI fix handleDisconnect (pubsub)
2020-07-09 13:56:59 +09:00
Giovanni Petrantoni
9b8b159abb
Remove other spurious getStacktrace in pubsub traces
2020-07-09 13:19:34 +09:00
Giovanni Petrantoni
4698f41a91
Remove stacktrace logging from pubsub connect
2020-07-09 12:23:03 +09:00
Dmitriy Ryajov
a52763cc6d
fix publishing ( #250 )
...
* use var semantics to optimize table access
* wip... lvalues don't work properly sadly...
* big publish refactor, replenish and balance
* fix internal tests
* use g.peers for fanout (todo: don't include flood peers)
* exclude non gossip from fanout
* internal test fixes
* fix flood tests
* fix test's trypublish
* test interop fixes
* make sure to not remove peers from gossip table
* restore old replenishFanout
* cleanups
* Cleanup resources (#246 )
* consolidate reading in lpstream
* remove debug echo
* tune log level
* add channel cleanup and cancelation handling
* cancelation handling
* cancelation handling
* cancelation handling
* cancelation handling
* cleanup and cancelation handling
* cancelation handling
* cancelation
* tests
* rename isConnected to connected
* remove testing trace
* comment out debug stacktraces
* explicit raises
* restore trace vs debug in gossip
* improve fanout replenish behavior further
* cleanup stale peers more eagerly
* synchronize connection cleanup and small refactor
* close client first and call parent second
* disconnect failed peers on publish
* check for publish result
* fix tests
* fix tests
* always call close
Co-authored-by: Giovanni Petrantoni <giovanni@fragcolor.xyz>
2020-07-07 18:33:05 -06:00