Commit Graph

310 Commits

Author SHA1 Message Date
Jacek Sieka 83a20a992a
gossipsub: unsubscribe fixes (#569)
* gossipsub: unsubscribe fixes

* fix KeyError when updating metric of unsubscribed topic
* fix unsubscribe message not being sent to all peers causing them to
keep thinking we're still subscribed
* release memory earlier in a few places

* floodsub fix
2021-05-07 00:43:45 +02:00
Giovanni Petrantoni 9f301964ed
fix control messages (#566)
* remove unused control graft check in handleControl

* avoid sending empty Iwant messages
2021-04-28 10:03:03 +09:00
Giovanni Petrantoni f81a085d0b
More gossip coverage (#553)
* add floodPublish test

* test delivery via control Iwant/have mechanics

* fix issues in control, and add testing

* fix possible backoff issue with pruned routine overriding it
2021-04-22 18:51:22 +09:00
Jacek Sieka e285d8bbf4
mem usage cleanups for pubsub (#564)
In `async` functions, a closure environment is created for variables
that cross an await boundary - this closure environment is kept in
memory for the lifetime of the associated future - this means that
although _some_ variables are no longer used, they still take up memory
for a long time.

In Nimbus, message validation is processed in batches meaning the future
of an incoming gossip message stays around for quite a while - this
leads to memory consumption peaks of 100-200 mb when there are many
attestations in the pipeline.

To avoid excessive memory usage, it's generally better to move non-async
code into proc's such that the variables therein can be released earlier
- this includes the many hidden variables introduced by macro and
template expansion (ie chronicles that does expensive exception
handling)

* move seen table salt to floodsub, use there as well
* shorten seen table salt to size of hash
* avoid unnecessary memory allocations and copies in a few places
* factor out message scoring
* avoid reencoding outgoing message for every peer
* keep checking validators until reject (in case there's both reject and
ignore)
* `readOnce` avoids `readExactly` overhead for single-byte read
* genericAssign -> assign2
2021-04-18 10:08:33 +02:00
Dmitriy Ryajov 290866dd62
builders (#559) 2021-04-05 16:06:45 -06:00
Giovanni Petrantoni 4760df1e31 fix build with libp2p_agents_metrics switch 2021-03-15 01:42:47 +00:00
Jacek Sieka 70deac9e0d
fix peer score accumulation (#541)
* fix accumulating peer score
* fix missing exception handling
* remove unnecessary initHashSet/initTable calls
* simplify peer stats management
* clean up tests a little
* fix some missing raises annotations
2021-03-09 13:22:52 +01:00
Giovanni Petrantoni 269d3df351
Consolidate metrics collection for mesh (#540)
* Consolidate metrics collection for mesh

* more fixes

* wrapping up

* Update libp2p/protocols/pubsub/gossipsub/behavior.nim

Co-authored-by: Jacek Sieka <jacek@status.im>

* Update libp2p/protocols/pubsub/gossipsub/behavior.nim

Co-authored-by: Jacek Sieka <jacek@status.im>

* Update libp2p/protocols/pubsub/gossipsub/behavior.nim

Co-authored-by: Jacek Sieka <jacek@status.im>

Co-authored-by: Jacek Sieka <jacek@status.im>
2021-03-03 22:11:21 +01:00
Giovanni Petrantoni 34c2fbeb16 small gossipsub metrics change 2021-03-03 08:41:21 +00:00
Giovanni Petrantoni 02ad017107
Gossipsub fixes and Initiator flagging fixes (#539)
* properly propagate initiator information for gossipsub

* Fix pubsubpeer lifetime management

* restore old behavior

* tests fixing

* clamp backoff time value received

* fix member name collisions

* internal test fixes

* better names and explaining of the importance of transport direction

* fixes
2021-03-03 08:23:40 +09:00
Giovanni Petrantoni c1334c6d89 pubsubpeer better address management 2021-02-28 04:53:17 +00:00
Giovanni Petrantoni 7b2727d930 avoid leaking in peersInIP, don't depend on sendConn 2021-02-27 23:49:56 +09:00
Giovanni Petrantoni 67d0926e89 use in any case PeerID for peersInIP to avoid keeping references 2021-02-27 21:31:59 +09:00
Giovanni Petrantoni fae38e0146 fix PubSubPeer hashing issues 2021-02-26 19:19:15 +09:00
Giovanni Petrantoni 45300c28a9
[SEC] gossipsub - handleIHAVE/handleIWANT recommendations & notes (#535)
Fixes #400
2021-02-26 14:27:42 +09:00
Giovanni Petrantoni c1d8317e3c fix badly merged code in gossipsub.colocationFactor 2021-02-26 12:39:57 +09:00
Giovanni Petrantoni 922cd92f94 don't check if peers have `sendConn` when disconnecting for bad scoring 2021-02-22 10:04:02 +09:00
Giovanni Petrantoni 51d8cd4ade
[SEC] gossipsub - rebalanceMesh may prune up to D_lo on oversubscription (#531)
Fixes #403
2021-02-13 13:39:32 +09:00
Giovanni Petrantoni e124e342b0
n subscription limits (#528)
* subscription high water, cleanups

* subscription limits test

* newline
2021-02-12 12:27:26 +09:00
Giovanni Petrantoni fff54fa23c add more diagnostics when gossip publish fails 2021-02-09 18:42:59 +09:00
Giovanni Petrantoni 646557597d lower some gossipsub logging to debug level 2021-02-08 10:11:41 +09:00
Giovanni Petrantoni fd73cf9f9d
Refactor gossipsub into multiple modules (#515)
* Refactor gossipsub into multiple modules

* splitup further gossipsub

* move more mesh related stuff to behavior

* fix internal tests

* fix PubSubPeer.outbound flag, make it more reliable

* use discard rather then _
2021-02-06 09:13:04 +09:00
Giovanni Petrantoni 5aebf0990e
peer stats fixes (#511)
Gossipsub fix, required by nimbus, merging into master as low impact
2021-01-29 12:41:51 +09:00
Giovanni Petrantoni 1d77d37f17
Gossipsub scoring fixes (#508)
* Fix some problematics when running with full scoring

* more fixes
2021-01-25 21:13:42 +09:00
Dmitriy Ryajov 64b822e8f0
remove blank spaces 2021-01-18 15:32:42 -06:00
Giovanni Petrantoni b57101f265 add an invalid topic subscriptions metric 2021-01-15 18:55:54 +09:00
Giovanni Petrantoni 1fb783eb7f let apps decide if they want to penalize peers on invalid topic 2021-01-15 18:50:42 +09:00
Giovanni Petrantoni 6542b913df set "ignoring invalid topic subscription" to trace level 2021-01-15 18:48:58 +09:00
Giovanni Petrantoni 240ec84ffb
Gossipsub wip (#502)
* Remove unused connections in pubsubpeer, also removed wrong usages, add a disconnect bad peers parameter

* handle exceptions in disconnectPeer

* small fix

* use the proper disconnection procedure for gossip peers

* fixes, more metrics add test about disconnection

* hot fix possible null pointers in switch

* silly isnil sugar

* Fix and test gossip directPeer connections
2021-01-15 13:48:03 +09:00
Giovanni Petrantoni dc48170b0d
Gossip subscription improvements (#497)
* salt ids in seen table

* add subscription validation callback and avoid processing topics we don't care of

* apply penalty on bad subscription

* fix IHave handling IDs

* reduce indenting, add some comments

* fix gossip randombytes generation

* do not descore unwanted topics (might happen, due to timing, needs improvements)

* cleaning up and added tests

* validate subscriptions only when subscribing

* set notice level for failed publish

* fix floodsub behavior
2021-01-13 23:49:44 +09:00
Giovanni Petrantoni b902c030a0
add metrics into chronosstream to identify peers agents (#458)
* add metrics into chronosstream to identify peers agents

* avoid too many agent strings

* use gauge instead of counter for stream metrics

* filter identity on /

* also track bytes traffic

* fix identity tracking closeimpl call

* add gossip rpc metrics

* fix missing metrics inclusions

* metrics fixes and additions

* add a KnownLibP2PAgents strdefine

* enforse toLowerAscii to agent names (metrics)

* incoming rpc metrics

* fix silly mistake in rpc metrics

* fix agent metrics logic

* libp2p_gossipsub_failed_publish metric

* message ids metrics

* libp2p_pubsub_broadcast_ihave metric improvement

* refactor expensive gossip metrics

* more detailed metrics

* metrics improvements

* remove generic metrics for `set` users

* small fixes, add debug counters

* fix counter and add missing subs metrics!

* agent metrics behind -d:libp2p_agents_metrics

* remove testing related code from this PR

* small rebroadcast metric fix

* fix small mistake

* add some guide to the readme in order to use new metrics

* add libp2p_gossipsub_peers_scores metric

* add protobuf metrics to understand bytes traffic precisely

* refactor gossipsub metrics

* remove unused variable

* add more metrics, refactor rebalance metrics

* avoid bad metric concurrent states

* use a stack structure for gossip mesh metrics

* refine sub metrics

* add received subs metrics fixes

* measure handlers of known topics

* sub/unsub counter

* unsubscribeAll log unknown topics

* expose a way to specify known topics at runtime
2021-01-08 14:21:24 +09:00
Dmitriy Ryajov b2ea5a3c77
Concurrent upgrades (#489)
* adding an upgraded event to conn

* set stopped flag asap

* trigger upgradded event on conn

* set concurrency limit for accepts

* backporting semaphore from tcp-limits2

* export unittests module

* make params explicit

* tone down debug logs

* adding semaphore tests

* use semaphore to throttle concurent upgrades

* add libp2p scope

* trigger upgraded event before any other events

* add event handler for connection upgrade

* cleanup upgraded event on conn close

* make upgrades slot release rebust

* dont forget to release slot on nil connection

* misc

* make sure semaphore is always released

* minor improvements and a nil check

* removing unneeded comment

* make upgradeMonitor a non-closure proc

* make sure the `upgraded` event is initialized

* handle exceptions in accepts when stopping

* don't leak exceptions when stopping accept loops
2021-01-04 12:59:05 -06:00
Giovanni Petrantoni 5e79d3ab9c avoid triggering unsubscribeAll from unsubscribe (gossip) 2020-12-20 17:08:03 +09:00
Giovanni Petrantoni 4858e0ab15
Gossipsub refactor pt2 (#495)
* add sub/unsub test

* remove unused variable from gossip
2020-12-20 00:45:34 +09:00
Giovanni Petrantoni 05e789a34f
Gossipsub refactor (#490)
* refactor peerStats, re-enable scores for testing

* remove gossip 1.0

* cleanup

* codecov matrix fixes

* restore previous score on onNewPeer

* fix coverage n checks

* unsubscribeAll gossipsub fixes

* refactor unsub/sub

* refactor onNewPeer and fix score flow

* disable scores by default (change in tests later)

* fix tests, enable scores in tests

* fix wrongly merged test

* ensure topic removal from topics table

* small typo fix

* testinterop fixes
2020-12-19 15:43:32 +01:00
Giovanni Petrantoni 6c2e743ff3
Race condition in pubsub #469 (#471)
* Race condition in pubsub #469

* use allFinished

* improve cancellation handling
2020-12-19 00:56:46 +09:00
Jacek Sieka a1a5f9abac
nice msgid log formatting (#491) 2020-12-18 16:14:07 +01:00
Giovanni Petrantoni 52628d1a2e avoid using debug logging in gossip, just because 2020-12-17 17:21:09 +09:00
Jacek Sieka 0c331d5200
simplify noise frame construction (#478) 2020-12-16 13:10:06 +01:00
Jacek Sieka 9e5ba64c48
avoid creating prune message unless we're pruning (#487) 2020-12-15 22:46:03 +01:00
Giovanni Petrantoni 18d69a5c41
Workaround the gossipsub table race condition (#486) 2020-12-15 12:32:11 -06:00
Giovanni Petrantoni f8f0bc1bd8
Gossip improvements (#476)
* add more traces, remove async from rebalance

* more traces

* avoid computng scores when weight is 0.0

* debug colocation, fix an indent in unsubpeer (minor)

* add full ValidationResult coverage

* store in cache only after validation

* gossip 1.0 fixes

* fix typo

* gossip 10 internal test fixes

* test fixing

* refactor peerstats usages

* populate tables if missing when scoring
2020-12-15 10:25:22 +09:00
Jacek Sieka 6f1ecc8df7
streamline socket read/write hot path (#473)
* streamline socket read/write hot path

This avoids some unnecessary memory copying on the hot path of noise /
mplex, as well as getting rid of a few futures - profiling shows that
this is one of the main culprits of small memory allocations, which
makes sense - this is where gossip fan-out happens.

* fewer futures (and corresponding closures) when sending lpchannel
messages
* avoid data copies when encrypting and framing noise messages
* avoid copying tuple when reading noise data (poor c codegen)
* fix setting eof flag in secure read

* write noise frames in one go

...and closing secure socket once is enough
2020-12-09 08:56:40 -06:00
Jacek Sieka 1befeb8c2e
clean up peerid (#470)
* fix dangling cstring on error return
* remove some useless inlines
* less mallocs in shortlog
* proc -> func
* rename test
2020-12-03 13:53:16 -06:00
Dmitriy Ryajov d1c689e5ab
adding libp2p tag to logScope (#465) 2020-12-01 11:34:27 -06:00
Giovanni Petrantoni e1648d4404 fix mcache logic check in gossipsub 2020-12-01 23:55:51 +09:00
Giovanni Petrantoni b4738d723c
Some gossip fixes (#467)
* fix some missing rpc in rebalanceMesh

* clarify some variable names and lifetime

* further improvements
2020-12-01 11:44:09 +01:00
Dmitriy Ryajov 18443dafc1
rework peer event to take an initiator flag (#456)
* rework peer event to take an initiator flag

* use correct direction for initiator
2020-11-28 10:59:47 -06:00
Dmitriy Ryajov 3d44fcb8b3
use cancelAndAwait to mitigate further hangs (#459) 2020-11-28 09:48:06 -06:00
Giovanni Petrantoni 02b20440f2
Limit ihave emission (#462)
* add some limits to ihave emission, go has them, rust does not actually

* restore shuffling of IDs

* add some context
2020-11-28 16:27:39 +09:00