Commit Graph

43 Commits

Author SHA1 Message Date
Jacek Sieka e285d8bbf4
mem usage cleanups for pubsub (#564)
In `async` functions, a closure environment is created for variables
that cross an await boundary - this closure environment is kept in
memory for the lifetime of the associated future - this means that
although _some_ variables are no longer used, they still take up memory
for a long time.

In Nimbus, message validation is processed in batches meaning the future
of an incoming gossip message stays around for quite a while - this
leads to memory consumption peaks of 100-200 mb when there are many
attestations in the pipeline.

To avoid excessive memory usage, it's generally better to move non-async
code into proc's such that the variables therein can be released earlier
- this includes the many hidden variables introduced by macro and
template expansion (ie chronicles that does expensive exception
handling)

* move seen table salt to floodsub, use there as well
* shorten seen table salt to size of hash
* avoid unnecessary memory allocations and copies in a few places
* factor out message scoring
* avoid reencoding outgoing message for every peer
* keep checking validators until reject (in case there's both reject and
ignore)
* `readOnce` avoids `readExactly` overhead for single-byte read
* genericAssign -> assign2
2021-04-18 10:08:33 +02:00
Giovanni Petrantoni b902c030a0
add metrics into chronosstream to identify peers agents (#458)
* add metrics into chronosstream to identify peers agents

* avoid too many agent strings

* use gauge instead of counter for stream metrics

* filter identity on /

* also track bytes traffic

* fix identity tracking closeimpl call

* add gossip rpc metrics

* fix missing metrics inclusions

* metrics fixes and additions

* add a KnownLibP2PAgents strdefine

* enforse toLowerAscii to agent names (metrics)

* incoming rpc metrics

* fix silly mistake in rpc metrics

* fix agent metrics logic

* libp2p_gossipsub_failed_publish metric

* message ids metrics

* libp2p_pubsub_broadcast_ihave metric improvement

* refactor expensive gossip metrics

* more detailed metrics

* metrics improvements

* remove generic metrics for `set` users

* small fixes, add debug counters

* fix counter and add missing subs metrics!

* agent metrics behind -d:libp2p_agents_metrics

* remove testing related code from this PR

* small rebroadcast metric fix

* fix small mistake

* add some guide to the readme in order to use new metrics

* add libp2p_gossipsub_peers_scores metric

* add protobuf metrics to understand bytes traffic precisely

* refactor gossipsub metrics

* remove unused variable

* add more metrics, refactor rebalance metrics

* avoid bad metric concurrent states

* use a stack structure for gossip mesh metrics

* refine sub metrics

* add received subs metrics fixes

* measure handlers of known topics

* sub/unsub counter

* unsubscribeAll log unknown topics

* expose a way to specify known topics at runtime
2021-01-08 14:21:24 +09:00
Giovanni Petrantoni 462da1f7a8
gossip MessageID as seq[byte] (#391)
* gossip MessageID as seq[byte]

* combina hashes in defaultMsgIdProvider

* wip

* fix defaultMsgIdProvider
2020-10-21 12:26:04 +09:00
Giovanni Petrantoni e3bdb9eb13
decode properly ControlPrune (#392) 2020-10-09 09:12:38 +09:00
Giovanni Petrantoni 98d0cc3a16
defaultMsgIdProvider alternative/test anonymize (#379)
* defaultMsgIdProvider alternative/test anonymize

* avoid freeze during flood tests

* avoid `empty message, skipping` situation

* test observers

* avoid double initPubSub

* fix gossip testing (specially when anonymize is on)

* make azure tests shorter
2020-09-28 09:11:18 +02:00
Jacek Sieka 8ecef46738
reencode gossipsub messages with anonymization (#378)
This helps protect against clients sending more data than they should
and thus getting penalized on topics that require anonymity
2020-09-25 18:39:34 +02:00
Jacek Sieka 17e00e642a
limit write queue length (#376)
To break a potential read/write deadlock, gossipsub uses an unbounded
queue for writes - when peers are too slow to process this queue, it may
end up growing without bounds causing high memory usage.

Here, we introduce a maximum write queue length after which the peer is
disconnected - the queue is generous enough that any "normal" usage
should be fine - writes that are `await`:ed are not affected, only
writes that are launched in an `asyncSpawn` task or similar.

* avoid unnecessary copy of message when there are no send observers
* release message memory earlier in gossipsub
* simplify pubsubpeer logging
2020-09-24 18:43:20 +02:00
Giovanni Petrantoni ec322124ac
allow to omit peerId and seqno (#372)
* allow to omit peerId and seqno

* small refactor

* wip

* fix message encoding

* improve rpc signature logic

* remove peerid from verify

* trace fixes

* fix message test

* fix gossip 1.0 tests
2020-09-23 17:56:33 +02:00
Giovanni Petrantoni b99d2039a8
Gossip one one (#240)
* allow multiple codecs per protocol (without breaking things)

* add 1.1 protocol to gossip

* explicit peering part 1

* explicit peering part 2

* explicit peering part 3

* PeerInfo and ControlPrune protocols

* fix encodePrune

* validated always, even explicit peers

* prune by score (score is stub still)

* add a way to pass parameters to gossip

* standard setup fixes

* take into account explicit direct peers in publish

* add floodPublish logic

* small fixes, publish still half broken

* make sure to waitsub in sparse test

* use var semantics to optimize table access

* wip... lvalues don't work properly sadly...

* big publish refactor, replenish and balance

* fix internal tests

* use g.peers for fanout (todo: don't include flood peers)

* exclude non gossip from fanout

* internal test fixes

* fix flood tests

* fix test's trypublish

* test interop fixes

* make sure to not remove peers from gossip table

* restore old replenishFanout

* cleanups

* restore utility module import

* restore trace vs debug in gossip

* improve fanout replenish behavior further

* triage publish nil peers (issue is on master too but just hidden behind a if/in)

* getGossipPeers fixes

* remove topics from pubsubpeer (was unused)

* simplify rebalanceMesh (following spec) and make it finally reach D_high

* better diagnostics

* merge new pubsubpeer, copy 1.1 to new module

* fix up merge

* conditional enable gossip11 module

* add back topics in peers, re-enable flood publish

* add more heartbeat locking to prevent races

* actually lock the heartbeat

* minor fixes

* with sugar

* merge 1.0

* remove assertion in publish

* fix multistream 1.1 multi proto

* Fix merge oops

* wip

* fix gossip 11 upstream

* gossipsub11 -> gossipsub

* support interop testing

* tests fixing

* fix directchat build

* control prune updates (pb)

* wip parameters

* gossip internal tests fixes

* parameters wip

* finishup with params

* cleanups/wip

* small sugar

* grafted and pruned procs

* wip updateScores

* wip

* fix logging issue

* pubsubpeer, chronicles explicit override

* fix internal gossip tests

* wip

* tables troubleshooting

* score wip

* score wip

* fixes

* fix test utils generateNodes

* don't delete while iterating in score update

* fix grafted defect

* add a handleConnect in subscribeTopic

* pruning improvements

* wip

* score fixes

* post merge - builds gossip tests

* further merge fixes

* rebalance improvements and opportunistic grafting

* fix test for now

* restore explicit peering

* implement peer exchange graft message

* add an hard cap to PX

* backoff time management

* IWANT cap/budget

* Adaptive gossip dissemination

* outbound mesh quota, internal tests fixing

* oversub prune score based, finish outbound quota

* finishup with score and ihave budget

* use go daemon 0.3.0

* import fixes

* byScore cleanup score sorting

* remove pointless scaling in `/` Duration operator

* revert using libp2p org for daemon

* interop fixes

* fixes and cleanup

* remove heartbeat assertion, minor debug fixes

* logging improvements and cleaning up

* (to revert) add some traces

* add explicit topic to gossip rpcs

* pubsub merge fixes and type fix in switch

* Revert "(to revert) add some traces"

This reverts commit 4663eaab6cc336c81cee50bc54025cf0b7bcbd99.

* cleanup some now irrelevant todo

* shuffle peers anyway as score might be disabled

* add missing shuffle

* old merge fix

* more merge fixes

* debug improvements

* re-enable gossip internal tests

* add gossip10 fallback (dormant but tested)

* split gossipsub internal tests into 1.0 and 1.1

Co-authored-by: Dmitriy Ryajov <dryajov@gmail.com>
2020-09-21 11:16:29 +02:00
Giovanni Petrantoni a6007be428
avoid sending empty seqno and/or fromPeer (gossip rpc) (#364) 2020-09-15 12:33:18 +02:00
Oskar Thorén 5e66f6fbd8
Add logScope to connmanager and pubsubprotobuf (#363) 2020-09-15 08:03:53 +02:00
Jacek Sieka c1856fda53
simplify and unify logging (#353)
* use short format for logging peerid
* log peerid:oid for connections
2020-09-06 10:31:47 +02:00
Jacek Sieka 16a008db75
fix connection event order when connection dies early (#351)
if the connection is already closed (because the remote closes during
identfiy for example), an exception would be raised which would leave
the connection in limbo, beacuse it would not go through the rest of
internalConnect.

Also, if the connection is already closed, the disconnect event would be
scheduled before the connect event :/
2020-09-04 20:30:26 +02:00
Jacek Sieka 6d91d61844
small cleanups & docs (#347)
* simplify gossipsub heartbeat start / stop
* avoid alloc in peerid check
* stop iterating over seq after unsubscribing item (could crash)
* don't crash on missing private key with enabled sigs (shouldn't happen
but...)
2020-09-04 18:31:43 +02:00
Jacek Sieka cd1c68dbc5
avoid send deadlock by not allowing send to block (#342)
* avoid send deadlock by not allowing send to block

* handle message issues more consistently
2020-09-01 09:33:03 +02:00
Dmitriy Ryajov b76b3e0e9b
Rework pubsub (#322)
* move pubsub of off switch, pass switch into pubsub

* use join on lpstreams

* properly cleanup up failed peers

* fix tests

* fix peertable hasPeerId

* fix tests

* rework sending, remove helpers from pubsubpeer, unify in broadcast

* further split broadcast into send

* use send where appropriate

* use formatIt

* improve trace

Co-authored-by: Giovanni Petrantoni <giovanni@fragcolor.xyz>
2020-08-11 18:05:49 -06:00
Jacek Sieka e655a510cd
misc cleanups (#303) 2020-08-02 12:22:49 +02:00
Jacek Sieka c76152f2c1
Simplify send (#271)
* PubSubPeer.send single message

* gossipsub: simplify send further
2020-07-16 12:06:57 +02:00
Eugene Kabanov b832668768
Minprotobuf refactoring 2 (#269)
* Protobuf refactoring stage II.

* Remove NoError.

* Change trace level for invalid message.
2020-07-15 10:25:39 +02:00
Giovanni Petrantoni d7bab37119
Fix gossip messages seqno according to spec (#253)
* Fix gossip messages seqno according to spec

* Add peers back to gossipsub table, slow down heartbeat

* Revert "Add peers back to gossipsub table, slow down heartbeat"

This reverts commit 01e2e62172a7793bb17f0eb8314e2faeb2682173.

* make seqno a threadvar, remove from peerinfo

* seqno refactor, into pubsub
2020-07-14 21:51:33 -06:00
Eugene Kabanov efb952f18b
[WIP] Minprotobuf refactoring (#259)
* Minprotobuf initial commit

* Fix noise.

* Add signed integers support.
Add checks for field number value.
Remove some casts.

* Fix compile errors.

* Fix comments and constants.
2020-07-13 14:43:07 +02:00
Dmitriy Ryajov bec9a0658f
Cleanup rpc handler (#261)
* more cleanup

* fix tests

* merging master

* remove `withLock` as it conflicts with stdlib

* wip

* more fanout ttl

Co-authored-by: Giovanni Petrantoni <giovanni@fragcolor.xyz>
2020-07-09 17:54:16 -06:00
Giovanni Petrantoni ec00c7fc50
Peer resultification and defect only (#245)
* Peer resultification and defect only

* Fixing some tests

* test fixes

* Rename peer into peerid

* better result error message in identify

* further merge fixes
2020-07-01 08:25:09 +02:00
Jacek Sieka aa6756dfe0
allow message id provider to be specified (#243)
* don't send public key in message when not signing (information leak)
* don't run rebalance if there are peers in gossip (see #242)
* don't crash randomly on bad peer id from remote
2020-06-28 09:56:38 -06:00
Viktor Kirilov 1afec627c2
proper name for topics so that we can filter dynamically using chronicles (#210)
* proper name for topics so that we can filter dynamically using chronicles

* lowercase
2020-06-10 10:48:01 +02:00
Giovanni Petrantoni 82b4ed8f44 use declareCounter rather then gauge for certain metrics 2020-06-07 16:41:23 +09:00
Giovanni Petrantoni a6a2a81711
Start adding some metrics to pubsub (#192)
* Start adding some metrics to pubsub

In order to visualize it's functionality
Still WIP

* more metrics

* add per topic metrics

* finishup with requested metrics

* add a metrisServer define to start local server

* PR fixes and cleanup
2020-06-07 09:15:21 +02:00
Dmitriy Ryajov 5960d42c50
remove casts from (#203) 2020-06-02 20:21:11 -06:00
Dmitriy Ryajov 93e5805c01 better exception handling 2020-06-02 09:10:27 -06:00
Dmitriy Ryajov 7b6e1c0688
Gossipsub interop (#189)
* interop fixes

* add custom messageid provider and fix seqno

* use ECDSA for speed

* adding messageid tests

* breakout from publish loop

* addressing review comments

* remove unneded var

* dont stop broadcasting on failed peers
2020-05-27 12:33:49 -06:00
Dmitriy Ryajov 9132f16927
gossipsub fixes (#186) 2020-05-21 14:24:20 -06:00
Giovanni Petrantoni 7dcb807f64
Crypto utilities resultification (#150) 2020-05-18 07:25:55 +02:00
Jacek Sieka 330da51819
removals (#159)
* remove unused stream methods
* reimplement some of them with proc's
* remove broken tests
* Error->Defect for defect
* warning fixes
2020-05-06 18:31:47 +02:00
Dmitriy Ryajov 6da4d2af48
Pubsub signatures flags (#161)
* add verify signature flag

* add sign flag to enable/disable msg signing

* moving internal tests out to their own file

* cleanup nimble file

* remove unneeded tests

* move pubsub tests out

* fix tests
2020-05-06 11:26:08 +02:00
Giovanni Petrantoni 4c6a123d31
Add chronos trackers and used them to sanitize resource disposal (#131)
* Add chronos trackers and used them to sanitize resource disposal

* Chronos trackers for transport tests wip

* No more chronos leaks in testtransport

* Make tcp transport and test more robust when closing

* Test async leaking tracking wip

* Fix a regression in wire connect

* Add chronos trackers to more tests and sanitize resource closure

* Wip fixing floodsub tests

* Floodsub wip

* Made floodsub basically deterministic, hit a nim bug with captures tho

* Wrap up floodsub tests refactor

* Wrapping up

* Add allFuturesThrowing utility

* Fix missing allFuturesThrowing in noise tests!

* Make tests green

* attempt fixing gossipsub failing cases

* Make sure to check also fanout in waitSub

* More verbose traces

* Gossipsub test improvments

* Refactor TcpTransport remove asyncCheck

* Add Connection trackers

* Add stricter connection tracking, wip mplex fix

* More asynccheck removal, in order to avoid connection leaks

* bump chronicles requirement

* Enable tracker dump to check CI output

* Wait for more futures in testmplex

* Remove tracker dump messages

* add tryAndWarn utility, fix mplex issue with go interop

* All allFuturesThrowing to directchat too

* make sure to cleanup on transport close
2020-04-21 10:24:42 +09:00
Giovanni Petrantoni 0a3e4a764b
Less verbose traces (#112)
* Make traces less verbose with shortHexDump utility

* Rename shortHexDump into shortLog

* Improve shortLog, add shortLog for crypto keys

* Add proper shortLog implementations in messages
2020-03-23 15:03:36 +09:00
Eugene Kabanov 5701d937c8
Signed variable integers fixes. (#96)
* Fix signed varints.
Add tests for signed varints.
Remove some casts to allow usage at compile time.

* Fix vsizeof() on 32bit platforms.

* Add `hint` and `zint` types for proper signed integer encoding.

* Fix varint related bugs.

* Update requirements.

* Fix interop tests because of fixed readLine.

* Add putVarint, getVarint and tests.
2020-03-06 20:19:43 +01:00
Zahary Karadjov 7bd305471c
Make sure the library can compile with json logging in trace mode 2020-02-04 15:17:39 +01:00
Dmitriy Ryajov 68cc57669e
Feat/pubsub validators (#58)
* feat: adding validator hooks to pubsub

* expose add/remove validators on switch

* do less unnecessary copyng
2019-12-16 23:24:03 -06:00
Dmitriy Ryajov 293a219dbe
Cleanup (#55)
* fix: don't allow replacing pubkey

* fix: several small improvements

* removing pubkey setter

* improove error handling

* remove the use of Option[T] if not needed

* don't use optional

* fix-ci: temporarily pin p2pd to a working tag

* fix example to comply with latest changes

* bumping p2pd again to a higher version
2019-12-10 14:50:35 -06:00
Zahary Karadjov 454f658ba8
Fixes and tweaks related to the beacon node integration
* Bugfix: Dialing an already connected peer may lead to crash

* Introduced a standard_setup module allowing to instantiate
  the `Switch` object in an easier manner.

* Added `Switch.disconnect(peer)`

* Trailing space removed (sorry about polluting the diff)
2019-12-08 23:58:43 +02:00
Dmitriy Ryajov 5f6fcc3d90
extract public and private keys fields from peerid (#44)
* extract public and private keys fields from peerid

* allow assigning a public key

* cleaned up TODOs

* make pubsub prefix a const

* public key should be an `Option`
2019-12-07 10:36:39 -06:00
Dmitriy Ryajov e623e70e7b
PubSub (Gossip & Flood) Implementation (#36)
This adds gossipsub and floodsub, as well as basic interop testing with the go libp2p daemon. 

* add close event

* wip: gossipsub

* splitting rpc message

* making message handling more consistent

* initial gossipsub implementation

* feat: nim 1.0 cleanup

* wip: gossipsub protobuf

* adding encoding/decoding of gossipsub messages

* add disconnect handler

* add proper gossipsub msg handling

* misc: cleanup for nim 1.0

* splitting floodsub and gossipsub tests

* feat: add mesh rebalansing

* test pubsub

* add mesh rebalansing tests

* testing mesh maintenance

* finishing mcache implementatin

* wip: commenting out broken tests

* wip: don't run heartbeat for now

* switchout debug for trace logging

* testing gossip peer selection algorithm

* test stream piping

* more work around message amplification

* get the peerid from message

* use timed cache as backing store

* allow setting timeout in constructor

* several changes to improve performance

* more through testing of msg amplification

* prevent gc issues

* allow piping to self and prevent deadlocks

* improove floodsub

* allow running hook on cache eviction

* prevent race conditions

* prevent race conditions and improove tests

* use hashes as cache keys

* removing useless file

* don't create a new seq

* re-enable pubsub tests

* fix imports

* reduce number of runs to speed up tests

* break out control message processing

* normalize sleeps between steps

* implement proper transport filtering

* initial interop testing

* clean up floodsub publish logic

* allow dialing without a protocol

* adding multiple reads/writes

* use protobuf varint in mplex

* don't loose conn's peerInfo

* initial interop pubsub tests

* don't duplicate connections/peers

* bring back interop tests

* wip: interop

* re-enable interop and daemon tests

* add multiple read write tests from handlers

* don't cleanup channel prematurely

* use correct channel to send/receive msgs

* adjust tests with latest changes

* include interop tests

* remove temp logging output

* fix ci

* use correct public key serialization

* additional tests for pubsub interop
2019-12-05 20:16:18 -06:00