nim-libp2p

Commit Graph

Author	SHA1	Message	Date
Jacek Sieka	17e00e642a	limit write queue length (#376 ) To break a potential read/write deadlock, gossipsub uses an unbounded queue for writes - when peers are too slow to process this queue, it may end up growing without bounds causing high memory usage. Here, we introduce a maximum write queue length after which the peer is disconnected - the queue is generous enough that any "normal" usage should be fine - writes that are `await`:ed are not affected, only writes that are launched in an `asyncSpawn` task or similar. * avoid unnecessary copy of message when there are no send observers * release message memory earlier in gossipsub * simplify pubsubpeer logging	2020-09-24 18:43:20 +02:00
Jacek Sieka	471e5906f6	fix gossipsub memory leak on disconnected peer (#371 ) When messages can't be sent to peer, we try to establish a send connection - this causes messages to stack up as more and more unsent messages are blocked on the dial lock. * remove dial lock * run reconnection loop in background task	2020-09-22 09:05:53 +02:00
Giovanni Petrantoni	b99d2039a8	Gossip one one (#240 ) * allow multiple codecs per protocol (without breaking things) * add 1.1 protocol to gossip * explicit peering part 1 * explicit peering part 2 * explicit peering part 3 * PeerInfo and ControlPrune protocols * fix encodePrune * validated always, even explicit peers * prune by score (score is stub still) * add a way to pass parameters to gossip * standard setup fixes * take into account explicit direct peers in publish * add floodPublish logic * small fixes, publish still half broken * make sure to waitsub in sparse test * use var semantics to optimize table access * wip... lvalues don't work properly sadly... * big publish refactor, replenish and balance * fix internal tests * use g.peers for fanout (todo: don't include flood peers) * exclude non gossip from fanout * internal test fixes * fix flood tests * fix test's trypublish * test interop fixes * make sure to not remove peers from gossip table * restore old replenishFanout * cleanups * restore utility module import * restore trace vs debug in gossip * improve fanout replenish behavior further * triage publish nil peers (issue is on master too but just hidden behind a if/in) * getGossipPeers fixes * remove topics from pubsubpeer (was unused) * simplify rebalanceMesh (following spec) and make it finally reach D_high * better diagnostics * merge new pubsubpeer, copy 1.1 to new module * fix up merge * conditional enable gossip11 module * add back topics in peers, re-enable flood publish * add more heartbeat locking to prevent races * actually lock the heartbeat * minor fixes * with sugar * merge 1.0 * remove assertion in publish * fix multistream 1.1 multi proto * Fix merge oops * wip * fix gossip 11 upstream * gossipsub11 -> gossipsub * support interop testing * tests fixing * fix directchat build * control prune updates (pb) * wip parameters * gossip internal tests fixes * parameters wip * finishup with params * cleanups/wip * small sugar * grafted and pruned procs * wip updateScores * wip * fix logging issue * pubsubpeer, chronicles explicit override * fix internal gossip tests * wip * tables troubleshooting * score wip * score wip * fixes * fix test utils generateNodes * don't delete while iterating in score update * fix grafted defect * add a handleConnect in subscribeTopic * pruning improvements * wip * score fixes * post merge - builds gossip tests * further merge fixes * rebalance improvements and opportunistic grafting * fix test for now * restore explicit peering * implement peer exchange graft message * add an hard cap to PX * backoff time management * IWANT cap/budget * Adaptive gossip dissemination * outbound mesh quota, internal tests fixing * oversub prune score based, finish outbound quota * finishup with score and ihave budget * use go daemon 0.3.0 * import fixes * byScore cleanup score sorting * remove pointless scaling in `/` Duration operator * revert using libp2p org for daemon * interop fixes * fixes and cleanup * remove heartbeat assertion, minor debug fixes * logging improvements and cleaning up * (to revert) add some traces * add explicit topic to gossip rpcs * pubsub merge fixes and type fix in switch * Revert "(to revert) add some traces" This reverts commit `4663eaab6c`. * cleanup some now irrelevant todo * shuffle peers anyway as score might be disabled * add missing shuffle * old merge fix * more merge fixes * debug improvements * re-enable gossip internal tests * add gossip10 fallback (dormant but tested) * split gossipsub internal tests into 1.0 and 1.1 Co-authored-by: Dmitriy Ryajov <dryajov@gmail.com>	2020-09-21 11:16:29 +02:00
Jacek Sieka	96d4c44fec	refactor bufferstream to use a queue (#346 ) This change modifies how the backpressure algorithm in bufferstream works - in particular, instead of working byte-by-byte, it will now work seq-by-seq. When data arrives, it usually does so in packets - in the current bufferstream, the packet is read then split into bytes which are fed one by one to the bufferstream. On the reading side, the bytes are popped of the bufferstream, again byte by byte, to satisfy `readOnce` requests - this introduces a lot of synchronization traffic because the checks for full buffer and for async event handling must be done for every byte. In this PR, a queue of length 1 is used instead - this means there will at most exist one "packet" in `pushTo`, one in the queue and one in the slush buffer that is used to store incomplete reads. * avoid byte-by-byte copy to buffer, with synchronization in-between * reuse AsyncQueue synchronization logic instead of rolling own * avoid writeHandler callback - implement `write` method instead * simplify EOF signalling by only setting EOF flag in queue reader (and reset) * remove BufferStream pipes (unused) * fixes drainBuffer deadlock when drain is called from within read loop and thus blocks draining * fix lpchannel init order	2020-09-10 08:19:13 +02:00
Jacek Sieka	5b347adf58	logging fixes and small cleanups (#361 ) In particular, allow longer multistream select reads	2020-09-09 19:12:08 +02:00
Jacek Sieka	c1856fda53	simplify and unify logging (#353 ) * use short format for logging peerid * log peerid:oid for connections	2020-09-06 10:31:47 +02:00
Eugene Kabanov	0b85192119	Remove asyncCheck from codebase. (#345 ) * Remove asyncCheck from codebase. * Replace all `discard` statements with new `asyncSpawn`. * Bump `nim-chronos` requirement.	2020-09-04 18:30:45 +02:00
Jacek Sieka	5819c6a9a7	gossipsub / floodsub fixes (#348 ) * mcache fixes * remove timed cache - the window shifting already removes old messages * ref -> object * avoid unnecessary allocations with `[]` operator * simplify init * fix several gossipsub/floodsub issues * floodsub, gossipsub: don't rebroadcast messages that fail validation (!) * floodsub, gossipsub: don't crash when unsubscribing from unknown topics (!) * gossipsub: don't send message to peers that are not interested in the topic, when messages don't share topic list * floodsub: don't repeat all messages for each message when rebroadcasting * floodsub: allow sending empty data * floodsub: fix inefficient unsubscribe * sync floodsub/gossipsub logging * gossipsub: include incoming messages in mcache (!) * gossipsub: don't rebroadcast already-seen messages (!) * pubsubpeer: remove incoming/outgoing seen caches - these are already handled in gossipsub, floodsub and will cause trouble when peers try to resubscribe / regraft topics (because control messages will have same digest) * timedcache: reimplement without timers (fixes timer leaks and extreme inefficiency due to per-message closures, futures etc) * timedcache: ref -> obj	2020-09-04 08:10:32 +02:00
Jacek Sieka	cd1c68dbc5	avoid send deadlock by not allowing send to block (#342 ) * avoid send deadlock by not allowing send to block * handle message issues more consistently	2020-09-01 09:33:03 +02:00
Zahary Karadjov	af0955c58b	Add comments explaining a possible deadlock	2020-08-18 13:51:41 +03:00
Zahary Karadjov	60122a044c	Restore interop with Lighthouse by preventing concurrent meshsub dials	2020-08-17 22:40:58 +03:00
Jacek Sieka	53877e97bd	trace logs	2020-08-17 12:39:25 +02:00
Jacek Sieka	f46bf0faa4	remove send lock (#334 ) * remove send lock When mplex receives data it will block until a reader has processed the data. Thus, when a large message is received, such as a gossipsub subscription table, all of mplex will be blocked until all reading is finished. However, if at the same time a `dial` to establish a gossipsub send connection is ongoing, that `dial` will be blocked because mplex is no longer reading data - specifically, it might indeed be the connection that's processing the previous data that is waiting for a send connection. There are other problems with the current code: * If an exception is raised, it is not necessarily raised for the same connection as `p.sendConn`, so resetting `p.sendConn` in the exception handling is wrong * `p.isConnected` is checked before taking the lock - thus, if it returns false, a new dial will be started. If a new task enters `send` before dial is finished, it will also determine `p.isConnected` is false, then get stuck on the lock - when the previous task finishes and releases the lock, the new task will _also_ dial and thus reset `p.sendConn` causing a leak. * prefer existing connection simplifies flow	2020-08-17 12:38:27 +02:00
Dmitriy Ryajov	b76b3e0e9b	Rework pubsub (#322 ) * move pubsub of off switch, pass switch into pubsub * use join on lpstreams * properly cleanup up failed peers * fix tests * fix peertable hasPeerId * fix tests * rework sending, remove helpers from pubsubpeer, unify in broadcast * further split broadcast into send * use send where appropriate * use formatIt * improve trace Co-authored-by: Giovanni Petrantoni <giovanni@fragcolor.xyz>	2020-08-11 18:05:49 -06:00
zah	fbb59c3638	`msg` is a reserved property name in Chronicles (#321 ) Every Chronicles log record has an existing `msg` property matching the static string supplied in the log statement. Thus, it's currently not possible to use `msg` as the name of a user property: https://github.com/status-im/nim-chronicles/issues/86	2020-08-07 16:46:00 -06:00
Ștefan Talpalaru	843d32f8db	put expensive metrics under a Nim define (#310 )	2020-08-04 17:27:59 -06:00
Dmitriy Ryajov	b6877b8aac	increase send timeout for prune and graft msgs (#306 ) * increase send timeout for prune and graft msgs * use trace logs for subscribe monitor	2020-08-03 17:55:42 -06:00
Dmitriy Ryajov	980764774e	pubsub timeouts tuning (#295 ) * add finegrained timeouts to pubsub * use 10 millis timeout in tests * finalization * revert timeouts * use `atEof` for reads * adjust timeouts and use atEof for reads * use atEof for reads * set isEof flag * no backoff for pubsub streams * temp timer increase, make macos finalize * don't call `subscribePeer` in libp2p anymore * more traces * leak tests * lower timeouts * handle exceptions in control message * don't use `cancelAndWait` * handle exceptions in helpers * wip * don't send empty messages * check for leaks properly * don't use cancelAndWait * don't await subscribption sends * remove subscrivePeer calls from switch * trying without the hooks again	2020-08-02 23:20:11 -06:00
Dmitriy Ryajov	94196fee71	Connections and pubsub peers cleanup (#279 ) * better peer tracking and cleanup * check if peer and conn is nil * test name * make timeout more aggressive * rename method for better clarity	2020-07-17 13:46:24 -06:00
Dmitriy Ryajov	0348773ec9	Connection manager (#277 ) * splitting out connection management * wip * wip conn mngr tests * set peerinfo in constructor * comments and documentation * tests * wip * add `None` to detect untagged connections * use `PeerID` to index connections * fix tests * remove useless equality	2020-07-17 09:36:48 -06:00
Jacek Sieka	170685f9c6	gossipsub fixes (#276 ) * graft up to D peers * fix logging so it's clear who is grafting/pruning who * clear fanout when grafting	2020-07-16 21:26:57 +02:00
Jacek Sieka	c76152f2c1	Simplify send (#271 ) * PubSubPeer.send single message * gossipsub: simplify send further	2020-07-16 12:06:57 +02:00
Dmitriy Ryajov	f35b8999b3	some light cleanup for pub/gossip sub (#273 ) * move peer table out to its own file * move peer table * cleanup `==` and add one to peerinfo * add peertable * missed equality check	2020-07-15 13:18:55 -06:00
Eugene Kabanov	b832668768	Minprotobuf refactoring 2 (#269 ) * Protobuf refactoring stage II. * Remove NoError. * Change trace level for invalid message.	2020-07-15 10:25:39 +02:00
Giovanni Petrantoni	d7bab37119	Fix gossip messages seqno according to spec (#253 ) * Fix gossip messages seqno according to spec * Add peers back to gossipsub table, slow down heartbeat * Revert "Add peers back to gossipsub table, slow down heartbeat" This reverts commit `01e2e62172`. * make seqno a threadvar, remove from peerinfo * seqno refactor, into pubsub	2020-07-14 21:51:33 -06:00
Giovanni Petrantoni	fcda0f6ce1	PubSubPeer tables refactor (#263 ) * refactor peer tables * tests fixing * override PubSubPeer equality * fix pubsubpeer comparison	2020-07-13 15:32:38 +02:00
Dmitriy Ryajov	4c815d75e7	More gossip cleanup (#257 ) * more cleanup * correct pubsub peer count * close the stream first * handle cancelation * fix tests * fix fanout ttl * merging master * remove `withLock` as it conflicts with stdlib * fix trace build Co-authored-by: Giovanni Petrantoni <giovanni@fragcolor.xyz>	2020-07-09 14:21:47 -06:00
Jacek Sieka	c720e042fc	clean up mesh handling logic (#260 ) * gossipsub is a function of subscription messages only * graft/prune work with mesh, get filled up from gossipsub * fix race conditions with await * fix exception unsafety when grafting/pruning * fix allowing up to DHi peers in mesh on incoming graft * fix metrics in several places	2020-07-09 11:16:46 -06:00
Dmitriy Ryajov	a52763cc6d	fix publishing (#250 ) * use var semantics to optimize table access * wip... lvalues don't work properly sadly... * big publish refactor, replenish and balance * fix internal tests * use g.peers for fanout (todo: don't include flood peers) * exclude non gossip from fanout * internal test fixes * fix flood tests * fix test's trypublish * test interop fixes * make sure to not remove peers from gossip table * restore old replenishFanout * cleanups * Cleanup resources (#246) * consolidate reading in lpstream * remove debug echo * tune log level * add channel cleanup and cancelation handling * cancelation handling * cancelation handling * cancelation handling * cancelation handling * cleanup and cancelation handling * cancelation handling * cancelation * tests * rename isConnected to connected * remove testing trace * comment out debug stacktraces * explicit raises * restore trace vs debug in gossip * improve fanout replenish behavior further * cleanup stale peers more eaguerly * synchronize connection cleanup and small refactor * close client first and call parent second * disconnect failed peers on publish * check for publish result * fix tests * fix tests * always call close Co-authored-by: Giovanni Petrantoni <giovanni@fragcolor.xyz>	2020-07-07 18:33:05 -06:00
Giovanni Petrantoni	ec00c7fc50	Peer resultification and defect only (#245 ) * Peer resultification and defect only * Fixing some tests * test fixes * Rename peer into peerid * better result error message in identify * further merge fixes	2020-07-01 08:25:09 +02:00
Dmitriy Ryajov	c788a6a3c0	Cleanup resources (#246 ) * consolidate reading in lpstream * remove debug echo * tune log level * add channel cleanup and cancelation handling * cancelation handling * cancelation handling * cancelation handling * cancelation handling * cleanup and cancelation handling * cancelation handling * cancelation * tests * rename isConnected to connected * remove testing trace * comment out debug stacktraces * explicit raises	2020-06-29 09:15:31 -06:00
Jacek Sieka	aa6756dfe0	allow message id provider to be specified (#243 ) * don't send public key in message when not signing (information leak) * don't run rebalance if there are peers in gossip (see #242) * don't crash randomly on bad peer id from remote	2020-06-28 09:56:38 -06:00
Dmitriy Ryajov	7a95f1844b	Concurrent dials (#238 ) * count published messages * don't call `switch.dial` in `subscribeToPeer` * add secureconn constructor * close in the correct order * concurent dial lock and track in/out conns better * make tests pass * add todo comment * disconect peers that open too many connections * wip * do connection and muxer tracking in one place * prevent nil pointer in observers * drop connections when peers is over max * prevent channel leaks * don't use closure to handle channel	2020-06-24 09:08:44 -06:00
Dmitriy Ryajov	5b28e8c488	Cleanup lpstream, Connection and BufferStream (#228 ) * count published messages * don't call `switch.dial` in `subscribeToPeer` * don't use delegation in connection * move connection out to own file * don't breakout on reset * make sure to call close on secured conn * add lpstream tracing * don't breackdown by conn id * fix import * remove unused lable * reset connection on exception * add additional metrics for skipped messages * check for nil in secure.close	2020-06-19 11:29:43 -06:00
Dmitriy Ryajov	9d9f793b4f	add metrics for sent messages by topic and peer (#220 )	2020-06-15 17:39:03 -06:00
Dmitriy Ryajov	ac04ca6e31	make sure keys exist and more metrics (#215 )	2020-06-11 20:20:58 -06:00
Dmitriy Ryajov	55a294a5c9	better pubsub metrics (#214 )	2020-06-11 12:09:34 -06:00
Viktor Kirilov	1afec627c2	proper name for topics so that we can filter dynamically using chronicles (#210 ) * proper name for topics so that we can filter dynamically using chronicles * lowercase	2020-06-10 10:48:01 +02:00
Dmitriy Ryajov	ee281310c0	move trace log	2020-06-08 10:40:08 -06:00
Giovanni Petrantoni	82b4ed8f44	use declareCounter rather than gauge for certain metrics	2020-06-07 16:41:23 +09:00
Giovanni Petrantoni	a6a2a81711	Start adding some metrics to pubsub (#192 ) * Start adding some metrics to pubsub In order to visualize it's functionality Still WIP * more metrics * add per topic metrics * finishup with requested metrics * add a metrisServer define to start local server * PR fixes and cleanup	2020-06-07 09:15:21 +02:00
Dmitriy Ryajov	130c64f33a	don't return nil in dial (#205 ) * dont return nil in dial * don't crash on pubsub send	2020-06-05 18:17:05 -06:00
Dmitriy Ryajov	bb8bff2195	add sparse message propagation tests to gossipsub (#202 ) * add sparse tests to gossipsub * add send hooks * remove `all`	2020-06-02 17:53:38 -06:00
Dmitriy Ryajov	1b4876d26d	emulate `defered`	2020-06-02 09:10:27 -06:00
Dmitriy Ryajov	86e1c8169c	decorate observers hooks with {.raises: [Defect].} move hooks logic out into standalone procs License: MIT Signed-off-by: Dmitriy Ryajov <dryajov@gmail.com>	2020-06-02 09:10:27 -06:00
Dmitriy Ryajov	daef00fc7b	don't crash schlesi-dev	2020-06-02 09:10:27 -06:00
Dmitriy Ryajov	93e5805c01	better exception handling	2020-06-02 09:10:27 -06:00
Dmitriy Ryajov	9132f16927	gossipsub fixes (#186 )	2020-05-21 14:24:20 -06:00
Dmitriy Ryajov	7900fd9f61	Half closed (#174 ) * call write until all is written out * add comments to lpchannel fields * add an eof flag to signal which end closed * wip: rework with proper half-closed * add eof and closed handling * propagate closes to piped * call parent close * moving bufferstream trackers out * move writeLock to bufferstream * move writeLock out * remove unused call * wip * rebasing master * fix mplex tests * wip * fix bufferstream after backport * wip * rename to differentiate from chronos tracker * close connection on chronos close * make reset request asyncCheck * fix channel cleanup * misc * don't use read * fix backports * make noise work again * proper exception handling * don't reraise just yet * add convenience templates * dont double wrap * use async pragma * fixes after backporting * muxer owns connection * remove on transport close cleanup * revert back allread * adding some todos * read from stream * inc count before closing * rebasing master * rebase master * use correct exception type * use try/finally insted of defer * fix compile in trace mode * reset channels on mplex close	2020-05-19 18:14:15 -06:00
Dmitriy Ryajov	f8029e7359	use sha256 digest as cache keys (#135 ) * use sha256 digest as cache keys * rebasing master	2020-05-18 14:49:49 -06:00
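
The write-queue limit described in commit 17e00e642a can be illustrated with a small sketch. It is written in Python with asyncio rather than Nim with chronos, and it is not the nim-libp2p implementation: the `Peer` class, `send_background`, and the limit of 4 are hypothetical names and values chosen for the demo. It only shows the idea from the commit message: writes that are awaited are not limited, while writes launched as background tasks are counted, and exceeding the limit disconnects the peer.

```python
import asyncio

MAX_PENDING_WRITES = 4  # hypothetical limit, kept small for the demo


class Peer:
    """Toy peer that counts fire-and-forget writes (all names are illustrative)."""

    def __init__(self, max_pending=MAX_PENDING_WRITES):
        self.max_pending = max_pending
        self.pending = 0        # background writes that have not finished yet
        self.connected = True
        self.sent = []
        self._tasks = set()

    async def _write(self, msg):
        await asyncio.sleep(0.01)  # stands in for a slow remote peer
        if self.connected:
            self.sent.append(msg)

    async def send(self, msg):
        """Awaited write: the caller waits for completion, so no limit applies."""
        await self._write(msg)

    def send_background(self, msg):
        """Spawned write: counted against the limit; overflow disconnects the peer."""
        if not self.connected:
            return False
        if self.pending >= self.max_pending:
            self.disconnect()
            return False
        self.pending += 1
        task = asyncio.create_task(self._write(msg))
        self._tasks.add(task)
        task.add_done_callback(self._on_done)
        return True

    def _on_done(self, task):
        # Runs for finished and cancelled tasks alike, so the count stays accurate.
        self._tasks.discard(task)
        self.pending -= 1

    def disconnect(self):
        self.connected = False
        for task in list(self._tasks):
            task.cancel()  # drop queued writes so their memory can be released


async def demo():
    peer = Peer()
    # Six background writes against a limit of four: the fifth trips the limit.
    results = [peer.send_background(i) for i in range(6)]
    await asyncio.sleep(0.05)
    return results, peer.connected, peer.sent


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Counting in a done-callback rather than inside the write coroutine is a choice made for this sketch: the callback fires even when a queued task is cancelled before it has started running.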

75 Commits