nim-libp2p

Commit Graph

Author	SHA1	Message	Date
Jacek Sieka	e285d8bbf4	mem usage cleanups for pubsub (#564) In `async` functions, a closure environment is created for variables that cross an await boundary - this closure environment is kept in memory for the lifetime of the associated future - this means that although _some_ variables are no longer used, they still take up memory for a long time. In Nimbus, message validation is processed in batches, meaning the future of an incoming gossip message stays around for quite a while - this leads to memory consumption peaks of 100-200 MB when there are many attestations in the pipeline. To avoid excessive memory usage, it's generally better to move non-async code into procs such that the variables therein can be released earlier - this includes the many hidden variables introduced by macro and template expansion (i.e. chronicles, which does expensive exception handling) * move seen table salt to floodsub, use there as well * shorten seen table salt to size of hash * avoid unnecessary memory allocations and copies in a few places * factor out message scoring * avoid reencoding outgoing message for every peer * keep checking validators until reject (in case there's both reject and ignore) * `readOnce` avoids `readExactly` overhead for single-byte read * genericAssign -> assign2	2021-04-18 10:08:33 +02:00
Jacek Sieka	54031c9e9b	Fix minor exception issues (#550) Makes code compatible with https://github.com/status-im/nim-chronos/pull/166 without requiring it.	2021-03-23 07:45:25 +01:00
Giovanni Petrantoni	4760df1e31	fix build with libp2p_agents_metrics switch	2021-03-15 01:42:47 +00:00
Jacek Sieka	70deac9e0d	fix peer score accumulation (#541) * fix accumulating peer score * fix missing exception handling * remove unnecessary initHashSet/initTable calls * simplify peer stats management * clean up tests a little * fix some missing raises annotations	2021-03-09 13:22:52 +01:00
Giovanni Petrantoni	02ad017107	Gossipsub fixes and Initiator flagging fixes (#539) * properly propagate initiator information for gossipsub * Fix pubsubpeer lifetime management * restore old behavior * tests fixing * clamp backoff time value received * fix member name collisions * internal test fixes * better names and explaining of the importance of transport direction * fixes	2021-03-03 08:23:40 +09:00
Dmitriy Ryajov	fb493d1a4a	Connection limits tests (#509) * connection limit tests * remove use of secio * check that upgraded fut is not nil * rebuild	2021-01-27 21:27:33 -06:00
Dmitriy Ryajov	0959877b29	Connection limits (#384) * master merge * wip * avoid deadlocks * tcp limits * expose client field in chronosstream * limit incoming connections * update with new listen api * fix release * don't override peerinfo in connection * rework transport with accept * use semaphore to track resource usage * rework with new transport accept api * move events to conn manager (#373) * use semaphore to track resource usage * merge master * expose api to acquire conn slots * don't fail expensive metrics * allow tracking and updating connections * set global connection limits to 80 * add per peer connection limits * make sure conn is closed if tracking failed * more descriptive naming for handle * rework with new transport accept api * add `getStream` hide `selectConn` * add TransportClosedError * make nil explicit * don't make unnecessary copies of message * logging * error handling * cleanup semaphore * track connections properly * throw `TooManyConnections` when tracking outgoing * use proper exception and handle conventions * check onCloseHandle for nil * revert internalConnect changes * adding upgraded flag * await stream before closing * simplify tracking * wip * logging * split connection limits into incoming and outgoing * further streamline connection limits split counts * don't use closeWithEOF * move peer and conn event triggers from switch * wip * wip * wip * merge master * handle nil connections properly * add clarifying comment * don't raise exc on nil * no finally * add proper min/max connections logic * rebase master * merge master * master merge * remove request timeout should be addressed in separate PR * merge master * share semaphore when in/out limits aren't enforced * merge master * use import * pass semaphore to trackConn * don't close last conn * use storeConn * merge master * use storeConn	2021-01-20 22:00:24 -06:00
Dmitriy Ryajov	34e330353f	better `upgraded` lifetime handling (avoid NPE) (#506) * avoid npe on connection upgrade * add `onUpgraded` event	2021-01-18 16:27:29 -06:00
Giovanni Petrantoni	b902c030a0	add metrics into chronosstream to identify peers agents (#458) * add metrics into chronosstream to identify peers agents * avoid too many agent strings * use gauge instead of counter for stream metrics * filter identity on / * also track bytes traffic * fix identity tracking closeimpl call * add gossip rpc metrics * fix missing metrics inclusions * metrics fixes and additions * add a KnownLibP2PAgents strdefine * enforce toLowerAscii to agent names (metrics) * incoming rpc metrics * fix silly mistake in rpc metrics * fix agent metrics logic * libp2p_gossipsub_failed_publish metric * message ids metrics * libp2p_pubsub_broadcast_ihave metric improvement * refactor expensive gossip metrics * more detailed metrics * metrics improvements * remove generic metrics for `set` users * small fixes, add debug counters * fix counter and add missing subs metrics! * agent metrics behind -d:libp2p_agents_metrics * remove testing related code from this PR * small rebroadcast metric fix * fix small mistake * add some guide to the readme in order to use new metrics * add libp2p_gossipsub_peers_scores metric * add protobuf metrics to understand bytes traffic precisely * refactor gossipsub metrics * remove unused variable * add more metrics, refactor rebalance metrics * avoid bad metric concurrent states * use a stack structure for gossip mesh metrics * refine sub metrics * add received subs metrics fixes * measure handlers of known topics * sub/unsub counter * unsubscribeAll log unknown topics * expose a way to specify known topics at runtime	2021-01-08 14:21:24 +09:00
Dmitriy Ryajov	b2ea5a3c77	Concurrent upgrades (#489) * adding an upgraded event to conn * set stopped flag asap * trigger upgraded event on conn * set concurrency limit for accepts * backporting semaphore from tcp-limits2 * export unittests module * make params explicit * tone down debug logs * adding semaphore tests * use semaphore to throttle concurrent upgrades * add libp2p scope * trigger upgraded event before any other events * add event handler for connection upgrade * cleanup upgraded event on conn close * make upgrades slot release robust * don't forget to release slot on nil connection * misc * make sure semaphore is always released * minor improvements and a nil check * removing unneeded comment * make upgradeMonitor a non-closure proc * make sure the `upgraded` event is initialized * handle exceptions in accepts when stopping * don't leak exceptions when stopping accept loops	2021-01-04 12:59:05 -06:00
Jacek Sieka	b52dab9fd7	use stew/leb128 (#481) * avoids multiple reallocations in readLp * simplifies varint implementation * remove vbuffer.length (unused)	2020-12-15 12:15:22 -06:00
Dmitriy Ryajov	e9d4679059	Race in connection setup (#464) * check that connection is not closed or eof * don't release connection lock prematurely * test that only valid connections can be added * correct exception type on closed connection * add clarifying comment * use closeWithEOF for more stable test * misc comments * log stream id in bufferstream asserts * use closeWithEOF to prevent races in tests * give some time to the remote handler to trigger * adding more tests to make codecov happy	2020-12-02 19:24:48 -06:00
Dmitriy Ryajov	d1c689e5ab	adding libp2p tag to logScope (#465)	2020-12-01 11:34:27 -06:00
Dmitriy Ryajov	94e672ead0	allow concurrent closeWithEOF (#466) * allow concurrent closeWithEOF * add dedicated closedWithEOF flag	2020-12-01 09:44:21 +01:00
Jacek Sieka	5c2a54bdd9	fix timeoutmonitor loop (#463) * fix timeoutmonitor loop * Clarify that cancellation can happen while in timeoutMonitor	2020-11-29 13:34:19 +01:00
Dmitriy Ryajov	ca9c5c85e4	dont break chronicles logging streamline connsetup (#455)	2020-11-25 13:34:48 -06:00
Dmitriy Ryajov	1d16d22f5f	Don't allow concurrent pushdata (#444) * handle resets properly with/without pushes/reads * add clarifying comments * pushEof should also not be concurrent * move channel reset to bufferstream this is where the action happens - lpchannel merely redefines how close is done Co-authored-by: Jacek Sieka <jacek@status.im>	2020-11-23 09:07:11 -06:00
Dmitriy Ryajov	92fa4110c1	Rework transport to use chronos accept (#420) * rework transport to use the new accept api * use the new chronos primitives * fixup tests to use the new transport api * handle all exceptions in upgradeIncoming * master merge * add multiaddress exception type * raise appropriate exception on invalid address * allow retrying on TransportTooManyError * adding TODO * wip * merge master * add sleep if nil is returned * accept loop handles all exceptions * avoid issues with try/except/finally * make consistent with master * cleanup accept loop * logging * Update libp2p/transports/tcptransport.nim Co-authored-by: Jacek Sieka <jacek@status.im> * use Direction enum instead of initiator flag * use consistent import style * remove experimental `closeWithEOF()` Co-authored-by: Jacek Sieka <jacek@status.im>	2020-11-18 20:06:42 -06:00
Jacek Sieka	74acd0a33a	fix channels not being reset (#439) * fix channels not being reset silly for loop.. * allow only one concurrent read * fix mplex test race condition * add some bufferstream eof tests * deadlock, lost data and hung channel fixes * prevent concurrent `reset` calls * reset LPChannel when read is cancelled (since data is lost) * ensure there's one, and one only, 0-byte readOnce on EOF * ensure that all data is returned before EOF is returned * keep running activity monitor for half-closed channels (or they never get closed)	2020-11-17 08:59:25 -06:00
Dmitriy Ryajov	90921bff09	move some important trace logs to debug (#428)	2020-11-09 22:14:46 -06:00
Dmitriy Ryajov	4fb3f50d2c	Reset channels on close (#425) * reset when failed to read/write muxed conn * add more comprehensive resource cleanup tests * style * cleanup tests	2020-11-06 09:24:24 -06:00
Dmitriy Ryajov	3956f3fd69	make sure all streams are tracked (#422) * make sure all streams are tracked * revert unnecessary change	2020-11-04 21:52:54 -06:00
Dmitriy Ryajov	43a77e60a1	split stream counts by direction (#418)	2020-11-01 16:23:26 -06:00
Jacek Sieka	03639f1446	Revert "Channel leaks (#413)" (#417) This reverts commit `1de1d49223`.	2020-11-01 14:49:25 -06:00
Dmitriy Ryajov	1de1d49223	Channel leaks (#413) * break stream tracking by type * use closeWithEOF to await wrapped stream * fix cancelation leaks * fix channel leaks * logging * use close monitor and always call closeUnderlying * don't use closeWithEOF * removing close monitor * logging	2020-10-27 11:21:03 -06:00
Jacek Sieka	17e00e642a	limit write queue length (#376) To break a potential read/write deadlock, gossipsub uses an unbounded queue for writes - when peers are too slow to process this queue, it may end up growing without bounds causing high memory usage. Here, we introduce a maximum write queue length after which the peer is disconnected - the queue is generous enough that any "normal" usage should be fine - writes that are `await`:ed are not affected, only writes that are launched in an `asyncSpawn` task or similar. * avoid unnecessary copy of message when there are no send observers * release message memory earlier in gossipsub * simplify pubsubpeer logging	2020-09-24 18:43:20 +02:00
Jacek Sieka	25bd0a18f4	small fixes (#374) * add helper to read EOF marker after closing stream (else stream stay alive until timeout/reset) * don't assert on empty channel message * don't loop when writing to chronos (no need)	2020-09-24 07:30:19 +02:00
Jacek Sieka	49a12e619d	channel close race and deadlock fixes (#368) * channel close race and deadlock fixes * remove send lock, write chunks in one go * push some of half-closed implementation to BufferStream * fix some hangs where LPChannel readers and writers would not always wake up * simplify lazy channels * fix close happening more than once in some orderings * reenable connection tracking tests * close channels first on mplex close such that consumers can read bytes A notable difference is that BufferedStream is no longer considered EOF until someone has actually read the EOF marker. * docs, simplification	2020-09-21 19:48:19 +02:00
Jacek Sieka	b7e5d1122c	cleanups (#366) * reuse connection timeout for noise handshake (avoid extra timer) * enforce nbytes > 0 for readOnce * avoid some unnecessary memory zeroing * simplify noise * fix dumping when noise splits message	2020-09-16 11:55:25 +02:00
Jacek Sieka	0db45462cd	mplex fixes (#362) * remove almost-empty types module * lock when writing message (that's the only place the lock matters, and only when the message is > max msg size) * logging updates (log in consistent order, makes reading logs easier) * raise EOF from readExactly only if no bytes have been read (to signal that _no_ bytes were lost)	2020-09-14 10:19:54 +02:00
Jacek Sieka	96d4c44fec	refactor bufferstream to use a queue (#346) This change modifies how the backpressure algorithm in bufferstream works - in particular, instead of working byte-by-byte, it will now work seq-by-seq. When data arrives, it usually does so in packets - in the current bufferstream, the packet is read then split into bytes which are fed one by one to the bufferstream. On the reading side, the bytes are popped off the bufferstream, again byte by byte, to satisfy `readOnce` requests - this introduces a lot of synchronization traffic because the checks for full buffer and for async event handling must be done for every byte. In this PR, a queue of length 1 is used instead - this means there will at most exist one "packet" in `pushTo`, one in the queue and one in the slush buffer that is used to store incomplete reads. * avoid byte-by-byte copy to buffer, with synchronization in-between * reuse AsyncQueue synchronization logic instead of rolling own * avoid writeHandler callback - implement `write` method instead * simplify EOF signalling by only setting EOF flag in queue reader (and reset) * remove BufferStream pipes (unused) * fixes drainBuffer deadlock when drain is called from within read loop and thus blocks draining * fix lpchannel init order	2020-09-10 08:19:13 +02:00
Jacek Sieka	5b347adf58	logging fixes and small cleanups (#361) In particular, allow longer multistream select reads	2020-09-09 19:12:08 +02:00
Jacek Sieka	c1856fda53	simplify and unify logging (#353) * use short format for logging peerid * log peerid:oid for connections	2020-09-06 10:31:47 +02:00
Jacek Sieka	cd1c68dbc5	avoid send deadlock by not allowing send to block (#342) * avoid send deadlock by not allowing send to block * handle message issues more consistently	2020-09-01 09:33:03 +02:00
Dmitriy Ryajov	d3182c4dba	No raise send (#339) * don't raise in send * check that the lock is acquired on release	2020-08-20 20:50:33 -06:00
Jacek Sieka	790b67c923	work around bufferstream deadlock (#332) mplex backpressure handling deadlocks with something	2020-08-17 12:45:54 +02:00
Jacek Sieka	ab864fc747	logging cleanups and small fixes (#331)	2020-08-15 21:50:31 +02:00
Jacek Sieka	397f9edfd4	simplify mplex (#327) * less async * less copying of data * less redundant cleanup	2020-08-15 07:58:30 +02:00
Dmitriy Ryajov	d1f1e1b31e	add missing mplex half closed test (#326)	2020-08-12 07:23:49 +02:00
Dmitriy Ryajov	b76b3e0e9b	Rework pubsub (#322) * move pubsub off of switch, pass switch into pubsub * use join on lpstreams * properly clean up failed peers * fix tests * fix peertable hasPeerId * fix tests * rework sending, remove helpers from pubsubpeer, unify in broadcast * further split broadcast into send * use send where appropriate * use formatIt * improve trace Co-authored-by: Giovanni Petrantoni <giovanni@fragcolor.xyz>	2020-08-11 18:05:49 -06:00
Dmitriy Ryajov	2325692f55	Fix half closed (#324) * don't call `close` in `remoteClose` * make sure timeouts are properly propagated * fix tests * adding remote close write test	2020-08-10 16:17:11 -06:00
Jacek Sieka	7c2ab38da1	cleanups (#319)	2020-08-06 20:14:40 +02:00
Giovanni Petrantoni	5c986cf657	Fix build, add some raises (#315) * Fix build, add some raises * wip * wip more raises * missing exc object in mplex * proper lifetime for subscribePeer Co-authored-by: Dmitriy Ryajov <dryajov@gmail.com>	2020-08-05 19:30:57 -06:00
Dmitriy Ryajov	cf2b42b914	Moving idle timeout to Connection to enable across all connection streams (#307) * move idle timeout logic to connection * more informative logs * more informative logs	2020-08-04 07:22:05 -06:00
Jacek Sieka	e655a510cd	misc cleanups (#303)	2020-08-02 12:22:49 +02:00
Dmitriy Ryajov	0348773ec9	Connection manager (#277) * splitting out connection management * wip * wip conn mngr tests * set peerinfo in constructor * comments and documentation * tests * wip * add `None` to detect untagged connections * use `PeerID` to index connections * fix tests * remove useless equality	2020-07-17 09:36:48 -06:00
Ștefan Talpalaru	b8b0a2b4bc	CI: build binaries with TRACE & JSON logs (#268) Also: remove unused imports.	2020-07-14 02:02:16 +02:00
Dmitriy Ryajov	181cf73ca7	Drain buffer (#264) * drain lpchannel on reset * move drainBuffer to bufferstream	2020-07-12 18:37:10 +02:00
Dmitriy Ryajov	4c815d75e7	More gossip cleanup (#257) * more cleanup * correct pubsub peer count * close the stream first * handle cancelation * fix tests * fix fanout ttl * merging master * remove `withLock` as it conflicts with stdlib * fix trace build Co-authored-by: Giovanni Petrantoni <giovanni@fragcolor.xyz>	2020-07-09 14:21:47 -06:00
Jacek Sieka	45c089ff0d	noise updates (#255) * clear secrets explicitly * simplify keygen * avoid some trivial memory allocations * fix little endian encoding of nonce	2020-07-09 02:53:19 -06:00

109 Commits