Commit Graph

139 Commits

Author SHA1 Message Date
Jacek Sieka 9b815efe8f
gossipsub: don't subscribe to floodsub also (#352) 2020-09-04 22:53:03 +02:00
Jacek Sieka 16a008db75
fix connection event order when connection dies early (#351)
if the connection is already closed (because the remote closes during
identfiy for example), an exception would be raised which would leave
the connection in limbo, beacuse it would not go through the rest of
internalConnect.

Also, if the connection is already closed, the disconnect event would be
scheduled before the connect event :/
2020-09-04 20:30:26 +02:00
Jacek Sieka 6d91d61844
small cleanups & docs (#347)
* simplify gossipsub heartbeat start / stop
* avoid alloc in peerid check
* stop iterating over seq after unsubscribing item (could crash)
* don't crash on missing private key with enabled sigs (shouldn't happen
but...)
2020-09-04 18:31:43 +02:00
Eugene Kabanov 0b85192119
Remove asyncCheck from codebase. (#345)
* Remove asyncCheck from codebase.

* Replace all `discard` statements with new `asyncSpawn`.

* Bump `nim-chronos` requirement.
2020-09-04 18:30:45 +02:00
Jacek Sieka 5819c6a9a7
gossipsub / floodsub fixes (#348)
* mcache fixes

* remove timed cache - the window shifting already removes old messages
* ref -> object
* avoid unnecessary allocations with `[]` operator

* simplify init

* fix several gossipsub/floodsub issues

* floodsub, gossipsub: don't rebroadcast messages that fail validation
(!)
* floodsub, gossipsub: don't crash when unsubscribing from unknown
topics (!)
* gossipsub: don't send message to peers that are not interested in the
topic, when messages don't share topic list
* floodsub: don't repeat all messages for each message when
rebroadcasting
* floodsub: allow sending empty data
* floodsub: fix inefficient unsubscribe
* sync floodsub/gossipsub logging
* gossipsub: include incoming messages in mcache (!)
* gossipsub: don't rebroadcast already-seen messages (!)
* pubsubpeer: remove incoming/outgoing seen caches - these are already
handled in gossipsub, floodsub and will cause trouble when peers try to
resubscribe / regraft topics (because control messages will have same
digest)
* timedcache: reimplement without timers (fixes timer leaks and extreme
inefficiency due to per-message closures, futures etc)
* timedcache: ref -> obj
2020-09-04 08:10:32 +02:00
Jacek Sieka cd1c68dbc5
avoid send deadlock by not allowing send to block (#342)
* avoid send deadlock by not allowing send to block

* handle message issues more consistently
2020-09-01 09:33:03 +02:00
Dmitriy Ryajov d3182c4dba
No raise send (#339)
* dont raise in send

* check that the lock is acquire on release
2020-08-20 20:50:33 -06:00
Jacek Sieka eb13845f65 work around send that may raise
`send` can raise exceptions that together with asyncCheck will
crash NBC
2020-08-19 14:25:30 +03:00
Zahary Karadjov af0955c58b
Add comments explaning a possible deadlock 2020-08-18 13:51:41 +03:00
Zahary Karadjov 60122a044c
Restore interop with Lighthouse by preventing concurrent meshsub dials 2020-08-17 22:40:58 +03:00
Jacek Sieka 53877e97bd
trace logs 2020-08-17 12:39:25 +02:00
Jacek Sieka f46bf0faa4
remove send lock (#334)
* remove send lock

When mplex receives data it will block until a reader has processed the
data. Thus, when a large message is received, such as a gossipsub
subscription table, all of mplex will be blocked until all reading is
finished.

However, if at the same time a `dial` to establish a gossipsub send
connection is ongoing, that `dial` will be blocked because mplex is no
longer reading data - specifically, it might indeed be the connection
that's processing the previous data that is waiting for a send
connection.

There are other problems with the current code:
* If an exception is raised, it is not necessarily raised for the same
connection as `p.sendConn`, so resetting `p.sendConn` in the exception
handling is wrong
* `p.isConnected` is checked before taking the lock - thus, if it
returns false, a new dial will be started. If a new task enters `send`
before dial is finished, it will also determine `p.isConnected` is
false, then get stuck on the lock - when the previous task finishes and
releases the lock, the new task will _also_ dial and thus reset
`p.sendConn` causing a leak.

* prefer existing connection

simplifies flow
2020-08-17 12:38:27 +02:00
Jacek Sieka b12145dff7
avoid crash when subscribe is received (#333)
...by making subscribeTopic synchronous, avoiding a peer table lookup
completely.

rebalanceMesh will be called a second later - it's fine
2020-08-17 12:10:22 +02:00
Jacek Sieka ab864fc747
logging cleanups and small fixes (#331) 2020-08-15 21:50:31 +02:00
Dmitriy Ryajov b76b3e0e9b
Rework pubsub (#322)
* move pubsub of off switch, pass switch into pubsub

* use join on lpstreams

* properly cleanup up failed peers

* fix tests

* fix peertable hasPeerId

* fix tests

* rework sending, remove helpers from pubsubpeer, unify in broadcast

* further split broadcast into send

* use send where appropriate

* use formatIt

* improve trace

Co-authored-by: Giovanni Petrantoni <giovanni@fragcolor.xyz>
2020-08-11 18:05:49 -06:00
zah fbb59c3638
`msg` is a reserved property name in Chronicles (#321)
Every Chronicles log record has an existing `msg` property matching
the static string supplied in the log statement. Thus, it's currently
not possible to use `msg` as the name of a user property:

https://github.com/status-im/nim-chronicles/issues/86
2020-08-07 16:46:00 -06:00
Jacek Sieka c6c0c152c0
Dial peerid (#308)
* prefer PeerID in switch api

This avoids ref issues like ref identity and nil

* use existing peerinfo instance if possible

* remove secureCodec

there may be multiple connections per peerinfo with different codecs

* avoid some extra async::
2020-08-06 09:29:27 +02:00
Giovanni Petrantoni 9bbe5e4841
Fix subclass calls to handleDisconnect (#314)
* Fix subclass calls to handleDisconnect

* add peer ID to nil peer debug message
2020-08-06 11:12:52 +09:00
Giovanni Petrantoni 5c986cf657
Fix build, add some raises (#315)
* Fix build, add some raises

* wip

* wip more raises

* missing exc object in mplex

* proper lifetime for subscribePeer

Co-authored-by: Dmitriy Ryajov <dryajov@gmail.com>
2020-08-05 19:30:57 -06:00
Ștefan Talpalaru bd5d43874a
more expensive metrics (#312) 2020-08-05 14:02:26 +02:00
Ștefan Talpalaru 843d32f8db
put expensive metrics under a Nim define (#310) 2020-08-04 17:27:59 -06:00
Dmitriy Ryajov b6877b8aac
increase send timeout for prune and graft msgs (#306)
* increase send timeout for prune and graft msgs

* use trace logs for subscribe monitor
2020-08-03 17:55:42 -06:00
Dmitriy Ryajov 980764774e
pubsub timeouts tuning (#295)
* add finegrained timeouts to pubsub

* use 10 millis timeout in tests

* finalization

* revert timeouts

* use `atEof` for reads

* adjust timeouts and use atEof for reads

* use atEof for reads

* set isEof flag

* no backoff for pubsub streams

* temp timer increase, make macos finalize

* don't call `subscribePeer` in libp2p anymore

* more traces

* leak tests

* lower timeouts

* handle exceptions in control message

* don't use `cancelAndWait`

* handle exceptions in helpers

* wip

* don't send empty messages

* check for leaks properly

* don't use cancelAndWait

* don't await subscribption sends

* remove subscrivePeer calls from switch

* trying without the hooks again
2020-08-02 23:20:11 -06:00
Jacek Sieka e655a510cd
misc cleanups (#303) 2020-08-02 12:22:49 +02:00
Dmitriy Ryajov f7fdf31365
Pubsub lifetime (#284)
* lifecycle hooks

* tests

* move trace after closed check

* restore 1 second heartbeat

* await close event

* fix tests

* print direction string

* more trace logging

* add pubsub monitor

* add log scope

* adjust idle timeout

* add exc.msg to trace
2020-07-27 13:33:51 -06:00
Giovanni Petrantoni 3b088f8980
Fix some unsubscribe issues and add unsubscribeAll helper (#282)
* Fix some unsub issues and add unsuball helper

* batch sendprune in unsubscribe methods

* add unsubscribeAll for floodsub
2020-07-20 10:16:13 -06:00
Dmitriy Ryajov 94196fee71
Connections and pubsub peers cleanup (#279)
* better peer tracking and cleanup

* check if peer and conn is nil

* test name

* make timeout more agressive

* rename method for better clarity
2020-07-17 13:46:24 -06:00
Dmitriy Ryajov 0348773ec9
Connection manager (#277)
* splitting out connection management

* wip

* wip conn mngr tests

* set peerinfo in contructor

* comments and documentation

* tests

* wip

* add `None` to detect untagged connections

* use `PeerID` to index connections

* fix tests

* remove useless equality
2020-07-17 09:36:48 -06:00
Jacek Sieka 170685f9c6
gossipsub fixes (#276)
* graft up to D peers
* fix logging so it's clear who is grafting/pruning who
* clear fanout when grafting
2020-07-16 21:26:57 +02:00
Jacek Sieka c76152f2c1
Simplify send (#271)
* PubSubPeer.send single message

* gossipsub: simplify send further
2020-07-16 12:06:57 +02:00
Dmitriy Ryajov f35b8999b3
some light cleanup for pub/gossip sub (#273)
* move peer table out to its own file

* move peer table

* cleanup `==` and add one to peerinfo

* add peertable

* missed equality check
2020-07-15 13:18:55 -06:00
Eugene Kabanov b832668768
Minprotobuf refactoring 2 (#269)
* Protobuf refactoring stage II.

* Remove NoError.

* Change trace level for invalid message.
2020-07-15 10:25:39 +02:00
Giovanni Petrantoni d7bab37119
Fix gossip messages seqno according to spec (#253)
* Fix gossip messages seqno according to spec

* Add peers back to gossipsub table, slow down heartbeat

* Revert "Add peers back to gossipsub table, slow down heartbeat"

This reverts commit 01e2e62172.

* make seqno a threadvar, remove from peerinfo

* seqno refactor, into pubsub
2020-07-14 21:51:33 -06:00
Ștefan Talpalaru b8b0a2b4bc
CI: build binaries with TRACE & JSON logs (#268)
Also: remove unused imports.
2020-07-14 02:02:16 +02:00
Jacek Sieka c6c2d99907
one more log fix 2020-07-13 20:19:20 +02:00
Jacek Sieka 76853f064a
json logging again 2020-07-13 19:59:49 +02:00
Jacek Sieka 6620b7a00b
more comment fixes 2020-07-13 19:30:18 +02:00
Jacek Sieka 0d4c74b33a
comment log that can't be json-serialized 2020-07-13 18:36:49 +02:00
Jacek Sieka 061c54d3c6
logging fixes 2020-07-13 17:26:05 +02:00
Jacek Sieka 87e58c1c8d
metrics: one more pubsub peers fix 2020-07-13 16:16:46 +02:00
Jacek Sieka c7895ccc52
metrics: fix pubsub_peers add metric 2020-07-13 16:15:27 +02:00
Giovanni Petrantoni fcda0f6ce1
PubSubPeer tables refactor (#263)
* refactor peer tables

* tests fixing

* override PubSubPeer equality

* fix pubsubpeer comparison
2020-07-13 15:32:38 +02:00
Eugene Kabanov efb952f18b
[WIP] Minprotobuf refactoring (#259)
* Minprotobuf initial commit

* Fix noise.

* Add signed integers support.
Add checks for field number value.
Remove some casts.

* Fix compile errors.

* Fix comments and constants.
2020-07-13 14:43:07 +02:00
Dmitriy Ryajov bec9a0658f
Cleanup rpc handler (#261)
* more cleanup

* fix tests

* merging master

* remove `withLock` as it conflicts with stdlib

* wip

* more fanout ttl

Co-authored-by: Giovanni Petrantoni <giovanni@fragcolor.xyz>
2020-07-09 17:54:16 -06:00
Dmitriy Ryajov 4c815d75e7
More gossip cleanup (#257)
* more cleanup

* correct pubsub peer count

* close the stream first

* handle cancelation

* fix tests

* fix fanout ttl

* merging master

* remove `withLock` as it conflicts with stdlib

* fix trace build

Co-authored-by: Giovanni Petrantoni <giovanni@fragcolor.xyz>
2020-07-09 14:21:47 -06:00
Jacek Sieka c720e042fc
clean up mesh handling logic (#260)
* gossipsub is a function of subscription messages only
* graft/prune work with mesh, get filled up from gossipsub
* fix race conditions with await
* fix exception unsafety when grafting/pruning
* fix allowing up to DHi peers in mesh on incoming graft
* fix metrics in several places
2020-07-09 11:16:46 -06:00
Giovanni Petrantoni 4e12d0d97a nil check peer before disconnect 2020-07-09 17:20:45 +09:00
Giovanni Petrantoni f9e0a1f069 CI fix handleDisconnect (pubsub) 2020-07-09 13:56:59 +09:00
Giovanni Petrantoni 9b8b159abb Remove other spurious getStacktrace in pubsub traces 2020-07-09 13:19:34 +09:00
Giovanni Petrantoni 4bcb567d47 fix gossip tests 2020-07-09 12:34:36 +09:00