959 Commits

Author SHA1 Message Date
Giovanni Petrantoni
4663eaab6c (to revert) add some traces 2020-09-14 15:23:21 +09:00
Giovanni Petrantoni
b6470707a8 logging improvements and cleaning up 2020-09-14 14:05:25 +09:00
Giovanni Petrantoni
e4da3e4f67 remove heartbeat assertion, minor debug fixes 2020-09-12 21:05:21 +09:00
Giovanni Petrantoni
3fed781eda fixes and cleanup 2020-09-12 18:09:32 +09:00
Giovanni Petrantoni
13b623bc81 interop fixes 2020-09-12 11:45:53 +09:00
Giovanni Petrantoni
edc055c1e8 revert using libp2p org for daemon 2020-09-11 23:03:29 +09:00
Giovanni Petrantoni
3d24c4c4ee remove pointless scaling in / Duration operator 2020-09-11 22:29:02 +09:00
Giovanni Petrantoni
3b6a26c0bd byScore cleanup score sorting 2020-09-11 22:24:46 +09:00
Giovanni Petrantoni
af96d6cb10 import fixes 2020-09-11 22:18:00 +09:00
Giovanni Petrantoni
505385e89f use go daemon 0.3.0 2020-09-11 22:15:08 +09:00
Giovanni Petrantoni
d75cc6edca Merge branch 'master' into gossip-one-one 2020-09-11 21:34:03 +09:00
Jacek Sieka
96d4c44fec
refactor bufferstream to use a queue (#346)
This change modifies how the backpressure algorithm in bufferstream
works - in particular, instead of working byte-by-byte, it will now work
seq-by-seq.

When data arrives, it usually does so in packets - in the current
bufferstream, the packet is read then split into bytes which are fed one
by one to the bufferstream. On the reading side, the bytes are popped off
the bufferstream, again byte by byte, to satisfy `readOnce` requests -
this introduces a lot of synchronization traffic because the checks for
full buffer and for async event handling must be done for every byte.

In this PR, a queue of length 1 is used instead - this means there will
exist at most one "packet" in `pushTo`, one in the queue and one in the
slush buffer that is used to store incomplete reads.

* avoid byte-by-byte copy to buffer, with synchronization in-between
* reuse AsyncQueue synchronization logic instead of rolling own
* avoid writeHandler callback - implement `write` method instead
* simplify EOF signalling by only setting EOF flag in queue reader (and
reset)
* remove BufferStream pipes (unused)
* fixes drainBuffer deadlock when drain is called from within read loop
and thus blocks draining
* fix lpchannel init order
2020-09-10 08:19:13 +02:00
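
The seq-by-seq backpressure described in the commit above can be illustrated with a minimal Python sketch. It is not the nim-libp2p implementation: `QueueBufferStream`, `push_to`, `push_eof` and `read_once` are hypothetical names, `asyncio.Queue` stands in for AsyncQueue, and an empty packet is assumed to mark EOF.

```python
import asyncio


class QueueBufferStream:
    """Hypothetical sketch: packet-at-a-time backpressure via a length-1 queue."""

    def __init__(self) -> None:
        # At most one whole packet waits in the queue at any time.
        self._queue: asyncio.Queue = asyncio.Queue(maxsize=1)
        # "Slush" buffer: leftover bytes of a packet that was only partly read.
        self._slush = b""
        self._eof = False

    async def push_to(self, packet: bytes) -> None:
        # Suspends while the queue is full; this is the backpressure,
        # applied once per packet instead of once per byte.
        await self._queue.put(packet)

    async def push_eof(self) -> None:
        # Assumption of this sketch: an empty packet signals EOF.
        await self._queue.put(b"")

    async def read_once(self, nbytes: int) -> bytes:
        if not self._slush and not self._eof:
            packet = await self._queue.get()
            if packet == b"":
                # The EOF flag is only ever set on the reader side.
                self._eof = True
            self._slush = packet
        chunk, self._slush = self._slush[:nbytes], self._slush[nbytes:]
        return chunk


async def demo() -> list:
    stream = QueueBufferStream()

    async def producer() -> None:
        await stream.push_to(b"hello")
        await stream.push_to(b"world")  # waits until "hello" is taken
        await stream.push_eof()

    task = asyncio.create_task(producer())
    chunks = []
    while True:
        chunk = await stream.read_once(3)
        if not chunk:
            break
        chunks.append(chunk)
    await task
    return chunks
```

Running `asyncio.run(demo())` reads each packet in slices of up to three bytes, so synchronization happens once per packet while partial reads are served from the slush buffer.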
Jacek Sieka
5b347adf58
logging fixes and small cleanups (#361)
In particular, allow longer multistream select reads
2020-09-09 19:12:08 +02:00
Jacek Sieka
63b38734bd
fix poor performance in LRU cache (#360)
it turns out (in NBC) a heap is sufficiently slow because of all the
deletes that it makes more sense to go with a linked list
2020-09-09 18:28:46 +02:00
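
As a rough illustration of why a linked list suits an LRU cache, the Python sketch below uses `collections.OrderedDict`, which keeps its keys in a doubly linked list so that moving an entry to the end and dropping the oldest entry are both cheap. The `LRUCache` class and its methods are hypothetical and do not mirror the nim-libp2p code.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical sketch of an LRU cache ordered by a linked list."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        # Oldest entry first, most recently used entry last.
        self._items: OrderedDict = OrderedDict()

    def get(self, key, default=None):
        if key not in self._items:
            return default
        # Touching an entry relinks it at the tail; no heap re-sift needed.
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, value) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self._capacity:
            # Evict the least recently used entry from the head.
            self._items.popitem(last=False)

    def __contains__(self, key) -> bool:
        return key in self._items
```

With a capacity of 2, putting `a` and `b`, reading `a`, then putting `c` evicts `b`, because `a` was used more recently.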
Giovanni Petrantoni
84cc85b536 Merge branch 'master' into gossip-one-one 2020-09-08 18:03:13 +09:00
Jacek Sieka
82c179db9e
mplex fixes (#356)
* close the right connection when channel send fails
* don't crash on channel id that is not unique
2020-09-08 08:24:28 +02:00
Jacek Sieka
2b72d485a3
a few more log fixes (#355) 2020-09-07 14:15:11 +02:00
Jacek Sieka
c1856fda53
simplify and unify logging (#353)
* use short format for logging peerid
* log peerid:oid for connections
2020-09-06 10:31:47 +02:00
Jacek Sieka
9b815efe8f
gossipsub: don't subscribe to floodsub also (#352) 2020-09-04 22:53:03 +02:00
Jacek Sieka
16a008db75
fix connection event order when connection dies early (#351)
if the connection is already closed (because the remote closes during
identify for example), an exception would be raised which would leave
the connection in limbo, because it would not go through the rest of
internalConnect.

Also, if the connection is already closed, the disconnect event would be
scheduled before the connect event :/
2020-09-04 20:30:26 +02:00
Jacek Sieka
6d91d61844
small cleanups & docs (#347)
* simplify gossipsub heartbeat start / stop
* avoid alloc in peerid check
* stop iterating over seq after unsubscribing item (could crash)
* don't crash on missing private key with enabled sigs (shouldn't happen
but...)
2020-09-04 18:31:43 +02:00
Eugene Kabanov
0b85192119
Remove asyncCheck from codebase. (#345)
* Remove asyncCheck from codebase.

* Replace all `discard` statements with new `asyncSpawn`.

* Bump `nim-chronos` requirement.
2020-09-04 18:30:45 +02:00
Jacek Sieka
5819c6a9a7
gossipsub / floodsub fixes (#348)
* mcache fixes

* remove timed cache - the window shifting already removes old messages
* ref -> object
* avoid unnecessary allocations with `[]` operator

* simplify init

* fix several gossipsub/floodsub issues

* floodsub, gossipsub: don't rebroadcast messages that fail validation
(!)
* floodsub, gossipsub: don't crash when unsubscribing from unknown
topics (!)
* gossipsub: don't send message to peers that are not interested in the
topic, when messages don't share topic list
* floodsub: don't repeat all messages for each message when
rebroadcasting
* floodsub: allow sending empty data
* floodsub: fix inefficient unsubscribe
* sync floodsub/gossipsub logging
* gossipsub: include incoming messages in mcache (!)
* gossipsub: don't rebroadcast already-seen messages (!)
* pubsubpeer: remove incoming/outgoing seen caches - these are already
handled in gossipsub, floodsub and will cause trouble when peers try to
resubscribe / regraft topics (because control messages will have same
digest)
* timedcache: reimplement without timers (fixes timer leaks and extreme
inefficiency due to per-message closures, futures etc)
* timedcache: ref -> obj
2020-09-04 08:10:32 +02:00
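
One way to build a timed cache without per-entry timers is to drop expired entries lazily whenever the cache is touched. The Python sketch below shows that idea under stated assumptions: `TimedCache`, its `put` return value and the injectable `clock` parameter are hypothetical, and the actual timedcache in the commit above may work differently.

```python
import time
from collections import OrderedDict


class TimedCache:
    """Hypothetical sketch: entries expire lazily, with no timers or futures."""

    def __init__(self, timeout: float, clock=time.monotonic) -> None:
        self._timeout = timeout
        self._clock = clock
        # key -> expiry time. All entries share one timeout and refreshed
        # keys are re-appended, so insertion order equals expiry order.
        self._entries: OrderedDict = OrderedDict()

    def _expire(self, now: float) -> None:
        # Pop from the front until the oldest remaining entry is still valid.
        while self._entries:
            key, expires_at = next(iter(self._entries.items()))
            if expires_at > now:
                break
            del self._entries[key]

    def put(self, key) -> bool:
        """Insert or refresh key; return True if it was already present."""
        now = self._clock()
        self._expire(now)
        seen = key in self._entries
        self._entries.pop(key, None)
        self._entries[key] = now + self._timeout
        return seen

    def __contains__(self, key) -> bool:
        self._expire(self._clock())
        return key in self._entries
```

Passing a fake clock makes the expiry behaviour easy to check in tests without sleeping.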
Eugene Kabanov
c0bc73ddac
Fix Azure CI x86 problems. (#350) 2020-09-03 20:13:37 +03:00
Jacek Sieka
cd1c68dbc5
avoid send deadlock by not allowing send to block (#342)
* avoid send deadlock by not allowing send to block

* handle message issues more consistently
2020-09-01 09:33:03 +02:00
Giovanni Petrantoni
e225f3cf5a Merge branch 'master' into gossip-one-one 2020-08-24 17:41:45 +09:00
Dmitriy Ryajov
d3182c4dba
No raise send (#339)
* don't raise in send

* check that the lock is acquired on release
2020-08-20 20:50:33 -06:00
Giovanni Petrantoni
840a76915e warn -> debug log levels in errors.nim 2020-08-20 16:53:28 +09:00
Jacek Sieka
eb13845f65 work around send that may raise
`send` can raise exceptions that together with asyncCheck will
crash NBC
2020-08-19 14:25:30 +03:00
Giovanni Petrantoni
3b8e85c792 Merge branch 'master' into gossip-one-one 2020-08-19 18:41:38 +09:00
Zahary Karadjov
af0955c58b
Add comments explaining a possible deadlock 2020-08-18 13:51:41 +03:00
Zahary Karadjov
60122a044c
Restore interop with Lighthouse by preventing concurrent meshsub dials 2020-08-17 22:40:58 +03:00
Jacek Sieka
833a5b8e57
add muxer nil check 2020-08-17 13:32:02 +02:00
Jacek Sieka
cfcda3c3ef
work around race conditions between identify and other protocols
when identify is run on incoming connections, the connmanager tables are
updated too late for incoming connections to properly be handled

this is a quickfix that will eventually need cleaning up
2020-08-17 13:29:45 +02:00
Jacek Sieka
790b67c923
work around bufferstream deadlock (#332)
mplex backpressure handling deadlocks with something
2020-08-17 12:45:54 +02:00
Jacek Sieka
53877e97bd
trace logs 2020-08-17 12:39:25 +02:00
Jacek Sieka
f46bf0faa4
remove send lock (#334)
* remove send lock

When mplex receives data it will block until a reader has processed the
data. Thus, when a large message is received, such as a gossipsub
subscription table, all of mplex will be blocked until all reading is
finished.

However, if at the same time a `dial` to establish a gossipsub send
connection is ongoing, that `dial` will be blocked because mplex is no
longer reading data - specifically, it might indeed be the connection
that's processing the previous data that is waiting for a send
connection.

There are other problems with the current code:
* If an exception is raised, it is not necessarily raised for the same
connection as `p.sendConn`, so resetting `p.sendConn` in the exception
handling is wrong
* `p.isConnected` is checked before taking the lock - thus, if it
returns false, a new dial will be started. If a new task enters `send`
before dial is finished, it will also determine `p.isConnected` is
false, then get stuck on the lock - when the previous task finishes and
releases the lock, the new task will _also_ dial and thus reset
`p.sendConn` causing a leak.

* prefer existing connection

simplifies flow
2020-08-17 12:38:27 +02:00
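
The second problem listed in the commit above, checking `p.isConnected` before taking the lock, can be reproduced with a small Python sketch. All names here (`Peer`, `send_racy`, `send_recheck`, `count_dials`) are hypothetical. The commit itself removes the lock and prefers an existing connection; `send_recheck` shows a generic alternative for comparison only and is not what the commit does.

```python
import asyncio


class Peer:
    """Hypothetical peer with a lazily dialed send connection."""

    def __init__(self) -> None:
        self.send_conn = None
        self.dials = 0
        self.lock = asyncio.Lock()

    async def _dial(self) -> str:
        self.dials += 1
        await asyncio.sleep(0)  # yield, as a real dial would
        return f"conn-{self.dials}"

    async def send_racy(self, msg: bytes) -> None:
        # Racy: the connected check happens BEFORE the lock is taken,
        # so a second task can reach the same conclusion while waiting.
        needs_dial = self.send_conn is None
        async with self.lock:
            if needs_dial:
                # A later task overwrites the earlier connection (leak).
                self.send_conn = await self._dial()

    async def send_recheck(self, msg: bytes) -> None:
        # Generic alternative: decide only once the lock is held.
        async with self.lock:
            if self.send_conn is None:
                self.send_conn = await self._dial()


async def count_dials(method_name: str) -> int:
    peer = Peer()
    send = getattr(peer, method_name)
    # Two tasks enter send concurrently, as in the commit description.
    await asyncio.gather(send(b"a"), send(b"b"))
    return peer.dials
```

In this sketch the racy variant dials twice for two concurrent sends, while the variant that checks under the lock dials once.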
Jacek Sieka
b12145dff7
avoid crash when subscribe is received (#333)
...by making subscribeTopic synchronous, avoiding a peer table lookup
completely.

rebalanceMesh will be called a second later - it's fine
2020-08-17 12:10:22 +02:00
Giovanni Petrantoni
eebaeb779f Merge branch 'master' into gossip-one-one 2020-08-17 11:44:23 +09:00
Giovanni Petrantoni
afaa7f2a84 finishup with score and ihave budget 2020-08-17 01:20:50 +09:00
Jacek Sieka
ab864fc747
logging cleanups and small fixes (#331) 2020-08-15 21:50:31 +02:00
Giovanni Petrantoni
1661344e17 oversub prune score based, finish outbound quota 2020-08-15 16:25:29 +09:00
Jacek Sieka
397f9edfd4
simplify mplex (#327)
* less async
* less copying of data
* less redundant cleanup
2020-08-15 07:58:30 +02:00
Jacek Sieka
9c7e055310
set activity flag on noise / secio (#330) 2020-08-15 07:36:15 +02:00
Giovanni Petrantoni
69ff05a4f4 outbound mesh quota, internal tests fixing 2020-08-15 12:51:51 +09:00
Giovanni Petrantoni
309a4998c8 Adaptive gossip dissemination 2020-08-15 11:57:22 +09:00
Giovanni Petrantoni
3e8349b918 IWANT cap/budget 2020-08-14 17:07:26 +09:00
Giovanni Petrantoni
c6fc8dee54 backoff time management 2020-08-14 11:59:19 +09:00
Giovanni Petrantoni
ce61d84db4 add a hard cap to PX 2020-08-13 21:21:17 +09:00
Giovanni Petrantoni
585cfc5996 implement peer exchange graft message 2020-08-13 20:41:54 +09:00