55 Commits

Author SHA1 Message Date
Jacek Sieka f46bf0faa4
remove send lock (#334)
* remove send lock

When mplex receives data it will block until a reader has processed the
data. Thus, when a large message is received, such as a gossipsub
subscription table, all of mplex will be blocked until all reading is
finished.

However, if at the same time a `dial` to establish a gossipsub send
connection is ongoing, that `dial` will be blocked because mplex is no
longer reading data - specifically, it might indeed be the connection
that's processing the previous data that is waiting for a send
connection.

There are other problems with the current code:
* If an exception is raised, it is not necessarily raised for the same
connection as `p.sendConn`, so resetting `p.sendConn` in the exception
handling is wrong
* `p.isConnected` is checked before taking the lock - thus, if it
returns false, a new dial will be started. If a new task enters `send`
before dial is finished, it will also determine `p.isConnected` is
false, then get stuck on the lock - when the previous task finishes and
releases the lock, the new task will _also_ dial and thus reset
`p.sendConn` causing a leak.
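
The check-before-lock race in the last bullet can be sketched roughly as follows. This is an illustrative Python `asyncio` model, not the project's code: `Peer`, `send_racy`, `send_checked`, `count_dials` and `run_demo` are hypothetical names, and the 10 ms sleep merely stands in for a real dial. `send_checked` shows one conventional way to avoid the double dial (re-checking under the lock); it is not necessarily what this commit does, since the commit removes the lock altogether.

```python
import asyncio


class Peer:
    """Hypothetical stand-in for a pubsub peer (not the real type)."""

    def __init__(self):
        self.send_conn = None
        self.lock = asyncio.Lock()
        self.dials = 0

    async def dial(self):
        # Count dials and simulate network latency.
        self.dials += 1
        await asyncio.sleep(0.01)
        return f"conn-{self.dials}"

    async def send_racy(self):
        # The connected state is checked BEFORE taking the lock.
        need_dial = self.send_conn is None
        async with self.lock:
            if need_dial:
                # Stale decision: another task may have dialed while we
                # waited for the lock, so this overwrites its connection.
                self.send_conn = await self.dial()

    async def send_checked(self):
        # The connected state is checked while holding the lock.
        async with self.lock:
            if self.send_conn is None:
                self.send_conn = await self.dial()


async def count_dials(method_name):
    # Run two concurrent sends against a fresh peer and report the dials.
    peer = Peer()
    send = getattr(peer, method_name)
    await asyncio.gather(send(), send())
    return peer.dials


def run_demo():
    racy = asyncio.run(count_dials("send_racy"))
    checked = asyncio.run(count_dials("send_checked"))
    return racy, checked
```

In this model two concurrent calls to `send_racy` dial twice and the first connection is overwritten without being closed, which mirrors the leak described above; `send_checked` dials once.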

* prefer existing connection

simplifies flow
2020-08-17 12:38:27 +02:00
Dmitriy Ryajov b76b3e0e9b
Rework pubsub (#322)
* move pubsub off of switch, pass switch into pubsub

* use join on lpstreams

* properly clean up failed peers

* fix tests

* fix peertable hasPeerId

* fix tests

* rework sending, remove helpers from pubsubpeer, unify in broadcast

* further split broadcast into send

* use send where appropriate

* use formatIt

* improve trace

Co-authored-by: Giovanni Petrantoni <giovanni@fragcolor.xyz>
2020-08-11 18:05:49 -06:00
Dmitriy Ryajov 980764774e
pubsub timeouts tuning (#295)
* add finegrained timeouts to pubsub

* use 10 millis timeout in tests

* finalization

* revert timeouts

* use `atEof` for reads

* adjust timeouts and use atEof for reads

* use atEof for reads

* set isEof flag

* no backoff for pubsub streams

* temp timer increase, make macos finalize

* don't call `subscribePeer` in libp2p anymore

* more traces

* leak tests

* lower timeouts

* handle exceptions in control message

* don't use `cancelAndWait`

* handle exceptions in helpers

* wip

* don't send empty messages

* check for leaks properly

* don't use cancelAndWait

* don't await subscription sends

* remove `subscribePeer` calls from switch

* trying without the hooks again
2020-08-02 23:20:11 -06:00
Dmitriy Ryajov f7fdf31365
Pubsub lifetime (#284)
* lifecycle hooks

* tests

* move trace after closed check

* restore 1 second heartbeat

* await close event

* fix tests

* print direction string

* more trace logging

* add pubsub monitor

* add log scope

* adjust idle timeout

* add exc.msg to trace
2020-07-27 13:33:51 -06:00
Giovanni Petrantoni c3af7659b0
Add more checks and fix some issues in gossip tests (#281) 2020-07-20 15:55:00 +09:00
Dmitriy Ryajov f35b8999b3
some light cleanup for pub/gossip sub (#273)
* move peer table out to its own file

* move peer table

* cleanup `==` and add one to peerinfo

* add peertable

* missed equality check
2020-07-15 13:18:55 -06:00
Giovanni Petrantoni d7bab37119
Fix gossip messages seqno according to spec (#253)
* Fix gossip messages seqno according to spec

* Add peers back to gossipsub table, slow down heartbeat

* Revert "Add peers back to gossipsub table, slow down heartbeat"

This reverts commit 01e2e62172a7793bb17f0eb8314e2faeb2682173.

* make seqno a threadvar, remove from peerinfo

* seqno refactor, into pubsub
2020-07-14 21:51:33 -06:00
Ștefan Talpalaru b8b0a2b4bc
CI: build binaries with TRACE & JSON logs (#268)
Also: remove unused imports.
2020-07-14 02:02:16 +02:00
Giovanni Petrantoni fcda0f6ce1
PubSubPeer tables refactor (#263)
* refactor peer tables

* tests fixing

* override PubSubPeer equality

* fix pubsubpeer comparison
2020-07-13 15:32:38 +02:00
Dmitriy Ryajov 4c815d75e7
More gossip cleanup (#257)
* more cleanup

* correct pubsub peer count

* close the stream first

* handle cancelation

* fix tests

* fix fanout ttl

* merging master

* remove `withLock` as it conflicts with stdlib

* fix trace build

Co-authored-by: Giovanni Petrantoni <giovanni@fragcolor.xyz>
2020-07-09 14:21:47 -06:00
Jacek Sieka c720e042fc
clean up mesh handling logic (#260)
* gossipsub is a function of subscription messages only
* graft/prune work with mesh, get filled up from gossipsub
* fix race conditions with await
* fix exception unsafety when grafting/pruning
* fix allowing up to DHi peers in mesh on incoming graft
* fix metrics in several places
2020-07-09 11:16:46 -06:00
Dmitriy Ryajov a52763cc6d
fix publishing (#250)
* use var semantics to optimize table access

* wip... lvalues don't work properly sadly...

* big publish refactor, replenish and balance

* fix internal tests

* use g.peers for fanout (todo: don't include flood peers)

* exclude non gossip from fanout

* internal test fixes

* fix flood tests

* fix test's trypublish

* test interop fixes

* make sure to not remove peers from gossip table

* restore old replenishFanout

* cleanups

* Cleanup resources (#246)

* consolidate reading in lpstream

* remove debug echo

* tune log level

* add channel cleanup and cancelation handling

* cancelation handling

* cancelation handling

* cancelation handling

* cancelation handling

* cleanup and cancelation handling

* cancelation handling

* cancelation

* tests

* rename isConnected to connected

* remove testing trace

* comment out debug stacktraces

* explicit raises

* restore trace vs debug in gossip

* improve fanout replenish behavior further

* cleanup stale peers more eagerly

* synchronize connection cleanup and small refactor

* close client first and call parent second

* disconnect failed peers on publish

* check for publish result

* fix tests

* fix tests

* always call close

Co-authored-by: Giovanni Petrantoni <giovanni@fragcolor.xyz>
2020-07-07 18:33:05 -06:00
Jacek Sieka d522537b19
reuse single RNG instance for all crypto key generation (#249)
* reuse single RNG instance for all crypto key generation

* use foolproof rng

* initRng -> newRng (because it's ref)

* fix test

* imports/exports, chat fix

* fix rsa

* imports and exports

* work around threadvar issue

* fixup

* mac workaround test
2020-07-07 13:14:11 +02:00
Giovanni Petrantoni ec00c7fc50
Peer resultification and defect only (#245)
* Peer resultification and defect only

* Fixing some tests

* test fixes

* Rename peer into peerid

* better result error message in identify

* further merge fixes
2020-07-01 08:25:09 +02:00
Jacek Sieka aa6756dfe0
allow message id provider to be specified (#243)
* don't send public key in message when not signing (information leak)
* don't run rebalance if there are peers in gossip (see #242)
* don't crash randomly on bad peer id from remote
2020-06-28 09:56:38 -06:00
Dmitriy Ryajov 902880ef1f
consolidate reading in lpstream (#241)
* consolidate reading in lpstream

* remove debug echo

* throw if not enough bytes were read

* tune log level

* set eof flag

* test readExactly to fail on not enough bytes
2020-06-27 11:33:34 -06:00
Dmitriy Ryajov 7a95f1844b
Concurrent dials (#238)
* count published messages

* don't call `switch.dial` in `subscribeToPeer`

* add secureconn constructor

* close in the correct order

* concurrent dial lock and track in/out conns better

* make tests pass

* add todo comment

* disconnect peers that open too many connections

* wip

* do connection and muxer tracking in one place

* prevent nil pointer in observers

* drop connections when peers are over max

* prevent channel leaks

* don't use closure to handle channel
2020-06-24 09:08:44 -06:00
Giovanni Petrantoni 7852c6dd0f
Noise and eth2/nbc fixes (#226)
* Remove noise padding payload (spec removed it)

* add log scope in secure

* avoid defect array out of range in switch secure when "na"

* improve identify traces

* wip noise fixes

* noise protobuf adjustments (trying)

* add more debugging messages/traces, improve their actual contents

* re-enable ID check in noise

* bump go daemon tag version

* bump go daemon tag version

* enable noise in daemonapi

* interop testing, (both secio and noise will be tested)

* azure cache bump (p2pd)

* CI changes

- Travis: use Go 1.14
- azure-pipelines.yml: big cleanup
- Azure: bump cache keys
- build 64-bit p2pd on 32-bit Windows
- install both Mingw-w64 architectures

* noise logging fixes

* alternate testing between noise and secio

* increase timeout to avoid VM errors in CI (multistream tests)

* refactor heartbeat management in gossipsub

* remove locking within heartbeat

* refactor heartbeat management in gossipsub

* remove locking within heartbeat

Co-authored-by: Ștefan Talpalaru <stefantalpalaru@yahoo.com>
2020-06-20 19:56:55 +09:00
Dmitriy Ryajov 5b28e8c488
Cleanup lpstream, Connection and BufferStream (#228)
* count published messages

* don't call `switch.dial` in `subscribeToPeer`

* don't use delegation in connection

* move connection out to own file

* don't breakout on reset

* make sure to call close on secured conn

* add lpstream tracing

* don't break down by conn id

* fix import

* remove unused label

* reset connection on exception

* add additional metrics for skipped messages

* check for nil in secure.close
2020-06-19 11:29:43 -06:00
Dmitriy Ryajov 5960d42c50
remove casts from (#203) 2020-06-02 20:21:11 -06:00
Dmitriy Ryajov bb8bff2195
add sparse message propagation tests to gossipsub (#202)
* add sparse tests to gossipsub

* add send hooks

* remove `all`
2020-06-02 17:53:38 -06:00
Dmitriy Ryajov 20c68a2018 use all() for futures and track connections 2020-06-02 09:10:27 -06:00
Dmitriy Ryajov 6112de746d remove unneeded changes 2020-06-02 09:10:27 -06:00
Dmitriy Ryajov 5f704e6825 rust interop fixes 2020-06-02 09:10:27 -06:00
Dmitriy Ryajov 7b6e1c0688
Gossipsub interop (#189)
* interop fixes

* add custom messageid provider and fix seqno

* use ECDSA for speed

* adding messageid tests

* breakout from publish loop

* addressing review comments

* remove unneeded var

* don't stop broadcasting on failed peers
2020-05-27 12:33:49 -06:00
Dmitriy Ryajov 9132f16927
gossipsub fixes (#186) 2020-05-21 14:24:20 -06:00
Dmitriy Ryajov ba53c08b3c
Track incoming connections (#181)
* call write until all is written out

* wip: rework with proper half-closed

* add eof and closed handling

* wip

* close connection on chronos close

* don't use read

* make noise work again

* don't reraise just yet

* fixes after backporting

* remove on transport close cleanup

* revert back allread

* rust interop fixes

* read from stream

* inc count before closing

* rebasing master

* store incoming connections

* fix merge

* remove unneeded changes

* use internal close flag to indicate disposal
2020-05-21 11:33:48 -06:00
Dmitriy Ryajov 1819502fb5
Cleanup - tests and logging (#178)
* make async for proper exception handling

* tryAndWarn msg messes up Exception msg

* misc: comment out tracker dumps

* cleanup mplex tests

* more informative errors

* give CI time to run

* revert change, because it causes races
2020-05-18 07:49:49 -06:00
Giovanni Petrantoni 7dcb807f64
Crypto utilities resultification (#150) 2020-05-18 07:25:55 +02:00
Giovanni Petrantoni 100d6ef595 Raise expiration time in gossipsub fanout test for slow CI 2020-05-15 11:01:33 +09:00
Jacek Sieka 3053f03814 fix varint issues
* fixes #111
2020-05-11 09:12:23 -06:00
Jacek Sieka ccd019b328
use stream directly in chronosstream (#163)
* use stream directly in chronosstream

for now, chronos.AsyncStream is not used to provide any features on top
of chronos.Stream, so in order to simplify the code, chronosstream can
be used directly.

In particular, the exception handling is broken in the current
chronosstream - opening and closing the stream is simplified this way as
well.

A future implementation that actually takes advantage of the AsyncStream
features would wrap AsyncStream instead as a separate lpstream
implementation, leaving this one as-is.

* work around chronos exception type issue
2020-05-08 22:10:06 +02:00
Giovanni Petrantoni c889224012 Add PubSub observer+ hooks (they can modify as well) 2020-05-08 13:31:52 -06:00
Jacek Sieka 330da51819
removals (#159)
* remove unused stream methods
* reimplement some of them with proc's
* remove broken tests
* Error->Defect for defect
* warning fixes
2020-05-06 18:31:47 +02:00
Dmitriy Ryajov 6da4d2af48
Pubsub signatures flags (#161)
* add verify signature flag

* add sign flag to enable/disable msg signing

* moving internal tests out to their own file

* cleanup nimble file

* remove unneeded tests

* move pubsub tests out

* fix tests
2020-05-06 11:26:08 +02:00
Giovanni Petrantoni 4c6a123d31
Add chronos trackers and use them to sanitize resource disposal (#131)
* Add chronos trackers and use them to sanitize resource disposal

* Chronos trackers for transport tests wip

* No more chronos leaks in testtransport

* Make tcp transport and test more robust when closing

* Test async leaking tracking wip

* Fix a regression in wire connect

* Add chronos trackers to more tests and sanitize resource closure

* Wip fixing floodsub tests

* Floodsub wip

* Made floodsub basically deterministic, hit a nim bug with captures tho

* Wrap up floodsub tests refactor

* Wrapping up

* Add allFuturesThrowing utility

* Fix missing allFuturesThrowing in noise tests!

* Make tests green

* attempt fixing gossipsub failing cases

* Make sure to check also fanout in waitSub

* More verbose traces

* Gossipsub test improvements

* Refactor TcpTransport remove asyncCheck

* Add Connection trackers

* Add stricter connection tracking, wip mplex fix

* More asynccheck removal, in order to avoid connection leaks

* bump chronicles requirement

* Enable tracker dump to check CI output

* Wait for more futures in testmplex

* Remove tracker dump messages

* add tryAndWarn utility, fix mplex issue with go interop

* Add allFuturesThrowing to directchat too

* make sure to cleanup on transport close
2020-04-21 10:24:42 +09:00
Dmitriy Ryajov 5a00510b1f wip: increase timeout 2020-02-25 17:52:08 -06:00
Dmitriy Ryajov eb49d4b218 no empty proto dials and add connect method 2020-02-25 17:52:08 -06:00
Dmitriy Ryajov fbcef69891 implicitly dial pubsub if enabled 2020-02-21 09:21:06 -06:00
Dmitriy Ryajov bf70428316 revert tests order back 2020-02-16 11:31:35 -06:00
Dmitriy Ryajov 9023bf786d remove sleeps 2020-02-16 11:31:35 -06:00
Dmitriy Ryajov acdaeb8f5d working out synchronization issues 2020-02-16 11:31:35 -06:00
Dmitriy Ryajov f6c4d2130a don't forget to await for switch to close 2020-02-16 11:31:35 -06:00
Dmitriy Ryajov 934c858542 increase timeout to allow floodsub to finish 2020-02-16 11:31:35 -06:00
Dmitriy Ryajov 681d324a10 increase timeouts to accommodate CI runs 2020-02-16 11:31:35 -06:00
Dmitriy Ryajov 7f8eb0272e cleanup and fix tests 2020-02-16 11:31:35 -06:00
Dmitriy Ryajov d42833947a fix gossipsub mesh test 2020-01-09 21:59:27 -06:00
Dmitriy Ryajov bb0430e62b remove unnecessary clear timers call 2020-01-09 14:46:00 -06:00
Dmitriy Ryajov d5f92663bc make tests pass 2020-01-09 12:55:21 -06:00
Dmitriy Ryajov 0fb1f1c5b8 strengthen pubsub interop testing 2019-12-24 10:35:35 -06:00