Commit Graph

67 Commits

Author SHA1 Message Date
Jacek Sieka cd1c68dbc5
avoid send deadlock by not allowing send to block (#342)
* avoid send deadlock by not allowing send to block

* handle message issues more consistently
2020-09-01 09:33:03 +02:00
Dmitriy Ryajov d3182c4dba
No raise send (#339)
* dont raise in send

* check that the lock is acquire on release
2020-08-20 20:50:33 -06:00
Jacek Sieka eb13845f65 work around send that may raise
`send` can raise exceptions that together with asyncCheck will
crash NBC
2020-08-19 14:25:30 +03:00
Jacek Sieka f46bf0faa4
remove send lock (#334)
* remove send lock

When mplex receives data it will block until a reader has processed the
data. Thus, when a large message is received, such as a gossipsub
subscription table, all of mplex will be blocked until all reading is
finished.

However, if at the same time a `dial` to establish a gossipsub send
connection is ongoing, that `dial` will be blocked because mplex is no
longer reading data - specifically, it might indeed be the connection
that's processing the previous data that is waiting for a send
connection.

There are other problems with the current code:
* If an exception is raised, it is not necessarily raised for the same
connection as `p.sendConn`, so resetting `p.sendConn` in the exception
handling is wrong
* `p.isConnected` is checked before taking the lock - thus, if it
returns false, a new dial will be started. If a new task enters `send`
before dial is finished, it will also determine `p.isConnected` is
false, then get stuck on the lock - when the previous task finishes and
releases the lock, the new task will _also_ dial and thus reset
`p.sendConn` causing a leak.

* prefer existing connection

simplifies flow
2020-08-17 12:38:27 +02:00
Jacek Sieka b12145dff7
avoid crash when subscribe is received (#333)
...by making subscribeTopic synchronous, avoiding a peer table lookup
completely.

rebalanceMesh will be called a second later - it's fine
2020-08-17 12:10:22 +02:00
Jacek Sieka ab864fc747
logging cleanups and small fixes (#331) 2020-08-15 21:50:31 +02:00
Dmitriy Ryajov b76b3e0e9b
Rework pubsub (#322)
* move pubsub of off switch, pass switch into pubsub

* use join on lpstreams

* properly cleanup up failed peers

* fix tests

* fix peertable hasPeerId

* fix tests

* rework sending, remove helpers from pubsubpeer, unify in broadcast

* further split broadcast into send

* use send where appropriate

* use formatIt

* improve trace

Co-authored-by: Giovanni Petrantoni <giovanni@fragcolor.xyz>
2020-08-11 18:05:49 -06:00
Jacek Sieka c6c0c152c0
Dial peerid (#308)
* prefer PeerID in switch api

This avoids ref issues like ref identity and nil

* use existing peerinfo instance if possible

* remove secureCodec

there may be multiple connections per peerinfo with different codecs

* avoid some extra async::
2020-08-06 09:29:27 +02:00
Giovanni Petrantoni 9bbe5e4841
Fix subclass calls to handleDisconnect (#314)
* Fix subclass calls to handleDisconnect

* add peer ID to nil peer debug message
2020-08-06 11:12:52 +09:00
Ștefan Talpalaru 843d32f8db
put expensive metrics under a Nim define (#310) 2020-08-04 17:27:59 -06:00
Dmitriy Ryajov 980764774e
pubsub timeouts tuning (#295)
* add finegrained timeouts to pubsub

* use 10 millis timeout in tests

* finalization

* revert timeouts

* use `atEof` for reads

* adjust timeouts and use atEof for reads

* use atEof for reads

* set isEof flag

* no backoff for pubsub streams

* temp timer increase, make macos finalize

* don't call `subscribePeer` in libp2p anymore

* more traces

* leak tests

* lower timeouts

* handle exceptions in control message

* don't use `cancelAndWait`

* handle exceptions in helpers

* wip

* don't send empty messages

* check for leaks properly

* don't use cancelAndWait

* don't await subscribption sends

* remove subscrivePeer calls from switch

* trying without the hooks again
2020-08-02 23:20:11 -06:00
Jacek Sieka e655a510cd
misc cleanups (#303) 2020-08-02 12:22:49 +02:00
Dmitriy Ryajov f7fdf31365
Pubsub lifetime (#284)
* lifecycle hooks

* tests

* move trace after closed check

* restore 1 second heartbeat

* await close event

* fix tests

* print direction string

* more trace logging

* add pubsub monitor

* add log scope

* adjust idle timeout

* add exc.msg to trace
2020-07-27 13:33:51 -06:00
Giovanni Petrantoni 3b088f8980
Fix some unsubscribe issues and add unsubscribeAll helper (#282)
* Fix some unsub issues and add unsuball helper

* batch sendprune in unsubscribe methods

* add unsubscribeAll for floodsub
2020-07-20 10:16:13 -06:00
Dmitriy Ryajov 94196fee71
Connections and pubsub peers cleanup (#279)
* better peer tracking and cleanup

* check if peer and conn is nil

* test name

* make timeout more agressive

* rename method for better clarity
2020-07-17 13:46:24 -06:00
Dmitriy Ryajov 0348773ec9
Connection manager (#277)
* splitting out connection management

* wip

* wip conn mngr tests

* set peerinfo in contructor

* comments and documentation

* tests

* wip

* add `None` to detect untagged connections

* use `PeerID` to index connections

* fix tests

* remove useless equality
2020-07-17 09:36:48 -06:00
Jacek Sieka 170685f9c6
gossipsub fixes (#276)
* graft up to D peers
* fix logging so it's clear who is grafting/pruning who
* clear fanout when grafting
2020-07-16 21:26:57 +02:00
Jacek Sieka c76152f2c1
Simplify send (#271)
* PubSubPeer.send single message

* gossipsub: simplify send further
2020-07-16 12:06:57 +02:00
Dmitriy Ryajov f35b8999b3
some light cleanup for pub/gossip sub (#273)
* move peer table out to its own file

* move peer table

* cleanup `==` and add one to peerinfo

* add peertable

* missed equality check
2020-07-15 13:18:55 -06:00
Giovanni Petrantoni d7bab37119
Fix gossip messages seqno according to spec (#253)
* Fix gossip messages seqno according to spec

* Add peers back to gossipsub table, slow down heartbeat

* Revert "Add peers back to gossipsub table, slow down heartbeat"

This reverts commit 01e2e62172.

* make seqno a threadvar, remove from peerinfo

* seqno refactor, into pubsub
2020-07-14 21:51:33 -06:00
Jacek Sieka 87e58c1c8d
metrics: one more pubsub peers fix 2020-07-13 16:16:46 +02:00
Jacek Sieka c7895ccc52
metrics: fix pubsub_peers add metric 2020-07-13 16:15:27 +02:00
Giovanni Petrantoni fcda0f6ce1
PubSubPeer tables refactor (#263)
* refactor peer tables

* tests fixing

* override PubSubPeer equality

* fix pubsubpeer comparison
2020-07-13 15:32:38 +02:00
Dmitriy Ryajov 4c815d75e7
More gossip cleanup (#257)
* more cleanup

* correct pubsub peer count

* close the stream first

* handle cancelation

* fix tests

* fix fanout ttl

* merging master

* remove `withLock` as it conflicts with stdlib

* fix trace build

Co-authored-by: Giovanni Petrantoni <giovanni@fragcolor.xyz>
2020-07-09 14:21:47 -06:00
Giovanni Petrantoni f9e0a1f069 CI fix handleDisconnect (pubsub) 2020-07-09 13:56:59 +09:00
Giovanni Petrantoni 9b8b159abb Remove other spurious getStacktrace in pubsub traces 2020-07-09 13:19:34 +09:00
Giovanni Petrantoni 4698f41a91 Remove stacktrace logging from pubsub connect 2020-07-09 12:23:03 +09:00
Dmitriy Ryajov a52763cc6d
fix publishing (#250)
* use var semantics to optimize table access

* wip... lvalues don't work properly sadly...

* big publish refactor, replenish and balance

* fix internal tests

* use g.peers for fanout (todo: don't include flood peers)

* exclude non gossip from fanout

* internal test fixes

* fix flood tests

* fix test's trypublish

* test interop fixes

* make sure to not remove peers from gossip table

* restore old replenishFanout

* cleanups

* Cleanup resources (#246)

* consolidate reading in lpstream

* remove debug echo

* tune log level

* add channel cleanup and cancelation handling

* cancelation handling

* cancelation handling

* cancelation handling

* cancelation handling

* cleanup and cancelation handling

* cancelation handling

* cancelation

* tests

* rename isConnected to connected

* remove testing trace

* comment out debug stacktraces

* explicit raises

* restore trace vs debug in gossip

* improve fanout replenish behavior further

* cleanup stale peers more eaguerly

* synchronize connection cleanup and small refactor

* close client first and call parent second

* disconnect failed peers on publish

* check for publish result

* fix tests

* fix tests

* always call close

Co-authored-by: Giovanni Petrantoni <giovanni@fragcolor.xyz>
2020-07-07 18:33:05 -06:00
Giovanni Petrantoni ec00c7fc50
Peer resultification and defect only (#245)
* Peer resultification and defect only

* Fixing some tests

* test fixes

* Rename peer into peerid

* better result error message in identify

* further merge fixes
2020-07-01 08:25:09 +02:00
Dmitriy Ryajov c788a6a3c0
Cleanup resources (#246)
* consolidate reading in lpstream

* remove debug echo

* tune log level

* add channel cleanup and cancelation handling

* cancelation handling

* cancelation handling

* cancelation handling

* cancelation handling

* cleanup and cancelation handling

* cancelation handling

* cancelation

* tests

* rename isConnected to connected

* remove testing trace

* comment out debug stacktraces

* explicit raises
2020-06-29 09:15:31 -06:00
Jacek Sieka aa6756dfe0
allow message id provider to be specified (#243)
* don't send public key in message when not signing (information leak)
* don't run rebalance if there are peers in gossip (see #242)
* don't crash randomly on bad peer id from remote
2020-06-28 09:56:38 -06:00
Dmitriy Ryajov 5b28e8c488
Cleanup lpstream, Connection and BufferStream (#228)
* count published messages

* don't call `switch.dial` in `subscribeToPeer`

* don't use delegation in connection

* move connection out to own file

* don't breakout on reset

* make sure to call close on secured conn

* add lpstream tracing

* don't breackdown by conn id

* fix import

* remove unused lable

* reset  connection on exception

* add additional metrics for skipped messages

* check for nil in secure.close
2020-06-19 11:29:43 -06:00
Dmitriy Ryajov fe828d87d8
count published messages (#224) 2020-06-16 22:14:02 -06:00
Dmitriy Ryajov ac04ca6e31
make sure keys exist and more metrics (#215) 2020-06-11 20:20:58 -06:00
Dmitriy Ryajov 6b196ad7b4
remove pubsub peer on disconnect (#212)
* remove pubsub peer on disconnect

* make sure lock is aquired

* add $

* count upgrades/dials/disconnects
2020-06-11 08:45:59 -06:00
Viktor Kirilov 1afec627c2
proper name for topics so that we can filter dynamically using chronicles (#210)
* proper name for topics so that we can filter dynamically using chronicles

* lowercase
2020-06-10 10:48:01 +02:00
Giovanni Petrantoni 82b4ed8f44 use declareCounter rather then gauge for certain metrics 2020-06-07 16:41:23 +09:00
Giovanni Petrantoni a6a2a81711
Start adding some metrics to pubsub (#192)
* Start adding some metrics to pubsub

In order to visualize it's functionality
Still WIP

* more metrics

* add per topic metrics

* finishup with requested metrics

* add a metrisServer define to start local server

* PR fixes and cleanup
2020-06-07 09:15:21 +02:00
Dmitriy Ryajov 293b7da295 typo 2020-06-02 09:10:27 -06:00
Dmitriy Ryajov 93e5805c01 better exception handling 2020-06-02 09:10:27 -06:00
Dmitriy Ryajov d4bdb42046 gossipsub fixes 2020-06-02 09:10:27 -06:00
Dmitriy Ryajov 7b6e1c0688
Gossipsub interop (#189)
* interop fixes

* add custom messageid provider and fix seqno

* use ECDSA for speed

* adding messageid tests

* breakout from publish loop

* addressing review comments

* remove unneded var

* dont stop broadcasting on failed peers
2020-05-27 12:33:49 -06:00
Dmitriy Ryajov 9132f16927
gossipsub fixes (#186) 2020-05-21 14:24:20 -06:00
Dmitriy Ryajov ba53c08b3c
Track incoming connections (#181)
* call write until all is written out

* wip: rework with proper half-closed

* add eof and closed handling

* wip

* close connection on chronos close

* don't use read

* make noise work again

* don't reraise just yet

* fixes after backporting

* remove on transport close cleanup

* revert back allread

* rust interop fixes

* read from stream

* inc count before closing

* rebasing master

* store incomming connections

* fix merge

* remove unneeded changes

* use internal close flag to indicate disposal
2020-05-21 11:33:48 -06:00
Jacek Sieka 69abf5097d
handle a few exceptions (#170)
* handle a few exceptions

Some of these are maybe too aggressive, but in return, they'll log
their exception - more refactoring needed to sort this out - right now
we get crashes on unhandled exceptions of unknown origin

* during connection setup
* while closing channels
* while processing pubsubs

* catch exceptions that are raised and don't try to catch exceptions that are not raised

* propagate cancellederror

* one more

* more

* more

* make interop tests less fragile

* Raise expiration time in gossipsub fanout test for slow CI

Co-authored-by: Dmitriy Ryajov <dryajov@gmail.com>
Co-authored-by: Giovanni Petrantoni <giovanni@fragcolor.xyz>
2020-05-14 21:56:56 -06:00
Giovanni Petrantoni c889224012 Add PubSub observer+ hooks (they can modify as well) 2020-05-08 13:31:52 -06:00
Jacek Sieka 330da51819
removals (#159)
* remove unused stream methods
* reimplement some of them with proc's
* remove broken tests
* Error->Defect for defect
* warning fixes
2020-05-06 18:31:47 +02:00
Dmitriy Ryajov 6da4d2af48
Pubsub signatures flags (#161)
* add verify signature flag

* add sign flag to enable/disable msg signing

* moving internal tests out to their own file

* cleanup nimble file

* remove unneeded tests

* move pubsub tests out

* fix tests
2020-05-06 11:26:08 +02:00
Giovanni Petrantoni 303ec297da
Start removing allFutures (#125)
* Start removing allFutures

* More allfutures removal

* Complete allFutures removal except legacy and tests

* Introduce table values copies to prevent error

* Switch to allFinished

* Resolve TODOs in flood/gossip

* muxer handler, log and re-raise

* Add a common and flexible way to check multiple futures
2020-04-11 13:08:25 +09:00
Dmitriy Ryajov 6cbcc7859e reduse usssage of asyncCheck 2020-04-07 12:16:59 -06:00