nim-eth/eth/p2p
Jamie Lokier e4b4b7f4af
discv4: Fix Kademlia crash when trying to sync (#342)
Fixes status-im/nim-eth#341, status-im/nimbus-eth1#489.

When using discv4 (Kademlia) to find peers, there is a crash after a few
minutes.  It occurs for most of us on Eth1 mainnet, and everyone on Ropsten.

The cause is `findNodes` being called twice in succession for the same peer,
within about 5 seconds of each other.  ("About" 5 seconds, because Chronos does
not guarantee to run the timeout branch at a particular time, due to queuing
and clock reading delays.)

Then `findNodes` sends a duplicate message to the peer and calls
`waitNeighbours` to listen for the reply.  There's already a `waitNeighbours`
callback in a shared table, so that function hits an assert failure.

Ignoring the assert would be wrong as it would break timeout logic, and sending
`FindNodes` twice in rapid succession also makes us a bad peer.

As a simple workaround, just skip `findNodes` in this state and return a fake
empty `Neighbours` reply.  This is a bit of a hack as `findNodes` should not be
called like this; there's a logic error at a higher level.  But it works.
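
The guard described above can be modelled roughly as follows. This is a
minimal Python `asyncio` sketch, not the actual Nim/Chronos code in
`kademlia.nim`: the `Discovery` class, the `pending` table, the `sent` log, and
the shortened timeout are all hypothetical names and values chosen for
illustration. Only `findNodes`, `waitNeighbours`, the shared callback table,
and the assert come from the description above.

```python
import asyncio

# Hypothetical value: the commit mentions roughly 5 seconds; shortened here.
NEIGHBOURS_TIMEOUT = 0.05


class Discovery:
    def __init__(self):
        # One pending reply per peer; stands in for the shared callback table.
        self.pending = {}
        # Log of FindNodes messages that would have gone out on the wire.
        self.sent = []

    async def wait_neighbours(self, peer):
        # Mirrors the assert that fired: only one waiter per peer is allowed.
        assert peer not in self.pending, "already waiting for this peer"
        fut = asyncio.get_running_loop().create_future()
        self.pending[peer] = fut
        try:
            return await asyncio.wait_for(fut, NEIGHBOURS_TIMEOUT)
        except asyncio.TimeoutError:
            return []
        finally:
            del self.pending[peer]

    async def find_nodes(self, peer):
        # Workaround: a request to this peer is still in flight, so skip the
        # duplicate message and return a fake empty Neighbours reply.
        if peer in self.pending:
            return []
        self.sent.append(peer)
        return await self.wait_neighbours(peer)

    def on_neighbours(self, peer, nodes):
        # Called when a Neighbours reply arrives from the peer.
        fut = self.pending.get(peer)
        if fut is not None and not fut.done():
            fut.set_result(nodes)


async def demo():
    d = Discovery()
    first = asyncio.create_task(d.find_nodes("peer-a"))
    await asyncio.sleep(0)                  # let the first call register its waiter
    second = await d.find_nodes("peer-a")   # duplicate call while first is pending
    d.on_neighbours("peer-a", ["node-1", "node-2"])
    return await first, second, d.sent


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this model the second call returns an empty list without sending anything,
while the first call still receives the real reply. Without the `if peer in
self.pending` check, the second call would reach `wait_neighbours` and trip the
assert, which is the failure mode described above.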

Tested for about 4 days of constant operation on Ropsten.  The crash which
occurred every few minutes no longer occurs, and discv4 keeps working.

Signed-off-by: Jamie Lokier <jamie@shareable.org>
2021-04-02 23:29:02 +02:00
..
discoveryv5 Use chronos http server for dcli metrics and remove insecure compile flag (#343) 2021-04-02 17:29:38 +02:00
private Add raises annotations to make exception tracking work (#336) 2021-03-24 12:52:09 +01:00
rlpx_protocols Add raises annotation to the FilterMsgHandler proc type (#337) 2021-03-25 15:06:12 +01:00
auth.nim use bearssl rng throughout (#265) 2020-07-07 10:56:26 +02:00
blockchain_sync.nim cleanups (#226) 2020-04-18 10:17:59 +02:00
blockchain_utils.nim Cleanup unneeded check in getBlockHeaders 2019-07-09 17:06:20 +02:00
bootnodes.nim add goerli bootnodes 2020-06-19 12:15:05 +03:00
discovery.nim use bearssl rng throughout (#265) 2020-07-07 10:56:26 +02:00
ecies.nim Fix LSWAP problem. (#275) 2020-07-10 23:30:34 +02:00
enode.nim Secp more refactor (#211) 2020-04-06 18:24:15 +02:00
kademlia.nim discv4: Fix Kademlia crash when trying to sync (#342) 2021-04-02 23:29:02 +02:00
mock_peers.nim Secp more refactor (#211) 2020-04-06 18:24:15 +02:00
p2p_backends_helpers.nim Remove some unused code from NBC by making it RLPx-specific 2020-10-05 17:28:58 +03:00
p2p_protocol_dsl.nim Fix compilation for 1.4 2020-10-16 20:06:59 +03:00
p2p_tracing.nim Deal with bit rot in the p2p tracing support 2020-03-18 20:43:53 +02:00
p2p_tracing_ctail_plugin.nim Moved eth-p2p to eth 2019-02-05 17:40:29 +02:00
peer_pool.nim turn networkId into distinct uint 2021-02-13 17:43:17 +07:00
rlpx.nim Add raises annotations to make exception tracking work (#336) 2021-03-24 12:52:09 +01:00
rlpxcrypt.nim cleanup (#238) 2020-05-21 11:58:19 +02:00
sync.nim Rebrand asyncdispatch2 to chronos (#2) 2019-02-06 17:01:04 +01:00
whispernodes.nim Add Status test nodes (#216) 2020-04-08 15:21:48 +02:00