Jamie Lokier e4b4b7f4af
discv4: Fix Kademlia crash when trying to sync (#342)
Fixes status-im/nim-eth#341, status-im/nimbus-eth1#489.

When using discv4 (Kademlia) to find peers, there is a crash after a few
minutes.  It occurs for most of us on Eth1 mainnet, and everyone on Ropsten.

The cause is `findNodes` being called twice in succession to the same peer,
within about 5 seconds of each other.  ("About" 5 seconds, because Chronos does
not guarantee to run the timeout branch at a particular time, due to queuing
and clock reading delays.)

Then `findNodes` sends a duplicate message to the peer and calls
`waitNeighbours` to listen for the reply.  There's already a `waitNeighbours`
callback in a shared table, so that function hits an assert failure.

Ignoring the assert would be wrong as it would break timeout logic, and sending
`FindNodes` twice in rapid succession also makes us a bad peer.

As a simple workaround, just skip `findNodes` in this state and return a fake
empty `Neighbours` reply.  This is a bit of a hack as `findNodes` should not be
called like this; there's a logic error at a higher level.  But it works.

Tested for about 4 days constant operation on Ropsten.  The crash which occured
every few minutes no longer occurs, and discv4 keeps working.

Signed-off-by: Jamie Lokier <jamie@shareable.org>
2021-04-02 23:29:02 +02:00
..
2020-12-18 19:30:53 +02:00
2020-04-20 20:14:39 +02:00
2019-02-05 12:10:36 +02:00
2019-02-05 12:45:09 +02:00
2020-07-07 10:56:26 +02:00
2021-02-13 17:43:17 +07:00
2020-04-20 20:14:39 +02:00
2019-02-05 14:01:10 +02:00