mirror of https://github.com/status-im/nim-eth.git
discv4: Fix Kademlia crash when trying to sync (#342)
Fixes status-im/nim-eth#341, status-im/nimbus-eth1#489. When using discv4 (Kademlia) to find peers, there is a crash after a few minutes. It occurs for most of us on Eth1 mainnet, and everyone on Ropsten. The cause is `findNodes` being called twice in succession to the same peer, within about 5 seconds of each other. ("About" 5 seconds, because Chronos does not guarantee to run the timeout branch at a particular time, due to queuing and clock reading delays.) Then `findNodes` sends a duplicate message to the peer and calls `waitNeighbours` to listen for the reply. There's already a `waitNeighbours` callback in a shared table, so that function hits an assert failure. Ignoring the assert would be wrong as it would break timeout logic, and sending `FindNodes` twice in rapid succession also makes us a bad peer. As a simple workaround, just skip `findNodes` in this state and return a fake empty `Neighbours` reply. This is a bit of a hack as `findNodes` should not be called like this; there's a logic error at a higher level. But it works. Tested for about 4 days constant operation on Ropsten. The crash which occured every few minutes no longer occurs, and discv4 keeps working. Signed-off-by: Jamie Lokier <jamie@shareable.org>
This commit is contained in:
parent
c5dace27ca
commit
e4b4b7f4af
|
@ -371,6 +371,13 @@ proc lookup*(k: KademliaProtocol, nodeId: NodeId): Future[seq[Node]] {.async.} =
|
||||||
var nodesSeen = initHashSet[Node]()
|
var nodesSeen = initHashSet[Node]()
|
||||||
|
|
||||||
proc findNode(nodeId: NodeId, remote: Node): Future[seq[Node]] {.async.} =
|
proc findNode(nodeId: NodeId, remote: Node): Future[seq[Node]] {.async.} =
|
||||||
|
if remote in k.neighboursCallbacks:
|
||||||
|
# Sometimes findNode is called while another findNode is already in flight.
|
||||||
|
# It's a bug when this happens, and the logic should probably be fixed
|
||||||
|
# elsewhere. However, this small fix has been tested and proven adequate.
|
||||||
|
debug "Peer in k.neighboursCallbacks already", peer = remote
|
||||||
|
result = newSeqOfCap[Node](BUCKET_SIZE)
|
||||||
|
return
|
||||||
k.wire.sendFindNode(remote, nodeId)
|
k.wire.sendFindNode(remote, nodeId)
|
||||||
var candidates = await k.waitNeighbours(remote)
|
var candidates = await k.waitNeighbours(remote)
|
||||||
if candidates.len == 0:
|
if candidates.len == 0:
|
||||||
|
|
Loading…
Reference in New Issue