Chrysostomos Nanakos 71bd679365
fix(discovery): prevent premature node eviction from routing table
The findNode and findNodeFast operations were using the default aggressive
removal threshold (1.0) when timing out, while other timeout operations
(ping, talkReq, getProviders) correctly used NoreplyRemoveThreshold (0.5).

This inconsistency caused nodes with excellent reliability (1.0) to be removed
during heavy load scenarios when findNode/findNodeFast operations timed out,
even though the nodes were still healthy and simply slow to respond.

Changed findNode and findNodeFast timeout paths to use NoreplyRemoveThreshold,
ensuring consistent and more tolerant behavior across all timeout scenarios.
This aligns with Kademlia's recommendation to be conservative about removing
nodes, especially during temporary network congestion.
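The effect of the change can be illustrated with a minimal Nim sketch. Apart from NoreplyRemoveThreshold, the names and the exact eviction rule below are hypothetical simplifications, not the actual routing-table implementation:

```nim
# Illustrative sketch only: apart from NoreplyRemoveThreshold, the names
# and the precise eviction semantics here are hypothetical.
const
  DefaultRemoveThreshold = 1.0   # old behaviour: aggressive eviction on timeout
  NoreplyRemoveThreshold = 0.5   # shared, tolerant threshold for no-reply timeouts

proc evictOnTimeout(reliability, threshold: float): bool =
  ## A timed-out node is removed only when its reliability has fallen
  ## to or below the removal threshold.
  reliability <= threshold

# Under the old default, even a node with perfect reliability is evicted
# when a findNode request times out; with the shared threshold it is kept.
echo evictOnTimeout(1.0, DefaultRemoveThreshold)   # true
echo evictOnTimeout(1.0, NoreplyRemoveThreshold)   # false
```

This also matches the logged eviction of the 0.957-reliability node: 0.957 is at or below the old threshold of 1.0, but well above 0.5.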

Evidence from logs showing the issue:

DBG - Node added to routing table           topics="discv5 routingtable" tid=1 n=1ff*7a561e:10.244.0.208:6890
DBG - bucket                                topics="discv5" tid=1 depth=0 len=2 standby=0
DBG - node                                  topics="discv5" tid=1 n=130*db8a1b:10.244.2.207:6890 rttMin=1 rttAvg=2 reliability=1.0
DBG - node                                  topics="discv5" tid=1 n=1ff*7a561e:10.244.0.208:6890 rttMin=1 rttAvg=14 reliability=1.0
DBG - Node removed from routing table       topics="discv5 routingtable" tid=1 n=1ff*7a561e:10.244.0.208:6890
DBG - Total nodes in discv5 routing table   topics="discv5" tid=1 total=1
DBG - bucket                                topics="discv5" tid=1 depth=0 len=1 standby=0
DBG - node                                  topics="discv5" tid=1 n=130*db8a1b:10.244.2.207:6890 rttMin=1 rttAvg=165 reliability=0.957
DBG - Node removed from routing table       topics="discv5 routingtable" tid=1 n=130*db8a1b:10.244.2.207:6890
DBG - Total nodes in discv5 routing table   topics="discv5" tid=1 total=0

The first removal shows a node with perfect reliability (1.0) and a 14 ms average RTT being evicted; the second shows a node with 95.7% reliability suffering the same fate.

Signed-off-by: Chrysostomos Nanakos <chris@include.gr>
2025-12-16 14:53:41 +02:00

A DHT implementation for Logos Storage

License: Apache-2.0 / MIT. Stability: experimental.

This DHT implementation aims to provide a DHT for Logos Storage with the following properties:

  • flexible secure transport usage, with
    • fast UDP-based operation
    • eventual fallback to TCP-based operation (possibly through libp2p)
    • eventual support for operation on top of libp2p
  • flexible message encoding that plays well with the above transports
  • node lookup, content storage/lookup, and provider storage/lookup operations

The current implementation is based on nim-eth's Discovery v5 implementation.

Base files were copied from status-im/nim-eth@779d767b024175a51cf74c79ec7513301ebe2f46

Building

This repo is set up to use Nimble lockfiles. This requires Nimble 0.14+, which was not installed by default at the time of writing. If nimble -v reports 0.13.x, you will need to install Nimble 0.14. Note that using Nimble 0.14 changes how Nimble behaves!

Nimble 0.14 can be installed with:

nimble install nimble@0.14.2

After this you can set up your Nimble environment. Note that this will build the pinned version of Nim! The first run can take ~15 minutes.

nimble setup # creates nimble.paths, used by the rest of the Nimble commands
nimble testAll

You can also run tasks directly:

nim testAll