mirror of
https://github.com/logos-storage/logos-storage-nim-dht.git
synced 2026-01-02 13:33:08 +00:00
The findNode and findNodeFast operations were using the default
aggressive removal threshold (1.0) on timeout, while the other timeout
paths (ping, talkReq, getProviders) correctly used
NoreplyRemoveThreshold (0.5). This inconsistency caused nodes with
excellent reliability (1.0) to be removed during heavy load when
findNode/findNodeFast operations timed out, even though the nodes were
still healthy and merely slow to respond.

Changed the findNode and findNodeFast timeout paths to use
NoreplyRemoveThreshold, ensuring consistent and more tolerant behavior
across all timeout scenarios. This aligns with Kademlia's
recommendation to be conservative about removing nodes, especially
during temporary network congestion.

Evidence from logs showing the issue:

DBG - Node added to routing table topics="discv5 routingtable" tid=1 n=1ff*7a561e:10.244.0.208:6890
DBG - bucket topics="discv5" tid=1 depth=0 len=2 standby=0
DBG - node topics="discv5" tid=1 n=130*db8a1b:10.244.2.207:6890 rttMin=1 rttAvg=2 reliability=1.0
DBG - node topics="discv5" tid=1 n=1ff*7a561e:10.244.0.208:6890 rttMin=1 rttAvg=14 reliability=1.0
DBG - Node removed from routing table topics="discv5 routingtable" tid=1 n=1ff*7a561e:10.244.0.208:6890
DBG - Total nodes in discv5 routing table topics="discv5" tid=1 total=1
DBG - bucket topics="discv5" tid=1 depth=0 len=1 standby=0
DBG - node topics="discv5" tid=1 n=130*db8a1b:10.244.2.207:6890 rttMin=1 rttAvg=165 reliability=0.957
DBG - Node removed from routing table topics="discv5 routingtable" tid=1 n=130*db8a1b:10.244.2.207:6890
DBG - Total nodes in discv5 routing table topics="discv5" tid=1 total=0

The first removal shows a node with perfect reliability (1.0) and 14 ms
average RTT being evicted; the second shows a node with 95.7%
reliability also being evicted.

Signed-off-by: Chrysostomos Nanakos <chris@include.gr>
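For illustration only, the threshold semantics described above can be sketched as follows. This is a hypothetical model, not the actual nim-dht code: the function names, the decay factor, and the "evict when the score falls below the threshold" rule are all assumptions made to show why a 1.0 threshold evicts a perfectly reliable node after a single timeout while 0.5 does not.

```python
# Hypothetical sketch of the eviction-threshold semantics.
# Assumptions (not from the real codebase): reliability is a score in
# [0, 1] that decays multiplicatively on each timeout, and a node is
# evicted once its score drops below the caller's removal threshold.

NOREPLY_REMOVE_THRESHOLD = 0.5  # tolerant threshold (ping, talkReq, getProviders)
AGGRESSIVE_THRESHOLD = 1.0      # old findNode/findNodeFast behavior

def reliability_after_timeout(reliability: float, decay: float = 0.9) -> float:
    """Decay the reliability score after one unanswered request (assumed model)."""
    return reliability * decay

def should_evict(reliability: float, threshold: float) -> bool:
    """Evict the node once its score has fallen below the threshold."""
    return reliability < threshold

# A perfectly reliable node (score 1.0) times out once under heavy load:
score = reliability_after_timeout(1.0)

# Old behavior: threshold 1.0 evicts it after that single timeout.
print(should_evict(score, AGGRESSIVE_THRESHOLD))       # True

# New behavior: threshold 0.5 keeps it until it is genuinely unreliable.
print(should_evict(score, NOREPLY_REMOVE_THRESHOLD))   # False
```

Under this assumed decay model, the tolerant threshold only evicts a node after several consecutive timeouts, which matches the intent of keeping slow-but-healthy nodes in the routing table during congestion.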