Chrysostomos Nanakos 71bd679365
fix(discovery): prevent premature node eviction from routing table
The findNode and findNodeFast operations were using the default aggressive
removal threshold (1.0) when timing out, while other timeout operations
(ping, talkReq, getProviders) correctly used NoreplyRemoveThreshold (0.5).

This inconsistency caused nodes with excellent reliability (1.0) to be removed
during heavy load scenarios when findNode/findNodeFast operations timed out,
even though the nodes were still healthy and simply slow to respond.

Changed findNode and findNodeFast timeout paths to use NoreplyRemoveThreshold,
ensuring consistent and more tolerant behavior across all timeout scenarios.
This aligns with Kademlia's recommendation to be conservative about removing
nodes, especially during temporary network congestion.
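
The effect of the two thresholds can be illustrated with a minimal sketch.
It is not the project's code: the Node type, the should_evict helper and the
"at or below threshold" comparison are assumptions made for illustration.
Only the two threshold values and the reliability figures come from this
commit message.

```python
from dataclasses import dataclass

# Threshold values quoted in this commit message.
DEFAULT_REMOVE_THRESHOLD = 1.0   # old findNode/findNodeFast timeout behaviour
NOREPLY_REMOVE_THRESHOLD = 0.5   # used by ping, talkReq, getProviders


@dataclass
class Node:
    """Hypothetical stand-in for a routing table entry."""
    node_id: str
    reliability: float  # fraction of requests answered, 0.0 to 1.0


def should_evict(node: Node, threshold: float) -> bool:
    # Assumption: a timed-out node is evicted when its reliability is at or
    # below the threshold. The real comparison may differ.
    return node.reliability <= threshold


# Reliability values taken from the log excerpt in this message.
healthy = Node("1ff*7a561e", 1.0)
slow = Node("130*db8a1b", 0.957)

for node in (healthy, slow):
    before = should_evict(node, DEFAULT_REMOVE_THRESHOLD)
    after = should_evict(node, NOREPLY_REMOVE_THRESHOLD)
    print(f"{node.node_id}: evicted before={before} after={after}")
```

Under these assumptions a threshold of 1.0 evicts every node that times out
once, while 0.5 keeps any node that still answers more than half of its
requests.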

Evidence from logs showing the issue:

DBG - Node added to routing table           topics="discv5 routingtable" tid=1 n=1ff*7a561e:10.244.0.208:6890
DBG - bucket                                topics="discv5" tid=1 depth=0 len=2 standby=0
DBG - node                                  topics="discv5" tid=1 n=130*db8a1b:10.244.2.207:6890 rttMin=1 rttAvg=2 reliability=1.0
DBG - node                                  topics="discv5" tid=1 n=1ff*7a561e:10.244.0.208:6890 rttMin=1 rttAvg=14 reliability=1.0
DBG - Node removed from routing table       topics="discv5 routingtable" tid=1 n=1ff*7a561e:10.244.0.208:6890
DBG - Total nodes in discv5 routing table   topics="discv5" tid=1 total=1
DBG - bucket                                topics="discv5" tid=1 depth=0 len=1 standby=0
DBG - node                                  topics="discv5" tid=1 n=130*db8a1b:10.244.2.207:6890 rttMin=1 rttAvg=165 reliability=0.957
DBG - Node removed from routing table       topics="discv5 routingtable" tid=1 n=130*db8a1b:10.244.2.207:6890
DBG - Total nodes in discv5 routing table   topics="discv5" tid=1 total=0

The first removal (1ff*7a561e) shows a node with perfect reliability (1.0)
and 14ms RTT being evicted. The second (130*db8a1b) shows a node with 95.7%
reliability also being evicted, leaving the routing table empty.

Signed-off-by: Chrysostomos Nanakos <chris@include.gr>
2025-12-16 14:53:41 +02:00