nimbus-eth1/nimbus/sync/snap/worker/com/get_trie_nodes.nim

# Nimbus
# Copyright (c) 2018-2021 Status Research & Development GmbH
# Licensed and distributed under either of
#   * Apache License, version 2.0, ([LICENSE-APACHE](LICENSE-APACHE) or
#     http://www.apache.org/licenses/LICENSE-2.0)
#   * MIT license ([LICENSE-MIT](LICENSE-MIT) or
#     http://opensource.org/licenses/MIT)
# at your option. This file may not be copied, modified, or distributed
# except according to those terms.

{.push raises: [].}

import
  std/[options, sequtils],
  chronos,
  eth/[common, p2p],
  "../../.."/[protocol, protocol/trace_config],
  "../.."/[constants, range_desc, worker_desc],
  ./com_error

logScope:
  topics = "snap-fetch"

type
  # SnapTrieNodes = object
  #   nodes*: seq[Blob]

  GetTrieNodes* = object
    leftOver*: seq[SnapTriePaths]   ## Unprocessed data
    nodes*: seq[NodeSpecs]          ## `nodeKey` field unused with `NodeSpecs`

  ProcessReplyStep = object
    leftOver: SnapTriePaths         # Unprocessed data sets
    nodes: seq[NodeSpecs]           # Processed nodes
    topInx: int                     # Index of first unprocessed item

# ------------------------------------------------------------------------------
# Private functions
# ------------------------------------------------------------------------------

proc getTrieNodesReq(
    buddy: SnapBuddyRef;
    stateRoot: Hash256;
    paths: seq[SnapTriePaths];
    pivot: string;
      ): Future[Result[Option[SnapTrieNodes],void]]
      {.async.} =
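  ## Network wrapper: send the `snap/1` `GetTrieNodes` request to this
  ## worker's peer and hand back the (optional) reply, or `err()` if the
  ## call raised an exception.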
  let
    peer = buddy.peer
  try:
    let reply = await peer.getTrieNodes(
      stateRoot, paths, fetchRequestBytesLimit)
    return ok(reply)

  except CatchableError as e:
    let error {.used.} = e.msg
    trace trSnapRecvError & "waiting for GetTrieNodes reply", peer, pivot,
      error
    return err()

proc processReplyStep(
    paths: SnapTriePaths;
    nodeBlobs: seq[Blob];
    startInx: int
      ): ProcessReplyStep =
  ## Process reply item, return unprocessed remainder.
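  ## An item is either an account node request (`slotPaths` empty, one reply
  ## blob consumed) or a storage paths request (one reply blob per slot path,
  ## as far as the reply reaches.)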

  # Account node request
  if paths.slotPaths.len == 0:
    if nodeBlobs[startInx].len == 0:
      result.leftOver.accPath = paths.accPath
    else:
      result.nodes.add NodeSpecs(
        partialPath: paths.accPath,
        data: nodeBlobs[startInx])
    result.topInx = startInx + 1
    return

  # Storage paths request
  let
    nSlotPaths = paths.slotPaths.len
    maxLen = min(nSlotPaths, nodeBlobs.len - startInx)

  # Fill up nodes
  for n in 0 ..< maxLen:
    let nodeBlob = nodeBlobs[startInx + n]
    if 0 < nodeBlob.len:
      result.nodes.add NodeSpecs(
        partialPath: paths.slotPaths[n],
        data: nodeBlob)
    else:
      result.leftOver.slotPaths.add paths.slotPaths[n]
  result.topInx = startInx + maxLen

  # Was that all for this step? Otherwise add some left over.
  if maxLen < nSlotPaths:
    result.leftOver.slotPaths &= paths.slotPaths[maxLen ..< nSlotPaths]

  if 0 < result.leftOver.slotPaths.len:
    result.leftOver.accPath = paths.accPath

# ------------------------------------------------------------------------------
# Public functions
# ------------------------------------------------------------------------------

proc getTrieNodes*(
    buddy: SnapBuddyRef;
    stateRoot: Hash256;          # Current DB base (see `pivot` for logging)
    paths: seq[SnapTriePaths];   # Nodes to fetch
    pivot: string;               # For logging, instead of `stateRoot`
      ): Future[Result[GetTrieNodes,ComError]]
      {.async.} =
  ## Fetch data using the `snap#` protocol, returns the trie nodes requested
  ## (if any.)
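  ##
  ## Minimal caller sketch (hypothetical names; assumes an active
  ## `SnapBuddyRef` in `buddy`, a pivot state root in `stateRoot`, a partial
  ## account path blob in `somePath`, and a log label in `pivotStr`):
  ##
  ## .. code-block:: nim
  ##   let rc = await buddy.getTrieNodes(
  ##     stateRoot, @[SnapTriePaths(accPath: somePath)], pivotStr)
  ##   if rc.isOk:
  ##     for node in rc.value.nodes:
  ##       discard node.data          # RLP encoded trie node
  ##     discard rc.value.leftOver    # request items to re-queue, if any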
  let
    peer {.used.} = buddy.peer
    nPaths = paths.len

  if nPaths == 0:
    return err(ComEmptyRequestArguments)

  # One node is expected per account-only entry, one per slot path otherwise
  let nTotal = paths.mapIt(max(1,it.slotPaths.len)).foldl(a+b, 0)

  if trSnapTracePacketsOk:
    trace trSnapSendSending & "GetTrieNodes", peer, pivot, nPaths, nTotal

  let trieNodes = block:
    let rc = await buddy.getTrieNodesReq(stateRoot, paths, pivot)
    if rc.isErr:
      return err(ComNetworkProblem)
    if rc.value.isNone:
      trace trSnapRecvTimeoutWaiting & "for TrieNodes", peer, pivot, nPaths
      return err(ComResponseTimeout)
    let blobs = rc.value.get.nodes
    if nTotal < blobs.len:
      # Oops, the peer returned more nodes than requested
      return err(ComTooManyTrieNodes)
    blobs

  let
    nNodes = trieNodes.len

  if nNodes == 0:
    # github.com/ethereum/devp2p/blob/master/caps/snap.md#gettrienodes-0x06
    #
    # Notes:
    # * Nodes must always respond to the query.
    # * The returned nodes must be in the request order.
    # * If the node does not have the state for the requested state root or
    #   for any requested account paths, it must return an empty reply. It is
    #   the responsibility of the caller to query a state not older than 128
    #   blocks; and the caller is expected to only ever query existing trie
    #   nodes.
    # * The responding node is allowed to return less data than requested
    #   (serving QoS limits), but the node must return at least one trie node.
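    #
    # An empty reply is thus no protocol violation; it merely means this peer
    # cannot serve the pivot state, so an error is returned and the caller
    # decides how to proceed (e.g. try another peer or pivot.)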
    trace trSnapRecvReceived & "empty TrieNodes", peer, pivot, nPaths, nNodes
    return err(ComNoTrieNodesAvailable)

  # Assemble return value
  var
    dd = GetTrieNodes()
    inx = 0
  for p in paths:
    let step = p.processReplyStep(trieNodes, inx)
    if 0 < step.leftOver.accPath.len or
       0 < step.leftOver.slotPaths.len:
      dd.leftOver.add step.leftOver
    if 0 < step.nodes.len:
      dd.nodes &= step.nodes
    inx = step.topInx
    if trieNodes.len <= inx:
      break

  trace trSnapRecvReceived & "TrieNodes", peer, pivot,
    nPaths, nNodes, nLeftOver=dd.leftOver.len

  return ok(dd)

# ------------------------------------------------------------------------------
# End
# ------------------------------------------------------------------------------