nimbus-eth1/nimbus/sync/snap/worker/db/hexary_inspect.nim

380 lines
13 KiB
Nim
Raw Normal View History

# nimbus-eth1
# Copyright (c) 2021 Status Research & Development GmbH
# Licensed under either of
# * Apache License, version 2.0, ([LICENSE-APACHE](LICENSE-APACHE) or
# http://www.apache.org/licenses/LICENSE-2.0)
# * MIT license ([LICENSE-MIT](LICENSE-MIT) or
# http://opensource.org/licenses/MIT)
# at your option. This file may not be copied, modified, or distributed
# except according to those terms.
import
Snap sync refactor healing (#1397) * Simplify accounts healing threshold management why: Was over-engineered. details: Previously, healing was based on recursive hexary trie perusal. Due to "cheap" envelope decomposition of a range complement for the hexary trie, the cost of running extra laps have become time-affordable again and a simple trigger mechanism for healing will do. * Control number of dangling result nodes in `hexaryInspectTrie()` also: + Returns number of visited nodes available for logging so the maximum number of nodes can be tuned accordingly. + Some code and docu update * Update names of constants why: Declutter, more systematic naming * Re-implemented `worker_desc.merge()` for storage slots why: Provided as proper queue management in `storage_queue_helper`. details: + Several append modes (replaces `merge()`) + Added third queue to record entries currently fetched by a worker. So another parallel running worker can safe the complete set of storage slots in as checkpoint. This was previously lost. * Refactor healing why: Simplify and remove deep hexary trie perusal for finding completeness. Due to "cheap" envelope decomposition of a range complement for the hexary trie, the cost of running extra laps have become time-affordable again and a simple trigger mechanism for healing will do. * Docu update * Run a storage job only once in download loop why: Download failure or rejection (i.e. missing data) lead to repeated fetch requests until peer disconnects, otherwise.
2022-12-24 09:54:18 +00:00
std/[sequtils, strutils, tables],
chronicles,
eth/[common, trie/nibbles],
stew/results,
../../range_desc,
"."/[hexary_desc, hexary_paths]
{.push raises: [].}
logScope:
topics = "snap-db"
Snap sync refactor healing (#1397) * Simplify accounts healing threshold management why: Was over-engineered. details: Previously, healing was based on recursive hexary trie perusal. Due to "cheap" envelope decomposition of a range complement for the hexary trie, the cost of running extra laps have become time-affordable again and a simple trigger mechanism for healing will do. * Control number of dangling result nodes in `hexaryInspectTrie()` also: + Returns number of visited nodes available for logging so the maximum number of nodes can be tuned accordingly. + Some code and docu update * Update names of constants why: Declutter, more systematic naming * Re-implemented `worker_desc.merge()` for storage slots why: Provided as proper queue management in `storage_queue_helper`. details: + Several append modes (replaces `merge()`) + Added third queue to record entries currently fetched by a worker. So another parallel running worker can safe the complete set of storage slots in as checkpoint. This was previously lost. * Refactor healing why: Simplify and remove deep hexary trie perusal for finding completeness. Due to "cheap" envelope decomposition of a range complement for the hexary trie, the cost of running extra laps have become time-affordable again and a simple trigger mechanism for healing will do. * Docu update * Run a storage job only once in download loop why: Download failure or rejection (i.e. missing data) lead to repeated fetch requests until peer disconnects, otherwise.
2022-12-24 09:54:18 +00:00
type
TrieNodeStatCtxRef* = ref object
## Context to resume searching for dangling links
case persistent*: bool
of true:
hddCtx*: seq[(NodeKey,NibblesSeq)]
else:
memCtx*: seq[(RepairKey,NibblesSeq)]
TrieNodeStat* = object
## Trie inspection report
dangling*: seq[NodeSpecs] ## Referes to nodes with incomplete refs
count*: uint64 ## Number of nodes visited
level*: uint8 ## Maximum nesting depth of dangling nodes
stopped*: bool ## Potential loop detected if `true`
resumeCtx*: TrieNodeStatCtxRef ## Context for resuming inspection
const
extraTraceMessages = false # or true
Prep for full sync after snap (#1253) * Split fetch accounts into sub-modules details: There will be separated modules for accounts snapshot, storage snapshot, and healing for either. * Allow to rebase pivot before negotiated header why: Peers seem to have not too many snapshots available. By setting back the pivot block header slightly, the chances might be higher to find more peers to serve this pivot. Experiment on mainnet showed that setting back too much (tested with 1024), the chances to find matching snapshot peers seem to decrease. * Add accounts healing * Update variable/field naming in `worker_desc` for readability * Handle leaf nodes in accounts healing why: There is no need to fetch accounts when they had been added by the healing process. On the flip side, these accounts must be checked for storage data and the batch queue updated, accordingly. * Reorganising accounts hash ranges batch queue why: The aim is to formally cover as many accounts as possible for different pivot state root environments. Formerly, this was tried by starting the accounts batch queue at a random value for each pivot (and wrapping around.) Now, each pivot environment starts with an interval set mutually disjunct from any interval set retrieved with other pivot state roots. also: Stop fishing for more pivots in `worker` if 100% download is reached * Reorganise/update accounts healing why: Error handling was wrong and the (math. complexity of) whole process could be better managed. details: Much of the algorithm is now documented at the top of the file `heal_accounts.nim`
2022-10-08 17:20:50 +00:00
when extraTraceMessages:
import stew/byteutils
Snap sync refactor healing (#1397) * Simplify accounts healing threshold management why: Was over-engineered. details: Previously, healing was based on recursive hexary trie perusal. Due to "cheap" envelope decomposition of a range complement for the hexary trie, the cost of running extra laps have become time-affordable again and a simple trigger mechanism for healing will do. * Control number of dangling result nodes in `hexaryInspectTrie()` also: + Returns number of visited nodes available for logging so the maximum number of nodes can be tuned accordingly. + Some code and docu update * Update names of constants why: Declutter, more systematic naming * Re-implemented `worker_desc.merge()` for storage slots why: Provided as proper queue management in `storage_queue_helper`. details: + Several append modes (replaces `merge()`) + Added third queue to record entries currently fetched by a worker. So another parallel running worker can safe the complete set of storage slots in as checkpoint. This was previously lost. * Refactor healing why: Simplify and remove deep hexary trie perusal for finding completeness. Due to "cheap" envelope decomposition of a range complement for the hexary trie, the cost of running extra laps have become time-affordable again and a simple trigger mechanism for healing will do. * Docu update * Run a storage job only once in download loop why: Download failure or rejection (i.e. missing data) lead to repeated fetch requests until peer disconnects, otherwise.
2022-12-24 09:54:18 +00:00
# ------------------------------------------------------------------------------
# Private helpers, debugging
# ------------------------------------------------------------------------------
proc ppDangling(a: seq[NodeSpecs]; maxItems = 30): string =
proc ppBlob(w: Blob): string =
w.mapIt(it.toHex(2)).join.toLowerAscii
let
q = a.mapIt(it.partialPath.ppBlob)[0 ..< min(maxItems,a.len)]
andMore = if maxItems < a.len: ", ..[#" & $a.len & "].." else: ""
"{" & q.join(",") & andMore & "}"
# ------------------------------------------------------------------------------
# Private helpers
# ------------------------------------------------------------------------------
proc convertTo(key: RepairKey; T: type NodeKey): T =
## Might be lossy, check before use
discard result.init(key.ByteArray33[1 .. 32])
proc convertTo(key: Blob; T: type NodeKey): T =
## Might be lossy, check before use
discard result.init(key)
# ------------------------------------------------------------------------------
# Private functions
# ------------------------------------------------------------------------------
proc processLink(
db: HexaryTreeDbRef;
stats: var TrieNodeStat;
inspect: var seq[(RepairKey,NibblesSeq)];
trail: NibblesSeq;
child: RepairKey;
) =
## Helper for `hexaryInspect()`
if not child.isZero:
if not child.isNodeKey:
# Oops -- caught in the middle of a repair process? Just register
# this node
Prep for full sync after snap make 4 (#1282) * Re-arrange fetching storage slots in batch module why; Previously, fetching partial slot ranges first has a chance of terminating the worker peer 9due to network error) while there were many inheritable storage slots on the queue. Now, inheritance is checked first, then full slot ranges and finally partial ranges. * Update logging * Bundled node information for healing into single object `NodeSpecs` why: Previously, partial paths and node keys were kept in separate variables. This approach was error prone due to copying/reassembling function argument objects. As all partial paths, keys, and node data types are more or less handled as `Blob`s over the network (using Eth/6x, or Snap/1) it makes sense to hold these `Blob`s as named field in a single object (even if not all fields are active for the current purpose.) * For good housekeeping, using `NodeKey` type only for account keys why: previously, a mixture of `NodeKey` and `Hash256` was used. Now, only state or storage root keys use the `Hash256` type. * Always accept latest pivot (and not a slightly older one) why; For testing it was tried to use a slightly older pivot state root than available. Some anecdotal tests seemed to suggest an advantage so that more peers are willing to serve on that older pivot. But this could not be confirmed in subsequent tests (still anecdotal, though.) As a side note, the distance of the latest pivot to its predecessor is at least 128 (or whatever the constant `minPivotBlockDistance` is assigned to.) * Reshuffle name components for some file and function names why: Clarifies purpose: "storages" becomes: "storage slots" "store" becomes: "range fetch" * Stash away currently unused modules in sub-folder named "notused"
2022-10-27 13:49:28 +00:00
stats.dangling.add NodeSpecs(
partialPath: trail.hexPrefixEncode(isLeaf = false))
elif db.tab.hasKey(child):
inspect.add (child,trail)
else:
Prep for full sync after snap make 4 (#1282) * Re-arrange fetching storage slots in batch module why; Previously, fetching partial slot ranges first has a chance of terminating the worker peer 9due to network error) while there were many inheritable storage slots on the queue. Now, inheritance is checked first, then full slot ranges and finally partial ranges. * Update logging * Bundled node information for healing into single object `NodeSpecs` why: Previously, partial paths and node keys were kept in separate variables. This approach was error prone due to copying/reassembling function argument objects. As all partial paths, keys, and node data types are more or less handled as `Blob`s over the network (using Eth/6x, or Snap/1) it makes sense to hold these `Blob`s as named field in a single object (even if not all fields are active for the current purpose.) * For good housekeeping, using `NodeKey` type only for account keys why: previously, a mixture of `NodeKey` and `Hash256` was used. Now, only state or storage root keys use the `Hash256` type. * Always accept latest pivot (and not a slightly older one) why; For testing it was tried to use a slightly older pivot state root than available. Some anecdotal tests seemed to suggest an advantage so that more peers are willing to serve on that older pivot. But this could not be confirmed in subsequent tests (still anecdotal, though.) As a side note, the distance of the latest pivot to its predecessor is at least 128 (or whatever the constant `minPivotBlockDistance` is assigned to.) * Reshuffle name components for some file and function names why: Clarifies purpose: "storages" becomes: "storage slots" "store" becomes: "range fetch" * Stash away currently unused modules in sub-folder named "notused"
2022-10-27 13:49:28 +00:00
stats.dangling.add NodeSpecs(
partialPath: trail.hexPrefixEncode(isLeaf = false),
nodeKey: child.convertTo(NodeKey))
proc processLink(
getFn: HexaryGetFn;
stats: var TrieNodeStat;
inspect: var seq[(NodeKey,NibblesSeq)];
trail: NibblesSeq;
child: Rlp;
) {.gcsafe, raises: [CatchableError]} =
## Ditto
if not child.isEmpty:
let childBlob = child.toBytes
if childBlob.len != 32:
# Oops -- that is wrong, although the only sensible action is to
# register the node and otherwise ignore it
Prep for full sync after snap make 4 (#1282) * Re-arrange fetching storage slots in batch module why; Previously, fetching partial slot ranges first has a chance of terminating the worker peer 9due to network error) while there were many inheritable storage slots on the queue. Now, inheritance is checked first, then full slot ranges and finally partial ranges. * Update logging * Bundled node information for healing into single object `NodeSpecs` why: Previously, partial paths and node keys were kept in separate variables. This approach was error prone due to copying/reassembling function argument objects. As all partial paths, keys, and node data types are more or less handled as `Blob`s over the network (using Eth/6x, or Snap/1) it makes sense to hold these `Blob`s as named field in a single object (even if not all fields are active for the current purpose.) * For good housekeeping, using `NodeKey` type only for account keys why: previously, a mixture of `NodeKey` and `Hash256` was used. Now, only state or storage root keys use the `Hash256` type. * Always accept latest pivot (and not a slightly older one) why; For testing it was tried to use a slightly older pivot state root than available. Some anecdotal tests seemed to suggest an advantage so that more peers are willing to serve on that older pivot. But this could not be confirmed in subsequent tests (still anecdotal, though.) As a side note, the distance of the latest pivot to its predecessor is at least 128 (or whatever the constant `minPivotBlockDistance` is assigned to.) * Reshuffle name components for some file and function names why: Clarifies purpose: "storages" becomes: "storage slots" "store" becomes: "range fetch" * Stash away currently unused modules in sub-folder named "notused"
2022-10-27 13:49:28 +00:00
stats.dangling.add NodeSpecs(
partialPath: trail.hexPrefixEncode(isLeaf = false))
else:
Prep for full sync after snap make 4 (#1282) * Re-arrange fetching storage slots in batch module why; Previously, fetching partial slot ranges first has a chance of terminating the worker peer 9due to network error) while there were many inheritable storage slots on the queue. Now, inheritance is checked first, then full slot ranges and finally partial ranges. * Update logging * Bundled node information for healing into single object `NodeSpecs` why: Previously, partial paths and node keys were kept in separate variables. This approach was error prone due to copying/reassembling function argument objects. As all partial paths, keys, and node data types are more or less handled as `Blob`s over the network (using Eth/6x, or Snap/1) it makes sense to hold these `Blob`s as named field in a single object (even if not all fields are active for the current purpose.) * For good housekeeping, using `NodeKey` type only for account keys why: previously, a mixture of `NodeKey` and `Hash256` was used. Now, only state or storage root keys use the `Hash256` type. * Always accept latest pivot (and not a slightly older one) why; For testing it was tried to use a slightly older pivot state root than available. Some anecdotal tests seemed to suggest an advantage so that more peers are willing to serve on that older pivot. But this could not be confirmed in subsequent tests (still anecdotal, though.) As a side note, the distance of the latest pivot to its predecessor is at least 128 (or whatever the constant `minPivotBlockDistance` is assigned to.) * Reshuffle name components for some file and function names why: Clarifies purpose: "storages" becomes: "storage slots" "store" becomes: "range fetch" * Stash away currently unused modules in sub-folder named "notused"
2022-10-27 13:49:28 +00:00
let childKey = childBlob.convertTo(NodeKey)
if 0 < child.toBytes.getFn().len:
inspect.add (childKey,trail)
else:
Prep for full sync after snap make 4 (#1282) * Re-arrange fetching storage slots in batch module why; Previously, fetching partial slot ranges first has a chance of terminating the worker peer 9due to network error) while there were many inheritable storage slots on the queue. Now, inheritance is checked first, then full slot ranges and finally partial ranges. * Update logging * Bundled node information for healing into single object `NodeSpecs` why: Previously, partial paths and node keys were kept in separate variables. This approach was error prone due to copying/reassembling function argument objects. As all partial paths, keys, and node data types are more or less handled as `Blob`s over the network (using Eth/6x, or Snap/1) it makes sense to hold these `Blob`s as named field in a single object (even if not all fields are active for the current purpose.) * For good housekeeping, using `NodeKey` type only for account keys why: previously, a mixture of `NodeKey` and `Hash256` was used. Now, only state or storage root keys use the `Hash256` type. * Always accept latest pivot (and not a slightly older one) why; For testing it was tried to use a slightly older pivot state root than available. Some anecdotal tests seemed to suggest an advantage so that more peers are willing to serve on that older pivot. But this could not be confirmed in subsequent tests (still anecdotal, though.) As a side note, the distance of the latest pivot to its predecessor is at least 128 (or whatever the constant `minPivotBlockDistance` is assigned to.) * Reshuffle name components for some file and function names why: Clarifies purpose: "storages" becomes: "storage slots" "store" becomes: "range fetch" * Stash away currently unused modules in sub-folder named "notused"
2022-10-27 13:49:28 +00:00
stats.dangling.add NodeSpecs(
partialPath: trail.hexPrefixEncode(isLeaf = false),
nodeKey: childKey)
# ------------------------------------------------------------------------------
# Public functions
# ------------------------------------------------------------------------------
proc to*(resumeCtx: TrieNodeStatCtxRef; T: type seq[NodeSpecs]): T =
## Convert resumption context to nodes that can be used otherwise. This
## function might be useful for error recovery.
##
## Note: In a non-persistant case, temporary `RepairKey` type node specs
## that cannot be converted to `NodeKey` type nodes are silently dropped.
## This should be no problem as a hexary trie with `RepairKey` type node
## refs must be repaired or discarded anyway.
if resumeCtx.persistent:
for (key,trail) in resumeCtx.hddCtx:
result.add NodeSpecs(
partialPath: trail.hexPrefixEncode(isLeaf = false),
nodeKey: key)
else:
for (key,trail) in resumeCtx.memCtx:
if key.isNodeKey:
result.add NodeSpecs(
partialPath: trail.hexPrefixEncode(isLeaf = false),
nodeKey: key.convertTo(NodeKey))
proc hexaryInspectTrie*(
Snap sync refactor healing (#1397) * Simplify accounts healing threshold management why: Was over-engineered. details: Previously, healing was based on recursive hexary trie perusal. Due to "cheap" envelope decomposition of a range complement for the hexary trie, the cost of running extra laps have become time-affordable again and a simple trigger mechanism for healing will do. * Control number of dangling result nodes in `hexaryInspectTrie()` also: + Returns number of visited nodes available for logging so the maximum number of nodes can be tuned accordingly. + Some code and docu update * Update names of constants why: Declutter, more systematic naming * Re-implemented `worker_desc.merge()` for storage slots why: Provided as proper queue management in `storage_queue_helper`. details: + Several append modes (replaces `merge()`) + Added third queue to record entries currently fetched by a worker. So another parallel running worker can safe the complete set of storage slots in as checkpoint. This was previously lost. * Refactor healing why: Simplify and remove deep hexary trie perusal for finding completeness. Due to "cheap" envelope decomposition of a range complement for the hexary trie, the cost of running extra laps have become time-affordable again and a simple trigger mechanism for healing will do. * Docu update * Run a storage job only once in download loop why: Download failure or rejection (i.e. missing data) lead to repeated fetch requests until peer disconnects, otherwise.
2022-12-24 09:54:18 +00:00
db: HexaryTreeDbRef; # Database
root: NodeKey; # State root
partialPaths: seq[Blob] = @[]; # Starting paths for search
resumeCtx: TrieNodeStatCtxRef = nil; # Context for resuming inspection
suspendAfter = high(uint64); # To be resumed
stopAtLevel = 64u8; # Width-first depth level
maxDangling = high(int); # Maximal number of dangling results
): TrieNodeStat
{.gcsafe, raises: [KeyError]} =
## Starting with the argument list `paths`, find all the non-leaf nodes in
## the hexary trie which have at least one node key reference missing in
Prep for full sync after snap (#1253) * Split fetch accounts into sub-modules details: There will be separated modules for accounts snapshot, storage snapshot, and healing for either. * Allow to rebase pivot before negotiated header why: Peers seem to have not too many snapshots available. By setting back the pivot block header slightly, the chances might be higher to find more peers to serve this pivot. Experiment on mainnet showed that setting back too much (tested with 1024), the chances to find matching snapshot peers seem to decrease. * Add accounts healing * Update variable/field naming in `worker_desc` for readability * Handle leaf nodes in accounts healing why: There is no need to fetch accounts when they had been added by the healing process. On the flip side, these accounts must be checked for storage data and the batch queue updated, accordingly. * Reorganising accounts hash ranges batch queue why: The aim is to formally cover as many accounts as possible for different pivot state root environments. Formerly, this was tried by starting the accounts batch queue at a random value for each pivot (and wrapping around.) Now, each pivot environment starts with an interval set mutually disjunct from any interval set retrieved with other pivot state roots. also: Stop fishing for more pivots in `worker` if 100% download is reached * Reorganise/update accounts healing why: Error handling was wrong and the (math. complexity of) whole process could be better managed. details: Much of the algorithm is now documented at the top of the file `heal_accounts.nim`
2022-10-08 17:20:50 +00:00
## the trie database. The references for these nodes are collected and
## returned.
##
Snap sync refactor healing (#1397) * Simplify accounts healing threshold management why: Was over-engineered. details: Previously, healing was based on recursive hexary trie perusal. Due to "cheap" envelope decomposition of a range complement for the hexary trie, the cost of running extra laps have become time-affordable again and a simple trigger mechanism for healing will do. * Control number of dangling result nodes in `hexaryInspectTrie()` also: + Returns number of visited nodes available for logging so the maximum number of nodes can be tuned accordingly. + Some code and docu update * Update names of constants why: Declutter, more systematic naming * Re-implemented `worker_desc.merge()` for storage slots why: Provided as proper queue management in `storage_queue_helper`. details: + Several append modes (replaces `merge()`) + Added third queue to record entries currently fetched by a worker. So another parallel running worker can safe the complete set of storage slots in as checkpoint. This was previously lost. * Refactor healing why: Simplify and remove deep hexary trie perusal for finding completeness. Due to "cheap" envelope decomposition of a range complement for the hexary trie, the cost of running extra laps have become time-affordable again and a simple trigger mechanism for healing will do. * Docu update * Run a storage job only once in download loop why: Download failure or rejection (i.e. missing data) lead to repeated fetch requests until peer disconnects, otherwise.
2022-12-24 09:54:18 +00:00
## * Argument `partialPaths` list entries that do not refer to an existing
## and allocated hexary trie node are silently ignored. So are enytries
## that not refer to either a valid extension or a branch type node.
##
## * This function traverses the hexary trie in *width-first* mode
## simultaneously for any entry of the argument `partialPaths` list. Abart
## from completing the search there are three conditions when the search
## pauses to return the current state (via `resumeCtx`, see next bullet
## point):
## + The depth level of the running algorithm exceeds `stopAtLevel`.
## + The number of visited nodes exceeds `suspendAfter`.
## + Te number of cunnently collected dangling nodes exceeds `maxDangling`.
## If the function pauses because the current depth exceeds `stopAtLevel`
## then the `stopped` flag of the result object will be set, as well.
##
## * When paused for some of the reasons listed above, the `resumeCtx` field
## of the result object contains the current state so that the function
## can resume searching from where is paused. An application using this
## feature could look like:
## ::
## var ctx = TrieNodeStatCtxRef()
## while not ctx.isNil:
## let state = hexaryInspectTrie(db, root, paths, resumeCtx=ctx, 1024)
## ...
## ctx = state.resumeCtx
##
let rootKey = root.to(RepairKey)
if not db.tab.hasKey(rootKey):
return TrieNodeStat()
var
reVisit: seq[(RepairKey,NibblesSeq)]
again: seq[(RepairKey,NibblesSeq)]
resumeOk = false
# Initialise lists from previous session
if not resumeCtx.isNil and
not resumeCtx.persistent and
0 < resumeCtx.memCtx.len:
resumeOk = true
reVisit = resumeCtx.memCtx
Snap sync refactor healing (#1397) * Simplify accounts healing threshold management why: Was over-engineered. details: Previously, healing was based on recursive hexary trie perusal. Due to "cheap" envelope decomposition of a range complement for the hexary trie, the cost of running extra laps have become time-affordable again and a simple trigger mechanism for healing will do. * Control number of dangling result nodes in `hexaryInspectTrie()` also: + Returns number of visited nodes available for logging so the maximum number of nodes can be tuned accordingly. + Some code and docu update * Update names of constants why: Declutter, more systematic naming * Re-implemented `worker_desc.merge()` for storage slots why: Provided as proper queue management in `storage_queue_helper`. details: + Several append modes (replaces `merge()`) + Added third queue to record entries currently fetched by a worker. So another parallel running worker can safe the complete set of storage slots in as checkpoint. This was previously lost. * Refactor healing why: Simplify and remove deep hexary trie perusal for finding completeness. Due to "cheap" envelope decomposition of a range complement for the hexary trie, the cost of running extra laps have become time-affordable again and a simple trigger mechanism for healing will do. * Docu update * Run a storage job only once in download loop why: Download failure or rejection (i.e. missing data) lead to repeated fetch requests until peer disconnects, otherwise.
2022-12-24 09:54:18 +00:00
if partialPaths.len == 0 and not resumeOk:
reVisit.add (rootKey,EmptyNibbleRange)
else:
# Add argument paths
Snap sync refactor healing (#1397) * Simplify accounts healing threshold management why: Was over-engineered. details: Previously, healing was based on recursive hexary trie perusal. Due to "cheap" envelope decomposition of a range complement for the hexary trie, the cost of running extra laps have become time-affordable again and a simple trigger mechanism for healing will do. * Control number of dangling result nodes in `hexaryInspectTrie()` also: + Returns number of visited nodes available for logging so the maximum number of nodes can be tuned accordingly. + Some code and docu update * Update names of constants why: Declutter, more systematic naming * Re-implemented `worker_desc.merge()` for storage slots why: Provided as proper queue management in `storage_queue_helper`. details: + Several append modes (replaces `merge()`) + Added third queue to record entries currently fetched by a worker. So another parallel running worker can safe the complete set of storage slots in as checkpoint. This was previously lost. * Refactor healing why: Simplify and remove deep hexary trie perusal for finding completeness. Due to "cheap" envelope decomposition of a range complement for the hexary trie, the cost of running extra laps have become time-affordable again and a simple trigger mechanism for healing will do. * Docu update * Run a storage job only once in download loop why: Download failure or rejection (i.e. missing data) lead to repeated fetch requests until peer disconnects, otherwise.
2022-12-24 09:54:18 +00:00
for w in partialPaths:
let (isLeaf,nibbles) = hexPrefixDecode w
if not isLeaf:
let rc = nibbles.hexaryPathNodeKey(rootKey, db, missingOk=false)
if rc.isOk:
reVisit.add (rc.value.to(RepairKey), nibbles)
Snap sync refactor healing (#1397) * Simplify accounts healing threshold management why: Was over-engineered. details: Previously, healing was based on recursive hexary trie perusal. Due to "cheap" envelope decomposition of a range complement for the hexary trie, the cost of running extra laps have become time-affordable again and a simple trigger mechanism for healing will do. * Control number of dangling result nodes in `hexaryInspectTrie()` also: + Returns number of visited nodes available for logging so the maximum number of nodes can be tuned accordingly. + Some code and docu update * Update names of constants why: Declutter, more systematic naming * Re-implemented `worker_desc.merge()` for storage slots why: Provided as proper queue management in `storage_queue_helper`. details: + Several append modes (replaces `merge()`) + Added third queue to record entries currently fetched by a worker. So another parallel running worker can safe the complete set of storage slots in as checkpoint. This was previously lost. * Refactor healing why: Simplify and remove deep hexary trie perusal for finding completeness. Due to "cheap" envelope decomposition of a range complement for the hexary trie, the cost of running extra laps have become time-affordable again and a simple trigger mechanism for healing will do. * Docu update * Run a storage job only once in download loop why: Download failure or rejection (i.e. missing data) lead to repeated fetch requests until peer disconnects, otherwise.
2022-12-24 09:54:18 +00:00
# Stopping on `suspendAfter` has precedence over `stopAtLevel`
while 0 < reVisit.len and result.count <= suspendAfter:
if stopAtLevel < result.level:
result.stopped = true
break
for n in 0 ..< reVisit.len:
Snap sync refactor healing (#1397) * Simplify accounts healing threshold management why: Was over-engineered. details: Previously, healing was based on recursive hexary trie perusal. Due to "cheap" envelope decomposition of a range complement for the hexary trie, the cost of running extra laps have become time-affordable again and a simple trigger mechanism for healing will do. * Control number of dangling result nodes in `hexaryInspectTrie()` also: + Returns number of visited nodes available for logging so the maximum number of nodes can be tuned accordingly. + Some code and docu update * Update names of constants why: Declutter, more systematic naming * Re-implemented `worker_desc.merge()` for storage slots why: Provided as proper queue management in `storage_queue_helper`. details: + Several append modes (replaces `merge()`) + Added third queue to record entries currently fetched by a worker. So another parallel running worker can safe the complete set of storage slots in as checkpoint. This was previously lost. * Refactor healing why: Simplify and remove deep hexary trie perusal for finding completeness. Due to "cheap" envelope decomposition of a range complement for the hexary trie, the cost of running extra laps have become time-affordable again and a simple trigger mechanism for healing will do. * Docu update * Run a storage job only once in download loop why: Download failure or rejection (i.e. missing data) lead to repeated fetch requests until peer disconnects, otherwise.
2022-12-24 09:54:18 +00:00
if suspendAfter < result.count or
maxDangling <= result.dangling.len:
# Swallow rest
Snap sync refactor healing (#1397) * Simplify accounts healing threshold management why: Was over-engineered. details: Previously, healing was based on recursive hexary trie perusal. Due to "cheap" envelope decomposition of a range complement for the hexary trie, the cost of running extra laps have become time-affordable again and a simple trigger mechanism for healing will do. * Control number of dangling result nodes in `hexaryInspectTrie()` also: + Returns number of visited nodes available for logging so the maximum number of nodes can be tuned accordingly. + Some code and docu update * Update names of constants why: Declutter, more systematic naming * Re-implemented `worker_desc.merge()` for storage slots why: Provided as proper queue management in `storage_queue_helper`. details: + Several append modes (replaces `merge()`) + Added third queue to record entries currently fetched by a worker. So another parallel running worker can safe the complete set of storage slots in as checkpoint. This was previously lost. * Refactor healing why: Simplify and remove deep hexary trie perusal for finding completeness. Due to "cheap" envelope decomposition of a range complement for the hexary trie, the cost of running extra laps have become time-affordable again and a simple trigger mechanism for healing will do. * Docu update * Run a storage job only once in download loop why: Download failure or rejection (i.e. missing data) lead to repeated fetch requests until peer disconnects, otherwise.
2022-12-24 09:54:18 +00:00
again &= reVisit[n ..< reVisit.len]
break
let
Snap sync refactor healing (#1397) * Simplify accounts healing threshold management why: Was over-engineered. details: Previously, healing was based on recursive hexary trie perusal. Due to "cheap" envelope decomposition of a range complement for the hexary trie, the cost of running extra laps have become time-affordable again and a simple trigger mechanism for healing will do. * Control number of dangling result nodes in `hexaryInspectTrie()` also: + Returns number of visited nodes available for logging so the maximum number of nodes can be tuned accordingly. + Some code and docu update * Update names of constants why: Declutter, more systematic naming * Re-implemented `worker_desc.merge()` for storage slots why: Provided as proper queue management in `storage_queue_helper`. details: + Several append modes (replaces `merge()`) + Added third queue to record entries currently fetched by a worker. So another parallel running worker can safe the complete set of storage slots in as checkpoint. This was previously lost. * Refactor healing why: Simplify and remove deep hexary trie perusal for finding completeness. Due to "cheap" envelope decomposition of a range complement for the hexary trie, the cost of running extra laps have become time-affordable again and a simple trigger mechanism for healing will do. * Docu update * Run a storage job only once in download loop why: Download failure or rejection (i.e. missing data) lead to repeated fetch requests until peer disconnects, otherwise.
2022-12-24 09:54:18 +00:00
(rKey, parentTrail) = reVisit[n]
node = db.tab[rKey]
# parent = rKey.convertTo(NodeKey) -- unused
case node.kind:
of Extension:
let
trail = parentTrail & node.ePfx
child = node.eLink
db.processLink(stats=result, inspect=again, trail, child)
of Branch:
for n in 0 ..< 16:
let
trail = parentTrail & @[n.byte].initNibbleRange.slice(1)
child = node.bLink[n]
db.processLink(stats=result, inspect=again, trail, child)
of Leaf:
# Ooops, forget node and key
discard
Prep for full sync after snap (#1253) * Split fetch accounts into sub-modules details: There will be separated modules for accounts snapshot, storage snapshot, and healing for either. * Allow to rebase pivot before negotiated header why: Peers seem to have not too many snapshots available. By setting back the pivot block header slightly, the chances might be higher to find more peers to serve this pivot. Experiment on mainnet showed that setting back too much (tested with 1024), the chances to find matching snapshot peers seem to decrease. * Add accounts healing * Update variable/field naming in `worker_desc` for readability * Handle leaf nodes in accounts healing why: There is no need to fetch accounts when they had been added by the healing process. On the flip side, these accounts must be checked for storage data and the batch queue updated, accordingly. * Reorganising accounts hash ranges batch queue why: The aim is to formally cover as many accounts as possible for different pivot state root environments. Formerly, this was tried by starting the accounts batch queue at a random value for each pivot (and wrapping around.) Now, each pivot environment starts with an interval set mutually disjunct from any interval set retrieved with other pivot state roots. also: Stop fishing for more pivots in `worker` if 100% download is reached * Reorganise/update accounts healing why: Error handling was wrong and the (math. complexity of) whole process could be better managed. details: Much of the algorithm is now documented at the top of the file `heal_accounts.nim`
2022-10-08 17:20:50 +00:00
Snap sync refactor healing (#1397) * Simplify accounts healing threshold management why: Was over-engineered. details: Previously, healing was based on recursive hexary trie perusal. Due to "cheap" envelope decomposition of a range complement for the hexary trie, the cost of running extra laps have become time-affordable again and a simple trigger mechanism for healing will do. * Control number of dangling result nodes in `hexaryInspectTrie()` also: + Returns number of visited nodes available for logging so the maximum number of nodes can be tuned accordingly. + Some code and docu update * Update names of constants why: Declutter, more systematic naming * Re-implemented `worker_desc.merge()` for storage slots why: Provided as proper queue management in `storage_queue_helper`. details: + Several append modes (replaces `merge()`) + Added third queue to record entries currently fetched by a worker. So another parallel running worker can safe the complete set of storage slots in as checkpoint. This was previously lost. * Refactor healing why: Simplify and remove deep hexary trie perusal for finding completeness. Due to "cheap" envelope decomposition of a range complement for the hexary trie, the cost of running extra laps have become time-affordable again and a simple trigger mechanism for healing will do. * Docu update * Run a storage job only once in download loop why: Download failure or rejection (i.e. missing data) lead to repeated fetch requests until peer disconnects, otherwise.
2022-12-24 09:54:18 +00:00
result.count.inc
# End `for`
result.level.inc
swap(reVisit, again)
again.setLen(0)
# End while
if 0 < reVisit.len:
result.resumeCtx = TrieNodeStatCtxRef(
persistent: false,
memCtx: reVisit)
proc hexaryInspectTrie*(
Snap sync refactor healing (#1397) * Simplify accounts healing threshold management why: Was over-engineered. details: Previously, healing was based on recursive hexary trie perusal. Due to "cheap" envelope decomposition of a range complement for the hexary trie, the cost of running extra laps have become time-affordable again and a simple trigger mechanism for healing will do. * Control number of dangling result nodes in `hexaryInspectTrie()` also: + Returns number of visited nodes available for logging so the maximum number of nodes can be tuned accordingly. + Some code and docu update * Update names of constants why: Declutter, more systematic naming * Re-implemented `worker_desc.merge()` for storage slots why: Provided as proper queue management in `storage_queue_helper`. details: + Several append modes (replaces `merge()`) + Added third queue to record entries currently fetched by a worker. So another parallel running worker can safe the complete set of storage slots in as checkpoint. This was previously lost. * Refactor healing why: Simplify and remove deep hexary trie perusal for finding completeness. Due to "cheap" envelope decomposition of a range complement for the hexary trie, the cost of running extra laps have become time-affordable again and a simple trigger mechanism for healing will do. * Docu update * Run a storage job only once in download loop why: Download failure or rejection (i.e. missing data) lead to repeated fetch requests until peer disconnects, otherwise.
2022-12-24 09:54:18 +00:00
getFn: HexaryGetFn; # Database abstraction
rootKey: NodeKey; # State root
partialPaths: seq[Blob] = @[]; # Starting paths for search
resumeCtx: TrieNodeStatCtxRef = nil; # Context for resuming inspection
suspendAfter = high(uint64); # To be resumed
stopAtLevel = 64u8; # Width-first depth level
maxDangling = high(int); # Maximal number of dangling results
): TrieNodeStat
{.gcsafe, raises: [CatchableError]} =
## Variant of `hexaryInspectTrie()` for persistent database.
Prep for full sync after snap (#1253) * Split fetch accounts into sub-modules details: There will be separated modules for accounts snapshot, storage snapshot, and healing for either. * Allow to rebase pivot before negotiated header why: Peers seem to have not too many snapshots available. By setting back the pivot block header slightly, the chances might be higher to find more peers to serve this pivot. Experiment on mainnet showed that setting back too much (tested with 1024), the chances to find matching snapshot peers seem to decrease. * Add accounts healing * Update variable/field naming in `worker_desc` for readability * Handle leaf nodes in accounts healing why: There is no need to fetch accounts when they had been added by the healing process. On the flip side, these accounts must be checked for storage data and the batch queue updated, accordingly. * Reorganising accounts hash ranges batch queue why: The aim is to formally cover as many accounts as possible for different pivot state root environments. Formerly, this was tried by starting the accounts batch queue at a random value for each pivot (and wrapping around.) Now, each pivot environment starts with an interval set mutually disjunct from any interval set retrieved with other pivot state roots. also: Stop fishing for more pivots in `worker` if 100% download is reached * Reorganise/update accounts healing why: Error handling was wrong and the (math. complexity of) whole process could be better managed. details: Much of the algorithm is now documented at the top of the file `heal_accounts.nim`
2022-10-08 17:20:50 +00:00
when extraTraceMessages:
let nPaths = paths.len
let root = rootKey.to(Blob)
if root.getFn().len == 0:
when extraTraceMessages:
trace "Hexary inspect: missing root", nPaths, maxLeafPaths,
rootKey=root.toHex
return TrieNodeStat()
var
reVisit: seq[(NodeKey,NibblesSeq)]
again: seq[(NodeKey,NibblesSeq)]
resumeOk = false
# Initialise lists from previous session
if not resumeCtx.isNil and
resumeCtx.persistent and
0 < resumeCtx.hddCtx.len:
resumeOk = true
reVisit = resumeCtx.hddCtx
Snap sync refactor healing (#1397) * Simplify accounts healing threshold management why: Was over-engineered. details: Previously, healing was based on recursive hexary trie perusal. Due to "cheap" envelope decomposition of a range complement for the hexary trie, the cost of running extra laps have become time-affordable again and a simple trigger mechanism for healing will do. * Control number of dangling result nodes in `hexaryInspectTrie()` also: + Returns number of visited nodes available for logging so the maximum number of nodes can be tuned accordingly. + Some code and docu update * Update names of constants why: Declutter, more systematic naming * Re-implemented `worker_desc.merge()` for storage slots why: Provided as proper queue management in `storage_queue_helper`. details: + Several append modes (replaces `merge()`) + Added third queue to record entries currently fetched by a worker. So another parallel running worker can safe the complete set of storage slots in as checkpoint. This was previously lost. * Refactor healing why: Simplify and remove deep hexary trie perusal for finding completeness. Due to "cheap" envelope decomposition of a range complement for the hexary trie, the cost of running extra laps have become time-affordable again and a simple trigger mechanism for healing will do. * Docu update * Run a storage job only once in download loop why: Download failure or rejection (i.e. missing data) lead to repeated fetch requests until peer disconnects, otherwise.
2022-12-24 09:54:18 +00:00
if partialPaths.len == 0 and not resumeOk:
reVisit.add (rootKey,EmptyNibbleRange)
else:
# Add argument paths
Snap sync refactor healing (#1397) * Simplify accounts healing threshold management why: Was over-engineered. details: Previously, healing was based on recursive hexary trie perusal. Due to "cheap" envelope decomposition of a range complement for the hexary trie, the cost of running extra laps have become time-affordable again and a simple trigger mechanism for healing will do. * Control number of dangling result nodes in `hexaryInspectTrie()` also: + Returns number of visited nodes available for logging so the maximum number of nodes can be tuned accordingly. + Some code and docu update * Update names of constants why: Declutter, more systematic naming * Re-implemented `worker_desc.merge()` for storage slots why: Provided as proper queue management in `storage_queue_helper`. details: + Several append modes (replaces `merge()`) + Added third queue to record entries currently fetched by a worker. So another parallel running worker can safe the complete set of storage slots in as checkpoint. This was previously lost. * Refactor healing why: Simplify and remove deep hexary trie perusal for finding completeness. Due to "cheap" envelope decomposition of a range complement for the hexary trie, the cost of running extra laps have become time-affordable again and a simple trigger mechanism for healing will do. * Docu update * Run a storage job only once in download loop why: Download failure or rejection (i.e. missing data) lead to repeated fetch requests until peer disconnects, otherwise.
2022-12-24 09:54:18 +00:00
for w in partialPaths:
let (isLeaf,nibbles) = hexPrefixDecode w
if not isLeaf:
let rc = nibbles.hexaryPathNodeKey(rootKey, getFn, missingOk=false)
if rc.isOk:
reVisit.add (rc.value, nibbles)
Snap sync refactor healing (#1397) * Simplify accounts healing threshold management why: Was over-engineered. details: Previously, healing was based on recursive hexary trie perusal. Due to "cheap" envelope decomposition of a range complement for the hexary trie, the cost of running extra laps have become time-affordable again and a simple trigger mechanism for healing will do. * Control number of dangling result nodes in `hexaryInspectTrie()` also: + Returns number of visited nodes available for logging so the maximum number of nodes can be tuned accordingly. + Some code and docu update * Update names of constants why: Declutter, more systematic naming * Re-implemented `worker_desc.merge()` for storage slots why: Provided as proper queue management in `storage_queue_helper`. details: + Several append modes (replaces `merge()`) + Added third queue to record entries currently fetched by a worker. So another parallel running worker can safe the complete set of storage slots in as checkpoint. This was previously lost. * Refactor healing why: Simplify and remove deep hexary trie perusal for finding completeness. Due to "cheap" envelope decomposition of a range complement for the hexary trie, the cost of running extra laps have become time-affordable again and a simple trigger mechanism for healing will do. * Docu update * Run a storage job only once in download loop why: Download failure or rejection (i.e. missing data) lead to repeated fetch requests until peer disconnects, otherwise.
2022-12-24 09:54:18 +00:00
# Stopping on `suspendAfter` has precedence over `stopAtLevel`
while 0 < reVisit.len and result.count <= suspendAfter:
Prep for full sync after snap (#1253) * Split fetch accounts into sub-modules details: There will be separated modules for accounts snapshot, storage snapshot, and healing for either. * Allow to rebase pivot before negotiated header why: Peers seem to have not too many snapshots available. By setting back the pivot block header slightly, the chances might be higher to find more peers to serve this pivot. Experiment on mainnet showed that setting back too much (tested with 1024), the chances to find matching snapshot peers seem to decrease. * Add accounts healing * Update variable/field naming in `worker_desc` for readability * Handle leaf nodes in accounts healing why: There is no need to fetch accounts when they had been added by the healing process. On the flip side, these accounts must be checked for storage data and the batch queue updated, accordingly. * Reorganising accounts hash ranges batch queue why: The aim is to formally cover as many accounts as possible for different pivot state root environments. Formerly, this was tried by starting the accounts batch queue at a random value for each pivot (and wrapping around.) Now, each pivot environment starts with an interval set mutually disjunct from any interval set retrieved with other pivot state roots. also: Stop fishing for more pivots in `worker` if 100% download is reached * Reorganise/update accounts healing why: Error handling was wrong and the (math. complexity of) whole process could be better managed. details: Much of the algorithm is now documented at the top of the file `heal_accounts.nim`
2022-10-08 17:20:50 +00:00
when extraTraceMessages:
trace "Hexary inspect processing", nPaths, maxLeafPaths,
level=result.level, nReVisit=reVisit.len, nDangling=result.dangling.len
if stopAtLevel < result.level:
result.stopped = true
break
for n in 0 ..< reVisit.len:
Snap sync refactor healing (#1397) * Simplify accounts healing threshold management why: Was over-engineered. details: Previously, healing was based on recursive hexary trie perusal. Due to "cheap" envelope decomposition of a range complement for the hexary trie, the cost of running extra laps have become time-affordable again and a simple trigger mechanism for healing will do. * Control number of dangling result nodes in `hexaryInspectTrie()` also: + Returns number of visited nodes available for logging so the maximum number of nodes can be tuned accordingly. + Some code and docu update * Update names of constants why: Declutter, more systematic naming * Re-implemented `worker_desc.merge()` for storage slots why: Provided as proper queue management in `storage_queue_helper`. details: + Several append modes (replaces `merge()`) + Added third queue to record entries currently fetched by a worker. So another parallel running worker can safe the complete set of storage slots in as checkpoint. This was previously lost. * Refactor healing why: Simplify and remove deep hexary trie perusal for finding completeness. Due to "cheap" envelope decomposition of a range complement for the hexary trie, the cost of running extra laps have become time-affordable again and a simple trigger mechanism for healing will do. * Docu update * Run a storage job only once in download loop why: Download failure or rejection (i.e. missing data) lead to repeated fetch requests until peer disconnects, otherwise.
2022-12-24 09:54:18 +00:00
if suspendAfter < result.count or
maxDangling <= result.dangling.len:
# Swallow rest
again = again & reVisit[n ..< reVisit.len]
break
Prep for full sync after snap (#1253) * Split fetch accounts into sub-modules details: There will be separated modules for accounts snapshot, storage snapshot, and healing for either. * Allow to rebase pivot before negotiated header why: Peers seem to have not too many snapshots available. By setting back the pivot block header slightly, the chances might be higher to find more peers to serve this pivot. Experiment on mainnet showed that setting back too much (tested with 1024), the chances to find matching snapshot peers seem to decrease. * Add accounts healing * Update variable/field naming in `worker_desc` for readability * Handle leaf nodes in accounts healing why: There is no need to fetch accounts when they had been added by the healing process. On the flip side, these accounts must be checked for storage data and the batch queue updated, accordingly. * Reorganising accounts hash ranges batch queue why: The aim is to formally cover as many accounts as possible for different pivot state root environments. Formerly, this was tried by starting the accounts batch queue at a random value for each pivot (and wrapping around.) Now, each pivot environment starts with an interval set mutually disjunct from any interval set retrieved with other pivot state roots. also: Stop fishing for more pivots in `worker` if 100% download is reached * Reorganise/update accounts healing why: Error handling was wrong and the (math. complexity of) whole process could be better managed. details: Much of the algorithm is now documented at the top of the file `heal_accounts.nim`
2022-10-08 17:20:50 +00:00
Snap sync refactor healing (#1397) * Simplify accounts healing threshold management why: Was over-engineered. details: Previously, healing was based on recursive hexary trie perusal. Due to "cheap" envelope decomposition of a range complement for the hexary trie, the cost of running extra laps have become time-affordable again and a simple trigger mechanism for healing will do. * Control number of dangling result nodes in `hexaryInspectTrie()` also: + Returns number of visited nodes available for logging so the maximum number of nodes can be tuned accordingly. + Some code and docu update * Update names of constants why: Declutter, more systematic naming * Re-implemented `worker_desc.merge()` for storage slots why: Provided as proper queue management in `storage_queue_helper`. details: + Several append modes (replaces `merge()`) + Added third queue to record entries currently fetched by a worker. So another parallel running worker can safe the complete set of storage slots in as checkpoint. This was previously lost. * Refactor healing why: Simplify and remove deep hexary trie perusal for finding completeness. Due to "cheap" envelope decomposition of a range complement for the hexary trie, the cost of running extra laps have become time-affordable again and a simple trigger mechanism for healing will do. * Docu update * Run a storage job only once in download loop why: Download failure or rejection (i.e. missing data) lead to repeated fetch requests until peer disconnects, otherwise.
2022-12-24 09:54:18 +00:00
let
(parent, parentTrail) = reVisit[n]
parentBlob = parent.to(Blob).getFn()
Prep for full sync after snap (#1253) * Split fetch accounts into sub-modules details: There will be separated modules for accounts snapshot, storage snapshot, and healing for either. * Allow to rebase pivot before negotiated header why: Peers seem to have not too many snapshots available. By setting back the pivot block header slightly, the chances might be higher to find more peers to serve this pivot. Experiment on mainnet showed that setting back too much (tested with 1024), the chances to find matching snapshot peers seem to decrease. * Add accounts healing * Update variable/field naming in `worker_desc` for readability * Handle leaf nodes in accounts healing why: There is no need to fetch accounts when they had been added by the healing process. On the flip side, these accounts must be checked for storage data and the batch queue updated, accordingly. * Reorganising accounts hash ranges batch queue why: The aim is to formally cover as many accounts as possible for different pivot state root environments. Formerly, this was tried by starting the accounts batch queue at a random value for each pivot (and wrapping around.) Now, each pivot environment starts with an interval set mutually disjunct from any interval set retrieved with other pivot state roots. also: Stop fishing for more pivots in `worker` if 100% download is reached * Reorganise/update accounts healing why: Error handling was wrong and the (math. complexity of) whole process could be better managed. details: Much of the algorithm is now documented at the top of the file `heal_accounts.nim`
2022-10-08 17:20:50 +00:00
if parentBlob.len == 0:
# Ooops, forget node and key
continue
let nodeRlp = rlpFromBytes parentBlob
case nodeRlp.listLen:
of 2:
Prep for full sync after snap (#1253) * Split fetch accounts into sub-modules details: There will be separated modules for accounts snapshot, storage snapshot, and healing for either. * Allow to rebase pivot before negotiated header why: Peers seem to have not too many snapshots available. By setting back the pivot block header slightly, the chances might be higher to find more peers to serve this pivot. Experiment on mainnet showed that setting back too much (tested with 1024), the chances to find matching snapshot peers seem to decrease. * Add accounts healing * Update variable/field naming in `worker_desc` for readability * Handle leaf nodes in accounts healing why: There is no need to fetch accounts when they had been added by the healing process. On the flip side, these accounts must be checked for storage data and the batch queue updated, accordingly. * Reorganising accounts hash ranges batch queue why: The aim is to formally cover as many accounts as possible for different pivot state root environments. Formerly, this was tried by starting the accounts batch queue at a random value for each pivot (and wrapping around.) Now, each pivot environment starts with an interval set mutually disjunct from any interval set retrieved with other pivot state roots. also: Stop fishing for more pivots in `worker` if 100% download is reached * Reorganise/update accounts healing why: Error handling was wrong and the (math. complexity of) whole process could be better managed. details: Much of the algorithm is now documented at the top of the file `heal_accounts.nim`
2022-10-08 17:20:50 +00:00
let (isLeaf,xPfx) = hexPrefixDecode nodeRlp.listElem(0).toBytes
if not isleaf:
let
Prep for full sync after snap (#1253) * Split fetch accounts into sub-modules details: There will be separated modules for accounts snapshot, storage snapshot, and healing for either. * Allow to rebase pivot before negotiated header why: Peers seem to have not too many snapshots available. By setting back the pivot block header slightly, the chances might be higher to find more peers to serve this pivot. Experiment on mainnet showed that setting back too much (tested with 1024), the chances to find matching snapshot peers seem to decrease. * Add accounts healing * Update variable/field naming in `worker_desc` for readability * Handle leaf nodes in accounts healing why: There is no need to fetch accounts when they had been added by the healing process. On the flip side, these accounts must be checked for storage data and the batch queue updated, accordingly. * Reorganising accounts hash ranges batch queue why: The aim is to formally cover as many accounts as possible for different pivot state root environments. Formerly, this was tried by starting the accounts batch queue at a random value for each pivot (and wrapping around.) Now, each pivot environment starts with an interval set mutually disjunct from any interval set retrieved with other pivot state roots. also: Stop fishing for more pivots in `worker` if 100% download is reached * Reorganise/update accounts healing why: Error handling was wrong and the (math. complexity of) whole process could be better managed. details: Much of the algorithm is now documented at the top of the file `heal_accounts.nim`
2022-10-08 17:20:50 +00:00
trail = parentTrail & xPfx
child = nodeRlp.listElem(1)
getFn.processLink(stats=result, inspect=again, trail, child)
of 17:
for n in 0 ..< 16:
let
trail = parentTrail & @[n.byte].initNibbleRange.slice(1)
child = nodeRlp.listElem(n)
getFn.processLink(stats=result, inspect=again, trail, child)
else:
Prep for full sync after snap (#1253) * Split fetch accounts into sub-modules details: There will be separated modules for accounts snapshot, storage snapshot, and healing for either. * Allow to rebase pivot before negotiated header why: Peers seem to have not too many snapshots available. By setting back the pivot block header slightly, the chances might be higher to find more peers to serve this pivot. Experiment on mainnet showed that setting back too much (tested with 1024), the chances to find matching snapshot peers seem to decrease. * Add accounts healing * Update variable/field naming in `worker_desc` for readability * Handle leaf nodes in accounts healing why: There is no need to fetch accounts when they had been added by the healing process. On the flip side, these accounts must be checked for storage data and the batch queue updated, accordingly. * Reorganising accounts hash ranges batch queue why: The aim is to formally cover as many accounts as possible for different pivot state root environments. Formerly, this was tried by starting the accounts batch queue at a random value for each pivot (and wrapping around.) Now, each pivot environment starts with an interval set mutually disjunct from any interval set retrieved with other pivot state roots. also: Stop fishing for more pivots in `worker` if 100% download is reached * Reorganise/update accounts healing why: Error handling was wrong and the (math. complexity of) whole process could be better managed. details: Much of the algorithm is now documented at the top of the file `heal_accounts.nim`
2022-10-08 17:20:50 +00:00
# Ooops, forget node and key
discard
Snap sync refactor healing (#1397) * Simplify accounts healing threshold management why: Was over-engineered. details: Previously, healing was based on recursive hexary trie perusal. Due to "cheap" envelope decomposition of a range complement for the hexary trie, the cost of running extra laps have become time-affordable again and a simple trigger mechanism for healing will do. * Control number of dangling result nodes in `hexaryInspectTrie()` also: + Returns number of visited nodes available for logging so the maximum number of nodes can be tuned accordingly. + Some code and docu update * Update names of constants why: Declutter, more systematic naming * Re-implemented `worker_desc.merge()` for storage slots why: Provided as proper queue management in `storage_queue_helper`. details: + Several append modes (replaces `merge()`) + Added third queue to record entries currently fetched by a worker. So another parallel running worker can safe the complete set of storage slots in as checkpoint. This was previously lost. * Refactor healing why: Simplify and remove deep hexary trie perusal for finding completeness. Due to "cheap" envelope decomposition of a range complement for the hexary trie, the cost of running extra laps have become time-affordable again and a simple trigger mechanism for healing will do. * Docu update * Run a storage job only once in download loop why: Download failure or rejection (i.e. missing data) lead to repeated fetch requests until peer disconnects, otherwise.
2022-12-24 09:54:18 +00:00
result.count.inc
# End `for`
result.level.inc
swap(reVisit, again)
again.setLen(0)
# End while
if 0 < reVisit.len:
result.resumeCtx = TrieNodeStatCtxRef(
persistent: true,
hddCtx: reVisit)
when extraTraceMessages:
Prep for full sync after snap (#1253) * Split fetch accounts into sub-modules details: There will be separated modules for accounts snapshot, storage snapshot, and healing for either. * Allow to rebase pivot before negotiated header why: Peers seem to have not too many snapshots available. By setting back the pivot block header slightly, the chances might be higher to find more peers to serve this pivot. Experiment on mainnet showed that setting back too much (tested with 1024), the chances to find matching snapshot peers seem to decrease. * Add accounts healing * Update variable/field naming in `worker_desc` for readability * Handle leaf nodes in accounts healing why: There is no need to fetch accounts when they had been added by the healing process. On the flip side, these accounts must be checked for storage data and the batch queue updated, accordingly. * Reorganising accounts hash ranges batch queue why: The aim is to formally cover as many accounts as possible for different pivot state root environments. Formerly, this was tried by starting the accounts batch queue at a random value for each pivot (and wrapping around.) Now, each pivot environment starts with an interval set mutually disjunct from any interval set retrieved with other pivot state roots. also: Stop fishing for more pivots in `worker` if 100% download is reached * Reorganise/update accounts healing why: Error handling was wrong and the (math. complexity of) whole process could be better managed. details: Much of the algorithm is now documented at the top of the file `heal_accounts.nim`
2022-10-08 17:20:50 +00:00
trace "Hexary inspect finished", nPaths, maxLeafPaths,
level=result.level, nResumeCtx=reVisit.len, nDangling=result.dangling.len,
Prep for full sync after snap (#1253) * Split fetch accounts into sub-modules details: There will be separated modules for accounts snapshot, storage snapshot, and healing for either. * Allow to rebase pivot before negotiated header why: Peers seem to have not too many snapshots available. By setting back the pivot block header slightly, the chances might be higher to find more peers to serve this pivot. Experiment on mainnet showed that setting back too much (tested with 1024), the chances to find matching snapshot peers seem to decrease. * Add accounts healing * Update variable/field naming in `worker_desc` for readability * Handle leaf nodes in accounts healing why: There is no need to fetch accounts when they had been added by the healing process. On the flip side, these accounts must be checked for storage data and the batch queue updated, accordingly. * Reorganising accounts hash ranges batch queue why: The aim is to formally cover as many accounts as possible for different pivot state root environments. Formerly, this was tried by starting the accounts batch queue at a random value for each pivot (and wrapping around.) Now, each pivot environment starts with an interval set mutually disjunct from any interval set retrieved with other pivot state roots. also: Stop fishing for more pivots in `worker` if 100% download is reached * Reorganise/update accounts healing why: Error handling was wrong and the (math. complexity of) whole process could be better managed. details: Much of the algorithm is now documented at the top of the file `heal_accounts.nim`
2022-10-08 17:20:50 +00:00
maxLevel=stopAtLevel, stopped=result.stopped
Snap sync refactor healing (#1397) * Simplify accounts healing threshold management why: Was over-engineered. details: Previously, healing was based on recursive hexary trie perusal. Due to "cheap" envelope decomposition of a range complement for the hexary trie, the cost of running extra laps have become time-affordable again and a simple trigger mechanism for healing will do. * Control number of dangling result nodes in `hexaryInspectTrie()` also: + Returns number of visited nodes available for logging so the maximum number of nodes can be tuned accordingly. + Some code and docu update * Update names of constants why: Declutter, more systematic naming * Re-implemented `worker_desc.merge()` for storage slots why: Provided as proper queue management in `storage_queue_helper`. details: + Several append modes (replaces `merge()`) + Added third queue to record entries currently fetched by a worker. So another parallel running worker can safe the complete set of storage slots in as checkpoint. This was previously lost. * Refactor healing why: Simplify and remove deep hexary trie perusal for finding completeness. Due to "cheap" envelope decomposition of a range complement for the hexary trie, the cost of running extra laps have become time-affordable again and a simple trigger mechanism for healing will do. * Docu update * Run a storage job only once in download loop why: Download failure or rejection (i.e. missing data) lead to repeated fetch requests until peer disconnects, otherwise.
2022-12-24 09:54:18 +00:00
# ------------------------------------------------------------------------------
# Public functions, debugging
# ------------------------------------------------------------------------------
proc pp*(a: TrieNodeStat; db: HexaryTreeDbRef; maxItems = 30): string =
result = "(" & $a.level
if a.stopped:
result &= "stopped,"
result &= $a.dangling.len & "," &
a.dangling.ppDangling(maxItems) & ")"
# ------------------------------------------------------------------------------
# End
# ------------------------------------------------------------------------------