* Update nearby/neighbour leaf nodes finder
details:
Update return error codes so that in the case that there is no more
leaf node beyond the search direction, the particular error code
`NearbyBeyondRange` is returned.
* Compile largest interval range containing only this leaf point
why:
Will be needed in snap sync for adding single leaf nodes to the range
of already allocated nodes.
* Reorg `hexary_inspect.nim`
why:
Merged the nodes collecting algorithm for persistent and in-memory
into a single generic function `hexary_inspect.inspectTrieImpl()`
* Update fetching accounts range failure handling in `rangeFetchAccounts()`
why:
Rejected response leads now to fetching for another account range. Only
repeated failures (or all done) terminate the algorithm.
* Update accounts healing
why:
+ Fixed looping over a bogus node response that could not inserted into
the database. As a solution, these nodes are locally registered and not
asked for in this download cycle.
+ Sub-optimal handling of interval range for a healed account leaf node.
Now the maximal range interval containing this node is registered as
processed which leafs to de-fragementation of the processed (and
unprocessed) range list(s). So *gap* ranges which are known not to
cover any account leaf node are not asked for on the network, anymore.
+ Sporadically remove empty interval ranges (if any)
* Update logging, better variable names
* Redesign snap1 message GetTrieNodes argument prototypes
why:
A list of sub-objects `seq[SnapTriePath]` is more intuitive to work with
than an opaque definition `seq[seq[Blob]]` because the inner object
`SnapTriePath` object has a dedicated inner structure (for how to
interprete `seq[Blob]`.)
* Collect some public constants into `constants.nim` file
* Reorg `hexary_paths.nim`
why:
+ Collecting nodes following a partial path properly ending at an
extension node failed to collect this last node.
+ Merged the nodes collecting algorithm for persistent and in-memory
into a single generic function `hexary_paths.rootPathExtend()`
info:
Extracted common tasks to `hexary_nodes_helper.nim`
* Implement `StorageRanges` message handler for snap/1 protocol
* Silence some compiler gossip -- part 1, tx_pool
details:
Mostly removing redundant imports and `Defect` tracer after switch
to nim 1.6
* Silence some compiler gossip -- part 2, clique
details:
Mostly removing redundant imports and `Defect` tracer after switch
to nim 1.6
* Silence some compiler gossip -- part 3, misc core
details:
Mostly removing redundant imports and `Defect` tracer after switch
to nim 1.6
* Silence some compiler gossip -- part 4, sync
details:
Mostly removing redundant imports and `Defect` tracer after switch
to nim 1.6
* Clique update
why:
Missing exception annotation
* Simplify accounts healing threshold management
why:
Was over-engineered.
details:
Previously, healing was based on recursive hexary trie perusal.
Due to "cheap" envelope decomposition of a range complement for the
hexary trie, the cost of running extra laps have become time-affordable
again and a simple trigger mechanism for healing will do.
* Control number of dangling result nodes in `hexaryInspectTrie()`
also:
+ Returns number of visited nodes available for logging so the maximum
number of nodes can be tuned accordingly.
+ Some code and docu update
* Update names of constants
why:
Declutter, more systematic naming
* Re-implemented `worker_desc.merge()` for storage slots
why:
Provided as proper queue management in `storage_queue_helper`.
details:
+ Several append modes (replaces `merge()`)
+ Added third queue to record entries currently fetched by a worker. So
another parallel running worker can safe the complete set of storage
slots in as checkpoint. This was previously lost.
* Refactor healing
why:
Simplify and remove deep hexary trie perusal for finding completeness.
Due to "cheap" envelope decomposition of a range complement for the
hexary trie, the cost of running extra laps have become time-affordable
again and a simple trigger mechanism for healing will do.
* Docu update
* Run a storage job only once in download loop
why:
Download failure or rejection (i.e. missing data) lead to repeated
fetch requests until peer disconnects, otherwise.
* Miscellaneous tweaks & fixes
details:
+ Catch `TransportError` exception in `legacy.nim` module
+ Fix self-calling wrapper `hexaryEnvelopeTouchedBy()`
* Update documentation, logging etc.
* Changed `checkNode` batch list `seq[Blob]` => `seq[NodeSpecs]`
why:
The `NodeSpecs` type as used here is a tuple `(partial-path,node-key)`.
When `checkNode` partial paths are collected, also the node key is
available so it should be registered and not repeatedly recovered from
the database.
* Add optional begin/end trace statement in snap scheduler
why:
Allows to trace invoked entity and scheduler state variables
* Rename and update dismantle => hexaryEnvelopeDecompose()
why:
+ As for naming, a positive connotation is prefered
+ The unit tests were really insufficient
+ The function result was wrong on a few boundry conditions
detail:
+ Extracted the function from `hexary_paths.nim` and re-implemented
it together with other envelope functions => `hexary_envelope.nim`
+ Re-wrote docu for `hexaryEnvelopeDecompose()`
* Relaxed right condition for `hexaryEnvelopeDecompose()` range argument
why;
Previously, the right point of the argument interval had to be a path
to an allocated leaf node. While this is typically a given for accounts,
it is easier to require an arbitrary range of paths (or keys) with
the requirement of a `boundary proof` for left and right (i.e. enough
nodes in the database to find the end points.)
also:
Bug fixes for related functions (typos, missing conditions etc.)
* Add missing unit tests include file
* Add quick hexary trie inspector, called `dismantle()`
why:
+ Full hexary trie perusal is slow if running down leaf nodes
+ For known range of leaf nodes, work out the UInt126-complement of
partial sub-trie paths (for existing nodes). The result should cover
no (or only a few) sub-tries with leaf nodes.
* Extract common healing methods => `sub_tries_helper.nim`
details:
Also apply quick hexary trie inspection tool `dismantle()`
Replace `inspectAccountsTrie()` wrapper by `hexaryInspectTrie()`
* Re-arrange task dispatching in main peer worker
* Refactor accounts and storage slots downloaders
* Rename `HexaryDbError` => `HexaryError`
* Piecemeal trie inspection
details:
Trie inspection will stop after maximum number of nodes visited.
The inspection can be resumed using the returned state from the
last session.
why:
This feature allows for task switch between `piecemeal` sessions.
* Extract pivot helper code from `worker.nim` => `pivot_helper.nim`
* Accounts import will now return dangling paths from `proof` nodes
why:
With proper bookkeeping, this can be used to start healing without
analysing the the probably full trie.
* Update `unprocessed` account range handling
why:
More generally, the API of a pairs of unprocessed intervals favours
the first set and not before that is exhausted the second set comes
into play.
This was unfortunately implemented which caused the ranges to be
unnecessarily fractioned. Now the number of range interval typically
remains in the lower single digit numbers.
* Save sync state after end of downloading some accounts
details:
restore/resume to be implemented later
* Update logging
* Fix node hash associated with partial path for missing nodes
why:
Healing uses the partial paths for fetching nodes from the network. The
node hash (or key) is used to verify the node data retrieved.
The trie inspector function returned the parent hash instead of the node hash
with the partial path when a missing node was detected. So all nodes
for healing were rejected.
* Must not modify sequence while looping over it
* Re-arrange fetching storage slots in batch module
why;
Previously, fetching partial slot ranges first has a chance of
terminating the worker peer 9due to network error) while there were
many inheritable storage slots on the queue.
Now, inheritance is checked first, then full slot ranges and finally
partial ranges.
* Update logging
* Bundled node information for healing into single object `NodeSpecs`
why:
Previously, partial paths and node keys were kept in separate variables.
This approach was error prone due to copying/reassembling function
argument objects.
As all partial paths, keys, and node data types are more or less handled
as `Blob`s over the network (using Eth/6x, or Snap/1) it makes sense to
hold these `Blob`s as named field in a single object (even if not all
fields are active for the current purpose.)
* For good housekeeping, using `NodeKey` type only for account keys
why:
previously, a mixture of `NodeKey` and `Hash256` was used. Now, only
state or storage root keys use the `Hash256` type.
* Always accept latest pivot (and not a slightly older one)
why;
For testing it was tried to use a slightly older pivot state root than
available. Some anecdotal tests seemed to suggest an advantage so that
more peers are willing to serve on that older pivot. But this could not
be confirmed in subsequent tests (still anecdotal, though.)
As a side note, the distance of the latest pivot to its predecessor is
at least 128 (or whatever the constant `minPivotBlockDistance` is
assigned to.)
* Reshuffle name components for some file and function names
why:
Clarifies purpose:
"storages" becomes: "storage slots"
"store" becomes: "range fetch"
* Stash away currently unused modules in sub-folder named "notused"
* Re-model persistent database access
why:
Storage slots healing just run on the wrong sub-trie (i.e. the wrong
key mapping). So get/put and bulk functions now use the definitions
in `snapdb_desc` (earlier there were some shortcuts for `get()`.)
* Fixes: missing return code, typo, redundant imports etc.
* Remove obsolete debugging directives from `worker_desc` module
* Correct failing unit tests for storage slots trie inspection
why:
Some pathological cases for the extended tests do not produce any
hexary trie data. This is rightly detected by the trie inspection
and the result checks needed to adjusted.
* Rename `LeafRange` => `NodeTagRange`
* Replacing storage slot partition point by interval
why:
The partition point only allows to describe slots `[point,high(Uint256)]`
for fetching interval slot ranges. This has been generalised for any
interval.
* Replacing `SnapAccountRanges` by `SnapTrieRangeBatch`
why:
Generalised healing status for accounts, and later for storage slots.
* Improve accounts healing loop
* Split `snap_db` into accounts and storage modules
why:
It is cleaner to have separate session descriptors for accounts and
storage slots (based on a common base descriptor.)
Also, persistent storage handling might be changed in future which
requires the storage slot implementation disentangled from the accounts
handling.
* Re-model worker queues for storage slots
why:
There is a dynamic list of storage sub-tries, each one has to be
treated similar to the accounts database. This applied to slot
interval downloads as well as to healing
* Compress some return value report lists for snapdb methods
why:
No need to report all handling details for work items that are filteres
out and discarded, anyway.
* Remove inner loop frame from healing function
why:
The healing function runs as a loop body already.
* Split fetch accounts into sub-modules
details:
There will be separated modules for accounts snapshot, storage snapshot,
and healing for either.
* Allow to rebase pivot before negotiated header
why:
Peers seem to have not too many snapshots available. By setting back the
pivot block header slightly, the chances might be higher to find more
peers to serve this pivot. Experiment on mainnet showed that setting back
too much (tested with 1024), the chances to find matching snapshot peers
seem to decrease.
* Add accounts healing
* Update variable/field naming in `worker_desc` for readability
* Handle leaf nodes in accounts healing
why:
There is no need to fetch accounts when they had been added by the
healing process. On the flip side, these accounts must be checked for
storage data and the batch queue updated, accordingly.
* Reorganising accounts hash ranges batch queue
why:
The aim is to formally cover as many accounts as possible for different
pivot state root environments. Formerly, this was tried by starting the
accounts batch queue at a random value for each pivot (and wrapping
around.)
Now, each pivot environment starts with an interval set mutually
disjunct from any interval set retrieved with other pivot state roots.
also:
Stop fishing for more pivots in `worker` if 100% download is reached
* Reorganise/update accounts healing
why:
Error handling was wrong and the (math. complexity of) whole process
could be better managed.
details:
Much of the algorithm is now documented at the top of the file
`heal_accounts.nim`
* Miscellaneous updates TBC
* Disentangled pivot2 module from snap
why:
Wrote as template on top of sync so it can be shared by fast and snap
sync.
* Renamed and relocated pivot sources
* Integrated `best_pivot` module into full and snap sync
why:
Full sync used an older version of `best_pivot`
* isolating download module from full sync
why;
might be shared with snap sync at a later stage
* Added inspect module
why:
Find dangling references for trie healing support.
details:
+ This patch set provides only the inspect module and some unit tests.
+ There are also extensive unit tests which need bulk data from the
`nimbus-eth1-blob` module.
* Alternative pivot finder
why:
Attempt to be faster on start up. Also tying to decouple pivot finder
somehow by providing different mechanisms (this one runs in `single`
mode.)
* Use inspect module for healing
details:
+ After some progress with account and storage data, the inspect facility
is used to find dangling links in the database to be filled nose-wise.
+ This is a crude attempt to cobble together functional elements. The
set up needs to be honed.
* fix scheduler to avoid starting dead peers
why:
Some peers drop out while in `sleepAsync()`. So extra `if` clauses
make sure that this event is detected early.
* Bug fixes causing crashes
details:
+ prettify.toPC():
int/intToStr() numeric range over/underflow
+ hexary_inspect.hexaryInspectPath():
take care of half initialised step with branch but missing index into
branch array
* improve handling of dropped peers in alternaive pivot finder
why:
Strange things may happen while querying data from the network.
Additional checks make sure that the state of other peers is updated
immediately.
* Update trace messages
* reorganise snap fetch & store schedule