Commit Graph

15 Commits

Author SHA1 Message Date
Jordan Hrycaj c5e895aaab
Code reorg 4 snap sync suite (#1560)
* Rename `playXXX` => `passXXX`

why:
  Better purpose match

* Code massage, log message updates

* Moved `ticker.nim` to `misc` folder so that full and snap sync use the same ticker

why:
  Simplifies maintenance

* Move `worker/pivot*` => `worker/pass/pass_snap/*`

why:
  better for maintenance

* Moved helper source file => `pass/pass_snap/helper`

* Renamed ComError => GetError, `worker/com/` => `worker/get/`

* Keep ticker enable flag in worker descriptor

why:
  This allows passing the flag with the descriptor rather than as an extra
  function argument when calling the setup function.

* Extracted setup/release code from `worker.nim` => `pass/pass_init.nim`
2023-04-24 21:24:07 +01:00
Jordan Hrycaj f40a066cc6
Update snap sync ready to succeed at lab test (#1556)
* Extract RocksDB timing tests from snap unit tests as separate module

why:
  Declutter, make space for more snap related unit tests.

* Renamed `undumpNextGroup()` => `undumpBlocks()`

why:
  The source file is called `undump_blocks.nim`, which should be sort
  of in sync with the method name(s).

* Implement snap/1 server method `getByteCodes()`

* Implement snap/1 client method `getByteCodes()`

* Implement facility for handling contract code fetching via snap/1

* Provide persistent storage for contract code records

* Implement contract code snap sync fetch & store

* Code massage, cosmetics

* Unit tests for verifying snap sync snapshot dump

details:
  Use `undump_kvp.dumpAllDb()` to dump any database.
2023-04-21 22:11:04 +01:00
Jordan Hrycaj 5e865edec0
Update snap client storage slots download and healing (#1529)
* Fix fringe condition for `GetStorageRanges` message handler

why:
  Receiving a proved empty range was not considered at all. This led to
  inconsistencies of the return value which led to subsequent errors.

* Update storage range bulk download

details:
  Mainly re-org of storage queue processing in `storage_queue_helper.nim`

* Update logging variables/messages

* Update storage slots healing

details:
  Mainly clean up after improved helper functions from the sources
  `find_missing_nodes.nim` and `storage_queue_helper.nim`.

* Simplify account fetch

why:
  Too much fuss was made tolerating some errors. There will be an overall
  strategy implemented where the concert of download and healing functions
  is orchestrated.

* Add error resilience to the concert of download and healing.

why:
  The idea is that a peer might stop serving snap/1 accounts and storage
  slot downloads while still able to support fetching nodes for healing.
2023-04-04 14:36:18 +01:00
Jordan Hrycaj c01045c246
Update snap client account healing (#1521)
* Update nearby/neighbour leaf nodes finder

details:
  Update return error codes so that in the case that there is no more
  leaf node beyond the search direction, the particular error code
  `NearbyBeyondRange` is returned.

* Compile largest interval range containing only this leaf point

why:
  Will be needed in snap sync for adding single leaf nodes to the range
  of already allocated nodes.

* Reorg `hexary_inspect.nim`

why:
 Merged the nodes collecting algorithm for persistent and in-memory
 into a single generic function `hexary_inspect.inspectTrieImpl()`

* Update fetching accounts range failure handling in `rangeFetchAccounts()`

why:
  A rejected response now leads to fetching another account range. Only
  repeated failures (or all done) terminate the algorithm.

* Update accounts healing

why:
+ Fixed looping over a bogus node response that could not be inserted into
  the database. As a solution, these nodes are locally registered and not
  asked for in this download cycle.
+ Sub-optimal handling of interval range for a healed account leaf node.
  Now the maximal range interval containing this node is registered as
  processed which leads to de-fragmentation of the processed (and
  unprocessed) range list(s). So *gap* ranges which are known not to
  cover any account leaf node are not asked for on the network anymore.
+ Sporadically remove empty interval ranges (if any)

* Update logging, better variable names
2023-03-25 10:44:48 +00:00
Jordan Hrycaj 33023aaf39
Update snap server client test scenario (#1518)
* Redesign snap1 message GetTrieNodes argument prototypes

why:
  A list of sub-objects `seq[SnapTriePath]` is more intuitive to work with
  than an opaque definition `seq[seq[Blob]]` because the inner
  `SnapTriePath` object has a dedicated structure (describing how to
  interpret `seq[Blob]`.)

* Collect some public constants into `constants.nim` file

* Reorg `hexary_paths.nim`

why:
+ Collecting nodes following a partial path properly ending at an
  extension node failed to collect this last node.
+ Merged the nodes collecting algorithm for persistent and in-memory
  into a single generic function `hexary_paths.rootPathExtend()`

info:
  Extracted common tasks to `hexary_nodes_helper.nim`

* Implement `StorageRanges` message handler for snap/1 protocol
2023-03-22 20:11:49 +00:00
Jordan Hrycaj 89ae9621c4
Silence compiler gossip after nim upgrade (#1454)
* Silence some compiler gossip -- part 1, tx_pool

details:
  Mostly removing redundant imports and `Defect` tracer after switch
  to nim 1.6

* Silence some compiler gossip -- part 2, clique

details:
  Mostly removing redundant imports and `Defect` tracer after switch
  to nim 1.6

* Silence some compiler gossip -- part 3, misc core

details:
  Mostly removing redundant imports and `Defect` tracer after switch
  to nim 1.6

* Silence some compiler gossip -- part 4, sync

details:
  Mostly removing redundant imports and `Defect` tracer after switch
  to nim 1.6

* Clique update

why:
  Missing exception annotation
2023-01-30 22:10:23 +00:00
Jordan Hrycaj 707e47ac38
External beacon stream tracker (#1433)
* Register external beacon stream header

why:
  This will be used to sync the peers against.

* Update total coverage book-keeping for 100% roll-over

details:
  Provide commonly available/used function

* Replace best pivot by beacon stream tracker

details:
  Beacon stream header cache will be updated by external chain monitor via
  RPC. This cached header will then be used to sync the pivot.
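
The 100% roll-over book-keeping mentioned above can be pictured with a small sketch. It assumes that a roll-over is counted each time the accumulated ranges cover the whole hash space, after which the accumulator is reset to empty. The class `Coverage`, its methods, and the toy space size are hypothetical and not taken from the nimbus sources.

```python
class Coverage:
    """Toy coverage tracker over the closed space [0, space_size - 1]."""

    def __init__(self, space_size):
        self.space_size = space_size
        self.ranges = []      # sorted, disjoint closed intervals (lo, hi)
        self.rollovers = 0    # how many times 100% has been reached

    def merge(self, lo, hi):
        """Add [lo, hi], joining overlapping or adjacent intervals."""
        merged = []
        for a, b in sorted(self.ranges + [(lo, hi)]):
            if merged and a <= merged[-1][1] + 1:
                merged[-1] = (merged[-1][0], max(merged[-1][1], b))
            else:
                merged.append((a, b))
        # Assumption: full coverage bumps the counter and resets the set.
        if merged == [(0, self.space_size - 1)]:
            self.rollovers += 1
            merged = []
        self.ranges = merged

    def percent(self):
        """Total coverage, counting every completed roll-over as 100%."""
        covered = sum(b - a + 1 for a, b in self.ranges)
        return 100 * self.rollovers + 100 * covered / self.space_size
```

With a toy space of 100 points, merging `[0, 49]` and then `[50, 99]` triggers one roll-over, and a further `[10, 19]` brings the total to 110%.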
2023-01-17 09:28:14 +00:00
Jordan Hrycaj 88b315bb41
Snap sync refactor healing (#1397)
* Simplify accounts healing threshold management

why:
  Was over-engineered.

details:
  Previously, healing was based on recursive hexary trie perusal.

  Due to "cheap" envelope decomposition of a range complement for the
  hexary trie, the cost of running extra laps has become time-affordable
  again and a simple trigger mechanism for healing will do.

* Control number of dangling result nodes in `hexaryInspectTrie()`

also:
+ Returns number of visited nodes available for logging so the maximum
  number of nodes can be tuned accordingly.
+ Some code and docu update

* Update names of constants

why:
  Declutter, more systematic naming

* Re-implemented `worker_desc.merge()` for storage slots

why:
  Provided as proper queue management in `storage_queue_helper`.

details:
+ Several append modes (replaces `merge()`)
+ Added third queue to record entries currently fetched by a worker. So
  another parallel running worker can save the complete set of storage
  slots as a checkpoint. This was previously lost.

* Refactor healing

why:
  Simplify and remove deep hexary trie perusal for finding completeness.

   Due to "cheap" envelope decomposition of a range complement for the
   hexary trie, the cost of running extra laps has become time-affordable
   again and a simple trigger mechanism for healing will do.

* Docu update

* Run a storage job only once in download loop

why:
  Otherwise, download failure or rejection (i.e. missing data) leads to
  repeated fetch requests until the peer disconnects.
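
The "run a storage job only once" rule can be sketched as below. This is a minimal illustration, assuming that failed or rejected jobs are set aside for a later cycle instead of being retried within the same loop. The function name and the `fetch` callback are hypothetical.

```python
def run_storage_jobs_once(jobs, fetch):
    """Try every queued job exactly once; return (done, deferred)."""
    done, deferred = [], []
    for job in jobs:
        # A failed or rejected fetch is deferred, never retried in this loop.
        (done if fetch(job) else deferred).append(job)
    return done, deferred
```

The caller decides what happens to the deferred jobs, so a peer that keeps rejecting requests is not asked again in the same pass.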
2022-12-24 09:54:18 +00:00
Jordan Hrycaj bd42ebb193
Snap sync refactor accounts healing (#1392)
* Relocated mothballing (i.e. swap-in preparation) logic

details:
  Mothballing was previously tested & started after downloading
  account ranges in `range_fetch_accounts`.

  Whenever current download or healing stops because of a pivot change,
  swap-in preparation is needed (otherwise some storage slots may get
  lost when swap-in takes place.)

  Also, `execSnapSyncAction()` has been moved back to `pivot_helper`.

* Reorganised source file directories

details:
  Grouped pivot focused modules into `pivot` directory

* Renamed `checkNodes`, `sickSubTries` as `nodes.check`, `nodes.missing`

why:
  Both lists are typically used together as a pair. Renaming `sickSubTries`
  reflects moving away from a healing centric view towards a swap-in
  attitude.

* Multi times coverage recording

details:
  Per-pivot account ranges are accumulated into a coverage range set. This
  set will eventually contain a single range of account hashes [0..2^256]
  which amounts to 100% capacity.

  A counter has been added that is incremented whenever max capacity is
  reached. The accumulated range is then reset to empty.

  The effect of this setting is that the coverage can be evenly duplicated.
  So 200% would not accumulate on a particular region.

* Update range length comparisons (mod 2^256)

why:
  A range interval can have sizes 1..2^256 as it cannot be empty by
  definition. The number of points in a range intervals set can have
  0..2^256 points. As the scalar range is a residue class modulo 2^256,
  the residue class 0 means length 2^256 for a range interval, but can
  be 0 or 2^256 for the number of points in a range intervals set.

* Generalised `hexaryEnvelopeDecompose()`

details:
  Compile the complement of the union of some (processed) intervals and
  express this complement as a list of envelopes of sub-tries.

  This facility is directly applicable to swap-in book-keeping.

* Re-factor `swapIn()`

why:
  Good idea but baloney implementation. The main algorithm is based on
  the generalised version of `hexaryEnvelopeDecompose()` which has been
  derived from this implementation.

* Refactor `healAccounts()` using `hexaryEnvelopeDecompose()` as main driver

why:
  Previously, the hexary trie was searched recursively for dangling nodes
  which has a poor worst case performance already when the trie is
  reasonably populated.

  The function `hexaryEnvelopeDecompose()` is an order of magnitude faster
  because it does not peruse existing sub-tries in order to find missing
  nodes, although the result is not fully compatible with the previous function.

  So recursive search is used in a limited mode only when the decomposer
  will not deliver a useful result.

* Logging & maintenance fixes

details:
  Preparation for abandoning buddy-global healing variables `node`,
  `resumeCtx`, and `lockTriePerusal`. These variables are trie-perusal
  centric which will be run on the back burner in favour of
  `hexaryEnvelopeDecompose()` which is used for accounts healing already.
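
The length convention described under "Update range length comparisons (mod 2^256)" can be illustrated with a short sketch. The helper names are hypothetical, and plain Python integers stand in for the 256-bit scalar type.

```python
MOD = 1 << 256  # size of the account hash space


def interval_len(lo, hi):
    """Length of the closed interval [lo, hi] as a residue modulo 2^256."""
    return (hi - lo + 1) % MOD


def true_interval_len(lo, hi):
    """An interval is never empty, so residue 0 decodes to 2^256."""
    n = interval_len(lo, hi)
    return MOD if n == 0 else n


def true_set_size(residue, is_empty):
    """For a set of intervals, residue 0 is ambiguous: 0 or 2^256 points."""
    if residue == 0:
        return 0 if is_empty else MOD
    return residue
```

The extra `is_empty` flag is what resolves the ambiguity for interval sets, which a single interval never needs.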
2022-12-19 21:22:09 +00:00
Jordan Hrycaj cc2c888a63
Snap sync swap in other pivots (#1363)
* Provide index to reconstruct missing storage slots

why:
  Pivots will not be changed anymore once they are officially archived. The
  accounts of the archived pivots are ready to be swapped into the active
  pivot. This leaves open how to treat storage slots not fetched yet.

  Solution: when mothballing, an `account->storage-root` index is
  compiled that can be used when swapping in accounts.

* Implement swap-in from earlier pivots

details:
  When most accounts are covered by the current and previous pivot
  sessions, swapping in the accounts and storage slots (i.e. registering
  account ranges done) from earlier pivots takes place if there is a
  common sub-trie.

* Throttle pivot change when healing state has been reached

why:
  There is a hope to complete the current pivot, so pivot update can be
  throttled. This is achieved by setting another minimum block number
  distance for the pivot headers. This feature is still experimental.
2022-12-12 22:00:24 +00:00
Jordan Hrycaj 44a57496d9
Snap sync interval complement method to speed up trie perusal (#1328)
* Add quick hexary trie inspector, called `dismantle()`

why:
+ Full hexary trie perusal is slow if running down leaf nodes
+ For a known range of leaf nodes, work out the UInt256-complement of
  partial sub-trie paths (for existing nodes). The result should cover
  no (or only a few) sub-tries with leaf nodes.

* Extract common healing methods => `sub_tries_helper.nim`

details:
  Also apply quick hexary trie inspection tool `dismantle()`
  Replace `inspectAccountsTrie()` wrapper by `hexaryInspectTrie()`

* Re-arrange task dispatching in main peer worker

* Refactor accounts and storage slots downloaders

* Rename `HexaryDbError` => `HexaryError`
2022-11-28 09:03:23 +00:00
Jordan Hrycaj 7688148565
Snap sync can start on saved checkpoint (#1327)
* Stop negotiating pivot if peer repeatedly replies w/useless answers

why:
  There is some fringe condition where a peer replies with legit but
  useless empty headers repeatedly. This goes on until somebody stops.
  We stop now.

* Rename `missingNodes` => `sickSubTries`

why:
  These (probably missing) nodes represent in reality fully or partially
  missing sub-tries. The top nodes may even exist, e.g. as a shallow
  sub-trie.

also:
  Keep track of account healing on/off by the bool variable `accountsHealing`
  controlled in `pivot_helper.execSnapSyncAction()`

* Add `nimbus` option argument `snapCtx` for starting snap recovery (if any)

also:
+ Trigger the recovery (or similar) process from inside the global peer
  worker initialisation `worker.setup()` and not by the `snap.start()`
  function.
+ Have `runPool()` return a `bool` code to indicate early stop to the
  scheduler.

* Can import partial snap sync checkpoint at start

details:
 + Modified what is stored with the checkpoint in `snapdb_pivot.nim`
 + Will be loaded within `runDaemon()` if activated

* Forgot to import total coverage range

why:
  Only the top (or latest) pivot needs coverage but the total coverage
  is the list of all ranges for all pivots -- simply forgotten.
2022-11-25 14:56:42 +00:00
Jordan Hrycaj bba1bea4c8
Snap sync state save (#1302)
* Piecemeal trie inspection

details:
  Trie inspection will stop after maximum number of nodes visited.
  The inspection can be resumed using the returned state from the
  last session.

why:
  This feature allows for task switch between `piecemeal` sessions.

* Extract pivot helper code from `worker.nim` => `pivot_helper.nim`

* Accounts import will now return dangling paths from `proof` nodes

why:
  With proper bookkeeping, this can be used to start healing without
  analysing the probably full trie.

* Update `unprocessed` account range handling

why:
  More generally, the API of a pair of unprocessed interval sets favours
  the first set; the second set comes into play only once the first is
  exhausted.

  This was implemented poorly, which caused the ranges to be
  unnecessarily fragmented. Now the number of range intervals typically
  remains in the lower single digits.

* Save sync state after end of downloading some accounts

details:
  restore/resume to be implemented later
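
The piecemeal trie inspection described at the top of this commit can be sketched as a resumable traversal. This is a minimal sketch under stated assumptions: the trie is modelled as a plain dict mapping a node to its list of children, the root key `"root"` is hypothetical, and the returned state is simply the pending stack. None of this mirrors the actual `hexaryInspectTrie()` interface.

```python
def inspect_piecemeal(trie, max_nodes, resume=None):
    """Visit at most max_nodes nodes in depth-first order.

    Returns (visited, state). Pass state back as resume to continue;
    state is None once the whole trie has been visited.
    """
    stack = list(resume) if resume is not None else ["root"]
    visited = []
    while stack and len(visited) < max_nodes:
        node = stack.pop()
        visited.append(node)
        # Push children in reverse so they are visited left to right.
        stack.extend(reversed(trie.get(node, [])))
    return visited, (stack if stack else None)
```

Because the state is returned to the caller, another task can run between two sessions and the inspection continues where it stopped.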
2022-11-16 23:51:06 +00:00
Jordan Hrycaj e14fd4b96c
Prep for full sync after snap make 6 (#1291)
* Update log ticker, using time interval rather than ticker count

why:
  Counting and logging ticker occurrences is inherently imprecise. So
  time intervals are used.

* Use separate storage tables for snap sync data

* Left boundary proof update

why:
  Was not properly implemented, yet.

* Capture pivot in peer worker (aka buddy) tasks

why:
  The pivot environment is linked to the `buddy` descriptor. While
  there is a task switch, the pivot may change. So it is passed on as
  function argument `env` rather than retrieved from the buddy at
  the start of a sub-function.

* Split queues `fetchStorage` into `fetchStorageFull` and `fetchStoragePart`

* Remove obsolete account range returned from `GetAccountRange` message

why:
  The handler returned the wrong value for the right end of the range. This
  range was for convenience only.

* Prioritise storage slots if the queue becomes large

why:
  Currently, accounts processing is prioritised up until all accounts
  are downloaded. The new prioritisation has two thresholds for
  + start processing storage slots with a new worker
  + stop account processing and switch to storage processing

also:
  Provide api for `SnapTodoRanges` pair of range sets in `worker_desc.nim`

* Generalise left boundary proof for accounts or storage slots.

why:
  A detailed explanation of how this works is documented with
  `snapdb_accounts.importAccounts()`.

  Instead of enforcing a left boundary proof (which is still the default),
  the importer functions return a list of `holes` (aka node paths) found in
  the argument ranges of leaf nodes. This in turn is used by the
  book-keeping software for data download.

* Forgot to pass on variable in function wrapper

also:
  + Start healing not before 99% accounts covered (previously 95%)
  + Logging updated/prettified
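
The two-threshold prioritisation of storage slots described in this commit can be sketched as follows. The names, the example threshold values, and the use of `>=` comparisons are assumptions for illustration only; the real constants are not given in the commit message.

```python
from enum import Enum


class Action(Enum):
    ACCOUNTS_ONLY = 1        # keep prioritising account download
    ADD_STORAGE_WORKER = 2   # start storage slots with a new worker
    STORAGE_ONLY = 3         # stop accounts, switch to storage slots


def choose_action(queue_len, start_threshold, switch_threshold):
    """Pick the worker action from the storage queue length."""
    if queue_len >= switch_threshold:
        return Action.STORAGE_ONLY
    if queue_len >= start_threshold:
        return Action.ADD_STORAGE_WORKER
    return Action.ACCOUNTS_ONLY
```

For example, with hypothetical thresholds of 100 and 1000, a queue of 250 entries would start an extra storage worker while accounts keep downloading.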
2022-11-08 18:56:04 +00:00
Jordan Hrycaj a689e9185a
Prep for full sync after snap make 5 (#1286)
* Update docu and logging

* Extracted and updated constants from `worker_desc` into separate file

* Update and re-calibrate communication error handling

* Allow simplified pivot negotiation

why:
  This feature allows turning off the pivot negotiation by which peers
  agree on a pivot header.

  For snap sync with fast changing pivots this only throttles the sync
  process. The finally downloaded DB snapshot is typically a merged
  version of different pivot states augmented by a healing process.

* Re-model worker queues for accounts download & healing

why:
  Currently there is only one data fetch per download or healing task.
  This task is then repeated by the scheduler after a short time. In
  many cases, this short time seems enough for some peers to decide to
  terminate the connection.

* Update main task batch `runMulti()`

details:
  The function `runMulti()` is activated in quasi-parallel mode by the
  scheduler. This function calls the download, healing and fast-sync
  functions.

  While in debug mode, after each set of jobs run by this function the
  database is analysed (by the `snapdb_check` module) and the result
  printed.
2022-11-01 15:07:44 +00:00