Commit Graph

27 Commits

Author SHA1 Message Date
jangko 9dd256cae7
eth wire handlers: implement transactions exchange 2022-11-11 09:24:32 +07:00
Jordan Hrycaj e14fd4b96c
Prep for full sync after snap make 6 (#1291)
* Update log ticker, using time interval rather than ticker count

why:
  Counting and logging ticker occurrences is inherently imprecise. So
  time intervals are used.

* Use separate storage tables for snap sync data

* Left boundary proof update

why:
  Was not properly implemented, yet.

* Capture pivot in peer worker (aka buddy) tasks

why:
  The pivot environment is linked to the `buddy` descriptor. While
  there is a task switch, the pivot may change. So it is passed on as
  function argument `env` rather than retrieved from the buddy at
  the start of a sub-function.

* Split queues `fetchStorage` into `fetchStorageFull` and `fetchStoragePart`

* Remove obsolete account range returned from `GetAccountRange` message

why:
  Handler returned the wrong right value of the range. This range was
  for convenience, only.

* Prioritise storage slots if the queue becomes large

why:
  Currently, accounts processing is prioritised up until all accounts
  are downloaded. The new prioritisation has two thresholds for
  + start processing storage slots with a new worker
  + stop account processing and switch to storage processing

also:
  Provide api for `SnapTodoRanges` pair of range sets in `worker_desc.nim`

* Generalise left boundary proof for accounts or storage slots.

why:
  Detailed explanation how this works is documented with
  `snapdb_accounts.importAccounts()`.

  Instead of enforcing a left boundary proof (which is still the default),
  the importer functions return a list of `holes` (aka node paths) found in
  the argument ranges of leaf nodes. This in turn is used by the book
   keeping software for data download.

* Forgot to pass on variable in function wrapper

also:
  + Start healing not before 99% accounts covered (previously 95%)
  + Logging updated/prettified
2022-11-08 18:56:04 +00:00
Jordan Hrycaj a689e9185a
Prep for full sync after snap make 5 (#1286)
* Update docu and logging

* Extracted and updated constants from `worker_desc` into separate file

* Update and re-calibrate communication error handling

* Allow simplified pivot negotiation

why:
  This feature allows to turn off pivot negotiation so that peers agree
  on a a pivot header.

  For snap sync with fast changing pivots this only throttles the sync
  process. The finally downloaded DB snapshot is typically a merged
  version of different pivot states augmented by a healing process.

* Re-model worker queues for accounts download & healing

why:
  Currently there is only one data fetch per download or healing task.
  This task is then repeated by the scheduler after a short time. In
  many cases, this short time seems enough for some peers to decide to
  terminate connection.

* Update main task batch `runMulti()`

details:
  The function `runMulti()` is activated in quasi-parallel mode by the
  scheduler. This function calls the download, healing and fast-sync
  functions.

  While in debug mode, after each set of jobs run by this function the
  database is analysed (by the `snapdb_check` module) and the result
  printed.
2022-11-01 15:07:44 +00:00
Jordan Hrycaj a8df4c1165
Fix trie inspector for healing (#1284)
* Update logging

* Fix node hash associated with partial path for missing nodes

why:
  Healing uses the partial paths for fetching nodes from the network. The
  node hash (or key) is used to verify the node data retrieved.

  The trie inspector function returned the parent hash instead of the node hash
  with the partial path when a missing node was detected. So all nodes
  for healing were rejected.

* Must not modify sequence while looping over it
2022-10-28 08:26:17 +01:00
Jordan Hrycaj 1b4572ed3b
Prep for full sync after snap make 4 (#1282)
* Re-arrange fetching storage slots in batch module

why;
  Previously, fetching partial slot ranges first has a chance of
  terminating the worker peer 9due to network error) while there were
  many inheritable storage slots on the queue.

  Now, inheritance is checked first, then full slot ranges and finally
  partial ranges.

* Update logging

* Bundled node information for healing into single object `NodeSpecs`

why:
  Previously, partial paths and node keys were kept in separate variables.
  This approach was error prone due to copying/reassembling function
  argument objects.

  As all partial paths, keys, and node data types are more or less handled
  as `Blob`s over the network (using Eth/6x, or Snap/1) it makes sense to
  hold these `Blob`s as named field in a single object (even if not all
  fields are active for the current purpose.)

* For good housekeeping, using `NodeKey` type only for account keys

why:
  previously, a mixture of `NodeKey` and `Hash256` was used. Now, only
  state or storage root keys use the `Hash256` type.

* Always accept latest pivot (and not a slightly older one)

why;
  For testing it was tried to use a slightly older pivot state root than
  available. Some anecdotal tests seemed to suggest an advantage so that
  more peers are willing to serve on that older pivot. But this could not
  be confirmed in subsequent tests (still anecdotal, though.)

  As a side note, the distance of the latest pivot to its predecessor is
  at least 128 (or whatever the constant `minPivotBlockDistance` is
  assigned to.)

* Reshuffle name components for some file and function names

why:
  Clarifies purpose:
  "storages" becomes: "storage slots"
  "store" becomes: "range fetch"

* Stash away currently unused modules in sub-folder named "notused"
2022-10-27 14:49:28 +01:00
Jordan Hrycaj 82ceec313d
Prettify logging for snap sync environment (#1278)
* Multiple storage batches at a time

why:
  Previously only some small portion was processed at a time so the peer
  might have gone when the process was resumed at a later time

* Renamed some field of snap/1 protocol response object

why:
  Documented as `slots` is in reality a per-account list of slot lists. So
  the new name `slotLists` better reflects the nature of the beast.

* Some minor healing re-arrangements for storage slot tries

why;
  Resolving all complete inherited slots tries first in sync mode keeps
  the worker queues smaller which improves logging.

* Prettify logging, comments update etc.
2022-10-21 20:29:42 +01:00
Jordan Hrycaj c0d580715e
Remodel persistent snapdb access (#1274)
* Re-model persistent database access

why:
  Storage slots healing just run on the wrong sub-trie (i.e. the wrong
  key mapping). So get/put and bulk functions now use the definitions
  in `snapdb_desc` (earlier there were some shortcuts for `get()`.)

* Fixes: missing return code, typo, redundant imports etc.

* Remove obsolete debugging directives from `worker_desc` module

* Correct failing unit tests for storage slots trie inspection

why:
  Some pathological cases for the extended tests do not produce any
  hexary trie data. This is rightly detected by the trie inspection
  and the result checks needed to adjusted.
2022-10-20 17:59:54 +01:00
Jordan Hrycaj 85fdb61699
Prep for full sync after snap make 3 (#1270)
* For snap sync, publish `EthWireRef` in sync descriptor

why:
  currently used for noise control

* Detect and reuse existing storage slots

* Provide healing module for storage slots

* Update statistic ticker (adding range factor for unprocessed storage)

* Complete mere function for work item ranges

why:
  Merging interval into existing partial item was missing

* Show av storage queue lengths in ticker

detail;
  Previous attempt shows average completeness which did not tell much

* Correct the meaning of the storage counter (per pivot)

detail:
  Is the # accounts that have a storage saved
2022-10-19 11:04:06 +01:00
jangko 3fa1b012e6
initial wire protocol transformation
rework on the eth wire protocol handlers.
curently still missing 4 handlers implementation.
but the framework is ready for eexpansion.
2022-10-15 19:48:21 +07:00
Jordan Hrycaj 8c7d91512b
Prep for full sync after snap mark2 (#1263)
* Rename `LeafRange` => `NodeTagRange`

* Replacing storage slot partition point by interval

why:
  The partition point only allows to describe slots `[point,high(Uint256)]`
  for fetching interval slot ranges. This has been generalised for any
  interval.

* Replacing `SnapAccountRanges` by `SnapTrieRangeBatch`

why:
  Generalised healing status for accounts, and later for storage slots.

* Improve accounts healing loop

* Split `snap_db` into accounts and storage modules

why:
  It is cleaner to have separate session descriptors for accounts and
  storage slots (based on a common base descriptor.)

  Also, persistent storage handling might be changed in future which
  requires the storage slot implementation disentangled from the accounts
  handling.

* Re-model worker queues for storage slots

why:
  There is a dynamic list of storage sub-tries, each one has to be
  treated similar to the accounts database. This applied to slot
  interval downloads as well as to healing

* Compress some return value report lists for snapdb methods

why:
  No need to report all handling details for work items that are filteres
  out and discarded, anyway.

* Remove inner loop frame from healing function

why:
  The healing function runs as a loop body already.
2022-10-14 17:40:32 +01:00
Jordan Hrycaj d53eacb854
Prep for full sync after snap (#1253)
* Split fetch accounts into sub-modules

details:
  There will be separated modules for accounts snapshot, storage snapshot,
  and healing for either.

* Allow to rebase pivot before negotiated header

why:
  Peers seem to have not too many snapshots available. By setting back the
  pivot block header slightly, the chances might be higher to find more
  peers to serve this pivot. Experiment on mainnet showed that setting back
  too much (tested with 1024), the chances to find matching snapshot peers
  seem to decrease.

* Add accounts healing

* Update variable/field naming in `worker_desc` for readability

* Handle leaf nodes in accounts healing

why:
  There is no need to fetch accounts when they had been added by the
  healing process. On the flip side, these accounts must be checked for
  storage data and the batch queue updated, accordingly.

* Reorganising accounts hash ranges batch queue

why:
  The aim is to formally cover as many accounts as possible for different
  pivot state root environments. Formerly, this was tried by starting the
  accounts batch queue at a random value for each pivot (and wrapping
  around.)

  Now, each pivot environment starts with an interval set mutually
  disjunct from any interval set retrieved with other pivot state roots.

also:
  Stop fishing for more pivots in `worker` if 100% download is reached

* Reorganise/update accounts healing

why:
  Error handling was wrong and the (math. complexity of) whole process
  could be better managed.

details:
  Much of the algorithm is now documented at the top of the file
  `heal_accounts.nim`
2022-10-08 18:20:50 +01:00
Jordan Hrycaj eca5882238
Isolating sync action modules (#1249)
* Miscellaneous updates TBC

* Disentangled pivot2 module from snap

why:
  Wrote as template on top of sync so it can be shared by fast and snap
  sync.

* Renamed and relocated pivot sources

* Integrated `best_pivot` module into full and snap sync

why:
  Full sync used an older version of `best_pivot`

* isolating download module from full sync

why;
  might be shared with snap sync at a later stage
2022-09-30 09:22:14 +01:00
Jordan Hrycaj 4ff0948fed
Snap sync accounts healing (#1225)
* Added inspect module

why:
  Find dangling references for trie healing support.

details:
 + This patch set provides only the inspect module and some unit tests.
 + There are also extensive unit tests which need bulk data from the
   `nimbus-eth1-blob` module.

* Alternative pivot finder

why:
  Attempt to be faster on start up. Also tying to decouple pivot finder
  somehow by providing different mechanisms (this one runs in `single`
  mode.)

* Use inspect module for healing

details:
 + After some progress with account and storage data, the inspect facility
   is used to find dangling links in the database to be filled nose-wise.
 + This is a crude attempt to cobble together functional elements. The
   set up needs to be honed.

* fix scheduler to avoid starting dead peers

why:
  Some peers drop out while in `sleepAsync()`. So extra `if` clauses
  make sure that this event is detected early.

* Bug fixes causing crashes

details:

+ prettify.toPC():
  int/intToStr() numeric range over/underflow

+ hexary_inspect.hexaryInspectPath():
  take care of half initialised step with branch but missing index into
  branch array

* improve handling of dropped peers in alternaive pivot finder

why:
  Strange things may happen while querying data from the network.
  Additional checks make sure that the state of other peers is updated
  immediately.

* Update trace messages

* reorganise snap fetch & store schedule
2022-09-16 08:24:12 +01:00
Jacek Sieka c2ed731fa5
eth: adapt to smaller eth_types (#1210) 2022-09-03 20:15:35 +02:00
Jordan Hrycaj 72a31593a9
Snap fetch account storage data (#1211)
* Removed database write comparison statistics

* Provide life storage tests data

details:
  database dumps on external repo `nimbus-eth1`-blobs`

* Update hexary tree interpolation for storage bulk tests

* fetch storage update
2022-09-02 19:16:09 +01:00
Jordan Hrycaj de2c13e136
Update snap offline tests (#1199)
* Re-implemented `hexaryFollow()` in a more general fashion

details:
+ New name for re-implemented `hexaryFollow()` is `hexaryPath()`
+ Renamed `rTreeFollow()` as `hexaryPath()`

why:
  Returning similarly organised structures, the results of the
  `hexaryPath()` functions become comparable when running over
  the persistent and the in-memory databases.

* Added traversal functionality for persistent ChainDB

* Using `Account` values as re-packed Blob

* Repack samples as compressed data files

* Produce test data

details:
+ Can force pivot state root switch after minimal coverage.
+ For emulating certain network behaviour, downloading accounts stops for
  a particular pivot state root if 30% (some static number) coverage is
  reached. Following accounts are downloaded for a later pivot state root.
2022-08-24 14:44:18 +01:00
Jordan Hrycaj f07945d37b
Misc snap sync updates (#1192)
* Bump nim-stew

why:
  Need fixed interval set

* Keep track of accumulated account ranges over all state roots

* Added comments and explanations to unit tests

* typo
2022-08-17 08:30:11 +01:00
Jordan Hrycaj 7489784ba8
Snap sync accounts db code reorg (#1189)
* Extracted functionality into sub-modules for maintainability

* Setting SST bulk load as default in `accounts_db`

details:
+ currently, the same data are stored via rocksdb if available, and
  the same via embedded `storage_type` with (non-standard) prefix 200
  for time comparisons
+ fallback to normal `put()` unless rocksdb is accessible
2022-08-15 16:51:50 +01:00
Jordan Hrycaj 7d7e26d45f
Experimental bulk loader tests (#1187)
why:
  Rocksdb bulk loading might provide a slight advantage when loading
  larger data sets into the system
2022-08-12 16:42:07 +01:00
Jordan Hrycaj 5f0e89a41e
Snap accounts bulk import preparer (#1183)
* Provided common scheduler API, applied to `full` sync

* Use hexary trie as storage for proofs_db records

also:
 + Store metadata with account for keeping track of account state
 + add iterator over accounts

* Common scheduler API applied to `snap` sync

* Prepare for accounts bulk import

details:
+ Added some ad-hoc checks for proving accounts data received from the
  snap/1 (will be replaced by proper database version when ready)
+ Added code that dumps some of the received snap/1 data into a file
  (turned of by default, see `worker_desc.nim`)
2022-08-04 09:04:30 +01:00
Jordan Hrycaj 73b628491d
Clique snapshots reorg (#1169)
* Add persistent snapshot size logging

why:
  Suspecting too much space used

snapshot statistic:
  [..]
  blockNumber=2214912 nSnaps=2236 snapsTotal=1.14m
  blockNumber=2215936 nSnaps=2237 snapsTotal=1.14m
  [..]
  Persisting blocks fromBlock=2216449 toBlock=2216640
  36458496	datadir-nimbus-goerlish/data/nimbus/

* Replace legacy `lru_cache` by `keyed_queue`

why:
  `keyed_queue` generalises `lru_cache`

snapshot statistic:
  [..]
  blockNumber=2234368 nSnaps=2259 snapsTotal=1.15m
  blockNumber=2235392 nSnaps=2260 snapsTotal=1.15m
  [..]
  Persisting blocks fromBlock=2235649 toBlock=2235840
  37627288	datadir-nimbus-goerlish/data/nimbus/

* Increase persistent snapshot storage interval by 300%

snapshot statistic:
      [..]
      blockNumber=2232320 nSnaps=620 snapsTotal=0.30m
      blockNumber=2236416 nSnaps=621 snapsTotal=0.30m
      [..]
      Persisting blocks fromBlock=2237185 toBlock=2237376
      37627288	datadir-nimbus-goerlish/data/nimbus/

* Cull legacy debugging environment for clique

why:
  Chronicles provides a better choice (when properly set up)
2022-07-21 19:16:28 +01:00
Jordan Hrycaj 5d98f68c09
Sync update to work with sepolia reorgs (#1168)
* Error return in `persistBlocks()` on initial `VmState` roblem

why:
  previously threw an exception

* Updated sync mode option

why:
 using enum rather than bool => space for more

* Added sync mode `full`, re-factued legacy sync

also:
  rebased

* Fix typo (crashes `pesistBlocks()` otherwise)

also:
  rebase to master

* Reduce log ticker noise by suppressing duplicate messages

* Clarify staged queue overflow handling

why:
  backtrack/re-org mode in `stageItem()` should be detected by both,
  the global indicator or the work item where it might have moved into.

also:
  rebased
2022-07-21 13:14:41 +01:00
Jordan Hrycaj 134fe26997
Store proved snap accounts (#1145)
* Relocated `IntervalSets` to nim-stew repo

* Accumulate accounts on temporary kv-DB

why:
  Explore the data as returned from snap/1. Will be converted to a
  `eth/db` next.

details:
  Verify and accumulate per/state-root accounts downloaded via snap.

also:
  Some unit tests

* Replace `Table` by `TrieDatabaseRef` for accounts accumulator

* update ticker statistics

details:
  mean/variance based counter update

* allow persistent db for proved accounts

* rebase, and globally activate unit test

* fix statistics
2022-07-01 12:42:17 +01:00
Jordan Hrycaj c123e1eb93
Updated account scheduler (#1124)
* Using `IntervalSet` type data for `LeafRange`

* Updated log ticker

* Update to `eth67`

details:
  Disabled by default, use `ENABLE_LEGACY_ETH66=0` to enable
  No support for `Get/NodeData` dialogue via eth, anymore

* Dissolved fetch/common.nim

details;
  the log/ticker part becomes ticker.nim
  the interval range management is merged into fetch.nim

* Updated account scheduler

why:
  The previous scheduler fetched each account once (for different state
  roots.) The updated scheduler re-calibrates after a change of the state
  root and potentially (until told otherwise) fetches all possible
  accounts.

* Fix `high(P)` fringe cases in `IntervalSet` handling

why:
  The `high(P)` value for a point type `P` cannot be represented with
  half open intervals `[a,b)` for a,b points of `P`. So this single value
  needs extra treatment which was slightly wrong.

* Updated docu/comments

also:
  rebased

* Update scheduler

details:
  Change the `pivot` management when creating new accounts lists. It is
  strictly increasing (and wrapping around) depending on last updated
  accounts list.
2022-06-16 09:58:50 +01:00
jangko a37f8b17e2
fix rpc and websocket server config if they share the same port with engine api
also fixes json-rpc-engine-api server and websocket-engine-api server shutdown code,
checking if they actually created or not at startup.

fix #1119
2022-06-13 19:32:19 +07:00
Jordan Hrycaj 76f6de8059
Normalise snap objects (#1114)
* Fix/recover download flag

why:
  The fetch indicator used to control the data download somehow got
  lost during re-org.

* Updated chronicles/logger topics

* Reorganised run state flags

why:
  The original code used a pair of boolean flags `(stopped,stopThisState)`
  which was translated to three states running, stoppedPending, and
  stopped. It is currently not clear whether collapsing some states was
  correct. So the original logic has been re-stored, albeit wrapped into
  directives like `isStopped()` etc.

also:
  Moving some function bodies in `worker.nim`

* Moved `reply_data.nim` and `validate_trienode.nim` to sub-directory `fetch_trie`

why:
  Only used in `fetch_trie.nim`.

* Move `fetch_*` file and directory objects to `fetch` subdirectory

why:
  Only used in `fetch.nim`

* Added start/stop and/or setup/release methods for all sub-modules

why:
  good housekeeping

also:
  updated getters/setters for ctrl states
  updated trace messages
2022-06-06 14:42:08 +01:00
Jordan Hrycaj 96bb09457e
Snap sync rename objects (#1099)
* Disentangle `collect` module from `reply_data`

why:
  Now the module visible from `collect` for fetching data is `peer/fetch`
  only.

* Merge `SnapPeerHunt` into `collect`

why:
  This part needs to be known by `collect`, only

* rename collect => worker

* Dissolve `sync_fetch_xdesc` module into `common`

why:
  Descriptor is only used in `common` and `fetch_trie`

* rename `snap/peer` directory => `snap/worker`

* rename `SnapSync` -> `Worker`, `SnapPeer` -> `WorkerBuddy`

* moved `snap/base_desc.nim` -> `snap/worker/worker_desc.nim`

* Unified opaque object ref naming in `worker_desc.nim`

details:
  indicated my inheriting module (exactly one, always)
2022-05-24 09:07:39 +01:00