nimbus-eth1

Commit Graph

Author	SHA1	Message	Date
Jordan Hrycaj	f20f20f962	Prepare snap server client test scenario (#1483 ) * Enable `snap/1` accounts range service * Allow to change the garbage collector to `boehm` as a Makefile option. why: There is still an unsolved memory corruption problem that might be related to the standard `gc`. It seemingly goes away if the `gc` is changed to `boehm`. Specifying another `gc` on the make level simplifies debugging and development. * Code cosmetics details: * updated exception annotations * extracted `worker_desc.nim` from `full/worker.nim` * etc. * Implement option to state a sync modifier file why: This allows to specify extra sync type specific options which might change over time. This file is regularly checked for updates. * Implement a threshold when to suspend full syncing why: For a test scenario, a full sync beep may work as a local snap server. There is no need to download the full block chain. details: The file containing the pivot specs is specified by the `--sync-ctrl-file` option. It is regularly parsed for updates.	2023-03-02 09:57:58 +00:00
Jordan Hrycaj	89ae9621c4	Silence compiler gossip after nim upgrade (#1454 ) * Silence some compiler gossip -- part 1, tx_pool details: Mostly removing redundant imports and `Defect` tracer after switch to nim 1.6 * Silence some compiler gossip -- part 2, clique details: Mostly removing redundant imports and `Defect` tracer after switch to nim 1.6 * Silence some compiler gossip -- part 3, misc core details: Mostly removing redundant imports and `Defect` tracer after switch to nim 1.6 * Silence some compiler gossip -- part 4, sync details: Mostly removing redundant imports and `Defect` tracer after switch to nim 1.6 * Clique update why: Missing exception annotation	2023-01-30 22:10:23 +00:00
Kim De Mey	a669b51ec5	Bump Nim to 1.6 and resolve the related issues (#1445 ) Two unresolved items currently: - Three tests that are temporarily disabled as they fail in the macro_assembler code, which seems to be due to an ambiguous identifier Stop (Ops and chronos ServerCommand enum). - i386 CI disabled as it fails at Nim compilation already. Failed tests were already ignored for this target.	2023-01-26 13:37:19 +01:00
Jordan Hrycaj	30135ab1ef	Simplify beacon stream pivot update (#1435 ) * Simplify pivot update why: No need to fetch the pivot header from the network when it can be made available in the pivot cache also: Keep `txPool` update disabled while syncing * Cosmetics, tune down some logging noise * Support `snap/1` without `eth/6?` why: Eth is not needed here. * Snap is an (optional) extension of `eth` so: It must be supported somehow. Nevertheless it will be currently unused in the snap syncer.	2023-01-18 08:31:57 +00:00
Jordan Hrycaj	707e47ac38	External beacon stream tracker (#1433 ) * Register external beacon stream header why: This will be used to sync the peers against. * Update total coverage book-keeping for 100% roll-over details: Provide commonly available/used function * Replace best pivot by beacon stream tracker details: Beacon stream header cache will be updated by external chain monitor via RPC. This cached header will then be used to sync the pivot.	2023-01-17 09:28:14 +00:00
Jordan Hrycaj	bd42ebb193	Snap sync refactor accounts healing (#1392 ) * Relocated mothballing (i.e. swap-in preparation) logic details: Mothballing was previously tested & started after downloading account ranges in `range_fetch_accounts`. Whenever current download or healing stops because of a pivot change, swap-in preparation is needed (otherwise some storage slots may get lost when swap-in takes place.) Also, `execSnapSyncAction()` has been moved back to `pivot_helper`. * Reorganised source file directories details: Grouped pivot focused modules into `pivot` directory * Renamed `checkNodes`, `sickSubTries` as `nodes.check`, `nodes.missing` why: Both lists are typically used together as pair. Renaming `sickSubTries` reflects moving away from a healing centric view towards a swap-in attitude. * Multi times coverage recording details: Per pivot account ranges are accumulated into coverage range set. This set will eventually contain a single range of account hashes [0..2^256] which amounts to 100% capacity. A counter has been added that is incremented whenever max capacity is reached. The accumulated range is then reset to empty. The effect of this setting is that the coverage can be evenly duplicated. So 200% would not accumulate on a particular region. * Update range length comparisons (mod 2^256) why: A range interval can have sizes 1..2^256 as it cannot be empty by definition. The number of points in a range intervals set can have 0..2^256 points. As the scalar range is a residue class modulo 2^256, the residue class 0 means length 2^256 for a range interval, but can be 0 or 2^256 for the number of points in a range intervals set. * Generalised `hexaryEnvelopeDecompose()` details: Compile the complement of the union of some (processed) intervals and express this complement as a list of envelopes of sub-tries. This facility is directly applicable to swap-in book-keeping. * Re-factor `swapIn()` why: Good idea but baloney implementation. The main algorithm is based on the generalised version of `hexaryEnvelopeDecompose()` which has been derived from this implementation. * Refactor `healAccounts()` using `hexaryEnvelopeDecompose()` as main driver why: Previously, the hexary trie was searched recursively for dangling nodes which has a poor worst case performance already when the trie is reasonably populated. The function `hexaryEnvelopeDecompose()` is a magnitude faster because it does not peruse existing sub-tries in order to find missing nodes although result is not fully compatible with the previous function. So recursive search is used in a limited mode only when the decomposer will not deliver a useful result. * Logging & maintenance fixes details: Preparation for abandoning buddy-global healing variables `node`, `resumeCtx`, and `lockTriePerusal`. These variables are trie-perusal centric which will be run on the back burner in favour of `hexaryEnvelopeDecompose()` which is used for accounts healing already.	2022-12-19 21:22:09 +00:00
Jordan Hrycaj	7688148565	Snap sync can start on saved checkpoint (#1327 ) * Stop negotiating pivot if peer repeatedly replies w/useless answers why: There is some fringe condition where a peer replies with legit but useless empty headers repeatedly. This goes on until somebody stops. We stop now. * Rename `missingNodes` => `sickSubTries` why: These (probably missing) nodes represent in reality fully or partially missing sub-tries. The top nodes may even exist, e.g. as a shallow sub-trie. also: Keep track of account healing on/off by bool variable `accountsHealing` controlled in `pivot_helper.execSnapSyncAction()` * Add `nimbus` option argument `snapCtx` for starting snap recovery (if any) also: + Trigger the recovery (or similar) process from inside the global peer worker initialisation `worker.setup()` and not by the `snap.start()` function. + Have `runPool()` return a `bool` code to indicate early stop to scheduler. * Can import partial snap sync checkpoint at start details: + Modified what is stored with the checkpoint in `snapdb_pivot.nim` + Will be loaded within `runDaemon()` if activated * Forgot to import total coverage range why: Only the top (or latest) pivot needs coverage but the total coverage is the list of all ranges for all pivots -- simply forgotten.	2022-11-25 14:56:42 +00:00
Jordan Hrycaj	bba1bea4c8	Snap sync state save (#1302 ) * Piecemeal trie inspection details: Trie inspection will stop after maximum number of nodes visited. The inspection can be resumed using the returned state from the last session. why: This feature allows for task switch between `piecemeal` sessions. * Extract pivot helper code from `worker.nim` => `pivot_helper.nim` * Accounts import will now return dangling paths from `proof` nodes why: With proper bookkeeping, this can be used to start healing without analysing the probably full trie. * Update `unprocessed` account range handling why: More generally, the API of a pair of unprocessed intervals favours the first set, and the second set comes into play only once the first is exhausted. This was unfortunately implemented which caused the ranges to be unnecessarily fractioned. Now the number of range intervals typically remains in the lower single digit numbers. * Save sync state after end of downloading some accounts details: restore/resume to be implemented later	2022-11-16 23:51:06 +00:00
Jordan Hrycaj	9aa925cf36	Update sync scheduler (#1297 ) * Add `stop()` methods to shutdown procedure why: Nasty behaviour when hitting Ctrl-C, otherwise * Add background service to sync scheduler why: The background service will be used for sync data import and recovery after restart. It is controlled by the sync scheduler for an easy turn on/off API. also: Simplified snap ticker time calc. * Fix typo	2022-11-14 14:13:00 +00:00
Jordan Hrycaj	e14fd4b96c	Prep for full sync after snap make 6 (#1291 ) * Update log ticker, using time interval rather than ticker count why: Counting and logging ticker occurrences is inherently imprecise. So time intervals are used. * Use separate storage tables for snap sync data * Left boundary proof update why: Was not properly implemented, yet. * Capture pivot in peer worker (aka buddy) tasks why: The pivot environment is linked to the `buddy` descriptor. While there is a task switch, the pivot may change. So it is passed on as function argument `env` rather than retrieved from the buddy at the start of a sub-function. * Split queues `fetchStorage` into `fetchStorageFull` and `fetchStoragePart` * Remove obsolete account range returned from `GetAccountRange` message why: Handler returned the wrong right value of the range. This range was for convenience, only. * Prioritise storage slots if the queue becomes large why: Currently, accounts processing is prioritised up until all accounts are downloaded. The new prioritisation has two thresholds for + start processing storage slots with a new worker + stop account processing and switch to storage processing also: Provide api for `SnapTodoRanges` pair of range sets in `worker_desc.nim` * Generalise left boundary proof for accounts or storage slots. why: Detailed explanation how this works is documented with `snapdb_accounts.importAccounts()`. Instead of enforcing a left boundary proof (which is still the default), the importer functions return a list of `holes` (aka node paths) found in the argument ranges of leaf nodes. This in turn is used by the book keeping software for data download. * Forgot to pass on variable in function wrapper also: + Start healing not before 99% accounts covered (previously 95%) + Logging updated/prettified	2022-11-08 18:56:04 +00:00
Jordan Hrycaj	a689e9185a	Prep for full sync after snap make 5 (#1286 ) * Update docu and logging * Extracted and updated constants from `worker_desc` into separate file * Update and re-calibrate communication error handling * Allow simplified pivot negotiation why: This feature allows to turn off pivot negotiation so that peers agree on a pivot header. For snap sync with fast changing pivots this only throttles the sync process. The finally downloaded DB snapshot is typically a merged version of different pivot states augmented by a healing process. * Re-model worker queues for accounts download & healing why: Currently there is only one data fetch per download or healing task. This task is then repeated by the scheduler after a short time. In many cases, this short time seems enough for some peers to decide to terminate connection. * Update main task batch `runMulti()` details: The function `runMulti()` is activated in quasi-parallel mode by the scheduler. This function calls the download, healing and fast-sync functions. While in debug mode, after each set of jobs run by this function the database is analysed (by the `snapdb_check` module) and the result printed.	2022-11-01 15:07:44 +00:00
Jordan Hrycaj	1b4572ed3b	Prep for full sync after snap make 4 (#1282 ) * Re-arrange fetching storage slots in batch module why: Previously, fetching partial slot ranges first has a chance of terminating the worker peer (due to network error) while there were many inheritable storage slots on the queue. Now, inheritance is checked first, then full slot ranges and finally partial ranges. * Update logging * Bundled node information for healing into single object `NodeSpecs` why: Previously, partial paths and node keys were kept in separate variables. This approach was error prone due to copying/reassembling function argument objects. As all partial paths, keys, and node data types are more or less handled as `Blob`s over the network (using Eth/6x, or Snap/1) it makes sense to hold these `Blob`s as named field in a single object (even if not all fields are active for the current purpose.) * For good housekeeping, using `NodeKey` type only for account keys why: previously, a mixture of `NodeKey` and `Hash256` was used. Now, only state or storage root keys use the `Hash256` type. * Always accept latest pivot (and not a slightly older one) why: For testing it was tried to use a slightly older pivot state root than available. Some anecdotal tests seemed to suggest an advantage so that more peers are willing to serve on that older pivot. But this could not be confirmed in subsequent tests (still anecdotal, though.) As a side note, the distance of the latest pivot to its predecessor is at least 128 (or whatever the constant `minPivotBlockDistance` is assigned to.) * Reshuffle name components for some file and function names why: Clarifies purpose: "storages" becomes: "storage slots" "store" becomes: "range fetch" * Stash away currently unused modules in sub-folder named "notused"	2022-10-27 14:49:28 +01:00
Jordan Hrycaj	82ceec313d	Prettify logging for snap sync environment (#1278 ) * Multiple storage batches at a time why: Previously only some small portion was processed at a time so the peer might have gone when the process was resumed at a later time * Renamed some field of snap/1 protocol response object why: Documented as `slots` is in reality a per-account list of slot lists. So the new name `slotLists` better reflects the nature of the beast. * Some minor healing re-arrangements for storage slot tries why: Resolving all complete inherited slots tries first in sync mode keeps the worker queues smaller which improves logging. * Prettify logging, comments update etc.	2022-10-21 20:29:42 +01:00
Jordan Hrycaj	c0d580715e	Remodel persistent snapdb access (#1274 ) * Re-model persistent database access why: Storage slots healing just run on the wrong sub-trie (i.e. the wrong key mapping). So get/put and bulk functions now use the definitions in `snapdb_desc` (earlier there were some shortcuts for `get()`.) * Fixes: missing return code, typo, redundant imports etc. * Remove obsolete debugging directives from `worker_desc` module * Correct failing unit tests for storage slots trie inspection why: Some pathological cases for the extended tests do not produce any hexary trie data. This is rightly detected by the trie inspection and the result checks needed to be adjusted.	2022-10-20 17:59:54 +01:00
Jordan Hrycaj	85fdb61699	Prep for full sync after snap make 3 (#1270 ) * For snap sync, publish `EthWireRef` in sync descriptor why: currently used for noise control * Detect and reuse existing storage slots * Provide healing module for storage slots * Update statistic ticker (adding range factor for unprocessed storage) * Complete merge function for work item ranges why: Merging interval into existing partial item was missing * Show av storage queue lengths in ticker detail: Previous attempt shows average completeness which did not tell much * Correct the meaning of the storage counter (per pivot) detail: Is the # accounts that have a storage saved	2022-10-19 11:04:06 +01:00
Jordan Hrycaj	d53eacb854	Prep for full sync after snap (#1253 ) * Split fetch accounts into sub-modules details: There will be separated modules for accounts snapshot, storage snapshot, and healing for either. * Allow to rebase pivot before negotiated header why: Peers seem to have not too many snapshots available. By setting back the pivot block header slightly, the chances might be higher to find more peers to serve this pivot. Experiment on mainnet showed that setting back too much (tested with 1024), the chances to find matching snapshot peers seem to decrease. * Add accounts healing * Update variable/field naming in `worker_desc` for readability * Handle leaf nodes in accounts healing why: There is no need to fetch accounts when they had been added by the healing process. On the flip side, these accounts must be checked for storage data and the batch queue updated, accordingly. * Reorganising accounts hash ranges batch queue why: The aim is to formally cover as many accounts as possible for different pivot state root environments. Formerly, this was tried by starting the accounts batch queue at a random value for each pivot (and wrapping around.) Now, each pivot environment starts with an interval set mutually disjunct from any interval set retrieved with other pivot state roots. also: Stop fishing for more pivots in `worker` if 100% download is reached * Reorganise/update accounts healing why: Error handling was wrong and the (math. complexity of) whole process could be better managed. details: Much of the algorithm is now documented at the top of the file `heal_accounts.nim`	2022-10-08 18:20:50 +01:00
Jordan Hrycaj	eca5882238	Isolating sync action modules (#1249 ) * Miscellaneous updates TBC * Disentangled pivot2 module from snap why: Wrote as template on top of sync so it can be shared by fast and snap sync. * Renamed and relocated pivot sources * Integrated `best_pivot` module into full and snap sync why: Full sync used an older version of `best_pivot` * isolating download module from full sync why: might be shared with snap sync at a later stage	2022-09-30 09:22:14 +01:00
Jordan Hrycaj	72a31593a9	Snap fetch account storage data (#1211 ) * Removed database write comparison statistics * Provide live storage test data details: database dumps on external repo `nimbus-eth1-blobs` * Update hexary tree interpolation for storage bulk tests * fetch storage update	2022-09-02 19:16:09 +01:00
Jordan Hrycaj	de2c13e136	Update snap offline tests (#1199 ) * Re-implemented `hexaryFollow()` in a more general fashion details: + New name for re-implemented `hexaryFollow()` is `hexaryPath()` + Renamed `rTreeFollow()` as `hexaryPath()` why: Returning similarly organised structures, the results of the `hexaryPath()` functions become comparable when running over the persistent and the in-memory databases. * Added traversal functionality for persistent ChainDB * Using `Account` values as re-packed Blob * Repack samples as compressed data files * Produce test data details: + Can force pivot state root switch after minimal coverage. + For emulating certain network behaviour, downloading accounts stops for a particular pivot state root if 30% (some static number) coverage is reached. Following accounts are downloaded for a later pivot state root.	2022-08-24 14:44:18 +01:00
Jordan Hrycaj	f07945d37b	Misc snap sync updates (#1192 ) * Bump nim-stew why: Need fixed interval set * Keep track of accumulated account ranges over all state roots * Added comments and explanations to unit tests * typo	2022-08-17 08:30:11 +01:00
Jordan Hrycaj	7d7e26d45f	Experimental bulk loader tests (#1187 ) why: Rocksdb bulk loading might provide a slight advantage when loading larger data sets into the system	2022-08-12 16:42:07 +01:00
Jordan Hrycaj	5f0e89a41e	Snap accounts bulk import preparer (#1183 ) * Provided common scheduler API, applied to `full` sync * Use hexary trie as storage for proofs_db records also: + Store metadata with account for keeping track of account state + add iterator over accounts * Common scheduler API applied to `snap` sync * Prepare for accounts bulk import details: + Added some ad-hoc checks for proving accounts data received from the snap/1 (will be replaced by proper database version when ready) + Added code that dumps some of the received snap/1 data into a file (turned off by default, see `worker_desc.nim`)	2022-08-04 09:04:30 +01:00
Jordan Hrycaj	73b628491d	Clique snapshots reorg (#1169 ) * Add persistent snapshot size logging why: Suspecting too much space used snapshot statistic: [..] blockNumber=2214912 nSnaps=2236 snapsTotal=1.14m blockNumber=2215936 nSnaps=2237 snapsTotal=1.14m [..] Persisting blocks fromBlock=2216449 toBlock=2216640 36458496 datadir-nimbus-goerlish/data/nimbus/ * Replace legacy `lru_cache` by `keyed_queue` why: `keyed_queue` generalises `lru_cache` snapshot statistic: [..] blockNumber=2234368 nSnaps=2259 snapsTotal=1.15m blockNumber=2235392 nSnaps=2260 snapsTotal=1.15m [..] Persisting blocks fromBlock=2235649 toBlock=2235840 37627288 datadir-nimbus-goerlish/data/nimbus/ * Increase persistent snapshot storage interval by 300% snapshot statistic: [..] blockNumber=2232320 nSnaps=620 snapsTotal=0.30m blockNumber=2236416 nSnaps=621 snapsTotal=0.30m [..] Persisting blocks fromBlock=2237185 toBlock=2237376 37627288 datadir-nimbus-goerlish/data/nimbus/ * Cull legacy debugging environment for clique why: Chronicles provides a better choice (when properly set up)	2022-07-21 19:16:28 +01:00
Jordan Hrycaj	5d98f68c09	Sync update to work with sepolia reorgs (#1168 ) * Error return in `persistBlocks()` on initial `VmState` problem why: previously threw an exception * Updated sync mode option why: using enum rather than bool => space for more * Added sync mode `full`, re-factored legacy sync also: rebased * Fix typo (crashes `pesistBlocks()` otherwise) also: rebase to master * Reduce log ticker noise by suppressing duplicate messages * Clarify staged queue overflow handling why: backtrack/re-org mode in `stageItem()` should be detected by both, the global indicator or the work item where it might have moved into. also: rebased	2022-07-21 13:14:41 +01:00
Jordan Hrycaj	134fe26997	Store proved snap accounts (#1145 ) * Relocated `IntervalSets` to nim-stew repo * Accumulate accounts on temporary kv-DB why: Explore the data as returned from snap/1. Will be converted to a `eth/db` next. details: Verify and accumulate per/state-root accounts downloaded via snap. also: Some unit tests * Replace `Table` by `TrieDatabaseRef` for accounts accumulator * update ticker statistics details: mean/variance based counter update * allow persistent db for proved accounts * rebase, and globally activate unit test * fix statistics	2022-07-01 12:42:17 +01:00
Jordan Hrycaj	c123e1eb93	Updated account scheduler (#1124 ) * Using `IntervalSet` type data for `LeafRange` * Updated log ticker * Update to `eth67` details: Disabled by default, use `ENABLE_LEGACY_ETH66=0` to enable No support for `Get/NodeData` dialogue via eth, anymore * Dissolved fetch/common.nim details: the log/ticker part becomes ticker.nim the interval range management is merged into fetch.nim * Updated account scheduler why: The previous scheduler fetched each account once (for different state roots.) The updated scheduler re-calibrates after a change of the state root and potentially (until told otherwise) fetches all possible accounts. * Fix `high(P)` fringe cases in `IntervalSet` handling why: The `high(P)` value for a point type `P` cannot be represented with half open intervals `[a,b)` for a,b points of `P`. So this single value needs extra treatment which was slightly wrong. * Updated docu/comments also: rebased * Update scheduler details: Change the `pivot` management when creating new accounts lists. It is strictly increasing (and wrapping around) depending on last updated accounts list.	2022-06-16 09:58:50 +01:00

26 Commits