nimbus-eth2

Commit Graph

Author	SHA1	Message	Date
tersec	f0ada15dac	automated CL spec ref URL updates from v1.1.9 to v1.1.10 (#3455 )	2022-03-02 10:00:21 +00:00
Jacek Sieka	40a4c01086	chaindag: don't keep backfill block table in memory (#3429 ) This PR names and documents the concept of the archive: a range of slots for which we have degraded functionality in terms of historical access - in particular: * we don't support rewinding to states in this range * we don't keep an in-memory representation of the block dag The archive de-facto exists in a trusted-node-synced node, but this PR gives it a name and drops the in-memory digest index. In order to satisfy `GetBlocksByRange` requests, we ensure that we have blocks for the entire archive period via backfill. Future versions may relax this further, adding a "pre-archive" period that is fully pruned. During by-slot searches in the archive (both for libp2p and rest requests), an extra database lookup is used to convert the given `slot` to a `root` - future versions will avoid this using era files which natively are indexed by `slot`. That said, the lookup is quite fast compared to the actual block loading given how trivial the table is - it's hard to measure, even. A collateral benefit of this PR is that checkpoint-synced nodes will see 100-200MB memory usage savings, thanks to the dropped in-memory cache - future pruning work will bring this benefit to full nodes as well. * document chaindag storage architecture and assumptions * look up parent using block id instead of full block in clearance (future-proofing the code against a future in which blocks come from era files) * simplify finalized block init, always writing the backfill portion to db at startup (to ensure lookups work as expected) * preallocate some extra memory for finalized blocks, to avoid immediate realloc	2022-02-26 19:16:19 +01:00
Jacek Sieka	adfe655b16	db: make block loading generic (#3413 ) Streamline lookup with Forky and BeaconBlockFork (then we can do the same for era) We use type to avoid conditionals, as fork is often already known at a "higher" level. * load blockid before loading block by root - this is needed to map root to slot and will eventually be done via block summary table for "old" blocks Co-authored-by: tersec <tersec@users.noreply.github.com>	2022-02-21 09:48:02 +01:00
tersec	79761c78a4	proc -> func, mainly in spec/state transition and adjacent modules (#3405 )	2022-02-17 11:53:55 +00:00
tersec	5eecb9a21f	rename no{R=>r}eturn, no{I=>i}init, short{l=>L}og, E{T=>t}h2Node, Beacon{c=>C}hainDB (#3403 )	2022-02-16 23:24:44 +01:00
Jacek Sieka	7db5647a6e	clean up / document init (#3387 ) * clean up / document init * drop `immutable_validators` data (pre-altair) * document versions where data is first added * avoid needlessly loading genesis block data on startup * add a few more internal database consistency checks * remove duplicate state root lookup on state load * comment	2022-02-16 16:44:04 +01:00
tersec	873a8ec1e6	use isZeroMemory for Eth2Digest comparisons (#3386 ) * use isZeroMemory for Eth2Digest comparisons * use Eth2Digest.isZero abstraction	2022-02-14 05:26:19 +00:00
Jacek Sieka	40fe8f5336	fix missing backfill when restarting node When node is restarted before backfill has started but after some blocks have finalized with forward sync, we would not start the backfill. * also clean up one last `SomeSome`	2022-02-11 23:08:50 +02:00
Zahary Karadjov	215caa21ae	Eth1 monitor fixes * Fix a resource leak introduced in https://github.com/status-im/nimbus-eth2/pull/3279 * Don't restart the Eth1 syncing progress from scratch in case of monitor failures during Eth2 syncing. * Switch to the primary operator as soon as it is back online. * Log the web3 credentials in fewer places Other changes: The 'web3 test' command has been enhanced to obtain and print more data regarding the selected provider.	2022-02-03 14:01:55 +02:00
tersec	8e6a920bf4	rename MERGE_FORK_EPOCH to BELLATRIX_FORK_EPOCH (#3350 ) * rename MERGE_FORK_EPOCH to BELLATRIX_FORK_EPOCH * fix REST test rules	2022-02-02 14:06:55 +01:00
Jacek Sieka	ff4f2a6b6c	better log on finalized slot failure	2022-02-01 21:23:18 +01:00
Jacek Sieka	3df9ffca9f	val-mon: remove redundant `_total` suffix from counters It turns out nim-metrics adds this suffix on its own - it also turns out some of the names are non-conventional and need follow-up.	2022-01-31 18:51:24 +02:00
Jacek Sieka	ad327a8769	Fix counters in validator monitor totals mode (#3332 ) The current counters set gauges etc to the value of the _last_ validator to be processed - as the name of the feature implies, we should be using sums instead. * fix missing beacon state metrics on startup, pre-first-head-selection * fix epoch metrics not being updated on cross-epoch reorg	2022-01-31 08:36:29 +01:00
Jacek Sieka	d583e8e4ac	Store finalized block roots in database (3s startup) (#3320 ) * Store finalized block roots in database (3s startup) When the chain has finalized a checkpoint, the history from that point onwards becomes linear - this is exploited in `.era` files to allow constant-time by-slot lookups. In the database, we can do the same by storing finalized block roots in a simple sparse table indexed by slot, bringing the two representations closer to each other in terms of conceptual layout and performance. Doing so has a number of interesting effects: * mainnet startup time is improved 3-5x (3s on my laptop) * the _first_ startup might take slightly longer as the new index is being built - ~10s on the same laptop * we no longer rely on the beacon block summaries to load the full dag - this is a lot faster because we no longer have to look up each block by parent root * a collateral benefit is that we no longer need to load the full summaries table into memory - we get the RSS benefits of #3164 without the CPU hit. Other random stuff: * simplify forky block generics * fix withManyWrites multiple evaluation * fix validator key cache not being updated properly in chaindag read-only mode * drop pre-altair summaries from `kvstore` * recreate missing summaries from altair+ blocks as well (in case database has lost some to an involuntary restart) * print database startup timings in chaindag load log * avoid allocating superfluous state at startup * use a recursive sql query to load the summaries of the unfinalized blocks	2022-01-30 18:51:04 +02:00
tersec	29e2169585	phase 0 & altair beacon chain and altair validator spec URL updates (#3339 )	2022-01-29 13:53:31 +00:00
Jacek Sieka	e264276b36	keep unviables in quarantine (#3331 ) they remain unviable even after a reorg	2022-01-28 11:59:55 +01:00
Jacek Sieka	d076e1a11b	ncli_db: import states and blocks from era file (#3313 )	2022-01-25 09:28:26 +01:00
tersec	351c2fd48a	rename mergeData to bellatrixData and mergeFork to bellatrixFork (#3315 )	2022-01-24 16:23:13 +00:00
Jacek Sieka	61342c2449	limit by-root requests to non-finalized blocks (#3293 ) * limit by-root requests to non-finalized blocks Presently, we keep a mapping from block root to `BlockRef` in memory - this has simplified reasoning about the dag, but is not sustainable with the chain growing. We can distinguish between two cases where by-root access is useful: * unfinalized blocks - this is where the beacon chain is operating generally, by validating incoming data as interesting for future fork choice decisions - bounded by the length of the unfinalized period * finalized blocks - historical access in the REST API etc - no bounds, really In this PR, we limit the by-root block index to the first use case: finalized chain data can more efficiently be addressed by slot number. Future work includes: * limiting the `BlockRef` horizon in general - each instance is 40 bytes+overhead which adds up - this needs further refactoring to deal with the tail vs state problem * persisting the finalized slot-to-hash index - this one also keeps growing unbounded (albeit slowly) Anyway, this PR easily shaves ~128mb of memory usage at the time of writing. * No longer honor `BeaconBlocksByRoot` requests outside of the non-finalized period - previously, Nimbus would generously return any block through this libp2p request - per the spec, finalized blocks should be fetched via `BeaconBlocksByRange` instead. * return `Opt[BlockRef]` instead of `nil` when blocks can't be found - this becomes a lot more common now and thus deserves more attention * `dag.blocks` -> `dag.forkBlocks` - this index only carries unfinalized blocks from now - `finalizedBlocks` covers the other `BlockRef` instances * in backfill, verify that the last backfilled block leads back to genesis, or panic * add backfill timings to log * fix missing check that `BlockRef` block can be fetched with `getForkedBlock` reliably * shortcut doppelganger check when feature is not enabled * in REST/JSON-RPC, fetch blocks without involving `BlockRef` * fix dag.blocks ref	2022-01-21 13:33:16 +02:00
Dustin Brody	9699858422	rename MERGE_FORK_VERSION to BELLATRIX_FORK_VERSION	2022-01-20 19:33:05 +02:00
Jacek Sieka	570379d3d9	Backfiller (#3263 ) Backfilling is the process of downloading historical blocks via P2P that are required to fulfill `GetBlocksByRange` duties - this happens during both trusted node and finalized checkpoint syncs. In particular, backfilling happens after syncing to head, such that attestation work can start as soon as possible. * Fix SyncQueue initialization procedure. Remove usage of `awaitne`. Add cancellation support. Remove unneeded `sleepAsync()` if peer's head is older than needed. Add `direction` field to all logs. Fix syncmanager wedge issue. Add proper resource cleaning procedure on backward sync finish. Co-authored-by: cheatfate <eugene.kabanov@status.im>	2022-01-20 08:25:45 +01:00
tersec	9c0c9c98ce	complete switch to beacon_chain/specs/datatypes/bellatrix (#3295 )	2022-01-18 13:36:52 +00:00
Zahary Karadjov	47f1f7ff1a	More efficient reward data persistence; Address review comments The new format is based on compressed CSV files in two channels: * Detailed per-epoch data * Aggregated "daily" summaries The use of append-only CSV files significantly speeds up epoch processing during data generation. The use of compression results in smaller storage requirements overall. The use of the aggregated files has a very minor cost in both CPU and storage, but leads to near interactive speed for report generation. Other changes: - Implemented support for graceful shutdowns to avoid corrupting the saved files. - Fixed a memory leak caused by lacking `StateCache` clean up on each iteration. - Addressed review comments - Moved the rewards and penalties calculation code into a separate module Required invasive changes to existing modules: - The `data` field of the `KeyedBlockRef` type is made public to be used by the validator rewards monitor's Chain DAG update procedure. - The `getForkedBlock` procedure from the `blockchain_dag.nim` module is made public to be used by the validator rewards monitor's Chain DAG update procedure.	2022-01-18 01:56:56 +02:00
Zahary Karadjov	29aad0241b	Precise per-component ETH-denominated rewards tracking This is an alternative take on https://github.com/status-im/nimbus-eth2/pull/3107 that aims for more minimal interventions in the spec modules at the expense of duplicating more of the spec logic in ncli_db.	2022-01-18 01:56:56 +02:00
Jacek Sieka	836f6984bb	move `state_transition` to `Result` (#3284 ) * better error messages in api * avoid `BlockData` copies when replaying blocks	2022-01-17 12:19:58 +01:00
Jacek Sieka	805e85e1ff	time: spring cleaning (#3262 ) Time in the beacon chain is expressed relative to the genesis time - this PR creates a `beacon_time` module that collects helpers and utilities for dealing with time units - the new module does not deal with actual wall time (that remains in `beacon_clock`). Collecting the time related stuff in one place makes it easier to find, avoids some circular imports and allows more easily identifying the code that actually needs wall time to operate. * move genesis-time-related functionality into `spec/beacon_time` * avoid using `chronos.Duration` for time differences - it does not support negative values (such as when something happens earlier than it should) * saturate conversions between `FAR_FUTURE_XXX`, so as to avoid overflows * fix delay reporting in validator client so it uses the expected deadline of the slot, not "closest wall slot" * simplify looping over the slots of an epoch * `compute_start_slot_at_epoch` -> `start_slot` * `compute_epoch_at_slot` -> `epoch` A follow-up PR will (likely) introduce saturating arithmetic for the time units - this is merely code moves, renames and fixing of small bugs.	2022-01-11 11:01:54 +01:00
tersec	ae61512ee9	rename upgrade_to_{merge,bellatrix}; detect unchanging spec YAMLs (#3265 )	2022-01-10 09:39:43 +00:00
Jacek Sieka	6f7e0e3393	REST cleanups (#3255 ) * REST cleanups * reject out-of-range committee requests * print all hex values as lower-case * allow requesting state information by head state root * turn `DomainType` into array (follow spec) * `uint_to_bytesXX` -> `uint_to_bytes` (follow spec) * fix wrong dependent root in `/eth/v1/validator/duties/proposer/` * update documentation - `--subscribe-all-subnets` is no longer needed when using the REST interface with validator clients * more fixes * common helpers for dependent block * remove test rules obsoleted by more strict epoch tests * fix trailing commas * Update docs/the_nimbus_book/src/rest-api.md * Update docs/the_nimbus_book/src/rest-api.md Co-authored-by: sacha <sacha@status.im>	2022-01-08 22:06:34 +02:00
Jacek Sieka	ba99c8fe4f	update era file documentation / impl (#3226 ) Overhaul of era files, including documentation and reference implementations * store blocks, then state, then slot indices for easy lookup at low cost * document era file rationale * altair+ support in era writer	2022-01-07 11:13:19 +01:00
tersec	0fd8bf7b56	spec URL updates (#3254 )	2022-01-06 18:35:38 +00:00
Jacek Sieka	0a4728a241	Handle access to historical data for which there is no state (#3217 ) With checkpoint sync in particular, and state pruning in the future, loading states or state-dependent data may fail. This PR adjusts the code to allow this to be handled gracefully. In particular, the new availability assumption is that states are always available for the finalized checkpoint and newer, but may fail for anything older. The `tail` remains the point where state loading de-facto fails, meaning that between the tail and the finalized checkpoint, we can still get historical data (but code should be prepared to handle this as an error). However, to harden the code against long replays, several operations which are assumed to work only with non-final data (such as gossip verification and validator duties) now limit their search horizon to post-finalized data. * harden several state-dependent operations by logging an error instead of introducing a panic when state loading fails * `withState` -> `withUpdatedState` to differentiate from the other `withState` * `updateStateData` can now fail if no state is found in database - it is also hardened against excessively long replays * `getEpochRef` can now fail when replay fails * reject blocks with invalid target root - they would be ignored previously * fix recursion bug in `isProposed`	2022-01-05 19:38:04 +01:00
zah	fba1f08a5e	Implement #3129 (Optimized history traversals in the REST API) (#3219 ) * Fix some REST call signatures and implement a simple API benchmark tool * Implement #3129 (Optimized history traversals in the REST API) Other notable changes: The `updateStateData` procedure in the `blockchain_dag.nim` module is optimized to not rewind down to the last snapshot state saved in the database if the supplied input state can be used as a starting point instead. * Disallow await in withStateForBlockSlot	2022-01-05 15:49:10 +01:00
tersec	5878d34117	rename forkDigests.merge to forkDigests.bellatrix (#3245 )	2022-01-05 14:24:15 +00:00
tersec	b81c06edab	rename Beacon{Block,State}Fork.Merge to Bellatrix; update copyright years (#3240 )	2022-01-04 09:45:38 +00:00
Jacek Sieka	c4ce59e55b	Assorted logging improvements (#3237 ) * log doppelganger detection when it activates and when it causes missed duties * less prominent eth1 sync progress * log in-progress sync at notice only when actually missing duties * better detail in replay log * don't log finalization checkpoints - this is quite verbose when syncing and already included in "Slot start"	2022-01-03 22:18:49 +01:00
Jacek Sieka	7ec97a6b35	Fix missing checkpoint states (#3225 ) With the right sequence of events (for example a REST request or a validation), it can happen that the first traversal across a state checkpoint boundary is done without storing that state on disk - this causes problems when replaying states, because now states may be missing from the database. Here, we simply avoid using the caches when advancing a state that will go into the database, ensuring that the information lost during caching is always permanently stored. * fix recursion bug in `isProposed`	2021-12-30 12:33:03 +01:00
tersec	0d4e49f946	Merge fork gossip support (#3213 ) * Merge fork gossip support * index directly by BeaconStateFork and remove debugging log statement	2021-12-21 15:24:23 +01:00
Jacek Sieka	1021e3324e	Revert writing backfill root to database (#3215 ) Introduced in #3171, it turns out we can just follow the block headers to achieve the same effect * leaves the constant in the code so as to avoid confusion when reading a database that had the constant written (such as the fleet nodes and other unstable users)	2021-12-21 11:40:14 +01:00
Jacek Sieka	c270ec21e4	Validator monitoring (#2925 ) Validator monitoring based on and mostly compatible with the implementation in Lighthouse - tracks additional logs and metrics for specified validators so as to stay on top of performance. The implementation works more or less the following way: * Validator pubkeys are singled out for monitoring - these can be running on the node or not * For every action that the validator takes, we record steps in the process such as messages being seen on the network or published in the API * When the dust settles at the end of an epoch, we report the information from one epoch before that, which coincides with the balances being updated - this is a tradeoff between being correct (waiting for finalization) and providing relevant information in a timely manner	2021-12-20 20:20:31 +01:00
Jacek Sieka	03005f48e1	Backfill support for ChainDAG (#3171 ) In the ChainDAG, 3 block pointers are kept: genesis, tail and head. This PR adds one more block pointer: the backfill block which represents the block that has been backfilled so far. When doing a checkpoint sync, a random block is given as starting point - this is the tail block, and we require that the tail block has a corresponding state. When backfilling, we end up with blocks without corresponding states, hence we cannot use `tail` as a backfill pointer - there is no state. Nonetheless, we need to keep track of where we are in the backfill process between restarts, such that we can answer GetBeaconBlocksByRange requests. This PR adds the basic support for backfill handling - it needs to be integrated with backfill sync, and the REST API needs to be adjusted to take advantage of the new backfilled blocks when responding to certain requests. Future work will also enable moving the tail in either direction: * pruning means moving the tail forward in time and removing states * backwards means recreating past states from genesis, such that intermediate states are recreated step by step all the way to the tail - at that point, tail, genesis and backfill will match up. * backfilling is done when backfill != genesis - later, this will be the WSS checkpoint instead	2021-12-13 14:36:06 +01:00
Jacek Sieka	9f27f0d97c	BlockId reform (#3176 ) * BlockId reform Introduce `BlockId` that helps track a root/slot pair - this prepares the codebase for backfilling and handling out-of-dag blocks * move block dag code to separate module * fix finalised state root in REST event stream * fix finalised head computation on head update, when starting from checkpoint * clean up chaindag init * revert the `epochAncestor` change introduced in #3144 that would return an epoch ancestor from the canonical history instead of the given history, causing `EpochRef` keys to point to the wrong block	2021-12-09 19:06:21 +02:00
Jacek Sieka	069bccd51b	batch-verify sync messages for a small perf boost (#3151 ) * batch-verify sync messages for a small perf boost Generally reuses the same structure as attestation and aggregate verification * normalize `signatures` and `signature_batch` to use the same pattern of verification * normalize parameter names, order etc for signature stuff in general * avoid calling `blsSign` directly - instead, go through `signatures` consistently	2021-12-09 14:56:54 +02:00
tersec	2ca28fb861	Merge BeaconBlock gossip validation (#3165 ) * Merge BeaconBlock gossip validation * figure/ground inversion * revert cosmetic cleanups to reduce merge conflicts	2021-12-08 17:29:22 +00:00
Etan Kissling	38e64b3441	cleanup sync subcommittee accessors This removes some dead code from `getSubcommitteePositionsAux` which is no longer needed since the introduction of `SyncCommitteeCache`. This also cleans up some formatting, uses `let` instead of `var` where possible, and uses implicit `pairs` in one case for consistency.	2021-12-07 18:17:03 +02:00
Jacek Sieka	89d6a1b403	Introduce slot->BlockRef mapping for finalized chain (#3144 ) * Introduce slot->BlockRef mapping for finalized chain The finalized chain is linear, thus we can use a seq to lookup blocks by slot number. Here, we introduce such a seq, even though in the future, it should likely be backed by a database structure instead, or, more likely, a flat era file with a flat lookup index. This dramatically speeds up requests by slot, such as those coming from the REST interface or GetBlocksByRange, as these are currently served by a linear iteration from head. * fix REST block requests to not return blocks from an earlier slot when the given slot is empty * fix StateId interpretation such that it doesn't treat state roots as block roots * don't load full block from database just to return its root	2021-12-06 20:52:35 +02:00
Jacek Sieka	1a8b7469e3	move quarantine outside of chaindag (#3124 ) * move quarantine outside of chaindag The quarantine has been part of the ChainDAG for the longest time, but this design has a few issues: * the function in which blocks are verified and added to the dag becomes reentrant and therefore difficult to reason about - we're currently using a stateful flag to work around it * quarantined blocks bypass the processing queue leading to a processing stampede * the quarantine flow is unsuitable for orphaned attestations - these should also be quarantined eventually Instead of processing the quarantine inside ChainDAG, this PR moves re-queueing to `block_processor` which already is responsible for dealing with follow-up work when a block is added to the dag This sets the stage for keeping attestations in the quarantine as well. Also: * make `BlockError` `{.pure.}` * avoid use of `ValidationResult` in block clearance (that's for gossip)	2021-12-06 10:49:01 +01:00
tersec	4378f3f096	almost all remaining ethereum/{eth2.0-specs -> consensus-specs} (#3158 )	2021-12-03 20:01:13 +00:00
Jacek Sieka	aa1dea03cd	speed up gossip and sync block validation (#3143 ) * avoid recomputing hash for block signature check * check block slot match before hitting the database	2021-12-01 10:52:40 +01:00
Etan Kissling	eb777a6c8b	allow `withState` to be called multiple times This allows `blockchain_dag`'s `withState` template to be called more than once in a single function. This led to a compilation error before because the injected variables and functions shared the same scope.	2021-11-29 15:24:12 +02:00
Jacek Sieka	9c2f43ed0e	Speed up altair block processing 2x (#3115 ) * Speed up altair block processing >2x Like #3089, this PR drastically speeds up historical REST queries and other long state replays. * cache sync committee validator indices * use ~80mb less memory for validator pubkey mappings * batch-verify sync aggregate signature (fixes #2985) * document sync committee hack with head block vs sync message block * add batch signature verification failure tests Before: ``` ../env.sh nim c -d:release -r ncli_db --db:mainnet_0/db bench --start-slot:-1000 All time are ms Average, StdDev, Min, Max, Samples, Test Validation is turned off meaning that no BLS operations are performed 5830.675, 0.000, 5830.675, 5830.675, 1, Initialize DB 0.481, 1.878, 0.215, 59.167, 981, Load block from database 8422.566, 0.000, 8422.566, 8422.566, 1, Load state from database 6.996, 1.678, 0.042, 14.385, 969, Advance slot, non-epoch 93.217, 8.318, 84.192, 122.209, 32, Advance slot, epoch 20.513, 23.665, 11.510, 201.561, 981, Apply block, no slot processing 0.000, 0.000, 0.000, 0.000, 0, Database load 0.000, 0.000, 0.000, 0.000, 0, Database store ``` After: ``` 7081.422, 0.000, 7081.422, 7081.422, 1, Initialize DB 0.553, 2.122, 0.175, 66.692, 981, Load block from database 5439.446, 0.000, 5439.446, 5439.446, 1, Load state from database 6.829, 1.575, 0.043, 12.156, 969, Advance slot, non-epoch 94.716, 2.749, 88.395, 100.026, 32, Advance slot, epoch 11.636, 23.766, 4.889, 205.250, 981, Apply block, no slot processing 0.000, 0.000, 0.000, 0.000, 0, Database load 0.000, 0.000, 0.000, 0.000, 0, Database store ``` * add comment	2021-11-24 13:43:50 +01:00
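
Several of the commits above (#3144, #3293, #3320, #3429) rely on the finalized chain being linear, so that finalized blocks can be addressed by slot instead of by root. The Python sketch below illustrates that idea with a sparse slot-indexed table of block roots. It is a hypothetical illustration only: the class and method names are invented here, and it does not reflect Nimbus's actual Nim code or its database schema.

```python
from bisect import bisect_right
from typing import List, Optional


class FinalizedBlockIndex:
    """Hypothetical sparse slot -> block root index for the finalized chain."""

    def __init__(self) -> None:
        self._slots: List[int] = []    # strictly increasing slots that hold a block
        self._roots: List[bytes] = []  # block root for the slot at the same position

    def append(self, slot: int, root: bytes) -> None:
        # The finalized chain is linear, so blocks can only be added in slot order.
        if self._slots and slot <= self._slots[-1]:
            raise ValueError("finalized blocks must be appended in increasing slot order")
        self._slots.append(slot)
        self._roots.append(root)

    def root_at(self, slot: int) -> Optional[bytes]:
        """Root of the block at exactly `slot`, or None if that slot is empty."""
        i = bisect_right(self._slots, slot) - 1
        if i >= 0 and self._slots[i] == slot:
            return self._roots[i]
        return None

    def root_at_or_before(self, slot: int) -> Optional[bytes]:
        """Root of the most recent block at or before `slot`, or None if there is none."""
        i = bisect_right(self._slots, slot) - 1
        return self._roots[i] if i >= 0 else None
```

Keeping `root_at` separate from `root_at_or_before` reflects the distinction drawn in #3144, where a request for an empty slot should not silently return a block from an earlier slot.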

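Commit #3262 renames `compute_start_slot_at_epoch` to `start_slot` and `compute_epoch_at_slot` to `epoch`, and saturates conversions involving `FAR_FUTURE_XXX` values to avoid overflows. A minimal Python sketch of such helpers follows, assuming the mainnet `SLOTS_PER_EPOCH` of 32. `FAR_FUTURE_EPOCH` is a consensus-spec constant, while the `FAR_FUTURE_SLOT` sentinel and the `slots_of` helper are assumptions made for this sketch. Python integers do not overflow, so the uint64 limit is enforced explicitly.

```python
SLOTS_PER_EPOCH = 32           # mainnet preset value
UINT64_MAX = 2**64 - 1
FAR_FUTURE_EPOCH = UINT64_MAX  # consensus-spec constant
FAR_FUTURE_SLOT = UINT64_MAX   # assumed sentinel for this sketch


def epoch(slot: int) -> int:
    """Epoch containing `slot`; the far-future slot maps to the far-future epoch (assumption)."""
    if slot == FAR_FUTURE_SLOT:
        return FAR_FUTURE_EPOCH
    return slot // SLOTS_PER_EPOCH


def start_slot(e: int) -> int:
    """First slot of epoch `e`, saturating at FAR_FUTURE_SLOT instead of exceeding uint64."""
    if e > UINT64_MAX // SLOTS_PER_EPOCH:
        return FAR_FUTURE_SLOT
    return e * SLOTS_PER_EPOCH


def slots_of(e: int) -> range:
    """Hypothetical helper: all slots of epoch `e` (meaningful only for non-saturated epochs)."""
    first = start_slot(e)
    return range(first, first + SLOTS_PER_EPOCH)
```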

125 Commits