Commit Graph

98 Commits

Author SHA1 Message Date
Etan Kissling 035ca015e6
continue validator duties if chain does not progress for a long time (#6101)
Nimbus currently stops performing validator duties if the blockchain
does not progress for `node.config.syncHorizon` slots. This means that
the chain won't recover because no new blocks are proposed. To fix that,
continue performing validator duties if no progress is registered for a
long time, and none of our peers is indicating any progress.
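
For illustration, a minimal Nim sketch of that decision; `wallSlot`, `headSlot`, `maxPeerHeadSlot` and `syncHorizon` are placeholder names, not the actual Nimbus APIs:

```nim
type Slot = uint64

func shouldPerformDuties(wallSlot, headSlot, maxPeerHeadSlot: Slot,
                         syncHorizon: uint64): bool =
  # Within the sync horizon: business as usual.
  if wallSlot <= headSlot + syncHorizon:
    return true
  # Beyond the horizon, keep performing duties only if no peer claims to
  # be further ahead than we are, i.e. the whole network appears stalled.
  maxPeerHeadSlot <= headSlot

when isMainModule:
  doAssert shouldPerformDuties(wallSlot = 100, headSlot = 98,
                               maxPeerHeadSlot = 98, syncHorizon = 32)
  doAssert not shouldPerformDuties(wallSlot = 200, headSlot = 98,
                                   maxPeerHeadSlot = 150, syncHorizon = 32)
```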
2024-03-20 03:23:53 +01:00
Etan Kissling f54fa083b4
fix EIP-7044 implementation when using batch verification (#5953)
In #5120, EIP-7044 support got added to the state transition function to
force `CAPELLA_FORK_VERSION` to be used when validating `VoluntaryExit`
messages, irrespective of their `epoch`.

In #5637, similar logic was added when batch verifying BLS signatures,
which is used during gossip validation (libp2p gossipsub, and req/resp).
However, that logic did not match the one introduced in #5120, and only
used `CAPELLA_FORK_VERSION` when a `VoluntaryExit`'s `epoch` was set to
a value `>= CAPELLA_FORK_EPOCH`. Otherwise, `BELLATRIX_FORK_VERSION`
would still be used when validating `VoluntaryExit`, e.g., with `epoch`
set to `0`, as is the case in this Holesky block:

- https://holesky.beaconcha.in/slot/1076985#voluntary-exits

Extracting the correct logic from #5120 into a function, and reusing it
when verifying BLS signatures fixes this issue, and also leverages the
exhaustive EF test suite that covers the (correct) #5120 logic.

This fix only affects networks that have EIP-7044 applied (post-Deneb).

Without the fix, Deneb blocks with a `VoluntaryExit` with `epoch` set to
`< CAPELLA_FORK_EPOCH` incorrectly fail to validate despite being valid.

Incorrect blocks that contain a malicious `VoluntaryExit` with `epoch`
set to `< CAPELLA_FORK_EPOCH` and signed using `BELLATRIX_FORK_VERSION`
_would_ pass the BLS verification stage, but subsequently fail the state
transition logic. Such blocks would still correctly be labeled invalid.
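
For illustration, a hedged Nim sketch of the fixed fork-version selection; fork versions and epochs are placeholder values, and the real helper lives in the state transition code:

```nim
type
  Epoch = uint64
  Version = array[4, byte]

const
  BELLATRIX_FORK_VERSION = [byte 2, 0, 0, 0]  # placeholder values
  CAPELLA_FORK_VERSION   = [byte 3, 0, 0, 0]
  CAPELLA_FORK_EPOCH     = Epoch(100)
  DENEB_FORK_EPOCH       = Epoch(200)

func voluntaryExitForkVersion(exitEpoch, currentEpoch: Epoch): Version =
  if currentEpoch >= DENEB_FORK_EPOCH:
    # EIP-7044: always CAPELLA_FORK_VERSION, irrespective of the exit's epoch.
    CAPELLA_FORK_VERSION
  elif exitEpoch >= CAPELLA_FORK_EPOCH:
    # Simplified pre-Deneb behavior: version of the fork the epoch falls in.
    CAPELLA_FORK_VERSION
  else:
    BELLATRIX_FORK_VERSION

when isMainModule:
  # Post-Deneb, an exit with `epoch` 0 must still use CAPELLA_FORK_VERSION.
  doAssert voluntaryExitForkVersion(exitEpoch = 0, currentEpoch = 250) ==
    CAPELLA_FORK_VERSION
```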
2024-02-25 15:25:26 +01:00
tersec a4680cb7fa
refactor addHeadBlock() to research/ and tests/ helper (#5874)
* refactor addHeadBlock() to research/ and tests/ helper

* rm now-dead code
2024-02-09 23:46:51 +00:00
Etan Kissling 7c53841cd8
Revert "Revert "fix checkpoint block potentially not getting backfilled into DB (#5863)" (#5871)" (#5875)
This reverts commit 1575478b72.
2024-02-09 20:44:54 +01:00
tersec 1575478b72
Revert "fix checkpoint block potentially not getting backfilled into DB (#5863)" (#5871)
This reverts commit 65e6f892de.
2024-02-09 12:49:07 +00:00
Etan Kissling 65e6f892de
fix checkpoint block potentially not getting backfilled into DB (#5863)
When using checkpoint sync, only the checkpoint state is available; the
corresponding block is not downloaded initially and is backfilled later.

`dag.backfill` tracks latest filled `slot`, and latest `parent_root` for
which no block has been synced yet.

In checkpoint sync, this assumption is broken: the initial
`dag.backfill.slot` is set based on the checkpoint state's slot, and the
corresponding block is not available either.

However, the sync manager in backward mode also requests `dag.backfill.slot`,
and `block_clearance` then backfills the checkpoint block once it is
synced. But there is no guarantee that a peer ever sends us that block:
they could send us all parent blocks and omit solely the checkpoint
block itself. In that situation, we would accept the parent blocks,
advance `dag.backfill`, and subsequently never request the checkpoint
block again, resulting in a gap inside the blocks DB that is never filled.

To mitigate that, the assumption is restored that `dag.backfill.slot`
is the latest filled `slot`, and `dag.backfill.parent_root` is the next
block that needs to be synced. By setting `slot` to `tail.slot + 1` and
`parent_root` to `tail.root`, we put a fake summary into `dag.backfill`
so that `block_clearance` only proceeds once the checkpoint block exists.
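
A rough Nim sketch of the restored invariant, with simplified stand-in types whose field names mirror the summary described above:

```nim
type
  Slot = uint64
  Root = array[32, byte]
  BackfillSummary = object
    slot: Slot          # latest slot already filled in the DB
    parent_root: Root   # root of the next block that still needs syncing

func checkpointBackfill(tailSlot: Slot, tailRoot: Root): BackfillSummary =
  # Treat the (not yet downloaded) checkpoint block as the next block to
  # sync: start at tail.slot + 1 and point parent_root at tail.root, so
  # block_clearance only advances once the checkpoint block itself exists.
  BackfillSummary(slot: tailSlot + 1, parent_root: tailRoot)
```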
2024-02-09 11:20:36 +01:00
tersec cf1bec7670
update some deprecated stew/results to results imports (#5743) 2024-01-16 22:37:14 +00:00
Jacek Sieka 62cbdeefc5
verify `genesis_time` more strictly (fixes #1667) (#5694)
Bogus values lead to crashes down the line when timers overflow
2024-01-06 15:26:56 +01:00
tersec 6a9d522705
Apply EIP-7044 to block signature batch verification (#5637) 2023-12-01 14:44:45 +00:00
tersec 115ffa70eb
rm unused code (#5623) 2023-11-25 12:09:18 +00:00
tersec 152dd74179
propagate newPayload-VALID to block ancestors (#5343) 2023-08-23 19:56:35 +00:00
Jacek Sieka 49729e1ef3
prevent concurrent `storeBlock` calls (fixes #5285) (#5295)
When a block is introduced to the system via both REST and gossip at the
same time, we will call `storeBlock` from two locations, leading to a
duplicate-check race condition while we wait for the EL.

This issue may manifest in particular when using an external block
builder that itself publishes the block onto the gossip network.

* refactor enqueue flow
* simplify calling `addBlock`
* complete request manager verifier future for blobless blocks
* re-verify parent conditions before adding block

Among other things, it might have gone stale or been finalized between one
call and the other.
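
As a simplified illustration of the deduplication idea (not the actual enqueue flow, which is async and EL-aware), tracking in-flight roots is enough to collapse the REST and gossip paths:

```nim
import std/sets

type Root = string   # stand-in for the real 32-byte block root

var inFlight = initHashSet[Root]()

proc storeBlockOnce(root: Root, store: proc (root: Root)) =
  if root in inFlight:
    return           # the other code path is already processing this block
  inFlight.incl root
  try:
    store(root)      # in Nimbus this is the async EL + clearance pipeline
  finally:
    inFlight.excl root

when isMainModule:
  var stored = 0
  storeBlockOnce("abc", proc (root: Root) = inc stored)
  doAssert stored == 1
```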
2023-08-17 15:12:37 +02:00
Jacek Sieka a2adbf809f
Perform block pre-check before validating execution (#5169)
* Perform block pre-check before validating execution

When syncing, blocks have not been gossip-validated and are therefore
prone to trivial faults like being known-unviable, duplicate or missing
their parent.

In addition, the duplicate-block check in BlockProcessor was not
considering the quarantine flow and would therefore cause
recently-quarantined blocks to be silently dropped when their parent
appears, delaying the sync end-game and thus causing longer startup
resync times.

This PR verifies trivial conditions before performing execution
validation thus avoiding duplicates and missing parents alike.

It also ensures that the fast-sync EL mode is used for finalized blocks
even if the EL is timing out / slow to respond - this allows the CL to
complete its sync faster and switch to "normal" lock-step at the head of
the chain more quickly, thus also allowing the EL to access the latest
consensus information earlier.

* oops

* remove unused constant
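
A rough Nim outline of the ordering this PR establishes; the names and the boolean inputs are made up for illustration:

```nim
type PreCheck = enum
  pcOk, pcDuplicate, pcUnviable, pcMissingParent

func preCheck(known, unviable, haveParent: bool): PreCheck =
  # Trivial conditions first - none of these needs an EL round trip.
  if known: pcDuplicate
  elif unviable: pcUnviable
  elif not haveParent: pcMissingParent
  else: pcOk

proc processBlock(known, unviable, haveParent: bool,
                  verifyExecution: proc (): bool): bool =
  if preCheck(known, unviable, haveParent) != pcOk:
    return false       # dropped / quarantined without touching the EL
  verifyExecution()    # only now pay for newPayload

when isMainModule:
  var elCalls = 0
  discard processBlock(true, false, true,
                       proc (): bool = (inc elCalls; true))
  doAssert elCalls == 0  # a duplicate never reaches execution validation
```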
2023-07-11 18:55:51 +02:00
tersec cd087b9a43
replace `optimisticRoots` table with field in `BlockRef` (#4969)
* replace optimisticRoots table with field in BlockRef

* copyright year

* mark finalized blocks as verified on load

* Update beacon_chain/consensus_object_pools/block_dag.nim

Co-authored-by: Etan Kissling <etan@status.im>

* expand non-optimistic block checking to all pre-merge blocks; refactor markBlockVerified to use BlockRef rather than the block root, and remove the superfluous caller in the newPayload path, which is replaced by addResolvedHeadBlock BlockRef construction

* don't treat finalized block specially; VALID status is sticky
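
Conceptually (illustrative Nim only, not the actual types), the change replaces a side table keyed by root with a field read:

```nim
import std/sets

type
  Root = string
  BlockRef = ref object
    root: Root
    executionValid: bool   # lives on the node itself; VALID status is sticky

# Before: a separate table consulted on every optimistic-status query.
var optimisticRoots = initHashSet[Root]()
proc isOptimisticOld(root: Root): bool =
  root in optimisticRoots

# After: just read the field on the BlockRef at hand.
func isOptimistic(blck: BlockRef): bool =
  not blck.executionValid
```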

---------

Co-authored-by: Etan Kissling <etan@status.im>
2023-05-20 12:18:51 +00:00
Jacek Sieka 09a69c7b07
better batch sig verification failure message 2023-05-12 08:18:13 +02:00
Etan Kissling e6e4ba9de6
clean up redundant tests and config (#4836)
The consensus-spec-tests already cover the scenarios of our custom test
runner, so the custom tests can be removed. Also cleans up unused config
flags and related unreachable logic.
2023-04-18 21:26:36 +02:00
Etan Kissling ad118cd354
rename `stateFork` > `consensusFork` (#4718)
Just the variable, not yet `lcDataForkAtStateFork` / `atStateFork`.

- Shorten comment in `light_client.nim` to keep line width
- Do not rename `stateFork` mention in `runProposalForkchoiceUpdated`.
- Do not rename `stateFork` in `getStateField(dag.headState, fork)`

Rest is just a mechanical mass replace
2023-03-11 00:35:52 +00:00
tersec 3b41e6a0e7
rename ConsensusFork.EIP4844 to ConsensusFork.Deneb (#4692) 2023-03-04 13:35:39 +00:00
tersec ea060de6d4
more eip4844 -> deneb module references (#4690) 2023-03-02 21:09:24 +01:00
tersec a382498cfe
batch-verify BLS to execution change messages (#4637) 2023-02-17 13:35:12 +00:00
tersec 0fb726c420
`BeaconStateFork/BeaconBlockFork` -> `ConsensusFork` (#4560)
* `BeaconStateFork/BeaconBlockFork` -> `ConsensusFork`

* revert unrelated change

* revert unrelated changes

* update test summaries
2023-01-28 19:53:41 +00:00
tersec aacc8d702d
remove Nim 1.2-compatible `push raise`s and update copyright notice years (#4528) 2023-01-20 14:14:37 +00:00
Jacek Sieka 75c7195bfd
Backfill only up to MIN_EPOCHS_FOR_BLOCK_REQUESTS blocks (#4421)
When backfilling, we only need to download blocks that are newer than
MIN_EPOCHS_FOR_BLOCK_REQUESTS epochs - the rest cannot reliably be fetched
from the network and does not have to be provided to others.

This change affects only trusted-node-synced clients - genesis sync
continues to work as before (because it needs to construct a state by
building it from genesis).

Those wishing to complete a backfill should do so with era files
instead.
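
Back-of-the-envelope Nim sketch of the resulting backfill horizon, assuming mainnet preset values; the actual computation lives in the sync code:

```nim
const
  SLOTS_PER_EPOCH = 32'u64
  MIN_EPOCHS_FOR_BLOCK_REQUESTS = 33024'u64  # mainnet preset value

func backfillHorizonSlot(currentEpoch: uint64): uint64 =
  # Peers are only required to serve blocks within this window, so
  # backfilling anything older from the network is not reliable.
  if currentEpoch > MIN_EPOCHS_FOR_BLOCK_REQUESTS:
    (currentEpoch - MIN_EPOCHS_FOR_BLOCK_REQUESTS) * SLOTS_PER_EPOCH
  else:
    0'u64

when isMainModule:
  echo backfillHorizonSlot(250_000)  # earliest slot worth requesting
```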
2022-12-23 08:42:55 +01:00
tersec dee5af58d6
eip4844 light client tests; avoid case object out-of-bound array reads (#4404) 2022-12-08 17:21:53 +01:00
tersec 2932d3b808
extend `BeaconStateFork` enum (#4396) 2022-12-07 16:47:23 +00:00
tersec ec443601eb
implement capellaImplementationMissing points; don't track not-active validator duties (#4340)
* implement several capellaImplementationMissing points

* don't register validator activity for not-active validators

* don't check validator indices that already come out of existing committees; these must be active validators, or else there are other, deeper bugs
2022-11-22 13:56:05 +02:00
Etan Kissling 48994f67d3
rename `BlockError` -> `VerifierError` (#4310)
We currently use `BlockError` for both beacon blocks and LC objects.
In light of EIP4844, we will likely also use it for blob sidecars.
To avoid confusion, rename it to a more generic `VerifierError`,
and update its documentation to be more generic.

To avoid long lines as a followup, also renaming the `block_processor`'s
`BlockProcessingCompleted.completed`->`ProcessingStatus.completed` and
`BlockProcessingCompleted.notCompleted`->`ProcessingStatus.notCompleted`
2022-11-10 17:40:27 +00:00
Jacek Sieka 09ade6d33d
Make trusted node sync era-aware (#4283)
This PR removes a bunch of code to make TNS aware of era files, avoiding
a duplicated backfill when era files are available.

* reuse chaindag for loading backfill state, replacing the TNS homebrew
* fix era block iteration to skip empty slots
* add tests for `can_advance_slots`
2022-11-10 10:44:47 +00:00
tersec 5b46f0b723
add Capella support to Forked* (#4276)
* add Capella support to Forked*

* remove cruft

* add `OnForkyBlockAdded`
2022-11-02 16:23:30 +00:00
Jacek Sieka 819442acc3
Allow chain dag without genesis / block (#4230)
* Allow chain dag without genesis / block

This PR enables the initialization of the dag without access to blocks
or genesis state - it is a prerequisite for implementing a number of
interesting features:

* checkpoint sync without any block download
* pruning of blocks and states

* backfill checkpoint block
2022-10-14 22:40:10 +03:00
Etan Kissling 6069003a1f
fix check for attaching to pre-finalized parent (#4161)
When the BN's head is reorged while it is shut down, reloading the BN will
not assign a `BlockRef` to alternate branches. However, blocks from other
branches are still present in the database, leading to their descendants
being incorrectly marked as `UnviableFork`. By restricting the check to blocks
that have been finalized, they should be reported as `MissingParent`
instead, eventually re-assigning a `BlockRef` to them.
2022-09-22 18:33:26 +00:00
tersec 19bf460a3b
more `withState` `state` -> `forkyState` (#4104) 2022-09-10 08:12:07 +02:00
tersec b60456fdf3
`withState`: `state` -> `forkyState` (#4038) 2022-08-26 22:47:40 +00:00
Jacek Sieka 0d9fd54857
cache shuffling separately from other EpochRef data (fixes #2677) (#3990)
In order to avoid full replays when validating attestations hailing from
untaken forks, it's better to keep shufflings separate from `EpochRef`
and perform a lookahead on the shuffling when processing the block that
determines them.

This also helps performance in the case where REST clients are trying to
perform lookahead on attestation duties and decreases memory usage by
sharing shufflings between EpochRef instances of the same dependent
root.
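
A simplified Nim sketch of the caching scheme: shufflings keyed by (dependent root, epoch) and shared, with a full replay only on a miss. Names are illustrative:

```nim
import std/tables

type
  Root = string
  ShufflingRef = ref object
    epoch: uint64
    shuffledIndices: seq[int]

var shufflingCache = initTable[string, ShufflingRef]()

proc getShuffling(dependentRoot: Root, epoch: uint64,
                  compute: proc (): ShufflingRef): ShufflingRef =
  # Keyed by (dependent root, epoch) - flattened into a string for this sketch.
  let key = dependentRoot & ":" & $epoch
  if key notin shufflingCache:
    # The expensive state replay only happens on a cache miss; EpochRef
    # instances with the same dependent root share the result.
    shufflingCache[key] = compute()
  shufflingCache[key]
```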
2022-08-18 21:07:01 +03:00
Miran dfd4afc9f2
compatibility with Nim 1.4+ (#3888) 2022-07-29 10:53:42 +00:00
Etan Kissling 2a2bcea70d
group justified and finalized `Checkpoint` (#3841)
The justified and finalized `Checkpoint` are frequently passed around
together. This introduces a new `FinalityCheckpoint` data structure that
combines them into one.

Due to the large usage of this structure in fork choice, also took this
opportunity to update fork choice tests to the latest v1.2.0-rc.1 spec.
Many additional tests enabled, some need more work, e.g. EL mock blocks.
Also implemented `discard_equivocations` which was skipped in #3661,
and improved code reuse across fork choice logic while at it.
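
A minimal sketch of the grouping, assuming illustrative field names:

```nim
type
  Epoch = uint64
  Root = array[32, byte]
  Checkpoint = object
    epoch: Epoch
    root: Root
  # Justified and finalized checkpoints travel together through fork choice.
  FinalityCheckpoint = object
    justified: Checkpoint
    finalized: Checkpoint
```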
2022-07-06 13:33:02 +03:00
tersec 1221bb66e8
optimistic sync (#3793)
* optimistic sync

* flag that initially loaded blocks from database might need execution block root filled in

* return optimistic status in REST calls

* refactor blockslot pruning

* ensure beacon_blocks_by_{root,range} do not provide optimistic blocks

* handle forkchoice head being pre-merge with block being postmerge

* re-enable blocking head updates on validator duties

* fix is_optimistic_candidate_block per spec; don't crash with nil future

* fix is_optimistic_candidate_block per spec; don't crash with nil future

* mark blocks sans execution payloads valid during head update
2022-07-04 23:35:33 +03:00
Jacek Sieka f31f52e24a
fix missing frontfill index (fixes #3658) (#3675)
* fix key load duration log
* log broken frontfill block root
2022-05-31 10:09:01 +02:00
tersec 61ba308e13
stylecheck fixes (#3593) 2022-04-14 17:39:37 +02:00
Jacek Sieka f70ff38b53
enable `styleCheck:usages` (#3573)
Some upstream repos still need fixes, but this gets us close enough that
style hints can be enabled by default.

In general, "canonical" spellings are preferred even if they violate
nep-1 - this applies in particular to spec-related stuff like
`genesis_validators_root` which appears throughout the codebase.
2022-04-08 16:22:49 +00:00
Jacek Sieka bc80ac3be1
harden REST API `atSlot` against non-finalized blocks (#3538)
* harden validator API against pre-finalized slot requests
* check `syncHorizon` when responding to validator api requests too far
from `head`
* limit state-id based requests to one epoch ahead of `head`
* put historic data bounds on block/attestation/etc validator production API, preventing them from being used with already-finalized slots
* add validator block smoke tests
* make rest test create a new genesis with the tests running roughly in
the first epoch to allow testing a few more boundary conditions
2022-03-23 12:42:16 +01:00
Jacek Sieka 4207b127f9
era: load blocks and states (#3394)
* era: load blocks and states

Era files contain finalized history and can be thought of as an
alternative source for block and state data that allows clients to avoid
syncing this information from the P2P network - the P2P network is then
used to "top up" the client with the most recent data. They can be
freely shared in the community via whatever means (http, torrent, etc)
and serve as a permanent cold store of consensus data (and, after the
merge, execution data) for history buffs and bean counters alike.

This PR gently introduces support for loading blocks and states in two
cases: block requests from rest/p2p and frontfilling when doing
checkpoint sync.

The era files are used as a secondary source if the information is not
found in the database - compared to the database, there are a few key
differences:

* the database stores the block indexed by block root while the era file
indexes by slot - the former is used only in rest, while the latter is
used both by p2p and rest.
* when loading blocks from era files, the root is no longer trivially
available - if it is needed, it must either be computed (slow) or cached
(messy) - the good news is that for p2p requests, it is not needed
* in era files, "framed" snappy encoding is used while in the database
we store unframed snappy - for p2p requests, the latter requires
recompression while the former could avoid it
* front-filling is the process of using era files to replace backfilling
- in theory this front-filling could happen from any block and
front-fills with gaps could also be entertained, but our backfilling
algorithm cannot take advantage of this because there's no (simple) way
to tell it to "skip" a range.
* front-filling, as implemented, is a bit slow (10s to load mainnet): we
load the full BeaconState for every era to grab the roots of the blocks
- it would be better to partially load the state - as such, it would
also be good to be able to partially decompress snappy blobs
* lookups from REST via root are served by first looking up a block
summary in the database, then using the slot to load the block data from
the era file - however, there needs to be an option to create the
summary table from era files to fully support historical queries
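
Purely illustrative Nim flow of the "database first, era file second" lookup described above; the Option-based callbacks stand in for the real storage APIs:

```nim
import std/options

type
  Root = string
  Slot = uint64
  SignedBlock = object
    slot: Slot

proc loadBlockByRoot(
    dbGet: proc (root: Root): Option[SignedBlock],
    summarySlot: proc (root: Root): Option[Slot],
    eraGet: proc (slot: Slot): Option[SignedBlock],
    root: Root): Option[SignedBlock] =
  # The database remains the primary (and security-relevant) source.
  let fromDb = dbGet(root)
  if fromDb.isSome:
    return fromDb
  # Era files are indexed by slot, so map root -> slot via the summary table,
  # then load the block from the era store.
  let slot = summarySlot(root)
  if slot.isNone:
    return none(SignedBlock)
  eraGet(slot.get)
```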

To test this, `ncli_db` has an era file exporter: the files it creates
should be placed in an `era` folder next to `db` in the data directory.
What's interesting in particular about this setup is that `db` remains
as the source of truth for security purposes - it stores the latest
synced head root which in turn determines where a node "starts" its
consensus participation - the era directory however can be freely shared
between nodes / people without any (significant) security implications,
assuming the era files are consistent / not broken.

There's lots of future improvements to be had:

* we can drop the in-memory `BlockRef` index almost entirely - at this
point, resident memory usage of Nimbus should drop to a cool 500-600 mb
* we could serve era files via REST trivially: this would drop backfill
times to whatever time it takes to download the files - unlike the
current implementation that downloads block by block, downloading an era
at a time almost entirely cuts out request overhead
* we can "reasonably" recreate detailed state history from almost any
point in time, turning an O(slot) process into O(1) effectively - we'll
still need caches and indices to do this with sufficient efficiency for
the rest api, but at least it cuts the whole process down to minutes
instead of hours, for arbitrary points in time

* CI: ignore failures with Nim-1.6 (temporary)

* test fixes

Co-authored-by: Ștefan Talpalaru <stefantalpalaru@yahoo.com>
2022-03-23 09:58:17 +01:00
Etan Kissling fd1ffd62dd
update light client server for DAG failure modes (#3514)
Gracefully handles the new failure modes recently introduced to the DAG
as part of https://github.com/status-im/nimbus-eth2/pull/3513
Data that is deemed to exist but fails to load leads to an error log to
avoid suppressing logic errors accidentally. In `verifyFinalization`
mode, the assertions remain active.
2022-03-20 11:58:59 +01:00
Jacek Sieka 05ffe7b2bf
Prune `BlockRef` on finalization (#3513)
Up til now, the block dag has been using `BlockRef`, a structure adapted
for a full DAG, to represent all of chain history. This is a correct and
simple design, but does not exploit the linearity of the chain once
parts of it finalize.

By pruning the in-memory `BlockRef` structure at finalization, we save,
at the time of writing, a cool ~250mb (or 25%:ish) chunk of memory
landing us at a steady state of ~750mb normal memory usage for a
validating node.

Above all though, we prevent memory usage from growing proportionally
with the length of the chain, something that would not be sustainable
over time - instead, the steady state memory usage is roughly
determined by the validator set size which grows much more slowly. With
these changes, the core should remain sustainable memory-wise post-merge
all the way to withdrawals (when the validator set is expected to grow).

In-memory indices are still used for the "hot" unfinalized portion of
the chain - this ensures that consensus performance remains unchanged.

What changes is that for historical access, we use a db-based linear
slot index which is cache-and-disk-friendly, keeping the cost for
accessing historical data at a similar level as before, achieving the
savings at no perceivable cost to functionality or performance.

A nice collateral benefit is the almost-instant startup since we no
longer load any large indices at dag init.

The cost of this functionality instead can be found in the complexity of
having to deal with two ways of traversing the chain - by `BlockRef` and
by slot.

* use `BlockId` instead of `BlockRef` where finalized / historical data
may be required
* simplify clearance pre-advancement
* remove dag.finalizedBlocks (~50:ish mb)
* remove `getBlockAtSlot` - use `getBlockIdAtSlot` instead
* `parent` and `atSlot` for `BlockId` now require a `ChainDAGRef`
instance, unlike `BlockRef` traversal
* prune `BlockRef` parents on finality (~200:ish mb)
* speed up ChainDAG init by not loading finalized history index
* mess up light client server error handling - this needs revisiting :)
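
Conceptual Nim sketch of the two traversal paths after pruning: a db-backed slot index for finalized history, `BlockRef` walking for the unfinalized part. All names are simplified:

```nim
import std/[options, tables]

type
  Slot = uint64
  Root = string
  BlockId = object
    slot: Slot
    root: Root
  BlockRef = ref object
    bid: BlockId
    parent: BlockRef   # only kept for the unfinalized portion of the DAG

proc getBlockIdAtSlot(slot, finalizedSlot: Slot,
                      finalizedIndex: Table[Slot, Root],
                      head: BlockRef): Option[BlockId] =
  if slot <= finalizedSlot:
    # Finalized history is linear: a cache/disk-friendly slot -> root index
    # answers the query without any in-memory BlockRef.
    if slot in finalizedIndex:
      return some(BlockId(slot: slot, root: finalizedIndex[slot]))
    return none(BlockId)   # empty slot
  # Unfinalized history may still fork: walk BlockRef parents from head.
  var cur = head
  while cur != nil and cur.bid.slot > slot:
    cur = cur.parent
  if cur == nil: none(BlockId) else: some(cur.bid)
```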
2022-03-17 17:42:56 +00:00
Jacek Sieka c64bf045f3
remove StateData (#3507)
One more step on the journey to reduce `BlockRef` usage across the
codebase - this one gets rid of `StateData` whose job was to keep track
of which block was last assigned to a state - these duties have now been
taken over by `latest_block_root`, a fairly recent addition that
computes this block root from state data (at a small cost that should be
insignificant)

99% mechanical change.
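
Sketch of the idea in spec-like Nim with stand-in hash helpers: the block root last applied to a state can be recovered from the state's own `latest_block_header`, so no separate bookkeeping is needed:

```nim
type
  Root = string
  BeaconBlockHeader = object
    parent_root: Root
    state_root: Root
    body_root: Root
  BeaconStateish = object
    latest_block_header: BeaconBlockHeader

# Stand-ins for hash_tree_root; the real thing hashes SSZ objects.
proc hashTreeRoot(s: BeaconStateish): Root = "state-root"
proc hashTreeRoot(h: BeaconBlockHeader): Root = "block-root"

proc latestBlockRoot(s: BeaconStateish): Root =
  var header = s.latest_block_header
  if header.state_root == "":
    # Right after a block is applied, its header is stored with a zeroed
    # state_root; fill it in before hashing, as the spec prescribes.
    header.state_root = hashTreeRoot(s)
  hashTreeRoot(header)
```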
2022-03-16 08:20:40 +01:00
Jacek Sieka a3bd01b58d
move dependent root computations to `BeaconState` / `EpochRef` (#3478)
* fewer deps on `BlockRef` traversal in anticipation of pruning
* allows identifying EpochRef:s by their shuffling as a first step of
* tighten error handling around missing blocks

using the zero hash for signalling "missing block" is fragile and easy
to miss - with checkpoint sync now, and pruning in the future, missing
blocks become "normal".
2022-03-15 09:24:55 +01:00
Etan Kissling ae408c279a
add option to collect light client data (#3474)
Light clients require full nodes to serve additional data so that they
can stay in sync with the network. This patch adds a new launch option
`--import-light-client-data` to configure what data to make available.
For now, data is only kept in memory; it is not persisted at this time.
Note that data is only collected locally; a separate patch is needed to
actually make it available over the network. `--serve-light-client-data`
will be used for serving data, but is not functional yet outside tests.
2022-03-11 21:28:10 +01:00
Jacek Sieka 12ed537f75
catch wrong-fork-blocks earlier (#3444)
Can't apply a phase0 block to a later phase state and vice versa.

Since instantiation has been a topic, pre/post c file size:

```
424K	@mspec@sstate_transition.nim.c
892K	@mspec@sstate_transition_block.nim.c
```

```
288K	@mspec@sstate_transition.nim.c
880K	@mspec@sstate_transition_block.nim.c
```
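
The early check boils down to comparing forks before attempting the transition at all (illustrative Nim; the state is assumed to already be advanced to the block's slot):

```nim
type ConsensusFork = enum
  Phase0, Altair, Bellatrix

func canApplyBlock(blockFork, stateFork: ConsensusFork): bool =
  # A phase0 block cannot be applied to a later-phase state and vice versa;
  # catching this up front avoids instantiating the wrong transition.
  blockFork == stateFork

when isMainModule:
  doAssert not canApplyBlock(Phase0, Altair)
  doAssert canApplyBlock(Altair, Altair)
```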
2022-02-28 12:58:34 +00:00
Jacek Sieka 40a4c01086
chaindag: don't keep backfill block table in memory (#3429)
This PR names and documents the concept of the archive: a range of slots
for which we have degraded functionality in terms of historical access -
in particular:

* we don't support rewinding to states in this range
* we don't keep an in-memory representation of the block dag

The archive de-facto exists in a trusted-node-synced node, but this PR
gives it a name and drops the in-memory digest index.

In order to satisfy `GetBlocksByRange` requests, we ensure that we have
blocks for the entire archive period via backfill. Future versions may
relax this further, adding a "pre-archive" period that is fully pruned.

During by-slot searches in the archive (both for libp2p and rest
requests), an extra database lookup is used to convert the given `slot`
to a `root` - future versions will avoid this using era files which
natively are indexed by `slot`. That said, the lookup is quite
fast compared to the actual block loading given how trivial the table
is - it's hard to measure, even.

A collateral benefit of this PR is that checkpoint-synced nodes will see
100-200MB memory usage savings, thanks to the dropped in-memory cache -
future pruning work will bring this benefit to full nodes as well.

* document chaindag storage architecture and assumptions
* look up parent using block id instead of full block in clearance
(future-proofing the code against a future in which blocks come from era
files)
* simplify finalized block init, always writing the backfill portion to
db at startup (to ensure lookups work as expected)
* preallocate some extra memory for finalized blocks, to avoid immediate
realloc
2022-02-26 19:16:19 +01:00
Jacek Sieka d583e8e4ac
Store finalized block roots in database (3s startup) (#3320)
* Store finalized block roots in database (3s startup)

When the chain has finalized a checkpoint, the history from that point
onwards becomes linear - this is exploited in `.era` files to allow
constant-time by-slot lookups.

In the database, we can do the same by storing finalized block roots in
a simple sparse table indexed by slot, bringing the two representations
closer to each other in terms of conceptual layout and performance.

Doing so has a number of interesting effects:

* mainnet startup time is improved 3-5x (3s on my laptop)
* the _first_ startup might take slightly longer as the new index is
being built - ~10s on the same laptop
* we no longer rely on the beacon block summaries to load the full dag -
this is a lot faster because we no longer have to look up each block by
parent root
* a collateral benefit is that we no longer need to load the full
summaries table into memory - we get the RSS benefits of #3164 without
the CPU hit.

Other random stuff:

* simplify forky block generics
* fix withManyWrites multiple evaluation
* fix validator key cache not being updated properly in chaindag
read-only mode
* drop pre-altair summaries from `kvstore`
* recreate missing summaries from altair+ blocks as well (in case
database has lost some to an involuntary restart)
* print database startup timings in chaindag load log
* avoid allocating superfluous state at startup
* use a recursive sql query to load the summaries of the unfinalized
blocks
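
Rough Nim sketch of the first-startup index build implied above: walk the block summaries once from head to genesis via `parent_root`, after which the linear slot -> root table answers finalized lookups directly. Names are assumptions:

```nim
import std/tables

type
  Slot = uint64
  Root = string
  BeaconBlockSummary = object
    slot: Slot
    parent_root: Root

proc buildFinalizedRootIndex(summaries: Table[Root, BeaconBlockSummary],
                             headRoot: Root): Table[Slot, Root] =
  # One-off pass (the "~10s" first startup); later startups read the
  # sparse slot-indexed table straight from the database.
  result = initTable[Slot, Root]()
  var root = headRoot
  while root in summaries:
    let s = summaries[root]
    result[s.slot] = root
    root = s.parent_root
```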
2022-01-30 18:51:04 +02:00