nimbus-eth2

Commit Graph

Author	SHA1	Message	Date
Etan Kissling	7c53841cd8	Revert "Revert "fix checkpoint block potentially not getting backfilled into DB (#5863 )" (#5871 )" (#5875 ) This reverts commit `1575478b72`.	2024-02-09 20:44:54 +01:00
tersec	1575478b72	Revert "fix checkpoint block potentially not getting backfilled into DB (#5863 )" (#5871 ) This reverts commit `65e6f892de`.	2024-02-09 12:49:07 +00:00
Etan Kissling	65e6f892de	fix checkpoint block potentially not getting backfilled into DB (#5863 ) When using checkpoint sync, only checkpoint state is available, block is not downloaded and backfilled later. `dag.backfill` tracks latest filled `slot`, and latest `parent_root` for which no block has been synced yet. In checkpoint sync, this assumption is broken, because there, the start `dag.backfill.slot` is set based on checkpoint state slot, and the block is also not available. However, sync manager in backward mode also requests `dag.backfill.slot` and `block_clearance` then backfills the checkpoint block once it is synced. But, there is no guarantee that a peer ever sends us that block. They could send us all parent blocks and solely omit the checkpoint block itself. In that situation, we would accept the parent blocks and advance `dag.backfill`, and subsequently never request the checkpoint block again, resulting in gap inside blocks DB that is never filled. To mitigate that, the assumption is restored that `dag.backfill.slot` is the latest filled `slot`, and `dag.backfill.parent_root` is the next block that needs to be synced. By setting `slot` to `tail.slot + 1` and `parent_root` to `tail.root`, we put a fake summary into `dag.backfill` so that `block_clearance` only proceeds once checkpoint block exists.	2024-02-09 11:20:36 +01:00
Etan Kissling	4266e16835	allow `getBlockIdAtSlot` to answer queries from available states (#5869 ) After checkpoint sync, historical block IDs cannot yet be queried. However, they are needed to compute dependent roots of `ShufflingRef`. To allow lookup, enable `getBlockIdAtSlot` to answer from compatible states in memory; as long as they descend from the finalized checkpoint and the requested slot is sufficiently recent, `block_roots` contains everything to recover `BlockSlotId` up to `SLOTS_PER_HISTORICAL_ROOT`. This is similar to how `attester_dependent_root` etc. are computed. This accelerates the first couple minutes of checkpoint sync on Mainnet, especially the time until finality advances past the synced checkpoint.	2024-02-09 11:13:00 +01:00
tersec	7fd8beb418	rm unused code in {ncli,research,tests}/ (#5809 )	2024-01-21 07:55:03 +01:00
Jacek Sieka	62cbdeefc5	verify `genesis_time` more strictly (fixes #1667 ) (#5694 ) Bogus values lead to crashes down the line when timers overflow	2024-01-06 15:26:56 +01:00
Etan Kissling	8cea8af620	fix startup after BN exited between head and finalized blocks updates (#5617 ) When the BN exits after writing new `head` to database, but before completing the `updateFinalizedBlocks` call, the database is slightly inconsistent due to the partial write. We currently fail to start up after that. Fix that by catching up on partial `updateFinalizedBlocks` tasks on start up, and add a test for this edge case.	2023-11-23 00:44:20 +01:00
tersec	54bdda13b4	rm unused code (#5596 )	2023-11-11 11:49:34 +03:00
Etan Kissling	eb35039704	allow higher `MIN_EPOCHS_FOR_BLOCK_REQUESTS` than safe minimum (#5590 ) Gnosis uses `MIN_EPOCHS_FOR_BLOCK_REQUESTS` = 33024, but the computed safe minimum (that Nimbus was using) is 2304. Relax the compatibility check to allow `MIN_EPOCHS_FOR_BLOCK_REQUESTS` above the safe minimum and honor `config.yaml` preferences for `MIN_EPOCHS_FOR_BLOCK_REQUESTS`.	2023-11-10 15:04:55 +00:00
tersec	447786518f	ShufflingRef approach to next-epoch validator duty calculation/prediction (#5414 ) * ShufflingRef approach to next-epoch validator duty calculation/prediction * refactor action_tracker.updateActions to take ShufflingRef + beacon_proposers; refactor maybeUpdateActionTrackerNextEpoch to be separate and reused function; add actual fallback logic * document one possible set of conditions * check epoch participation flags and inactivity scores to ensure no penalties and MAX_EFFECTIVE_BALANCE to ensure rewards don't matter * correctly (un)shuffle each proposer index * remove debugging assertion	2023-10-10 00:02:07 +00:00
Etan Kissling	e7bc41e005	`blck` --> `forkyBlck` when using `withBlck` / `withStateAndBlck` (#5451 ) For symmetry with `forkyState` when using `withState`, and to avoid problems with shadowing of `blck` when using `withBlck` in `template`, also rename the injected `blck` to `forkyBlck`. - https://github.com/nim-lang/Nim/issues/22698	2023-09-21 12:49:14 +02:00
Etan Kissling	176ea09c2b	cleanup `OnBlockAdded` usage (#5426 ) Reduce repetitiveness when using forked `OnBlockAdded` callbacks by introducing a template to obtain appropriate cb from `ConsensusFork`.	2023-09-13 17:57:54 +00:00
tersec	ff87ee9181	rm i386 test_blockchain_dag workaround (#5356 )	2023-08-25 15:24:56 +00:00
Jacek Sieka	b8a32419b8	async batch verification (+40% sig verification throughput) (#5176 ) * async batch verification When batch verification is done, the main thread is blocked reducing concurrency. With this PR, the new thread signalling primitive in chronos is used to offload the full batch verification process to a separate thread allowing the main threads to continue async operations while the other threads verify signatures. Similar to previous behavior, the number of ongoing batch verifications is capped to prevent runaway resource usage. In addition to the asynchronous processing, 3 additional changes help drive throughput: * A loop is used for batch accumulation: this prevents a stampede of small batches in eager mode where both the eager and the scheduled batch runner would pick batches off the queue, prematurely picking "fresh" batches off the queue * An additional small wait is introduced for small batches - this helps create slightly larger batches which make better use of the increased concurrency * Up to 2 batches are scheduled to the threadpool during high pressure, reducing startup latency for the threads Together, these changes increase attestation verification throughput under load up to 30%. * fixup * Update submodules * fix blst build issues (and a PIC warning) * bump --------- Co-authored-by: Zahary Karadjov <zahary@gmail.com>	2023-08-03 11:36:45 +03:00
tersec	846e7c585b	Revert "Revert "generalize `ShufflingRef` acceleration logic (#5197 )" (#5223 )" (#5225 ) This reverts commit `2ab4592a31`.	2023-07-31 13:11:45 +00:00
tersec	2ab4592a31	Revert "generalize `ShufflingRef` acceleration logic (#5197 )" (#5223 ) This reverts commit `eb3a30655b`.	2023-07-31 08:05:32 +02:00
Etan Kissling	eb3a30655b	generalize `ShufflingRef` acceleration logic (#5197 ) Split up the `ShufflingRef` acceleration logic into generically usable parts and attester shuffling specific parts. The generic parts could be used to accelerate other purposes, e.g., REST `/states/xxx/randao` API.	2023-07-20 10:25:39 +02:00
Etan Kissling	f98c33ad03	generalize `commonAncestor` function to `BlockId` (#5192 ) To enable additional use cases, e.g., `/states/###/randao` beacon API, `ShufflingRef` acceleration logic needs to be able to operate on parts of the DAG that do not have `BlockRef`. Changing `commonAncestor` to act on `BlockId` instead of `BlockRef` is a step toward that and also simplifies the logic some more.	2023-07-18 17:37:53 +02:00
Etan Kissling	2efc44a8ab	accelerate RANDAO computation for post-merge blocks (#5190 ) Post-merge blocks contain all information to directly obtain RANDAO without having to load any additional info. Take advantage of that to further accelerate `ShufflingRef` computation. Note that it is still necessary to verify that `blck` / `state` share a sufficiently recent ancestor for the purpose of computing attester shufflings. - new: 243.71s, 239.67s, 237.32s, 238.36s, 239.57s - old: 251.33s, 234.29s, 249.28s, 237.03s, 236.78s	2023-07-15 22:16:56 +02:00
Etan Kissling	74bb4b1411	simplify RANDAO recovery in `ShufflingRef` acceleration (#5183 ) Current RANDAO recovery logic is quite complex as it optimizes for the minimum amount of database reads. Loading blocks isn't the bottleneck though, so rather make the implementation more concise by avoiding the complex strategy planning step. Note that this also prepares for an even faster implementation for post-merge blocks in the future that extracts RANDAO from `ExecutionPayload` directly if available, so even in cases where efficiency is slightly lower, only historical data is affected. `time nim c -r tests/test_blockchain_dag` (cached binary): - new: 145.45s, 133.59s, 144.65s, 127.69s, 136.14s - old: 149.15s, 150.84s, 135.77s, 137.49s, 133.89s	2023-07-12 17:27:05 +02:00
Etan Kissling	5115aaedb7	early exit `commonAncestor` when comparing with `finalizedHead` (#5174 ) * early exit `commonAncestor` when comparing with `finalizedHead` As all `BlockRef` lead to `finalizedHead` (`parent == nil`), can shortcut in that situation and immediately return `finalizedHead` if passed as one of the arguments. * typo in comment * add test from #5152 Co-authored-by: tersec <tersec@users.noreply.github.com> * add note about test complexity * regenerate test summary --------- Co-authored-by: tersec <tersec@users.noreply.github.com>	2023-07-10 20:36:25 +00:00
Etan Kissling	2722778ce5	reduce `nim-eth` dependencies just for RNG (#5099 ) We have several modules that import `nim-eth` for the sole purpose of its `keys.newRng` function. This function is meanwhile a simple wrapper around `nim-bearssl`'s `HmacDrbgContext.new()`, so the import doesn't really serve a use anymore. Replace `keys.newRng` with the direct call to reduce `nim-eth` imports.	2023-06-19 22:43:50 +00:00
Etan Kissling	dbba003a38	Revert "Revert "accelerate `getShufflingRef` (#4911 )" (#4958 )" This reverts commit `748be8b67b`.	2023-05-15 17:41:40 +02:00
Etan Kissling	748be8b67b	Revert "accelerate `getShufflingRef` (#4911 )" (#4958 ) This reverts commit `ea97e93e74`.	2023-05-15 15:25:51 +00:00
henridf	573228ffa0	Rename eth1/ -> el/ and eth1_monitor.nim -> el_monitor.nim (#4944 )	2023-05-15 05:05:12 +00:00
Etan Kissling	ea97e93e74	accelerate `getShufflingRef` (#4911 ) When an uncached `ShufflingRef` is requested, we currently replay state which can take several seconds. Acceleration is possible by: 1. Start from any state with locked-in `get_active_validator_indices`. Any blocks / slots applied to such a state can only affect that result for future epochs, so are viable for querying target epoch. `compute_activation_exit_epoch(state.slot.epoch) > target.epoch` 2. Determine highest common ancestor among `state` and `target.blck`. At the ancestor slot, same rules re `get_active_validator_indices`. `compute_activation_exit_epoch(ancestorSlot.epoch) > target.epoch` 3. We now have a `state` that shares history with `target.blck` up through a common ancestor slot. Any blocks / slots that the `state` contains, which are not part of the `target.blck` history, affect `get_active_validator_indices` at epochs _after_ `target.epoch`. 4. Select `state.randao_mixes[N]` that is closest to common ancestor. Either direction is fine (above / below ancestor). 5. From that RANDAO mix, mix in / out all RANDAO reveals from blocks in-between. This is just an XOR operation, so fully reversible. `mix = mix xor SHA256(blck.message.body.randao_reveal)` 6. Compute the attester dependent slot from `target.epoch`. `if epoch >= 2: (target.epoch - 1).start_slot - 1 else: GENESIS_SLOT` 7. Trace back from `target.blck` to the attester dependent slot. We now have the destination for which we want to obtain RANDAO. 8. Mix in all RANDAO reveals from blocks up through the `dependentBlck`. Same method, no special handling necessary for epoch transitions. 9. Combine `get_active_validator_indices` from `state` at `target.epoch` with the recovered RANDAO value at `dependentBlck` to obtain the requested shuffling, and construct the `ShufflingRef` without replay. 
* more tests and simplify logic * test with different number of deposits per branch * Update beacon_chain/consensus_object_pools/blockchain_dag.nim Co-authored-by: Jacek Sieka <jacek@status.im> * `commonAncestor` tests * lint --------- Co-authored-by: Jacek Sieka <jacek@status.im>	2023-05-12 19:36:59 +02:00
tersec	d058aa09c8	more withdrowls (#4674 )	2023-03-02 17:13:35 +01:00
tersec	0fb726c420	`BeaconStateFork/BeaconBlockFork` -> `ConsensusFork` (#4560 ) * `BeaconStateFork/BeaconBlockFork` -> `ConsensusFork` * revert unrelated change * revert unrelated changes * update test summaries	2023-01-28 19:53:41 +00:00
Jacek Sieka	0ba9fc4ede	History pruning (fixes #4419 ) (#4445 ) Introduce (optional) pruning of historical data - a pruned node will continue to answer queries for historical data up to `MIN_EPOCHS_FOR_BLOCK_REQUESTS` epochs, or roughly 5 months, capping typical database usage at around 60-70gb. To enable pruning, add `--history=prune` to the command line - on the first start, old data will be cleared (which may take a while) - after that, data is pruned continuously. When pruning an existing database, the database will not shrink - instead, the freed space is recycled as the node continues to run - to free up space, perform a trusted node sync with a fresh database. When switching on archive mode in a pruned node, history is retained from that point onwards. History pruning is scheduled to be enabled by default in a future release. In this PR, `minimal` mode from #4419 is not implemented meaning retention periods for states and blocks are always the same - depending on user demand, a future PR may implement `minimal` as well.	2023-01-07 10:02:15 +00:00
Jacek Sieka	bd8f08204e	Implement skip_randao_verification for blinded blocks (#4435 ) * Implement skip_randao_verification for blinded blocks * fix redundant randao verification on block replay (5% faster) * check randao in REST instead of internally * avoid redundant copies when making blocks * cleanup leftover randao skipping code * fix test summary	2022-12-19 15:11:12 +02:00
tersec	4e71e77da7	structure for supporting capella block production (#4383 )	2022-12-02 08:39:01 +01:00
Jacek Sieka	cd160b5650	more strict read-only database mode (#4362 ) * avoid creating pre-altair backwards compatibility tables * allow running ncli_db era export without above tables present * drop unused pre-altair backwards compatibility tables * run benchmark on read-only database * fix running benchmark from genesis	2022-11-28 23:21:58 +00:00
Etan Kissling	48994f67d3	rename `BlockError` -> `VerifierError` (#4310 ) We currently use `BlockError` for both beacon blocks and LC objects. In light of EIP4844, we will likely also use it for blob sidecars. To avoid confusion, renaming it to a more generic `VerifierError`, and update its documentation to be more generic. To avoid long lines as a followup, also renaming the `block_processor`'s `BlockProcessingCompleted.completed`->`ProcessingStatus.completed` and `BlockProcessingCompleted.notCompleted`->`ProcessingStatus.notCompleted`	2022-11-10 17:40:27 +00:00
Jacek Sieka	d839b9d07e	State-only checkpoint state startup (#4251 ) Currently, we require genesis and a checkpoint block and state to start from an arbitrary slot - this PR relaxes this requirement so that we can start with a state alone. The current trusted-node-sync algorithm works by first downloading blocks until we find an epoch aligned non-empty slot, then downloads the state via slot. However, current [proposals](https://github.com/ethereum/beacon-APIs/pull/226) for checkpointing prefer finalized state as the main reference - this allows more simple access control and caching on the server side - in particular, this should help checkpoint-syncing from sources that have a fast `finalized` state download (like infura and teku) but are slow when accessing state via slot. Earlier versions of Nimbus will not be able to read databases created without a checkpoint block and genesis. In most cases, backfilling makes the database compatible except where genesis is also missing (custom networks). * backfill checkpoint block from libp2p instead of checkpoint source, when doing trusted node sync * allow starting the client without genesis / checkpoint block * perform epoch start slot lookahead when loading tail state, so as to deal with the case where the epoch start slot does not have a block * replace `--blockId` with `--state-id` in TNS command line * when replaying, also look at the parent of the last-known-block (even if we don't have the parent block data, we can still replay from a "parent" state) - in particular, this clears the way for implementing state pruning * deprecate `--finalized-checkpoint-block` option (no longer needed)	2022-11-02 10:02:38 +00:00
Jacek Sieka	819442acc3	Allow chain dag without genesis / block (#4230 ) * Allow chain dag without genesis / block This PR enables the initialization of the dag without access to blocks or genesis state - it is a prerequisite for implementing a number of interesting features: * checkpoint sync without any block download * pruning of blocks and states * backfill checkpoint block	2022-10-14 22:40:10 +03:00
Jacek Sieka	40bed02f60	Build block in parallel with attestation packing (#4185 ) * fix block proposal in first slot after checkpoint	2022-10-04 11:24:16 +00:00
Etan Kissling	5968ed586b	use LRU strategy for shuffling/epoch caches (#4196 ) When EL `newPayload` is slow (e.g., Raspberry Pi with Besu), the epoch and shuffling caches tend to fill up with multiple copies per epoch when processing gossip and performing validator duties close to wall slot. The old strategy of evicting oldest epoch led to the same item being evicted over and over, leading to blocking of over 5 minutes in extreme cases where alternate epochs/shuffling got loaded repeatedly. Changing the cache eviction strategy to least-recently-used seems to improve the situation drastically. A simple implementation was selected based on single linked-list without a hashtable.	2022-09-29 14:55:58 +00:00
tersec	1819d79e07	avoid potential database inconsistency after fcU `INVALID`+crash (#4192 ) * avoid database race-condition inconsistency after fcU `INVALID` then crash * ensure head doesn't fall behind finalized; add more tests for head movement/reloading DAG	2022-09-28 21:07:31 +00:00
Jacek Sieka	b1bc830a92	Harden EpochRef loading against bogus block root at tail (#4178 ) * add more error information when things go wrong with database * lower log level when reloading attestations from no-block epoch start slot	2022-09-27 18:56:08 +02:00
tersec	0f6d19b4b3	implement v1.2.0 optimistic sync tests (#4174 ) * implement v1.2.0 optimistic sync tests * Update beacon_chain/consensus_object_pools/blockchain_dag.nim Co-authored-by: Etan Kissling <etan@status.im> * `lvh` -> `latestValidHash` and only invalidate one specific block" * `getEarliestInvalidRoot` -> `getEarliestInvalidBlockRoot`; `defaultEarliestInvalidRoot` -> `defaultEarliestInvalidBlockRoot` Co-authored-by: Etan Kissling <etan@status.im>	2022-09-27 15:11:47 +03:00
Jacek Sieka	7f9af78ddb	test randao skipping (complements #3837 ) (#4179 )	2022-09-27 09:22:24 +02:00
Jacek Sieka	0d9fd54857	cache shuffling separately from other EpochRef data (fixes #2677 ) (#3990 ) In order to avoid full replays when validating attestations hailing from untaken forks, it's better to keep shufflings separate from `EpochRef` and perform a lookahead on the shuffling when processing the block that determines them. This also helps performance in the case where REST clients are trying to perform lookahead on attestation duties and decreases memory usage by sharing shufflings between EpochRef instances of the same dependent root.	2022-08-18 21:07:01 +03:00
tersec	8eb5d5de09	use ZERO_HASH for default(Eth2Digest)/Eth2Digest() in func calls (#3770 )	2022-06-18 04:57:37 +00:00
tersec	a07d14cd99	remove unused imports in tests/ (#3713 )	2022-06-07 17:05:06 +00:00
Jacek Sieka	e009728858	work around Nim assignment bug that breaks state pruning (#3545 ) See https://github.com/nim-lang/Nim/issues/19613	2022-03-24 14:37:37 +00:00
Jacek Sieka	13fafe3a40	simplify unviable head pruning (#3528 ) Also note bug that exists that potentially prevents states from being pruned correctly	2022-03-21 09:20:26 +00:00
Jacek Sieka	d0223d1f28	fix finalized epoch ref loading on checkpoint start (#3517 ) regression from #3513 that did not take tail into consideration when loading epoch ancestor	2022-03-18 13:13:57 +01:00
Jacek Sieka	05ffe7b2bf	Prune `BlockRef` on finalization (#3513 ) Up until now, the block dag has been using `BlockRef`, a structure adapted for a full DAG, to represent all of chain history. This is a correct and simple design, but does not exploit the linearity of the chain once parts of it finalize. By pruning the in-memory `BlockRef` structure at finalization, we save, at the time of writing, a cool ~250mb (or 25%:ish) chunk of memory landing us at a steady state of ~750mb normal memory usage for a validating node. Above all though, we prevent memory usage from growing proportionally with the length of the chain, something that would not be sustainable over time - instead, the steady state memory usage is roughly determined by the validator set size which grows much more slowly. With these changes, the core should remain sustainable memory-wise post-merge all the way to withdrawals (when the validator set is expected to grow). In-memory indices are still used for the "hot" unfinalized portion of the chain - this ensures that consensus performance remains unchanged. What changes is that for historical access, we use a db-based linear slot index which is cache-and-disk-friendly, keeping the cost for accessing historical data at a similar level as before, achieving the savings at no perceivable cost to functionality or performance. A nice collateral benefit is the almost-instant startup since we no longer load any large indices at dag init. The cost of this functionality instead can be found in the complexity of having to deal with two ways of traversing the chain - by `BlockRef` and by slot. 
* use `BlockId` instead of `BlockRef` where finalized / historical data may be required * simplify clearance pre-advancement * remove dag.finalizedBlocks (~50:ish mb) * remove `getBlockAtSlot` - use `getBlockIdAtSlot` instead * `parent` and `atSlot` for `BlockId` now require a `ChainDAGRef` instance, unlike `BlockRef` traversal * prune `BlockRef` parents on finality (~200:ish mb) * speed up ChainDAG init by not loading finalized history index * mess up light client server error handling - this needs revisiting :)	2022-03-17 17:42:56 +00:00
Jacek Sieka	c64bf045f3	remove StateData (#3507 ) One more step on the journey to reduce `BlockRef` usage across the codebase - this one gets rid of `StateData` whose job was to keep track of which block was last assigned to a state - these duties have now been taken over by `latest_block_root`, a fairly recent addition that computes this block root from state data (at a small cost that should be insignificant) 99% mechanical change.	2022-03-16 08:20:40 +01:00
Jacek Sieka	a3bd01b58d	move dependent root computations to `BeaconState` / `EpochRef` (#3478 ) * fewer deps on `BlockRef` traversal in anticipation of pruning * allows identifying EpochRef:s by their shuffling as a first step of * tighten error handling around missing blocks using the zero hash for signalling "missing block" is fragile and easy to miss - with checkpoint sync now, and pruning in the future, missing blocks become "normal".	2022-03-15 09:24:55 +01:00
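
Note on commit `ea97e93e74` (accelerate `getShufflingRef`): steps 5 and 6 of that message describe RANDAO reveals being mixed in and out with XOR, and the attester dependent slot being derived from `target.epoch`. The following is a minimal Python sketch of those two formulas only, not the Nimbus implementation. The function names, the 32-byte mix size, `slots_per_epoch=32`, and `genesis_slot=0` are assumptions chosen for illustration.

```python
import hashlib
from typing import Iterable


def mix_reveal(mix: bytes, randao_reveal: bytes) -> bytes:
    """Apply `mix = mix xor SHA256(randao_reveal)` (step 5 of the commit message).

    XOR is its own inverse, so the same call mixes a reveal in or out.
    """
    digest = hashlib.sha256(randao_reveal).digest()
    return bytes(a ^ b for a, b in zip(mix, digest))


def mix_reveals(mix: bytes, reveals: Iterable[bytes]) -> bytes:
    """Apply several reveals in sequence (hypothetical helper)."""
    for reveal in reveals:
        mix = mix_reveal(mix, reveal)
    return mix


def attester_dependent_slot(target_epoch: int,
                            slots_per_epoch: int = 32,  # assumed preset value
                            genesis_slot: int = 0) -> int:
    """Step 6: `(target.epoch - 1).start_slot - 1` if epoch >= 2, else genesis."""
    if target_epoch >= 2:
        return (target_epoch - 1) * slots_per_epoch - 1
    return genesis_slot


# Usage: mixing reveals in, then mixing the same reveals out, restores the mix.
start = bytes(32)                      # placeholder mix, not real chain data
reveals = [b"a" * 96, b"b" * 96]       # placeholder reveals, not real signatures
forward = mix_reveals(start, reveals)
restored = mix_reveals(forward, reversed(reveals))
```

Because XOR is commutative and self-inverse, the order in which reveals are removed does not matter. This matches the commit message's remark that either direction from the ancestor is fine.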

64 Commits