Split up the `ShufflingRef` acceleration logic into generically usable
parts and attester shuffling specific parts. The generic parts could be
used to accelerate other purposes, e.g., REST `/states/xxx/randao` API.
To enable additional use cases, e.g., `/states/###/randao` beacon API,
`ShufflingRef` acceleration logic needs to be able to operate on parts
of the DAG that do not have `BlockRef`. Changing `commonAncestor` to
act on `BlockId` instead of `BlockRef` is a step toward that and also
simplifies the logic some more.
Post-merge blocks contain all information to directly obtain RANDAO
without having to load any additional info. Take advantage of that to
further accelerate `ShufflingRef` computation. Note that it is still
necessary to verify that `blck` / `state` share a sufficiently recent
ancestor for the purpose of computing attester shufflings.
- new: 243.71s, 239.67s, 237.32s, 238.36s, 239.57s
- old: 251.33s, 234.29s, 249.28s, 237.03s, 236.78s
Current RANDAO recovery logic is quite complex as it optimizes for the
minimum amount of database reads. Loading blocks isn't the bottleneck
though, so rather make the implementation more concise by avoiding the
complex strategy planning step. Note that this also prepares for an even
faster implementation for post-merge blocks in the future that extracts
RANDAO from `ExecutionPayload` directly if available, so even in cases
where efficiency is slightly lower, only historical data is affected.
`time nim c -r tests/test_blockchain_dag` (cached binary):
- new: 145.45s, 133.59s, 144.65s, 127.69s, 136.14s
- old: 149.15s, 150.84s, 135.77s, 137.49s, 133.89s
* early exit `commonAncestor` when comparing with `finalizedHead`
As all `BlockRef` lead to `finalizedHead` (`parent == nil`),
can shortcut in that situation and immediately return `finalizedHead`
if passed as one of the arguments.
* typo in comment
* add test from #5152
Co-authored-by: tersec <tersec@users.noreply.github.com>
* add note about test complexity
* regenerate test summary
---------
Co-authored-by: tersec <tersec@users.noreply.github.com>
These tables can't be deleted from (read-only) and would be too slow to
delete from anyway due to the inefficient storage format in use.
* slow down startup clearing too
* remove unused del function
* replace optimisticRoots table with field in BlockRef
* copyright year
* mark finalized blocks as verified on load
* Update beacon_chain/consensus_object_pools/block_dag.nim
Co-authored-by: Etan Kissling <etan@status.im>
* expand non-optimistic block checking to all pre-merge blocks; refactor markBlockVerified to use BlockRef rather than block root and remove superfluous caller in newPayload path replaced by addResolvedHeadBlock BlockRef construction
* don't treat finalized block specially; VALID status is sticky
---------
Co-authored-by: Etan Kissling <etan@status.im>
When an uncached `ShufflingRef` is requested, we currently replay state
which can take several seconds. Acceleration is possible by:
1. Start from any state with locked-in `get_active_validator_indices`.
Any blocks / slots applied to such a state can only affect that
result for future epochs, so are viable for querying target epoch.
`compute_activation_exit_epoch(state.slot.epoch) > target.epoch`
2. Determine highest common ancestor among `state` and `target.blck`.
At the ancestor slot, same rules re `get_active_validator_indices`.
`compute_activation_exit_epoch(ancestorSlot.epoch) > target.epoch`
3. We now have a `state` that shares history with `target.blck` up
through a common ancestor slot. Any blocks / slots that the `state`
contains, which are not part of the `target.blck` history, affect
`get_active_validator_indices` at epochs _after_ `target.epoch`.
4. Select `state.randao_mixes[N]` that is closest to common ancestor.
Either direction is fine (above / below ancestor).
5. From that RANDAO mix, mix in / out all RANDAO reveals from blocks
in-between. This is just an XOR operation, so fully reversible.
`mix = mix xor SHA256(blck.message.body.randao_reveal)`
6. Compute the attester dependent slot from `target.epoch`.
`if epoch >= 2: (target.epoch - 1).start_slot - 1 else: GENESIS_SLOT`
7. Trace back from `target.blck` to the attester dependent slot.
We now have the destination for which we want to obtain RANDAO.
8. Mix in all RANDAO reveals from blocks up through the `dependentBlck`.
Same method, no special handling necessary for epoch transitions.
9. Combine `get_active_validator_indices` from `state` at `target.epoch`
with the recovered RANDAO value at `dependentBlck` to obtain the
requested shuffling, and construct the `ShufflingRef` without replay.
* more tests and simplify logic
* test with different number of deposits per branch
* Update beacon_chain/consensus_object_pools/blockchain_dag.nim
Co-authored-by: Jacek Sieka <jacek@status.im>
* `commonAncestor` tests
* lint
---------
Co-authored-by: Jacek Sieka <jacek@status.im>
* Incremental pruning
When turning on pruning the first time the current pruning algorithm
will prune the full database at startup. This delays restart
unnecessarily, since all of the pruned space is not needed at once.
This PR introduces incremental pruning such that we will never prune
more than 32 blocks or the sync speed, whichever is higher.
This mode is expected to become default in a follow-up release.
Post-Capella, historical roots are computed from historical summaries
instead of being directly stored in the beacon state.
Slightly messy to pass both lists around - this is done to avoid
computing the historical root unnecessarily.
The consensus-spec-tests already cover the scenarios of our custom test
runner, so the custom tests can be removed. Also cleans up unused config
flags and related unreachable logic.
Just the variable, not yet `lcDataForkAtStateFork` / `atStateFork`.
- Shorten comment in `light_client.nim` to keep line width
- Do not rename `stateFork` mention in `runProposalForkchoiceUpdated`.
- Do not rename `stateFork` in `getStateField(dag.headState, fork)`
Rest is just a mechanical mass replace
* Local sim impovements
* Added support for running Capella and EIP-4844 simulations
by downloading the correct version of Geth.
* Added support for using Nimbus remote signer and Web3Signer.
Use 2 out of 3 threshold signing configuration in the mainnet
configuration and regular remote signing in the minimal one.
* The local testnet simulation can now use a payload builder.
This is currently not activated in CI due to lack of automated
procedures for installing third-party relays or builders.
You are adviced to use mergemock for now, but for most realistic
results, we can create a simple builder based on the nimbus-eth1
codebase that will be able to propose transactions from the regular
network mempool.
* Start the simulation from a merged state. This would allow us
to start removing pre-merge functionality such as the gossip
subsciption logic. The commit also removes the merge-forcing
hack installed after the TTD removal.
* Consolidate all the tools used in the local simulation into a
single `ncli_testnet` binary.
Other changes:
Renamed the `EIP_4844_FORK_*` config constants to `DENEB_FORK_*` as
this matches the latest spec and it's already used in the official
Sepolia config.
By pre-seeding the sync committee cache when applying blocks, we avoid a
significantly expensive validator set traversal / sync committee index
construction during sync / block application - 20-30% sync speedup
post-altair.
* also cache/reload total active balance for another cool 10%
Extends fork choice state to also track slot numbers to improve accuracy
of `/eth/v1/debug/fork_choice` endpoint. Autoenable this API on devnet,
and disable some extra checks on devnet to aid focused testing efforts.
Align fork choice pruning logic with API based on checkpoints vs root.
Introduce (optional) pruning of historical data - a pruned node will
continue to answer queries for historical data up to
`MIN_EPOCHS_FOR_BLOCK_REQUESTS` epochs, or roughly 5 months, capping
typical database usage at around 60-70gb.
To enable pruning, add `--history=prune` to the command line - on the
first start, old data will be cleared (which may take a while) - after
that, data is pruned continuously.
When pruning an existing database, the database will not shrink -
instead, the freed space is recycled as the node continues to run - to
free up space, perform a trusted node sync with a fresh database.
When switching on archive mode in a pruned node, history is retained
from that point onwards.
History pruning is scheduled to be enabled by default in a future
release.
In this PR, `minimal` mode from #4419 is not implemented meaning
retention periods for states and blocks are always the same - depending
on user demand, a future PR may implement `minimal` as well.
When not backfilling all the way to genesis (#4421), it becomes more
useful to start rebuilding the historical indices from an arbitrary
starting point.
To rebuild the index from non-genesis, a state and an unbroken block
history is needed - here, we allow loading the state from an era file
and recreating the history from there onwards.
* speed up partial era state loading
When backfilling, we only need to download blocks that are newer than
MIN_EPOCHS_FOR_BLOCK_REQUESTS - the rest cannot reliably be fetched from
the network and does not have to be provided to others.
This change affects only trusted-node-synced clients - genesis sync
continues to work as before (because it needs to construct a state by
building it from genesis).
Those wishing to complete a backfill should do so with era files
instead.
Trigger ANSI art on upgrade to Capella, similar to the merge.
Future extension could log blinking art when user successfully managed
to get BLS to Execution change included into a block for a validator.
Art created by http://beatscribe.com/ (beatscribe#1008 on Discord)
* Types and scaffolding for EIP-4844
This commit adds the EIP-4844 spec types, and fills in
scaffolding/boilerplate for the use of these types across the repo.
None of the actual EIP-4844 logic is introduced yet.
This follows the pattern used by @tersec when introducing Capella (#4276).
* use eth2-networks fork
* review feedback: add static check EIP4844_FORK_EPOCH == FAR_FUTURE_EPOCH
* review feedback: remove EIP4844 from /eth/v1/config/spec response
* Cleanup / review feedback
* Fix REST test
This PR removes a bunch of code to make TNS aware of era files, avoiding
a duplicated backfill when era files are available.
* reuse chaindag for loading backfill state, replacing the TNS homebrew
* fix era block iteration to skip empty slots
* add tests for `can_advance_slots`
Currently, we require genesis and a checkpoint block and state to start
from an arbitrary slot - this PR relaxes this requirement so that we can
start with a state alone.
The current trusted-node-sync algorithm works by first downloading
blocks until we find an epoch aligned non-empty slot, then downloads the
state via slot.
However, current
[proposals](https://github.com/ethereum/beacon-APIs/pull/226) for
checkpointing prefer finalized state as
the main reference - this allows more simple access control and caching
on the server side - in particular, this should help checkpoint-syncing
from sources that have a fast `finalized` state download (like infura
and teku) but are slow when accessing state via slot.
Earlier versions of Nimbus will not be able to read databases created
without a checkpoint block and genesis. In most cases, backfilling makes
the database compatible except where genesis is also missing (custom
networks).
* backfill checkpoint block from libp2p instead of checkpoint source,
when doing trusted node sync
* allow starting the client without genesis / checkpoint block
* perform epoch start slot lookahead when loading tail state, so as to
deal with the case where the epoch start slot does not have a block
* replace `--blockId` with `--state-id` in TNS command line
* when replaying, also look at the parent of the last-known-block (even
if we don't have the parent block data, we can still replay from a
"parent" state) - in particular, this clears the way for implementing
state pruning
* deprecate `--finalized-checkpoint-block` option (no longer needed)
* Allow chain dag without genesis / block
This PR enables the initialization of the dag without access to blocks
or genesis state - it is a prerequisite for implementing a number of
interesting features:
* checkpoint sync without any block download
* pruning of blocks and states
* backfill checkpoint block
When EL `newPayload` is slow (e.g., Raspberry Pi with Besu), the epoch
and shuffling caches tend to fill up with multiple copies per epoch when
processing gossip and performing validator duties close to wall slot.
The old strategy of evicting oldest epoch led to the same item being
evicted over and over, leading to blocking of over 5 minutes in extreme
cases where alternate epochs/shuffling got loaded repeatedly.
Changing the cache eviction strategy to least-recently-used seems to
improve the situation drastically. A simple implementation was selected
based on single linked-list without a hashtable.
* avoid database race-condition inconsistency after fcU `INVALID` then crash
* ensure head doesn't fall behind finalized; add more tests for head movement/reloading DAG
In order to avoid full replays when validating attestations hailing from
untaken forks, it's better to keep shufflings separate from `EpochRef`
and perform a lookahead on the shuffling when processing the block that
determines them.
This also helps performance in the case where REST clients are trying to
perform lookahead on attestation duties and decreases memory usage by
sharing shufflings between EpochRef instances of the same dependent
root.
The justified and finalized `Checkpoint` are frequently passed around
together. This introduces a new `FinalityCheckpoint` data structure that
combines them into one.
Due to the large usage of this structure in fork choice, also took this
opportunity to update fork choice tests to the latest v1.2.0-rc.1 spec.
Many additional tests enabled, some need more work, e.g. EL mock blocks.
Also implemented `discard_equivocations` which was skipped in #3661,
and improved code reuse across fork choice logic while at it.
* merge LC db into main BN db
To treat derived LC data similar to derived state caches, merge it into
the main beacon node DB.
* shorten table names, group with lc prefix
* optimistic sync
* flag that initially loaded blocks from database might need execution block root filled in
* return optimistic status in REST calls
* refactor blockslot pruning
* ensure beacon_blocks_by_{root,range} do not provide optimistic blocks
* handle forkchoice head being pre-merge with block being postmerge
* re-enable blocking head updates on validator duties
* fix is_optimistic_candidate_block per spec; don't crash with nil future
* fix is_optimistic_candidate_block per spec; don't crash with nil future
* mark blocks sans execution payloads valid during head update
* persist LC data across restarts
With the Altair spec `LightClientUpdate` structure taking its final form
it is finally possible to persist LC data across restarts without having
to worry about data migration due to spec changes. A separate `lcdataV1`
database is created in the `caches` subdirectory to hold known LC data.
A full database with default settings (129 periods) uses <15 MB disk.
* extend LC data DB rationale
* wording
* add `isSupportedBySQLite` helper and explicit return
* remove redundant `return`
Separate LC initialization options from the main ChainDAGRef options to
allow ChainDAGRef to treat them as opaque and reduce risk for conflicts
when extending those options in the future.