When an uncached `ShufflingRef` is requested, we currently replay state
which can take several seconds. Acceleration is possible by:
1. Start from any state with locked-in `get_active_validator_indices`.
Any blocks / slots applied to such a state can only affect that
result for future epochs, so are viable for querying target epoch.
`compute_activation_exit_epoch(state.slot.epoch) > target.epoch`
2. Determine highest common ancestor among `state` and `target.blck`.
At the ancestor slot, same rules re `get_active_validator_indices`.
`compute_activation_exit_epoch(ancestorSlot.epoch) > target.epoch`
3. We now have a `state` that shares history with `target.blck` up
through a common ancestor slot. Any blocks / slots that the `state`
contains, which are not part of the `target.blck` history, affect
`get_active_validator_indices` at epochs _after_ `target.epoch`.
4. Select `state.randao_mixes[N]` that is closest to common ancestor.
Either direction is fine (above / below ancestor).
5. From that RANDAO mix, mix in / out all RANDAO reveals from blocks
in-between. This is just an XOR operation, so fully reversible.
`mix = mix xor SHA256(blck.message.body.randao_reveal)`
6. Compute the attester dependent slot from `target.epoch`.
`if epoch >= 2: (target.epoch - 1).start_slot - 1 else: GENESIS_SLOT`
7. Trace back from `target.blck` to the attester dependent slot.
We now have the destination for which we want to obtain RANDAO.
8. Mix in all RANDAO reveals from blocks up through the `dependentBlck`.
Same method, no special handling necessary for epoch transitions.
9. Combine `get_active_validator_indices` from `state` at `target.epoch`
with the recovered RANDAO value at `dependentBlck` to obtain the
requested shuffling, and construct the `ShufflingRef` without replay.
* more tests and simplify logic
* test with different number of deposits per branch
* Update beacon_chain/consensus_object_pools/blockchain_dag.nim
Co-authored-by: Jacek Sieka <jacek@status.im>
* `commonAncestor` tests
* lint
---------
Co-authored-by: Jacek Sieka <jacek@status.im>
The justified and finalized `Checkpoint` are frequently passed around
together. This introduces a new `FinalityCheckpoint` data structure that
combines them into one.
Due to the large usage of this structure in fork choice, also took this
opportunity to update fork choice tests to the latest v1.2.0-rc.1 spec.
Many additional tests enabled, some need more work, e.g. EL mock blocks.
Also implemented `discard_equivocations` which was skipped in #3661,
and improved code reuse across fork choice logic while at it.
* optimistic sync
* flag that initially loaded blocks from database might need execution block root filled in
* return optimistic status in REST calls
* refactor blockslot pruning
* ensure beacon_blocks_by_{root,range} do not provide optimistic blocks
* handle forkchoice head being pre-merge with block being postmerge
* re-enable blocking head updates on validator duties
* fix is_optimistic_candidate_block per spec; don't crash with nil future
* fix is_optimistic_candidate_block per spec; don't crash with nil future
* mark blocks sans execution payloads valid during head update
The spec implicitly talks about the slot of a block in several places,
and keeping it readily available is useful in a number of context -
might as well put this implicitly refereneced helper in the spec code
directly
* fewer deps on `BlockRef` traversal in anticipation of pruning
* allows identifying EpochRef:s by their shuffling as a first step of
* tighten error handling around missing blocks
using the zero hash for signalling "missing block" is fragile and easy
to miss - with checkpoint sync now, and pruning in the future, missing
blocks become "normal".
`BlockId` is a type that bundles a block root with its slot number.
The type can be useful as key in tables that deal with non-finalized
blocks (not uniquely identified by slot) and also support pruning
(drop data about older blocks by slot). Instead of creating a custom
type for those use cases, this patch suggests implementing `hash` for
`BlockId` to re-use the existing type.
This PR names and documents the concept of the archive: a range of slots
for which we have degraded functionality in terms of historical access -
in particular:
* we don't support rewinding to states in this range
* we don't keep an in-memory representation of the block dag
The archive de-facto exists in a trusted-node-synced node, but this PR
gives it a name and drops the in-memory digest index.
In order to satisfy `GetBlocksByRange` requests, we ensure that we have
blocks for the entire archive period via backfill. Future versions may
relax this further, adding a "pre-archive" period that is fully pruned.
During by-slot searches in the archive (both for libp2p and rest
requests), an extra database lookup is used to covert the given `slot`
to a `root` - future versions will avoid this using era files which
natively are indexed by `slot`. That said, the lookup is quite
fast compared to the actual block loading given how trivial the table
is - it's hard to measure, even.
A collateral benefit of this PR is that checkpoint-synced nodes will see
100-200MB memory usage savings, thanks to the dropped in-memory cache -
future pruning work will bring this benefit to full nodes as well.
* document chaindag storage architecture and assumptions
* look up parent using block id instead of full block in clearance
(future-proofing the code against a future in which blocks come from era
files)
* simplify finalized block init, always writing the backfill portion to
db at startup (to ensure lookups work as expected)
* preallocate some extra memory for finalized blocks, to avoid immediate
realloc
When node is restarted before backfill has started but after some blocks
have finalized with forward sync, we would not start the backfill.
* also clean up one last `SomeSome`
* Store finalized block roots in database (3s startup)
When the chain has finalized a checkpoint, the history from that point
onwards becomes linear - this is exploited in `.era` files to allow
constant-time by-slot lookups.
In the database, we can do the same by storing finalized block roots in
a simple sparse table indexed by slot, bringing the two representations
closer to each other in terms of conceptual layout and performance.
Doing so has a number of interesting effects:
* mainnet startup time is improved 3-5x (3s on my laptop)
* the _first_ startup might take slightly longer as the new index is
being built - ~10s on the same laptop
* we no longer rely on the beacon block summaries to load the full dag -
this is a lot faster because we no longer have to look up each block by
parent root
* a collateral benefit is that we no longer need to load the full
summaries table into memory - we get the RSS benefits of #3164 without
the CPU hit.
Other random stuff:
* simplify forky block generics
* fix withManyWrites multiple evaluation
* fix validator key cache not being updated properly in chaindag
read-only mode
* drop pre-altair summaries from `kvstore`
* recreate missing summaries from altair+ blocks as well (in case
database has lost some to an involuntary restart)
* print database startup timings in chaindag load log
* avoid allocating superfluos state at startup
* use a recursive sql query to load the summaries of the unfinalized
blocks
Time in the beacon chain is expressed relative to the genesis time -
this PR creates a `beacon_time` module that collects helpers and
utilities for dealing the time units - the new module does not deal with
actual wall time (that's remains in `beacon_clock`).
Collecting the time related stuff in one place makes it easier to find,
avoids some circular imports and allows more easily identifying the code
actually needs wall time to operate.
* move genesis-time-related functionality into `spec/beacon_time`
* avoid using `chronos.Duration` for time differences - it does not
support negative values (such as when something happens earlier than it
should)
* saturate conversions between `FAR_FUTURE_XXX`, so as to avoid
overflows
* fix delay reporting in validator client so it uses the expected
deadline of the slot, not "closest wall slot"
* simplify looping over the slots of an epoch
* `compute_start_slot_at_epoch` -> `start_slot`
* `compute_epoch_at_slot` -> `epoch`
A follow-up PR will (likely) introduce saturating arithmetic for the
time units - this is merely code moves, renames and fixing of small
bugs.
* REST cleanups
* reject out-of-range committee requests
* print all hex values as lower-case
* allow requesting state information by head state root
* turn `DomainType` into array (follow spec)
* `uint_to_bytesXX` -> `uint_to_bytes` (follow spec)
* fix wrong dependent root in `/eth/v1/validator/duties/proposer/`
* update documentation - `--subscribe-all-subnets` is no longer needed
when using the REST interface with validator clients
* more fixes
* common helpers for dependent block
* remove test rules obsoleted by more strict epoch tests
* fix trailing commas
* Update docs/the_nimbus_book/src/rest-api.md
* Update docs/the_nimbus_book/src/rest-api.md
Co-authored-by: sacha <sacha@status.im>
With the right sequence of events (for example a REST request or a
validation), it can happen that the first traversal across a state
checkpoint boundary is done without storing that state on disk - this
causes problens when replaying states, because now states may be missing
from the database.
Here, we simply avoid using the caches when advancing a state that will
go into the database, ensuring that the information lost during caching
always is permanently stored.
* fix recursion bug in `isProposed`
* BlockId reform
Introduce `BlockId` that helps track a root/slot pair - this prepares
the codebase for backfilling and handling out-of-dag blocks
* move block dag code to separate module
* fix finalised state root in REST event stream
* fix finalised head computation on head update, when starting from
checkpoint
* clean up chaindag init
* revert `epochAncestor` change in introduced in #3144 that would return
an epoch ancestor from the canoncial history instead of the given
history, causing `EpochRef` keys to point to the wrong block