With https://github.com/status-im/nimbus-eth2/pull/4420 implemented, the
checks that we perform are equivalent to those of a `SYNCING` EL - as
such, we can treat missing EL the same as SYNCING and proceed with an
optimistic sync.
This mode of operation significantly speeds up recovery after an offline
EL event because the CL is already synced and can immediately inform the
EL of the latest head.
It also allows using a beacon node for consensus archival queries
without an execution client.
* deprecate `--optimistic` flag
* log block details on EL error, soften log level because we can now
continue to operate
* `UnviableFork` -> `Invalid` when block hash verification fails -
failed hash verification is not a fork-related block issue
When not backfilling all the way to genesis (#4421), it becomes more
useful to start rebuilding the historical indices from an arbitrary
starting point.
To rebuild the index from non-genesis, a state and an unbroken block
history is needed - here, we allow loading the state from an era file
and recreating the history from there onwards.
* speed up partial era state loading
When backfilling, we only need to download blocks that are newer than
MIN_EPOCHS_FOR_BLOCK_REQUESTS - the rest cannot reliably be fetched from
the network and does not have to be provided to others.
This change affects only trusted-node-synced clients - genesis sync
continues to work as before (because it needs to construct a state by
building it from genesis).
Those wishing to complete a backfill should do so with era files
instead.
Trigger ANSI art on upgrade to Capella, similar to the merge.
Future extension could log blinking art when user successfully managed
to get BLS to Execution change included into a block for a validator.
Art created by http://beatscribe.com/ (beatscribe#1008 on Discord)
* 60% state replay speedup
* don't use HashList for epoch participation - in addition to the code
currently clearing the caches several times redundantly, clearing has to
be done each block nullifying the benefit (35%)
* introduce active balance cache - computing it is slow due to cache
unfriendliness in the random access pattern and bounds checking and we
do it for every block - this cache follows the same update pattern as
the active validator index cache (20%)
* avoid recomputing base reward several times per attestation (5%)
Applying 1024 blocks goes from 20s to ~8s on my laptop - these kinds of
requests happen on historical REST queries but also whenever there's a
reorg.
* fix test and diffs
* consolidate consensus spec transition test fixtures
* include capella
* consoliate fork test fixtures
* note change in EIP-4844 process_block in alpha.2
* fix REST liveness endpoint responding even when gossip is not enabled
* fix VC exit code on doppelganger hit
* fix activation epoch not being updated correctly on long deposit
queues
* fix activation epoch being set incorrectly when updating validator
* move most implementation logic to `validator_pool`, add tests
* ensure consistent logging between VC and BN
* add docs
Other changes:
* More optimal search for TTD block.
* Add timeouts to all REST requests during trusted node sync.
Fixes#4037
* Removed support for storing a deposit snapshot in the network
metadata.
* Types and scaffolding for EIP-4844
This commit adds the EIP-4844 spec types, and fills in
scaffolding/boilerplate for the use of these types across the repo.
None of the actual EIP-4844 logic is introduced yet.
This follows the pattern used by @tersec when introducing Capella (#4276).
* use eth2-networks fork
* review feedback: add static check EIP4844_FORK_EPOCH == FAR_FUTURE_EPOCH
* review feedback: remove EIP4844 from /eth/v1/config/spec response
* Cleanup / review feedback
* Fix REST test
Since the sync committee duties are no longer updated on every slot
and previously the sync committee aggregators selection proofs were
generated during the duties update, this now resulted in the client
using stale selection proofs (they must be generated at each slot).
The fix consists of moving the selection proof generation logic in
a different function which is properly executed on each slot.
Other changes:
* The logtrace tool has been enhanced with a framework for adding
new simpler log aggregation and analysis algorithms.
The default CI testnet simulation will now ensure that the blocks
in the network have reasonable sync committee participation.
Adds a "Slot start" log to the LC that behaves similar to BN to inform
the user that the light client is doing something, and to indicate the
latest view of the network (finalized / optimistic).
* implement several capellaImplementationMissing points
* don't register validator activity for not-active validators
* don't check validator indices already coming out of committees which exist; must be active validators, or else other deeper bugs
* Initial commit.
* NextAttestationEntry type.
* Add doppelgangerCheck and actual check.
* Recover deleted check.
* Remove NextAttestainEntry changes.
* More cleanups for NextAttestationEntry.
* Address review comments.
* Remove GENESIS_EPOCH specific check branch.
* Decrease number of full epochs for doppelganger check in VC.
Co-authored-by: zah <zahary@status.im>
The various `PeerScore` constants are used for both beacon blocks and
LC objects, and will likely also find use for EIP4844 blob sidecars.
Renaming them to use more generically applicable names not referring
to blocks explicitly aymore.
We currently use `BlockError` for both beacon blocks and LC objects.
In light of EIP4844, we will likely also use it for blob sidecars.
To avoid confusion, renaming it to a more generic `VerifierError`,
and update its documentation to be more generic.
To avoid long lines as a followup, also renaming the `block_processor`'s
`BlockProcessingCompleted.completed`->`ProcessingStatus.completed` and
`BlockProcessingCompleted.notCompleted`->`ProcessingStatus.notCompleted`
This PR removes a bunch of code to make TNS aware of era files, avoiding
a duplicated backfill when era files are available.
* reuse chaindag for loading backfill state, replacing the TNS homebrew
* fix era block iteration to skip empty slots
* add tests for `can_advance_slots`
Without this, tools like `ncli_db` require command line parameter to
process mainnet blocks, which is a regression since the merge - this is
a subset of #4147.
When the EL/Builder fails to produce an execution payload, we fall back
to an empty `ExecutionPayload`. Even though it contains no transactions
it should refer to the configured fee recipient. This is useful for
privacy reasons (do not reveal the reason for the empty payload) and for
compliance with additional fee recipient rules by staking pools.
* move duty tracking code to `ActionTracker`
* fix earlier duties overwriting later ones
* re-run subnet selection when new duty appears
* log upcoming duties as soon as they're known (vs 4 epochs before)
To further tighten Nimbus against spam, this PR introduces a global
quota for block requests (shared between peers) as well as a general
per-peer request limit that applies to all libp2p requests.
* apply request quota before decoding message
* for high-bandwidth requests (blocks), apply a shared global quota
which helps manage bandwidth for high-peer setups
* add metrics
Currently, we require genesis and a checkpoint block and state to start
from an arbitrary slot - this PR relaxes this requirement so that we can
start with a state alone.
The current trusted-node-sync algorithm works by first downloading
blocks until we find an epoch aligned non-empty slot, then downloads the
state via slot.
However, current
[proposals](https://github.com/ethereum/beacon-APIs/pull/226) for
checkpointing prefer finalized state as
the main reference - this allows more simple access control and caching
on the server side - in particular, this should help checkpoint-syncing
from sources that have a fast `finalized` state download (like infura
and teku) but are slow when accessing state via slot.
Earlier versions of Nimbus will not be able to read databases created
without a checkpoint block and genesis. In most cases, backfilling makes
the database compatible except where genesis is also missing (custom
networks).
* backfill checkpoint block from libp2p instead of checkpoint source,
when doing trusted node sync
* allow starting the client without genesis / checkpoint block
* perform epoch start slot lookahead when loading tail state, so as to
deal with the case where the epoch start slot does not have a block
* replace `--blockId` with `--state-id` in TNS command line
* when replaying, also look at the parent of the last-known-block (even
if we don't have the parent block data, we can still replay from a
"parent" state) - in particular, this clears the way for implementing
state pruning
* deprecate `--finalized-checkpoint-block` option (no longer needed)
* cap maximum number of chunks to download from peer (fixes#1620)
* drop support for requesting blocks via v1 / phase0 protocol
* tighten bounds checking of fixed-size messages
In optimistic mode, Nimbus will sync optimistically even when the
execution client is offline / not available.
An optimistic node is less secure because it has not validated block
transactions via the execution client and can thus not be used for
validation duties.
* Allow chain dag without genesis / block
This PR enables the initialization of the dag without access to blocks
or genesis state - it is a prerequisite for implementing a number of
interesting features:
* checkpoint sync without any block download
* pruning of blocks and states
* backfill checkpoint block
* Fix doppelganger protection reorders validator indices in response issue.
* Add chronos metrics endpoint to nimbus REST API.
* Doppelganger protection now works on duties not on attestations.
Improve logging for doppelganger and indices.
* Improve doppelganger and indices logging.
* Add number of validators to logs.
* Move logging dumps from `debug` to `trace` level.
The LC REST API has been merged into the ethereum/beacon-APIs specs:
- https://github.com/ethereum/beacon-APIs/pull/247
Update URLs to v1 and update REST tests. Note that REST tests do not
start with Altair, so the tested BN will return empty / error responses.
Implements the latest proposal for providing LC data via REST, as of
https://github.com/ethereum/beacon-APIs/pull/247 with a v0 suffix.
Requests:
- `/eth/v0/beacon/light_client/bootstrap/{block_root}`
- `/eth/v0/beacon/light_client/updates?start_period={start_period}&count={count}`
- `/eth/v0/beacon/light_client/finality_update`
- `/eth/v0/beacon/light_client/optimistic_update`
HTTP Server-Sent Events (SSE):
- `light_client_finality_update_v0`
- `light_client_optimistic_update_v0`
Allow config of deployment phase via config instead of attempting to
derive from genesis content (when running relevant testnets), so that
we don't have to keep maintaining the list inside the binary.
The LC REST API has been merged into the ethereum/beacon-APIs specs:
- https://github.com/ethereum/beacon-APIs/pull/247
Update URLs to v1 and update REST tests. Note that REST tests do not
start with Altair, so the tested BN will return empty / error responses.
Allow config of deployment phase via config instead of attempting to
derive from genesis content (when running relevant testnets), so that
we don't have to keep maintaining the list inside the binary.
Implements the latest proposal for providing LC data via REST, as of
https://github.com/ethereum/beacon-APIs/pull/247 with a v0 suffix.
Requests:
- `/eth/v0/beacon/light_client/bootstrap/{block_root}`
- `/eth/v0/beacon/light_client/updates?start_period={start_period}&count={count}`
- `/eth/v0/beacon/light_client/finality_update`
- `/eth/v0/beacon/light_client/optimistic_update`
HTTP Server-Sent Events (SSE):
- `light_client_finality_update_v0`
- `light_client_optimistic_update_v0`
For JSON responses, "eth-consensus-version" header is handled in
`eth2_rest_serialization` for states and `rest_beacon_api` for blocks.
Align them to also be handled in `eth2_rest_serialization` for blocks.
The `ContentNotAcceptableError` is triggered when client either requests
an unsupported media type, or has form errors such as sending multiples.
Updating the description to also indicate non-supported Accept headers.
When EL `newPayload` is slow (e.g., Raspberry Pi with Besu), the epoch
and shuffling caches tend to fill up with multiple copies per epoch when
processing gossip and performing validator duties close to wall slot.
The old strategy of evicting oldest epoch led to the same item being
evicted over and over, leading to blocking of over 5 minutes in extreme
cases where alternate epochs/shuffling got loaded repeatedly.
Changing the cache eviction strategy to least-recently-used seems to
improve the situation drastically. A simple implementation was selected
based on single linked-list without a hashtable.
When backfilling LC updates (`--light-client-data-import-mode=full`),
the highest participation update is computed without ensuring that the
finalized header is in the same period. Updates sharing same period for
both finalized and attested headers should be preferred.
Fixes a bug leading to suboptimal update selection.
* avoid database race-condition inconsistency after fcU `INVALID` then crash
* ensure head doesn't fall behind finalized; add more tests for head movement/reloading DAG
In `eth2_rest_serialization` there was a `dump` function for
`KeystoresAndSlashingProtection` that does not seem to be used.
Removes that unused function.
For pre-encoded JSON REST responses we have `jsonResponsePlain`.
Adds a `sszResponsePlain` function to serve similar purpose for SSZ.
This avoids caller having to explicitly specify Http200 and media type.
When the BN's head is reorged while shut down, reloading the BN will not
assign `BlockRef` to alternate branches. However, blocks from other
branches are still present in the database, leading to their descendants
incorrectly marked as `UnviableFork`. By restricting the check to blocks
that have been finalized, they should be reported as `MissingParent`
instead, eventually re-assigning a `BlockRef` to them.
The `eth1_monitor` check to require engine API from bellatrix onward
has issues in setups where the EL and CL are started simultaneously
because the EL may not be ready to answer requests by the time that the
check is performed. This can be observed, e.g., on Raspberry Pi 4 when
using Besu as the EL client. Now that the merge transition happened, the
check is also not that useful anymore, as users have other ways to know
that their setup is not working correctly (e.g., repeated exchange logs)