* restore doppelganger check on connectivity loss
https://github.com/status-im/nimbus-eth2/pull/4398 introduced a
regression in functionality where doppelganger detection would not be
rerun during connectivity loss. This PR reintroduces this check and
makes some adjustments to the implementation to simplify the code flow
for both BN and VC.
* track when check was last performed for each validator (to deal with
late-added validators)
* track when we performed a doppel-detectable activity (attesting) so as
to avoid false positives
* remove nodeStart special case (this should be treated the same as
adding a validator dynamically just after startup)
* allow sync committee duties in doppelganger period
* don't trigger doppelganger when registering duties
* fix crash when expected index response is missing
* fix missing slashingSafe propagation
Other changes:
Renamed the `EIP_4844_FORK_*` config constants to `DENEB_FORK_*` as
this matches the latest spec and it's already used in the official
Sepolia config.
We do a linear scan of all pubkeys for each validator and slot - this
becomes expensive with large validator counts.
* normalise BN/VC validator startup logging
* fix crash when host cannot be resolved while adding remote validator
* silence repeated log spam for unknown validators
* print pubkey/index/activation mapping on startup/validator
identification
While syncing the finalized portion of the chain, the execution client
cannot efficiently sync and most of the time returns `SYNCING` - in this
PR, we use CL-verified optmistic sync as long as the block is claimed to
be finalized, only occasionally updating the EL with progress.
Although a peer might lie about what is finalized and what isn't,
eventually we'll call the execution client - thus, all a dishonest
client can do is delay execution verification slightly. Gossip blocks in
particular are never assumed to be finalized.
Extends fork choice state to also track slot numbers to improve accuracy
of `/eth/v1/debug/fork_choice` endpoint. Autoenable this API on devnet,
and disable some extra checks on devnet to aid focused testing efforts.
Align fork choice pruning logic with API based on checkpoints vs root.
* eip4844 beacon block proposals
* Don't fetch blobs under minimal preset
@tersec's summary of the issue:
BlobsBundleV1 in the execution API spec assumes a mainnet preset blob
size, where the EIP4844 consensus spec defines
FIELD_ELEMENTS_PER_BLOB: 4 under the minimal preset, which leads to a
Blob having a length of 4 * 32, not 4096 * 32 which BlobsBundleV1
requires.
* Revert unintentional script change
* exit/validatorchange pool includes BLS to execution messages; REST
support for new pool
* catch failed individual futures
* increase BLS changes bound and keep BLS seen consistent with subpool
* deque capacities should be powers of 2
When running `nimbus_light_client`, we persist the latest header from
`LightClientStore.finalized_header` in a database across restarts.
Because the data format is derived from the latest `LightClientStore`,
this could lead to data being persisted in pre-release formats.
To enable us to test later `LightClientStore` versions on devnets,
transition to a `ForkedLightClientStore` internally that is only
migrated to newer forks on-demand (instead of starting at latest).
Distinguish between those code locations that need to be updated on each
light client data format change, and those others that should generally
be fine, as long as a valid light client object is processed.
The former are tagged with static assert for `LightClientDataFork.high`.
The latter are changed to `lcDataFork > LightClientDataFork.None` to
indicate that they depend only on presence of any valid object.
Also bundled a few minor cleanups and fixes.
Also add `Forky` type for `LightClientStore` and minor fixes / cleanups.
The light client data structures were changed to accommodate additional
fields in future forks (e.g., to also hold execution data).
There is a minor change to the JSON serialization, where the `header`
properties are now nested inside a `LightClientHeader`.
The SSZ serialization remains compatible.
See https://github.com/ethereum/consensus-specs/pull/3190
and https://github.com/ethereum/beacon-APIs/pull/287
In a future fork, light client data will be extended with execution info
to support more use cases. To anticipate such an upgrade, introduce
`Forky` and `Forked` types, and ready the database schema.
Because the mapping of sync committee periods to fork versions is not
necessarily unique (fork schedule not in sync with period boundaries),
an additional column is added to `period` -> `LightClientUpdate` table.
* correctly report ignored contributions in metrics
* avoid counting subset contributions in vmon (bring in line with
attestation aggregates)
* avoid signature checks for subset attestations
A being a non-strict subset is a sufficient condition to ignore.
Introduce (optional) pruning of historical data - a pruned node will
continue to answer queries for historical data up to
`MIN_EPOCHS_FOR_BLOCK_REQUESTS` epochs, or roughly 5 months, capping
typical database usage at around 60-70gb.
To enable pruning, add `--history=prune` to the command line - on the
first start, old data will be cleared (which may take a while) - after
that, data is pruned continuously.
When pruning an existing database, the database will not shrink -
instead, the freed space is recycled as the node continues to run - to
free up space, perform a trusted node sync with a fresh database.
When switching on archive mode in a pruned node, history is retained
from that point onwards.
History pruning is scheduled to be enabled by default in a future
release.
In this PR, `minimal` mode from #4419 is not implemented meaning
retention periods for states and blocks are always the same - depending
on user demand, a future PR may implement `minimal` as well.
* consolidate consensus spec transition test fixtures
* include capella
* consoliate fork test fixtures
* note change in EIP-4844 process_block in alpha.2
* fix REST liveness endpoint responding even when gossip is not enabled
* fix VC exit code on doppelganger hit
* fix activation epoch not being updated correctly on long deposit
queues
* fix activation epoch being set incorrectly when updating validator
* move most implementation logic to `validator_pool`, add tests
* ensure consistent logging between VC and BN
* add docs
Other changes:
* More optimal search for TTD block.
* Add timeouts to all REST requests during trusted node sync.
Fixes#4037
* Removed support for storing a deposit snapshot in the network
metadata.
* Types and scaffolding for EIP-4844
This commit adds the EIP-4844 spec types, and fills in
scaffolding/boilerplate for the use of these types across the repo.
None of the actual EIP-4844 logic is introduced yet.
This follows the pattern used by @tersec when introducing Capella (#4276).
* use eth2-networks fork
* review feedback: add static check EIP4844_FORK_EPOCH == FAR_FUTURE_EPOCH
* review feedback: remove EIP4844 from /eth/v1/config/spec response
* Cleanup / review feedback
* Fix REST test
In Jenkins CI we run two instances of unit tests concurrently.
This can trigger CI failure when the same port numbers are re-used
by the different test instances. Fixed one more issue of this by
allowing user configuration of the base port number.
* Initial commit.
* NextAttestationEntry type.
* Add doppelgangerCheck and actual check.
* Recover deleted check.
* Remove NextAttestainEntry changes.
* More cleanups for NextAttestationEntry.
* Address review comments.
* Remove GENESIS_EPOCH specific check branch.
* Decrease number of full epochs for doppelganger check in VC.
Co-authored-by: zah <zahary@status.im>
The various `PeerScore` constants are used for both beacon blocks and
LC objects, and will likely also find use for EIP4844 blob sidecars.
Renaming them to use more generically applicable names not referring
to blocks explicitly aymore.
We currently use `BlockError` for both beacon blocks and LC objects.
In light of EIP4844, we will likely also use it for blob sidecars.
To avoid confusion, renaming it to a more generic `VerifierError`,
and update its documentation to be more generic.
To avoid long lines as a followup, also renaming the `block_processor`'s
`BlockProcessingCompleted.completed`->`ProcessingStatus.completed` and
`BlockProcessingCompleted.notCompleted`->`ProcessingStatus.notCompleted`
This PR removes a bunch of code to make TNS aware of era files, avoiding
a duplicated backfill when era files are available.
* reuse chaindag for loading backfill state, replacing the TNS homebrew
* fix era block iteration to skip empty slots
* add tests for `can_advance_slots`
When the EL/Builder fails to produce an execution payload, we fall back
to an empty `ExecutionPayload`. Even though it contains no transactions
it should refer to the configured fee recipient. This is useful for
privacy reasons (do not reveal the reason for the empty payload) and for
compliance with additional fee recipient rules by staking pools.
* move duty tracking code to `ActionTracker`
* fix earlier duties overwriting later ones
* re-run subnet selection when new duty appears
* log upcoming duties as soon as they're known (vs 4 epochs before)
Currently, we require genesis and a checkpoint block and state to start
from an arbitrary slot - this PR relaxes this requirement so that we can
start with a state alone.
The current trusted-node-sync algorithm works by first downloading
blocks until we find an epoch aligned non-empty slot, then downloads the
state via slot.
However, current
[proposals](https://github.com/ethereum/beacon-APIs/pull/226) for
checkpointing prefer finalized state as
the main reference - this allows more simple access control and caching
on the server side - in particular, this should help checkpoint-syncing
from sources that have a fast `finalized` state download (like infura
and teku) but are slow when accessing state via slot.
Earlier versions of Nimbus will not be able to read databases created
without a checkpoint block and genesis. In most cases, backfilling makes
the database compatible except where genesis is also missing (custom
networks).
* backfill checkpoint block from libp2p instead of checkpoint source,
when doing trusted node sync
* allow starting the client without genesis / checkpoint block
* perform epoch start slot lookahead when loading tail state, so as to
deal with the case where the epoch start slot does not have a block
* replace `--blockId` with `--state-id` in TNS command line
* when replaying, also look at the parent of the last-known-block (even
if we don't have the parent block data, we can still replay from a
"parent" state) - in particular, this clears the way for implementing
state pruning
* deprecate `--finalized-checkpoint-block` option (no longer needed)
* Allow chain dag without genesis / block
This PR enables the initialization of the dag without access to blocks
or genesis state - it is a prerequisite for implementing a number of
interesting features:
* checkpoint sync without any block download
* pruning of blocks and states
* backfill checkpoint block
When EL `newPayload` is slow (e.g., Raspberry Pi with Besu), the epoch
and shuffling caches tend to fill up with multiple copies per epoch when
processing gossip and performing validator duties close to wall slot.
The old strategy of evicting oldest epoch led to the same item being
evicted over and over, leading to blocking of over 5 minutes in extreme
cases where alternate epochs/shuffling got loaded repeatedly.
Changing the cache eviction strategy to least-recently-used seems to
improve the situation drastically. A simple implementation was selected
based on single linked-list without a hashtable.
* avoid database race-condition inconsistency after fcU `INVALID` then crash
* ensure head doesn't fall behind finalized; add more tests for head movement/reloading DAG
* detect mismatch of config and binary
When loading configuration that sets keys that Nimbus bakes into the
binary at compile-time, raise an error if the config is incompatible
instead of ignoring the conflicting value.
Since these files may have been created in a previous run or manually,
we want to keep loading them even on nodes that don't enable the
keystore API (for example static setups)
Other changes:
* log keystore loading progressively (#3699)
* print initial fee recipient when loading validators
* log dynamic fee recipient updates
When the sync queue processes results for a blocks by range request,
and the requested range contained some slots that are already finalized,
`BlockError.MissingParent` currently leads to `PeerScoreBadBlocks` even
when the error occurs on a non-finalized slot in the requested range.
This patch changes the scoring in that case to `PeerScoreMissingBlocks`
for consistency with range requests solely covering non-finalized slots,
and, likewise, rewinds the sync queue to the next `rewindSlot`.
* more efficient forkchoiceUpdated usage
* await rather than asyncSpawn; ensure head update before dag.updateHead
* use action tracker rather than attached validators to check for next slot proposal; use wall slot + 1 rather than state slot + 1 to correctly check when missing blocks
* re-add two-fcU case for when newPayload not VALID
* check dynamicFeeRecipientsStore for potential proposal
* remove duplicate checks for whether next proposer
`news` has a few open issues that are not present in `nim-websock`:
1. There is a 1 second delay between each MB of sent data.
2. Cancelling an ongoing `send` makes the entire WebSocket unusable.
3. Control packets do not have priority over ongoing message frames.
Using `news`, there are quite a few of these messages in Geth:
```
Previously seen beacon client is offline. Please ensure it is
operational to follow the chain!
```
It may take quite some time to reconnect when this happens.
Using `nim-websock`, this message still occurs because `eth1_monitor`
reconnects the EL connection when no new blocks occurred for 5 minutes,
but reconnecting is quick and the message is rarer.
The optimistic sync spec was updated since the LC based optsync module
was introduced. It is no longer necessary to wait for the justified
checkpoint to have execution enabled; instead, any block is okay to be
optimistically imported to the EL client, as long as its parent block
has execution enabled. Complex syncing logic has been removed, and the
LC optsync module will now follow gossip directly, reducing the latency
when using this module. Note that because this is now based on gossip
instead of using sync manager / request manager, that individual blocks
may be missed. However, EL clients should recover from this by fetching
missing blocks themselves.
* Harden block proposal against expired slashings/exits
When a message is signed in a phase0 domain, it can no longer be
validated under bellatrix due to the correct fork no longer being
available in the `BeaconState`.
To ensure that all slashing/exits are still valid, in this PR we re-run
the checks in the state that we're proposing for, thus hardening against
both signatures and other changes in the state that might have
invalidated the message.
* fix same message added multiple times
in case of attestation slashing of multiple validators in one go
* support connecting to peers without bellatrix
Make discovery fork ID aware of scheduled Bellatrix fork to enable
connections to peers that don't have Bellatrix scheduled yet.
Without this, has peering issues with peers on older SW version.
* expand tests with compatibility checks
* more exhaustive compatibility checks
* Fixes a segfault during block production when the Keymanager API
is disabled. The Keymanager is now disabled on half of the local
testnet nodes to catch such problems in the future.
* Fixes multiple potential stalls from REST requests being done
without a timeout. From practice, we know that such requests
can hang forever if not cancelled with a timeout. At best,
this would be a resource leak, at worst, it may lead to a
full stall of the client and missed validator duties.
* Changes some Options usages to Opt (for easier use of valueOr)
* Keymanager API for the validator client
* Properly treat the 'description' field as optional when loading Keystores
* Spec-compliant serialization of the slashing data in Keymanager's DeleteKeys response ()
Fixes#3940Fixes#3964Closes#3884 by adding test
In order to avoid full replays when validating attestations hailing from
untaken forks, it's better to keep shufflings separate from `EpochRef`
and perform a lookahead on the shuffling when processing the block that
determines them.
This also helps performance in the case where REST clients are trying to
perform lookahead on attestation duties and decreases memory usage by
sharing shufflings between EpochRef instances of the same dependent
root.
Now that the 1.2.0-rc.2 spec contains the same `LightClientUpdate`
definition that Nimbus was already using before, the corresponding
SSZ test vectors can be re-enabled.
When fetching eth1 data and deposits for a new block proposal, the list
of deposits from previous eth1 data to the next one is fully loaded into
a `seq`. This can potentially be a very long list in active periods.
Changing this to an `iterator` saves memory by ensuring that the entire
list is no longer materialized; only the `DepositData` roots are needed.
* Use final `v1` version for light client protocols
* Unhide LC data collection options
* Default enable LC data serving
* rm unneeded import
* Connect to EL on startup
* Add docs for LC based EL sync