This PR renames the existing `validator_duties` to `beacon_validators`
and in doing so, names validators running inside the beacon node process
"beacon validators" while those running the VC can be referred to as
"client validators" to disambiguate the two.
The existing `validator_duties` instead takes on a new responsibility:
as a home for logic shared between beacon and client validators - ie
code that provides consistency in implementation and behavior between
the two modes of operation.
Not only does this simplify reasoning about where to put code -it also
reduces the number of dependencies the validator client has from ~5000
to ~3000 modules (!) according to `nim genDepend` significantly reducing
compile times.
When a block is introduced to the system both via REST and gossip at the
same time, we will call `storeBlock` from two locations leading to a
dupliace check race condition as we wait for the EL.
This issue may manifest in particular when using an external block
builder that itself publishes the block onto the gossip network.
* refactor enqueue flow
* simplify calling `addBlock`
* complete request manager verifier future for blobless blocks
* re-verify parent conditions before adding block
among other things, it might have gone stale or finalized between one
call and the other
We currently call `onSlotEnd` whenever all in-BN validator duties are
completed. VC validator duties are not awaited. When `onSlotEnd` is
processed close to the slot start, a VC may therefore miss duties.
Adding a delay before `onSlotEnd` improves this situation.
The logic can be optimized further if `ActionTracker` would track
`knownValidators` from REST separately from in-process ones.
When the requestmanager is busy fetching blocks, the queue might get
filled with multiple entries of the same root - since there is no
deduplication, requests containing the same root multiple times will be
sent out.
Also, because the items sit in the queue for a long time potentially,
the request might be stale by the time that the manager is ready with
the previous request.
This PR removes the queue and directly fetches the blocks to download
from the quarantine which solves both problems (the quarantine already
de-duplicates and is clean of stale information).
Removing the queue for blobs is left for a future PR.
Co-authored-by: tersec <tersec@users.noreply.github.com>
* Initial commit.
* Add algorithm in comment.
Remove delays.
Fix logging statement issues.
Change update from epoch to slot.
* Obtain timestamp earlier.
* Add processing delays into algorithm.
* Fix time offset logging to produce integers instead of strings.
* Address review comments.
* Fix copyright year.
Fix updateStatus().
* Remove fields from Slot start log statement.
Fix issues when BN do not support Nimbus Extensions.
Rename metric name and type change.
* Add beacon role to disable time offset check manually.
We have several modules that import `nim-eth` for the sole purpose of
its `keys.newRng` function. This function is meanwhile a simple wrapper
around `nim-bearssl`'s `HmacDrbgContext.new()`, so the import doesn't
really serve a use anymore. Replace `keys.newRng` with the direct call
to reduce `nim-eth` imports.
Since #3976, CORS functionality is broken. Fix it to work again:
- Use `--rest-allowed-origin` instead of `--keymanager-allowed-origin`
to specify CORS `Access-Control-Allow-Origin` header for beacon-APIs.
- Actually pass CORS config to `nim-presto` once more.
* replace optimisticRoots table with field in BlockRef
* copyright year
* mark finalized blocks as verified on load
* Update beacon_chain/consensus_object_pools/block_dag.nim
Co-authored-by: Etan Kissling <etan@status.im>
* expand non-optimistic block checking to all pre-merge blocks; refactor markBlockVerified to use BlockRef rather than block root and remove superfluous caller in newPayload path replaced by addResolvedHeadBlock BlockRef construction
* don't treat finalized block specially; VALID status is sticky
---------
Co-authored-by: Etan Kissling <etan@status.im>
When doing sync for blocks older than
MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS, we skip the blobs by range
request, but we then pass en empty blob sequence to
validation, which then fails.
To fix this: Use an Option[Blobsidecars] to allow expressing the
distinction between "empty blob sequence" and "blobs unavailable". Use
the latter for "old" blocks, and don't attempt to run blob validation.
`SyncCommitteeMsgPool` grouped messages by their `beacon_block_root`.
This is problematic around sync committee period boundaries and forks.
Around sync committee period boundaries, members from both the current
and next sync committee may sign the same `beacon_block_root`; mixing
the signatures from both committees together is a mistake. Likewise,
around fork transitions, the `signing_root` changes, so those messages
also need to be segregated.
* Incremental pruning
When turning on pruning the first time the current pruning algorithm
will prune the full database at startup. This delays restart
unnecessarily, since all of the pruned space is not needed at once.
This PR introduces incremental pruning such that we will never prune
more than 32 blocks or the sync speed, whichever is higher.
This mode is expected to become default in a follow-up release.
* Kzg: Load trusted setup
* scripts/launch_local_testnet.sh: set FIELD_ELEMENTS_PER_BLOB
* Use right setup file for mainnet/minimal
* Force rebuild
* Add comment explaining why build with -f
Post-Deneb, when the request manager receives a missing block from a
peer, it needs to check if the corresponding blobs are available, and
if so pass them along. If they aren't available, the newly-fetched
block must be put in blobless quarantine (while the blobs are
retrieved, coming in next commit).
The consensus-spec-tests already cover the scenarios of our custom test
runner, so the custom tests can be removed. Also cleans up unused config
flags and related unreachable logic.
* allow trusted node sync based on LC trusted block root
Extends `trustedNodeSync` with a new `--trusted-block-root` option that
allows initializing a light client. No `--state-id` must be provided.
The beacon node will then use this light client to obtain the latest
finalized state from the remote server in a trust-minimized fashion.
Note that the provided `--trusted-block-root` should be somewhat recent,
and that security precautions such as comparing the state root against
block explorers is still recommended.
* fix
* workaround for `valueOr` limitations
* reduce magic numbers
* digest len > context len for readability
* move `cstring` conversion to caller
* avoid abbreviations
* `return` codestyle
Just the variable, not yet `lcDataForkAtStateFork` / `atStateFork`.
- Shorten comment in `light_client.nim` to keep line width
- Do not rename `stateFork` mention in `runProposalForkchoiceUpdated`.
- Do not rename `stateFork` in `getStateField(dag.headState, fork)`
Rest is just a mechanical mass replace
* Update sync to use post-decoupling RPCs
blob_sidecars_by_range returns a flat list of sidecars, which must
then be grouped per-slot.
* Add test for groupBlobs
* createBlobs: convert proc to func
* Support for driving multiple EL nodes from a single Nimbus BN
Full list of changes:
* Eth1Monitor has been renamed to ELManager to match its current
responsibilities better.
* The ELManager is no longer optional in the code (it won't have
a nil value under any circumstances).
* The support for subscribing for headers was removed as it only
worked with WebSockets and contributed significant complexity
while bringing only a very minor advantage.
* The `--web3-url` parameter has been deprecated in favor of a
new `--el` parameter. The new parameter has a reasonable default
value and supports specifying a different JWT for each connection.
Each connection can also be configured with a different set of
responsibilities (e.g. download deposits, validate blocks and/or
produce blocks). On the command-line, these properties can be
configured through URL properties stored in the #anchor part of
the URL. In TOML files, they come with a very natural syntax
(althrough the URL scheme is also supported).
* The previously scattered EL-related state and logic is now moved
to `eth1_monitor.nim` (this module will be renamed to `el_manager.nim`
in a follow-up commit). State is assigned properly either to the
`ELManager` or the to individual `ELConnection` objects where
appropriate.
The ELManager executes all Engine API requests against all attached
EL nodes, in parallel. It compares their results and if there is a
disagreement regarding the validity of a certain payload, this is
detected and the beacon node is protected from publishing a block
with a potential execution layer consensus bug in it.
The BN provides metrics per EL node for the number of successful or
failed requests for each type Engine API requests. If an EL node
goes offline and connectivity is resoted later, we report the
problem and the remedy in edge-triggered fashion.
* More progress towards implementing Deneb block production in the VC
and comparing the value of blocks produced by the EL and the builder
API.
* Adds a Makefile target for the zhejiang testnet
* Remove use of beacon_block_and_blobs_sidecar topic
This topic goes away with decoupled blocks and blobs.
* remove use of getBeaconBlockAndBlobsSidecarTopic from test
* update nimbus_light_client.nim
This commit removes ForkySignedBeaconBlockMaybeBlobs and all
references. I tried to pull that thread only as little as was needed
to get rid of it. Left a placeholder BlobSidecar array (in lieu of
Opt[BlobsSidecar]) in a few places; this will be used as we rebuild
the decoupled implementation.
* Local sim impovements
* Added support for running Capella and EIP-4844 simulations
by downloading the correct version of Geth.
* Added support for using Nimbus remote signer and Web3Signer.
Use 2 out of 3 threshold signing configuration in the mainnet
configuration and regular remote signing in the minimal one.
* The local testnet simulation can now use a payload builder.
This is currently not activated in CI due to lack of automated
procedures for installing third-party relays or builders.
You are adviced to use mergemock for now, but for most realistic
results, we can create a simple builder based on the nimbus-eth1
codebase that will be able to propose transactions from the regular
network mempool.
* Start the simulation from a merged state. This would allow us
to start removing pre-merge functionality such as the gossip
subsciption logic. The commit also removes the merge-forcing
hack installed after the TTD removal.
* Consolidate all the tools used in the local simulation into a
single `ncli_testnet` binary.
* restore doppelganger check on connectivity loss
https://github.com/status-im/nimbus-eth2/pull/4398 introduced a
regression in functionality where doppelganger detection would not be
rerun during connectivity loss. This PR reintroduces this check and
makes some adjustments to the implementation to simplify the code flow
for both BN and VC.
* track when check was last performed for each validator (to deal with
late-added validators)
* track when we performed a doppel-detectable activity (attesting) so as
to avoid false positives
* remove nodeStart special case (this should be treated the same as
adding a validator dynamically just after startup)
* allow sync committee duties in doppelganger period
* don't trigger doppelganger when registering duties
* fix crash when expected index response is missing
* fix missing slashingSafe propagation
Other changes:
Renamed the `EIP_4844_FORK_*` config constants to `DENEB_FORK_*` as
this matches the latest spec and it's already used in the official
Sepolia config.
While syncing the finalized portion of the chain, the execution client
cannot efficiently sync and most of the time returns `SYNCING` - in this
PR, we use CL-verified optmistic sync as long as the block is claimed to
be finalized, only occasionally updating the EL with progress.
Although a peer might lie about what is finalized and what isn't,
eventually we'll call the execution client - thus, all a dishonest
client can do is delay execution verification slightly. Gossip blocks in
particular are never assumed to be finalized.
Extends fork choice state to also track slot numbers to improve accuracy
of `/eth/v1/debug/fork_choice` endpoint. Autoenable this API on devnet,
and disable some extra checks on devnet to aid focused testing efforts.
Align fork choice pruning logic with API based on checkpoints vs root.
* exit/validatorchange pool includes BLS to execution messages; REST
support for new pool
* catch failed individual futures
* increase BLS changes bound and keep BLS seen consistent with subpool
* deque capacities should be powers of 2
* Refactor block/blobs types
Use type system to enforce invariant that a pre-4844 block cannot have
a sidecar.
* Update beacon_chain/nimbus_beacon_node.nim
Co-authored-by: tersec <tersec@users.noreply.github.com>
* review feedback
Co-authored-by: tersec <tersec@users.noreply.github.com>
By enabling the validator monitor, more precise information about the
lifecycle of an attestation is logged at the higher `NOTICE` log level
while current `sent` messages are logged at `INF` instead, since they
are less interesting.
In particular, missed attestations and those that vote for the wrong
head are now detected and logged at NOTICE.
In addition to logging, this feature enables rich metrics around
attestation and sync committee performance - by default, validators are
tracked in aggregate but a detailed mode exists as well
This feature has been available since early Nimbus days, but it has now
been tuned and optimised such that it is safe to enable by default, even
for large setups.
* enable automatic validator monitoring by default
* replace `--validator-monitor-totals` flag with
`--validator-monitor-details` - the detailed mode is disabled by default
* lower "sent" log level to `INF` for several messages - in particular
those that are traced by the validator monitor
This is a retake on #3531 which was later reverted in #3578.
Distinguish between those code locations that need to be updated on each
light client data format change, and those others that should generally
be fine, as long as a valid light client object is processed.
The former are tagged with static assert for `LightClientDataFork.high`.
The latter are changed to `lcDataFork > LightClientDataFork.None` to
indicate that they depend only on presence of any valid object.
Also bundled a few minor cleanups and fixes.
Also add `Forky` type for `LightClientStore` and minor fixes / cleanups.
In a future fork, light client data will be extended with execution info
to support more use cases. To anticipate such an upgrade, introduce
`Forky` and `Forked` types, and ready the database schema.
Because the mapping of sync committee periods to fork versions is not
necessarily unique (fork schedule not in sync with period boundaries),
an additional column is added to `period` -> `LightClientUpdate` table.
Introduce (optional) pruning of historical data - a pruned node will
continue to answer queries for historical data up to
`MIN_EPOCHS_FOR_BLOCK_REQUESTS` epochs, or roughly 5 months, capping
typical database usage at around 60-70gb.
To enable pruning, add `--history=prune` to the command line - on the
first start, old data will be cleared (which may take a while) - after
that, data is pruned continuously.
When pruning an existing database, the database will not shrink -
instead, the freed space is recycled as the node continues to run - to
free up space, perform a trusted node sync with a fresh database.
When switching on archive mode in a pruned node, history is retained
from that point onwards.
History pruning is scheduled to be enabled by default in a future
release.
In this PR, `minimal` mode from #4419 is not implemented meaning
retention periods for states and blocks are always the same - depending
on user demand, a future PR may implement `minimal` as well.
With https://github.com/status-im/nimbus-eth2/pull/4420 implemented, the
checks that we perform are equivalent to those of a `SYNCING` EL - as
such, we can treat missing EL the same as SYNCING and proceed with an
optimistic sync.
This mode of operation significantly speeds up recovery after an offline
EL event because the CL is already synced and can immediately inform the
EL of the latest head.
It also allows using a beacon node for consensus archival queries
without an execution client.
* deprecate `--optimistic` flag
* log block details on EL error, soften log level because we can now
continue to operate
* `UnviableFork` -> `Invalid` when block hash verification fails -
failed hash verification is not a fork-related block issue
When backfilling, we only need to download blocks that are newer than
MIN_EPOCHS_FOR_BLOCK_REQUESTS - the rest cannot reliably be fetched from
the network and does not have to be provided to others.
This change affects only trusted-node-synced clients - genesis sync
continues to work as before (because it needs to construct a state by
building it from genesis).
Those wishing to complete a backfill should do so with era files
instead.
Trigger ANSI art on upgrade to Capella, similar to the merge.
Future extension could log blinking art when user successfully managed
to get BLS to Execution change included into a block for a validator.
Art created by http://beatscribe.com/ (beatscribe#1008 on Discord)
* fix REST liveness endpoint responding even when gossip is not enabled
* fix VC exit code on doppelganger hit
* fix activation epoch not being updated correctly on long deposit
queues
* fix activation epoch being set incorrectly when updating validator
* move most implementation logic to `validator_pool`, add tests
* ensure consistent logging between VC and BN
* add docs
Other changes:
* More optimal search for TTD block.
* Add timeouts to all REST requests during trusted node sync.
Fixes#4037
* Removed support for storing a deposit snapshot in the network
metadata.
* Initial commit.
* NextAttestationEntry type.
* Add doppelgangerCheck and actual check.
* Recover deleted check.
* Remove NextAttestainEntry changes.
* More cleanups for NextAttestationEntry.
* Address review comments.
* Remove GENESIS_EPOCH specific check branch.
* Decrease number of full epochs for doppelganger check in VC.
Co-authored-by: zah <zahary@status.im>
We currently use `BlockError` for both beacon blocks and LC objects.
In light of EIP4844, we will likely also use it for blob sidecars.
To avoid confusion, renaming it to a more generic `VerifierError`,
and update its documentation to be more generic.
To avoid long lines as a followup, also renaming the `block_processor`'s
`BlockProcessingCompleted.completed`->`ProcessingStatus.completed` and
`BlockProcessingCompleted.notCompleted`->`ProcessingStatus.notCompleted`
This PR removes a bunch of code to make TNS aware of era files, avoiding
a duplicated backfill when era files are available.
* reuse chaindag for loading backfill state, replacing the TNS homebrew
* fix era block iteration to skip empty slots
* add tests for `can_advance_slots`
* move duty tracking code to `ActionTracker`
* fix earlier duties overwriting later ones
* re-run subnet selection when new duty appears
* log upcoming duties as soon as they're known (vs 4 epochs before)
Currently, we require genesis and a checkpoint block and state to start
from an arbitrary slot - this PR relaxes this requirement so that we can
start with a state alone.
The current trusted-node-sync algorithm works by first downloading
blocks until we find an epoch aligned non-empty slot, then downloads the
state via slot.
However, current
[proposals](https://github.com/ethereum/beacon-APIs/pull/226) for
checkpointing prefer finalized state as
the main reference - this allows more simple access control and caching
on the server side - in particular, this should help checkpoint-syncing
from sources that have a fast `finalized` state download (like infura
and teku) but are slow when accessing state via slot.
Earlier versions of Nimbus will not be able to read databases created
without a checkpoint block and genesis. In most cases, backfilling makes
the database compatible except where genesis is also missing (custom
networks).
* backfill checkpoint block from libp2p instead of checkpoint source,
when doing trusted node sync
* allow starting the client without genesis / checkpoint block
* perform epoch start slot lookahead when loading tail state, so as to
deal with the case where the epoch start slot does not have a block
* replace `--blockId` with `--state-id` in TNS command line
* when replaying, also look at the parent of the last-known-block (even
if we don't have the parent block data, we can still replay from a
"parent" state) - in particular, this clears the way for implementing
state pruning
* deprecate `--finalized-checkpoint-block` option (no longer needed)
In optimistic mode, Nimbus will sync optimistically even when the
execution client is offline / not available.
An optimistic node is less secure because it has not validated block
transactions via the execution client and can thus not be used for
validation duties.
* Allow chain dag without genesis / block
This PR enables the initialization of the dag without access to blocks
or genesis state - it is a prerequisite for implementing a number of
interesting features:
* checkpoint sync without any block download
* pruning of blocks and states
* backfill checkpoint block
Allow config of deployment phase via config instead of attempting to
derive from genesis content (when running relevant testnets), so that
we don't have to keep maintaining the list inside the binary.
The `eth1_monitor` check to require engine API from bellatrix onward
has issues in setups where the EL and CL are started simultaneously
because the EL may not be ready to answer requests by the time that the
check is performed. This can be observed, e.g., on Raspberry Pi 4 when
using Besu as the EL client. Now that the merge transition happened, the
check is also not that useful anymore, as users have other ways to know
that their setup is not working correctly (e.g., repeated exchange logs)
Since these files may have been created in a previous run or manually,
we want to keep loading them even on nodes that don't enable the
keystore API (for example static setups)
Other changes:
* log keystore loading progressively (#3699)
* print initial fee recipient when loading validators
* log dynamic fee recipient updates
* more efficient forkchoiceUpdated usage
* await rather than asyncSpawn; ensure head update before dag.updateHead
* use action tracker rather than attached validators to check for next slot proposal; use wall slot + 1 rather than state slot + 1 to correctly check when missing blocks
* re-add two-fcU case for when newPayload not VALID
* check dynamicFeeRecipientsStore for potential proposal
* remove duplicate checks for whether next proposer
When the BN-embedded LC makes sync progress, pass the corresponding
execution block hash to the EL via `engine_forkchoiceUpdatedV1`.
This allows the EL to sync to wall slot while the chain DAG is behind.
Renamed `--light-client` to `--sync-light-client` for clarity, and
`--light-client-trusted-block-root` to `--trusted-block-root` for
consistency with `nimbus_light_client`.
Note that this does not work well in practice at this time:
- Geth sticks to the optimistic sync:
"Ignoring payload while snap syncing" (when passing the LC head)
"Forkchoice requested unknown head" (when updating to LC head)
- Nethermind syncs to LC head but does not report ancestors as VALID,
so the main forward sync is still stuck in optimistic mode:
"Pre-pivot block, ignored and returned Syncing"
To aid EL client teams in fixing those issues, having this available
as a hidden option is still useful.