* log validator that triggers doppelganger
* move activity detection closer to where it's performed, record
aggregation as activity (since it's part of the liveness endpoint)
* vc: check for doppelgangers in epoch 0 also (so that activity in epoch
1 can happen)
* avoid some looping when processing activities
* Remove use of beacon_block_and_blobs_sidecar topic
This topic goes away with decoupled blocks and blobs.
* remove use of getBeaconBlockAndBlobsSidecarTopic from test
* update nimbus_light_client.nim
This commit removes ForkySignedBeaconBlockMaybeBlobs and all
references. I tried to pull that thread only as little as was needed
to get rid of it. Left a placeholder BlobSidecar array (in lieu of
Opt[BlobsSidecar]) in a few places; this will be used as we rebuild
the decoupled implementation.
* Local sim impovements
* Added support for running Capella and EIP-4844 simulations
by downloading the correct version of Geth.
* Added support for using Nimbus remote signer and Web3Signer.
Use 2 out of 3 threshold signing configuration in the mainnet
configuration and regular remote signing in the minimal one.
* The local testnet simulation can now use a payload builder.
This is currently not activated in CI due to lack of automated
procedures for installing third-party relays or builders.
You are adviced to use mergemock for now, but for most realistic
results, we can create a simple builder based on the nimbus-eth1
codebase that will be able to propose transactions from the regular
network mempool.
* Start the simulation from a merged state. This would allow us
to start removing pre-merge functionality such as the gossip
subsciption logic. The commit also removes the merge-forcing
hack installed after the TTD removal.
* Consolidate all the tools used in the local simulation into a
single `ncli_testnet` binary.
* Initial commit.
* Address review comments and recommendations.
* Fix too often `Execution client not in sync` messages in logs.
* Add failure reason for duties requests.
* Add more reasons to every place of ValidatorApiError.
* Address race condition issue.
* Remove `vc` argument for getFailureReason().
* restore doppelganger check on connectivity loss
https://github.com/status-im/nimbus-eth2/pull/4398 introduced a
regression in functionality where doppelganger detection would not be
rerun during connectivity loss. This PR reintroduces this check and
makes some adjustments to the implementation to simplify the code flow
for both BN and VC.
* track when check was last performed for each validator (to deal with
late-added validators)
* track when we performed a doppel-detectable activity (attesting) so as
to avoid false positives
* remove nodeStart special case (this should be treated the same as
adding a validator dynamically just after startup)
* allow sync committee duties in doppelganger period
* don't trigger doppelganger when registering duties
* fix crash when expected index response is missing
* fix missing slashingSafe propagation
Other changes:
Renamed the `EIP_4844_FORK_*` config constants to `DENEB_FORK_*` as
this matches the latest spec and it's already used in the official
Sepolia config.
* Fix getStateRoot() and getBlockRoot() API functions which should obtain `execution_optimistic` field.
Fix sync committee service to check `execution_optimistic` field of getBlockRoot() response.
* 2nd part.
* Remove presets usage.
We do a linear scan of all pubkeys for each validator and slot - this
becomes expensive with large validator counts.
* normalise BN/VC validator startup logging
* fix crash when host cannot be resolved while adding remote validator
* silence repeated log spam for unknown validators
* print pubkey/index/activation mapping on startup/validator
identification
By pre-seeding the sync committee cache when applying blocks, we avoid a
significantly expensive validator set traversal / sync committee index
construction during sync / block application - 20-30% sync speedup
post-altair.
* also cache/reload total active balance for another cool 10%
While syncing the finalized portion of the chain, the execution client
cannot efficiently sync and most of the time returns `SYNCING` - in this
PR, we use CL-verified optmistic sync as long as the block is claimed to
be finalized, only occasionally updating the EL with progress.
Although a peer might lie about what is finalized and what isn't,
eventually we'll call the execution client - thus, all a dishonest
client can do is delay execution verification slightly. Gossip blocks in
particular are never assumed to be finalized.
Extends fork choice state to also track slot numbers to improve accuracy
of `/eth/v1/debug/fork_choice` endpoint. Autoenable this API on devnet,
and disable some extra checks on devnet to aid focused testing efforts.
Align fork choice pruning logic with API based on checkpoints vs root.
* clean up some Nim 1.2 workarounds
* re-add notes about JS backend
* another proc/noSideEffect -> func
* revert ncli/ncli_common.nim changes; 19969 evidently wasn't backported to 1.6
To allow LC data retention longer than the one for historic states,
introduce persistent DB caches for `current_sync_committee` and
`LightClientHeader` for finalized epoch boundary blocks.
This way, historic `LightClientBootstrap` requests may still be honored
even after pruning. Note that historic `LightClientUpdate` requests are
already answered using fully persisted objects, so don't need changes.
Sync committees and headers are cached on finalization of new data.
For existing data, info is lazily cached on first access.
Co-authored-by: Jacek Sieka <jacek@status.im>
The "on" default for validator monitor details incurs a heavy
performance penalty on large-validator setups - this may cause excess
memory usage or slowdowns when metrics are queried - this PR changes the
default to off, as was intended for the 23.1.0 release.
* debug log upon sidecar validation failure
* Fill in signature catch upon SignedBeaconBlockAndBlobsSidecar deser
* Always fill blobssidecar slot and root
* Skip lastFCU when eth1monitor is nil
* fix
* Use cached root
When the epoch boundary block is missed, we incorrectly assume that the
next couple blocks improve finality, leading to repeated pushes of the
same light client finality update and incorrectly ignoring some gossip.
* combine common implementation of LC helpers
Combine replicated helper code from Altair/Capella/EIP4844 into single
`Forky` based implementation. Also convert `template` to `func` to avoid
selection of incorrect overload.
* fix
* eip4844 beacon block proposals
* Don't fetch blobs under minimal preset
@tersec's summary of the issue:
BlobsBundleV1 in the execution API spec assumes a mainnet preset blob
size, where the EIP4844 consensus spec defines
FIELD_ELEMENTS_PER_BLOB: 4 under the minimal preset, which leads to a
Blob having a length of 4 * 32, not 4096 * 32 which BlobsBundleV1
requires.
* Revert unintentional script change
* exit/validatorchange pool includes BLS to execution messages; REST
support for new pool
* catch failed individual futures
* increase BLS changes bound and keep BLS seen consistent with subpool
* deque capacities should be powers of 2
When accessing DB in `readOnly` mode that does not already have latest
schema, initial writes trigger `attempt to write a readonly database`.
Avoid that by only writing schema when DB is not `readOnly`, and provide
data from legacy tables if such are present.
* Refactor block/blobs types
Use type system to enforce invariant that a pre-4844 block cannot have
a sidecar.
* Update beacon_chain/nimbus_beacon_node.nim
Co-authored-by: tersec <tersec@users.noreply.github.com>
* review feedback
Co-authored-by: tersec <tersec@users.noreply.github.com>
When running `nimbus_light_client`, we persist the latest header from
`LightClientStore.finalized_header` in a database across restarts.
Because the data format is derived from the latest `LightClientStore`,
this could lead to data being persisted in pre-release formats.
To enable us to test later `LightClientStore` versions on devnets,
transition to a `ForkedLightClientStore` internally that is only
migrated to newer forks on-demand (instead of starting at latest).
By enabling the validator monitor, more precise information about the
lifecycle of an attestation is logged at the higher `NOTICE` log level
while current `sent` messages are logged at `INF` instead, since they
are less interesting.
In particular, missed attestations and those that vote for the wrong
head are now detected and logged at NOTICE.
In addition to logging, this feature enables rich metrics around
attestation and sync committee performance - by default, validators are
tracked in aggregate but a detailed mode exists as well
This feature has been available since early Nimbus days, but it has now
been tuned and optimised such that it is safe to enable by default, even
for large setups.
* enable automatic validator monitoring by default
* replace `--validator-monitor-totals` flag with
`--validator-monitor-details` - the detailed mode is disabled by default
* lower "sent" log level to `INF` for several messages - in particular
those that are traced by the validator monitor
This is a retake on #3531 which was later reverted in #3578.
Distinguish between those code locations that need to be updated on each
light client data format change, and those others that should generally
be fine, as long as a valid light client object is processed.
The former are tagged with static assert for `LightClientDataFork.high`.
The latter are changed to `lcDataFork > LightClientDataFork.None` to
indicate that they depend only on presence of any valid object.
Also bundled a few minor cleanups and fixes.
Also add `Forky` type for `LightClientStore` and minor fixes / cleanups.
The light client data structures were changed to accommodate additional
fields in future forks (e.g., to also hold execution data).
There is a minor change to the JSON serialization, where the `header`
properties are now nested inside a `LightClientHeader`.
The SSZ serialization remains compatible.
See https://github.com/ethereum/consensus-specs/pull/3190
and https://github.com/ethereum/beacon-APIs/pull/287
* Working Makefile targets for Capella devnet2
make capella-devnet-2
make clean-capella-devnet-2
You'll need to have https://github.com/tmuxinator/tmuxinator installed.
It's available as a regular package in most Linux distributions or through
Nix or Brew on macOS.
This commit also fixes the initial hang in the Eth1 monitor in the "find
TTD block" procedure through a fix to the network metadata files which
hasn't been upstreamed yet.
Other changes:
* Disabled Geth snap sync in the simulation
When all Geth nodes are configured to run with snap sync enabled, they all
start snap sync after the first forkchoiceUpdated which causes the BNs to
skip validator duties because the EL is syncing. The snap sync never completes
due to poor connectivity between the Geth nodes in the simulation.
In a future fork, light client data will be extended with execution info
to support more use cases. To anticipate such an upgrade, introduce
`Forky` and `Forked` types, and ready the database schema.
Because the mapping of sync committee periods to fork versions is not
necessarily unique (fork schedule not in sync with period boundaries),
an additional column is added to `period` -> `LightClientUpdate` table.
* correctly report ignored contributions in metrics
* avoid counting subset contributions in vmon (bring in line with
attestation aggregates)
* avoid signature checks for subset attestations
A being a non-strict subset is a sufficient condition to ignore.
Bellatrix and Altair light client data share same body, but have other
fork digests. Validate that the peer's sent fork digest matches the one
that we expect (derived from `attested_header.slot`).
Introduce (optional) pruning of historical data - a pruned node will
continue to answer queries for historical data up to
`MIN_EPOCHS_FOR_BLOCK_REQUESTS` epochs, or roughly 5 months, capping
typical database usage at around 60-70gb.
To enable pruning, add `--history=prune` to the command line - on the
first start, old data will be cleared (which may take a while) - after
that, data is pruned continuously.
When pruning an existing database, the database will not shrink -
instead, the freed space is recycled as the node continues to run - to
free up space, perform a trusted node sync with a fresh database.
When switching on archive mode in a pruned node, history is retained
from that point onwards.
History pruning is scheduled to be enabled by default in a future
release.
In this PR, `minimal` mode from #4419 is not implemented meaning
retention periods for states and blocks are always the same - depending
on user demand, a future PR may implement `minimal` as well.
With https://github.com/status-im/nimbus-eth2/pull/4420 implemented, the
checks that we perform are equivalent to those of a `SYNCING` EL - as
such, we can treat missing EL the same as SYNCING and proceed with an
optimistic sync.
This mode of operation significantly speeds up recovery after an offline
EL event because the CL is already synced and can immediately inform the
EL of the latest head.
It also allows using a beacon node for consensus archival queries
without an execution client.
* deprecate `--optimistic` flag
* log block details on EL error, soften log level because we can now
continue to operate
* `UnviableFork` -> `Invalid` when block hash verification fails -
failed hash verification is not a fork-related block issue
When not backfilling all the way to genesis (#4421), it becomes more
useful to start rebuilding the historical indices from an arbitrary
starting point.
To rebuild the index from non-genesis, a state and an unbroken block
history is needed - here, we allow loading the state from an era file
and recreating the history from there onwards.
* speed up partial era state loading
When backfilling, we only need to download blocks that are newer than
MIN_EPOCHS_FOR_BLOCK_REQUESTS - the rest cannot reliably be fetched from
the network and does not have to be provided to others.
This change affects only trusted-node-synced clients - genesis sync
continues to work as before (because it needs to construct a state by
building it from genesis).
Those wishing to complete a backfill should do so with era files
instead.
Trigger ANSI art on upgrade to Capella, similar to the merge.
Future extension could log blinking art when user successfully managed
to get BLS to Execution change included into a block for a validator.
Art created by http://beatscribe.com/ (beatscribe#1008 on Discord)
* 60% state replay speedup
* don't use HashList for epoch participation - in addition to the code
currently clearing the caches several times redundantly, clearing has to
be done each block nullifying the benefit (35%)
* introduce active balance cache - computing it is slow due to cache
unfriendliness in the random access pattern and bounds checking and we
do it for every block - this cache follows the same update pattern as
the active validator index cache (20%)
* avoid recomputing base reward several times per attestation (5%)
Applying 1024 blocks goes from 20s to ~8s on my laptop - these kinds of
requests happen on historical REST queries but also whenever there's a
reorg.
* fix test and diffs
* consolidate consensus spec transition test fixtures
* include capella
* consoliate fork test fixtures
* note change in EIP-4844 process_block in alpha.2
* fix REST liveness endpoint responding even when gossip is not enabled
* fix VC exit code on doppelganger hit
* fix activation epoch not being updated correctly on long deposit
queues
* fix activation epoch being set incorrectly when updating validator
* move most implementation logic to `validator_pool`, add tests
* ensure consistent logging between VC and BN
* add docs
Other changes:
* More optimal search for TTD block.
* Add timeouts to all REST requests during trusted node sync.
Fixes#4037
* Removed support for storing a deposit snapshot in the network
metadata.
* Types and scaffolding for EIP-4844
This commit adds the EIP-4844 spec types, and fills in
scaffolding/boilerplate for the use of these types across the repo.
None of the actual EIP-4844 logic is introduced yet.
This follows the pattern used by @tersec when introducing Capella (#4276).
* use eth2-networks fork
* review feedback: add static check EIP4844_FORK_EPOCH == FAR_FUTURE_EPOCH
* review feedback: remove EIP4844 from /eth/v1/config/spec response
* Cleanup / review feedback
* Fix REST test
Since the sync committee duties are no longer updated on every slot
and previously the sync committee aggregators selection proofs were
generated during the duties update, this now resulted in the client
using stale selection proofs (they must be generated at each slot).
The fix consists of moving the selection proof generation logic in
a different function which is properly executed on each slot.
Other changes:
* The logtrace tool has been enhanced with a framework for adding
new simpler log aggregation and analysis algorithms.
The default CI testnet simulation will now ensure that the blocks
in the network have reasonable sync committee participation.
Adds a "Slot start" log to the LC that behaves similar to BN to inform
the user that the light client is doing something, and to indicate the
latest view of the network (finalized / optimistic).
* implement several capellaImplementationMissing points
* don't register validator activity for not-active validators
* don't check validator indices already coming out of committees which exist; must be active validators, or else other deeper bugs
* Initial commit.
* NextAttestationEntry type.
* Add doppelgangerCheck and actual check.
* Recover deleted check.
* Remove NextAttestainEntry changes.
* More cleanups for NextAttestationEntry.
* Address review comments.
* Remove GENESIS_EPOCH specific check branch.
* Decrease number of full epochs for doppelganger check in VC.
Co-authored-by: zah <zahary@status.im>
The various `PeerScore` constants are used for both beacon blocks and
LC objects, and will likely also find use for EIP4844 blob sidecars.
Renaming them to use more generically applicable names not referring
to blocks explicitly aymore.
We currently use `BlockError` for both beacon blocks and LC objects.
In light of EIP4844, we will likely also use it for blob sidecars.
To avoid confusion, renaming it to a more generic `VerifierError`,
and update its documentation to be more generic.
To avoid long lines as a followup, also renaming the `block_processor`'s
`BlockProcessingCompleted.completed`->`ProcessingStatus.completed` and
`BlockProcessingCompleted.notCompleted`->`ProcessingStatus.notCompleted`
This PR removes a bunch of code to make TNS aware of era files, avoiding
a duplicated backfill when era files are available.
* reuse chaindag for loading backfill state, replacing the TNS homebrew
* fix era block iteration to skip empty slots
* add tests for `can_advance_slots`
Without this, tools like `ncli_db` require command line parameter to
process mainnet blocks, which is a regression since the merge - this is
a subset of #4147.
When the EL/Builder fails to produce an execution payload, we fall back
to an empty `ExecutionPayload`. Even though it contains no transactions
it should refer to the configured fee recipient. This is useful for
privacy reasons (do not reveal the reason for the empty payload) and for
compliance with additional fee recipient rules by staking pools.
* move duty tracking code to `ActionTracker`
* fix earlier duties overwriting later ones
* re-run subnet selection when new duty appears
* log upcoming duties as soon as they're known (vs 4 epochs before)
To further tighten Nimbus against spam, this PR introduces a global
quota for block requests (shared between peers) as well as a general
per-peer request limit that applies to all libp2p requests.
* apply request quota before decoding message
* for high-bandwidth requests (blocks), apply a shared global quota
which helps manage bandwidth for high-peer setups
* add metrics
Currently, we require genesis and a checkpoint block and state to start
from an arbitrary slot - this PR relaxes this requirement so that we can
start with a state alone.
The current trusted-node-sync algorithm works by first downloading
blocks until we find an epoch aligned non-empty slot, then downloads the
state via slot.
However, current
[proposals](https://github.com/ethereum/beacon-APIs/pull/226) for
checkpointing prefer finalized state as
the main reference - this allows more simple access control and caching
on the server side - in particular, this should help checkpoint-syncing
from sources that have a fast `finalized` state download (like infura
and teku) but are slow when accessing state via slot.
Earlier versions of Nimbus will not be able to read databases created
without a checkpoint block and genesis. In most cases, backfilling makes
the database compatible except where genesis is also missing (custom
networks).
* backfill checkpoint block from libp2p instead of checkpoint source,
when doing trusted node sync
* allow starting the client without genesis / checkpoint block
* perform epoch start slot lookahead when loading tail state, so as to
deal with the case where the epoch start slot does not have a block
* replace `--blockId` with `--state-id` in TNS command line
* when replaying, also look at the parent of the last-known-block (even
if we don't have the parent block data, we can still replay from a
"parent" state) - in particular, this clears the way for implementing
state pruning
* deprecate `--finalized-checkpoint-block` option (no longer needed)
* cap maximum number of chunks to download from peer (fixes#1620)
* drop support for requesting blocks via v1 / phase0 protocol
* tighten bounds checking of fixed-size messages
In optimistic mode, Nimbus will sync optimistically even when the
execution client is offline / not available.
An optimistic node is less secure because it has not validated block
transactions via the execution client and can thus not be used for
validation duties.
* Allow chain dag without genesis / block
This PR enables the initialization of the dag without access to blocks
or genesis state - it is a prerequisite for implementing a number of
interesting features:
* checkpoint sync without any block download
* pruning of blocks and states
* backfill checkpoint block
* Fix doppelganger protection reorders validator indices in response issue.
* Add chronos metrics endpoint to nimbus REST API.
* Doppelganger protection now works on duties not on attestations.
Improve logging for doppelganger and indices.
* Improve doppelganger and indices logging.
* Add number of validators to logs.
* Move logging dumps from `debug` to `trace` level.
The LC REST API has been merged into the ethereum/beacon-APIs specs:
- https://github.com/ethereum/beacon-APIs/pull/247
Update URLs to v1 and update REST tests. Note that REST tests do not
start with Altair, so the tested BN will return empty / error responses.
Implements the latest proposal for providing LC data via REST, as of
https://github.com/ethereum/beacon-APIs/pull/247 with a v0 suffix.
Requests:
- `/eth/v0/beacon/light_client/bootstrap/{block_root}`
- `/eth/v0/beacon/light_client/updates?start_period={start_period}&count={count}`
- `/eth/v0/beacon/light_client/finality_update`
- `/eth/v0/beacon/light_client/optimistic_update`
HTTP Server-Sent Events (SSE):
- `light_client_finality_update_v0`
- `light_client_optimistic_update_v0`
Allow config of deployment phase via config instead of attempting to
derive from genesis content (when running relevant testnets), so that
we don't have to keep maintaining the list inside the binary.
The LC REST API has been merged into the ethereum/beacon-APIs specs:
- https://github.com/ethereum/beacon-APIs/pull/247
Update URLs to v1 and update REST tests. Note that REST tests do not
start with Altair, so the tested BN will return empty / error responses.
Allow config of deployment phase via config instead of attempting to
derive from genesis content (when running relevant testnets), so that
we don't have to keep maintaining the list inside the binary.