* restore doppelganger check on connectivity loss
https://github.com/status-im/nimbus-eth2/pull/4398 introduced a
regression in functionality where doppelganger detection would not be
rerun during connectivity loss. This PR reintroduces this check and
makes some adjustments to the implementation to simplify the code flow
for both BN and VC.
* track when check was last performed for each validator (to deal with
late-added validators)
* track when we performed a doppel-detectable activity (attesting) so as
to avoid false positives
* remove nodeStart special case (this should be treated the same as
adding a validator dynamically just after startup)
* allow sync committee duties in doppelganger period
* don't trigger doppelganger when registering duties
* fix crash when expected index response is missing
* fix missing slashingSafe propagation
While syncing the finalized portion of the chain, the execution client
cannot efficiently sync and most of the time returns `SYNCING` - in this
PR, we use CL-verified optmistic sync as long as the block is claimed to
be finalized, only occasionally updating the EL with progress.
Although a peer might lie about what is finalized and what isn't,
eventually we'll call the execution client - thus, all a dishonest
client can do is delay execution verification slightly. Gossip blocks in
particular are never assumed to be finalized.
* exit/validatorchange pool includes BLS to execution messages; REST
support for new pool
* catch failed individual futures
* increase BLS changes bound and keep BLS seen consistent with subpool
* deque capacities should be powers of 2
* Refactor block/blobs types
Use type system to enforce invariant that a pre-4844 block cannot have
a sidecar.
* Update beacon_chain/nimbus_beacon_node.nim
Co-authored-by: tersec <tersec@users.noreply.github.com>
* review feedback
Co-authored-by: tersec <tersec@users.noreply.github.com>
In a future fork, light client data will be extended with execution info
to support more use cases. To anticipate such an upgrade, introduce
`Forky` and `Forked` types, and ready the database schema.
Because the mapping of sync committee periods to fork versions is not
necessarily unique (fork schedule not in sync with period boundaries),
an additional column is added to `period` -> `LightClientUpdate` table.
* correctly report ignored contributions in metrics
* avoid counting subset contributions in vmon (bring in line with
attestation aggregates)
* avoid signature checks for subset attestations
A being a non-strict subset is a sufficient condition to ignore.
* fix REST liveness endpoint responding even when gossip is not enabled
* fix VC exit code on doppelganger hit
* fix activation epoch not being updated correctly on long deposit
queues
* fix activation epoch being set incorrectly when updating validator
* move most implementation logic to `validator_pool`, add tests
* ensure consistent logging between VC and BN
* add docs
* Initial commit.
* NextAttestationEntry type.
* Add doppelgangerCheck and actual check.
* Recover deleted check.
* Remove NextAttestainEntry changes.
* More cleanups for NextAttestationEntry.
* Address review comments.
* Remove GENESIS_EPOCH specific check branch.
* Decrease number of full epochs for doppelganger check in VC.
Co-authored-by: zah <zahary@status.im>
When the client was started without any validators, the doppelganger
detection structures were never initialized properly. Later, when
validators were added through the Keymanager API, they interacted
with the uninitialized doppelganger detection structures and their
duties were inappropriately skipped.
Whether new blocks/attestations/etc are produced internally or received
via REST, their journey through the node is the same - to ensure that
they get the same treatment (logging, metrics, processing), this PR
moves the routing to a dedicated module and fixes several small
differences that existed before.
* `xxxValidator` -> `processMessageName` - the processor also was adding
messages to pools, so we want the name to reflect that action
* add missing "sent" metrics for some messages
* document ignore policy better - already-seen messages are not actaully
rebroadcast by libp2p
* skip redundant signature checks for internal validators consistently
Adds a `LightClient` instance to the beacon node as preparation to
accelerate syncing in the future (optimistic sync).
- `--light-client-enable` turns on the feature
- `--light-client-trusted-block-root` configures block to start from
If no block root is configured, light client tracks DAG `finalizedHead`.
Introduces a new library for syncing using libp2p based light client
sync protocol, and adds a new `nimbus_light_client` executable that uses
this library for syncing. The new executable emits log messages when
new beacon block headers are received, and is integrated into testing.
Incorporates the latest changes to the light client sync protocol based
on Devconnect AMS feedback. Note that this breaks compatibility with the
previous prototype, due to changes to data structures and endpoints.
See https://github.com/ethereum/consensus-specs/pull/2802
* deactivate Doppelganger Protection during genesis
* also don't actually flag supposed-doppelgangers (because they're before broadcastStartEpoch) on GENESIS_SLOT start
These use a separate flow, and were previously only registered from the
network
* don't log successes in totals mode (TMI)
* remove `attestation-sent` event which is unused
* limit by-root requests to non-finalized blocks
Presently, we keep a mapping from block root to `BlockRef` in memory -
this has simplified reasoning about the dag, but is not sustainable with
the chain growing.
We can distinguish between two cases where by-root access is useful:
* unfinalized blocks - this is where the beacon chain is operating
generally, by validating incoming data as interesting for future fork
choice decisions - bounded by the length of the unfinalized period
* finalized blocks - historical access in the REST API etc - no bounds,
really
In this PR, we limit the by-root block index to the first use case:
finalized chain data can more efficiently be addressed by slot number.
Future work includes:
* limiting the `BlockRef` horizon in general - each instance is 40
bytes+overhead which adds up - this needs further refactoring to deal
with the tail vs state problem
* persisting the finalized slot-to-hash index - this one also keeps
growing unbounded (albeit slowly)
Anyway, this PR easily shaves ~128mb of memory usage at the time of
writing.
* No longer honor `BeaconBlocksByRoot` requests outside of the
non-finalized period - previously, Nimbus would generously return any
block through this libp2p request - per the spec, finalized blocks
should be fetched via `BeaconBlocksByRange` instead.
* return `Opt[BlockRef]` instead of `nil` when blocks can't be found -
this becomes a lot more common now and thus deserves more attention
* `dag.blocks` -> `dag.forkBlocks` - this index only carries unfinalized
blocks from now - `finalizedBlocks` covers the other `BlockRef`
instances
* in backfill, verify that the last backfilled block leads back to
genesis, or panic
* add backfill timings to log
* fix missing check that `BlockRef` block can be fetched with
`getForkedBlock` reliably
* shortcut doppelganger check when feature is not enabled
* in REST/JSON-RPC, fetch blocks without involving `BlockRef`
* fix dag.blocks ref
Time in the beacon chain is expressed relative to the genesis time -
this PR creates a `beacon_time` module that collects helpers and
utilities for dealing the time units - the new module does not deal with
actual wall time (that's remains in `beacon_clock`).
Collecting the time related stuff in one place makes it easier to find,
avoids some circular imports and allows more easily identifying the code
actually needs wall time to operate.
* move genesis-time-related functionality into `spec/beacon_time`
* avoid using `chronos.Duration` for time differences - it does not
support negative values (such as when something happens earlier than it
should)
* saturate conversions between `FAR_FUTURE_XXX`, so as to avoid
overflows
* fix delay reporting in validator client so it uses the expected
deadline of the slot, not "closest wall slot"
* simplify looping over the slots of an epoch
* `compute_start_slot_at_epoch` -> `start_slot`
* `compute_epoch_at_slot` -> `epoch`
A follow-up PR will (likely) introduce saturating arithmetic for the
time units - this is merely code moves, renames and fixing of small
bugs.
With checkpoint sync in particular, and state pruning in the future,
loading states or state-dependent data may fail. This PR adjusts the
code to allow this to be handled gracefully.
In particular, the new availability assumption is that states are always
available for the finalized checkpoint and newer, but may fail for
anything older.
The `tail` remains the point where state loading de-facto fails, meaning
that between the tail and the finalized checkpoint, we can still get
historical data (but code should be prepared to handle this as an
error).
However, to harden the code against long replays, several operations
which are assumed to work only with non-final data (such as gossip
verification and validator duties) now limit their search horizon to
post-finalized data.
* harden several state-dependent operations by logging an error instead
of introducing a panic when state loading fails
* `withState` -> `withUpdatedState` to differentiate from the other
`withState`
* `updateStateData` can now fail if no state is found in database - it
is also hardened against excessively long replays
* `getEpochRef` can now fail when replay fails
* reject blocks with invalid target root - they would be ignored
previously
* fix recursion bug in `isProposed`
* log doppelganger detection when it activates and when it causes missed
duties
* less prominent eth1 sync progress
* log in-progress sync at notice only when actually missing duties
* better detail in replay log
* don't log finalization checkpoints - this is quite verbose when
syncing and already included in "Slot start"
* support GOSSIP_MAX_SIZE_MERGE-sized blocks; prevent fork choice clock stutter via aggregate attestations
* relay max gossip size to libp2p, use tight uncompressed bounds for fixed-size messages
* Update beacon_chain/networking/eth2_network.nim
Co-authored-by: Jacek Sieka <jacek@status.im>
* Update beacon_chain/networking/eth2_network.nim
Co-authored-by: Jacek Sieka <jacek@status.im>
Co-authored-by: Jacek Sieka <jacek@status.im>
* use v1.1.6 test vectors; use BeaconTime instead of Slot in fork choice
* tick through every slot at least once
* use div INTERVALS_PER_SLOT and use precomputed constants of them
* use correct (even if numerically equal) constant
Validator monitoring based on and mostly compatible with the
implementation in Lighthouse - tracks additional logs and metrics for
specified validators so as to stay on top on performance.
The implementation works more or less the following way:
* Validator pubkeys are singled out for monitoring - these can be
running on the node or not
* For every action that the validator takes, we record steps in the
process such as messages being seen on the network or published in the
API
* When the dust settles at the end of an epoch, we report the
information from one epoch before that, which coincides with the
balances being updated - this is a tradeoff between being correct
(waiting for finalization) and providing relevant information in a
timely manner)
* batch-verify sync messages for a small perf boost
Generally reuses the same structure as attestation and aggregate
verification
* normalize `signatures` and `signature_batch` to use the same pattern
of verification
* normalize parameter names, order etc for signature stuff in general
* avoid calling `blsSign` directly - instead, go through `signatures`
consistently
* move quarantine outside of chaindag
The quarantine has been part of the ChainDAG for the longest time, but
this design has a few issues:
* the function in which blocks are verified and added to the dag becomes
reentrant and therefore difficult to reason about - we're currently
using a stateful flag to work around it
* quarantined blocks bypass the processing queue leading to a processing
stampede
* the quarantine flow is unsuitable for orphaned attestations - these
should also should be quarantined eventually
Instead of processing the quarantine inside ChainDAG, this PR moves
re-queueing to `block_processor` which already is responsible for
dealing with follow-up work when a block is added to the dag
This sets the stage for keeping attestations in the quarantine as well.
Also:
* make `BlockError` `{.pure.}`
* avoid use of `ValidationResult` in block clearance (that's for gossip)
Renames and cleanups split out from the validator monitoring branch, so
as to reduce conflict area vs other PR:s
* add constants for expected message timing
* name validators after the messages they validate, mostly, to make
grepping easier
* unify field naming of EpochInfo across forks to make cross-fork code
easier