* Support for driving multiple EL nodes from a single Nimbus BN
Full list of changes:
* Eth1Monitor has been renamed to ELManager to match its current
responsibilities better.
* The ELManager is no longer optional in the code (it won't have
a nil value under any circumstances).
* The support for subscribing for headers was removed as it only
worked with WebSockets and contributed significant complexity
while bringing only a very minor advantage.
* The `--web3-url` parameter has been deprecated in favor of a
new `--el` parameter. The new parameter has a reasonable default
value and supports specifying a different JWT for each connection.
Each connection can also be configured with a different set of
responsibilities (e.g. download deposits, validate blocks and/or
produce blocks). On the command-line, these properties can be
configured through URL properties stored in the #anchor part of
the URL. In TOML files, they come with a very natural syntax
(althrough the URL scheme is also supported).
* The previously scattered EL-related state and logic is now moved
to `eth1_monitor.nim` (this module will be renamed to `el_manager.nim`
in a follow-up commit). State is assigned properly either to the
`ELManager` or the to individual `ELConnection` objects where
appropriate.
The ELManager executes all Engine API requests against all attached
EL nodes, in parallel. It compares their results and if there is a
disagreement regarding the validity of a certain payload, this is
detected and the beacon node is protected from publishing a block
with a potential execution layer consensus bug in it.
The BN provides metrics per EL node for the number of successful or
failed requests for each type Engine API requests. If an EL node
goes offline and connectivity is resoted later, we report the
problem and the remedy in edge-triggered fashion.
* More progress towards implementing Deneb block production in the VC
and comparing the value of blocks produced by the EL and the builder
API.
* Adds a Makefile target for the zhejiang testnet
* Fix issue when VC unable to detect errors properly and act accordingly.
Switch all API functions used by VC to RestPlainResponse, this allows us to print errors returned by BN servers.
* Fix issue when prepareBeaconCommitteeSubnet() do not perform actions when BN is optimistically synced only.
* Fix Defect issue.
* Fix submit/publish returning `false` when operation was successful.
* Address review comments.
* Fix some client calls unable to receive `execution_optimistic` field, mark BN as OptSynced when such request has been made.
* Adjust warning levels.
---------
Co-authored-by: Jacek Sieka <jacek@status.im>
* fix REST liveness endpoint responding even when gossip is not enabled
* fix VC exit code on doppelganger hit
* fix activation epoch not being updated correctly on long deposit
queues
* fix activation epoch being set incorrectly when updating validator
* move most implementation logic to `validator_pool`, add tests
* ensure consistent logging between VC and BN
* add docs
Since the sync committee duties are no longer updated on every slot
and previously the sync committee aggregators selection proofs were
generated during the duties update, this now resulted in the client
using stale selection proofs (they must be generated at each slot).
The fix consists of moving the selection proof generation logic in
a different function which is properly executed on each slot.
Other changes:
* The logtrace tool has been enhanced with a framework for adding
new simpler log aggregation and analysis algorithms.
The default CI testnet simulation will now ensure that the blocks
in the network have reasonable sync committee participation.
* implement several capellaImplementationMissing points
* don't register validator activity for not-active validators
* don't check validator indices already coming out of committees which exist; must be active validators, or else other deeper bugs
* move duty tracking code to `ActionTracker`
* fix earlier duties overwriting later ones
* re-run subnet selection when new duty appears
* log upcoming duties as soon as they're known (vs 4 epochs before)
* more efficient forkchoiceUpdated usage
* await rather than asyncSpawn; ensure head update before dag.updateHead
* use action tracker rather than attached validators to check for next slot proposal; use wall slot + 1 rather than state slot + 1 to correctly check when missing blocks
* re-add two-fcU case for when newPayload not VALID
* check dynamicFeeRecipientsStore for potential proposal
* remove duplicate checks for whether next proposer
In order to avoid full replays when validating attestations hailing from
untaken forks, it's better to keep shufflings separate from `EpochRef`
and perform a lookahead on the shuffling when processing the block that
determines them.
This also helps performance in the case where REST clients are trying to
perform lookahead on attestation duties and decreases memory usage by
sharing shufflings between EpochRef instances of the same dependent
root.
Whether new blocks/attestations/etc are produced internally or received
via REST, their journey through the node is the same - to ensure that
they get the same treatment (logging, metrics, processing), this PR
moves the routing to a dedicated module and fixes several small
differences that existed before.
* `xxxValidator` -> `processMessageName` - the processor also was adding
messages to pools, so we want the name to reflect that action
* add missing "sent" metrics for some messages
* document ignore policy better - already-seen messages are not actaully
rebroadcast by libp2p
* skip redundant signature checks for internal validators consistently
* optimistic sync
* flag that initially loaded blocks from database might need execution block root filled in
* return optimistic status in REST calls
* refactor blockslot pruning
* ensure beacon_blocks_by_{root,range} do not provide optimistic blocks
* handle forkchoice head being pre-merge with block being postmerge
* re-enable blocking head updates on validator duties
* fix is_optimistic_candidate_block per spec; don't crash with nil future
* fix is_optimistic_candidate_block per spec; don't crash with nil future
* mark blocks sans execution payloads valid during head update
* Initial commit
* Make `events` API spec compliant.
* Add `Eth-Consensus-Version` in responses.
* Bump chronos to get redirect with headers working.
* Add `is_optimistic` field and handling to syncing RestSyncInfo.
* SSZ `[]` -> `mitem`
* `[]` -> `item`
immutable access via mutable instance cannot rely on template
overloading, and `[]` cannot be a `func` because of special seq handling
in compiler.
Other changes:
* logtrace can now verify sync committee messages and contributions
* Many unnecessary use of pairs() have been removed for consistency
* Map 40x BN response codes to BeaconNodeStatus.Incompatible in the VC
* harden validator API against pre-finalized slot requests
* check `syncHorizon` when responding to validator api requests too far
from `head`
* limit state-id based requests to one epoch ahead of `head`
* put historic data bounds on block/attestation/etc validator production API, preventing them from being used with already-finalized slots
* add validator block smoke tests
* make rest test create a new genesis with the tests running roughly in
the first epoch to allow testing a few more boundary conditions
Up til now, the block dag has been using `BlockRef`, a structure adapted
for a full DAG, to represent all of chain history. This is a correct and
simple design, but does not exploit the linearity of the chain once
parts of it finalize.
By pruning the in-memory `BlockRef` structure at finalization, we save,
at the time of writing, a cool ~250mb (or 25%:ish) chunk of memory
landing us at a steady state of ~750mb normal memory usage for a
validating node.
Above all though, we prevent memory usage from growing proportionally
with the length of the chain, something that would not be sustainable
over time - instead, the steady state memory usage is roughly
determined by the validator set size which grows much more slowly. With
these changes, the core should remain sustainable memory-wise post-merge
all the way to withdrawals (when the validator set is expected to grow).
In-memory indices are still used for the "hot" unfinalized portion of
the chain - this ensure that consensus performance remains unchanged.
What changes is that for historical access, we use a db-based linear
slot index which is cache-and-disk-friendly, keeping the cost for
accessing historical data at a similar level as before, achieving the
savings at no percievable cost to functionality or performance.
A nice collateral benefit is the almost-instant startup since we no
longer load any large indicies at dag init.
The cost of this functionality instead can be found in the complexity of
having to deal with two ways of traversing the chain - by `BlockRef` and
by slot.
* use `BlockId` instead of `BlockRef` where finalized / historical data
may be required
* simplify clearance pre-advancement
* remove dag.finalizedBlocks (~50:ish mb)
* remove `getBlockAtSlot` - use `getBlockIdAtSlot` instead
* `parent` and `atSlot` for `BlockId` now require a `ChainDAGRef`
instance, unlike `BlockRef` traversal
* prune `BlockRef` parents on finality (~200:ish mb)
* speed up ChainDAG init by not loading finalized history index
* mess up light client server error handling - this need revisiting :)
One more step on the journey to reduce `BlockRef` usage across the
codebase - this one gets rid of `StateData` whose job was to keep track
of which block was last assigned to a state - these duties have now been
taken over by `latest_block_root`, a fairly recent addition that
computes this block root from state data (at a small cost that should be
insignificant)
99% mechanical change.
* fewer deps on `BlockRef` traversal in anticipation of pruning
* allows identifying EpochRef:s by their shuffling as a first step of
* tighten error handling around missing blocks
using the zero hash for signalling "missing block" is fragile and easy
to miss - with checkpoint sync now, and pruning in the future, missing
blocks become "normal".
* limit by-root requests to non-finalized blocks
Presently, we keep a mapping from block root to `BlockRef` in memory -
this has simplified reasoning about the dag, but is not sustainable with
the chain growing.
We can distinguish between two cases where by-root access is useful:
* unfinalized blocks - this is where the beacon chain is operating
generally, by validating incoming data as interesting for future fork
choice decisions - bounded by the length of the unfinalized period
* finalized blocks - historical access in the REST API etc - no bounds,
really
In this PR, we limit the by-root block index to the first use case:
finalized chain data can more efficiently be addressed by slot number.
Future work includes:
* limiting the `BlockRef` horizon in general - each instance is 40
bytes+overhead which adds up - this needs further refactoring to deal
with the tail vs state problem
* persisting the finalized slot-to-hash index - this one also keeps
growing unbounded (albeit slowly)
Anyway, this PR easily shaves ~128mb of memory usage at the time of
writing.
* No longer honor `BeaconBlocksByRoot` requests outside of the
non-finalized period - previously, Nimbus would generously return any
block through this libp2p request - per the spec, finalized blocks
should be fetched via `BeaconBlocksByRange` instead.
* return `Opt[BlockRef]` instead of `nil` when blocks can't be found -
this becomes a lot more common now and thus deserves more attention
* `dag.blocks` -> `dag.forkBlocks` - this index only carries unfinalized
blocks from now - `finalizedBlocks` covers the other `BlockRef`
instances
* in backfill, verify that the last backfilled block leads back to
genesis, or panic
* add backfill timings to log
* fix missing check that `BlockRef` block can be fetched with
`getForkedBlock` reliably
* shortcut doppelganger check when feature is not enabled
* in REST/JSON-RPC, fetch blocks without involving `BlockRef`
* fix dag.blocks ref
Using the wrong type here causes requests to fail due to the overly
zealous parameter validation - the failure is harmless in the current
duty subscription model, but would have caused more serious failures
down the line.
Time in the beacon chain is expressed relative to the genesis time -
this PR creates a `beacon_time` module that collects helpers and
utilities for dealing the time units - the new module does not deal with
actual wall time (that's remains in `beacon_clock`).
Collecting the time related stuff in one place makes it easier to find,
avoids some circular imports and allows more easily identifying the code
actually needs wall time to operate.
* move genesis-time-related functionality into `spec/beacon_time`
* avoid using `chronos.Duration` for time differences - it does not
support negative values (such as when something happens earlier than it
should)
* saturate conversions between `FAR_FUTURE_XXX`, so as to avoid
overflows
* fix delay reporting in validator client so it uses the expected
deadline of the slot, not "closest wall slot"
* simplify looping over the slots of an epoch
* `compute_start_slot_at_epoch` -> `start_slot`
* `compute_epoch_at_slot` -> `epoch`
A follow-up PR will (likely) introduce saturating arithmetic for the
time units - this is merely code moves, renames and fixing of small
bugs.
* Harden CommitteeIndex, SubnetId, SyncSubcommitteeIndex
Harden the use of `CommitteeIndex` et al to prevent future issues by
using a distinct type, then validating before use in several cases -
datatypes in spec are kept simple though so that invalid data still can
be read.
* fix invalid epoch used in REST
`/eth/v1/beacon/states/{state_id}/committees` committee length (could
return invalid data)
* normalize some variable names
* normalize committee index loops
* fix `RestAttesterDuty` to use `uint64` for `validator_committee_index`
* validate `CommitteeIndex` on ingress in REST API
* update rest rules with stricter parsing
* better REST serializers
* save lots of memory by not using `zip` ...at least a few bytes!
* REST cleanups
* reject out-of-range committee requests
* print all hex values as lower-case
* allow requesting state information by head state root
* turn `DomainType` into array (follow spec)
* `uint_to_bytesXX` -> `uint_to_bytes` (follow spec)
* fix wrong dependent root in `/eth/v1/validator/duties/proposer/`
* update documentation - `--subscribe-all-subnets` is no longer needed
when using the REST interface with validator clients
* more fixes
* common helpers for dependent block
* remove test rules obsoleted by more strict epoch tests
* fix trailing commas
* Update docs/the_nimbus_book/src/rest-api.md
* Update docs/the_nimbus_book/src/rest-api.md
Co-authored-by: sacha <sacha@status.im>