* extend light client protocol for Electra
Add missing Electra support for light client protocol:
- https://github.com/ethereum/consensus-specs/pull/3811
Tested against the PR's consensus-spec-tests; the test runner automatically
picks up the new tests once they are available.
* workaround `version-2-0`: `Error: cannot instantiate: 'SomeUnsignedInt'`
* fix initialization when Electra not scheduled
* try to reduce stack size in test
* put correct sync committee branch version into DB
* adjust fork schedule in light client data tests
* further reduce stack size
* split function into multiple parts
* rename variable
* regenerate test reports to cover new Electra tests
* add Nim bug reference
Including sync contributions into a block affects validator rewards.
When we have not received aggregate sync contributions, but have seen
individual messages, we can produce the contributions locally, improving
validator rewards when subscribing to all subnets or when having a
non-aggregating attached validator in the sync committee.
Addresses two inaccuracies in light client data size documentation:
1. Serialized `SyncCommittee` pubkeys are 48 bytes each, not 64 bytes
2. Some estimates used 1000 instead of 1024 bytes/KB; all now use 1024
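As a worked check of the first point, the sketch below computes the serialized size of a `SyncCommittee`. It assumes the mainnet preset `SYNC_COMMITTEE_SIZE = 512` and 48-byte compressed BLS public keys; the helper names are illustrative and not taken from the codebase.

```python
SYNC_COMMITTEE_SIZE = 512  # mainnet preset (assumption for this sketch)
BLS_PUBKEY_BYTES = 48      # compressed BLS12-381 public key


def sync_committee_ssz_size(pubkey_bytes: int = BLS_PUBKEY_BYTES) -> int:
    """Size of the `pubkeys` vector plus the single `aggregate_pubkey`."""
    return SYNC_COMMITTEE_SIZE * pubkey_bytes + pubkey_bytes


def to_kb(num_bytes: int, bytes_per_kb: int = 1024) -> float:
    """Convert bytes to KB using the 1024 bytes/KB convention."""
    return num_bytes / bytes_per_kb


correct = sync_committee_ssz_size()       # 24624 bytes
overestimate = sync_committee_ssz_size(64)  # 32832 bytes with the wrong key size
print(correct, overestimate, round(to_kb(correct), 2))
```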
Bellatrix light client data does not contain the EL block hash, so we
had to follow blocks gossip to learn the EL `block_hash` of such blocks.
Now that Bellatrix is obsolete, we can simplify EL syncing logic under
light client scenarios. Bellatrix light client data can still be used
to advance the light client sync itself, but will no longer result in
`engine_forkchoiceUpdated` calls until the sync reaches Capella. This
also frees up some memory as we no longer have to retain blocks.
Using `let contextFork = consensusFork` no longer seems to work to avoid
capturing the `var` loop variable; it ends up being `Electra` for all
handlers. Use `closureScope` as a more sustainable fix.
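The capture problem is not specific to Nim. The following Python analogy (not the actual handler code) shows closures created in a loop all observing the final value of the loop variable, and how a per-iteration scope, which is the role `closureScope` plays in Nim, gives each closure its own copy. The function names and the fork list are illustrative.

```python
FORKS = ["Phase0", "Altair", "Bellatrix", "Capella", "Deneb", "Electra"]


def make_handlers_shared(forks):
    """Each closure refers to the same loop variable, not to its value."""
    handlers = []
    for fork in forks:
        handlers.append(lambda: fork)
    return handlers


def make_handlers_scoped(forks):
    """A fresh scope per iteration binds the current value for each closure."""
    handlers = []
    for fork in forks:
        def scope(captured):
            return lambda: captured
        handlers.append(scope(fork))
    return handlers


print([h() for h in make_handlers_shared(FORKS)])  # every handler sees "Electra"
print([h() for h in make_handlers_scoped(FORKS)])  # each handler sees its own fork
```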
`ValidatorSig` uses `blob` but `TrustedSig` uses `data`. Aligning the
names reduces code duplication and improves clarity. It also simplifies
`StableContainer` compatibility checks.
In nim-web3, all `std.Option` uses are replaced by `results.Opt`. The same
goes for nim-eth, with additional field renames and `GasInt` changed from
`int64` to `uint64`.
* Beacon node side implementation.
* Validator client side implementation.
* Address review comments and fix the test.
* Only 400 errors can be IndexedErrorMessage; 500 errors are always ErrorMessage.
* Remove VC shutdown functionality.
* Remove magic constants.
* Make arguments more visible and disable default values.
* Address review comments.
`sizeof` also includes padding between fields, while SSZ defines
`fixedPortionSize` (on type) or `sszSize` (on value) to denote
required bytes to encode. Switch forked block/state readers to SSZ size.
As blocks/states are much larger than the padding, this doesn't affect
practical use cases but is slightly more correct this way.
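The difference between in-memory size and encoded size can be illustrated with Python's `struct` module. The record below (a 1-byte flag followed by a 64-bit integer) is a hypothetical example, not a type from the codebase: native layout inserts alignment padding the way `sizeof` reports it, while the packed layout corresponds to the bytes a fixed-size SSZ encoding would need.

```python
import struct

# Hypothetical record: uint8 flag followed by uint64 slot.
NATIVE_LAYOUT = "@BQ"  # native alignment, comparable to `sizeof`
PACKED_LAYOUT = "<BQ"  # little-endian, no padding, comparable to SSZ

in_memory_size = struct.calcsize(NATIVE_LAYOUT)  # platform dependent
encoded_size = struct.calcsize(PACKED_LAYOUT)    # always 1 + 8 bytes
padding = in_memory_size - encoded_size

print(in_memory_size, encoded_size, padding)
```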
* electra attestation updates
In Electra, we have two attestation formats: on-chain and on-network -
the former combines all committees of a slot in a single committee bit
list.
This PR makes a number of cleanups to move towards fixing this -
attestation packing however still needs to be fixed as it currently
creates attestations with a single committee only which is very
inefficient.
* more attestations in the blocks
* signing and aggregation fixes
* tool fix
* test, import
* Make listen-address default to use dualstack.
* Use correct newProtocol().
* Bump nim-eth.
* Bump nim-eth one more time.
* Use `*` instead of IPv6 address for dualstack sockets.
* Bump chronos and nim-eth.
* Use new constructor.
* Fix listenAddress should be Opt[T] not Option[T].
* Fix options.md.
- add support for setting protocol handlers with `{.raises.}` annotation
- fix: valueOr and withValue utilities
- fix: remove explicit param from GossipSubParams constructor
Add support for using era file for the initial checkpoint block.
This should also avoid an error when the beacon node is restarted
before the backfill process has made any progress (#6059).
The `<` function to compare peers was not exported, leading to the same
peer being acquired over and over again until kicked. `mixin` doesn't pull
it into `peerCmp` without `*` export, and with the export no `mixin` is
needed.
The `wss_sim` was not properly maintained since Bellatrix. The missing
functionality is now added, including:
- Bellatrix: Connect to an EL for execution payload production
- Capella: Correct withdrawals processing, which is mandatory
- Deneb: Dump blob sidecars into the output directory
See https://ethresear.ch/t/insecura-my-consensus-for-the-pyrmont-network/11833
Iterating peers should only yield peers present in registry, otherwise
`nil` pointers are returned and depending on comparison function it will
break, see #6149.
When initializing from a state that's not aligned to an epoch boundary,
an earlier state is loaded that's epoch aligned, and subsequently topped
up with the missing blocks. `dag.headSyncCommittee` is initialized prior
to topping up the missing blocks, though. If the sync committee changes
while applying the blocks (e.g., a sync committee period boundary hits),
the cached information becomes unlinked from `dag.head`, leading to
valid blocks based on that chain being rejected. To fix this, move cache
initialization after the top up with blocks. This has been observed on
Goerli by initializing from 7919502 and attempting to top up 7920111.
The block gets rejected with an invalid state root on nodes that have
restarted after setting 7920111 as head, while it gets accepted by all
other nodes. Error message is `block: state root verification failed`.
The incorrect initialization behaviour was introduced in #4592, before
which the sync committee cache was initialized after applying blocks.
The fallback for when the blobless quarantine contains a block with all
blobs modifies the collection while iterating over it, potentially
triggering an assertion if reachable.
Using a second loop to process this situation resolves that.
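A minimal Python sketch of the two-loop pattern follows. The names (`promote_complete`, `has_all_blobs`) and the dict-based quarantine are hypothetical stand-ins; the point is only that the first pass reads and the second pass mutates.

```python
def promote_complete_unsafe(blobless: dict, has_all_blobs) -> list:
    """Removes entries while iterating; Python raises RuntimeError here."""
    promoted = []
    for root, block in blobless.items():
        if has_all_blobs(block):
            del blobless[root]  # mutates the dict being iterated
            promoted.append(block)
    return promoted


def promote_complete(blobless: dict, has_all_blobs) -> list:
    """First loop collects candidates, second loop removes them."""
    ready = [root for root, block in blobless.items() if has_all_blobs(block)]
    return [blobless.pop(root) for root in ready]


quarantine = {"a": 1, "b": 2, "c": 3}
print(promote_complete(quarantine, lambda block: block != 2), quarantine)
```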
`batchVerify`'s precondition is a non-empty signature list:
```nim
if input.len == 0:
# Spec precondition
return false
```
This means that in eras without any blocks (as has happened on Goerli),
calling it leads to era files being reported as invalid.
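One possible guard, sketched in Python under the assumption that an era without blocks simply has nothing to verify: skip the batch call when the signature list is empty. `batch_verify`, `era_signatures_valid` and `verify_one` are hypothetical names modeled on the snippet above, not the actual fix.

```python
def batch_verify(signatures: list, verify_one) -> bool:
    """Mirrors the precondition above: an empty input is rejected."""
    if len(signatures) == 0:
        return False  # spec precondition
    return all(verify_one(sig) for sig in signatures)


def era_signatures_valid(signatures: list, verify_one) -> bool:
    """Hypothetical caller: an era without blocks is trivially valid."""
    if len(signatures) == 0:
        return True
    return batch_verify(signatures, verify_one)


print(batch_verify([], lambda sig: True))          # False
print(era_signatures_valid([], lambda sig: True))  # True
```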
Using a dedicated branch for researching the effectiveness of split view
scenario handling simplifies testing and avoids having partial work on
`unstable`. If we want, we can reintroduce it under a `--debug` flag at
a later time. But for now, Goerli is a rare opportunity to test this,
maybe just for another week or so.
- https://github.com/status-im/infra-nimbus/pull/179
In a split view situation, the canonical chain may only be served by a
small number of peers, and branches may span long durations. Minority
branches may still have a large weight from attestations and should
be discovered. To help with that, add a branch discovery module that
specifically targets peers with unknown histories and downloads from
them, in addition to the sync manager work which handles popular
branches.
There are situations where all states in the `blockchain_dag` are
occupied and cannot be borrowed.
- headState: Many assumptions in the code that it cannot be advanced
- clearanceState: Resets every time a new block gets imported, including
blocks from non-canonical branches
- epochRefState: Used even more frequently than clearanceState
This means that during the catch-up mechanic where the head state is
slowly advanced to wall clock to catch up on validator duties in the
situation where the canonical head is way behind non-canonical heads,
we cannot use any of the three existing states. In that situation,
Nimbus already consumes an increased amount of memory due to all the
`BlockRef`, fork choice states and so on, so experience is degraded.
It seems reasonable to allocate a fourth state temporarily during that
mechanic, until a new proposal could be made on the canonical chain.
Note that currently, on `unstable`, proposals _do_ happen every couple
hours because sync manager doesn't manage to discover additional heads
in a split-view scenario on Goerli. However, with the branch discovery
module, new blocks are discovered all the time, and the clearanceState
may no longer be borrowed as it is reset to a different branch too often.
The extra state could also find other uses in the future, e.g., for
incremental computations as in reindexing the database, or online
collection of historical light client data.
Use the same eviction policy for blocks as is already the case for blobs.
FIFO makes more sense, because it favors keeping ancestors of blocks
which need to be applied to the DAG before their children become eligible.
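For illustration, a bounded container with FIFO eviction can be sketched in Python as below. `FifoQuarantine` and its methods are hypothetical names; the real quarantine is a Nim data structure with more bookkeeping.

```python
from collections import OrderedDict


class FifoQuarantine:
    """Bounded map that evicts the oldest inserted entry when full."""

    def __init__(self, max_items: int):
        self.max_items = max_items
        self.items = OrderedDict()

    def add(self, root, block) -> None:
        if root in self.items:
            return  # re-adding does not refresh the insertion order
        while len(self.items) >= self.max_items:
            self.items.popitem(last=False)  # drop the first-inserted entry
        self.items[root] = block


quarantine = FifoQuarantine(max_items=2)
for root in ["a", "b", "a", "c"]:
    quarantine.add(root, f"block-{root}")
print(list(quarantine.items))  # ['b', 'c']
```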
`eth2_network` forgets to descore peers when opening a connection times
out. It only descores when opening the connection succeeds and then
there is a subsequent error. The caller cannot distinguish the cases,
so ensure that the descore is also applied if the request fails during
its initial portion.
When quarantining a block from block processor, we should also keep a
copy of its blobs. Otherwise, this involves more network roundtrips
to obtain information we already have. This is in line with how blobs
arrive from gossip and request manager sources. The existing flow does
not work when applying blocks from quarantine, which is addressed here.
Blobs are cached from gossip and other sources for all orphans, not just
those specifically tagged as `blobless`. `blobless` only means that they
are actively fetched from the network. The `MaxBlobs` should be aligned
to match `MaxOrphans`. Note that blobs are tiny compared to blocks, so
this isn't a huge memory hog.
* handle case of unreachable block in `is_optimistic` helper
When a non-canonical block is still in the DB, it can be accessed via
`BlockId`, but `BlockRef` may be unavailable if the block was not
properly cleaned when it got orphaned. Report it as optimistic.
* `template` -> `func`
When checking for `MissingParent`, it may be that the parent block was
already discovered as part of a prior run. In that case, it can be
loaded from storage and processed without having to rediscover the
entire branch from the network. This is similar to #6112 but for blocks
that are discovered via gossip / sync mgr instead of via request mgr.
* Add some duration metering.
Refactor some log statements.
Rework sync contribution deadline waiting.
Add some cancellation reporting handlers.
* Make all validator's shortLog to become validatorLog.
Optimize some logs with logScope.
* Add `raises`.
* More log statements polishing.
During sync, we can skip the `blobSidecarsByRange` request when there
are no blocks with `kzg_commitments` in the blocks data. Avoids running
into throttling from peers during long periods of non-finality.
Each individual blob currently uses as much quota from the network limit
as an entire block does, 128 items per second shared across all peers.
Blobs are 128 KB each instead of up to several MB and are simpler to
encode. There can be multiple per block (6 currently), so allow 2000
blobs per second across all peers. That decreases the cost per block
from `3125 + 3125 * blobs.len` quota (= `[3125, 21875]`) to a lower
`3125 + 200 * blobs.len` quota (= `[3125, 4325]`), accounting for the
slight increase in data transfer and encoding time.
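The quoted ranges can be reproduced with a few lines of arithmetic. The per-second budget of 400,000 quota units is inferred from the numbers above (3125 × 128 = 200 × 2000) rather than stated, and the constant names are illustrative.

```python
QUOTA_PER_SECOND = 400_000  # inferred: 3125 * 128 == 200 * 2000
MAX_BLOBS_PER_BLOCK = 6

BLOCK_COST = QUOTA_PER_SECOND // 128      # 3125, 128 items per second
OLD_BLOB_COST = BLOCK_COST                # blobs used to cost as much as blocks
NEW_BLOB_COST = QUOTA_PER_SECOND // 2000  # 200, 2000 blobs per second


def block_cost(num_blobs: int, blob_cost: int) -> int:
    """Quota consumed by one block together with its blobs."""
    return BLOCK_COST + blob_cost * num_blobs


old_range = [block_cost(n, OLD_BLOB_COST) for n in (0, MAX_BLOBS_PER_BLOCK)]
new_range = [block_cost(n, NEW_BLOB_COST) for n in (0, MAX_BLOBS_PER_BLOCK)]
print(old_range, new_range)  # [3125, 21875] [3125, 4325]
```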
During sync, sometimes the same block gets encountered and added to
quarantine multiple times. If its parent is already known, quarantine
incorrectly registers it as missing, leading to re-download. This can
be fixed by registering the parent's deepest missing parent recursively.
Also increase the stickiness of `missing`. We only perform 4 attempts
within ~16 seconds before giving up. Very frequently, this is not enough
and there is no progress until sync manager kicks in, even on Holesky.
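The idea of resolving the deepest missing parent can be sketched as follows. This is an iterative Python illustration under the assumption that the quarantine can be viewed as a map from block root to parent root; `deepest_missing_parent` is a hypothetical name and the real implementation may differ.

```python
def deepest_missing_parent(parent_root, quarantined_parents: dict):
    """Follow parent links through quarantined blocks.

    Returns the first ancestor root that is not itself in quarantine,
    which is the block that actually has to be fetched.
    """
    seen = set()
    root = parent_root
    while root in quarantined_parents and root not in seen:
        seen.add(root)  # guards against accidental cycles
        root = quarantined_parents[root]
    return root


# Blocks "c" and "b" are already quarantined; "a" is the real gap.
quarantined = {"c": "b", "b": "a"}
print(deepest_missing_parent("c", quarantined))  # a
print(deepest_missing_parent("x", quarantined))  # x
```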
When restarting beacon node, orphaned blocks remain in the database but
on startup, only the canonical chain as selected by fork choice loads.
When a new block is discovered that builds on top of an orphaned block,
the orphaned block is re-downloaded using sync/request manager, despite
it already being present on disk. Such queries can be answered locally
to improve discovery speed of alternate forks.
During a lag spike, e.g., from state replays, the peer count can
temporarily drop significantly. We should not have to wait another 60
minutes in that situation just to be back where we started.
The `clearanceState` points to the latest resolved block, regardless of
whether that block is canonical according to fork choice. If the chain is
stalled and we want to prepare for resuming validator duties, we need
a recent state according to fork choice to avoid lag spikes and missing
slot timings.
Nimbus currently stops performing validator duties if the blockchain
does not progress for `node.config.syncHorizon` slots. This means that
the chain won't recover because no new blocks are proposed. To fix that,
continue performing validator duties if no progress is registered for a
long time, and none of our peers is indicating any progress.
#6087 introduced a subtle change to `nim-web3` resulting in `Gwei` being
serialized differently than before. Using a `distinct` type for `Gwei`
improves type safety and avoids such problems in the future.
On Goerli there are some instances of long streaks of empty epochs due
to different branches being built in parallel. They sometimes lead to
`Request for pruned historical state` logs requiring a BN restart to
resolve. Avoid that by trying to restore states from the entire
non-finalized history, to avoid losing sync in such situations.
When a config defines a different `INACTIVITY_SCORE_RECOVERY_RATE` than
the default, `process_inactivity_updates` uses an incorrect rate ever
since #2710 when `INACTIVITY_SCORE_RECOVERY_RATE` became configurable.
When there are long periods of non-finality, `nodeIsViableForHead` has
been observed to consume significant time as it repeatedly walks the
non-finalized check graph as part of determining what heads are eligible
for fork choice. Caching the result resolves that.
Overall, it may still be better to prune fork choice more aggressively
when finality advances, to fully avoid the case specced out using the
linear scan. The current implementation is very close to spec, though,
so such a change should not be introduced without thorough testing.
The simple cache should allow significantly better performance on Goerli
while the network is still supported (mid-April).
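A simple cache of this kind could look like the Python sketch below. It assumes results stay valid until the justified/finalized checkpoints change, which is an assumption made for illustration; `ViabilityCache` and the `compute` callback are hypothetical names.

```python
class ViabilityCache:
    """Memoizes an expensive per-node check until the checkpoints change."""

    def __init__(self, compute):
        self._compute = compute  # expensive walk, supplied by the caller
        self._checkpoints = None
        self._results = {}

    def is_viable(self, node, checkpoints) -> bool:
        if checkpoints != self._checkpoints:
            # Assumption: cached answers only hold for one checkpoint pair.
            self._checkpoints = checkpoints
            self._results.clear()
        if node not in self._results:
            self._results[node] = self._compute(node, checkpoints)
        return self._results[node]


calls = []
cache = ViabilityCache(lambda node, cp: calls.append(node) or node % 2 == 0)
answers = [cache.is_viable(4, ("j1", "f1")) for _ in range(3)]
print(answers, len(calls))  # [True, True, True] 1
```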
In `block_dag` there is a max depth of 100 years configured to detect
internal inconsistencies, e.g., circular references. As `BlockRef` was
changed long ago to only reflect the non-finalized chain segment, the
theoretically supported max depth can be reduced and simplified.
We don't need the `cfg` right now, but it makes sense to have the object
passed to the clock so that the API doesn't break if we want to support
configurable `SECONDS_PER_SLOT`. As the `libnimbus_lc` library is not
yet widely used, better to add the argument now than later.
The `syncHorizon` describes the number of empty slots before the beacon
node considers itself to be out of sync. There are two places where we
currently set this to 50 slots, but it makes more sense to base it on
wall time, e.g., the 10 minutes that the default 50 are derived from.
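The relationship between the two values can be written out as below, assuming the mainnet value `SECONDS_PER_SLOT = 12`; `sync_horizon_slots` is a hypothetical helper, not an existing function.

```python
SECONDS_PER_SLOT = 12            # mainnet value (assumption for this sketch)
SYNC_HORIZON_SECONDS = 10 * 60   # the 10 minutes mentioned above


def sync_horizon_slots(horizon_seconds: int = SYNC_HORIZON_SECONDS,
                       seconds_per_slot: int = SECONDS_PER_SLOT) -> int:
    """Number of empty slots tolerated before considering the node out of sync."""
    return horizon_seconds // seconds_per_slot


print(sync_horizon_slots())                    # 50
print(sync_horizon_slots(seconds_per_slot=6))  # 100 with shorter slots
```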