In split view situation, the canonical chain may only be served by a
tiny amount of peers, and branches may span long durations. Minority
branches may still have a large weight from attestations and should
be discovered. To assist with that, add a branch discovery module that
assists in such a situation by specifically targeting peers with unknown
histories and downloading from them, in addition to sync manager work
which handles popular branches.
When quarantining a block from block processor, we should also keep a
copy of its blobs. Otherwise, this involves more network roundtrips
to obtain information we already have. This is in line with how blobs
arrive from gossip and request manager sources. The existing flow does
not work when applying blocks from quarantine, which is addressed here.
When checking for `MissingParent`, it may be that the parent block was
already discovered as part of a prior run. In that case, it can be
loaded from storage and processed without having to rediscover the
entire branch from the network. This is similar to #6112 but for blocks
that are discovered via gossip / sync mgr instead of via request mgr.
With checkpoint sync, the checkpoint block is typically unavailable at
the start, and only backfilled later. To avoid treating it as having
zero hash, execution disabled in some contexts, wrap the result of
`loadExecutionBlockHash` in `Opt` and handle block hash being unknown.
---------
Co-authored-by: Jacek Sieka <jacek@status.im>
Finish the rename started in #4809 to have a consistent naming.
`ExecutionPayloadHash` suggests hash over payload instead of block.
`BlockHash` is also the canonical name in engine API.
Full caches should not be used to mark blocks as unviable. The unviable
status is quite persistent and a block marked as such won't be processed
again once the cache empties. Problem originally introduced in #4808.
* use `PayloadAttributesV3` in `nimbus_light_client` for Deneb
From Deneb onward, `forkchoiceUpdated` requires `PayloadAttributesV3`.
In `nimbus_light_client` we still used `PayloadAttributesV2`.
Also clean up two other locations that were already correctly using
`PayloadAttributesV3`, to reduce code duplication.
* fix letter case
For symmetry with `forkyState` when using `withState`, and to avoid
problems with shadowing of `blck` when using `withBlck` in `template`,
also rename the injected `blck` to `forkyBlck`.
- https://github.com/nim-lang/Nim/issues/22698
Add separate log topic for `block_processor` messages.
Topic named similar to the other `_processor` modules:
- `eth2_processor` --> `gossip_eth2`
- `light_client_processor` --> `gossip_lc`
- `optimistic_processor` --> `gossip_opt`
When a block is introduced to the system both via REST and gossip at the
same time, we will call `storeBlock` from two locations leading to a
dupliace check race condition as we wait for the EL.
This issue may manifest in particular when using an external block
builder that itself publishes the block onto the gossip network.
* refactor enqueue flow
* simplify calling `addBlock`
* complete request manager verifier future for blobless blocks
* re-verify parent conditions before adding block
among other things, it might have gone stale or finalized between one
call and the other
* async batch verification
When batch verification is done, the main thread is blocked reducing
concurrency.
With this PR, the new thread signalling primitive in chronos is used to
offload the full batch verification process to a separate thread
allowing the main threads to continue async operations while the other
threads verify signatures.
Similar to previous behavior, the number of ongoing batch verifications
is capped to prevent runaway resource usage.
In addition to the asynchronous processing, 3 addition changes help
drive throughput:
* A loop is used for batch accumulation: this prevents a stampede of
small batches in eager mode where both the eager and the scheduled batch
runner would pick batches off the queue, prematurely picking "fresh"
batches off the queue
* An additional small wait is introduced for small batches - this helps
create slightly larger batches which make better used of the increased
concurrency
* Up to 2 batches are scheduled to the threadpool during high pressure,
reducing startup latency for the threads
Together, these changes increase attestation verification throughput
under load up to 30%.
* fixup
* Update submodules
* fix blst build issues (and a PIC warning)
* bump
---------
Co-authored-by: Zahary Karadjov <zahary@gmail.com>
* fall back to non-fcu fork choice on epoch boundaries
* Future[bool]
* fix
* Update beacon_chain/consensus_object_pools/consensus_manager.nim
Co-authored-by: Etan Kissling <etan@status.im>
* make things consistent with Opt[void] return
---------
Co-authored-by: Etan Kissling <etan@status.im>
* Perform block pre-check before validating execution
When syncing, blocks have not been gossip-validated and are therefore
prone to trivial faults like being known-unviable, duplicate or missing
their parent.
In addition, the duplicate-block check in BlockProcessor was not
considering the quarantine flow and would therefore cause
recently-quarantined blocks to be silenty dropped when their parent
appears delaying the sync end-game and thus causing longer startup
resync time.
This PR verifies trivial conditions before performing execution
validation thus avoiding duplicates and missing parents alike.
It also ensures that the fast-sync EL mode is used for finalized blocks
even if the EL is timing out / slow to respond - this allows the CL to
complete its sync faster and switch to "normal" lock-step at the head of
the chain more quickly, thus also allowing the EL to access the latest
consensensus information earlier.
* oops
* remove unused constant
* Clarify addOrphan error/logging
addOrphan returned a bool to indicate success. Change this to a Result
so that different errors can be distinguished.
* Update beacon_chain/consensus_object_pools/block_quarantine.nim
Co-authored-by: tersec <tersec@users.noreply.github.com>
* Update beacon_chain/gossip_processing/gossip_validation.nim
---------
Co-authored-by: tersec <tersec@users.noreply.github.com>
* replace optimisticRoots table with field in BlockRef
* copyright year
* mark finalized blocks as verified on load
* Update beacon_chain/consensus_object_pools/block_dag.nim
Co-authored-by: Etan Kissling <etan@status.im>
* expand non-optimistic block checking to all pre-merge blocks; refactor markBlockVerified to use BlockRef rather than block root and remove superfluous caller in newPayload path replaced by addResolvedHeadBlock BlockRef construction
* don't treat finalized block specially; VALID status is sticky
---------
Co-authored-by: Etan Kissling <etan@status.im>
When doing sync for blocks older than
MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS, we skip the blobs by range
request, but we then pass en empty blob sequence to
validation, which then fails.
To fix this: Use an Option[Blobsidecars] to allow expressing the
distinction between "empty blob sequence" and "blobs unavailable". Use
the latter for "old" blocks, and don't attempt to run blob validation.
Updates gossip validation spec references to v1.3.0 and fixes an
incorrect reference to "signed_aggregate_and_proof" in sync contribution
documentation.
* low attestations during epoch should instafail in CI; dbg -> warn level on newPayload log
* improve newPayload warning message when no valid EL connected
* reduce potential spam; make log spelling more consistent; use fatal/quit
* Simplify block quarantine blobless
The quarantine blobless table was initially keyed off of (Eth2Digest,
ValidatorSig). This was modelled off the orphan table. The presence of
the signature in the key is necessary for orphans, because we can't
verify the signature for an orphan. That is not the case for a
blobless block, where the signature can be verified.
So this PR changes the blobless block table to be keyed off a
Eth2Digest only. This simplifies the retrieval and handling of
blobless blocks.
* review feedback
Just the variable, not yet `lcDataForkAtStateFork` / `atStateFork`.
- Shorten comment in `light_client.nim` to keep line width
- Do not rename `stateFork` mention in `runProposalForkchoiceUpdated`.
- Do not rename `stateFork` in `getStateField(dag.headState, fork)`
Rest is just a mechanical mass replace
* Update sync to use post-decoupling RPCs
blob_sidecars_by_range returns a flat list of sidecars, which must
then be grouped per-slot.
* Add test for groupBlobs
* createBlobs: convert proc to func
* Support for driving multiple EL nodes from a single Nimbus BN
Full list of changes:
* Eth1Monitor has been renamed to ELManager to match its current
responsibilities better.
* The ELManager is no longer optional in the code (it won't have
a nil value under any circumstances).
* The support for subscribing for headers was removed as it only
worked with WebSockets and contributed significant complexity
while bringing only a very minor advantage.
* The `--web3-url` parameter has been deprecated in favor of a
new `--el` parameter. The new parameter has a reasonable default
value and supports specifying a different JWT for each connection.
Each connection can also be configured with a different set of
responsibilities (e.g. download deposits, validate blocks and/or
produce blocks). On the command-line, these properties can be
configured through URL properties stored in the #anchor part of
the URL. In TOML files, they come with a very natural syntax
(althrough the URL scheme is also supported).
* The previously scattered EL-related state and logic is now moved
to `eth1_monitor.nim` (this module will be renamed to `el_manager.nim`
in a follow-up commit). State is assigned properly either to the
`ELManager` or the to individual `ELConnection` objects where
appropriate.
The ELManager executes all Engine API requests against all attached
EL nodes, in parallel. It compares their results and if there is a
disagreement regarding the validity of a certain payload, this is
detected and the beacon node is protected from publishing a block
with a potential execution layer consensus bug in it.
The BN provides metrics per EL node for the number of successful or
failed requests for each type Engine API requests. If an EL node
goes offline and connectivity is resoted later, we report the
problem and the remedy in edge-triggered fashion.
* More progress towards implementing Deneb block production in the VC
and comparing the value of blocks produced by the EL and the builder
API.
* Adds a Makefile target for the zhejiang testnet
This commit removes ForkySignedBeaconBlockMaybeBlobs and all
references. I tried to pull that thread only as little as was needed
to get rid of it. Left a placeholder BlobSidecar array (in lieu of
Opt[BlobsSidecar]) in a few places; this will be used as we rebuild
the decoupled implementation.
While syncing the finalized portion of the chain, the execution client
cannot efficiently sync and most of the time returns `SYNCING` - in this
PR, we use CL-verified optmistic sync as long as the block is claimed to
be finalized, only occasionally updating the EL with progress.
Although a peer might lie about what is finalized and what isn't,
eventually we'll call the execution client - thus, all a dishonest
client can do is delay execution verification slightly. Gossip blocks in
particular are never assumed to be finalized.
* debug log upon sidecar validation failure
* Fill in signature catch upon SignedBeaconBlockAndBlobsSidecar deser
* Always fill blobssidecar slot and root
* Skip lastFCU when eth1monitor is nil
* fix
* Use cached root
* exit/validatorchange pool includes BLS to execution messages; REST
support for new pool
* catch failed individual futures
* increase BLS changes bound and keep BLS seen consistent with subpool
* deque capacities should be powers of 2
With https://github.com/status-im/nimbus-eth2/pull/4420 implemented, the
checks that we perform are equivalent to those of a `SYNCING` EL - as
such, we can treat missing EL the same as SYNCING and proceed with an
optimistic sync.
This mode of operation significantly speeds up recovery after an offline
EL event because the CL is already synced and can immediately inform the
EL of the latest head.
It also allows using a beacon node for consensus archival queries
without an execution client.
* deprecate `--optimistic` flag
* log block details on EL error, soften log level because we can now
continue to operate
* `UnviableFork` -> `Invalid` when block hash verification fails -
failed hash verification is not a fork-related block issue