Sync committee duties are performed by the sync committee as determined
by slot + 1. We did it correctly for individual messages, but selected
the incorrect participants for aggregate contributions for the very
first slot per period (roughly 1 per ~27 hrs on Mainnet). The faulty
participants selection code was originally introduced in #2925.
In Nim 2.0, attempting to use `Taskpool.spawn` inside `{.async.}` `proc`
leads to `Error: cannot generate destructor for generic type: Isolated`.
Add an intermediate wrapper `proc` that performs the `spawn` operation
to workaround the problem.
Add separate log topic for `block_processor` messages.
Topic named similar to the other `_processor` modules:
- `eth2_processor` --> `gossip_eth2`
- `light_client_processor` --> `gossip_lc`
- `optimistic_processor` --> `gossip_opt`
When a block is introduced to the system both via REST and gossip at the
same time, we will call `storeBlock` from two locations leading to a
dupliace check race condition as we wait for the EL.
This issue may manifest in particular when using an external block
builder that itself publishes the block onto the gossip network.
* refactor enqueue flow
* simplify calling `addBlock`
* complete request manager verifier future for blobless blocks
* re-verify parent conditions before adding block
among other things, it might have gone stale or finalized between one
call and the other
* async batch verification
When batch verification is done, the main thread is blocked reducing
concurrency.
With this PR, the new thread signalling primitive in chronos is used to
offload the full batch verification process to a separate thread
allowing the main threads to continue async operations while the other
threads verify signatures.
Similar to previous behavior, the number of ongoing batch verifications
is capped to prevent runaway resource usage.
In addition to the asynchronous processing, 3 addition changes help
drive throughput:
* A loop is used for batch accumulation: this prevents a stampede of
small batches in eager mode where both the eager and the scheduled batch
runner would pick batches off the queue, prematurely picking "fresh"
batches off the queue
* An additional small wait is introduced for small batches - this helps
create slightly larger batches which make better used of the increased
concurrency
* Up to 2 batches are scheduled to the threadpool during high pressure,
reducing startup latency for the threads
Together, these changes increase attestation verification throughput
under load up to 30%.
* fixup
* Update submodules
* fix blst build issues (and a PIC warning)
* bump
---------
Co-authored-by: Zahary Karadjov <zahary@gmail.com>
* fall back to non-fcu fork choice on epoch boundaries
* Future[bool]
* fix
* Update beacon_chain/consensus_object_pools/consensus_manager.nim
Co-authored-by: Etan Kissling <etan@status.im>
* make things consistent with Opt[void] return
---------
Co-authored-by: Etan Kissling <etan@status.im>
* Perform block pre-check before validating execution
When syncing, blocks have not been gossip-validated and are therefore
prone to trivial faults like being known-unviable, duplicate or missing
their parent.
In addition, the duplicate-block check in BlockProcessor was not
considering the quarantine flow and would therefore cause
recently-quarantined blocks to be silenty dropped when their parent
appears delaying the sync end-game and thus causing longer startup
resync time.
This PR verifies trivial conditions before performing execution
validation thus avoiding duplicates and missing parents alike.
It also ensures that the fast-sync EL mode is used for finalized blocks
even if the EL is timing out / slow to respond - this allows the CL to
complete its sync faster and switch to "normal" lock-step at the head of
the chain more quickly, thus also allowing the EL to access the latest
consensensus information earlier.
* oops
* remove unused constant
* Clarify addOrphan error/logging
addOrphan returned a bool to indicate success. Change this to a Result
so that different errors can be distinguished.
* Update beacon_chain/consensus_object_pools/block_quarantine.nim
Co-authored-by: tersec <tersec@users.noreply.github.com>
* Update beacon_chain/gossip_processing/gossip_validation.nim
---------
Co-authored-by: tersec <tersec@users.noreply.github.com>
* replace optimisticRoots table with field in BlockRef
* copyright year
* mark finalized blocks as verified on load
* Update beacon_chain/consensus_object_pools/block_dag.nim
Co-authored-by: Etan Kissling <etan@status.im>
* expand non-optimistic block checking to all pre-merge blocks; refactor markBlockVerified to use BlockRef rather than block root and remove superfluous caller in newPayload path replaced by addResolvedHeadBlock BlockRef construction
* don't treat finalized block specially; VALID status is sticky
---------
Co-authored-by: Etan Kissling <etan@status.im>
When doing sync for blocks older than
MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS, we skip the blobs by range
request, but we then pass en empty blob sequence to
validation, which then fails.
To fix this: Use an Option[Blobsidecars] to allow expressing the
distinction between "empty blob sequence" and "blobs unavailable". Use
the latter for "old" blocks, and don't attempt to run blob validation.
`SyncCommitteeMsgPool` grouped messages by their `beacon_block_root`.
This is problematic around sync committee period boundaries and forks.
Around sync committee period boundaries, members from both the current
and next sync committee may sign the same `beacon_block_root`; mixing
the signatures from both committees together is a mistake. Likewise,
around fork transitions, the `signing_root` changes, so those messages
also need to be segregated.
The `SignedContributionAndProof: invalid contribution signature` check
is sometimes hit around fork boundaries when running local testnet.
To avoid failing CI, revert this isntance to a plain `errReject` until
the underlying problem is addressed.
Updates gossip validation spec references to v1.3.0 and fixes an
incorrect reference to "signed_aggregate_and_proof" in sync contribution
documentation.
Fail local testnets on any gossip REJECT, instead of just asserting some
of the attestation related checks. This now also ensures that blocks,
BLS to Execution changes, blob sidecars and LC messages are checked
when running in a local testnet environment (`--verify-finalization`).
https://github.com/status-im/nimbus-eth2/pull/2904#discussion_r719603935