* use `PayloadAttributesV3` in `nimbus_light_client` for Deneb
From Deneb onward, `forkchoiceUpdated` requires `PayloadAttributesV3`.
In `nimbus_light_client` we still used `PayloadAttributesV2`.
Also clean up two other locations that were already correctly using
`PayloadAttributesV3`, to reduce code duplication.
* fix letter case
* avoid potentially subtle template/function symbol name interactions
* use warn instead of error in getExecutionPayload codepath to ensure lack of ambiguity
When the BN exits after writing new `head` to database, but before
completing the `updateFinalizedBlocks` call, the database is slightly
inconsistent due to the partial write. We currently fail to start up
after that. Fix that by catching up on partial `updateFinalizedBlocks`
tasks on start up, and add a test for this edge case.
Simplify best `LightClientUpdate` collection by tracking only canonical
data instead of tracking the best update across all branches within the
sync committee period.
- https://github.com/ethereum/consensus-specs/pull/3553
* reorder gossip validation checks
Doing the coverage check only after the corresponding committee index is
known allows optimization by early rejecting invalid data.
* use same helper for individual attestations as well
This PR brings down the time to send 100 attestations from ~1s to
~100ms, making it feasible to run 10k validators on a single node (which
regularly send 300 attestations / slot).
This is done by batching the slashing protection database write in a
single transaction thus avoiding a slow fsync for every signature -
effects will be more pronounced on slow drives.
The benefit applies both to beacon and client validators.
When creating new LC updates, information about the parent block's post
state must be available (cached), but information about current block's
post state is not yet required. Caching information about the current
block's post state can be delayed, simplifying the LC data collection
logic a bit and allowing more future flexibility with the cache design.
The proposed change ensures that VC validators are registered with the builder specified by the `--payload-builder-url` argument even if the beacon node has no attached validators. It also prevent such validators from being unintentionally registered with builders configured for specific attached validators by the keymanager api.
Per the current specs, the VC have no way to specify which builders the BN should use on a per-node basis, so for the time being we have to resort to using the BN fallback default builder URL for VC validators.
When new finality is reached without supermajority sync committee
support, trigger another event push on beacon-API and libp2p once
the finality gains supermajority support.
- https://github.com/ethereum/consensus-specs/pull/3549
Replace sections that need to be maintained with every `ConsensusFork`
related to LC data collection with a generic logic that keeps working
when unrelated parts of Ethereum change.
* Initial commit.
* Fix issues and tests.
* Fix test compilation issue.
* Update AllTests.
* Change the most poor score name from <lowest> to <bad>.
Split sync committee message score in range, so lexicographic scores will not intersect with normal one.
Lexicographic scores should be below to normal scores.
* Address review comments.
Fix aggregated attestation scoring to use MAX_VALIDATORS_PER_COMMITTEE.
Fix sync committee contributions to use SYNC_SUBCOMMITTEE_SIZE.
Add getUniqueVotes test vectors.
* Post-rebase fixes.
* Address review comments.
* Return back score calculation based on actual bits length.
* AllTests modification.
Gnosis uses `MIN_EPOCHS_FOR_BLOCK_REQUESTS` = 33024, but the computed
safe minimum (that Nimbus was using) is 2304. Relax the compatibility
check to allow `MIN_EPOCHS_FOR_BLOCK_REQUESTS` above the safe minimum
and honor `config.yaml` preferences for `MIN_EPOCHS_FOR_BLOCK_REQUESTS`.
* make `MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS` configurable
Gnosis uses custom `MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS` to account
for the faster slot timing, so that blobs still remain available for
roughly the same amount of real time.
Also extend REST config endpoint with full config form `v1.4.0-beta.4`,
and extend compatibility checks when loading configs to reduce warnings.
* Add slashing database pruning to VC.
Fix GetBlockHeaderResponse object declaration (spec has been changed).
* Switch to getFinalizedBlockHeader instead.
* Fix proper sign.
Add statements.
Show pruning log statement only when pruning happens.
* Optimize and remove debugging helpers.
In `v1.4.0-alpha.0`, the blob index validation on gossip was changed to
use `compute_subnet_for_blob_sidecar` instead of having a separate topic
for each individual blob. We updated the spec reference in #5363 without
updating the code accordingly. Fixing this now, and also adding the new
`MAX_BLOBS_PER_BLOCK` check from `v1.4.0-beta.3` as it shares the theme.
`v1.4.0-beta.4` made the Gossip rules more strict and now requires to
ignore blobs from other branches if there are equivocating blocks.
Those blobs are only requestable via Req/Resp.
Fix regression from #4808 where blobs that are already known are issued
ACCEPT verdict, propagating them to peers over and over again.
`validateBlobSidecar` contains the correct IGNORE logic. Moved it above
the expensive checks to retain the performance of the check.
Capacity should be set to theoretical limit to ensure correct hash root.
Actual length may be shorter. Only use is `ExecutionPayloadForSigning`
so it doesn't matter yet in practice, but still worth fixing.
Using trusted node sync currently requires to run two commands -
first the `trustedNodeSync` command to initialize the database,
followed by the regular startup command to continue syncing.
The `trustedNodeSync` options are now also available during regular
startup, and are used when the database is empty to simplify setting up
a new Nimbus beacon node. This also aligns behaviour closer with other
Ethereum consensus implementations.
The new logic only applies if the database has not yet been initialized;
same as before. Also, the database needs to be removed under the same
conditions as before when a fresh sync is desired.
Move KZG trusted setup initialization before `BeaconNode.init` to avoid
edge case where network message is received and processed before crypto
library has been properly initialized. Followup from #4870.
To prepare for calling trusted node sync from the main Nimbus startup
logic, extract the trusted node sync trigger into a separate function.
Further allow passing a pre-opened database, as that will be needed to
check whether trusted node sync needs to be called during regular start.
* Bump nim-eth: Change block timestamp from std.Time to distinct uint64
Also change tx.maxFeePerBlobGas from GasInt to UInt256
following Cancun latest spec
* Fix EthTime.now from func to proc due to sideeffects
* ShufflingRef approach to next-epoch validator duty calculation/prediction
* refactor action_tracker.updateActions to take ShufflingRef + beacon_proposers; refactor maybeUpdateActionTrackerNextEpoch to be separate and reused function; add actual fallback logic
* document one possible set of conditions
* check epoch participation flags and inactivity scores to ensure no penalties and MAX_EFFECTIVE_BALANCE to ensure rewards don't matter
* correctly (un)shuffle each proposer index
* remove debugging assertion
The templates for `BeaconBlock`, `BeaconBlockBody` and `BeaconState`
are the only ones using a `macro` mechanism for code generation.
This prevents using the dot-syntax style `consensusFork.BeaconFoo`
in some situations, and also tends to trigger naming conflicts,
requiring the `Type` suffix. Furthermore, the `macro` only works
for types that are re-defined in every single `ConsensusFork`.
Replacing with the simpler but more verbose approach used for other
types for consistency and to avoid the downsides of the `macro`.
Furthermore, simplify `test_fixture_sanity_blocks` to use `forks` sugar.
Directly initialize `ForkedLightClientObj` instead of separately first
setting the `kind` (initializing everything to zero) and then assigning
the forky data after that.
Cleanup of `ProveField` warnings in `slashing_protection_common` module.
Note that `ProveField` is disabled by default in makefile, but sometimes
these pop up when doing a regular `nim c`, and cleaning these may allow
enabling the warning in some future.
As the `case` is over `a.kind`, `ProveField` warnings for `b` have to be
suppressed using `{.push.}` / `{.pop.}` for comparison operators.
Spurious `ProveField` warning can be avoided by using `case` instead of
`if` on `metadata.genesis.kind`. Also suppress `GlobalVar` hints when
`incbinEnabled` is used, which has global `let` definitions.
`Eth2NetworkMetadata` has an `incompatible` case to hold an error string
in case the loaded file is not compatible with the compile-time config.
The same can be modeled with a `Result[Eth2NetworkMetadata, string]` and
avoids followup checks for the `incompatible` case.
We use file descriptors for validators and sockets and might run out of
either on high-validator setups - increasing the limit here is harmless
and avoids a common limiting factor in setup
Co-authored-by: Etan Kissling <etan@status.im>
For symmetry with `forkyState` when using `withState`, and to avoid
problems with shadowing of `blck` when using `withBlck` in `template`,
also rename the injected `blck` to `forkyBlck`.
- https://github.com/nim-lang/Nim/issues/22698
* implement EIP-7514 for Deneb: Add Max Epoch Churn Limit
Cap activations per epoch according to EIP-7514:
- https://eips.ethereum.org/EIPS/eip-7514
- https://github.com/ethereum/consensus-specs/pull/3499
* apply proposer boost to first block in case of equivocation
Implement spec changes to fork choice; this only affects equivocation
when multiple blocks are signed for the same slot. Regular operation
is not changed.
- https://github.com/ethereum/consensus-specs/pull/3352
* bump test vectors to v1.4.0-beta.2-hotfix
---------
Co-authored-by: tersec <tersec@users.noreply.github.com>
Currently, passing `0xc00000...` proof seems to pass `verifyProofs`.
Unsure why such a check is not necessary in spec, and also unsure
whether it is correct to reject proof at infinity, or if it could
occur, e.g., for a blob containing all 0 bytes. Weird overall...
* proper fix
We currently compute `justified_total_active_balance` inside
`calculateProposerBoost`, despite that sum already being known
in the `EpochRef` cache. Tracking `justified_total_active_balance`
whenever the justified checkpoint updates allows replacing the
repeated computation with a lookup, at minimal memory cost.
Currently we always request duties for current and next sync period.
As sync periods are quite long (~27 hrs on Mainnet), having access to
the duties so early doesn't help too much. To avoid running into errors
when the BN does not have the duties available around period boundary,
delay requesting them until the current period is close to finish.
`SYNC_COMMITTEE_SUBNET_COUNT` epochs are what the spec says should be
the lookahead timing of starting to subscribe to sync committee gossip.
Reusing the constant here for consistency.
This fixes these warning messages in the first slot of a new period.
```
rocketpool_validator | WRN 2023-09-07 20:19:35.439+00:00 Beacon node has incompatible configuration reason="Epoch value is far from the future;400;getSyncCommitteeDuties(first);invalid-request" node=http://eth2:5052[Nimbus/v23.8.0-872b19-stateofus] node_index=0 node_roles=AGBSDT
rocketpool_validator | WRN 2023-09-07 20:19:35.440+00:00 Unable to get sync committee duties period=889 epoch=227584 reason="Epoch value is far from the future;400;getSyncCommitteeDuties(first);invalid-request" service=duties_service
rocketpool_validator | NOT 2023-09-07 20:19:35.441+00:00 Beacon node is in sync head_slot=7274495 sync_distance=1 is_optimistic=false node=http://eth2:5052[Nimbus/v23.8.0-872b19-stateofus] node_index=0 node_roles=AGBSDT
```
Sync committee duties are performed by the sync committee as determined
by slot + 1. We did it correctly for individual messages, but selected
the incorrect participants for aggregate contributions for the very
first slot per period (roughly 1 per ~27 hrs on Mainnet). The faulty
participants selection code was originally introduced in #2925.
To allow testing https://github.com/ethereum/consensus-specs/issues/3466
add support for selecting fork choice version at launch. This means we
can deploy a different logic when `DENEB_FORK_EPOCH != FAR_FUTURE_EPOCH`
that won't be used on Mainnet.
In `Slot end`, sync duties are reported one slot too late,
because sync committee duties are determined by the next slot
instead of the current one. Address that by taking account of this.
* Add metadata for the Holesky network
* Add copyright banner to the new Nim module
* Working version
* Bump Chronos to fix downloading from Github
* Add checksum check of the downloaded file
* Clean up debugging code and obsolete imports
* optimize epoch transition via get_flag_index_deltas() and get_inactivity_penalty_deltas()
* directly mutate Altair info in {get,update}_flag_and_inactivity_deltas
The `is_next_sync_committee_known` helper allocates a fresh
`SyncCommittee` in the caller and `nimZeroMem`s it on each use.
Use `static` to compare against a compile-time zeroed copy instead.
This also should help reduce stack size far enough to link with Nim 2.0.
Nim 2.0 gets confused when compiling `all_tests`:
```
Error: undeclared identifier: 'maxLen'
candidates (edit distance, scope distance); see '--spellSuggest':
(3, 7): 'Table'
(3, 7): 'len'
(3, 7): 'max'
```
Renaming the generic parameter `U` to `maxLen` fixes this somehow.
It also increases readability to use the same name consistently.
In Nim 2.0, attempting to use `Taskpool.spawn` inside `{.async.}` `proc`
leads to `Error: cannot generate destructor for generic type: Isolated`.
Add an intermediate wrapper `proc` that performs the `spawn` operation
to workaround the problem.
From old interop tests, a mock `eth1BlockHash` was defined in `base`.
To avoid accidental use by Nimbus, move to `tests` and rename it to
`mockEth1BlockHash`.
This PR renames the existing `validator_duties` to `beacon_validators`
and in doing so, names validators running inside the beacon node process
"beacon validators" while those running the VC can be referred to as
"client validators" to disambiguate the two.
The existing `validator_duties` instead takes on a new responsibility:
as a home for logic shared between beacon and client validators - ie
code that provides consistency in implementation and behavior between
the two modes of operation.
Not only does this simplify reasoning about where to put code -it also
reduces the number of dependencies the validator client has from ~5000
to ~3000 modules (!) according to `nim genDepend` significantly reducing
compile times.
- Remove unnecessary `Defect` references
- Remove spurious `SerializationError` references
- Remove duplicate `writeValue` template in `keystore.nim`;
same implementation already exists a bit further above in same file.
Add separate log topic for `block_processor` messages.
Topic named similar to the other `_processor` modules:
- `eth2_processor` --> `gossip_eth2`
- `light_client_processor` --> `gossip_lc`
- `optimistic_processor` --> `gossip_opt`
In #4465, a regression was introduced by deleting the periodic call to
`engine_exchangeTransitionConfiguration` in the Nimbus light client.
This led to the light client never treating the EL as online and,
subsequently, not sending `engine_newPayload` requests for new blocks.
Strangely, `engine_forkchoiceUpdated` requests still make it through :)
Geth still requires both `engine_newPayload` and `fcU` to be called.
By restoring the `exchangeTransitionConfiguration` loop, `newPayload`
requests are once more issued to the EL.
When a block is introduced to the system both via REST and gossip at the
same time, we will call `storeBlock` from two locations leading to a
dupliace check race condition as we wait for the EL.
This issue may manifest in particular when using an external block
builder that itself publishes the block onto the gossip network.
* refactor enqueue flow
* simplify calling `addBlock`
* complete request manager verifier future for blobless blocks
* re-verify parent conditions before adding block
among other things, it might have gone stale or finalized between one
call and the other
every attestation is processed with a new wall time so we end up
iterating over all attestations for every attestation we queue - this is
4% of cpu time on a subscribe-all-subnets node
* remove redundant zero checks - block root must be an existing block
and therefore cannot be zero
* simplify "hasn't-voted" check to root only (isZeroMemory is dubiously
implemented for objects)
* Add support for POST /eth/v2/beacon/blocks
* More descriptive errors
* Address review feedback
* Return 500 (not 400) for a missing implementation case
We know the aggregate publickey of a fully participating sync committee.
Because participation is typically very high (>95%), it is faster to
start from that aggregate publickey and subtract the individual keys of
non-participants, than summing up all the participating pubkeys.
To obtain the correct `transactions_root` and `withdrawals_root`,
it is necessary to process execution block header. Light client updates
don't contain the correct MPT roots.
Using `create` with objects containing managed objects is broken:
https://github.com/nim-lang/Nim/issues/22341
Switch to a safer pattern based on `new+GC_ref`/`GC_unref` instead.
* async batch verification
When batch verification is done, the main thread is blocked reducing
concurrency.
With this PR, the new thread signalling primitive in chronos is used to
offload the full batch verification process to a separate thread
allowing the main threads to continue async operations while the other
threads verify signatures.
Similar to previous behavior, the number of ongoing batch verifications
is capped to prevent runaway resource usage.
In addition to the asynchronous processing, 3 addition changes help
drive throughput:
* A loop is used for batch accumulation: this prevents a stampede of
small batches in eager mode where both the eager and the scheduled batch
runner would pick batches off the queue, prematurely picking "fresh"
batches off the queue
* An additional small wait is introduced for small batches - this helps
create slightly larger batches which make better used of the increased
concurrency
* Up to 2 batches are scheduled to the threadpool during high pressure,
reducing startup latency for the threads
Together, these changes increase attestation verification throughput
under load up to 30%.
* fixup
* Update submodules
* fix blst build issues (and a PIC warning)
* bump
---------
Co-authored-by: Zahary Karadjov <zahary@gmail.com>
This PR removes a few hundred thousand temporary seq allocations during
state transition - in particular, the flag seq was allocated per
validator while committees are computed per attestation.
Split up the `ShufflingRef` acceleration logic into generically usable
parts and attester shuffling specific parts. The generic parts could be
used to accelerate other purposes, e.g., REST `/states/xxx/randao` API.
* speed up state/block loading
When loading blocks and states from db/era, we currently redundantly
check their CRC32 - for a state, this costs 50ms of loading time
presently (110mb uncompressed size) on a decent laptop.
* remove `maxDecompressedDbRecordSize` - not actually used on recent
data since we store the framed format - also, we're in luck: we blew
past the limit quite some time ago
* fix obsolete exception-based error checking
* avoid `zeroMem` when reading from era store
see https://github.com/status-im/nim-snappy/pull/22 for benchmarks
* bump snappy
* Add new REST endpoints to monitor REST server connections and new chronos metrics.
* Bump head versions of chronos and presto.
* Bump chronos with regression fix.
* Remove outdated tests which was supposed to test pipeline mode.
* Disable pipeline mode in resttest.
* Update copyright year.
* Upgrade test_signing_node to start use AsyncProcess instead of std library's osproc.
Bump chronos to check graceful shutdown.
* Update AllTests.
* Bump chronos.
Split up the `ShufflingRef` acceleration logic into generically usable
parts and attester shuffling specific parts. The generic parts could be
used to accelerate other purposes, e.g., REST `/states/xxx/randao` API.
When producing a local block and `runProposalForkchoiceUpdated` was
missed, `getPayloadFromSingleEL` adds additional ~500ms of latency.
Quick fix to avoid missing blocks to that.
We currently call `onSlotEnd` whenever all in-BN validator duties are
completed. VC validator duties are not awaited. When `onSlotEnd` is
processed close to the slot start, a VC may therefore miss duties.
Adding a delay before `onSlotEnd` improves this situation.
The logic can be optimized further if `ActionTracker` would track
`knownValidators` from REST separately from in-process ones.
To enable additional use cases, e.g., `/states/###/randao` beacon API,
`ShufflingRef` acceleration logic needs to be able to operate on parts
of the DAG that do not have `BlockRef`. Changing `commonAncestor` to
act on `BlockId` instead of `BlockRef` is a step toward that and also
simplifies the logic some more.
* fall back to non-fcu fork choice on epoch boundaries
* Future[bool]
* fix
* Update beacon_chain/consensus_object_pools/consensus_manager.nim
Co-authored-by: Etan Kissling <etan@status.im>
* make things consistent with Opt[void] return
---------
Co-authored-by: Etan Kissling <etan@status.im>
Post-merge blocks contain all information to directly obtain RANDAO
without having to load any additional info. Take advantage of that to
further accelerate `ShufflingRef` computation. Note that it is still
necessary to verify that `blck` / `state` share a sufficiently recent
ancestor for the purpose of computing attester shufflings.
- new: 243.71s, 239.67s, 237.32s, 238.36s, 239.57s
- old: 251.33s, 234.29s, 249.28s, 237.03s, 236.78s
Current RANDAO recovery logic is quite complex as it optimizes for the
minimum amount of database reads. Loading blocks isn't the bottleneck
though, so rather make the implementation more concise by avoiding the
complex strategy planning step. Note that this also prepares for an even
faster implementation for post-merge blocks in the future that extracts
RANDAO from `ExecutionPayload` directly if available, so even in cases
where efficiency is slightly lower, only historical data is affected.
`time nim c -r tests/test_blockchain_dag` (cached binary):
- new: 145.45s, 133.59s, 144.65s, 127.69s, 136.14s
- old: 149.15s, 150.84s, 135.77s, 137.49s, 133.89s
* Perform block pre-check before validating execution
When syncing, blocks have not been gossip-validated and are therefore
prone to trivial faults like being known-unviable, duplicate or missing
their parent.
In addition, the duplicate-block check in BlockProcessor was not
considering the quarantine flow and would therefore cause
recently-quarantined blocks to be silenty dropped when their parent
appears delaying the sync end-game and thus causing longer startup
resync time.
This PR verifies trivial conditions before performing execution
validation thus avoiding duplicates and missing parents alike.
It also ensures that the fast-sync EL mode is used for finalized blocks
even if the EL is timing out / slow to respond - this allows the CL to
complete its sync faster and switch to "normal" lock-step at the head of
the chain more quickly, thus also allowing the EL to access the latest
consensensus information earlier.
* oops
* remove unused constant
When the requestmanager is busy fetching blocks, the queue might get
filled with multiple entries of the same root - since there is no
deduplication, requests containing the same root multiple times will be
sent out.
Also, because the items sit in the queue for a long time potentially,
the request might be stale by the time that the manager is ready with
the previous request.
This PR removes the queue and directly fetches the blocks to download
from the quarantine which solves both problems (the quarantine already
de-duplicates and is clean of stale information).
Removing the queue for blobs is left for a future PR.
Co-authored-by: tersec <tersec@users.noreply.github.com>
* early exit `commonAncestor` when comparing with `finalizedHead`
As all `BlockRef` lead to `finalizedHead` (`parent == nil`),
can shortcut in that situation and immediately return `finalizedHead`
if passed as one of the arguments.
* typo in comment
* add test from #5152
Co-authored-by: tersec <tersec@users.noreply.github.com>
* add note about test complexity
* regenerate test summary
---------
Co-authored-by: tersec <tersec@users.noreply.github.com>