`Eth2NetworkMetadata` has an `incompatible` case to hold an error string
in case the loaded file is not compatible with the compile-time config.
The same can be modeled with a `Result[Eth2NetworkMetadata, string]` and
avoids followup checks for the `incompatible` case.
We use file descriptors for validators and sockets and might run out of
either on high-validator setups - increasing the limit here is harmless
and avoids a common limiting factor in setup
Co-authored-by: Etan Kissling <etan@status.im>
For symmetry with `forkyState` when using `withState`, and to avoid
problems with shadowing of `blck` when using `withBlck` in `template`,
also rename the injected `blck` to `forkyBlck`.
- https://github.com/nim-lang/Nim/issues/22698
* implement EIP-7514 for Deneb: Add Max Epoch Churn Limit
Cap activations per epoch according to EIP-7514:
- https://eips.ethereum.org/EIPS/eip-7514
- https://github.com/ethereum/consensus-specs/pull/3499
* apply proposer boost to first block in case of equivocation
Implement spec changes to fork choice; this only affects equivocation
when multiple blocks are signed for the same slot. Regular operation
is not changed.
- https://github.com/ethereum/consensus-specs/pull/3352
* bump test vectors to v1.4.0-beta.2-hotfix
---------
Co-authored-by: tersec <tersec@users.noreply.github.com>
Currently, passing `0xc00000...` proof seems to pass `verifyProofs`.
Unsure why such a check is not necessary in spec, and also unsure
whether it is correct to reject proof at infinity, or if it could
occur, e.g., for a blob containing all 0 bytes. Weird overall...
* proper fix
We currently compute `justified_total_active_balance` inside
`calculateProposerBoost`, despite that sum already being known
in the `EpochRef` cache. Tracking `justified_total_active_balance`
whenever the justified checkpoint updates allows replacing the
repeated computation with a lookup, at minimal memory cost.
Currently we always request duties for current and next sync period.
As sync periods are quite long (~27 hrs on Mainnet), having access to
the duties so early doesn't help too much. To avoid running into errors
when the BN does not have the duties available around period boundary,
delay requesting them until the current period is close to finish.
`SYNC_COMMITTEE_SUBNET_COUNT` epochs are what the spec says should be
the lookahead timing of starting to subscribe to sync committee gossip.
Reusing the constant here for consistency.
This fixes these warning messages in the first slot of a new period.
```
rocketpool_validator | WRN 2023-09-07 20:19:35.439+00:00 Beacon node has incompatible configuration reason="Epoch value is far from the future;400;getSyncCommitteeDuties(first);invalid-request" node=http://eth2:5052[Nimbus/v23.8.0-872b19-stateofus] node_index=0 node_roles=AGBSDT
rocketpool_validator | WRN 2023-09-07 20:19:35.440+00:00 Unable to get sync committee duties period=889 epoch=227584 reason="Epoch value is far from the future;400;getSyncCommitteeDuties(first);invalid-request" service=duties_service
rocketpool_validator | NOT 2023-09-07 20:19:35.441+00:00 Beacon node is in sync head_slot=7274495 sync_distance=1 is_optimistic=false node=http://eth2:5052[Nimbus/v23.8.0-872b19-stateofus] node_index=0 node_roles=AGBSDT
```
Sync committee duties are performed by the sync committee as determined
by slot + 1. We did it correctly for individual messages, but selected
the incorrect participants for aggregate contributions for the very
first slot per period (roughly 1 per ~27 hrs on Mainnet). The faulty
participants selection code was originally introduced in #2925.
To allow testing https://github.com/ethereum/consensus-specs/issues/3466
add support for selecting fork choice version at launch. This means we
can deploy a different logic when `DENEB_FORK_EPOCH != FAR_FUTURE_EPOCH`
that won't be used on Mainnet.
In `Slot end`, sync duties are reported one slot too late,
because sync committee duties are determined by the next slot
instead of the current one. Address that by taking account of this.
* Add metadata for the Holesky network
* Add copyright banner to the new Nim module
* Working version
* Bump Chronos to fix downloading from Github
* Add checksum check of the downloaded file
* Clean up debugging code and obsolete imports
* optimize epoch transition via get_flag_index_deltas() and get_inactivity_penalty_deltas()
* directly mutate Altair info in {get,update}_flag_and_inactivity_deltas
The `is_next_sync_committee_known` helper allocates a fresh
`SyncCommittee` in the caller and `nimZeroMem`s it on each use.
Use `static` to compare against a compile-time zeroed copy instead.
This also should help reduce stack size far enough to link with Nim 2.0.
Nim 2.0 gets confused when compiling `all_tests`:
```
Error: undeclared identifier: 'maxLen'
candidates (edit distance, scope distance); see '--spellSuggest':
(3, 7): 'Table'
(3, 7): 'len'
(3, 7): 'max'
```
Renaming the generic parameter `U` to `maxLen` fixes this somehow.
It also increases readability to use the same name consistently.
In Nim 2.0, attempting to use `Taskpool.spawn` inside `{.async.}` `proc`
leads to `Error: cannot generate destructor for generic type: Isolated`.
Add an intermediate wrapper `proc` that performs the `spawn` operation
to workaround the problem.
From old interop tests, a mock `eth1BlockHash` was defined in `base`.
To avoid accidental use by Nimbus, move to `tests` and rename it to
`mockEth1BlockHash`.
This PR renames the existing `validator_duties` to `beacon_validators`
and in doing so, names validators running inside the beacon node process
"beacon validators" while those running the VC can be referred to as
"client validators" to disambiguate the two.
The existing `validator_duties` instead takes on a new responsibility:
as a home for logic shared between beacon and client validators - ie
code that provides consistency in implementation and behavior between
the two modes of operation.
Not only does this simplify reasoning about where to put code -it also
reduces the number of dependencies the validator client has from ~5000
to ~3000 modules (!) according to `nim genDepend` significantly reducing
compile times.
- Remove unnecessary `Defect` references
- Remove spurious `SerializationError` references
- Remove duplicate `writeValue` template in `keystore.nim`;
same implementation already exists a bit further above in same file.
Add separate log topic for `block_processor` messages.
Topic named similar to the other `_processor` modules:
- `eth2_processor` --> `gossip_eth2`
- `light_client_processor` --> `gossip_lc`
- `optimistic_processor` --> `gossip_opt`
In #4465, a regression was introduced by deleting the periodic call to
`engine_exchangeTransitionConfiguration` in the Nimbus light client.
This led to the light client never treating the EL as online and,
subsequently, not sending `engine_newPayload` requests for new blocks.
Strangely, `engine_forkchoiceUpdated` requests still make it through :)
Geth still requires both `engine_newPayload` and `fcU` to be called.
By restoring the `exchangeTransitionConfiguration` loop, `newPayload`
requests are once more issued to the EL.
When a block is introduced to the system both via REST and gossip at the
same time, we will call `storeBlock` from two locations leading to a
dupliace check race condition as we wait for the EL.
This issue may manifest in particular when using an external block
builder that itself publishes the block onto the gossip network.
* refactor enqueue flow
* simplify calling `addBlock`
* complete request manager verifier future for blobless blocks
* re-verify parent conditions before adding block
among other things, it might have gone stale or finalized between one
call and the other
every attestation is processed with a new wall time so we end up
iterating over all attestations for every attestation we queue - this is
4% of cpu time on a subscribe-all-subnets node
* remove redundant zero checks - block root must be an existing block
and therefore cannot be zero
* simplify "hasn't-voted" check to root only (isZeroMemory is dubiously
implemented for objects)
* Add support for POST /eth/v2/beacon/blocks
* More descriptive errors
* Address review feedback
* Return 500 (not 400) for a missing implementation case
We know the aggregate publickey of a fully participating sync committee.
Because participation is typically very high (>95%), it is faster to
start from that aggregate publickey and subtract the individual keys of
non-participants, than summing up all the participating pubkeys.
To obtain the correct `transactions_root` and `withdrawals_root`,
it is necessary to process execution block header. Light client updates
don't contain the correct MPT roots.
Using `create` with objects containing managed objects is broken:
https://github.com/nim-lang/Nim/issues/22341
Switch to a safer pattern based on `new+GC_ref`/`GC_unref` instead.
* async batch verification
When batch verification is done, the main thread is blocked reducing
concurrency.
With this PR, the new thread signalling primitive in chronos is used to
offload the full batch verification process to a separate thread
allowing the main threads to continue async operations while the other
threads verify signatures.
Similar to previous behavior, the number of ongoing batch verifications
is capped to prevent runaway resource usage.
In addition to the asynchronous processing, 3 addition changes help
drive throughput:
* A loop is used for batch accumulation: this prevents a stampede of
small batches in eager mode where both the eager and the scheduled batch
runner would pick batches off the queue, prematurely picking "fresh"
batches off the queue
* An additional small wait is introduced for small batches - this helps
create slightly larger batches which make better used of the increased
concurrency
* Up to 2 batches are scheduled to the threadpool during high pressure,
reducing startup latency for the threads
Together, these changes increase attestation verification throughput
under load up to 30%.
* fixup
* Update submodules
* fix blst build issues (and a PIC warning)
* bump
---------
Co-authored-by: Zahary Karadjov <zahary@gmail.com>
This PR removes a few hundred thousand temporary seq allocations during
state transition - in particular, the flag seq was allocated per
validator while committees are computed per attestation.
Split up the `ShufflingRef` acceleration logic into generically usable
parts and attester shuffling specific parts. The generic parts could be
used to accelerate other purposes, e.g., REST `/states/xxx/randao` API.
* speed up state/block loading
When loading blocks and states from db/era, we currently redundantly
check their CRC32 - for a state, this costs 50ms of loading time
presently (110mb uncompressed size) on a decent laptop.
* remove `maxDecompressedDbRecordSize` - not actually used on recent
data since we store the framed format - also, we're in luck: we blew
past the limit quite some time ago
* fix obsolete exception-based error checking
* avoid `zeroMem` when reading from era store
see https://github.com/status-im/nim-snappy/pull/22 for benchmarks
* bump snappy
* Add new REST endpoints to monitor REST server connections and new chronos metrics.
* Bump head versions of chronos and presto.
* Bump chronos with regression fix.
* Remove outdated tests which was supposed to test pipeline mode.
* Disable pipeline mode in resttest.
* Update copyright year.
* Upgrade test_signing_node to start use AsyncProcess instead of std library's osproc.
Bump chronos to check graceful shutdown.
* Update AllTests.
* Bump chronos.
Split up the `ShufflingRef` acceleration logic into generically usable
parts and attester shuffling specific parts. The generic parts could be
used to accelerate other purposes, e.g., REST `/states/xxx/randao` API.
When producing a local block and `runProposalForkchoiceUpdated` was
missed, `getPayloadFromSingleEL` adds additional ~500ms of latency.
Quick fix to avoid missing blocks to that.
We currently call `onSlotEnd` whenever all in-BN validator duties are
completed. VC validator duties are not awaited. When `onSlotEnd` is
processed close to the slot start, a VC may therefore miss duties.
Adding a delay before `onSlotEnd` improves this situation.
The logic can be optimized further if `ActionTracker` would track
`knownValidators` from REST separately from in-process ones.
To enable additional use cases, e.g., `/states/###/randao` beacon API,
`ShufflingRef` acceleration logic needs to be able to operate on parts
of the DAG that do not have `BlockRef`. Changing `commonAncestor` to
act on `BlockId` instead of `BlockRef` is a step toward that and also
simplifies the logic some more.
* fall back to non-fcu fork choice on epoch boundaries
* Future[bool]
* fix
* Update beacon_chain/consensus_object_pools/consensus_manager.nim
Co-authored-by: Etan Kissling <etan@status.im>
* make things consistent with Opt[void] return
---------
Co-authored-by: Etan Kissling <etan@status.im>
Post-merge blocks contain all information to directly obtain RANDAO
without having to load any additional info. Take advantage of that to
further accelerate `ShufflingRef` computation. Note that it is still
necessary to verify that `blck` / `state` share a sufficiently recent
ancestor for the purpose of computing attester shufflings.
- new: 243.71s, 239.67s, 237.32s, 238.36s, 239.57s
- old: 251.33s, 234.29s, 249.28s, 237.03s, 236.78s
Current RANDAO recovery logic is quite complex as it optimizes for the
minimum amount of database reads. Loading blocks isn't the bottleneck
though, so rather make the implementation more concise by avoiding the
complex strategy planning step. Note that this also prepares for an even
faster implementation for post-merge blocks in the future that extracts
RANDAO from `ExecutionPayload` directly if available, so even in cases
where efficiency is slightly lower, only historical data is affected.
`time nim c -r tests/test_blockchain_dag` (cached binary):
- new: 145.45s, 133.59s, 144.65s, 127.69s, 136.14s
- old: 149.15s, 150.84s, 135.77s, 137.49s, 133.89s
* Perform block pre-check before validating execution
When syncing, blocks have not been gossip-validated and are therefore
prone to trivial faults like being known-unviable, duplicate or missing
their parent.
In addition, the duplicate-block check in BlockProcessor was not
considering the quarantine flow and would therefore cause
recently-quarantined blocks to be silenty dropped when their parent
appears delaying the sync end-game and thus causing longer startup
resync time.
This PR verifies trivial conditions before performing execution
validation thus avoiding duplicates and missing parents alike.
It also ensures that the fast-sync EL mode is used for finalized blocks
even if the EL is timing out / slow to respond - this allows the CL to
complete its sync faster and switch to "normal" lock-step at the head of
the chain more quickly, thus also allowing the EL to access the latest
consensensus information earlier.
* oops
* remove unused constant
When the requestmanager is busy fetching blocks, the queue might get
filled with multiple entries of the same root - since there is no
deduplication, requests containing the same root multiple times will be
sent out.
Also, because the items sit in the queue for a long time potentially,
the request might be stale by the time that the manager is ready with
the previous request.
This PR removes the queue and directly fetches the blocks to download
from the quarantine which solves both problems (the quarantine already
de-duplicates and is clean of stale information).
Removing the queue for blobs is left for a future PR.
Co-authored-by: tersec <tersec@users.noreply.github.com>
* early exit `commonAncestor` when comparing with `finalizedHead`
As all `BlockRef` lead to `finalizedHead` (`parent == nil`),
can shortcut in that situation and immediately return `finalizedHead`
if passed as one of the arguments.
* typo in comment
* add test from #5152
Co-authored-by: tersec <tersec@users.noreply.github.com>
* add note about test complexity
* regenerate test summary
---------
Co-authored-by: tersec <tersec@users.noreply.github.com>
With `v1.6.14` there is compilation issue in `trusted_node_sync` where
a type is not inferred automatically anymore for a `nil` instance.
Fix it so we can bump the compiler.
See https://github.com/status-im/nimbus-build-system/pull/63
* Make VC able to understand any type of `/eth/v1/config/spec` response without any changes in source code.
Update compatibility checking.
Now VC is able to obtain any constant from `spec` call.
* Remove RestSpecVC declaration.
* Initial commit.
* Add algorithm in comment.
Remove delays.
Fix logging statement issues.
Change update from epoch to slot.
* Obtain timestamp earlier.
* Add processing delays into algorithm.
* Fix time offset logging to produce integers instead of strings.
* Address review comments.
* Fix copyright year.
Fix updateStatus().
* Remove fields from Slot start log statement.
Fix issues when BN do not support Nimbus Extensions.
Rename metric name and type change.
* Add beacon role to disable time offset check manually.
These tables can't be deleted from (read-only) and would be too slow to
delete from anyway due to the inefficient storage format in use.
* slow down startup clearing too
* remove unused del function
These tables can't be deleted from (read-only) and would be too slow to
delete from anyway due to the inefficient storage format in use.
* slow down startup clearing too
* remove unused del function
The helper function to compute delay until next light client sync task
can be useful from more general purpose contexts. Move to helpers, and
change it to return `Duration` instead of `BeaconTime` for flexibility.
- When syncing `LightClientUpdatesByRange`, and peer replies with
fewer periods than requested, no need to delay next request.
- When `FinalityUpdate` / `OptimisticUpdate` sync fails,
no need to retry immediately.
The `UpdatesByRange` API takes `startPeriod / count`, but is internally
called by `Slice`. Move the logic that converts from the `Slice` to the
caller to reduce complexity inside the used `doRequest` function.
We have several modules that import `nim-eth` for the sole purpose of
its `keys.newRng` function. This function is meanwhile a simple wrapper
around `nim-bearssl`'s `HmacDrbgContext.new()`, so the import doesn't
really serve a use anymore. Replace `keys.newRng` with the direct call
to reduce `nim-eth` imports.
* require sync committee supermajority in CI
To better catch problems with sync committee messages in CI, extend
local testnet simulation to also verify that each block is signed
by a supermajority of the sync committee.
Requires #5083 and #5084
* lint
`produceSyncAggregate` is called in new slot when block is produced,
while the other functions in `sync_committee_msg_pool` are called in
previous slot. So, need to subtract 1 slot when producing sync aggregate
to accept the signatures using the old digest during fork transition.
Override default Json parsing for keystore case objects to avoid
`CaseTransition` logic to be emitted. When parsing the discriminator,
reinitialize the entire object instead of implicitly changing it,
to avoid UB and also possible oversights when extending the object.
See https://github.com/status-im/nim-serialization/pull/59
Before assigning to `genesisData` or returning `cfg`, have to check that
metadata is not `incompatible` to avoid `ProveField` warning.
The way how we use it was already correct (`incompatible` unreachable).
Use `case` syntax to silence the warning, and add comments referring to
the existing checks that make `incompatible` unreachable.
* ensure sync duties for next epoch are registered in time
For attestations, VC queries duties for current and next epoch.
For sync messages, VC queries for current and next period (if soon).
This means that for sync messages we don't actually have the duties for
next epoch in all situations, leading to situation where VC may miss
sync duties in the final slot of an epoch when using. As duties remain
same within a sync committee period, simply copy them over to next epoch
to avoid this situation without adding network latency.
* Update beacon_chain/validator_client/duties_service.nim
Co-authored-by: Jacek Sieka <jacek@status.im>
---------
Co-authored-by: Jacek Sieka <jacek@status.im>
VC currently misses sync committee duties for first slot of most epochs
because the 1 slot offset is not taken into account. Duties for the next
slot must be used during any given current slot. We use the correct slot
for processing the duty, but do not use the correct slot for fetching.
`nim-serialization` is tagged with `{.raises:[SerializationError].}` so
it is no longer sufficient to catch `SszError` in some situations.
`SszError` inherits from `SerializationError`, so broadening the caught
exception types can be done now, to enable bumping `nim-serialization`.
https://github.com/status-im/nimbus-eth2/pull/5043#issuecomment-1584227993#5061 is also needed to bump `nim-serialization`.
Cleanup for `ProveField` warnings in `keystore` module.
Note that `ProveField` is disabled by default in makefile, but sometimes
these pop up when doing a regular `nim c`, and cleaning these may allow
enabling the warning in some future.
* split file loading from parsing in helpers
In `readSszForkedHashedBeaconState` and `readRuntimeConfig`, split the
part that loads the file from the part that parses the file. The parsing
portion can be reused with that, e.g., when loading from the network.
* add missing export marker
* Refactor api.nim to provide more informative failure reasons.
Distinct between unexpected data and unexpected code.
Deprecate Option[T] usage.
* Fix 400 for produceBlindedBlock().
Get proper string conversion for strategy.
* Fix SSZ encoded versions of ProduceBlockResponseV2, ProduceBlockResponseV2 can be received and decoded.
Fix done() warnings.
Bump presto.
* Fix compilation error with new presto.
Use TcpNoDelay option for Web3Signer.
* Fix produceBlockV2() should provide SSZ responses too.
* Address block encoding issue.
* Fix signing test.
* Bump presto.
* Address review comments.
Use separate functions per format when parsing LC data from REST.
This allows to process events from the eventstream more directly,
as they are always JSON not SSZ. And also makes the code cleaner.
Assigning to fields of `var` case objects emits `ProveField` warnings.
We disable them in `make` based builds but they may pop up when building
manually with `nim c`. Suppress them for the `keyGen` function, as we
assign to `result.value` separately from `result.ok` to avoid copying.
* `ProveField` cleanups in `forks`
Some more cleanup for `ProveField` warnings in `forks` module.
Note that `ProveField` is disabled by default in makefile, but sometimes
these pop up when doing a regular `nim c`, and cleaning these may allow
enabling the warning in some future.
* use syntax that works if passed to multiple args of call
When using trusted node sync with `--trusted-block-root`, the remote
server is only trusted for data availability, not for correctness.
As a downloaded genesis state cannot be validated for correctness,
require it to be passed via the network metadata `genesis.ssz` file
for `--trusted-block-root` mode. Network metadata is considered trusted
as it is provided by the user and not by the remote server.
Further adds a check for consistent `genesis_time` when using `StateId`
based trusted node sync. This is just a sanity check to avoid spreading
blatantly incorrect data, similar to existing `genesis_validators_root`
checks.
* Initial commit with both methods enabled: `poll` and `event`.
* Address review comments.
* Address review comments.
Fix copyright years.
* After bump fixes.
Since #3976, CORS functionality is broken. Fix it to work again:
- Use `--rest-allowed-origin` instead of `--keymanager-allowed-origin`
to specify CORS `Access-Control-Allow-Origin` header for beacon-APIs.
- Actually pass CORS config to `nim-presto` once more.