Nimbus currently stops performing validator duties if the blockchain
does not progress for `node.config.syncHorizon` slots. This means that
the chain won't recover because no new blocks are proposed. To fix that,
continue performing validator duties if no progress is registered for a
long time, and none of our peers is indicating any progress.
#6087 introduced a subtle change to `nim-web3` resulting in `Gwei` to be
serialized differently than before. Using a `distinct` type for `Gwei`
improves type safety and avoids such problems in the future.
On Goerli there are some instances of long streaks of empty epochs due
to different branches being built in parallel. They sometimes lead to
`Request for pruned historical state` logs requiring a BN restart to
resolve. Avoid that by trying to restore states from the entire non-
finalized history, to avoid losing sync in such situtions.
When a config defines a different `INACTIVITY_SCORE_RECOVERY_RATE` than
the default, `process_inactivity_updates` uses an incorrect rate ever
since #2710 when `INACTIVITY_SCORE_RECOVERY_RATE` became configurable.
When there are long periods of non-finality, `nodeIsViableForHead` has
been observed to consume significant time as it repeatedly walks the
non-finalized check graph as part of determining what heads are eligible
for fork choice. Caching the result resolves that.
Overall, it may still be better to prune fork choice more aggressively
when finality advances, to fully avoid the case specced out using the
linear scan. The current implementation is very close to spec, though,
so such a change should not be introduced without thorough testing.
The simple cache should allow significantly better performance on Goerli
while the network is still supported (Mid April).
In `block_dag` there is a max depth of 100 years configured to detect
internal inconsistencies, e.g., circular references. As `BlockRef` was
changed long ago to only reflect the non-finalized chain segment, the
theoretically supported max depth can be reduced and simplified.
We don't need the `cfg` right now, but it makes sense to have the object
passed to the clock so that the API doesn't break if we want to support
configurable `SECONDS_PER_SLOT`. As the `libnimbus_lc` library is not
yet widely used, better to add the argument now than later.
The `syncHorizon` describes the number of empty slots before the beacon
node considers itself to be out of sync. There are two places where we
currently set this to 50 slots, but it makes more sense to base it on
wall time, e.g., the 10 minutes that the default 50 are derived from.
* allow specifying get_proposer_reward block root at state.slot
* Add consensus_block_value calculation.
* Address review comments.
* Post-rebase adjustments.
* Use proper state to calculate consensus block value.
* Revert "allow specifying get_proposer_reward block root at state.slot"
This reverts commit 9fef9a8199f63056060527ac2531acc3b0ed8dcb.
* Fix post-revert problems.
Return back to Gwei.
* Adding test which is not working.
* Do not use test suite if it does not have post-state.
* Add debug logging.
* Increase logging to track sources of balance changes.
* Fix sync committee rewards/penalties calculation.
* Revert "Increase logging to track sources of balance changes."
This reverts commit 32feb20f2fdb66521401710866cd59ecc9951ef8.
* Adopt new vision to block rewards.
* Add block produce logging to VC.
* Remove rewards.nim.
* Eliminate toWei changes.
* Improve UInt256 shortLog.
* Fix conversion procedure.
* Address review comments.
* Fix test.
* Revert "Fix test."
This reverts commit 4948b2c1ec.
---------
Co-authored-by: tersec <tersec@users.noreply.github.com>
Co-authored-by: Etan Kissling <etan@status.im>
Provide additional context in the `syncEth1Chain tick` debug log to aid
with understanding of flow when debugging on a more precise basis than
just having the metrics.
Corrects a regression from #5998 that led to crashes in #6046.
In `trustedNodeSync` mode, the config does not contain genesis keys,
so attempting to load from them is a `Defect`.
Fix the `/eth/v1/beacon/deposit_snapshot` API to produce proper EIP-4881
compatible `DepositTreeSnapshot` responses. The endpoint used to expose
a Nimbus-specific database internal format.
Also fix trusted node sync to consume properly formatted EIP-4881 data
with `--with-deposit-snapshot`, and `--finalized-deposit-tree-snapshot`
beacon node launch option to use the EIP-4881 data. Further ensure that
`ncli_testnet` produces EIP-4881 formatted data for interoperability.
EIP-4881 was never correctly implemented, the `DepositTreeSnapshot`
structure has nothing to do with its actual definition. Reflect that
by renaming the type to a Nimbus-specific `DepositContractSnapshot`,
so that an actual EIP-4881 implementation can use the correct names.
- https://eips.ethereum.org/EIPS/eip-4881#specification
Notably, `DepositTreeSnapshot` contains a compressed sequence in
`finalized`, only containing the minimally required intermediate roots.
That also explains the incorrect REST response reported in #5508.
The non-canonical representation was introduced in #4303 and is also
persisted in the database. We'll have to maintain it for a while.
`nextActionWait` currently shows `n/a` if only proposal is scheduled
but no attestation, e.g., attestation was already made for current
epoch and validator is exiting next epoch so doesn't have another
attestation lined up. It's an edge case but it's still more correct
to also log `nextActionWait` if only proposal is scheduled.
When using `--external-beacon-api-url`, one has to accompany it with
either `--trusted-block-root` or `--trusted-state-root`. If neither is
specified, we can fallback to a deeply finalized noncontroversial block
root. For networks that started post Altair, e.g., Holesky, the genesis
block root fulfills that requirement, as in, it is implicitly trusted.
Therefore, if only `--external-beacon-api-url` is provided without any
`--trusted-block-root` or `--trusted-state-root`, use genesis block root
if it is a viable starting point (post-Altair).
```
build/nimbus_beacon_node \
--network=holesky \
--data-dir="$HOME/Downloads/nimbus/data/holesky" \
"--external-beacon-api-url=http://unstable.holesky.beacon-api.nimbus.team" \
--tcp-port=9010 --udp-port=9010 \
--rest --log-level=DEBUG \
--no-el
```
The exports in `eth2_discovery` produce deprecation warnings as they
refer to `close`, `closeWait` and so on. Turns out that they are not
necessary at all. The `Eth2DiscoveryProtocol` is even already exported
two lines above using `*` marker...
During refactoring of #5959, some implicit `return` were overlooked,
resulting in spurious `err()` being returned without message.
```
{"lvl":"WRN","ts":"2024-02-26 10:12:20.469+00:00","msg":"Beacon nodes report different configuration values","reason":"","service":"fallback_service","node":"http://127.0.0.1:9303[Nimbus/v24.2.1-4e9bc7-stateofus]","node_index":0,"node_roles":"AGBSDT"}
```
Correcting the helpers to return explicit result in all exhaustive
cases so that this cannot happen anymore by accident.
* add EIP-7044 support to keymanager API
When trying to sign `VoluntaryExit` via keymanager API, the logic is not
yet aware of EIP-7044 (part of Deneb). This patch adds missing EIP-7044
support to the keymanager API as well.
As part of this, the VC needs to become aware about:
- `CAPELLA_FORK_VERSION`: To correctly form the EIP-7044 signing domain.
The fork schedule does not indicate which of the results, if any,
corresponds to Capella.
- `CAPELLA_FORK_EPOCH`: To detect whether Capella was scheduled.
If a BN does not have it in its config while other BNs have it,
this leads to a log if Capella has not activated yet, or marks the BN
as incompatible if Capella already activated.
- `DENEB_FORK_EPOCH`: To check whether EIP-7044 logic should be used.
Related PRs:
- #5120 added support for processing EIP-7044 `VoluntaryExit` messages
as part of the state transition functions (tested by EF spec tests).
- #5953 synced the support from #5120 to gossip validation.
- #5954 added support to the `nimbus_beacon_node deposits exit` command.
- #5956 contains an alternative generic version of `VCForkConfig`.
* address reviewer feedback: letter case, module location, double lookup
---------
Co-authored-by: cheatfate <eugene.kabanov@status.im>
* Update beacon_chain/rpc/rest_constants.nim
* move `VCRuntimeConfig` back to `rest_types`
---------
Co-authored-by: cheatfate <eugene.kabanov@status.im>
* fix `getForkVersion` helper
---------
Co-authored-by: cheatfate <eugene.kabanov@status.im>
In #5120, the `nimbus_beacon_node deposits exit` command was updated for
compatibility with EIP-7044, which forces signatures to be made using
`CAPELLA_FORK_VERSION` regardless of the `VoluntaryExit`'s `epoch` after
Deneb is activated.
This update had a regression, as an older mechanism was used to fetch
`RuntimeConfig`, resulting in an encoding issue (#5362). This was then
fixed in #5370, restoring general `deposits exit` functionality.
However, the logic from #5120 has another flaw, as it uses an incorrect
fork version based on the pre-Deneb logic even after Deneb and EIP-7044
are activated. Fix this now, so that `deposits exit` continues to work
correctly after Deneb activates.
In #5120, EIP-7044 support got added to the state transition function to
force `CAPELLA_FORK_VERSION` to be used when validiting `VoluntaryExit`
messages, irrespective of their `epoch`.
In #5637, similar logic was added when batch verifying BLS signatures,
which is used during gossip validation (libp2p gossipsub, and req/resp).
However, that logic did not match the one introduced in #5120, and only
uses `CAPELLA_FORK_VERSION` when a `VoluntaryExit`'s `epoch` was set to
a value `>= CAPELLA_FORK_EPOCH`. Otherwise, `BELLATRIX_FORK_VERSION`
would still be used when validating `VoluntaryExit`, e.g., with `epoch`
set to `0`, as is the case in this Holesky block:
- https://holesky.beaconcha.in/slot/1076985#voluntary-exits
Extracting the correct logic from #5120 into a function, and reusing it
when verifying BLS signatures fixes this issue, and also leverages the
exhaustive EF test suite that covers the (correct) #5120 logic.
This fix only affects networks that have EIP-7044 applied (post-Deneb).
Without the fix, Deneb blocks with a `VoluntaryExit` with `epoch` set to
`< CAPELLA_FORK_EPOCH` incorrectly fail to validate despite being valid.
Incorrect blocks that contain a malicious `VoluntaryExit` with `epoch`
set to `< CAPELLA_FORK_EPOCH` and signed using `BELLATRIX_FORK_VERSION`
_would_ pass the BLS verification stage, but subsequently fail the state
transition logic. Such blocks would still correctly be labeled invalid.
During refactoring of #5959, some implicit `return` were overlooked,
resulting in spurious `err()` being returned without message.
```
{"lvl":"WRN","ts":"2024-02-26 10:12:20.469+00:00","msg":"Beacon nodes report different configuration values","reason":"","service":"fallback_service","node":"http://127.0.0.1:9303[Nimbus/v24.2.1-4e9bc7-stateofus]","node_index":0,"node_roles":"AGBSDT"}
```
Correcting the helpers to return explicit result in all exhaustive
cases so that this cannot happen anymore by accident.
* add EIP-7044 support to keymanager API
When trying to sign `VoluntaryExit` via keymanager API, the logic is not
yet aware of EIP-7044 (part of Deneb). This patch adds missing EIP-7044
support to the keymanager API as well.
As part of this, the VC needs to become aware about:
- `CAPELLA_FORK_VERSION`: To correctly form the EIP-7044 signing domain.
The fork schedule does not indicate which of the results, if any,
corresponds to Capella.
- `CAPELLA_FORK_EPOCH`: To detect whether Capella was scheduled.
If a BN does not have it in its config while other BNs have it,
this leads to a log if Capella has not activated yet, or marks the BN
as incompatible if Capella already activated.
- `DENEB_FORK_EPOCH`: To check whether EIP-7044 logic should be used.
Related PRs:
- #5120 added support for processing EIP-7044 `VoluntaryExit` messages
as part of the state transition functions (tested by EF spec tests).
- #5953 synced the support from #5120 to gossip validation.
- #5954 added support to the `nimbus_beacon_node deposits exit` command.
- #5956 contains an alternative generic version of `VCForkConfig`.
* address reviewer feedback: letter case, module location, double lookup
---------
Co-authored-by: cheatfate <eugene.kabanov@status.im>
* Update beacon_chain/rpc/rest_constants.nim
* move `VCRuntimeConfig` back to `rest_types`
---------
Co-authored-by: cheatfate <eugene.kabanov@status.im>
* fix `getForkVersion` helper
---------
Co-authored-by: cheatfate <eugene.kabanov@status.im>
Fix regression from #5842 where `Eth-Execution-Payload-Value` is parsed
into `consensusValue` instead of `Eth-consensus-Block-Value`. We don't
use those values for now, but fixing avoids hard-to-debug bugs later.
In #5120, the `nimbus_beacon_node deposits exit` command was updated for
compatibility with EIP-7044, which forces signatures to be made using
`CAPELLA_FORK_VERSION` regardless of the `VoluntaryExit`'s `epoch` after
Deneb is activated.
This update had a regression, as an older mechanism was used to fetch
`RuntimeConfig`, resulting in an encoding issue (#5362). This was then
fixed in #5370, restoring general `deposits exit` functionality.
However, the logic from #5120 has another flaw, as it uses an incorrect
fork version based on the pre-Deneb logic even after Deneb and EIP-7044
are activated. Fix this now, so that `deposits exit` continues to work
correctly after Deneb activates.
In #5120, EIP-7044 support got added to the state transition function to
force `CAPELLA_FORK_VERSION` to be used when validiting `VoluntaryExit`
messages, irrespective of their `epoch`.
In #5637, similar logic was added when batch verifying BLS signatures,
which is used during gossip validation (libp2p gossipsub, and req/resp).
However, that logic did not match the one introduced in #5120, and only
uses `CAPELLA_FORK_VERSION` when a `VoluntaryExit`'s `epoch` was set to
a value `>= CAPELLA_FORK_EPOCH`. Otherwise, `BELLATRIX_FORK_VERSION`
would still be used when validating `VoluntaryExit`, e.g., with `epoch`
set to `0`, as is the case in this Holesky block:
- https://holesky.beaconcha.in/slot/1076985#voluntary-exits
Extracting the correct logic from #5120 into a function, and reusing it
when verifying BLS signatures fixes this issue, and also leverages the
exhaustive EF test suite that covers the (correct) #5120 logic.
This fix only affects networks that have EIP-7044 applied (post-Deneb).
Without the fix, Deneb blocks with a `VoluntaryExit` with `epoch` set to
`< CAPELLA_FORK_EPOCH` incorrectly fail to validate despite being valid.
Incorrect blocks that contain a malicious `VoluntaryExit` with `epoch`
set to `< CAPELLA_FORK_EPOCH` and signed using `BELLATRIX_FORK_VERSION`
_would_ pass the BLS verification stage, but subsequently fail the state
transition logic. Such blocks would still correctly be labeled invalid.
This PR allows sharing the pubkey data between validators by using a
thread-local cache for pubkey data, netting about a 400mb mem usage
reduction on holesky due to us keeping 3 permanent + several ephemeral
state copies in memory at all times and each state copy holding a full
validator.
The PR also introduces a hash cache for the key which gives ~14% speedup
for a full state `hash_tree_root` - the key makes up for a large part of
the `Validator` htr time.
Finally, the time it takes to copy a state goes down as well from ~80m
ms to ~60, for reasons similar to htr.
We use a `ptr` even if a `ref` could in theory have been used - there is
not much practical benefit to a `ref` (given it's mutable) while a `ptr`
is cheaper and easier to copy (when copying temporary states).
We could go further and cache a cooked pubkey but it turns out this is
quite intrusive - in all the relevant places, we're already using a
cooked key from the immutable validator data so there are no immediate
performance gains of doing so while managing the compressed -> cooked
key mapping would become more difficult - something for a future PR
perhaps.
Co-authored-by: Etan Kissling <etan@status.im>
* track latest duration instead of total in new timing metrics
Change `db_checkpoint_seconds` and `state_replay_seconds` metrics to
record the latest duration instead of the total. `nim-metrics` already
synthesizes a `_total` metric from these implicitly.
* still have to use inc, metrics only synthesizes the name not the sum
* prefix with `beacon_dag`
Validator monitoring gained 2 new metrics for tracking when blocks are
included or not on the head chain.
Similar to attestations, if the block is produced in epoch N, reporting
will use the state when switching to epoch N+2 to do the reporting (so
as to reasonably stabilise the block inclusion in the face of reorgs).
Database checkpointing can take seconds, e.g., while Geth is syncing.
Add a debug log + metric for it, and also info log if it takes longer
than 250ms, same as for the existing `State replayed` log. If the log
shows up for a user while the system is not overloaded, it may point
to slow disk speed or thermal issue.
* compute post-merge randao mix without loading state
* avoid copying state on shuffling computation and compute epochref
* speed up state copy for block production
With checkpoint sync, the checkpoint block is typically unavailable at
the start, and only backfilled later. To avoid treating it as having
zero hash, execution disabled in some contexts, wrap the result of
`loadExecutionBlockHash` in `Opt` and handle block hash being unknown.
---------
Co-authored-by: Jacek Sieka <jacek@status.im>
When syncing, we log a notice each time someone asks us for a block that
we haven't backfilled yet. This is quite verbose and not unexpected,
because the status message does not allow indicating backfill progress.
When using checkpoint sync, only checkpoint state is available, block is
not downloaded and backfilled later.
`dag.backfill` tracks latest filled `slot`, and latest `parent_root` for
which no block has been synced yet.
In checkpoint sync, this assumption is broken, because there, the start
`dag.backfill.slot` is set based on checkpoint state slot, and the block
is also not available.
However, sync manager in backward mode also requests `dag.backfill.slot`
and `block_clearance` then backfills the checkpoint block once it is
synced. But, there is no guarantee that a peer ever sends us that block.
They could send us all parent blocks and solely omit the checkpoint
block itself. In that situation, we would accept the parent blocks and
advance `dag.backfill`, and subsequently never request the checkpoint
block again, resulting in gap inside blocks DB that is never filled.
To mitigate that, the assumption is restored that `dag.backfill.slot`
is the latest filled `slot`, and `dag.backfill.parent_root` is the next
block that needs to be synced. By setting `slot` to `tail.slot + 1` and
`parent_root` to `tail.root`, we put a fake summary into `dag.backfill`
so that `block_clearance` only proceeds once checkpoint block exists.
After checkpoint sync, historical block IDs cannot yet be queried.
However, they are needed to compute dependent roots of `ShufflingRef`.
To allow lookup, enable `getBlockIdAtSlot` to answer from compatible
states in memory; as long as they descend from the finalized checkpoint
and the requested slot is sufficiently recent, `block_roots` contains
everything to recover `BlockSlotId` up to `SLOTS_PER_HISTORICAL_ROOT`.
This is similar to how `attester_dependent_root` etc. are computed.
This accelerates the first couple minutes of checkpoint sync on Mainnet,
especially the time until finality advances past the synced checkpoint.
Finish the rename started in #4809 to have a consistent naming.
`ExecutionPayloadHash` suggests hash over payload instead of block.
`BlockHash` is also the canonical name in engine API.
When using checkpoint sync, the initial block is missing in the DB.
Update the LC data collector initialization to account for that,
avoiding a spurious error message when it is incorrectly accessed:
```
ERR 2024-02-07 11:21:55.416+01:00 Block failed to load unexpectedly topics="chaindag_lc" bid=d30517a7:8257504 tail=8257504
```
Also fixes a regression from #5691 that resulted in similar messages
while importing the first few blocks after checkpoint sync.
Thanks to @arnetheduck for reporting this.
Full caches should not be used to mark blocks as unviable. The unviable
status is quite persistent and a block marked as such won't be processed
again once the cache empties. Problem originally introduced in #4808.
Changes here are more significant because of some good old tech debt in
block production which has grown quite hairy - the reduction in
exception handling at least provides some steps in the right direction.
`eth1_chain` no longer logs with `topics` since #5768, making it hard
to filter messages from this module. Re-add the `topics`, and also fix
outdated `topics` in `el_manager` (formerly `*_monitor`).
When BN clock is out of sync, VC sets BN status to `BrokenClock`. It is
only reset to `Offline` after restoring time sync. However, if VC fails
encounters an error while checking time, Nimbus extensions are assumed
to be unavailable and the BN is no longer checked for having a synced
clock. This means it is never reset back to `Offline` if errors start
occurring _after_ BN is already set to `BrokenClock`. This could be
because BN is changed from Nimbus to an alternative implementation,
or due to intermittent connection issues.
Ensure that BN status is reset back to `Offline` when Nimbus extensions
are disabled to ensure eventual connection recovery.
* Revert "use `RestPlainResponse` to improve builder API rerror reporting"
* Update rest_deneb_mev_calls.nim
copyright year linting
* Update rest_capella_mev_calls.nim
more copyright year linting
* Adopt asyncraises guarantees to most of the REST API handlers.
Bump presto.
* Fix copyright year.
---------
Co-authored-by: Etan Kissling <etan@status.im>
#5773 removed catching up on validator duties after lag. The `curSlot`
variable that was used originally to track catch-up progress no longer
has a use and is also no longer properly updated. Remove it.
If the initial state replays cover the finalized head, import matching
`LightClientBootstrap` into database.
This also addresses this error when light client requests bootstrap from
the genesis slot on networks that launch with Altair enabled.
```
{"lvl":"DBG","ts":"2023-10-04 11:17:49.665+00:00","msg":"LC bootstrap unavailable: Sync committee branch not cached","topics":"chaindag_lc","slot":0}
```
Avoid marking blocks invalid when corresponding `blobSidecarsByRange`
returns an incomplete / incorrect response while syncing. The block
itself may still be valid in that scenario.
There are two conditions leading to `duplicate contribution` log.
Align the logs with the ones used for attestation aggregates,
so that the two conditions can be separated when reading logs.