Using a dedicated branch for researching the effectiveness of split-view
scenario handling simplifies testing and avoids having partial work on
`unstable`. If we want, we can reintroduce it under a `--debug` flag at
a later time. But for now, Goerli is a rare opportunity to test this,
maybe just for another week or so.
- https://github.com/status-im/infra-nimbus/pull/179
In a split-view situation, the canonical chain may only be served by a
small number of peers, and branches may span long durations. Minority
branches may still have a large weight from attestations and should
be discovered. To assist with that, add a branch discovery module that
specifically targets peers with unknown histories and downloads from
them, in addition to the sync manager, which handles popular branches.
There are situations where all states in the `blockchain_dag` are
occupied and cannot be borrowed.
- headState: many parts of the code assume that it cannot be advanced
- clearanceState: Resets every time a new block gets imported, including
blocks from non-canonical branches
- epochRefState: Used even more frequently than clearanceState
This means that during the catch-up mechanism, where the head state is
slowly advanced to the wall clock to catch up on validator duties when
the canonical head is way behind non-canonical heads, we cannot use any
of the three existing states. In that situation, Nimbus already consumes
an increased amount of memory due to all the `BlockRef`, fork choice
states and so on, so the experience is degraded.
It seems reasonable to allocate a fourth state temporarily during that
catch-up, until a new proposal can be made on the canonical chain.
Note that currently, on `unstable`, proposals _do_ happen every couple
of hours because the sync manager doesn't manage to discover additional
heads in a split-view scenario on Goerli. However, with the branch
discovery module, new blocks are discovered all the time, and the
clearanceState may no longer be borrowed, as it is reset to a different
branch too often.
The extra state could also find other uses in the future, e.g., for
incremental computations as in reindexing the database, or online
collection of historical light client data.
Use the same eviction policy for blocks as is already the case for
blobs. FIFO makes more sense because it favors keeping ancestors of
blocks, which need to be applied to the DAG before their children become
eligible.
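A minimal sketch of that admission policy, with illustrative types (a
string `Root` and payload) standing in for the real quarantine
structures: once full, new arrivals are dropped so that earlier-received
entries, which are more likely to be ancestors, stay available.
```
import std/tables

type
  Root = string  # illustrative stand-in for a block root digest
  Quarantine = object
    capacity: int
    blocks: OrderedTable[Root, string]  # insertion-ordered, earliest first

proc addOrphan(q: var Quarantine, root: Root, blck: string): bool =
  ## FIFO-style policy: when the quarantine is full, keep the existing
  ## (older, likely ancestor) entries and drop the newcomer instead.
  if root in q.blocks:
    return true
  if q.blocks.len >= q.capacity:
    return false  # full; ancestors must be applied before children anyway
  q.blocks[root] = blck
  true
```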
`eth2_network` forgets to descore peers when opening the connection times
out. It only descores when opening the connection succeeds and then
there is a subsequent error. The caller cannot distinguish the cases,
so ensure that the descore is also applied if the request fails during
its initial portion.
When quarantining a block from the block processor, we should also keep a
copy of its blobs. Otherwise, this involves more network roundtrips
to obtain information we already have. This is in line with how blobs
arrive from gossip and request manager sources. The existing flow does
not work when applying blocks from quarantine, which is addressed here.
Blobs are cached from gossip and other sources for all orphans, not just
those specifically tagged as `blobless`. `blobless` only means that they
are actively fetched from the network. The `MaxBlobs` should be aligned
to match `MaxOrphans`. Note that blobs are tiny compared to blocks, so
this isn't a huge memory hog.
* handle case of unreachable block in `is_optimistic` helper
When a non-canonical block is still in the DB, it can be accessed via
`BlockId`, but `BlockRef` may be unavailable if the block was not
properly cleaned when it got orphaned. Report it as optimistic.
* `template` -> `func`
When checking for `MissingParent`, it may be that the parent block was
already discovered as part of a prior run. In that case, it can be
loaded from storage and processed without having to rediscover the
entire branch from the network. This is similar to #6112 but for blocks
that are discovered via gossip / sync mgr instead of via request mgr.
* Add some duration metering.
Refactor some log statements.
Rework sync contribution deadline waiting.
Add some cancellation reporting handlers.
* Make all validators' shortLog become validatorLog.
Optimize some logs with logScope.
* Add `raises`.
* More log statements polishing.
During sync, we can skip the `blobSidecarsByRange` request when there
are no blocks with `kzg_commitments` in the blocks data. Avoids running
into throttling from peers during long periods of non-finality.
Each individual blob currently uses as much quota from the network limit
as an entire block does, 128 items per second shared across all peers.
Blobs are 128 KB each instead of up to several MB and are simpler to
encode. There can be multiple per block (6 currently), so allow 2000
blobs per second across all peers. That decreases the cost per block
from `3125 + 3125 * blobs.len` quota (= `[3125, 21875]`) to a lower
`3125 + 200 * blobs.len` quota (= `[3125, 4325]`), accounting for the
slight increase in data transfer and encoding time.
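The new cost function spelled out as a small sketch; the constant and
helper names are illustrative rather than the actual `eth2_network`
identifiers.
```
const
  blockCost = 3125  # per-block quota, unchanged
  blobCost  = 200   # reduced per-blob quota proposed above (was 3125)

func byRangeResponseCost(blobsPerBlock: int): int =
  ## Quota charged per block of a blobSidecarsByRange response.
  blockCost + blobCost * blobsPerBlock

assert byRangeResponseCost(0) == 3125
assert byRangeResponseCost(6) == 4325  # full Deneb block with 6 blobs
```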
During sync, sometimes the same block gets encountered and added to
quarantine multiple times. If its parent is already known, quarantine
incorrectly registers it as missing, leading to re-download. This can
be fixed by registering the parent's deepest missing parent recursively.
Also increase the stickiness of `missing`. We only perform 4 attempts
within ~16 seconds before giving up. Very frequently, this is not
enough, and there is no progress until the sync manager kicks in, even
on Holesky.
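A sketch of the recursive parent resolution described above, using a
plain table keyed by illustrative string roots instead of the actual
quarantine types.
```
import std/tables

type Root = string  # illustrative stand-in for a block root digest

proc deepestMissingAncestor(quarantined: Table[Root, Root], root: Root): Root =
  ## `quarantined` maps an orphan's root to its parent root. Follow parent
  ## links through blocks we already hold and return the first ancestor we
  ## do not hold - that is the root to register as `missing`, instead of
  ## re-downloading a parent that is already sitting in quarantine.
  var parent = quarantined[root]  # assumes `root` itself is quarantined
  while parent in quarantined:
    parent = quarantined[parent]
  parent
```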
When restarting beacon node, orphaned blocks remain in the database but
on startup, only the canonical chain as selected by fork choice loads.
When a new block is discovered that builds on top of an orphaned block,
the orphaned block is re-downloaded using sync/request manager, despite
it already being present on disk. Such queries can be answered locally
to improve discovery speed of alternate forks.
During a lag spike, e.g., from state replays, the peer count can
temporarily drop significantly. We should not have to wait another 60
minutes in that situation just to get back to where we started.
The `clearanceState` points to the latest resolved block, regardless of
whether that block is canonical according to fork choice. If the chain is
stalled and we want to prepare for resuming validator duties, we need
a recent state according to fork choice to avoid lag spikes and missing
slot timings.
Nimbus currently stops performing validator duties if the blockchain
does not progress for `node.config.syncHorizon` slots. This means that
the chain won't recover because no new blocks are proposed. To fix that,
continue performing validator duties if no progress is registered for a
long time, and none of our peers is indicating any progress.
#6087 introduced a subtle change to `nim-web3` resulting in `Gwei` being
serialized differently than before. Using a `distinct` type for `Gwei`
improves type safety and avoids such problems in the future.
On Goerli there are some instances of long streaks of empty epochs due
to different branches being built in parallel. They sometimes lead to
`Request for pruned historical state` logs requiring a BN restart to
resolve. Fix that by trying to restore states from the entire
non-finalized history, so that sync is not lost in such situations.
When a config defines a different `INACTIVITY_SCORE_RECOVERY_RATE` than
the default, `process_inactivity_updates` uses an incorrect rate ever
since #2710 when `INACTIVITY_SCORE_RECOVERY_RATE` became configurable.
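A simplified per-validator sketch of the `process_inactivity_updates`
step, with illustrative parameter names, showing where the configured
rate has to be applied.
```
func updateInactivityScore(score: uint64, timelyTarget, inLeak: bool,
                           bias, recoveryRate: uint64): uint64 =
  ## `recoveryRate` must come from the runtime config
  ## (INACTIVITY_SCORE_RECOVERY_RATE), not the compile-time default.
  var s = score
  if timelyTarget:
    s -= min(1'u64, s)
  else:
    s += bias
  if not inLeak:
    s -= min(recoveryRate, s)
  s
```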
When there are long periods of non-finality, `nodeIsViableForHead` has
been observed to consume significant time as it repeatedly walks the
non-finalized check graph as part of determining what heads are eligible
for fork choice. Caching the result resolves that.
Overall, it may still be better to prune fork choice more aggressively
when finality advances, to fully avoid the case specced out using the
linear scan. The current implementation is very close to spec, though,
so such a change should not be introduced without thorough testing.
The simple cache should allow significantly better performance on Goerli
while the network is still supported (Mid April).
In `block_dag` there is a max depth of 100 years configured to detect
internal inconsistencies, e.g., circular references. As `BlockRef` was
changed long ago to only reflect the non-finalized chain segment, the
theoretically supported max depth can be reduced and simplified.
We don't need the `cfg` right now, but it makes sense to have the object
passed to the clock so that the API doesn't break if we want to support
configurable `SECONDS_PER_SLOT`. As the `libnimbus_lc` library is not
yet widely used, better to add the argument now than later.
The `syncHorizon` describes the number of empty slots before the beacon
node considers itself to be out of sync. There are two places where we
currently set this to 50 slots, but it makes more sense to base it on
wall time, e.g., the 10 minutes that the default 50 are derived from.
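A sketch of deriving the default from wall time instead of a hard-coded
slot count; names are illustrative.
```
const defaultSyncHorizonSeconds = 10 * 60  # wall-time window behind the default

func defaultSyncHorizon(secondsPerSlot: uint64): uint64 =
  ## Number of empty slots before the node considers itself out of sync,
  ## rounded up so shorter slot times don't shrink the wall-time window.
  (defaultSyncHorizonSeconds.uint64 + secondsPerSlot - 1) div secondsPerSlot

assert defaultSyncHorizon(12) == 50   # Mainnet-style slot timing
assert defaultSyncHorizon(5) == 120   # Gnosis-style slot timing
```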
* allow specifying get_proposer_reward block root at state.slot
* Add consensus_block_value calculation.
* Address review comments.
* Post-rebase adjustments.
* Use proper state to calculate consensus block value.
* Revert "allow specifying get_proposer_reward block root at state.slot"
This reverts commit 9fef9a8199f63056060527ac2531acc3b0ed8dcb.
* Fix post-revert problems.
Return back to Gwei.
* Adding test which is not working.
* Do not use test suite if it does not have post-state.
* Add debug logging.
* Increase logging to track sources of balance changes.
* Fix sync committee rewards/penalties calculation.
* Revert "Increase logging to track sources of balance changes."
This reverts commit 32feb20f2fdb66521401710866cd59ecc9951ef8.
* Adopt new vision to block rewards.
* Add block produce logging to VC.
* Remove rewards.nim.
* Eliminate toWei changes.
* Improve UInt256 shortLog.
* Fix conversion procedure.
* Address review comments.
* Fix test.
* Revert "Fix test."
This reverts commit 4948b2c1ec.
---------
Co-authored-by: tersec <tersec@users.noreply.github.com>
Co-authored-by: Etan Kissling <etan@status.im>
Provide additional context in the `syncEth1Chain tick` debug log to aid
with understanding of flow when debugging on a more precise basis than
just having the metrics.
Corrects a regression from #5998 that led to crashes in #6046.
In `trustedNodeSync` mode, the config does not contain genesis keys,
so attempting to load from them is a `Defect`.
Fix the `/eth/v1/beacon/deposit_snapshot` API to produce proper EIP-4881
compatible `DepositTreeSnapshot` responses. The endpoint used to expose
a Nimbus-specific database internal format.
Also fix trusted node sync to consume properly formatted EIP-4881 data
with `--with-deposit-snapshot`, and `--finalized-deposit-tree-snapshot`
beacon node launch option to use the EIP-4881 data. Further ensure that
`ncli_testnet` produces EIP-4881 formatted data for interoperability.
EIP-4881 was never correctly implemented; the `DepositTreeSnapshot`
structure has nothing to do with its actual definition. Reflect that
by renaming the type to a Nimbus-specific `DepositContractSnapshot`,
so that an actual EIP-4881 implementation can use the correct names.
- https://eips.ethereum.org/EIPS/eip-4881#specification
Notably, `DepositTreeSnapshot` contains a compressed sequence in
`finalized`, only containing the minimally required intermediate roots.
That also explains the incorrect REST response reported in #5508.
The non-canonical representation was introduced in #4303 and is also
persisted in the database. We'll have to maintain it for a while.
`nextActionWait` currently shows `n/a` if only proposal is scheduled
but no attestation, e.g., attestation was already made for current
epoch and validator is exiting next epoch so doesn't have another
attestation lined up. It's an edge case but it's still more correct
to also log `nextActionWait` if only proposal is scheduled.
When using `--external-beacon-api-url`, one has to accompany it with
either `--trusted-block-root` or `--trusted-state-root`. If neither is
specified, we can fall back to a deeply finalized, noncontroversial
block root. For networks that started post-Altair, e.g., Holesky, the
genesis block root fulfills that requirement, as in, it is implicitly
trusted. Therefore, if only `--external-beacon-api-url` is provided
without any `--trusted-block-root` or `--trusted-state-root`, use the
genesis block root if it is a viable starting point (post-Altair).
```
build/nimbus_beacon_node \
--network=holesky \
--data-dir="$HOME/Downloads/nimbus/data/holesky" \
"--external-beacon-api-url=http://unstable.holesky.beacon-api.nimbus.team" \
--tcp-port=9010 --udp-port=9010 \
--rest --log-level=DEBUG \
--no-el
```
The exports in `eth2_discovery` produce deprecation warnings as they
refer to `close`, `closeWait` and so on. It turns out that they are not
necessary at all; `Eth2DiscoveryProtocol` is already exported two lines
above using the `*` marker.
During refactoring of #5959, some implicit `return`s were overlooked,
resulting in spurious `err()` being returned without a message.
```
{"lvl":"WRN","ts":"2024-02-26 10:12:20.469+00:00","msg":"Beacon nodes report different configuration values","reason":"","service":"fallback_service","node":"http://127.0.0.1:9303[Nimbus/v24.2.1-4e9bc7-stateofus]","node_index":0,"node_roles":"AGBSDT"}
```
Correct the helpers to return explicit results in all branches so that
this cannot happen by accident anymore.
* add EIP-7044 support to keymanager API
When trying to sign `VoluntaryExit` via keymanager API, the logic is not
yet aware of EIP-7044 (part of Deneb). This patch adds missing EIP-7044
support to the keymanager API as well.
As part of this, the VC needs to become aware about:
- `CAPELLA_FORK_VERSION`: To correctly form the EIP-7044 signing domain.
The fork schedule does not indicate which of the results, if any,
corresponds to Capella.
- `CAPELLA_FORK_EPOCH`: To detect whether Capella was scheduled.
If a BN does not have it in its config while other BNs have it,
this leads to a log if Capella has not activated yet, or marks the BN
as incompatible if Capella already activated.
- `DENEB_FORK_EPOCH`: To check whether EIP-7044 logic should be used.
Related PRs:
- #5120 added support for processing EIP-7044 `VoluntaryExit` messages
as part of the state transition functions (tested by EF spec tests).
- #5953 synced the support from #5120 to gossip validation.
- #5954 added support to the `nimbus_beacon_node deposits exit` command.
- #5956 contains an alternative generic version of `VCForkConfig`.
* address reviewer feedback: letter case, module location, double lookup
---------
Co-authored-by: cheatfate <eugene.kabanov@status.im>
* Update beacon_chain/rpc/rest_constants.nim
* move `VCRuntimeConfig` back to `rest_types`
---------
Co-authored-by: cheatfate <eugene.kabanov@status.im>
* fix `getForkVersion` helper
---------
Co-authored-by: cheatfate <eugene.kabanov@status.im>
In #5120, the `nimbus_beacon_node deposits exit` command was updated for
compatibility with EIP-7044, which forces signatures to be made using
`CAPELLA_FORK_VERSION` regardless of the `VoluntaryExit`'s `epoch` after
Deneb is activated.
This update had a regression, as an older mechanism was used to fetch
`RuntimeConfig`, resulting in an encoding issue (#5362). This was then
fixed in #5370, restoring general `deposits exit` functionality.
However, the logic from #5120 has another flaw, as it uses an incorrect
fork version based on the pre-Deneb logic even after Deneb and EIP-7044
are activated. Fix this now, so that `deposits exit` continues to work
correctly after Deneb activates.
In #5120, EIP-7044 support got added to the state transition function to
force `CAPELLA_FORK_VERSION` to be used when validating `VoluntaryExit`
messages, irrespective of their `epoch`.
In #5637, similar logic was added when batch verifying BLS signatures,
which is used during gossip validation (libp2p gossipsub, and req/resp).
However, that logic did not match the one introduced in #5120, and only
uses `CAPELLA_FORK_VERSION` when a `VoluntaryExit`'s `epoch` was set to
a value `>= CAPELLA_FORK_EPOCH`. Otherwise, `BELLATRIX_FORK_VERSION`
would still be used when validating `VoluntaryExit`, e.g., with `epoch`
set to `0`, as is the case in this Holesky block:
- https://holesky.beaconcha.in/slot/1076985#voluntary-exits
Extracting the correct logic from #5120 into a function, and reusing it
when verifying BLS signatures fixes this issue, and also leverages the
exhaustive EF test suite that covers the (correct) #5120 logic.
This fix only affects networks that have EIP-7044 applied (post-Deneb).
Without the fix, Deneb blocks with a `VoluntaryExit` with `epoch` set to
`< CAPELLA_FORK_EPOCH` incorrectly fail to validate despite being valid.
Incorrect blocks that contain a malicious `VoluntaryExit` with `epoch`
set to `< CAPELLA_FORK_EPOCH` and signed using `BELLATRIX_FORK_VERSION`
_would_ pass the BLS verification stage, but subsequently fail the state
transition logic. Such blocks would still correctly be labeled invalid.
Fix regression from #5842 where `Eth-Execution-Payload-Value` is parsed
into `consensusValue` instead of `Eth-Consensus-Block-Value`. We don't
use those values for now, but fixing this avoids hard-to-debug bugs
later.
This PR allows sharing the pubkey data between validators by using a
thread-local cache for pubkey data, netting about a 400 MB memory usage
reduction on Holesky due to us keeping 3 permanent + several ephemeral
state copies in memory at all times, with each state copy holding a full
validator set.
The PR also introduces a hash cache for the key which gives ~14% speedup
for a full state `hash_tree_root` - the key makes up for a large part of
the `Validator` htr time.
Finally, the time it takes to copy a state goes down as well, from
~80 ms to ~60 ms, for reasons similar to htr.
We use a `ptr` even if a `ref` could in theory have been used - there is
not much practical benefit to a `ref` (given it's mutable) while a `ptr`
is cheaper and easier to copy (when copying temporary states).
We could go further and cache a cooked pubkey but it turns out this is
quite intrusive - in all the relevant places, we're already using a
cooked key from the immutable validator data, so there are no immediate
performance gains from doing so, while managing the compressed -> cooked
key mapping would become more difficult - something for a future PR
perhaps.
Co-authored-by: Etan Kissling <etan@status.im>
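A minimal sketch of the thread-local sharing idea, with a plain byte
array standing in for the real compressed key type; this is not the
actual Nimbus cache, just the shape of it.
```
import std/tables

type PubKeyRaw = array[48, byte]  # stand-in for a compressed BLS pubkey

# One cache per thread; all state copies on that thread share key storage.
var pubkeyCache {.threadvar.}: Table[PubKeyRaw, ref PubKeyRaw]

proc sharedPubkey(raw: PubKeyRaw): ptr PubKeyRaw =
  ## Return a stable pointer to a single copy of `raw`, creating it on
  ## first use - copying a state then copies pointers, not key data.
  if raw notin pubkeyCache:
    let stored = new PubKeyRaw
    stored[] = raw
    pubkeyCache[raw] = stored
  let cached = pubkeyCache[raw]
  addr cached[]
```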
* track latest duration instead of total in new timing metrics
Change `db_checkpoint_seconds` and `state_replay_seconds` metrics to
record the latest duration instead of the total. `nim-metrics` already
synthesizes a `_total` metric from these implicitly.
* still have to use inc, metrics only synthesizes the name not the sum
* prefix with `beacon_dag`
Validator monitoring gained 2 new metrics for tracking when blocks are
included or not on the head chain.
Similar to attestations, if the block is produced in epoch N, reporting
will use the state when switching to epoch N+2 to do the reporting (so
as to reasonably stabilise the block inclusion in the face of reorgs).
Database checkpointing can take seconds, e.g., while Geth is syncing.
Add a debug log + metric for it, and also info log if it takes longer
than 250ms, same as for the existing `State replayed` log. If the log
shows up for a user while the system is not overloaded, it may point
to slow disk speed or a thermal issue.
* compute post-merge randao mix without loading state
* avoid copying state on shuffling computation and compute epochref
* speed up state copy for block production
With checkpoint sync, the checkpoint block is typically unavailable at
the start, and only backfilled later. To avoid treating it as having a
zero hash, which disables execution in some contexts, wrap the result of
`loadExecutionBlockHash` in `Opt` and handle the block hash being
unknown.
---------
Co-authored-by: Jacek Sieka <jacek@status.im>
When syncing, we log a notice each time someone asks us for a block that
we haven't backfilled yet. This is quite verbose and not unexpected,
because the status message does not allow indicating backfill progress.
When using checkpoint sync, only the checkpoint state is available; the
block is not downloaded initially and is only backfilled later.
`dag.backfill` tracks the latest filled `slot` and the latest
`parent_root` for which no block has been synced yet.
In checkpoint sync, this assumption is broken because the starting
`dag.backfill.slot` is set based on the checkpoint state slot, and the
block is not yet available.
However, the sync manager in backward mode also requests
`dag.backfill.slot`, and `block_clearance` then backfills the checkpoint
block once it is synced. But there is no guarantee that a peer ever
sends us that block. They could send us all parent blocks and solely
omit the checkpoint block itself. In that situation, we would accept the
parent blocks and advance `dag.backfill`, and subsequently never request
the checkpoint block again, resulting in a gap inside the blocks DB that
is never filled.
To mitigate that, the assumption is restored that `dag.backfill.slot`
is the latest filled `slot`, and `dag.backfill.parent_root` is the next
block that needs to be synced. By setting `slot` to `tail.slot + 1` and
`parent_root` to `tail.root`, we put a fake summary into `dag.backfill`
so that `block_clearance` only proceeds once checkpoint block exists.
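The restored invariant in sketch form, with illustrative local types in
place of the real `BeaconBlockSummary` handling.
```
type
  Root = string   # illustrative stand-in for a block root digest
  Slot = uint64
  BackfillSummary = object
    slot: Slot
    parentRoot: Root

func checkpointBackfillSentinel(tailSlot: Slot, tailRoot: Root): BackfillSummary =
  ## Fake summary placed into `dag.backfill` at checkpoint sync start:
  ## `slot` is one past the checkpoint and `parent_root` is the checkpoint
  ## root itself, so backfill only proceeds once the checkpoint block has
  ## actually been downloaded and stored.
  BackfillSummary(slot: tailSlot + 1, parentRoot: tailRoot)
```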
After checkpoint sync, historical block IDs cannot yet be queried.
However, they are needed to compute dependent roots of `ShufflingRef`.
To allow lookup, enable `getBlockIdAtSlot` to answer from compatible
states in memory; as long as they descend from the finalized checkpoint
and the requested slot is sufficiently recent, `block_roots` contains
everything to recover `BlockSlotId` up to `SLOTS_PER_HISTORICAL_ROOT`.
This is similar to how `attester_dependent_root` etc. are computed.
This accelerates the first couple minutes of checkpoint sync on Mainnet,
especially the time until finality advances past the synced checkpoint.
Finish the rename started in #4809 to have a consistent naming.
`ExecutionPayloadHash` suggests hash over payload instead of block.
`BlockHash` is also the canonical name in engine API.
When using checkpoint sync, the initial block is missing in the DB.
Update the LC data collector initialization to account for that,
avoiding a spurious error message when it is incorrectly accessed:
```
ERR 2024-02-07 11:21:55.416+01:00 Block failed to load unexpectedly topics="chaindag_lc" bid=d30517a7:8257504 tail=8257504
```
Also fixes a regression from #5691 that resulted in similar messages
while importing the first few blocks after checkpoint sync.
Thanks to @arnetheduck for reporting this.
Full caches should not be used to mark blocks as unviable. The unviable
status is quite persistent and a block marked as such won't be processed
again once the cache empties. Problem originally introduced in #4808.
Changes here are more significant because of some good old tech debt in
block production which has grown quite hairy - the reduction in
exception handling at least provides some steps in the right direction.
`eth1_chain` no longer logs with `topics` since #5768, making it hard
to filter messages from this module. Re-add the `topics`, and also fix
outdated `topics` in `el_manager` (formerly `*_monitor`).
When BN clock is out of sync, VC sets BN status to `BrokenClock`. It is
only reset to `Offline` after time sync is restored. However, if the VC
encounters an error while checking time, Nimbus extensions are assumed
to be unavailable and the BN is no longer checked for having a synced
clock. This means it is never reset back to `Offline` if errors start
occurring _after_ BN is already set to `BrokenClock`. This could be
because BN is changed from Nimbus to an alternative implementation,
or due to intermittent connection issues.
Ensure that the BN status is reset back to `Offline` when Nimbus
extensions are disabled, so that the connection can eventually recover.
* Revert "use `RestPlainResponse` to improve builder API rerror reporting"
* Update rest_deneb_mev_calls.nim
copyright year linting
* Update rest_capella_mev_calls.nim
more copyright year linting
* Adopt asyncraises guarantees to most of the REST API handlers.
Bump presto.
* Fix copyright year.
---------
Co-authored-by: Etan Kissling <etan@status.im>
#5773 removed catching up on validator duties after lag. The `curSlot`
variable that was used originally to track catch-up progress no longer
has a use and is also no longer properly updated. Remove it.
If the initial state replays cover the finalized head, import matching
`LightClientBootstrap` into database.
This also addresses the following error when a light client requests
bootstrap from the genesis slot on networks that launch with Altair
enabled.
```
{"lvl":"DBG","ts":"2023-10-04 11:17:49.665+00:00","msg":"LC bootstrap unavailable: Sync committee branch not cached","topics":"chaindag_lc","slot":0}
```
Avoid marking blocks invalid when corresponding `blobSidecarsByRange`
returns an incomplete / incorrect response while syncing. The block
itself may still be valid in that scenario.
There are two conditions leading to `duplicate contribution` log.
Align the logs with the ones used for attestation aggregates,
so that the two conditions can be separated when reading logs.
The `blob_gas_used` field was not properly populated when constructing
Deneb light client data. This is due to #5026 not applying the change to
the entire codebase when the new field got introduced, and due to #5350
not catching that oversight in other modules. Also reviewed the codebase
and discovered that `shortLog` for Deneb execution payloads has the same
bug.
To simplify supporting "am I ready for the fork" requests, add a simple
marker to the default status bar that indicates readiness by displaying
the fork's corresponding mascot. This is the same one that is also
displayed during the actual fork transition, so does not introduce
new dependencies. It also only shows in default status bar, not in logs.
* remove extraneous length checks in KZG batch proofs
* re-add winservice import but only for Windows, to avoid UnusedImport warning
* also uses establishWindowsService
* Make some startup procedures async.
Add a handier makeBannerAndConfig().
* Split Windows service code out of `nimbus_beacon_node.nim`.
* Report service startup errors using Windows error codes.
Add a pluggable exitService().
Co-authored-by: Zahary Karadjov <zahary@status.im>
Co-authored-by: Jacek Sieka <jacek@status.im>
* Avoid global in p2p macro (fixes #4578)
* copy p2p macro to this repo and start de-crufting it
* make protocol registration dynamic, removing light client hacks et al
* split out light client protocol into its own file
* cleanups
* Option -> Opt
* remove more cruft
* further split beacon_sync
this allows the light client to respond to peer metadata messages
without exposing the block sync protocol
* better protocol init
* "constant" protocol index
* avoid casts
* copyright
* move some discovery code to discovery
* avoid extraneous data copy when sending chunks
* remove redundant forkdigest field
* document how to connect to a specific peer
Etan Kissling (2):
remove unused `skip0xPrefix`
keep the internal count helper
Will (1):
Bugfix/nully values (#61)
Yuriy Glukhov (5):
Contract constructor support
Fixed compilation error in exec function
Added string encoding
Fixed source->from field of EthCall
More flexibility to contract DSL, Async contract caller
jangko (5):
Reduce compiler warnings when using Nim v2
Migrate to json-serialization
Add tests of json rpc marshalled types
Resolve contract_dsl ambiguity
Event handler passing around JsonString instead of JsonNode
Share encoder between json-rpc and chronicles (#119)
Simplify generic constraint of rpc and chronicles encoders
Feature/execution api spec (#69)
v0.3.0
bump nim-json-rpc to a6475e49b26d3afc58aaa3d67621c94eafef8efb
coffeepots (1):
Use nim-json-serialization for RPCs (#172)
jangko (10):
Add copyright to source file
Remove StringOfJson
Fix optional parameter parsing fails in rpc macro with generics
Rename jrpc_sys module back to jsonmarshal
Re-enable test https
Add test for createRpcSigsFromNim and createSingleRpcSig
Let the OS choose the port for tests
Add onProcessMessage hook to client
Fix example in the README.md
Move errors module back to json_rpc folder
Upgrade rpc router internals (#178)
RPC server handle null return value correctly
v0.3.0
kdeme (1):
Add example test case that currently fails the Option parsing
Extend slot start message and default status bar with information about
current head fork and the next fork transition (corresponding to head).
This is useful to know whether a synced client is aware of a future fork
and can also be useful when syncing from old forks to follow progress
across the various forks.
```
peers: 8 ❯ finalized: 741c2ce2:230474 ❯ head: b330f58b:230477:20 ❯ fork: Capella (next: Deneb:231680) ❯ time: 230599:24 (7379192) ❯ sync: 00h24m (99.63%) 2.6492slots/s (QwQUwQPQDQ:7375263)/opt
```
```
INF 2024-01-12 12:18:00.001+01:00 Slot start topics="beacnde" slot=7379190 epoch=230599 fork="Capella (next: Deneb:231680)" sync="--h--m (99.62%) 0.0000slots/s (wwwwwwwwww:7375167)/opt" peers=0 head=741c2ce2:7375168 finalized=230472:723abe7e delay=1ms861us
```
In #5664, `nim-json-rpc` dependency got bumped which included a change
in behaviour when processing `null` data for heap allocated objects.
- https://github.com/status-im/nim-json-rpc/pull/176
Old behaviour was to raise an exception, while new behaviour is to set
the value to `nil` but treat it as a successful parse. Old exceptions
were similar to "Parameter [result] expected JObject but got JNull".
As part of the `nim-json-rpc` bump in #5664, `el_manager.nim` was not
updated to match the new behaviour, leading to a crash whenever its
logic assumes that a successfully parsed web3 `BlockObject` (heap
allocated) is non-`nil`.
As a quick remedy, the `el_manager.nim` is updated to transform `nil`
responses for `BlockObject` into `ValueError`, allowing reuse of the
existing and tested exception based processing.
This PR causes a few new warnings to appear - these are harmless but
will need addressing separately as they span several libraries.
* new asyncraises syntax
* asyncraises support in several modules
* `sink` usage reduces spurious copying
* `?` compatibility for `async` + `results`
* remove `-d:chronosStrictException` (obsolete)
The `eth2-networks` repo often receives metadata updates with a delay.
Switch to `goerli` repo to obtain the latest config (Dencun scheduling)
when it is updated. This is in line with how Sepolia / Holesky work.
- https://github.com/eth-clients/goerli/pull/178
With Capella, `bls_to_execution_change` SSE should be emitted on the
event stream whenever a new `SignedBLSToExecutionChange` is received.
Add this missing functionality for compatibility with beacon-API specs.
- https://github.com/ethereum/beacon-APIs/pull/248
This requires all object types to be explicitly white-listed for
default serialization. The PR makes the minimal changes, although
a number of similar mechanisms in eth2_rest_serialization can now
be removed.
As followup from #5654, ensure that we still keep the comment around
referring to the correct `forkchoiceUpdated` to use being driven by
the fork schedule.
* use `PayloadAttributesV3` in `nimbus_light_client` for Deneb
From Deneb onward, `forkchoiceUpdated` requires `PayloadAttributesV3`.
In `nimbus_light_client` we still used `PayloadAttributesV2`.
Also clean up two other locations that were already correctly using
`PayloadAttributesV3`, to reduce code duplication.
* fix letter case
* avoid potentially subtle template/function symbol name interactions
* use warn instead of error in getExecutionPayload codepath to ensure lack of ambiguity
When the BN exits after writing new `head` to database, but before
completing the `updateFinalizedBlocks` call, the database is slightly
inconsistent due to the partial write. We currently fail to start up
after that. Fix that by catching up on partial `updateFinalizedBlocks`
tasks on start up, and add a test for this edge case.
Simplify best `LightClientUpdate` collection by tracking only canonical
data instead of tracking the best update across all branches within the
sync committee period.
- https://github.com/ethereum/consensus-specs/pull/3553
* reorder gossip validation checks
Doing the coverage check only after the corresponding committee index is
known allows optimization by early rejecting invalid data.
* use same helper for individual attestations as well
This PR brings down the time to send 100 attestations from ~1s to
~100ms, making it feasible to run 10k validators on a single node (which
regularly send 300 attestations / slot).
This is done by batching the slashing protection database write in a
single transaction thus avoiding a slow fsync for every signature -
effects will be more pronounced on slow drives.
The benefit applies both to beacon and client validators.
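The batching idea in a minimal standalone form, using plain SQLite
(`std/db_sqlite`; `db_connector/db_sqlite` on newer Nim); the table and
column names are illustrative and not Nimbus's slashing protection
schema.
```
import std/db_sqlite

let db = open("slashing_protection_demo.sqlite3", "", "", "")
db.exec(sql"""CREATE TABLE IF NOT EXISTS signed_attestations (
  validator_id INTEGER, source_epoch INTEGER, target_epoch INTEGER)""")

proc registerBatch(rows: seq[(int, int, int)]) =
  ## One transaction per batch of signatures: a single fsync per batch
  ## instead of one per attestation, which is what makes slow drives usable.
  db.exec(sql"BEGIN TRANSACTION")
  for row in rows:
    db.exec(sql"INSERT INTO signed_attestations VALUES (?, ?, ?)",
            row[0], row[1], row[2])
  db.exec(sql"COMMIT")
```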
When creating new LC updates, information about the parent block's post
state must be available (cached), but information about current block's
post state is not yet required. Caching information about the current
block's post state can be delayed, simplifying the LC data collection
logic a bit and allowing more future flexibility with the cache design.
The proposed change ensures that VC validators are registered with the
builder specified by the `--payload-builder-url` argument even if the
beacon node has no attached validators. It also prevents such validators
from being unintentionally registered with builders configured for
specific attached validators via the keymanager API.
Per the current specs, the VC has no way to specify which builders the
BN should use on a per-node basis, so for the time being we have to
resort to using the BN's fallback default builder URL for VC validators.
When new finality is reached without supermajority sync committee
support, trigger another event push on beacon-API and libp2p once
the finality gains supermajority support.
- https://github.com/ethereum/consensus-specs/pull/3549
Replace sections that need to be maintained with every `ConsensusFork`
related to LC data collection with a generic logic that keeps working
when unrelated parts of Ethereum change.
* Initial commit.
* Fix issues and tests.
* Fix test compilation issue.
* Update AllTests.
* Change the poorest score name from <lowest> to <bad>.
Split the sync committee message score range so that lexicographic
scores do not intersect with normal ones; lexicographic scores should
be below normal scores.
* Address review comments.
Fix aggregated attestation scoring to use MAX_VALIDATORS_PER_COMMITTEE.
Fix sync committee contributions to use SYNC_SUBCOMMITTEE_SIZE.
Add getUniqueVotes test vectors.
* Post-rebase fixes.
* Address review comments.
* Return back score calculation based on actual bits length.
* AllTests modification.
Gnosis uses `MIN_EPOCHS_FOR_BLOCK_REQUESTS` = 33024, but the computed
safe minimum (that Nimbus was using) is 2304. Relax the compatibility
check to allow `MIN_EPOCHS_FOR_BLOCK_REQUESTS` above the safe minimum
and honor `config.yaml` preferences for `MIN_EPOCHS_FOR_BLOCK_REQUESTS`.
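The relaxed check in sketch form; the constant is the computed safe
minimum quoted above and the helper name is illustrative.
```
const safeMinEpochsForBlockRequests = 2304'u64  # computed safe minimum

func isCompatibleMinEpochsForBlockRequests(configured: uint64): bool =
  ## Accept (and then honor) any configured value at or above the safe
  ## minimum instead of requiring an exact match with the computed value.
  configured >= safeMinEpochsForBlockRequests

assert isCompatibleMinEpochsForBlockRequests(33024)  # Gnosis
assert isCompatibleMinEpochsForBlockRequests(2304)
```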
* make `MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS` configurable
Gnosis uses custom `MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS` to account
for the faster slot timing, so that blobs still remain available for
roughly the same amount of real time.
Also extend REST config endpoint with the full config from `v1.4.0-beta.4`,
and extend compatibility checks when loading configs to reduce warnings.
* Add slashing database pruning to VC.
Fix GetBlockHeaderResponse object declaration (spec has been changed).
* Switch to getFinalizedBlockHeader instead.
* Fix proper sign.
Add statements.
Show pruning log statement only when pruning happens.
* Optimize and remove debugging helpers.
In `v1.4.0-alpha.0`, the blob index validation on gossip was changed to
use `compute_subnet_for_blob_sidecar` instead of having a separate topic
for each individual blob. We updated the spec reference in #5363 without
updating the code accordingly. Fixing this now, and also adding the new
`MAX_BLOBS_PER_BLOCK` check from `v1.4.0-beta.3` as it shares the theme.
`v1.4.0-beta.4` made the gossip rules more strict and now requires
ignoring blobs from other branches if there are equivocating blocks.
Those blobs are only requestable via Req/Resp.
Fix regression from #4808 where blobs that are already known are issued
an ACCEPT verdict, propagating them to peers over and over again.
`validateBlobSidecar` contains the correct IGNORE logic. Move it above
the expensive checks to retain the performance of the check.
Capacity should be set to the theoretical limit to ensure a correct
hash root; the actual length may be shorter. The only use is
`ExecutionPayloadForSigning`, so it doesn't matter in practice yet, but
it is still worth fixing.
Using trusted node sync currently requires to run two commands -
first the `trustedNodeSync` command to initialize the database,
followed by the regular startup command to continue syncing.
The `trustedNodeSync` options are now also available during regular
startup, and are used when the database is empty to simplify setting up
a new Nimbus beacon node. This also aligns behaviour closer with other
Ethereum consensus implementations.
The new logic only applies if the database has not yet been initialized;
same as before. Also, the database needs to be removed under the same
conditions as before when a fresh sync is desired.
Move KZG trusted setup initialization before `BeaconNode.init` to avoid
edge case where network message is received and processed before crypto
library has been properly initialized. Followup from #4870.
To prepare for calling trusted node sync from the main Nimbus startup
logic, extract the trusted node sync trigger into a separate function.
Further allow passing a pre-opened database, as that will be needed to
check whether trusted node sync needs to be called during regular start.
* Bump nim-eth: Change block timestamp from std.Time to distinct uint64
Also change tx.maxFeePerBlobGas from GasInt to UInt256
following the latest Cancun spec
* Fix EthTime.now from func to proc due to sideeffects
* ShufflingRef approach to next-epoch validator duty calculation/prediction
* refactor action_tracker.updateActions to take ShufflingRef + beacon_proposers; refactor maybeUpdateActionTrackerNextEpoch to be separate and reused function; add actual fallback logic
* document one possible set of conditions
* check epoch participation flags and inactivity scores to ensure no penalties and MAX_EFFECTIVE_BALANCE to ensure rewards don't matter
* correctly (un)shuffle each proposer index
* remove debugging assertion
The templates for `BeaconBlock`, `BeaconBlockBody` and `BeaconState`
are the only ones using a `macro` mechanism for code generation.
This prevents using the dot-syntax style `consensusFork.BeaconFoo`
in some situations, and also tends to trigger naming conflicts,
requiring the `Type` suffix. Furthermore, the `macro` only works
for types that are re-defined in every single `ConsensusFork`.
Replacing with the simpler but more verbose approach used for other
types for consistency and to avoid the downsides of the `macro`.
Furthermore, simplify `test_fixture_sanity_blocks` to use `forks` sugar.
Directly initialize `ForkedLightClientObj` instead of separately first
setting the `kind` (initializing everything to zero) and then assigning
the forky data after that.
Cleanup of `ProveField` warnings in `slashing_protection_common` module.
Note that `ProveField` is disabled by default in makefile, but sometimes
these pop up when doing a regular `nim c`, and cleaning these may allow
enabling the warning in some future.
As the `case` is over `a.kind`, `ProveField` warnings for `b` have to be
suppressed using `{.push.}` / `{.pop.}` for comparison operators.
Spurious `ProveField` warning can be avoided by using `case` instead of
`if` on `metadata.genesis.kind`. Also suppress `GlobalVar` hints when
`incbinEnabled` is used, which has global `let` definitions.
`Eth2NetworkMetadata` has an `incompatible` case to hold an error string
in case the loaded file is not compatible with the compile-time config.
The same can be modeled with a `Result[Eth2NetworkMetadata, string]` and
avoids followup checks for the `incompatible` case.
We use file descriptors for validators and sockets and might run out of
either on high-validator setups - increasing the limit here is harmless
and avoids a common limiting factor in setup
Co-authored-by: Etan Kissling <etan@status.im>
For symmetry with `forkyState` when using `withState`, and to avoid
problems with shadowing of `blck` when using `withBlck` in `template`,
also rename the injected `blck` to `forkyBlck`.
- https://github.com/nim-lang/Nim/issues/22698
* implement EIP-7514 for Deneb: Add Max Epoch Churn Limit
Cap activations per epoch according to EIP-7514:
- https://eips.ethereum.org/EIPS/eip-7514
- https://github.com/ethereum/consensus-specs/pull/3499
* apply proposer boost to first block in case of equivocation
Implement spec changes to fork choice; this only affects equivocation
when multiple blocks are signed for the same slot. Regular operation
is not changed.
- https://github.com/ethereum/consensus-specs/pull/3352
* bump test vectors to v1.4.0-beta.2-hotfix
---------
Co-authored-by: tersec <tersec@users.noreply.github.com>
Currently, passing `0xc00000...` proof seems to pass `verifyProofs`.
Unsure why such a check is not necessary in the spec, and also unsure
whether it is correct to reject proof at infinity, or if it could
occur, e.g., for a blob containing all 0 bytes. Weird overall...
* proper fix
We currently compute `justified_total_active_balance` inside
`calculateProposerBoost`, despite that sum already being known
in the `EpochRef` cache. Tracking `justified_total_active_balance`
whenever the justified checkpoint updates allows replacing the
repeated computation with a lookup, at minimal memory cost.
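A sketch of caching the sum at justification time and consuming it in
the boost calculation; the types are illustrative, while the formula
mirrors the fork choice spec (committee weight = total active balance /
SLOTS_PER_EPOCH, boosted by PROPOSER_SCORE_BOOST percent).
```
const
  SLOTS_PER_EPOCH = 32'u64
  PROPOSER_SCORE_BOOST = 40'u64  # percent

type
  Gwei = uint64
  ForkChoiceState = object
    justifiedTotalActiveBalance: Gwei  # cached, not recomputed per call

proc onJustifiedCheckpointUpdated(fc: var ForkChoiceState,
                                  totalActiveBalance: Gwei) =
  ## Update the cached sum only when the justified checkpoint changes.
  fc.justifiedTotalActiveBalance = totalActiveBalance

func calculateProposerBoost(fc: ForkChoiceState): Gwei =
  ## Lookup instead of summing justified-state balances on every call.
  let committeeWeight = fc.justifiedTotalActiveBalance div SLOTS_PER_EPOCH
  committeeWeight * PROPOSER_SCORE_BOOST div 100
```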
Currently we always request duties for current and next sync period.
As sync periods are quite long (~27 hrs on Mainnet), having access to
the duties so early doesn't help too much. To avoid running into errors
when the BN does not have the duties available around the period
boundary, delay requesting them until the current period is close to
finishing.
`SYNC_COMMITTEE_SUBNET_COUNT` epochs are what the spec says should be
the lookahead timing of starting to subscribe to sync committee gossip.
Reusing the constant here for consistency.
This fixes these warning messages in the first slot of a new period.
```
rocketpool_validator | WRN 2023-09-07 20:19:35.439+00:00 Beacon node has incompatible configuration reason="Epoch value is far from the future;400;getSyncCommitteeDuties(first);invalid-request" node=http://eth2:5052[Nimbus/v23.8.0-872b19-stateofus] node_index=0 node_roles=AGBSDT
rocketpool_validator | WRN 2023-09-07 20:19:35.440+00:00 Unable to get sync committee duties period=889 epoch=227584 reason="Epoch value is far from the future;400;getSyncCommitteeDuties(first);invalid-request" service=duties_service
rocketpool_validator | NOT 2023-09-07 20:19:35.441+00:00 Beacon node is in sync head_slot=7274495 sync_distance=1 is_optimistic=false node=http://eth2:5052[Nimbus/v23.8.0-872b19-stateofus] node_index=0 node_roles=AGBSDT
```
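The delayed lookahead in sketch form, reusing the spec constants
mentioned above (Mainnet values); the helper name is illustrative.
```
const
  EPOCHS_PER_SYNC_COMMITTEE_PERIOD = 256'u64
  SYNC_COMMITTEE_SUBNET_COUNT = 4'u64  # spec lookahead for gossip subscriptions

func shouldFetchNextPeriodDuties(epoch: uint64): bool =
  ## Only ask the BN for next-period sync duties once the current period
  ## is within the same lookahead window used for gossip subscriptions,
  ## instead of ~27 hours ahead of time.
  let epochsLeft =
    EPOCHS_PER_SYNC_COMMITTEE_PERIOD - epoch mod EPOCHS_PER_SYNC_COMMITTEE_PERIOD
  epochsLeft <= SYNC_COMMITTEE_SUBNET_COUNT
```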
Sync committee duties are performed by the sync committee as determined
by slot + 1. We did it correctly for individual messages, but selected
the incorrect participants for aggregate contributions for the very
first slot per period (roughly 1 per ~27 hrs on Mainnet). The faulty
participants selection code was originally introduced in #2925.
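A sketch of the `slot + 1` keying described above, with Mainnet
constants; the helper name is illustrative.
```
const
  SLOTS_PER_EPOCH = 32'u64
  EPOCHS_PER_SYNC_COMMITTEE_PERIOD = 256'u64

func dutySyncCommitteePeriod(msgSlot: uint64): uint64 =
  ## The committee serving a message produced at `msgSlot` is the one
  ## determined by `msgSlot + 1`, which only makes a difference right at
  ## the sync committee period boundary.
  ((msgSlot + 1) div SLOTS_PER_EPOCH) div EPOCHS_PER_SYNC_COMMITTEE_PERIOD
```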
To allow testing https://github.com/ethereum/consensus-specs/issues/3466
add support for selecting fork choice version at launch. This means we
can deploy a different logic when `DENEB_FORK_EPOCH != FAR_FUTURE_EPOCH`
that won't be used on Mainnet.
In `Slot end`, sync duties are reported one slot too late,
because sync committee duties are determined by the next slot
instead of the current one. Address that by taking this into account.
* Add metadata for the Holesky network
* Add copyright banner to the new Nim module
* Working version
* Bump Chronos to fix downloading from Github
* Add checksum check of the downloaded file
* Clean up debugging code and obsolete imports
* optimize epoch transition via get_flag_index_deltas() and get_inactivity_penalty_deltas()
* directly mutate Altair info in {get,update}_flag_and_inactivity_deltas
The `is_next_sync_committee_known` helper allocates a fresh
`SyncCommittee` in the caller and `nimZeroMem`s it on each use.
Use `static` to compare against a compile-time zeroed copy instead.
This also should help reduce stack size far enough to link with Nim 2.0.
Nim 2.0 gets confused when compiling `all_tests`:
```
Error: undeclared identifier: 'maxLen'
candidates (edit distance, scope distance); see '--spellSuggest':
(3, 7): 'Table'
(3, 7): 'len'
(3, 7): 'max'
```
Renaming the generic parameter `U` to `maxLen` fixes this somehow.
It also increases readability to use the same name consistently.
In Nim 2.0, attempting to use `Taskpool.spawn` inside `{.async.}` `proc`
leads to `Error: cannot generate destructor for generic type: Isolated`.
Add an intermediate wrapper `proc` that performs the `spawn` operation
to workaround the problem.
From old interop tests, a mock `eth1BlockHash` was defined in `base`.
To avoid accidental use by Nimbus, move to `tests` and rename it to
`mockEth1BlockHash`.
This PR renames the existing `validator_duties` to `beacon_validators`
and in doing so, names validators running inside the beacon node process
"beacon validators" while those running the VC can be referred to as
"client validators" to disambiguate the two.
The existing `validator_duties` instead takes on a new responsibility:
as a home for logic shared between beacon and client validators - ie
code that provides consistency in implementation and behavior between
the two modes of operation.
Not only does this simplify reasoning about where to put code - it also
reduces the number of dependencies the validator client has from ~5000
to ~3000 modules (!) according to `nim genDepend` significantly reducing
compile times.
- Remove unnecessary `Defect` references
- Remove spurious `SerializationError` references
- Remove duplicate `writeValue` template in `keystore.nim`;
same implementation already exists a bit further above in same file.
Add separate log topic for `block_processor` messages.
Topic named similar to the other `_processor` modules:
- `eth2_processor` --> `gossip_eth2`
- `light_client_processor` --> `gossip_lc`
- `optimistic_processor` --> `gossip_opt`
In #4465, a regression was introduced by deleting the periodic call to
`engine_exchangeTransitionConfiguration` in the Nimbus light client.
This led to the light client never treating the EL as online and,
subsequently, not sending `engine_newPayload` requests for new blocks.
Strangely, `engine_forkchoiceUpdated` requests still make it through :)
Geth still requires both `engine_newPayload` and `fcU` to be called.
By restoring the `exchangeTransitionConfiguration` loop, `newPayload`
requests are once more issued to the EL.
When a block is introduced to the system both via REST and gossip at the
same time, we will call `storeBlock` from two locations leading to a
duplicate check race condition as we wait for the EL.
This issue may manifest in particular when using an external block
builder that itself publishes the block onto the gossip network.
* refactor enqueue flow
* simplify calling `addBlock`
* complete request manager verifier future for blobless blocks
* re-verify parent conditions before adding block
among other things, it might have gone stale or finalized between one
call and the other