Commit Graph

3922 Commits

Author SHA1 Message Date
Jacek Sieka 1ef7d237cc
Shared validator pubkey (#5883)
This PR allows sharing the pubkey data between validators by using a
thread-local cache for pubkey data, netting about a 400mb mem usage
reduction on holesky due to us keeping 3 permanent + several ephemeral
state copies in memory at all times and each state copy holding a full
validator.

The PR also introduces a hash cache for the key which gives ~14% speedup
for a full state `hash_tree_root` - the key makes up for a large part of
the `Validator` htr time.

Finally, the time it takes to copy a state goes down as well from ~80m
ms to ~60, for reasons similar to htr.

We use a `ptr` even if a `ref` could in theory have been used - there is
not much practical benefit to a `ref` (given it's mutable) while a `ptr`
is cheaper and easier to copy (when copying temporary states).

We could go further and cache a cooked pubkey but it turns out this is
quite intrusive - in all the relevant places, we're already using a
cooked key from the immutable validator data so there are no immediate
performance gains of doing so while managing the compressed -> cooked
key mapping would become more difficult - something for a future PR
perhaps.

Co-authored-by: Etan Kissling <etan@status.im>
2024-02-21 20:06:19 +01:00
Etan Kissling 88045a91cd
rename new timing metrics, as `_total` suffix is implicit (#5917)
* track latest duration instead of total in new timing metrics

Change `db_checkpoint_seconds` and `state_replay_seconds` metrics to
record the latest duration instead of the total. `nim-metrics` already
synthesizes a `_total` metric from these implicitly.

* still have to use inc, metrics only synthesizes the name not the sum

* prefix with `beacon_dag`
2024-02-20 20:34:41 +01:00
tersec ffbc8d1466
refactor epoch state transition to facilitate individual validator balance change calculations (#5910) 2024-02-20 05:14:52 +00:00
Jacek Sieka 8d465a7d8c
vmon: Missed block metric (#5913)
Validator monitoring gained 2 new metrics for tracking when blocks are
included or not on the head chain.

Similar to attestations, if the block is produced in epoch N, reporting
will use the state when switching to epoch N+2 to do the reporting (so
as to reasonably stabilise the block inclusion in the face of reorgs).
2024-02-20 06:40:18 +02:00
tersec 87ae60f780
search for validator indices backwards while processing deposits (#5914) 2024-02-20 06:34:57 +02:00
Zahary Karadjov 7fe43fc204
Version v24.2.1 2024-02-20 05:49:56 +02:00
tersec 28f69ccc0a
add Prater/Goerli deprecation notice (#5898) 2024-02-19 10:09:39 +00:00
Etan Kissling 92197ce690
add metric for database checkpoint duration (#5897)
Database checkpointing can take seconds, e.g., while Geth is syncing.
Add a debug log + metric for it, and also info log if it takes longer
than 250ms, same as for the existing `State replayed` log. If the log
shows up for a user while the system is not overloaded, it may point
to slow disk speed or thermal issue.
2024-02-19 11:00:11 +01:00
Etan Kissling e04e95167d
avoid `read`/`readError` in favor of `value`/`error` (#5904)
In VC logic, bump 3 remaining uses of `readError`/`read` to use
`error`/`value` instead. The surrounding logic guarantees success.
2024-02-19 10:52:35 +01:00
Etan Kissling 4fc1550d0f
add `{.push raises: [].}` to recently modified files (#5908)
Status Nim style mandates `{.push raises: []}.` at start of modules.
Ensure that's the case so that exceptions are properly tracked.

- https://status-im.github.io/nim-style-guide/errors.exceptions.html
- https://github.com/status-im/nim-eth/pull/614#discussion_r1220906149
2024-02-18 01:16:49 +00:00
Etan Kissling 30b7c6153f
handle `Exception` during `EraFile.verify` (#5900)
`Taskpool.new()` is marked as `{.raises: [Exception].}`. Catch this.
2024-02-17 18:19:30 +01:00
Jacek Sieka b5089ebf70
log elmanager timeouts (#5895)
Also:

* remove some unused metrics
* simplify execution payload fetching flow
2024-02-17 10:15:02 +01:00
tersec e410fe0052
https://github.com/ethereum/consensus-specs/pull/3600 (#5896) 2024-02-17 09:02:50 +00:00
tersec ea29e0afc8
use 1.4.0-beta.7-hotfix consensus spec test vectors (#5894) 2024-02-16 04:49:18 +00:00
tersec 52c538fb3c
stop calling exchangeTransitionConfiguration (#5889) 2024-02-14 10:01:08 +00:00
Etan Kissling 81b849a2eb
bump `gnosis-chain-configs` to `b02e5dd0bc61f123fa28d027cf95d47ebe2ae05d` (#5885)
- Schedule deneb
2024-02-13 12:07:22 +01:00
Jacek Sieka afdfe302f3
state loading optimizations (#5881)
* compute post-merge randao mix without loading state
* avoid copying state on shuffling computation and compute epochref
* speed up state copy for block production
2024-02-12 15:58:55 +01:00
tersec 8240c1bf34
use decimal representations of engine and builder bid values (#5879) 2024-02-10 05:13:00 +01:00
tersec 134774e00d
ensure reason field logging consistently uses string type (#5878) 2024-02-10 03:50:31 +01:00
tersec a4680cb7fa
refactor addHeadBlock() to research/ and tests/ helper (#5874)
* refactor addHeadBlock() to research/ and tests/ helper

* rm now-dead code
2024-02-09 23:46:51 +00:00
Etan Kissling 9593ef74b8
do not cache zero block hash if block unavailable (#5865)
With checkpoint sync, the checkpoint block is typically unavailable at
the start, and only backfilled later. To avoid treating it as having
zero hash, execution disabled in some contexts, wrap the result of
`loadExecutionBlockHash` in `Opt` and handle block hash being unknown.

---------

Co-authored-by: Jacek Sieka <jacek@status.im>
2024-02-09 22:10:38 +00:00
Etan Kissling 7c53841cd8
Revert "Revert "fix checkpoint block potentially not getting backfilled into DB (#5863)" (#5871)" (#5875)
This reverts commit 1575478b72.
2024-02-09 20:44:54 +01:00
Etan Kissling f2d92729a2
reduce verbosity of `Got request for pre-backfill slot` (#5876)
When syncing, we log a notice each time someone asks us for a block that
we haven't backfilled yet. This is quite verbose and not unexpected,
because the status message does not allow indicating backfill progress.
2024-02-09 20:32:31 +01:00
tersec 1575478b72
Revert "fix checkpoint block potentially not getting backfilled into DB (#5863)" (#5871)
This reverts commit 65e6f892de.
2024-02-09 12:49:07 +00:00
Etan Kissling 65e6f892de
fix checkpoint block potentially not getting backfilled into DB (#5863)
When using checkpoint sync, only checkpoint state is available, block is
not downloaded and backfilled later.

`dag.backfill` tracks latest filled `slot`, and latest `parent_root` for
which no block has been synced yet.

In checkpoint sync, this assumption is broken, because there, the start
`dag.backfill.slot` is set based on checkpoint state slot, and the block
is also not available.

However, sync manager in backward mode also requests `dag.backfill.slot`
and `block_clearance` then backfills the checkpoint block once it is
synced. But, there is no guarantee that a peer ever sends us that block.
They could send us all parent blocks and solely omit the checkpoint
block itself. In that situation, we would accept the parent blocks and
advance `dag.backfill`, and subsequently never request the checkpoint
block again, resulting in gap inside blocks DB that is never filled.

To mitigate that, the assumption is restored that `dag.backfill.slot`
is the latest filled `slot`, and `dag.backfill.parent_root` is the next
block that needs to be synced. By setting `slot` to `tail.slot + 1` and
`parent_root` to `tail.root`, we put a fake summary into `dag.backfill`
so that `block_clearance` only proceeds once checkpoint block exists.
2024-02-09 11:20:36 +01:00
Etan Kissling 4266e16835
allow `getBlockIdAtSlot` to answer queries from available states (#5869)
After checkpoint sync, historical block IDs cannot yet be queried.
However, they are needed to compute dependent roots of `ShufflingRef`.
To allow lookup, enable `getBlockIdAtSlot` to answer from compatible
states in memory; as long as they descend from the finalized checkpoint
and the requested slot is sufficiently recent, `block_roots` contains
everything to recover `BlockSlotId` up to `SLOTS_PER_HISTORICAL_ROOT`.
This is similar to how `attester_dependent_root` etc. are computed.

This accelerates the first couple minutes of checkpoint sync on Mainnet,
especially the time until finality advances past the synced checkpoint.
2024-02-09 11:13:00 +01:00
tersec 642774e596
unrevert rest of https://github.com/status-im/nimbus-eth2/pull/5765 (#5867)
* unrevert rest of https://github.com/status-im/nimbus-eth2/pull/5765

* rm stray e2store docs changes

* reduce diff

* fix indent

---------

Co-authored-by: Jacek Sieka <jacek@status.im>
2024-02-09 09:35:41 +01:00
Kim De Mey dca444bea7
Split era specific code from e2s specific code (#5866) 2024-02-09 08:59:36 +01:00
Etan Kissling a746063a61
bump `eth2-networks` to `934c948e69205dcf2deb87e4ae6cc140c335f94d` (#5868)
- Schedule Deneb for Mainnet
2024-02-08 19:18:35 +00:00
Etan Kissling e398078abc
`...ExecutionPayloadHash` --> `...ExecutionBlockHash` (#5864)
Finish the rename started in #4809 to have a consistent naming.
`ExecutionPayloadHash` suggests hash over payload instead of block.
`BlockHash` is also the canonical name in engine API.
2024-02-08 01:24:55 +01:00
Eugene Kabanov 464ff68658
Address issues #5675 and #5681. (#5846) 2024-02-07 19:51:36 +00:00
Etan Kissling ed8743b986
fix standalone compilation of `trusted_node_sync.nim` (#5861)
#5544 contained a regression that broke standalone compilation of
`trusted_node_sync` as a main module. Fix it, and add to CI.
2024-02-07 19:26:29 +00:00
Etan Kissling 94ba0a9bd1
consider block availability when initializing LC data collector (#5860)
When using checkpoint sync, the initial block is missing in the DB.
Update the LC data collector initialization to account for that,
avoiding a spurious error message when it is incorrectly accessed:

```
ERR 2024-02-07 11:21:55.416+01:00 Block failed to load unexpectedly          topics="chaindag_lc" bid=d30517a7:8257504 tail=8257504
```

Also fixes a regression from #5691 that resulted in similar messages
while importing the first few blocks after checkpoint sync.

Thanks to @arnetheduck for reporting this.
2024-02-07 18:03:19 +00:00
Jacek Sieka 9aabca6a64
Clean up debug/heads v2 types (#5859) 2024-02-07 17:51:12 +01:00
Etan Kissling b7026a683a
avoid marking blocks as unviable if `blobless` quarantine is full (#5858)
Full caches should not be used to mark blocks as unviable. The unviable
status is quite persistent and a block marked as such won't be processed
again once the cache empties. Problem originally introduced in #4808.
2024-02-07 13:38:20 +00:00
Jacek Sieka 47704bde14
raises for beacon validators & router (#5826)
Changes here are more significant because of some good old tech debt in
block production which has grown quite hairy - the reduction in
exception handling at least provides some steps in the right direction.
2024-02-07 12:26:04 +01:00
Etan Kissling 94a65c2a9e
log `extra_data` instead of `extra_data_len` for `ExecutionPayload` (#5851)
Add more details to execution payload logs, reusing the same facilities
that we already use for `GraffitiBytes`.
2024-02-07 10:09:25 +01:00
Etan Kissling 3ac043212c
set `topic` for `eth1_chain` logs (#5854)
`eth1_chain` no longer logs with `topics` since #5768, making it hard
to filter messages from this module. Re-add the `topics`, and also fix
outdated `topics` in `el_manager` (formerly `*_monitor`).
2024-02-07 09:44:32 +01:00
Etan Kissling f0f14f10d3
fix compilation with `-d:has_deposit_root_checks` (#5855)
Since #4465, compilation with `-d:has_deposit_root_checks` fails. #4707
further built on top of it but the additions also don't compile. Fix it.
2024-02-06 23:03:52 +01:00
Etan Kissling 41403022bb
prevent accidentally hashing `BeaconState`/`BeaconBlock` in Deneb (#5852)
Extend protection against accidentally calling computationally expensive
functions when a cache is available to Deneb, as done for earlier forks.
2024-02-06 19:57:53 +01:00
Eugene Kabanov 21efe7e060
VC: Use produceBlockV3 when its available. (#5842)
* Initial commit.

* Add helper functions and publishBlock() implementations.

* Address review comments.
2024-02-02 15:24:40 +00:00
Zahary Karadjov 742f151f68
Version v24.2.0 2024-02-02 02:05:56 +02:00
tersec 8b261dd3e0
fix blob_sidecar SSE versioned_hash field to be 0x-prefixed hex (#5844) 2024-01-31 04:50:24 +01:00
tersec 87052eba4e
implement getBlindedBlock REST API (#5829) 2024-01-31 03:18:55 +00:00
tersec 45b4b46041
use "reason" instead of "error"/"validatorError" to log gossip ignore/reject reasons (#5839) 2024-01-31 03:18:20 +00:00
tersec 0638741f8b
halve validator registration chunk size (#5837) 2024-01-29 14:09:09 +01:00
tersec 3d7f634e70
unrevert more of https://github.com/status-im/nimbus-eth2/pull/5765 (#5834) 2024-01-29 08:35:16 +01:00
tersec 225ef5e69a
partially revert https://github.com/status-im/nimbus-eth2/pull/5765 (#5833) 2024-01-28 23:45:52 +01:00
Etan Kissling 61cb7fafdf
clear `BrokenClock` status if Nimbus extensions no longer supported (#5827)
When BN clock is out of sync, VC sets BN status to `BrokenClock`. It is
only reset to `Offline` after restoring time sync. However, if VC fails
encounters an error while checking time, Nimbus extensions are assumed
to be unavailable and the BN is no longer checked for having a synced
clock. This means it is never reset back to `Offline` if errors start
occurring _after_ BN is already set to `BrokenClock`. This could be
because BN is changed from Nimbus to an alternative implementation,
or due to intermittent connection issues.

Ensure that BN status is reset back to `Offline` when Nimbus extensions
are disabled to ensure eventual connection recovery.
2024-01-25 11:52:25 +01:00
tersec 128834a8eb
use `RestPlainResponse` to improve builder API rerror reporting (#5819) 2024-01-24 23:27:22 +00:00