Commit Graph

98 Commits

Author SHA1 Message Date
Etan Kissling 035ca015e6
continue validator duties if chain does not progress for a long time (#6101)
Nimbus currently stops performing validator duties if the blockchain
does not progress for `node.config.syncHorizon` slots. This means that
the chain won't recover because no new blocks are proposed. To fix that,
continue performing validator duties if no progress is registered for a
long time, and none of our peers is indicating any progress.
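
For illustration, a minimal Nim sketch of that decision; `wallSlot`, `headSlot`, `maxPeerHeadSlot` and `syncHorizon` are placeholder names, not the actual Nimbus APIs:

```nim
type Slot = uint64

func shouldPerformDuties(wallSlot, headSlot, maxPeerHeadSlot: Slot,
                         syncHorizon: uint64): bool =
  # Within the sync horizon: business as usual.
  if wallSlot <= headSlot + syncHorizon:
    return true
  # Beyond the horizon, keep performing duties only if no peer claims to
  # be further ahead than we are, i.e. the whole network appears stalled.
  maxPeerHeadSlot <= headSlot

when isMainModule:
  doAssert shouldPerformDuties(wallSlot = 100, headSlot = 98,
                               maxPeerHeadSlot = 98, syncHorizon = 32)
  doAssert not shouldPerformDuties(wallSlot = 200, headSlot = 98,
                                   maxPeerHeadSlot = 150, syncHorizon = 32)
```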
2024-03-20 03:23:53 +01:00
Etan Kissling f54fa083b4
fix EIP-7044 implementation when using batch verification (#5953)
In #5120, EIP-7044 support got added to the state transition function to
force `CAPELLA_FORK_VERSION` to be used when validating `VoluntaryExit`
messages, irrespective of their `epoch`.

In #5637, similar logic was added when batch verifying BLS signatures,
which is used during gossip validation (libp2p gossipsub, and req/resp).
However, that logic did not match the one introduced in #5120, and only
used `CAPELLA_FORK_VERSION` when a `VoluntaryExit`'s `epoch` was set to
a value `>= CAPELLA_FORK_EPOCH`. Otherwise, `BELLATRIX_FORK_VERSION`
would still be used when validating `VoluntaryExit`, e.g., with `epoch`
set to `0`, as is the case in this Holesky block:

- https://holesky.beaconcha.in/slot/1076985#voluntary-exits

Extracting the correct logic from #5120 into a function, and reusing it
when verifying BLS signatures fixes this issue, and also leverages the
exhaustive EF test suite that covers the (correct) #5120 logic.

This fix only affects networks that have EIP-7044 applied (post-Deneb).

Without the fix, Deneb blocks with a `VoluntaryExit` with `epoch` set to
`< CAPELLA_FORK_EPOCH` incorrectly fail to validate despite being valid.

Incorrect blocks that contain a malicious `VoluntaryExit` with `epoch`
set to `< CAPELLA_FORK_EPOCH` and signed using `BELLATRIX_FORK_VERSION`
_would_ pass the BLS verification stage, but subsequently fail the state
transition logic. Such blocks would still correctly be labeled invalid.
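
For illustration, a hedged Nim sketch of the fixed fork-version selection; fork versions and epochs are placeholder values, and the real helper lives in the state transition code:

```nim
type
  Epoch = uint64
  Version = array[4, byte]

const
  BELLATRIX_FORK_VERSION = [byte 2, 0, 0, 0]  # placeholder values
  CAPELLA_FORK_VERSION   = [byte 3, 0, 0, 0]
  CAPELLA_FORK_EPOCH     = Epoch(100)
  DENEB_FORK_EPOCH       = Epoch(200)

func voluntaryExitForkVersion(exitEpoch, currentEpoch: Epoch): Version =
  if currentEpoch >= DENEB_FORK_EPOCH:
    # EIP-7044: always CAPELLA_FORK_VERSION, irrespective of the exit's epoch.
    CAPELLA_FORK_VERSION
  elif exitEpoch >= CAPELLA_FORK_EPOCH:
    # Simplified pre-Deneb behavior: version of the fork the epoch falls in.
    CAPELLA_FORK_VERSION
  else:
    BELLATRIX_FORK_VERSION

when isMainModule:
  # Post-Deneb, an exit with `epoch` 0 must still use CAPELLA_FORK_VERSION.
  doAssert voluntaryExitForkVersion(exitEpoch = 0, currentEpoch = 250) ==
    CAPELLA_FORK_VERSION
```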
2024-02-25 15:25:26 +01:00
tersec a4680cb7fa
refactor addHeadBlock() to research/ and tests/ helper (#5874)
* refactor addHeadBlock() to research/ and tests/ helper

* rm now-dead code
2024-02-09 23:46:51 +00:00
Etan Kissling 7c53841cd8
Revert "Revert "fix checkpoint block potentially not getting backfilled into DB (#5863)" (#5871)" (#5875)
This reverts commit 1575478b72.
2024-02-09 20:44:54 +01:00
tersec 1575478b72
Revert "fix checkpoint block potentially not getting backfilled into DB (#5863)" (#5871)
This reverts commit 65e6f892de.
2024-02-09 12:49:07 +00:00
Etan Kissling 65e6f892de
fix checkpoint block potentially not getting backfilled into DB (#5863)
When using checkpoint sync, only the checkpoint state is available; the
corresponding block is not downloaded initially and is backfilled later.

`dag.backfill` tracks latest filled `slot`, and latest `parent_root` for
which no block has been synced yet.

In checkpoint sync, this assumption is broken: the initial
`dag.backfill.slot` is set based on the checkpoint state's slot, and the
corresponding block is not available either.

However, the sync manager in backward mode also requests `dag.backfill.slot`,
and `block_clearance` then backfills the checkpoint block once it is
synced. But there is no guarantee that a peer ever sends us that block:
they could send us all parent blocks and omit solely the checkpoint
block itself. In that situation, we would accept the parent blocks,
advance `dag.backfill`, and subsequently never request the checkpoint
block again, resulting in a gap inside the blocks DB that is never filled.

To mitigate that, the assumption is restored that `dag.backfill.slot`
is the latest filled `slot`, and `dag.backfill.parent_root` is the next
block that needs to be synced. By setting `slot` to `tail.slot + 1` and
`parent_root` to `tail.root`, we put a fake summary into `dag.backfill`
so that `block_clearance` only proceeds once the checkpoint block exists.
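
A rough Nim sketch of the restored invariant, with simplified stand-in types whose field names mirror the summary described above:

```nim
type
  Slot = uint64
  Root = array[32, byte]
  BackfillSummary = object
    slot: Slot          # latest slot already filled in the DB
    parent_root: Root   # root of the next block that still needs syncing

func checkpointBackfill(tailSlot: Slot, tailRoot: Root): BackfillSummary =
  # Treat the (not yet downloaded) checkpoint block as the next block to
  # sync: start at tail.slot + 1 and point parent_root at tail.root, so
  # block_clearance only advances once the checkpoint block itself exists.
  BackfillSummary(slot: tailSlot + 1, parent_root: tailRoot)
```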
2024-02-09 11:20:36 +01:00
tersec cf1bec7670
update some deprecated stew/results to results imports (#5743) 2024-01-16 22:37:14 +00:00
Jacek Sieka 62cbdeefc5
verify `genesis_time` more strictly (fixes #1667) (#5694)
Bogus values lead to crashes down the line when timers overflow
2024-01-06 15:26:56 +01:00
tersec 6a9d522705
Apply EIP-7044 to block signature batch verification (#5637) 2023-12-01 14:44:45 +00:00
tersec 115ffa70eb
rm unused code (#5623) 2023-11-25 12:09:18 +00:00
tersec 152dd74179
propagate newPayload-VALID to block ancestors (#5343) 2023-08-23 19:56:35 +00:00
Jacek Sieka 49729e1ef3
prevent concurrent `storeBlock` calls (fixes #5285) (#5295)
When a block is introduced to the system via both REST and gossip at the
same time, we will call `storeBlock` from two locations, leading to a
duplicate-check race condition while we wait for the EL.

This issue may manifest in particular when using an external block
builder that itself publishes the block onto the gossip network.

* refactor enqueue flow
* simplify calling `addBlock`
* complete request manager verifier future for blobless blocks
* re-verify parent conditions before adding block

Among other things, it might have gone stale or been finalized between one
call and the other.
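
As a simplified illustration of the deduplication idea (not the actual enqueue flow, which is async and EL-aware), tracking in-flight roots is enough to collapse the REST and gossip paths:

```nim
import std/sets

type Root = string   # stand-in for the real 32-byte block root

var inFlight = initHashSet[Root]()

proc storeBlockOnce(root: Root, store: proc (root: Root)) =
  if root in inFlight:
    return           # the other code path is already processing this block
  inFlight.incl root
  try:
    store(root)      # in Nimbus this is the async EL + clearance pipeline
  finally:
    inFlight.excl root

when isMainModule:
  var stored = 0
  storeBlockOnce("abc", proc (root: Root) = inc stored)
  doAssert stored == 1
```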
2023-08-17 15:12:37 +02:00
Jacek Sieka a2adbf809f
Perform block pre-check before validating execution (#5169)
* Perform block pre-check before validating execution

When syncing, blocks have not been gossip-validated and are therefore
prone to trivial faults like being known-unviable, duplicate or missing
their parent.

In addition, the duplicate-block check in BlockProcessor was not
considering the quarantine flow and would therefore cause
recently-quarantined blocks to be silently dropped when their parent
appears, delaying the sync end-game and thus causing longer startup
resync times.

This PR verifies trivial conditions before performing execution
validation thus avoiding duplicates and missing parents alike.

It also ensures that the fast-sync EL mode is used for finalized blocks
even if the EL is timing out / slow to respond - this allows the CL to
complete its sync faster and switch to "normal" lock-step at the head of
the chain more quickly, thus also allowing the EL to access the latest
consensus information earlier.

* oops

* remove unused constant
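
A rough Nim outline of the ordering this PR establishes; the names and the boolean inputs are made up for illustration:

```nim
type PreCheck = enum
  pcOk, pcDuplicate, pcUnviable, pcMissingParent

func preCheck(known, unviable, haveParent: bool): PreCheck =
  # Trivial conditions first - none of these needs an EL round trip.
  if known: pcDuplicate
  elif unviable: pcUnviable
  elif not haveParent: pcMissingParent
  else: pcOk

proc processBlock(known, unviable, haveParent: bool,
                  verifyExecution: proc (): bool): bool =
  if preCheck(known, unviable, haveParent) != pcOk:
    return false       # dropped / quarantined without touching the EL
  verifyExecution()    # only now pay for newPayload

when isMainModule:
  var elCalls = 0
  discard processBlock(true, false, true,
                       proc (): bool = (inc elCalls; true))
  doAssert elCalls == 0  # a duplicate never reaches execution validation
```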
2023-07-11 18:55:51 +02:00
tersec cd087b9a43
replace `optimisticRoots` table with field in `BlockRef` (#4969)
* replace optimisticRoots table with field in BlockRef

* copyright year

* mark finalized blocks as verified on load

* Update beacon_chain/consensus_object_pools/block_dag.nim

Co-authored-by: Etan Kissling <etan@status.im>

* expand non-optimistic block checking to all pre-merge blocks; refactor markBlockVerified to use BlockRef rather than the block root, and remove the superfluous caller in the newPayload path, which is replaced by addResolvedHeadBlock BlockRef construction

* don't treat finalized block specially; VALID status is sticky
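
Conceptually (illustrative Nim only, not the actual types), the change replaces a side table keyed by root with a field read:

```nim
import std/sets

type
  Root = string
  BlockRef = ref object
    root: Root
    executionValid: bool   # lives on the node itself; VALID status is sticky

# Before: a separate table consulted on every optimistic-status query.
var optimisticRoots = initHashSet[Root]()
proc isOptimisticOld(root: Root): bool =
  root in optimisticRoots

# After: just read the field on the BlockRef at hand.
func isOptimistic(blck: BlockRef): bool =
  not blck.executionValid
```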

---------

Co-authored-by: Etan Kissling <etan@status.im>
2023-05-20 12:18:51 +00:00
Jacek Sieka 09a69c7b07
better batch sig verification failure message 2023-05-12 08:18:13 +02:00
Etan Kissling e6e4ba9de6
clean up redundant tests and config (#4836)
The consensus-spec-tests already cover the scenarios of our custom test
runner, so the custom tests can be removed. Also cleans up unused config
flags and related unreachable logic.
2023-04-18 21:26:36 +02:00
Etan Kissling ad118cd354
rename `stateFork` > `consensusFork` (#4718)
Just the variable, not yet `lcDataForkAtStateFork` / `atStateFork`.

- Shorten comment in `light_client.nim` to keep line width
- Do not rename `stateFork` mention in `runProposalForkchoiceUpdated`.
- Do not rename `stateFork` in `getStateField(dag.headState, fork)`

Rest is just a mechanical mass replace
2023-03-11 00:35:52 +00:00
tersec 3b41e6a0e7
rename ConsensusFork.EIP4844 to ConsensusFork.Deneb (#4692) 2023-03-04 13:35:39 +00:00
tersec ea060de6d4
more eip4844 -> deneb module references (#4690) 2023-03-02 21:09:24 +01:00
tersec a382498cfe
batch-verify BLS to execution change messages (#4637) 2023-02-17 13:35:12 +00:00
tersec 0fb726c420
`BeaconStateFork/BeaconBlockFork` -> `ConsensusFork` (#4560)
* `BeaconStateFork/BeaconBlockFork` -> `ConsensusFork`

* revert unrelated change

* revert unrelated changes

* update test summaries
2023-01-28 19:53:41 +00:00
tersec aacc8d702d
remove Nim 1.2-compatible `push raise`s and update copyright notice years (#4528) 2023-01-20 14:14:37 +00:00
Jacek Sieka 75c7195bfd
Backfill only up to MIN_EPOCHS_FOR_BLOCK_REQUESTS blocks (#4421)
When backfilling, we only need to download blocks that are newer than
MIN_EPOCHS_FOR_BLOCK_REQUESTS epochs - the rest cannot reliably be fetched
from the network and does not have to be provided to others.

This change affects only trusted-node-synced clients - genesis sync
continues to work as before (because it needs to construct a state by
building it from genesis).

Those wishing to complete a backfill should do so with era files
instead.
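
Back-of-the-envelope Nim sketch of the resulting backfill horizon, assuming mainnet preset values; the actual computation lives in the sync code:

```nim
const
  SLOTS_PER_EPOCH = 32'u64
  MIN_EPOCHS_FOR_BLOCK_REQUESTS = 33024'u64  # mainnet preset value

func backfillHorizonSlot(currentEpoch: uint64): uint64 =
  # Peers are only required to serve blocks within this window, so
  # backfilling anything older from the network is not reliable.
  if currentEpoch > MIN_EPOCHS_FOR_BLOCK_REQUESTS:
    (currentEpoch - MIN_EPOCHS_FOR_BLOCK_REQUESTS) * SLOTS_PER_EPOCH
  else:
    0'u64

when isMainModule:
  echo backfillHorizonSlot(250_000)  # earliest slot worth requesting
```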
2022-12-23 08:42:55 +01:00
tersec dee5af58d6
eip4844 light client tests; avoid case object out-of-bound array reads (#4404) 2022-12-08 17:21:53 +01:00
tersec 2932d3b808
extend `BeaconStateFork` enum (#4396) 2022-12-07 16:47:23 +00:00
tersec ec443601eb
implement capellaImplementationMissing points; don't track not-active validator duties (#4340)
* implement several capellaImplementationMissing points

* don't register validator activity for not-active validators

* don't check validator indices that already come out of existing committees; these must be active validators, or else there are other, deeper bugs
2022-11-22 13:56:05 +02:00
Etan Kissling 48994f67d3
rename `BlockError` -> `VerifierError` (#4310)
We currently use `BlockError` for both beacon blocks and LC objects.
In light of EIP4844, we will likely also use it for blob sidecars.
To avoid confusion, rename it to a more generic `VerifierError`,
and update its documentation to be more generic.

To avoid long lines as a followup, also renaming the `block_processor`'s
`BlockProcessingCompleted.completed`->`ProcessingStatus.completed` and
`BlockProcessingCompleted.notCompleted`->`ProcessingStatus.notCompleted`
2022-11-10 17:40:27 +00:00
Jacek Sieka 09ade6d33d
Make trusted node sync era-aware (#4283)
This PR removes a bunch of code to make TNS aware of era files, avoiding
a duplicated backfill when era files are available.

* reuse chaindag for loading backfill state, replacing the TNS homebrew
* fix era block iteration to skip empty slots
* add tests for `can_advance_slots`
2022-11-10 10:44:47 +00:00
tersec 5b46f0b723
add Capella support to Forked* (#4276)
* add Capella support to Forked*

* remove cruft

* add `OnForkyBlockAdded`
2022-11-02 16:23:30 +00:00
Jacek Sieka 819442acc3
Allow chain dag without genesis / block (#4230)
* Allow chain dag without genesis / block

This PR enables the initialization of the dag without access to blocks
or genesis state - it is a prerequisite for implementing a number of
interesting features:

* checkpoint sync without any block download
* pruning of blocks and states

* backfill checkpoint block
2022-10-14 22:40:10 +03:00
Etan Kissling 6069003a1f
fix check for attaching to pre-finalized parent (#4161)
When the BN's head is reorged while it is shut down, reloading the BN will
not assign a `BlockRef` to alternate branches. However, blocks from other
branches are still present in the database, leading to their descendants
being incorrectly marked as `UnviableFork`. By restricting the check to blocks
that have been finalized, they should be reported as `MissingParent`
instead, eventually re-assigning a `BlockRef` to them.
2022-09-22 18:33:26 +00:00
tersec 19bf460a3b
more `withState` `state` -> `forkyState` (#4104) 2022-09-10 08:12:07 +02:00
tersec b60456fdf3
`withState`: `state` -> `forkyState` (#4038) 2022-08-26 22:47:40 +00:00
Jacek Sieka 0d9fd54857
cache shuffling separately from other EpochRef data (fixes #2677) (#3990)
In order to avoid full replays when validating attestations hailing from
untaken forks, it's better to keep shufflings separate from `EpochRef`
and perform a lookahead on the shuffling when processing the block that
determines them.

This also helps performance in the case where REST clients are trying to
perform lookahead on attestation duties and decreases memory usage by
sharing shufflings between EpochRef instances of the same dependent
root.
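
A simplified Nim sketch of the caching scheme: shufflings keyed by (dependent root, epoch) and shared, with a full replay only on a miss. Names are illustrative:

```nim
import std/tables

type
  Root = string
  ShufflingRef = ref object
    epoch: uint64
    shuffledIndices: seq[int]

var shufflingCache = initTable[string, ShufflingRef]()

proc getShuffling(dependentRoot: Root, epoch: uint64,
                  compute: proc (): ShufflingRef): ShufflingRef =
  # Keyed by (dependent root, epoch) - flattened into a string for this sketch.
  let key = dependentRoot & ":" & $epoch
  if key notin shufflingCache:
    # The expensive state replay only happens on a cache miss; EpochRef
    # instances with the same dependent root share the result.
    shufflingCache[key] = compute()
  shufflingCache[key]
```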
2022-08-18 21:07:01 +03:00
Miran dfd4afc9f2
compatibility with Nim 1.4+ (#3888) 2022-07-29 10:53:42 +00:00
Etan Kissling 2a2bcea70d
group justified and finalized `Checkpoint` (#3841)
The justified and finalized `Checkpoint` are frequently passed around
together. This introduces a new `FinalityCheckpoint` data structure that
combines them into one.

Due to the large usage of this structure in fork choice, also took this
opportunity to update fork choice tests to the latest v1.2.0-rc.1 spec.
Many additional tests enabled, some need more work, e.g. EL mock blocks.
Also implemented `discard_equivocations` which was skipped in #3661,
and improved code reuse across fork choice logic while at it.
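
A minimal sketch of the grouping, assuming illustrative field names:

```nim
type
  Epoch = uint64
  Root = array[32, byte]
  Checkpoint = object
    epoch: Epoch
    root: Root
  # Justified and finalized checkpoints travel together through fork choice.
  FinalityCheckpoint = object
    justified: Checkpoint
    finalized: Checkpoint
```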
2022-07-06 13:33:02 +03:00
tersec 1221bb66e8
optimistic sync (#3793)
* optimistic sync

* flag that initially loaded blocks from database might need execution block root filled in

* return optimistic status in REST calls

* refactor blockslot pruning

* ensure beacon_blocks_by_{root,range} do not provide optimistic blocks

* handle forkchoice head being pre-merge with block being postmerge

* re-enable blocking head updates on validator duties

* fix is_optimistic_candidate_block per spec; don't crash with nil future

* fix is_optimistic_candidate_block per spec; don't crash with nil future

* mark blocks sans execution payloads valid during head update
2022-07-04 23:35:33 +03:00
Jacek Sieka f31f52e24a
fix missing frontfill index (fixes #3658) (#3675)
* fix key load duration log
* log broken frontfill block root
2022-05-31 10:09:01 +02:00
tersec 61ba308e13
stylecheck fixes (#3593) 2022-04-14 17:39:37 +02:00
Jacek Sieka f70ff38b53
enable `styleCheck:usages` (#3573)
Some upstream repos still need fixes, but this gets us close enough that
style hints can be enabled by default.

In general, "canonical" spellings are preferred even if they violate
nep-1 - this applies in particular to spec-related stuff like
`genesis_validators_root` which appears throughout the codebase.
2022-04-08 16:22:49 +00:00
Jacek Sieka bc80ac3be1
harden REST API `atSlot` against non-finalized blocks (#3538)
* harden validator API against pre-finalized slot requests
* check `syncHorizon` when responding to validator api requests too far
from `head`
* limit state-id based requests to one epoch ahead of `head`
* put historic data bounds on block/attestation/etc validator production API, preventing them from being used with already-finalized slots
* add validator block smoke tests
* make rest test create a new genesis with the tests running roughly in
the first epoch to allow testing a few more boundary conditions
2022-03-23 12:42:16 +01:00
Jacek Sieka 4207b127f9
era: load blocks and states (#3394)
* era: load blocks and states

Era files contain finalized history and can be thought of as an
alternative source for block and state data that allows clients to avoid
syncing this information from the P2P network - the P2P network is then
used to "top up" the client with the most recent data. They can be
freely shared in the community via whatever means (http, torrent, etc)
and serve as a permanent cold store of consensus data (and, after the
merge, execution data) for history buffs and bean counters alike.

This PR gently introduces support for loading blocks and states in two
cases: block requests from rest/p2p and frontfilling when doing
checkpoint sync.

The era files are used as a secondary source if the information is not
found in the database - compared to the database, there are a few key
differences:

* the database stores the block indexed by block root while the era file
indexes by slot - the former is used only in rest, while the latter is
used both by p2p and rest.
* when loading blocks from era files, the root is no longer trivially
available - if it is needed, it must either be computed (slow) or cached
(messy) - the good news is that for p2p requests, it is not needed
* in era files, "framed" snappy encoding is used while in the database
we store unframed snappy - for p2p requests, the latter requires
recompression while the former could avoid it
* front-filling is the process of using era files to replace backfilling
- in theory this front-filling could happen from any block and
front-fills with gaps could also be entertained, but our backfilling
algorithm cannot take advantage of this because there's no (simple) way
to tell it to "skip" a range.
* front-filling, as implemented, is a bit slow (10s to load mainnet): we
load the full BeaconState for every era to grab the roots of the blocks
- it would be better to partially load the state - as such, it would
also be good to be able to partially decompress snappy blobs
* lookups from REST via root are served by first looking up a block
summary in the database, then using the slot to load the block data from
the era file - however, there needs to be an option to create the
summary table from era files to fully support historical queries
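
Purely illustrative Nim flow of the "database first, era file second" lookup described above; the Option-based callbacks stand in for the real storage APIs:

```nim
import std/options

type
  Root = string
  Slot = uint64
  SignedBlock = object
    slot: Slot

proc loadBlockByRoot(
    dbGet: proc (root: Root): Option[SignedBlock],
    summarySlot: proc (root: Root): Option[Slot],
    eraGet: proc (slot: Slot): Option[SignedBlock],
    root: Root): Option[SignedBlock] =
  # The database remains the primary (and security-relevant) source.
  let fromDb = dbGet(root)
  if fromDb.isSome:
    return fromDb
  # Era files are indexed by slot, so map root -> slot via the summary table,
  # then load the block from the era store.
  let slot = summarySlot(root)
  if slot.isNone:
    return none(SignedBlock)
  eraGet(slot.get)
```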

To test this, `ncli_db` has an era file exporter: the files it creates
should be placed in an `era` folder next to `db` in the data directory.
What's interesting in particular about this setup is that `db` remains
as the source of truth for security purposes - it stores the latest
synced head root which in turn determines where a node "starts" its
consensus participation - the era directory however can be freely shared
between nodes / people without any (significant) security implications,
assuming the era files are consistent / not broken.

There's lots of future improvements to be had:

* we can drop the in-memory `BlockRef` index almost entirely - at this
point, resident memory usage of Nimbus should drop to a cool 500-600 mb
* we could serve era files via REST trivially: this would drop backfill
times to whatever time it takes to download the files - unlike the
current implementation that downloads block by block, downloading an era
at a time almost entirely cuts out request overhead
* we can "reasonably" recreate detailed state history from almost any
point in time, turning an O(slot) process into O(1) effectively - we'll
still need caches and indices to do this with sufficient efficiency for
the rest api, but at least it cuts the whole process down to minutes
instead of hours, for arbitrary points in time

* CI: ignore failures with Nim-1.6 (temporary)

* test fixes

Co-authored-by: Ștefan Talpalaru <stefantalpalaru@yahoo.com>
2022-03-23 09:58:17 +01:00
Etan Kissling fd1ffd62dd
update light client server for DAG failure modes (#3514)
Gracefully handles the new failure modes recently introduced to the DAG
as part of https://github.com/status-im/nimbus-eth2/pull/3513
Data that is deemed to exist but fails to load leads to an error log to
avoid suppressing logic errors accidentally. In `verifyFinalization`
mode, the assertions remain active.
2022-03-20 11:58:59 +01:00
Jacek Sieka 05ffe7b2bf
Prune `BlockRef` on finalization (#3513)
Up til now, the block dag has been using `BlockRef`, a structure adapted
for a full DAG, to represent all of chain history. This is a correct and
simple design, but does not exploit the linearity of the chain once
parts of it finalize.

By pruning the in-memory `BlockRef` structure at finalization, we save,
at the time of writing, a cool ~250mb (or 25%:ish) chunk of memory
landing us at a steady state of ~750mb normal memory usage for a
validating node.

Above all though, we prevent memory usage from growing proportionally
with the length of the chain, something that would not be sustainable
over time - instead, the steady state memory usage is roughly
determined by the validator set size which grows much more slowly. With
these changes, the core should remain sustainable memory-wise post-merge
all the way to withdrawals (when the validator set is expected to grow).

In-memory indices are still used for the "hot" unfinalized portion of
the chain - this ensures that consensus performance remains unchanged.

What changes is that for historical access, we use a db-based linear
slot index which is cache-and-disk-friendly, keeping the cost for
accessing historical data at a similar level as before, achieving the
savings at no perceivable cost to functionality or performance.

A nice collateral benefit is the almost-instant startup since we no
longer load any large indices at dag init.

The cost of this functionality instead can be found in the complexity of
having to deal with two ways of traversing the chain - by `BlockRef` and
by slot.

* use `BlockId` instead of `BlockRef` where finalized / historical data
may be required
* simplify clearance pre-advancement
* remove dag.finalizedBlocks (~50:ish mb)
* remove `getBlockAtSlot` - use `getBlockIdAtSlot` instead
* `parent` and `atSlot` for `BlockId` now require a `ChainDAGRef`
instance, unlike `BlockRef` traversal
* prune `BlockRef` parents on finality (~200:ish mb)
* speed up ChainDAG init by not loading finalized history index
* mess up light client server error handling - this needs revisiting :)
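
Conceptual Nim sketch of the two traversal paths after pruning: a db-backed slot index for finalized history, `BlockRef` walking for the unfinalized part. All names are simplified:

```nim
import std/[options, tables]

type
  Slot = uint64
  Root = string
  BlockId = object
    slot: Slot
    root: Root
  BlockRef = ref object
    bid: BlockId
    parent: BlockRef   # only kept for the unfinalized portion of the DAG

proc getBlockIdAtSlot(slot, finalizedSlot: Slot,
                      finalizedIndex: Table[Slot, Root],
                      head: BlockRef): Option[BlockId] =
  if slot <= finalizedSlot:
    # Finalized history is linear: a cache/disk-friendly slot -> root index
    # answers the query without any in-memory BlockRef.
    if slot in finalizedIndex:
      return some(BlockId(slot: slot, root: finalizedIndex[slot]))
    return none(BlockId)   # empty slot
  # Unfinalized history may still fork: walk BlockRef parents from head.
  var cur = head
  while cur != nil and cur.bid.slot > slot:
    cur = cur.parent
  if cur == nil: none(BlockId) else: some(cur.bid)
```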
2022-03-17 17:42:56 +00:00
Jacek Sieka c64bf045f3
remove StateData (#3507)
One more step on the journey to reduce `BlockRef` usage across the
codebase - this one gets rid of `StateData` whose job was to keep track
of which block was last assigned to a state - these duties have now been
taken over by `latest_block_root`, a fairly recent addition that
computes this block root from state data (at a small cost that should be
insignificant)

99% mechanical change.
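
Sketch of the idea in spec-like Nim with stand-in hash helpers: the block root last applied to a state can be recovered from the state's own `latest_block_header`, so no separate bookkeeping is needed:

```nim
type
  Root = string
  BeaconBlockHeader = object
    parent_root: Root
    state_root: Root
    body_root: Root
  BeaconStateish = object
    latest_block_header: BeaconBlockHeader

# Stand-ins for hash_tree_root; the real thing hashes SSZ objects.
proc hashTreeRoot(s: BeaconStateish): Root = "state-root"
proc hashTreeRoot(h: BeaconBlockHeader): Root = "block-root"

proc latestBlockRoot(s: BeaconStateish): Root =
  var header = s.latest_block_header
  if header.state_root == "":
    # Right after a block is applied, its header is stored with a zeroed
    # state_root; fill it in before hashing, as the spec prescribes.
    header.state_root = hashTreeRoot(s)
  hashTreeRoot(header)
```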
2022-03-16 08:20:40 +01:00
Jacek Sieka a3bd01b58d
move dependent root computations to `BeaconState` / `EpochRef` (#3478)
* fewer deps on `BlockRef` traversal in anticipation of pruning
* allows identifying EpochRef:s by their shuffling as a first step of
* tighten error handling around missing blocks

using the zero hash for signalling "missing block" is fragile and easy
to miss - with checkpoint sync now, and pruning in the future, missing
blocks become "normal".
2022-03-15 09:24:55 +01:00
Etan Kissling ae408c279a
add option to collect light client data (#3474)
Light clients require full nodes to serve additional data so that they
can stay in sync with the network. This patch adds a new launch option
`--import-light-client-data` to configure what data to make available.
For now, data is only kept in memory; it is not persisted at this time.
Note that data is only collected locally; a separate patch is needed to
actually make it available over the network. `--serve-light-client-data`
will be used for serving data, but is not functional yet outside tests.
2022-03-11 21:28:10 +01:00
Jacek Sieka 12ed537f75
catch wrong-fork-blocks earlier (#3444)
Can't apply a phase0 block to a later phase state and vice versa.

Since instantiation has been a topic, pre/post c file size:

```
424K	@mspec@sstate_transition.nim.c
892K	@mspec@sstate_transition_block.nim.c
```

```
288K	@mspec@sstate_transition.nim.c
880K	@mspec@sstate_transition_block.nim.c
```
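
The early check boils down to comparing forks before attempting the transition at all (illustrative Nim; the state is assumed to already be advanced to the block's slot):

```nim
type ConsensusFork = enum
  Phase0, Altair, Bellatrix

func canApplyBlock(blockFork, stateFork: ConsensusFork): bool =
  # A phase0 block cannot be applied to a later-phase state and vice versa;
  # catching this up front avoids instantiating the wrong transition.
  blockFork == stateFork

when isMainModule:
  doAssert not canApplyBlock(Phase0, Altair)
  doAssert canApplyBlock(Altair, Altair)
```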
2022-02-28 12:58:34 +00:00
Jacek Sieka 40a4c01086
chaindag: don't keep backfill block table in memory (#3429)
This PR names and documents the concept of the archive: a range of slots
for which we have degraded functionality in terms of historical access -
in particular:

* we don't support rewinding to states in this range
* we don't keep an in-memory representation of the block dag

The archive de-facto exists in a trusted-node-synced node, but this PR
gives it a name and drops the in-memory digest index.

In order to satisfy `GetBlocksByRange` requests, we ensure that we have
blocks for the entire archive period via backfill. Future versions may
relax this further, adding a "pre-archive" period that is fully pruned.

During by-slot searches in the archive (both for libp2p and rest
requests), an extra database lookup is used to convert the given `slot`
to a `root` - future versions will avoid this using era files which
natively are indexed by `slot`. That said, the lookup is quite
fast compared to the actual block loading given how trivial the table
is - it's hard to measure, even.

A collateral benefit of this PR is that checkpoint-synced nodes will see
100-200MB memory usage savings, thanks to the dropped in-memory cache -
future pruning work will bring this benefit to full nodes as well.

* document chaindag storage architecture and assumptions
* look up parent using block id instead of full block in clearance
(future-proofing the code against a future in which blocks come from era
files)
* simplify finalized block init, always writing the backfill portion to
db at startup (to ensure lookups work as expected)
* preallocate some extra memory for finalized blocks, to avoid immediate
realloc
2022-02-26 19:16:19 +01:00
Jacek Sieka d583e8e4ac
Store finalized block roots in database (3s startup) (#3320)
* Store finalized block roots in database (3s startup)

When the chain has finalized a checkpoint, the history from that point
onwards becomes linear - this is exploited in `.era` files to allow
constant-time by-slot lookups.

In the database, we can do the same by storing finalized block roots in
a simple sparse table indexed by slot, bringing the two representations
closer to each other in terms of conceptual layout and performance.

Doing so has a number of interesting effects:

* mainnet startup time is improved 3-5x (3s on my laptop)
* the _first_ startup might take slightly longer as the new index is
being built - ~10s on the same laptop
* we no longer rely on the beacon block summaries to load the full dag -
this is a lot faster because we no longer have to look up each block by
parent root
* a collateral benefit is that we no longer need to load the full
summaries table into memory - we get the RSS benefits of #3164 without
the CPU hit.

Other random stuff:

* simplify forky block generics
* fix withManyWrites multiple evaluation
* fix validator key cache not being updated properly in chaindag
read-only mode
* drop pre-altair summaries from `kvstore`
* recreate missing summaries from altair+ blocks as well (in case
database has lost some to an involuntary restart)
* print database startup timings in chaindag load log
* avoid allocating superfluous state at startup
* use a recursive sql query to load the summaries of the unfinalized
blocks
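
Rough Nim sketch of the first-startup index build implied above: walk the block summaries once from head to genesis via `parent_root`, after which the linear slot -> root table answers finalized lookups directly. Names are assumptions:

```nim
import std/tables

type
  Slot = uint64
  Root = string
  BeaconBlockSummary = object
    slot: Slot
    parent_root: Root

proc buildFinalizedRootIndex(summaries: Table[Root, BeaconBlockSummary],
                             headRoot: Root): Table[Slot, Root] =
  # One-off pass (the "~10s" first startup); later startups read the
  # sparse slot-indexed table straight from the database.
  result = initTable[Slot, Root]()
  var root = headRoot
  while root in summaries:
    let s = summaries[root]
    result[s.slot] = root
    root = s.parent_root
```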
2022-01-30 18:51:04 +02:00