Commit Graph

69 Commits

Author SHA1 Message Date
tersec 50f5754e3c
exists{Dir,File} -> {dir,file}Exists; rm unused imports (#3543) 2022-03-24 00:38:48 +00:00
Jacek Sieka 05ffe7b2bf
Prune `BlockRef` on finalization (#3513)
Up til now, the block dag has been using `BlockRef`, a structure adapted
for a full DAG, to represent all of chain history. This is a correct and
simple design, but does not exploit the linearity of the chain once
parts of it finalize.

By pruning the in-memory `BlockRef` structure at finalization, we save,
at the time of writing, a cool ~250mb (or 25%:ish) chunk of memory
landing us at a steady state of ~750mb normal memory usage for a
validating node.

Above all though, we prevent memory usage from growing proportionally
with the length of the chain, something that would not be sustainable
over time -  instead, the steady state memory usage is roughly
determined by the validator set size which grows much more slowly. With
these changes, the core should remain sustainable memory-wise post-merge
all the way to withdrawals (when the validator set is expected to grow).

In-memory indices are still used for the "hot" unfinalized portion of
the chain - this ensure that consensus performance remains unchanged.

What changes is that for historical access, we use a db-based linear
slot index which is cache-and-disk-friendly, keeping the cost for
accessing historical data at a similar level as before, achieving the
savings at no percievable cost to functionality or performance.

A nice collateral benefit is the almost-instant startup since we no
longer load any large indicies at dag init.

The cost of this functionality instead can be found in the complexity of
having to deal with two ways of traversing the chain - by `BlockRef` and
by slot.

* use `BlockId` instead of `BlockRef` where finalized / historical data
may be required
* simplify clearance pre-advancement
* remove dag.finalizedBlocks (~50:ish mb)
* remove `getBlockAtSlot` - use `getBlockIdAtSlot` instead
* `parent` and `atSlot` for `BlockId` now require a `ChainDAGRef`
instance, unlike `BlockRef` traversal
* prune `BlockRef` parents on finality (~200:ish mb)
* speed up ChainDAG init by not loading finalized history index
* mess up light client server error handling - this need revisiting :)
2022-03-17 17:42:56 +00:00
Jacek Sieka c64bf045f3
remove StateData (#3507)
One more step on the journey to reduce `BlockRef` usage across the
codebase - this one gets rid of `StateData` whose job was to keep track
of which block was last assigned to a state - these duties have now been
taken over by `latest_block_root`, a fairly recent addition that
computes this block root from state data (at a small cost that should be
insignificant)

99% mechanical change.
2022-03-16 08:20:40 +01:00
Etan Kissling 29e5a4a752
error and progress codes for light client sync (#3490)
When syncing as a light client, different behaviour is needed to handle
the various ways how errors may occur. The existing logic for blocks can
also be applied to light client objects:
- `Invalid`: Malformed object that is clearly an error by its producer.
- `MissingParent`: More data is needed to decide applicability.
- `UnviableFork`: Object may be valid but will never apply on this fork.
- `Duplicate`: No errors were encountered but the object was not useful.
2022-03-14 10:25:54 +01:00
Etan Kissling 5a3ba5d968
update to pre-release light client sync protocol (#3465)
This adopts the spec sections of the pre-release proposal of the libp2p
based light client sync protocol, and also adds a test runner for the
new accompanying tests. While the release version of the light client
sync protocol contains conflicting definitions, it is currently unused,
and the code specific to the pre-release proposal is marked as such.
See https://github.com/ethereum/consensus-specs/pull/2802
2022-03-08 13:21:56 +01:00
Etan Kissling a84ab5d47f
validate `fork_version` as light client (#3459)
The spec does not provide code for validating the `fork_version` field
of `LightClientUpdate`. However, we can use our own logic for additional
validation of that field. The spec's python test suite sets up states
that do not follow the fork schedule (e.g., that use Altair fork version
before Altair fork epoch), which complicates upstreaming this as code.
2022-03-04 17:09:33 +01:00
Etan Kissling 47d7814518
update light client to v1.1.10 spec (#3457)
Adopts the changes introduced in the v1.1.10 ETH consensus-specs:
- Introduces `is_finality_update` helper
- Ensures `optimistic_header` always >= `finalized_header`
- Updates spec references
2022-03-03 14:03:08 +01:00
Etan Kissling 3b20d57277
use next slot when signing for light client tests (#3447)
In practice, the sync committee signs `LightClientUpdate` instances at
the next slot following the block. This is not correctly reflected in
the tests, where it is signed one slot early. This patch updates the
tests to use the correct slot for the computation.
2022-03-02 11:46:17 +01:00
tersec f0ada15dac
automated CL spec ref URL updates from v1.1.9 to v1.1.10 (#3455) 2022-03-02 10:00:21 +00:00
Etan Kissling 0e34c6023e
cleanup light client sync tests (#3445)
Various cleanups in the light client sync test suite without semantic
impact to make the various tests more streamlined.
2022-02-28 20:58:32 +01:00
tersec 7de3f00f35
generic putCorruptState; {Merge=>Bellatrix}BeaconStateNoImmutableValidators (#3427) 2022-02-21 12:55:56 +01:00
tersec 84588b34da
var => let in specs/ and tests/ (#3425) 2022-02-20 20:13:06 +00:00
Etan Kissling 9790c4958b
converter function for reducing blocks to headers (#3410)
This introduces a function to convert `SignedBeaconBlock` to just their
`BeaconBlockHeader` and updates the usages for reduced code duplication.
2022-02-18 21:35:52 +01:00
tersec 5eecb9a21f
rename no{R=>r}eturn, no{I=>i}init, short{l=>L}og, E{T=>t}h2Node, Beacon{c=>C}hainDB (#3403) 2022-02-16 23:24:44 +01:00
tersec d358299875
fork choice proposer boosting support (#3349)
* fork choice proposer boosting support

* detect nodeDelta underflow/overflow
2022-02-04 12:59:40 +01:00
tersec 8e6a920bf4
rename MERGE_FORK_EPOCH to BELLATRIX_FORK_EPOCH (#3350)
* rename MERGE_FORK_EPOCH to BELLATRIX_FORK_EPOCH

* fix REST test rules
2022-02-02 14:06:55 +01:00
tersec 0c814f49ee
rename sync_{committee_,}aggregate and execute_payload -> notify_new_payload (#3347) 2022-02-01 07:31:53 +00:00
tersec c9aa1bee01
spec URL updates (#3342) 2022-01-31 09:56:59 +00:00
tersec 89ffa8a1a7
spec URL & copyright year update (#3338) 2022-01-29 01:05:39 +00:00
tersec 60bf5b8bf4
use v1.1.9 test vectors (#3337) 2022-01-28 22:47:48 +00:00
Zahary Karadjov 49b7daa39d [ncli_db] bugfix: take into account finalization delay in reward calc post Altair
This fixes a problem affecting Prater's epoch 64444.
2022-01-28 12:03:23 +02:00
tersec 351c2fd48a
rename mergeData to bellatrixData and mergeFork to bellatrixFork (#3315) 2022-01-24 16:23:13 +00:00
Jacek Sieka 61342c2449
limit by-root requests to non-finalized blocks (#3293)
* limit by-root requests to non-finalized blocks

Presently, we keep a mapping from block root to `BlockRef` in memory -
this has simplified reasoning about the dag, but is not sustainable with
the chain growing.

We can distinguish between two cases where by-root access is useful:

* unfinalized blocks - this is where the beacon chain is operating
generally, by validating incoming data as interesting for future fork
choice decisions - bounded by the length of the unfinalized period
* finalized blocks - historical access in the REST API etc - no bounds,
really

In this PR, we limit the by-root block index to the first use case:
finalized chain data can more efficiently be addressed by slot number.

Future work includes:

* limiting the `BlockRef` horizon in general - each instance is 40
bytes+overhead which adds up - this needs further refactoring to deal
with the tail vs state problem
* persisting the finalized slot-to-hash index - this one also keeps
growing unbounded (albeit slowly)

Anyway, this PR easily shaves ~128mb of memory usage at the time of
writing.

* No longer honor `BeaconBlocksByRoot` requests outside of the
non-finalized period - previously, Nimbus would generously return any
block through this libp2p request - per the spec, finalized blocks
should be fetched via `BeaconBlocksByRange` instead.
* return `Opt[BlockRef]` instead of `nil` when blocks can't be found -
this becomes a lot more common now and thus deserves more attention
* `dag.blocks` -> `dag.forkBlocks` - this index only carries unfinalized
blocks from now - `finalizedBlocks` covers the other `BlockRef`
instances
* in backfill, verify that the last backfilled block leads back to
genesis, or panic
* add backfill timings to log
* fix missing check that `BlockRef` block can be fetched with
`getForkedBlock` reliably
* shortcut doppelganger check when feature is not enabled
* in REST/JSON-RPC, fetch blocks without involving `BlockRef`

* fix dag.blocks ref
2022-01-21 13:33:16 +02:00
Jacek Sieka 836f6984bb
move `state_transition` to `Result` (#3284)
* better error messages in api
* avoid `BlockData` copies when replaying blocks
2022-01-17 12:19:58 +01:00
tersec 14aab2c13f
update 10 modules from using merge to bellatrix (#3272) 2022-01-12 15:50:30 +01:00
Jacek Sieka 805e85e1ff
time: spring cleaning (#3262)
Time in the beacon chain is expressed relative to the genesis time -
this PR creates a `beacon_time` module that collects helpers and
utilities for dealing the time units - the new module does not deal with
actual wall time (that's remains in `beacon_clock`).

Collecting the time related stuff in one place makes it easier to find,
avoids some circular imports and allows more easily identifying the code
actually needs wall time to operate.

* move genesis-time-related functionality into `spec/beacon_time`
* avoid using `chronos.Duration` for time differences - it does not
support negative values (such as when something happens earlier than it
should)
* saturate conversions between `FAR_FUTURE_XXX`, so as to avoid
overflows
* fix delay reporting in validator client so it uses the expected
deadline of the slot, not "closest wall slot"
* simplify looping over the slots of an epoch
* `compute_start_slot_at_epoch` -> `start_slot`
* `compute_epoch_at_slot` -> `epoch`

A follow-up PR will (likely) introduce saturating arithmetic for the
time units - this is merely code moves, renames and fixing of small
bugs.
2022-01-11 11:01:54 +01:00
tersec ae61512ee9
rename upgrade_to_{merge,bellatrix}; detect unchanging spec YAMLs (#3265) 2022-01-10 09:39:43 +00:00
Jacek Sieka 20e700fae4
Harden CommitteeIndex, SubnetId, SyncSubcommitteeIndex (#3259)
* Harden CommitteeIndex, SubnetId, SyncSubcommitteeIndex

Harden the use of `CommitteeIndex` et al to prevent future issues by
using a distinct type, then validating before use in several cases -
datatypes in spec are kept simple though so that invalid data still can
be read.

* fix invalid epoch used in REST
`/eth/v1/beacon/states/{state_id}/committees` committee length (could
return invalid data)
* normalize some variable names
* normalize committee index loops
* fix `RestAttesterDuty` to use `uint64` for `validator_committee_index`
* validate `CommitteeIndex` on ingress in REST API
* update rest rules with stricter parsing
* better REST serializers
* save lots of memory by not using `zip` ...at least a few bytes!
2022-01-09 01:28:49 +02:00
tersec bac0eaa92e
update 10 modules from using merge to bellatrix (#3257) 2022-01-07 18:10:40 +01:00
Jacek Sieka 0a4728a241
Handle access to historical data for which there is no state (#3217)
With checkpoint sync in particular, and state pruning in the future,
loading states or state-dependent data may fail. This PR adjusts the
code to allow this to be handled gracefully.

In particular, the new availability assumption is that states are always
available for the finalized checkpoint and newer, but may fail for
anything older.

The `tail` remains the point where state loading de-facto fails, meaning
that between the tail and the finalized checkpoint, we can still get
historical data (but code should be prepared to handle this as an
error).

However, to harden the code against long replays, several operations
which are assumed to work only with non-final data (such as gossip
verification and validator duties) now limit their search horizon to
post-finalized data.

* harden several state-dependent operations by logging an error instead
of introducing a panic when state loading fails
* `withState` -> `withUpdatedState` to differentiate from the other
`withState`
* `updateStateData` can now fail if no state is found in database - it
is also hardened against excessively long replays
* `getEpochRef` can now fail when replay fails
* reject blocks with invalid target root - they would be ignored
previously
* fix recursion bug in `isProposed`
2022-01-05 19:38:04 +01:00
tersec 66c9b7fbce
shift block_sim fork epochs; allow VC to work with non-multiple-of-3 SECONDS_PER_SLOT (#3244) 2022-01-05 13:41:39 +00:00
tersec 7594fa660d
copyright year and spec URL updates (#3243) 2022-01-05 11:07:14 +00:00
tersec cd77377375
add Bellatrix fork and transition tests; "Ethereum Foundation" -> EF (#3242) 2022-01-05 09:42:56 +01:00
tersec b81c06edab
rename Beacon{Block,State}Fork.Merge to Bellatrix; update copyright years (#3240) 2022-01-04 09:45:38 +00:00
tersec d20387e910
update copyright years and spec URLs (#3239) 2022-01-04 06:08:19 +00:00
tersec da017d2ca5
update from phase0/altair v1.1.6 URLs to v1.1.8 spec URLs (#3238) 2022-01-04 03:57:15 +00:00
tersec 3c63a78c01
use v1.1.8 test vectors (#3236) 2022-01-03 17:43:00 +00:00
tersec 8be1699014
use v1.1.7 test vectors (#3231)
* use v1.1.7 test vectors
2022-01-03 13:06:14 +00:00
tersec 1a6a56bdb1
use BeaconTime instead of Slot in fork choice (#3138)
* use v1.1.6 test vectors; use BeaconTime instead of Slot in fork choice

* tick through every slot at least once

* use div INTERVALS_PER_SLOT and use precomputed constants of them

* use correct (even if numerically equal) constant
2021-12-21 18:56:08 +00:00
Jacek Sieka c270ec21e4
Validator monitoring (#2925)
Validator monitoring based on and mostly compatible with the
implementation in Lighthouse - tracks additional logs and metrics for
specified validators so as to stay on top on performance.

The implementation works more or less the following way:
* Validator pubkeys are singled out for monitoring - these can be
running on the node or not
* For every action that the validator takes, we record steps in the
process such as messages being seen on the network or published in the
API
* When the dust settles at the end of an epoch, we report the
information from one epoch before that, which coincides with the
balances being updated - this is a tradeoff between being correct
(waiting for finalization) and providing relevant information in a
timely manner)
2021-12-20 20:20:31 +01:00
tersec d7799ecdcc
v1.1.6 spec updates (#3206) 2021-12-17 06:56:33 +00:00
tersec f09686e835
update some spec URLs to v1.1.6 (#3188) 2021-12-13 15:45:48 +00:00
Jacek Sieka 03005f48e1
Backfill support for ChainDAG (#3171)
In the ChainDAG, 3 block pointers are kept: genesis, tail and head. This
PR adds one more block pointer: the backfill block which represents the
block that has been backfilled so far.

When doing a checkpoint sync, a random block is given as starting point
- this is the tail block, and we require that the tail block has a
corresponding state.

When backfilling, we end up with blocks without corresponding states,
hence we cannot use `tail` as a backfill pointer - there is no state.

Nonetheless, we need to keep track of where we are in the backfill
process between restarts, such that we can answer GetBeaconBlocksByRange
requests.

This PR adds the basic support for backfill handling - it needs to be
integrated with backfill sync, and the REST API needs to be adjusted to
take advantage of the new backfilled blocks when responding to certain
requests.

Future work will also enable moving the tail in either direction:
* pruning means moving the tail forward in time and removing states
* backwards means recreating past states from genesis, such that
intermediate states are recreated step by step all the way to the tail -
at that point, tail, genesis and backfill will match up.
* backfilling is done when backfill != genesis - later, this will be the
WSS checkpoint instead
2021-12-13 14:36:06 +01:00
Jacek Sieka 069bccd51b
batch-verify sync messages for a small perf boost (#3151)
* batch-verify sync messages for a small perf boost

Generally reuses the same structure as attestation and aggregate
verification

* normalize `signatures` and `signature_batch` to use the same pattern
of verification
* normalize parameter names, order etc for signature stuff in general
* avoid calling `blsSign` directly - instead, go through `signatures`
consistently
2021-12-09 14:56:54 +02:00
Jacek Sieka 1a8b7469e3
move quarantine outside of chaindag (#3124)
* move quarantine outside of chaindag

The quarantine has been part of the ChainDAG for the longest time, but
this design has a few issues:

* the function in which blocks are verified and added to the dag becomes
reentrant and therefore difficult to reason about - we're currently
using a stateful flag to work around it
* quarantined blocks bypass the processing queue leading to a processing
stampede
* the quarantine flow is unsuitable for orphaned attestations - these
should also should be quarantined eventually

Instead of processing the quarantine inside ChainDAG, this PR moves
re-queueing to `block_processor` which already is responsible for
dealing with follow-up work when a block is added to the dag

This sets the stage for keeping attestations in the quarantine as well.

Also:

* make `BlockError` `{.pure.}`
* avoid use of `ValidationResult` in block clearance (that's for gossip)
2021-12-06 10:49:01 +01:00
tersec a8c801eddd
fix Altair fork tests in minimal preset (#3163) 2021-12-06 05:56:46 +00:00
tersec 4378f3f096
almost all remaining ethereum/{eth2.0-specs -> consensus-specs} (#3158) 2021-12-03 20:01:13 +00:00
tersec cc51f3fd12
v1.1.{5 -> 6} phase 0 and altair spec URL updates (#3157) 2021-12-03 17:40:23 +00:00
tersec 61fb458f89
use v1.1.6 test vectors (#3146) 2021-12-01 12:55:42 +00:00
Mamy Ratsimbazafy 97da6e1365
Fork choice EF consensus tests (#3041)
* add EF fork choice tests to CI

* checkpoints

* compilation fixes and add test to preset dependent suite

* support longpaths on Windows CI

* skip minimal tests (long paths issue + impl detals tested)

* fix stackoverflow on some platforms

* rebase on top of https://github.com/status-im/nimbus-eth2/pull/3054

* fix stack usage
2021-11-25 19:41:39 +01:00