Commit Graph

225 Commits

Author SHA1 Message Date
Jacek Sieka b00eac7a50
dag: protect against finalized epoch moving before BlockRef cutoff (#3847) 2022-07-07 14:24:31 +00:00
Etan Kissling 2a2bcea70d
group justified and finalized `Checkpoint` (#3841)
The justified and finalized `Checkpoint` are frequently passed around
together. This introduces a new `FinalityCheckpoint` data structure that
combines them into one.

Due to the large usage of this structure in fork choice, also took this
opportunity to update fork choice tests to the latest v1.2.0-rc.1 spec.
Many additional tests enabled, some need more work, e.g. EL mock blocks.
Also implemented `discard_equivocations` which was skipped in #3661,
and improved code reuse across fork choice logic while at it.
2022-07-06 13:33:02 +03:00
Etan Kissling aff53e962f
merge LC db into main BN db (#3832)
* merge LC db into main BN db

To treat derived LC data similar to derived state caches, merge it into
the main beacon node DB.

* shorten table names, group with lc prefix
2022-07-04 23:46:32 +03:00
tersec 1221bb66e8
optimistic sync (#3793)
* optimistic sync

* flag that initially loaded blocks from database might need execution block root filled in

* return optimistic status in REST calls

* refactor blockslot pruning

* ensure beacon_blocks_by_{root,range} do not provide optimistic blocks

* handle forkchoice head being pre-merge with block being postmerge

* re-enable blocking head updates on validator duties

* fix is_optimistic_candidate_block per spec; don't crash with nil future

* fix is_optimistic_candidate_block per spec; don't crash with nil future

* mark blocks sans execution payloads valid during head update
2022-07-04 23:35:33 +03:00
Etan Kissling 499abd927f
persist LC data across restarts (#3823)
* persist LC data across restarts

With the Altair spec `LightClientUpdate` structure taking its final form
it is finally possible to persist LC data across restarts without having
to worry about data migration due to spec changes. A separate `lcdataV1`
database is created in the `caches` subdirectory to hold known LC data.
A full database with default settings (129 periods) uses <15 MB disk.

* extend LC data DB rationale

* wording

* add `isSupportedBySQLite` helper and explicit return

* remove redundant `return`
2022-06-30 13:04:39 +00:00
Etan Kissling bc1cc8f643
encapsulate LC config into one type (#3817)
Separate LC initialization options from the main ChainDAGRef options to
allow ChainDAGRef to treat them as opaque and reduce risk for conflicts
when extending those options in the future.
2022-06-28 22:52:29 +02:00
Eugene Kabanov d1581a2d8c
Fix proper timing check for bellatrix epoch. (#3807) 2022-06-28 10:21:16 +00:00
Etan Kissling 91d543440a
add option to configure max historic LC data periods (#3799)
Adds a `--light-client-data-max-periods` option to override the number
of sync committee periods to retain light client data.
Raising it above the default enables archive nodes to serve full data.
Lowering below the default speeds up import times (still no persistence)
2022-06-27 13:24:38 +02:00
Etan Kissling aa1b8e4a17
bump nim-ssz-serialization to `3db6cc0f282708aca6c290914488edd832971d61` (#3119)
This updates `nim-ssz-serialization` to
`3db6cc0f282708aca6c290914488edd832971d61`.

Notable changes:
- Use `uint64` for `GeneralizedIndex`
- Add support for building merkle multiproofs
2022-06-26 19:33:06 +02:00
Etan Kissling 2e98c7722f
encapsulate LC data variables into single structure (#3777)
Combines the LC data configuration options (serve / importMode), the
callbacks (finality / optimistic LC update) as well as the cache storing
light client data, into a new `LightClientDataStore` structure.
Also moves the structure into a light client specific file.
2022-06-24 16:57:50 +02:00
Eugene Kabanov eb6b7affee
Add the `execution_optimistic` flag to REST API responses. (#3780)
* Initial commit

* Make `events` API spec compliant.

* Add `Eth-Consensus-Version` in responses.

* Bump chronos to get redirect with headers working.

* Add `is_optimistic` field and handling to syncing RestSyncInfo.
2022-06-20 08:53:39 +03:00
tersec 8eb5d5de09
use ZERO_HASH for default(Eth2Digest)/Eth2Digest() in func calls (#3770) 2022-06-18 04:57:37 +00:00
Etan Kissling ac7393b8ac
remove unused `withStateVars` template (#3738)
Removes the `withStateVars` template that was not used meaningfully.
2022-06-16 11:46:35 +02:00
Etan Kissling 0c00b85782
cleanup LC data helpers (#3746)
Use more general `lowSlot` in LC data helpers, and avoid using
`earliestSlot` variable name as that one has a different meaning.
2022-06-14 22:02:03 +02:00
Etan Kissling cba041ddfa
fix LC data import for Altair fork period (#3744)
The initial sync committee period follows a different finality rule than
the other ones. Instead of next sync committee finalizing as soon as the
`finalizedHead.slot >= period.start_slot` have to use Altair start slot.
2022-06-14 17:31:10 +02:00
Etan Kissling 52ba4f7999
rename light client config parameters (#3740)
For consistency with other options, use a common prefix for light client
data configuration options.

* `--serve-light-client-data` --> `--light-client-data-serve`
* `--import-light-client-data` --> `--light-client-data-import-mode`

No deprecation of the old identifiers as they were only sparingly used
and all usage can be easily updated without interferance.
2022-06-14 12:03:39 +03:00
tersec 65cecc50ca
cleanups: unused and duplicate imports, inconsistent naming conventions, URL updates (#3724) 2022-06-09 14:30:13 +00:00
Etan Kissling 72a46bd520
integrate light client into beacon node (#3557)
Adds a `LightClient` instance to the beacon node as preparation to
accelerate syncing in the future (optimistic sync).

- `--light-client-enable` turns on the feature
- `--light-client-trusted-block-root` configures block to start from

If no block root is configured, light client tracks DAG `finalizedHead`.
2022-06-07 19:01:11 +02:00
tersec 38737549ac
refactor fork consistency checking and gate compilation on it (#3704) 2022-06-04 19:15:15 +00:00
Dustin Brody 21200f4a64 fix false-positive in overlap between default {CAPELLA,SHARDING}_FORK_VERIONs 2022-06-04 14:52:03 +00:00
tersec faf4d4a001
initial Capella support in RuntimeConfig (#3698) 2022-06-03 14:42:40 +00:00
tersec ea113fc420
disallow non-(genesis, far-future) equal transition epochs (#3691) 2022-06-03 09:37:03 +00:00
tersec f929980bf3
update 20 CL spec ref URLs (#3677) 2022-05-31 11:15:31 +00:00
Etan Kissling 01efa93cf6
add light client (standalone) (#3653)
Introduces a new library for syncing using libp2p based light client
sync protocol, and adds a new `nimbus_light_client` executable that uses
this library for syncing. The new executable emits log messages when
new beacon block headers are received, and is integrated into testing.
2022-05-31 12:45:37 +02:00
Jacek Sieka f31f52e24a
fix missing frontfill index (fixes #3658) (#3675)
* fix key load duration log
* log broken frontfill block root
2022-05-31 10:09:01 +02:00
tersec 01534b0431
🐼 (#3670)
* 🐼

* rm panda refs outside core module; preprocess text/ANSI artwork sources

* credit artwork to beatscribe
2022-05-30 08:25:27 +00:00
tersec b3d603f364
more CL spec URL updates to v1.2.0-rc.1 (#3657) 2022-05-24 08:26:35 +00:00
Etan Kissling c808f17a37
update to latest light client libp2p protocol (#3623)
Incorporates the latest changes to the light client sync protocol based
on Devconnect AMS feedback. Note that this breaks compatibility with the
previous prototype, due to changes to data structures and endpoints.
See https://github.com/ethereum/consensus-specs/pull/2802
2022-05-23 14:02:54 +02:00
Jacek Sieka 011e0ca02f
era file verification (#3605)
* era file verification

Implement and document era file verification

* era file states now come with block applied for easier verification
* clarify conflicting version handling
* document verification requirements
* remove count from name, use start-era, end-root to discover range

* remove obsolete todo

* abstract out block root loading
2022-05-10 03:28:46 +03:00
Zahary Karadjov def69e2a06
Revert "More sparse state snapshots in the Gnosis network"
This reverts commit 557717b517.
2022-04-11 13:56:42 +03:00
Zahary Karadjov ac4e7723ea
Fix the build 2022-04-10 23:10:40 +03:00
Zahary Karadjov 557717b517
More sparse state snapshots in the Gnosis network 2022-04-09 18:07:36 +03:00
Jacek Sieka f70ff38b53
enable `styleCheck:usages` (#3573)
Some upstream repos still need fixes, but this gets us close enough that
style hints can be enabled by default.

In general, "canonical" spellings are preferred even if they violate
nep-1 - this applies in particular to spec-related stuff like
`genesis_validators_root` which appears throughout the codebase.
2022-04-08 16:22:49 +00:00
Jacek Sieka 5092fc41c7
use snappy-framed format for compressing bellatrix+ database entries (#3551)
`.era` files and Req/Resp protocols use framed formats - aligning the
database with these makes for less recompression work overall as gossip
is sent only once while req/resp repeats (potentially) - this also
allows efficient pruning-to-era where snappy-recompression is the major
cycle thief.
2022-03-29 11:33:06 +00:00
tersec 9b43a76f2f
kiln beacon node (#3540)
* kiln bn

* use  version of beacon_chain_db

* have Eth1Monitor abstract more tightly over web3provider
2022-03-25 11:40:10 +00:00
Jacek Sieka e009728858
work around Nim assignment bug that breaks state pruning (#3545)
See https://github.com/nim-lang/Nim/issues/19613
2022-03-24 14:37:37 +00:00
Jacek Sieka 4207b127f9
era: load blocks and states (#3394)
* era: load blocks and states

Era files contain finalized history and can be thought of as an
alternative source for block and state data that allows clients to avoid
syncing this information from the P2P network - the P2P network is then
used to "top up" the client with the most recent data. They can be
freely shared in the community via whatever means (http, torrent, etc)
and serve as a permanent cold store of consensus data (and, after the
merge, execution data) for history buffs and bean counters alike.

This PR gently introduces support for loading blocks and states in two
cases: block requests from rest/p2p and frontfilling when doing
checkpoint sync.

The era files are used as a secondary source if the information is not
found in the database - compared to the database, there are a few key
differences:

* the database stores the block indexed by block root while the era file
indexes by slot - the former is used only in rest, while the latter is
used both by p2p and rest.
* when loading blocks from era files, the root is no longer trivially
available - if it is needed, it must either be computed (slow) or cached
(messy) - the good news is that for p2p requests, it is not needed
* in era files, "framed" snappy encoding is used while in the database
we store unframed snappy - for p2p2 requests, the latter requires
recompression while the former could avoid it
* front-filling is the process of using era files to replace backfilling
- in theory this front-filling could happen from any block and
front-fills with gaps could also be entertained, but our backfilling
algorithm cannot take advantage of this because there's no (simple) way
to tell it to "skip" a range.
* front-filling, as implemented, is a bit slow (10s to load mainnet): we
load the full BeaconState for every era to grab the roots of the blocks
- it would be better to partially load the state - as such, it would
also be good to be able to partially decompress snappy blobs
* lookups from REST via root are served by first looking up a block
summary in the database, then using the slot to load the block data from
the era file - however, there needs to be an option to create the
summary table from era files to fully support historical queries

To test this, `ncli_db` has an era file exporter: the files it creates
should be placed in an `era` folder next to `db` in the data directory.
What's interesting in particular about this setup is that `db` remains
as the source of truth for security purposes - it stores the latest
synced head root which in turn determines where a node "starts" its
consensus participation - the era directory however can be freely shared
between nodes / people without any (significant) security implications,
assuming the era files are consistent / not broken.

There's lots of future improvements to be had:

* we can drop the in-memory `BlockRef` index almost entirely - at this
point, resident memory usage of Nimbus should drop to a cool 500-600 mb
* we could serve era files via REST trivially: this would drop backfill
times to whatever time it takes to download the files - unlike the
current implementation that downloads block by block, downloading an era
at a time almost entirely cuts out request overhead
* we can "reasonably" recreate detailed state history from almost any
point in time, turning an O(slot) process into O(1) effectively - we'll
still need caches and indices to do this with sufficient efficiency for
the rest api, but at least it cuts the whole process down to minutes
instead of hours, for arbitrary points in time

* CI: ignore failures with Nim-1.6 (temporary)

* test fixes

Co-authored-by: Ștefan Talpalaru <stefantalpalaru@yahoo.com>
2022-03-23 09:58:17 +01:00
Jacek Sieka 361596c719
harden head update against missing parent (#3529)
in case BlockRef ends up in some lifetime leak

* fix duplicate head logging
2022-03-21 15:18:05 +01:00
Jacek Sieka 13fafe3a40
simplify unviable head pruning (#3528)
Also note bug that exists that potentially prevents states from being
pruned correctly
2022-03-21 09:20:26 +00:00
Etan Kissling 04b851f775
fix light client data pruning (#3523)
When eliminating orphaned forks, light client data about blocks was also
deleted when the orphaned fork was referring to a state several slots
after the block. Linking light client data pruning with block deletion
instead of state deletion fixes this problem. Light client data always
refers to blocks and their immediate post-state.
2022-03-20 10:09:43 +01:00
Jacek Sieka ea1acd7397
fix loading when finalized checkpoint slot is missing block (#3525)
ref loop would stop one block early in this case - trying to load
everything in one loop ends up being pretty confusing..

* simplify finalizedBlocks topup by splitting it from the head loop /
query
2022-03-19 11:02:17 +00:00
Jacek Sieka d0223d1f28
fix finalized epoch ref loading on checkpoint start (#3517)
regression from #3513 that did not take tail into consideration when
loading epoch ancestor
2022-03-18 13:13:57 +01:00
Jacek Sieka b3d80827fb
tns: checkpoint wal periodically while backfilling (#3516)
Witout this, we end up with a massive .wal file that needs to be
checkpointed on first startup (which takes a few minutes) - it's much
more efficient to do smaller checkpoints, it turns out.
2022-03-18 12:32:20 +01:00
Jacek Sieka 05ffe7b2bf
Prune `BlockRef` on finalization (#3513)
Up til now, the block dag has been using `BlockRef`, a structure adapted
for a full DAG, to represent all of chain history. This is a correct and
simple design, but does not exploit the linearity of the chain once
parts of it finalize.

By pruning the in-memory `BlockRef` structure at finalization, we save,
at the time of writing, a cool ~250mb (or 25%:ish) chunk of memory
landing us at a steady state of ~750mb normal memory usage for a
validating node.

Above all though, we prevent memory usage from growing proportionally
with the length of the chain, something that would not be sustainable
over time -  instead, the steady state memory usage is roughly
determined by the validator set size which grows much more slowly. With
these changes, the core should remain sustainable memory-wise post-merge
all the way to withdrawals (when the validator set is expected to grow).

In-memory indices are still used for the "hot" unfinalized portion of
the chain - this ensure that consensus performance remains unchanged.

What changes is that for historical access, we use a db-based linear
slot index which is cache-and-disk-friendly, keeping the cost for
accessing historical data at a similar level as before, achieving the
savings at no percievable cost to functionality or performance.

A nice collateral benefit is the almost-instant startup since we no
longer load any large indicies at dag init.

The cost of this functionality instead can be found in the complexity of
having to deal with two ways of traversing the chain - by `BlockRef` and
by slot.

* use `BlockId` instead of `BlockRef` where finalized / historical data
may be required
* simplify clearance pre-advancement
* remove dag.finalizedBlocks (~50:ish mb)
* remove `getBlockAtSlot` - use `getBlockIdAtSlot` instead
* `parent` and `atSlot` for `BlockId` now require a `ChainDAGRef`
instance, unlike `BlockRef` traversal
* prune `BlockRef` parents on finality (~200:ish mb)
* speed up ChainDAG init by not loading finalized history index
* mess up light client server error handling - this need revisiting :)
2022-03-17 17:42:56 +00:00
Jacek Sieka 8a63efc413
move `BlockId` to `spec` (#3511)
The spec implicitly talks about the slot of a block in several places,
and keeping it readily available is useful in a number of context -
might as well put this implicitly refereneced helper in the spec code
directly
2022-03-16 16:00:18 +01:00
Jacek Sieka c64bf045f3
remove StateData (#3507)
One more step on the journey to reduce `BlockRef` usage across the
codebase - this one gets rid of `StateData` whose job was to keep track
of which block was last assigned to a state - these duties have now been
taken over by `latest_block_root`, a fairly recent addition that
computes this block root from state data (at a small cost that should be
insignificant)

99% mechanical change.
2022-03-16 08:20:40 +01:00
Jacek Sieka a3bd01b58d
move dependent root computations to `BeaconState` / `EpochRef` (#3478)
* fewer deps on `BlockRef` traversal in anticipation of pruning
* allows identifying EpochRef:s by their shuffling as a first step of
* tighten error handling around missing blocks

using the zero hash for signalling "missing block" is fragile and easy
to miss - with checkpoint sync now, and pruning in the future, missing
blocks become "normal".
2022-03-15 09:24:55 +01:00
Etan Kissling ae408c279a
add option to collect light client data (#3474)
Light clients require full nodes to serve additional data so that they
can stay in sync with the network. This patch adds a new launch option
`--import-light-client-data` to configure what data to make available.
For now, data is only kept in memory; it is not persisted at this time.
Note that data is only locally collected, a separate patch is needed to
actually make it availble over the network. `--serve-light-client-data`
will be used for serving data, but is not functional yet outside tests.
2022-03-11 21:28:10 +01:00
Jacek Sieka d0183ccd77
Historical state reindex for trusted node sync (#3452)
When performing trusted node sync, historical access is limited to
states after the checkpoint.

Reindexing restores full historical access by replaying historical
blocks against the state and storing snapshots in the database.

The process can be initiated or resumed at any point in time.
2022-03-11 12:49:47 +00:00
Jacek Sieka 4363215a32
relax `BlockRef` database assumptions (#3472)
* remove `getForkedBlock(BlockRef)` which assumes block data exists but
doesn't support archive/backfilled blocks
* fix REST `/eth/v1/beacon/headers` request not returning
archive/backfilled blocks
* avoid re-encoding in REST block SSZ requests (using `getBlockSSZ`)
2022-03-11 13:08:17 +01:00
tersec f0ada15dac
automated CL spec ref URL updates from v1.1.9 to v1.1.10 (#3455) 2022-03-02 10:00:21 +00:00
Jacek Sieka 40a4c01086
chaindag: don't keep backfill block table in memory (#3429)
This PR names and documents the concept of the archive: a range of slots
for which we have degraded functionality in terms of historical access -
in particular:

* we don't support rewinding to states in this range
* we don't keep an in-memory representation of the block dag

The archive de-facto exists in a trusted-node-synced node, but this PR
gives it a name and drops the in-memory digest index.

In order to satisfy `GetBlocksByRange` requests, we ensure that we have
blocks for the entire archive period via backfill. Future versions may
relax this further, adding a "pre-archive" period that is fully pruned.

During by-slot searches in the archive (both for libp2p and rest
requests), an extra database lookup is used to covert the given `slot`
to a `root` - future versions will avoid this using era files which
natively are indexed by `slot`. That said, the lookup is quite
fast compared to the actual block loading given how trivial the table
is - it's hard to measure, even.

A collateral benefit of this PR is that checkpoint-synced nodes will see
100-200MB memory usage savings, thanks to the dropped in-memory cache -
future pruning work will bring this benefit to full nodes as well.

* document chaindag storage architecture and assumptions
* look up parent using block id instead of full block in clearance
(future-proofing the code against a future in which blocks come from era
files)
* simplify finalized block init, always writing the backfill portion to
db at startup (to ensure lookups work as expected)
* preallocate some extra memory for finalized blocks, to avoid immediate
realloc
2022-02-26 19:16:19 +01:00
Jacek Sieka adfe655b16
db: make block loading generic (#3413)
Streamline lookup with Forky and BeaconBlockFork (then we can do the
same for era)

We use type to avoid conditionals, as fork is often already known at a
"higher" level.

* load blockid before loading block by root - this is needed to map root
to slot and will eventually be done via block summary table for "old"
blocks

Co-authored-by: tersec <tersec@users.noreply.github.com>
2022-02-21 09:48:02 +01:00
tersec 79761c78a4
proc -> func, mainly in spec/state transition and adjecent modules (#3405) 2022-02-17 11:53:55 +00:00
tersec 5eecb9a21f
rename no{R=>r}eturn, no{I=>i}init, short{l=>L}og, E{T=>t}h2Node, Beacon{c=>C}hainDB (#3403) 2022-02-16 23:24:44 +01:00
Jacek Sieka 7db5647a6e
clean up / document init (#3387)
* clean up / document init

* drop `immutable_validators` data (pre-altair)
* document versions where data is first added
* avoid needlessly loading genesis block data on startup
* add a few more internal database consistency checks
* remove duplicate state root lookup on state load

* comment
2022-02-16 16:44:04 +01:00
tersec 873a8ec1e6
use isZeroMemory for Eth2Digest comparisons (#3386)
* use isZeroMemory for Eth2Digest comparisons

* use Eth2Digest.isZero abstraction
2022-02-14 05:26:19 +00:00
Jacek Sieka 40fe8f5336 fix missing backfill when restarting node
When node is restarted before backfill has started but after some blocks
have finalized with forward sync, we would not start the backfill.

* also clean up one last `SomeSome`
2022-02-11 23:08:50 +02:00
Zahary Karadjov 215caa21ae Eth1 monitor fixes
* Fix a resource leak introduced in https://github.com/status-im/nimbus-eth2/pull/3279

* Don't restart the Eth1 syncing proggress from scratch in case of
  monitor failures during Eth2 syncing.

* Switch to the primary operator as soon as it is back online.

* Log the web3 credentials in fewer places

Other changes:

The 'web3 test' command has been enhanced to obtain and print more
data regarding the selected provider.
2022-02-03 14:01:55 +02:00
tersec 8e6a920bf4
rename MERGE_FORK_EPOCH to BELLATRIX_FORK_EPOCH (#3350)
* rename MERGE_FORK_EPOCH to BELLATRIX_FORK_EPOCH

* fix REST test rules
2022-02-02 14:06:55 +01:00
Jacek Sieka ff4f2a6b6c
better log on finalized slot failure 2022-02-01 21:23:18 +01:00
Jacek Sieka 3df9ffca9f val-mon: remove redundant `_total` suffix from counters
It turns out nim-metrics adds this suffix on its own - it also turns out
some of the names are non-conventional and need follow-up.
2022-01-31 18:51:24 +02:00
Jacek Sieka ad327a8769
Fix counters in validator monitor totals mode (#3332)
The current counters set gauges etc to the value of the _last_ validator
to be processed - as the name of the feature implies, we should be using
sums instead.

* fix missing beacon state metrics on startup, pre-first-head-selection
* fix epoch metrics not being updated on cross-epoch reorg
2022-01-31 08:36:29 +01:00
Jacek Sieka d583e8e4ac
Store finalized block roots in database (3s startup) (#3320)
* Store finalized block roots in database (3s startup)

When the chain has finalized a checkpoint, the history from that point
onwards becomes linear - this is exploited in `.era` files to allow
constant-time by-slot lookups.

In the database, we can do the same by storing finalized block roots in
a simple sparse table indexed by slot, bringing the two representations
closer to each other in terms of conceptual layout and performance.

Doing so has a number of interesting effects:

* mainnet startup time is improved 3-5x (3s on my laptop)
* the _first_ startup might take slightly longer as the new index is
being built - ~10s on the same laptop
* we no longer rely on the beacon block summaries to load the full dag -
this is a lot faster because we no longer have to look up each block by
parent root
* a collateral benefit is that we no longer need to load the full
summaries table into memory - we get the RSS benefits of #3164 without
the CPU hit.

Other random stuff:

* simplify forky block generics
* fix withManyWrites multiple evaluation
* fix validator key cache not being updated properly in chaindag
read-only mode
* drop pre-altair summaries from `kvstore`
* recreate missing summaries from altair+ blocks as well (in case
database has lost some to an involuntary restart)
* print database startup timings in chaindag load log
* avoid allocating superfluos state at startup
* use a recursive sql query to load the summaries of the unfinalized
blocks
2022-01-30 18:51:04 +02:00
tersec 29e2169585
phase 0 & altair beacon chain and altair validator spec URL updates (#3339) 2022-01-29 13:53:31 +00:00
Jacek Sieka e264276b36
keep unviables in quarantine (#3331)
they remain unviable even after a reorg
2022-01-28 11:59:55 +01:00
Jacek Sieka d076e1a11b
ncli_db: import states and blocks from era file (#3313) 2022-01-25 09:28:26 +01:00
tersec 351c2fd48a
rename mergeData to bellatrixData and mergeFork to bellatrixFork (#3315) 2022-01-24 16:23:13 +00:00
Jacek Sieka 61342c2449
limit by-root requests to non-finalized blocks (#3293)
* limit by-root requests to non-finalized blocks

Presently, we keep a mapping from block root to `BlockRef` in memory -
this has simplified reasoning about the dag, but is not sustainable with
the chain growing.

We can distinguish between two cases where by-root access is useful:

* unfinalized blocks - this is where the beacon chain is operating
generally, by validating incoming data as interesting for future fork
choice decisions - bounded by the length of the unfinalized period
* finalized blocks - historical access in the REST API etc - no bounds,
really

In this PR, we limit the by-root block index to the first use case:
finalized chain data can more efficiently be addressed by slot number.

Future work includes:

* limiting the `BlockRef` horizon in general - each instance is 40
bytes+overhead which adds up - this needs further refactoring to deal
with the tail vs state problem
* persisting the finalized slot-to-hash index - this one also keeps
growing unbounded (albeit slowly)

Anyway, this PR easily shaves ~128mb of memory usage at the time of
writing.

* No longer honor `BeaconBlocksByRoot` requests outside of the
non-finalized period - previously, Nimbus would generously return any
block through this libp2p request - per the spec, finalized blocks
should be fetched via `BeaconBlocksByRange` instead.
* return `Opt[BlockRef]` instead of `nil` when blocks can't be found -
this becomes a lot more common now and thus deserves more attention
* `dag.blocks` -> `dag.forkBlocks` - this index only carries unfinalized
blocks from now - `finalizedBlocks` covers the other `BlockRef`
instances
* in backfill, verify that the last backfilled block leads back to
genesis, or panic
* add backfill timings to log
* fix missing check that `BlockRef` block can be fetched with
`getForkedBlock` reliably
* shortcut doppelganger check when feature is not enabled
* in REST/JSON-RPC, fetch blocks without involving `BlockRef`

* fix dag.blocks ref
2022-01-21 13:33:16 +02:00
Dustin Brody 9699858422 rename MERGE_FORK_VERSION to BELLATRIX_FORK_VERSION 2022-01-20 19:33:05 +02:00
Jacek Sieka 570379d3d9
Backfiller (#3263)
Backfilling is the process of downloading historical blocks via P2P that
are required to fulfill `GetBlocksByRange` duties - this happens during
both trusted node and finalized checkpoint syncs.

In particular, backfilling happens after syncing to head, such that
attestation work can start as soon as possible.

* Fix SyncQueue initialization procedure.
Remove usage of `awaitne`.
Add cancellation support.
Remove unneeded `sleepAsync()` if peer's head is older than needed.
Add `direction` field to all logs.
Fix syncmanager wedge issue.
Add proper resource cleaning procedure on backward sync finish.

Co-authored-by: cheatfate <eugene.kabanov@status.im>
2022-01-20 08:25:45 +01:00
tersec 9c0c9c98ce
complete switch to beacon_chain/specs/datatypes/bellatrix (#3295) 2022-01-18 13:36:52 +00:00
Zahary Karadjov 47f1f7ff1a More efficient reward data persistance; Address review comments
The new format is based on compressed CSV files in two channels:

* Detailed per-epoch data
* Aggregated "daily" summaries

The use of append-only CSV file speeds up significantly the epoch
processing speed during data generation. The use of compression
results in smaller storage requirements overall. The use of the
aggregated files has a very minor cost in both CPU and storage,
but leads to near interactive speed for report generation.

Other changes:

- Implemented support for graceful shut downs to avoid corrupting
  the saved files.

- Fixed a memory leak caused by lacking `StateCache` clean up on each
  iteration.

- Addressed review comments

- Moved the rewards and penalties calculation code in a separate module

Required invasive changes to existing modules:

- The `data` field of the `KeyedBlockRef` type is made public to be used
  by the validator rewards monitor's Chain DAG update procedure.

- The `getForkedBlock` procedure from the `blockchain_dag.nim` module
  is made public to be used by the validator rewards monitor's Chain DAG
  update procedure.
2022-01-18 01:56:56 +02:00
Zahary Karadjov 29aad0241b Precise per-component ETH-denominated rewards tracking
This is an alternative take on https://github.com/status-im/nimbus-eth2/pull/3107
that aims for more minimal interventions in the spec modules at the expense of
duplicating more of the spec logic in ncli_db.
2022-01-18 01:56:56 +02:00
Jacek Sieka 836f6984bb
move `state_transition` to `Result` (#3284)
* better error messages in api
* avoid `BlockData` copies when replaying blocks
2022-01-17 12:19:58 +01:00
Jacek Sieka 805e85e1ff
time: spring cleaning (#3262)
Time in the beacon chain is expressed relative to the genesis time -
this PR creates a `beacon_time` module that collects helpers and
utilities for dealing the time units - the new module does not deal with
actual wall time (that's remains in `beacon_clock`).

Collecting the time related stuff in one place makes it easier to find,
avoids some circular imports and allows more easily identifying the code
actually needs wall time to operate.

* move genesis-time-related functionality into `spec/beacon_time`
* avoid using `chronos.Duration` for time differences - it does not
support negative values (such as when something happens earlier than it
should)
* saturate conversions between `FAR_FUTURE_XXX`, so as to avoid
overflows
* fix delay reporting in validator client so it uses the expected
deadline of the slot, not "closest wall slot"
* simplify looping over the slots of an epoch
* `compute_start_slot_at_epoch` -> `start_slot`
* `compute_epoch_at_slot` -> `epoch`

A follow-up PR will (likely) introduce saturating arithmetic for the
time units - this is merely code moves, renames and fixing of small
bugs.
2022-01-11 11:01:54 +01:00
tersec ae61512ee9
rename upgrade_to_{merge,bellatrix}; detect unchanging spec YAMLs (#3265) 2022-01-10 09:39:43 +00:00
Jacek Sieka 6f7e0e3393
REST cleanups (#3255)
* REST cleanups

* reject out-of-range committee requests
* print all hex values as lower-case
* allow requesting state information by head state root
* turn `DomainType` into array (follow spec)
* `uint_to_bytesXX` -> `uint_to_bytes` (follow spec)
* fix wrong dependent root in `/eth/v1/validator/duties/proposer/`
* update documentation - `--subscribe-all-subnets` is no longer needed
when using the REST interface with validator clients
* more fixes
* common helpers for dependent block
* remove test rules obsoleted by more strict epoch tests
* fix trailing commas

* Update docs/the_nimbus_book/src/rest-api.md
* Update docs/the_nimbus_book/src/rest-api.md

Co-authored-by: sacha <sacha@status.im>
2022-01-08 22:06:34 +02:00
Jacek Sieka ba99c8fe4f
update era file documentation / impl (#3226)
Overhaul of era files, including documentation and reference
implementations

* store blocks, then state, then slot indices for easy lookup at low
cost
* document era file rationale
* altair+ support in era writer
2022-01-07 11:13:19 +01:00
tersec 0fd8bf7b56
spec URL updates (#3254) 2022-01-06 18:35:38 +00:00
Jacek Sieka 0a4728a241
Handle access to historical data for which there is no state (#3217)
With checkpoint sync in particular, and state pruning in the future,
loading states or state-dependent data may fail. This PR adjusts the
code to allow this to be handled gracefully.

In particular, the new availability assumption is that states are always
available for the finalized checkpoint and newer, but may fail for
anything older.

The `tail` remains the point where state loading de-facto fails, meaning
that between the tail and the finalized checkpoint, we can still get
historical data (but code should be prepared to handle this as an
error).

However, to harden the code against long replays, several operations
which are assumed to work only with non-final data (such as gossip
verification and validator duties) now limit their search horizon to
post-finalized data.

* harden several state-dependent operations by logging an error instead
of introducing a panic when state loading fails
* `withState` -> `withUpdatedState` to differentiate from the other
`withState`
* `updateStateData` can now fail if no state is found in database - it
is also hardened against excessively long replays
* `getEpochRef` can now fail when replay fails
* reject blocks with invalid target root - they would be ignored
previously
* fix recursion bug in `isProposed`
2022-01-05 19:38:04 +01:00
zah fba1f08a5e
Implement #3129 (Optimized history traversals in the REST API) (#3219)
* Fix REST some rest call signatures and implement a simple API benchmark tool

* Implement #3129 (Optimized history traversals in the REST API)

Other notable changes:

The `updateStateData` procedure in the `blockchain_dag.nim` module is
optimized to not rewind down to the last snapshot state saved in the
database if the supplied input state can be used as a starting point
instead.

* Disallow await in withStateForBlockSlot
2022-01-05 15:49:10 +01:00
tersec 5878d34117
rename forkDigests.merge to forkDigests.bellatrix (#3245) 2022-01-05 14:24:15 +00:00
tersec b81c06edab
rename Beacon{Block,State}Fork.Merge to Bellatrix; update copyright years (#3240) 2022-01-04 09:45:38 +00:00
Jacek Sieka c4ce59e55b
Assorted logging improvements (#3237)
* log doppelganger detection when it activates and when it causes missed
duties
* less prominent eth1 sync progress
* log in-progress sync at notice only when actually missing duties
* better detail in replay log
* don't log finalization checkpoints - this is quite verbose when
syncing and already included in "Slot start"
2022-01-03 22:18:49 +01:00
Jacek Sieka 7ec97a6b35
Fix missing checkpoint states` (#3225)
With the right sequence of events (for example a REST request or a
validation), it can happen that the first traversal across a state
checkpoint boundary is done without storing that state on disk - this
causes problens when replaying states, because now states may be missing
from the database.

Here, we simply avoid using the caches when advancing a state that will
go into the database, ensuring that the information lost during caching
always is permanently stored.

* fix recursion bug in `isProposed`
2021-12-30 12:33:03 +01:00
tersec 0d4e49f946
Merge fork gossip support (#3213)
* Merge fork gossip support

* index directly by BeaconStateFork and remove debugging log statement
2021-12-21 15:24:23 +01:00
Jacek Sieka 1021e3324e
Revert writing backfill root to database (#3215)
Introduced in #3171, it turns out we can just follow the block headers
to achieve the same effect

* leaves the constant in the code so as to avoid confusion when reading
database that had the constant written (such as the fleet nodes and
other unstable users)
2021-12-21 11:40:14 +01:00
Jacek Sieka c270ec21e4
Validator monitoring (#2925)
Validator monitoring based on and mostly compatible with the
implementation in Lighthouse - tracks additional logs and metrics for
specified validators so as to stay on top on performance.

The implementation works more or less the following way:
* Validator pubkeys are singled out for monitoring - these can be
running on the node or not
* For every action that the validator takes, we record steps in the
process such as messages being seen on the network or published in the
API
* When the dust settles at the end of an epoch, we report the
information from one epoch before that, which coincides with the
balances being updated - this is a tradeoff between being correct
(waiting for finalization) and providing relevant information in a
timely manner)
2021-12-20 20:20:31 +01:00
Jacek Sieka 03005f48e1
Backfill support for ChainDAG (#3171)
In the ChainDAG, 3 block pointers are kept: genesis, tail and head. This
PR adds one more block pointer: the backfill block which represents the
block that has been backfilled so far.

When doing a checkpoint sync, a random block is given as starting point
- this is the tail block, and we require that the tail block has a
corresponding state.

When backfilling, we end up with blocks without corresponding states,
hence we cannot use `tail` as a backfill pointer - there is no state.

Nonetheless, we need to keep track of where we are in the backfill
process between restarts, such that we can answer GetBeaconBlocksByRange
requests.

This PR adds the basic support for backfill handling - it needs to be
integrated with backfill sync, and the REST API needs to be adjusted to
take advantage of the new backfilled blocks when responding to certain
requests.

Future work will also enable moving the tail in either direction:
* pruning means moving the tail forward in time and removing states
* backwards means recreating past states from genesis, such that
intermediate states are recreated step by step all the way to the tail -
at that point, tail, genesis and backfill will match up.
* backfilling is done when backfill != genesis - later, this will be the
WSS checkpoint instead
2021-12-13 14:36:06 +01:00
Jacek Sieka 9f27f0d97c
BlockId reform (#3176)
* BlockId reform

Introduce `BlockId` that helps track a root/slot pair - this prepares
the codebase for backfilling and handling out-of-dag blocks

* move block dag code to separate module
* fix finalised state root in REST event stream
* fix finalised head computation on head update, when starting from
checkpoint
* clean up chaindag init
* revert `epochAncestor` change in introduced in #3144 that would return
an epoch ancestor from the canoncial history instead of the given
history, causing `EpochRef` keys to point to the wrong block
2021-12-09 19:06:21 +02:00
Jacek Sieka 069bccd51b
batch-verify sync messages for a small perf boost (#3151)
* batch-verify sync messages for a small perf boost

Generally reuses the same structure as attestation and aggregate
verification

* normalize `signatures` and `signature_batch` to use the same pattern
of verification
* normalize parameter names, order etc for signature stuff in general
* avoid calling `blsSign` directly - instead, go through `signatures`
consistently
2021-12-09 14:56:54 +02:00
tersec 2ca28fb861
Merge BeaconBlock gossip validation (#3165)
* Merge BeaconBlock gossip validation

* figure/ground inversion

* revert cosmetic cleanups to reduce merge conflicts
2021-12-08 17:29:22 +00:00
Etan Kissling 38e64b3441 cleanup sync subcommittee accessors
This removes some dead code from `getSubcommitteePositionsAux` which is
no longer needed since the introduction of `SyncCommitteeCache`.
This also cleans up some formatting, uses `let` instead of `var` where
possible, and uses implicit `pairs` in one case for consistency.
2021-12-07 18:17:03 +02:00
Jacek Sieka 89d6a1b403
Introduce slot->BlockRef mapping for finalized chain (#3144)
* Introduce slot->BlockRef mapping for finalized chain

The finalized chain is linear, thus we can use a seq to lookup blocks by
slot number.

Here, we introduce such a seq, even though in the future, it should
likely be backed by a database structure instead, or, more likely, a
flat era file with a flat lookup index.

This dramatically speeds up requests by slot, such as those coming from
the REST interface or GetBlocksByRange, as these are currently served by
a linear iteration from head.

* fix REST block requests to not return blocks from an earlier slot when
the given slot is empty
* fix StateId interpretation such that it doesn't treat state roots as
block roots
* don't load full block from database just to return its root
2021-12-06 20:52:35 +02:00
Jacek Sieka 1a8b7469e3
move quarantine outside of chaindag (#3124)
* move quarantine outside of chaindag

The quarantine has been part of the ChainDAG for the longest time, but
this design has a few issues:

* the function in which blocks are verified and added to the dag becomes
reentrant and therefore difficult to reason about - we're currently
using a stateful flag to work around it
* quarantined blocks bypass the processing queue leading to a processing
stampede
* the quarantine flow is unsuitable for orphaned attestations - these
should also should be quarantined eventually

Instead of processing the quarantine inside ChainDAG, this PR moves
re-queueing to `block_processor` which already is responsible for
dealing with follow-up work when a block is added to the dag

This sets the stage for keeping attestations in the quarantine as well.

Also:

* make `BlockError` `{.pure.}`
* avoid use of `ValidationResult` in block clearance (that's for gossip)
2021-12-06 10:49:01 +01:00
tersec 4378f3f096
almost all remaining ethereum/{eth2.0-specs -> consensus-specs} (#3158) 2021-12-03 20:01:13 +00:00
Jacek Sieka aa1dea03cd
speed up gossip and sync block validation (#3143)
* avoid recomputing hash for block signature check
* check block slot match before hitting the database
2021-12-01 10:52:40 +01:00
Etan Kissling eb777a6c8b allow `withState` to be called multiple times
This allows `blockchain_dag`'s `withState` template to be called more
than once in a single function. This led to a compilation error before
because the injected variables and functions shared the same scope.
2021-11-29 15:24:12 +02:00
Jacek Sieka 9c2f43ed0e
Speed up altair block processing 2x (#3115)
* Speed up altair block processing >2x

Like #3089, this PR drastially speeds up historical REST queries and
other long state replays.

* cache sync committee validator indices
* use ~80mb less memory for validator pubkey mappings
* batch-verify sync aggregate signature (fixes #2985)
* document sync committee hack with head block vs sync message block
* add batch signature verification failure tests

Before:

```
../env.sh nim c -d:release -r ncli_db --db:mainnet_0/db bench --start-slot:-1000
All time are ms
     Average,       StdDev,          Min,          Max,      Samples,         Test
Validation is turned off meaning that no BLS operations are performed
    5830.675,        0.000,     5830.675,     5830.675,            1, Initialize DB
       0.481,        1.878,        0.215,       59.167,          981, Load block from database
    8422.566,        0.000,     8422.566,     8422.566,            1, Load state from database
       6.996,        1.678,        0.042,       14.385,          969, Advance slot, non-epoch
      93.217,        8.318,       84.192,      122.209,           32, Advance slot, epoch
      20.513,       23.665,       11.510,      201.561,          981, Apply block, no slot processing
       0.000,        0.000,        0.000,        0.000,            0, Database load
       0.000,        0.000,        0.000,        0.000,            0, Database store
```

After:

```
    7081.422,        0.000,     7081.422,     7081.422,            1, Initialize DB
       0.553,        2.122,        0.175,       66.692,          981, Load block from database
    5439.446,        0.000,     5439.446,     5439.446,            1, Load state from database
       6.829,        1.575,        0.043,       12.156,          969, Advance slot, non-epoch
      94.716,        2.749,       88.395,      100.026,           32, Advance slot, epoch
      11.636,       23.766,        4.889,      205.250,          981, Apply block, no slot processing
       0.000,        0.000,        0.000,        0.000,            0, Database load
       0.000,        0.000,        0.000,        0.000,            0, Database store
```

* add comment
2021-11-24 13:43:50 +01:00