Some upstream repos still need fixes, but this gets us close enough that
style hints can be enabled by default.
In general, "canonical" spellings are preferred even if they violate
nep-1 - this applies in particular to spec-related stuff like
`genesis_validators_root` which appears throughout the codebase.
`.era` files and Req/Resp protocols use framed formats - aligning the
database with these makes for less recompression work overall as gossip
is sent only once while req/resp repeats (potentially) - this also
allows efficient pruning-to-era where snappy-recompression is the major
cycle thief.
* era: load blocks and states
Era files contain finalized history and can be thought of as an
alternative source for block and state data that allows clients to avoid
syncing this information from the P2P network - the P2P network is then
used to "top up" the client with the most recent data. They can be
freely shared in the community via whatever means (http, torrent, etc)
and serve as a permanent cold store of consensus data (and, after the
merge, execution data) for history buffs and bean counters alike.
This PR gently introduces support for loading blocks and states in two
cases: block requests from rest/p2p and frontfilling when doing
checkpoint sync.
The era files are used as a secondary source if the information is not
found in the database - compared to the database, there are a few key
differences:
* the database stores the block indexed by block root while the era file
indexes by slot - the former is used only in rest, while the latter is
used both by p2p and rest.
* when loading blocks from era files, the root is no longer trivially
available - if it is needed, it must either be computed (slow) or cached
(messy) - the good news is that for p2p requests, it is not needed
* in era files, "framed" snappy encoding is used while in the database
we store unframed snappy - for p2p2 requests, the latter requires
recompression while the former could avoid it
* front-filling is the process of using era files to replace backfilling
- in theory this front-filling could happen from any block and
front-fills with gaps could also be entertained, but our backfilling
algorithm cannot take advantage of this because there's no (simple) way
to tell it to "skip" a range.
* front-filling, as implemented, is a bit slow (10s to load mainnet): we
load the full BeaconState for every era to grab the roots of the blocks
- it would be better to partially load the state - as such, it would
also be good to be able to partially decompress snappy blobs
* lookups from REST via root are served by first looking up a block
summary in the database, then using the slot to load the block data from
the era file - however, there needs to be an option to create the
summary table from era files to fully support historical queries
To test this, `ncli_db` has an era file exporter: the files it creates
should be placed in an `era` folder next to `db` in the data directory.
What's interesting in particular about this setup is that `db` remains
as the source of truth for security purposes - it stores the latest
synced head root which in turn determines where a node "starts" its
consensus participation - the era directory however can be freely shared
between nodes / people without any (significant) security implications,
assuming the era files are consistent / not broken.
There's lots of future improvements to be had:
* we can drop the in-memory `BlockRef` index almost entirely - at this
point, resident memory usage of Nimbus should drop to a cool 500-600 mb
* we could serve era files via REST trivially: this would drop backfill
times to whatever time it takes to download the files - unlike the
current implementation that downloads block by block, downloading an era
at a time almost entirely cuts out request overhead
* we can "reasonably" recreate detailed state history from almost any
point in time, turning an O(slot) process into O(1) effectively - we'll
still need caches and indices to do this with sufficient efficiency for
the rest api, but at least it cuts the whole process down to minutes
instead of hours, for arbitrary points in time
* CI: ignore failures with Nim-1.6 (temporary)
* test fixes
Co-authored-by: Ștefan Talpalaru <stefantalpalaru@yahoo.com>
When eliminating orphaned forks, light client data about blocks was also
deleted when the orphaned fork was referring to a state several slots
after the block. Linking light client data pruning with block deletion
instead of state deletion fixes this problem. Light client data always
refers to blocks and their immediate post-state.
ref loop would stop one block early in this case - trying to load
everything in one loop ends up being pretty confusing..
* simplify finalizedBlocks topup by splitting it from the head loop /
query
Witout this, we end up with a massive .wal file that needs to be
checkpointed on first startup (which takes a few minutes) - it's much
more efficient to do smaller checkpoints, it turns out.
Up til now, the block dag has been using `BlockRef`, a structure adapted
for a full DAG, to represent all of chain history. This is a correct and
simple design, but does not exploit the linearity of the chain once
parts of it finalize.
By pruning the in-memory `BlockRef` structure at finalization, we save,
at the time of writing, a cool ~250mb (or 25%:ish) chunk of memory
landing us at a steady state of ~750mb normal memory usage for a
validating node.
Above all though, we prevent memory usage from growing proportionally
with the length of the chain, something that would not be sustainable
over time - instead, the steady state memory usage is roughly
determined by the validator set size which grows much more slowly. With
these changes, the core should remain sustainable memory-wise post-merge
all the way to withdrawals (when the validator set is expected to grow).
In-memory indices are still used for the "hot" unfinalized portion of
the chain - this ensure that consensus performance remains unchanged.
What changes is that for historical access, we use a db-based linear
slot index which is cache-and-disk-friendly, keeping the cost for
accessing historical data at a similar level as before, achieving the
savings at no percievable cost to functionality or performance.
A nice collateral benefit is the almost-instant startup since we no
longer load any large indicies at dag init.
The cost of this functionality instead can be found in the complexity of
having to deal with two ways of traversing the chain - by `BlockRef` and
by slot.
* use `BlockId` instead of `BlockRef` where finalized / historical data
may be required
* simplify clearance pre-advancement
* remove dag.finalizedBlocks (~50:ish mb)
* remove `getBlockAtSlot` - use `getBlockIdAtSlot` instead
* `parent` and `atSlot` for `BlockId` now require a `ChainDAGRef`
instance, unlike `BlockRef` traversal
* prune `BlockRef` parents on finality (~200:ish mb)
* speed up ChainDAG init by not loading finalized history index
* mess up light client server error handling - this need revisiting :)
The spec implicitly talks about the slot of a block in several places,
and keeping it readily available is useful in a number of context -
might as well put this implicitly refereneced helper in the spec code
directly
One more step on the journey to reduce `BlockRef` usage across the
codebase - this one gets rid of `StateData` whose job was to keep track
of which block was last assigned to a state - these duties have now been
taken over by `latest_block_root`, a fairly recent addition that
computes this block root from state data (at a small cost that should be
insignificant)
99% mechanical change.
* fewer deps on `BlockRef` traversal in anticipation of pruning
* allows identifying EpochRef:s by their shuffling as a first step of
* tighten error handling around missing blocks
using the zero hash for signalling "missing block" is fragile and easy
to miss - with checkpoint sync now, and pruning in the future, missing
blocks become "normal".
Light clients require full nodes to serve additional data so that they
can stay in sync with the network. This patch adds a new launch option
`--import-light-client-data` to configure what data to make available.
For now, data is only kept in memory; it is not persisted at this time.
Note that data is only locally collected, a separate patch is needed to
actually make it availble over the network. `--serve-light-client-data`
will be used for serving data, but is not functional yet outside tests.
When performing trusted node sync, historical access is limited to
states after the checkpoint.
Reindexing restores full historical access by replaying historical
blocks against the state and storing snapshots in the database.
The process can be initiated or resumed at any point in time.
This PR names and documents the concept of the archive: a range of slots
for which we have degraded functionality in terms of historical access -
in particular:
* we don't support rewinding to states in this range
* we don't keep an in-memory representation of the block dag
The archive de-facto exists in a trusted-node-synced node, but this PR
gives it a name and drops the in-memory digest index.
In order to satisfy `GetBlocksByRange` requests, we ensure that we have
blocks for the entire archive period via backfill. Future versions may
relax this further, adding a "pre-archive" period that is fully pruned.
During by-slot searches in the archive (both for libp2p and rest
requests), an extra database lookup is used to covert the given `slot`
to a `root` - future versions will avoid this using era files which
natively are indexed by `slot`. That said, the lookup is quite
fast compared to the actual block loading given how trivial the table
is - it's hard to measure, even.
A collateral benefit of this PR is that checkpoint-synced nodes will see
100-200MB memory usage savings, thanks to the dropped in-memory cache -
future pruning work will bring this benefit to full nodes as well.
* document chaindag storage architecture and assumptions
* look up parent using block id instead of full block in clearance
(future-proofing the code against a future in which blocks come from era
files)
* simplify finalized block init, always writing the backfill portion to
db at startup (to ensure lookups work as expected)
* preallocate some extra memory for finalized blocks, to avoid immediate
realloc
Streamline lookup with Forky and BeaconBlockFork (then we can do the
same for era)
We use type to avoid conditionals, as fork is often already known at a
"higher" level.
* load blockid before loading block by root - this is needed to map root
to slot and will eventually be done via block summary table for "old"
blocks
Co-authored-by: tersec <tersec@users.noreply.github.com>
* clean up / document init
* drop `immutable_validators` data (pre-altair)
* document versions where data is first added
* avoid needlessly loading genesis block data on startup
* add a few more internal database consistency checks
* remove duplicate state root lookup on state load
* comment
When node is restarted before backfill has started but after some blocks
have finalized with forward sync, we would not start the backfill.
* also clean up one last `SomeSome`
* Fix a resource leak introduced in https://github.com/status-im/nimbus-eth2/pull/3279
* Don't restart the Eth1 syncing proggress from scratch in case of
monitor failures during Eth2 syncing.
* Switch to the primary operator as soon as it is back online.
* Log the web3 credentials in fewer places
Other changes:
The 'web3 test' command has been enhanced to obtain and print more
data regarding the selected provider.
The current counters set gauges etc to the value of the _last_ validator
to be processed - as the name of the feature implies, we should be using
sums instead.
* fix missing beacon state metrics on startup, pre-first-head-selection
* fix epoch metrics not being updated on cross-epoch reorg
* Store finalized block roots in database (3s startup)
When the chain has finalized a checkpoint, the history from that point
onwards becomes linear - this is exploited in `.era` files to allow
constant-time by-slot lookups.
In the database, we can do the same by storing finalized block roots in
a simple sparse table indexed by slot, bringing the two representations
closer to each other in terms of conceptual layout and performance.
Doing so has a number of interesting effects:
* mainnet startup time is improved 3-5x (3s on my laptop)
* the _first_ startup might take slightly longer as the new index is
being built - ~10s on the same laptop
* we no longer rely on the beacon block summaries to load the full dag -
this is a lot faster because we no longer have to look up each block by
parent root
* a collateral benefit is that we no longer need to load the full
summaries table into memory - we get the RSS benefits of #3164 without
the CPU hit.
Other random stuff:
* simplify forky block generics
* fix withManyWrites multiple evaluation
* fix validator key cache not being updated properly in chaindag
read-only mode
* drop pre-altair summaries from `kvstore`
* recreate missing summaries from altair+ blocks as well (in case
database has lost some to an involuntary restart)
* print database startup timings in chaindag load log
* avoid allocating superfluos state at startup
* use a recursive sql query to load the summaries of the unfinalized
blocks
* limit by-root requests to non-finalized blocks
Presently, we keep a mapping from block root to `BlockRef` in memory -
this has simplified reasoning about the dag, but is not sustainable with
the chain growing.
We can distinguish between two cases where by-root access is useful:
* unfinalized blocks - this is where the beacon chain is operating
generally, by validating incoming data as interesting for future fork
choice decisions - bounded by the length of the unfinalized period
* finalized blocks - historical access in the REST API etc - no bounds,
really
In this PR, we limit the by-root block index to the first use case:
finalized chain data can more efficiently be addressed by slot number.
Future work includes:
* limiting the `BlockRef` horizon in general - each instance is 40
bytes+overhead which adds up - this needs further refactoring to deal
with the tail vs state problem
* persisting the finalized slot-to-hash index - this one also keeps
growing unbounded (albeit slowly)
Anyway, this PR easily shaves ~128mb of memory usage at the time of
writing.
* No longer honor `BeaconBlocksByRoot` requests outside of the
non-finalized period - previously, Nimbus would generously return any
block through this libp2p request - per the spec, finalized blocks
should be fetched via `BeaconBlocksByRange` instead.
* return `Opt[BlockRef]` instead of `nil` when blocks can't be found -
this becomes a lot more common now and thus deserves more attention
* `dag.blocks` -> `dag.forkBlocks` - this index only carries unfinalized
blocks from now - `finalizedBlocks` covers the other `BlockRef`
instances
* in backfill, verify that the last backfilled block leads back to
genesis, or panic
* add backfill timings to log
* fix missing check that `BlockRef` block can be fetched with
`getForkedBlock` reliably
* shortcut doppelganger check when feature is not enabled
* in REST/JSON-RPC, fetch blocks without involving `BlockRef`
* fix dag.blocks ref
Backfilling is the process of downloading historical blocks via P2P that
are required to fulfill `GetBlocksByRange` duties - this happens during
both trusted node and finalized checkpoint syncs.
In particular, backfilling happens after syncing to head, such that
attestation work can start as soon as possible.
* Fix SyncQueue initialization procedure.
Remove usage of `awaitne`.
Add cancellation support.
Remove unneeded `sleepAsync()` if peer's head is older than needed.
Add `direction` field to all logs.
Fix syncmanager wedge issue.
Add proper resource cleaning procedure on backward sync finish.
Co-authored-by: cheatfate <eugene.kabanov@status.im>
The new format is based on compressed CSV files in two channels:
* Detailed per-epoch data
* Aggregated "daily" summaries
The use of append-only CSV file speeds up significantly the epoch
processing speed during data generation. The use of compression
results in smaller storage requirements overall. The use of the
aggregated files has a very minor cost in both CPU and storage,
but leads to near interactive speed for report generation.
Other changes:
- Implemented support for graceful shut downs to avoid corrupting
the saved files.
- Fixed a memory leak caused by lacking `StateCache` clean up on each
iteration.
- Addressed review comments
- Moved the rewards and penalties calculation code in a separate module
Required invasive changes to existing modules:
- The `data` field of the `KeyedBlockRef` type is made public to be used
by the validator rewards monitor's Chain DAG update procedure.
- The `getForkedBlock` procedure from the `blockchain_dag.nim` module
is made public to be used by the validator rewards monitor's Chain DAG
update procedure.
This is an alternative take on https://github.com/status-im/nimbus-eth2/pull/3107
that aims for more minimal interventions in the spec modules at the expense of
duplicating more of the spec logic in ncli_db.
Time in the beacon chain is expressed relative to the genesis time -
this PR creates a `beacon_time` module that collects helpers and
utilities for dealing the time units - the new module does not deal with
actual wall time (that's remains in `beacon_clock`).
Collecting the time related stuff in one place makes it easier to find,
avoids some circular imports and allows more easily identifying the code
actually needs wall time to operate.
* move genesis-time-related functionality into `spec/beacon_time`
* avoid using `chronos.Duration` for time differences - it does not
support negative values (such as when something happens earlier than it
should)
* saturate conversions between `FAR_FUTURE_XXX`, so as to avoid
overflows
* fix delay reporting in validator client so it uses the expected
deadline of the slot, not "closest wall slot"
* simplify looping over the slots of an epoch
* `compute_start_slot_at_epoch` -> `start_slot`
* `compute_epoch_at_slot` -> `epoch`
A follow-up PR will (likely) introduce saturating arithmetic for the
time units - this is merely code moves, renames and fixing of small
bugs.
* REST cleanups
* reject out-of-range committee requests
* print all hex values as lower-case
* allow requesting state information by head state root
* turn `DomainType` into array (follow spec)
* `uint_to_bytesXX` -> `uint_to_bytes` (follow spec)
* fix wrong dependent root in `/eth/v1/validator/duties/proposer/`
* update documentation - `--subscribe-all-subnets` is no longer needed
when using the REST interface with validator clients
* more fixes
* common helpers for dependent block
* remove test rules obsoleted by more strict epoch tests
* fix trailing commas
* Update docs/the_nimbus_book/src/rest-api.md
* Update docs/the_nimbus_book/src/rest-api.md
Co-authored-by: sacha <sacha@status.im>
Overhaul of era files, including documentation and reference
implementations
* store blocks, then state, then slot indices for easy lookup at low
cost
* document era file rationale
* altair+ support in era writer