Whether new blocks/attestations/etc are produced internally or received
via REST, their journey through the node is the same - to ensure that
they get the same treatment (logging, metrics, processing), this PR
moves the routing to a dedicated module and fixes several small
differences that existed before.
* `xxxValidator` -> `processMessageName` - the processor also was adding
messages to pools, so we want the name to reflect that action
* add missing "sent" metrics for some messages
* document ignore policy better - already-seen messages are not actaully
rebroadcast by libp2p
* skip redundant signature checks for internal validators consistently
* SSZ `[]` -> `mitem`
* `[]` -> `item`
immutable access via mutable instance cannot rely on template
overloading, and `[]` cannot be a `func` because of special seq handling
in compiler.
* document static vs dynamic range checking requirements
* add `vindices` iterator to iterate over valid validator indices in a
state
* clean up spec comments in general
* fixup
Co-authored-by: tersec <tersec@users.noreply.github.com>
Since we were not verifying BLS signature in blocks that we produce,
we were failing to notice that some deposits need to be ignored (due
to having an invalid signature). Processing these deposits resulted
in a different ending state after the state transition which caused
our blocks to be rejected by the network.
Other changes:
* logtrace can now verify sync committee messages and contributions
* Many unnecessary use of pairs() have been removed for consistency
* Map 40x BN response codes to BeaconNodeStatus.Incompatible in the VC
* era file verification
Implement and document era file verification
* era file states now come with block applied for easier verification
* clarify conflicting version handling
* document verification requirements
* remove count from name, use start-era, end-root to discover range
* remove obsolete todo
* abstract out block root loading
Updated outdated presets / configs / REST config to v1.1.10 specs.
- `TERMINAL_BLOCK_HASH_ACTIVATION_EPOCH` and `PROPOSER_SCORE_BOOST` are
not yet used in `eth2-networks`, added configurability as TODOs.
- `MIN_ANCHOR_POW_BLOCK_DIFFICULTY` is no longer needed, put on ignore
list as some Altair devnets still reference it.
This PR makes the necessary adjustments to deal with the revamped snappy
API.
In practical terms for nimbus-eth2, there are performance increases to
gossip processing, database reading and writing as well as era file
processing. Exporting `.era` files for example, a snappy-heavy
operation, almost halves in total processing time:
Pre:
```
Average, StdDev, Min, Max, Samples, Test
39.088, 8.735, 23.619, 53.301, 50, tState
237.079, 46.692, 165.620, 355.481, 49, tBlocks
```
Post:
```
All time are ms
Average, StdDev, Min, Max, Samples, Test
25.350, 5.303, 15.351, 41.856, 50, tState
141.238, 24.164, 99.990, 199.329, 49, tBlocks
```
Some upstream repos still need fixes, but this gets us close enough that
style hints can be enabled by default.
In general, "canonical" spellings are preferred even if they violate
nep-1 - this applies in particular to spec-related stuff like
`genesis_validators_root` which appears throughout the codebase.
`.era` files and Req/Resp protocols use framed formats - aligning the
database with these makes for less recompression work overall as gossip
is sent only once while req/resp repeats (potentially) - this also
allows efficient pruning-to-era where snappy-recompression is the major
cycle thief.
* harden validator API against pre-finalized slot requests
* check `syncHorizon` when responding to validator api requests too far
from `head`
* limit state-id based requests to one epoch ahead of `head`
* put historic data bounds on block/attestation/etc validator production API, preventing them from being used with already-finalized slots
* add validator block smoke tests
* make rest test create a new genesis with the tests running roughly in
the first epoch to allow testing a few more boundary conditions
* era: load blocks and states
Era files contain finalized history and can be thought of as an
alternative source for block and state data that allows clients to avoid
syncing this information from the P2P network - the P2P network is then
used to "top up" the client with the most recent data. They can be
freely shared in the community via whatever means (http, torrent, etc)
and serve as a permanent cold store of consensus data (and, after the
merge, execution data) for history buffs and bean counters alike.
This PR gently introduces support for loading blocks and states in two
cases: block requests from rest/p2p and frontfilling when doing
checkpoint sync.
The era files are used as a secondary source if the information is not
found in the database - compared to the database, there are a few key
differences:
* the database stores the block indexed by block root while the era file
indexes by slot - the former is used only in rest, while the latter is
used both by p2p and rest.
* when loading blocks from era files, the root is no longer trivially
available - if it is needed, it must either be computed (slow) or cached
(messy) - the good news is that for p2p requests, it is not needed
* in era files, "framed" snappy encoding is used while in the database
we store unframed snappy - for p2p2 requests, the latter requires
recompression while the former could avoid it
* front-filling is the process of using era files to replace backfilling
- in theory this front-filling could happen from any block and
front-fills with gaps could also be entertained, but our backfilling
algorithm cannot take advantage of this because there's no (simple) way
to tell it to "skip" a range.
* front-filling, as implemented, is a bit slow (10s to load mainnet): we
load the full BeaconState for every era to grab the roots of the blocks
- it would be better to partially load the state - as such, it would
also be good to be able to partially decompress snappy blobs
* lookups from REST via root are served by first looking up a block
summary in the database, then using the slot to load the block data from
the era file - however, there needs to be an option to create the
summary table from era files to fully support historical queries
To test this, `ncli_db` has an era file exporter: the files it creates
should be placed in an `era` folder next to `db` in the data directory.
What's interesting in particular about this setup is that `db` remains
as the source of truth for security purposes - it stores the latest
synced head root which in turn determines where a node "starts" its
consensus participation - the era directory however can be freely shared
between nodes / people without any (significant) security implications,
assuming the era files are consistent / not broken.
There's lots of future improvements to be had:
* we can drop the in-memory `BlockRef` index almost entirely - at this
point, resident memory usage of Nimbus should drop to a cool 500-600 mb
* we could serve era files via REST trivially: this would drop backfill
times to whatever time it takes to download the files - unlike the
current implementation that downloads block by block, downloading an era
at a time almost entirely cuts out request overhead
* we can "reasonably" recreate detailed state history from almost any
point in time, turning an O(slot) process into O(1) effectively - we'll
still need caches and indices to do this with sufficient efficiency for
the rest api, but at least it cuts the whole process down to minutes
instead of hours, for arbitrary points in time
* CI: ignore failures with Nim-1.6 (temporary)
* test fixes
Co-authored-by: Ștefan Talpalaru <stefantalpalaru@yahoo.com>
Up til now, the block dag has been using `BlockRef`, a structure adapted
for a full DAG, to represent all of chain history. This is a correct and
simple design, but does not exploit the linearity of the chain once
parts of it finalize.
By pruning the in-memory `BlockRef` structure at finalization, we save,
at the time of writing, a cool ~250mb (or 25%:ish) chunk of memory
landing us at a steady state of ~750mb normal memory usage for a
validating node.
Above all though, we prevent memory usage from growing proportionally
with the length of the chain, something that would not be sustainable
over time - instead, the steady state memory usage is roughly
determined by the validator set size which grows much more slowly. With
these changes, the core should remain sustainable memory-wise post-merge
all the way to withdrawals (when the validator set is expected to grow).
In-memory indices are still used for the "hot" unfinalized portion of
the chain - this ensure that consensus performance remains unchanged.
What changes is that for historical access, we use a db-based linear
slot index which is cache-and-disk-friendly, keeping the cost for
accessing historical data at a similar level as before, achieving the
savings at no percievable cost to functionality or performance.
A nice collateral benefit is the almost-instant startup since we no
longer load any large indicies at dag init.
The cost of this functionality instead can be found in the complexity of
having to deal with two ways of traversing the chain - by `BlockRef` and
by slot.
* use `BlockId` instead of `BlockRef` where finalized / historical data
may be required
* simplify clearance pre-advancement
* remove dag.finalizedBlocks (~50:ish mb)
* remove `getBlockAtSlot` - use `getBlockIdAtSlot` instead
* `parent` and `atSlot` for `BlockId` now require a `ChainDAGRef`
instance, unlike `BlockRef` traversal
* prune `BlockRef` parents on finality (~200:ish mb)
* speed up ChainDAG init by not loading finalized history index
* mess up light client server error handling - this need revisiting :)
One more step on the journey to reduce `BlockRef` usage across the
codebase - this one gets rid of `StateData` whose job was to keep track
of which block was last assigned to a state - these duties have now been
taken over by `latest_block_root`, a fairly recent addition that
computes this block root from state data (at a small cost that should be
insignificant)
99% mechanical change.
To calculate the deltas correctly, the `process_inactivity_updates` function
must be called before the rewards and penalties processing code in order to
update the `inactivity_scores` field in the state. This would have required
duplicating more logic from the spec in the ncli modules, so I've decided to
pay the price of introducing a run-time copy of the state at each epoch which
eliminates the need to duplicate logic (both for this fix and the previous one).
Other changes:
* Fixes for the read-only mode of the `BeaconChainDb`
* Fix an uint64 underflow in the debug output procedure for printing
balance deltas
* Allow Bellatrix states in the reward computation helpers
Streamline lookup with Forky and BeaconBlockFork (then we can do the
same for era)
We use type to avoid conditionals, as fork is often already known at a
"higher" level.
* load blockid before loading block by root - this is needed to map root
to slot and will eventually be done via block summary table for "old"
blocks
Co-authored-by: tersec <tersec@users.noreply.github.com>
Update several `ncli_db` commands to run in readOnly mode, allowing them
to be used with a running instance - in particular era export.
* export all eras by default
* skip already-exported eras
Notable improvements:
* A separate aggregation pass is no longer required.
* The user can opt to produce only aggregated data
(resuing in a much smaller data set).
* Large portion of the number cruching in Jupyter is now done in C
through the rich DataFrames API.
* Added support for comparisons against the "median" validator
performance in the network.
* limit by-root requests to non-finalized blocks
Presently, we keep a mapping from block root to `BlockRef` in memory -
this has simplified reasoning about the dag, but is not sustainable with
the chain growing.
We can distinguish between two cases where by-root access is useful:
* unfinalized blocks - this is where the beacon chain is operating
generally, by validating incoming data as interesting for future fork
choice decisions - bounded by the length of the unfinalized period
* finalized blocks - historical access in the REST API etc - no bounds,
really
In this PR, we limit the by-root block index to the first use case:
finalized chain data can more efficiently be addressed by slot number.
Future work includes:
* limiting the `BlockRef` horizon in general - each instance is 40
bytes+overhead which adds up - this needs further refactoring to deal
with the tail vs state problem
* persisting the finalized slot-to-hash index - this one also keeps
growing unbounded (albeit slowly)
Anyway, this PR easily shaves ~128mb of memory usage at the time of
writing.
* No longer honor `BeaconBlocksByRoot` requests outside of the
non-finalized period - previously, Nimbus would generously return any
block through this libp2p request - per the spec, finalized blocks
should be fetched via `BeaconBlocksByRange` instead.
* return `Opt[BlockRef]` instead of `nil` when blocks can't be found -
this becomes a lot more common now and thus deserves more attention
* `dag.blocks` -> `dag.forkBlocks` - this index only carries unfinalized
blocks from now - `finalizedBlocks` covers the other `BlockRef`
instances
* in backfill, verify that the last backfilled block leads back to
genesis, or panic
* add backfill timings to log
* fix missing check that `BlockRef` block can be fetched with
`getForkedBlock` reliably
* shortcut doppelganger check when feature is not enabled
* in REST/JSON-RPC, fetch blocks without involving `BlockRef`
* fix dag.blocks ref
The new format is based on compressed CSV files in two channels:
* Detailed per-epoch data
* Aggregated "daily" summaries
The use of append-only CSV file speeds up significantly the epoch
processing speed during data generation. The use of compression
results in smaller storage requirements overall. The use of the
aggregated files has a very minor cost in both CPU and storage,
but leads to near interactive speed for report generation.
Other changes:
- Implemented support for graceful shut downs to avoid corrupting
the saved files.
- Fixed a memory leak caused by lacking `StateCache` clean up on each
iteration.
- Addressed review comments
- Moved the rewards and penalties calculation code in a separate module
Required invasive changes to existing modules:
- The `data` field of the `KeyedBlockRef` type is made public to be used
by the validator rewards monitor's Chain DAG update procedure.
- The `getForkedBlock` procedure from the `blockchain_dag.nim` module
is made public to be used by the validator rewards monitor's Chain DAG
update procedure.