Notable improvements:
* A separate aggregation pass is no longer required.
* The user can opt to produce only aggregated data
(resuing in a much smaller data set).
* Large portion of the number cruching in Jupyter is now done in C
through the rich DataFrames API.
* Added support for comparisons against the "median" validator
performance in the network.
* limit by-root requests to non-finalized blocks
Presently, we keep a mapping from block root to `BlockRef` in memory -
this has simplified reasoning about the dag, but is not sustainable with
the chain growing.
We can distinguish between two cases where by-root access is useful:
* unfinalized blocks - this is where the beacon chain is operating
generally, by validating incoming data as interesting for future fork
choice decisions - bounded by the length of the unfinalized period
* finalized blocks - historical access in the REST API etc - no bounds,
really
In this PR, we limit the by-root block index to the first use case:
finalized chain data can more efficiently be addressed by slot number.
Future work includes:
* limiting the `BlockRef` horizon in general - each instance is 40
bytes+overhead which adds up - this needs further refactoring to deal
with the tail vs state problem
* persisting the finalized slot-to-hash index - this one also keeps
growing unbounded (albeit slowly)
Anyway, this PR easily shaves ~128mb of memory usage at the time of
writing.
* No longer honor `BeaconBlocksByRoot` requests outside of the
non-finalized period - previously, Nimbus would generously return any
block through this libp2p request - per the spec, finalized blocks
should be fetched via `BeaconBlocksByRange` instead.
* return `Opt[BlockRef]` instead of `nil` when blocks can't be found -
this becomes a lot more common now and thus deserves more attention
* `dag.blocks` -> `dag.forkBlocks` - this index only carries unfinalized
blocks from now - `finalizedBlocks` covers the other `BlockRef`
instances
* in backfill, verify that the last backfilled block leads back to
genesis, or panic
* add backfill timings to log
* fix missing check that `BlockRef` block can be fetched with
`getForkedBlock` reliably
* shortcut doppelganger check when feature is not enabled
* in REST/JSON-RPC, fetch blocks without involving `BlockRef`
* fix dag.blocks ref
The new format is based on compressed CSV files in two channels:
* Detailed per-epoch data
* Aggregated "daily" summaries
The use of append-only CSV file speeds up significantly the epoch
processing speed during data generation. The use of compression
results in smaller storage requirements overall. The use of the
aggregated files has a very minor cost in both CPU and storage,
but leads to near interactive speed for report generation.
Other changes:
- Implemented support for graceful shut downs to avoid corrupting
the saved files.
- Fixed a memory leak caused by lacking `StateCache` clean up on each
iteration.
- Addressed review comments
- Moved the rewards and penalties calculation code in a separate module
Required invasive changes to existing modules:
- The `data` field of the `KeyedBlockRef` type is made public to be used
by the validator rewards monitor's Chain DAG update procedure.
- The `getForkedBlock` procedure from the `blockchain_dag.nim` module
is made public to be used by the validator rewards monitor's Chain DAG
update procedure.
This is an alternative take on https://github.com/status-im/nimbus-eth2/pull/3107
that aims for more minimal interventions in the spec modules at the expense of
duplicating more of the spec logic in ncli_db.
Time in the beacon chain is expressed relative to the genesis time -
this PR creates a `beacon_time` module that collects helpers and
utilities for dealing the time units - the new module does not deal with
actual wall time (that's remains in `beacon_clock`).
Collecting the time related stuff in one place makes it easier to find,
avoids some circular imports and allows more easily identifying the code
actually needs wall time to operate.
* move genesis-time-related functionality into `spec/beacon_time`
* avoid using `chronos.Duration` for time differences - it does not
support negative values (such as when something happens earlier than it
should)
* saturate conversions between `FAR_FUTURE_XXX`, so as to avoid
overflows
* fix delay reporting in validator client so it uses the expected
deadline of the slot, not "closest wall slot"
* simplify looping over the slots of an epoch
* `compute_start_slot_at_epoch` -> `start_slot`
* `compute_epoch_at_slot` -> `epoch`
A follow-up PR will (likely) introduce saturating arithmetic for the
time units - this is merely code moves, renames and fixing of small
bugs.
* Harden CommitteeIndex, SubnetId, SyncSubcommitteeIndex
Harden the use of `CommitteeIndex` et al to prevent future issues by
using a distinct type, then validating before use in several cases -
datatypes in spec are kept simple though so that invalid data still can
be read.
* fix invalid epoch used in REST
`/eth/v1/beacon/states/{state_id}/committees` committee length (could
return invalid data)
* normalize some variable names
* normalize committee index loops
* fix `RestAttesterDuty` to use `uint64` for `validator_committee_index`
* validate `CommitteeIndex` on ingress in REST API
* update rest rules with stricter parsing
* better REST serializers
* save lots of memory by not using `zip` ...at least a few bytes!
* REST cleanups
* reject out-of-range committee requests
* print all hex values as lower-case
* allow requesting state information by head state root
* turn `DomainType` into array (follow spec)
* `uint_to_bytesXX` -> `uint_to_bytes` (follow spec)
* fix wrong dependent root in `/eth/v1/validator/duties/proposer/`
* update documentation - `--subscribe-all-subnets` is no longer needed
when using the REST interface with validator clients
* more fixes
* common helpers for dependent block
* remove test rules obsoleted by more strict epoch tests
* fix trailing commas
* Update docs/the_nimbus_book/src/rest-api.md
* Update docs/the_nimbus_book/src/rest-api.md
Co-authored-by: sacha <sacha@status.im>
Overhaul of era files, including documentation and reference
implementations
* store blocks, then state, then slot indices for easy lookup at low
cost
* document era file rationale
* altair+ support in era writer
* support downloading blocks / states via JSON in addition to SSZ -
slow, but needed for infura support - SSZ is still used when server
supports it
* use common forked block/state reader in REST API
* fix stack overflows in REST JSON decoder
* fix invalid serialization of `justification_bits` in
`/eth/v1/debug/beacon/states` and `/eth/v2/debug/beacon/states`
* fix REST client to use `/eth/...` instead of `/api/eth/...`, update
"default" urls to expose REST api via `/eth` as well as this is what the
standard says - `/api` was added early on based on an example "base url"
in the spec that has been removed since
* expose Nimbus REST extensions via `/nimbus` in addition to
`/api/nimbus` to stay consistent with `/eth`
* fix invalid state root when reading states via REST
* fix recursive imports in `spec/ssz_codec`
* remove usages of `serialization.useCustomSerialization` - fickle
With checkpoint sync in particular, and state pruning in the future,
loading states or state-dependent data may fail. This PR adjusts the
code to allow this to be handled gracefully.
In particular, the new availability assumption is that states are always
available for the finalized checkpoint and newer, but may fail for
anything older.
The `tail` remains the point where state loading de-facto fails, meaning
that between the tail and the finalized checkpoint, we can still get
historical data (but code should be prepared to handle this as an
error).
However, to harden the code against long replays, several operations
which are assumed to work only with non-final data (such as gossip
verification and validator duties) now limit their search horizon to
post-finalized data.
* harden several state-dependent operations by logging an error instead
of introducing a panic when state loading fails
* `withState` -> `withUpdatedState` to differentiate from the other
`withState`
* `updateStateData` can now fail if no state is found in database - it
is also hardened against excessively long replays
* `getEpochRef` can now fail when replay fails
* reject blocks with invalid target root - they would be ignored
previously
* fix recursion bug in `isProposed`
Introduced in #3171, it turns out we can just follow the block headers
to achieve the same effect
* leaves the constant in the code so as to avoid confusion when reading
database that had the constant written (such as the fleet nodes and
other unstable users)
Validator monitoring based on and mostly compatible with the
implementation in Lighthouse - tracks additional logs and metrics for
specified validators so as to stay on top on performance.
The implementation works more or less the following way:
* Validator pubkeys are singled out for monitoring - these can be
running on the node or not
* For every action that the validator takes, we record steps in the
process such as messages being seen on the network or published in the
API
* When the dust settles at the end of an epoch, we report the
information from one epoch before that, which coincides with the
balances being updated - this is a tradeoff between being correct
(waiting for finalization) and providing relevant information in a
timely manner)
Renames and cleanups split out from the validator monitoring branch, so
as to reduce conflict area vs other PR:s
* add constants for expected message timing
* name validators after the messages they validate, mostly, to make
grepping easier
* unify field naming of EpochInfo across forks to make cross-fork code
easier
* ncli_db: add putState, putBlock
These tools allow modifying an existing nimbus database for the purpose
of recovery or reorg, moving the head, tail and genesis to arbitrary
points.
* remove potentially expensive `putState` in `BeaconStateDB`
* introduce `latest_block_root` which computes the root of the latest
applied block from the `latest_block_header` field (instead of passing
it in separately)
* avoid some unnecessary BeaconState copies during init
* discover https://github.com/nim-lang/Nim/issues/19094
* prefer `HashedBeaconState` in a few places to avoid recomputing state
root
* fetch latest block root from state when creating blocks
* harden `get_beacon_proposer_index` against invalid slots and document
* move random spec function tests to `test_spec.nim`
* avoid unnecessary state root computation before block proposal
* Support starting from altair
* hide `finalized-checkpoint-` - they are incomplete and usage may cause
crashes
* remove genesis detection code (broken, obsolete)
* enable starting ChainDAG from altair checkpoints - this is a
prerequisite for checkpoint sync (TODO: backfill)
* tighten checkpoint state conditions
* show error when starting from checkpoint with existing database (not
supported)
* print rest-compatible JSON in ncli/state_sim
* altair/merge support in ncli
* more altair/merge support in ncli_db
* pre-load header to speed up loading
* fix forked block decoding
* fix stack overflow crash in REST/debug/getStateV2
* introduce `ForkyXxx` for generic type matching of `Xxx` across
branches (SomeHashedBeaconState -> ForkyHashedBeaconState et al) -
`Some` is already used for other types of type classes
* consolidate function naming in BeaconChainDB, use some generics
* import `forks.nim` from other spec modules and move `Forked*` helpers
around to resolve circular imports
* remove `ForkedBeaconState`, use `ForkedHashedBeaconState` throughout
(less data shuffling between the types)
* fix several cases of states being stored on stack in tests, causing
random failures on some platforms
* remove reading json support from ncli - this should be ported to the
rest json reading instead (doesn't currently work because stack sizes)
So far, the REST config response did not include all spec constants.
The spec for `/eth/v1/config/spec` defines that the response should
include constants for all hard forks known by the beacon node. This
patch extends the corresponding response to include more constants.
* Initial commit.
* Fix path.
* Add validator keys to indices cache mechanism.
Move syncComitteeParticipants to common place.
* Fix sync participants order issue.
* Fix error code when state could not be found.
Refactor `state/validators` to use keysToIndices mechanism.
* Fix RestValidatorIndex to ValidatorIndex conversion TODOs.
* Address review comments.
* Fix REST test rules.
Similar to the existing `RewardInfo`, this PR adds the infrastructure
needed to export epoch processing information from altair+. Because
accounting is done somewhat differently, the PR uses a fork-specific
object to extrct the information in order to make the cost on the spec
side low.
* RewardInfo -> EpochInfo, ForkedEpochInfo
* use array for computing new sync committee
* avoid repeated total active balance computations in block processing
* simplify proposer index check
* simplify epoch transition tests
* pre-compute base increment and reuse in epoch processing, and a few
other small optimizations
This PR introduces the type and does the heavy lifting in terms of
refactoring - the tools that use the accounting will need separate PR:s
(as well as refinements to the exportred information)
* reorganize ssz dependencies
This PR continues the work in
https://github.com/status-im/nimbus-eth2/pull/2646,
https://github.com/status-im/nimbus-eth2/pull/2779 as well as past
issues with serialization and type, to disentangle SSZ from eth2 and at
the same time simplify imports and exports with a structured approach.
The principal idea here is that when a library wants to introduce SSZ
support, they do so via 3 files:
* `ssz_codecs` which imports and reexports `codecs` - this covers the
basic byte conversions and ensures no overloads get lost
* `xxx_merkleization` imports and exports `merkleization` to specialize
and get access to `hash_tree_root` and friends
* `xxx_ssz_serialization` imports and exports `ssz_serialization` to
specialize ssz for a specific library
Those that need to interact with SSZ always import the `xxx_` versions
of the modules and never `ssz` itself so as to keep imports simple and
safe.
This is similar to how the REST / JSON-RPC serializers are structured in
that someone wanting to serialize spec types to REST-JSON will import
`eth2_rest_serialization` and nothing else.
* split up ssz into a core library that is independendent of eth2 types
* rename `bytes_reader` to `codec` to highlight that it contains coding
and decoding of bytes and native ssz types
* remove tricky List init overload that causes compile issues
* get rid of top-level ssz import
* reenable merkleization tests
* move some "standard" json serializers to spec
* remove `ValidatorIndex` serialization for now
* remove test_ssz_merkleization
* add tests for over/underlong byte sequences
* fix broken seq[byte] test - seq[byte] is not an SSZ type
There are a few things this PR doesn't solve:
* like #2646 this PR is weak on how to handle root and other
dontSerialize fields that "sometimes" should be computed - the same
problem appears in REST / JSON-RPC etc
* Fix a build problem on macOS
* Another way to fix the macOS builds
Co-authored-by: Zahary Karadjov <zahary@gmail.com>
The spec imports are a mess to work with, so this branch cleans them up
a bit to ensure that we avoid generic sandwitches and that importing
stuff generally becomes easier.
* reexport crypto/digest/presets because these are part of the public
symbol set of the rest of the spec types
* don't export `merge` types from `base` - this causes circular deps
* fix circular deps in `ssz/spec_types` - this is the first step in
disentangling ssz from spec
* be explicit about phase0 vs altair - longer term, `altair` will become
the "natural" type set, then merge and so on, so no point in giving
`phase0` special preferential treatment
Simpler module name for stuff that covers forks
* check that runtime config matches database state
* also include some assorted altair cleanups
* use "standard" genesis fork in local testnet to work around missing
runtime config support
* add aggregated attestation tracing to logtrace and enable it in Jenkins CI
* use a slightly less cryptic acronym than aasr
* mostly, nimbus and the eth2 spec use aggregate attestation, not aggregated attestation