* Use final `v1` version for light client protocols
* Unhide LC data collection options
* Default enable LC data serving
* rm unneeded import
* Connect to EL on startup
* Add docs for LC based EL sync
The light client sync protocol employs heuristics to ensure it does not
become stuck during non-finality or low sync committee participation.
These can enable use cases that prefer availability of recent data
over security. For our syncing use case, though, security is preferred.
An option is added to light client processor to configure this tradeoff.
Other changes:
* The Keymanager error responses differ from the Beacon API responses.
'keymanagerApiError' replaces the former usages of 'jsonError'.
* Return status code 401 and 403 for authorization errors in accordance
to the spec.
* Eliminate inconsistencies in the REST JSON parsing. Some of the code
paths allowed missing fields.
* Added logging of serialization failure details at DEBUG level.
Removes a few extra-ambitious templates to make `self` updates explicit,
and moves the `FinalityCheckpoints` type from `base` to `helpers` as it
is an additional Nimbus specific type not defined by spec.
Whether new blocks/attestations/etc are produced internally or received
via REST, their journey through the node is the same - to ensure that
they get the same treatment (logging, metrics, processing), this PR
moves the routing to a dedicated module and fixes several small
differences that existed before.
* `xxxValidator` -> `processMessageName` - the processor also was adding
messages to pools, so we want the name to reflect that action
* add missing "sent" metrics for some messages
* document ignore policy better - already-seen messages are not actaully
rebroadcast by libp2p
* skip redundant signature checks for internal validators consistently
The justified and finalized `Checkpoint` are frequently passed around
together. This introduces a new `FinalityCheckpoint` data structure that
combines them into one.
Due to the large usage of this structure in fork choice, also took this
opportunity to update fork choice tests to the latest v1.2.0-rc.1 spec.
Many additional tests enabled, some need more work, e.g. EL mock blocks.
Also implemented `discard_equivocations` which was skipped in #3661,
and improved code reuse across fork choice logic while at it.
* optimistic sync
* flag that initially loaded blocks from database might need execution block root filled in
* return optimistic status in REST calls
* refactor blockslot pruning
* ensure beacon_blocks_by_{root,range} do not provide optimistic blocks
* handle forkchoice head being pre-merge with block being postmerge
* re-enable blocking head updates on validator duties
* fix is_optimistic_candidate_block per spec; don't crash with nil future
* fix is_optimistic_candidate_block per spec; don't crash with nil future
* mark blocks sans execution payloads valid during head update
Separate LC initialization options from the main ChainDAGRef options to
allow ChainDAGRef to treat them as opaque and reduce risk for conflicts
when extending those options in the future.
Merkle proofs tend to have long underlying type definitions, e.g.,
`array[log2trunc(NEXT_SYNC_COMMITTEE_INDEX), Eth2Digest]`. For the
ones used in the LC sync protocol, dedicated types are introduced
to improve readability. Furthermore, the `CachedLightClientBootstrap`
wrapper that solely wrapped a merkle branch is eliminated.
This updates `nim-ssz-serialization` to
`3db6cc0f282708aca6c290914488edd832971d61`.
Notable changes:
- Use `uint64` for `GeneralizedIndex`
- Add support for building merkle multiproofs
For consistency with other options, use a common prefix for light client
data configuration options.
* `--serve-light-client-data` --> `--light-client-data-serve`
* `--import-light-client-data` --> `--light-client-data-import-mode`
No deprecation of the old identifiers as they were only sparingly used
and all usage can be easily updated without interferance.
When launched with `--light-client-enable` the latest blocks are fetched
and optimistic candidate blocks are passed to a callback (log for now).
This helps accelerate syncing in the future (optimistic sync).
Adds a `LightClient` instance to the beacon node as preparation to
accelerate syncing in the future (optimistic sync).
- `--light-client-enable` turns on the feature
- `--light-client-trusted-block-root` configures block to start from
If no block root is configured, light client tracks DAG `finalizedHead`.
Introduces a new library for syncing using libp2p based light client
sync protocol, and adds a new `nimbus_light_client` executable that uses
this library for syncing. The new executable emits log messages when
new beacon block headers are received, and is integrated into testing.
* SSZ `[]` -> `mitem`
* `[]` -> `item`
immutable access via mutable instance cannot rely on template
overloading, and `[]` cannot be a `func` because of special seq handling
in compiler.
Incorporates the latest changes to the light client sync protocol based
on Devconnect AMS feedback. Note that this breaks compatibility with the
previous prototype, due to changes to data structures and endpoints.
See https://github.com/ethereum/consensus-specs/pull/2802
Since we were not verifying BLS signature in blocks that we produce,
we were failing to notice that some deposits need to be ignored (due
to having an invalid signature). Processing these deposits resulted
in a different ending state after the state transition which caused
our blocks to be rejected by the network.
Follows up on https://github.com/status-im/nimbus-eth2/pull/3461 which
ensured that repeated `beaconBlocksByRange` requests get shrinked to
account for potential out-of-band advancements to `safeSlot`, with
similar logic for the initial request.
Other changes:
* logtrace can now verify sync committee messages and contributions
* Many unnecessary use of pairs() have been removed for consistency
* Map 40x BN response codes to BeaconNodeStatus.Incompatible in the VC
Other fixes:
* Fix bit rot in the `make prater-dev-deposit` target.
* Correct content-type in the responses of the Nimbus signing node
* Invalid JSON payload was being sent in the web3signer requests
This PR makes the necessary adjustments to deal with the revamped snappy
API.
In practical terms for nimbus-eth2, there are performance increases to
gossip processing, database reading and writing as well as era file
processing. Exporting `.era` files for example, a snappy-heavy
operation, almost halves in total processing time:
Pre:
```
Average, StdDev, Min, Max, Samples, Test
39.088, 8.735, 23.619, 53.301, 50, tState
237.079, 46.692, 165.620, 355.481, 49, tBlocks
```
Post:
```
All time are ms
Average, StdDev, Min, Max, Samples, Test
25.350, 5.303, 15.351, 41.856, 50, tState
141.238, 24.164, 99.990, 199.329, 49, tBlocks
```
Some upstream repos still need fixes, but this gets us close enough that
style hints can be enabled by default.
In general, "canonical" spellings are preferred even if they violate
nep-1 - this applies in particular to spec-related stuff like
`genesis_validators_root` which appears throughout the codebase.
`.era` files and Req/Resp protocols use framed formats - aligning the
database with these makes for less recompression work overall as gossip
is sent only once while req/resp repeats (potentially) - this also
allows efficient pruning-to-era where snappy-recompression is the major
cycle thief.
* harden validator API against pre-finalized slot requests
* check `syncHorizon` when responding to validator api requests too far
from `head`
* limit state-id based requests to one epoch ahead of `head`
* put historic data bounds on block/attestation/etc validator production API, preventing them from being used with already-finalized slots
* add validator block smoke tests
* make rest test create a new genesis with the tests running roughly in
the first epoch to allow testing a few more boundary conditions
Gracefully handles the new failure modes recently introduced to the DAG
as part of https://github.com/status-im/nimbus-eth2/pull/3513
Data that is deemed to exist but fails to load leads to an error log to
avoid suppressing logic errors accidentally. In `verifyFinalization`
mode, the assertions remain active.
When doing checkpoint sync, collecting light client data of known blocks
and states incorrectly assumes that `finalized_checkpoint` information
is also known. Hardens collection to only collect finalized checkpoint
data after `dag.computeEarliestLightClientSlot`.
Adds `LightClientProcessor` as the pendant to `BlockProcessor` while
operating in light client mode. Note that a similar mechanism based on
async futures is used for interoperability with existing infrastructure,
despite light client object validation being done synchronously.
Up til now, the block dag has been using `BlockRef`, a structure adapted
for a full DAG, to represent all of chain history. This is a correct and
simple design, but does not exploit the linearity of the chain once
parts of it finalize.
By pruning the in-memory `BlockRef` structure at finalization, we save,
at the time of writing, a cool ~250mb (or 25%:ish) chunk of memory
landing us at a steady state of ~750mb normal memory usage for a
validating node.
Above all though, we prevent memory usage from growing proportionally
with the length of the chain, something that would not be sustainable
over time - instead, the steady state memory usage is roughly
determined by the validator set size which grows much more slowly. With
these changes, the core should remain sustainable memory-wise post-merge
all the way to withdrawals (when the validator set is expected to grow).
In-memory indices are still used for the "hot" unfinalized portion of
the chain - this ensure that consensus performance remains unchanged.
What changes is that for historical access, we use a db-based linear
slot index which is cache-and-disk-friendly, keeping the cost for
accessing historical data at a similar level as before, achieving the
savings at no percievable cost to functionality or performance.
A nice collateral benefit is the almost-instant startup since we no
longer load any large indicies at dag init.
The cost of this functionality instead can be found in the complexity of
having to deal with two ways of traversing the chain - by `BlockRef` and
by slot.
* use `BlockId` instead of `BlockRef` where finalized / historical data
may be required
* simplify clearance pre-advancement
* remove dag.finalizedBlocks (~50:ish mb)
* remove `getBlockAtSlot` - use `getBlockIdAtSlot` instead
* `parent` and `atSlot` for `BlockId` now require a `ChainDAGRef`
instance, unlike `BlockRef` traversal
* prune `BlockRef` parents on finality (~200:ish mb)
* speed up ChainDAG init by not loading finalized history index
* mess up light client server error handling - this need revisiting :)
The spec implicitly talks about the slot of a block in several places,
and keeping it readily available is useful in a number of context -
might as well put this implicitly refereneced helper in the spec code
directly
One more step on the journey to reduce `BlockRef` usage across the
codebase - this one gets rid of `StateData` whose job was to keep track
of which block was last assigned to a state - these duties have now been
taken over by `latest_block_root`, a fairly recent addition that
computes this block root from state data (at a small cost that should be
insignificant)
99% mechanical change.
When a `beaconBlocksByRange` response advances the `safeSlot`, but later
has errors, the sync queue keeps repeating that same request until it is
fulfilled without errors. Data up through `safeSlot` is considered to be
immutable, i.e., finalized, so re-requesting that data is not useful.
By advancing the sync progress in that scenario, those redundant query
portions can be avoided. Note, the finalized block _itself_ is always
requested, even in the initial request. This behaviour is kept same.
* fewer deps on `BlockRef` traversal in anticipation of pruning
* allows identifying EpochRef:s by their shuffling as a first step of
* tighten error handling around missing blocks
using the zero hash for signalling "missing block" is fragile and easy
to miss - with checkpoint sync now, and pruning in the future, missing
blocks become "normal".
When syncing as a light client, different behaviour is needed to handle
the various ways how errors may occur. The existing logic for blocks can
also be applied to light client objects:
- `Invalid`: Malformed object that is clearly an error by its producer.
- `MissingParent`: More data is needed to decide applicability.
- `UnviableFork`: Object may be valid but will never apply on this fork.
- `Duplicate`: No errors were encountered but the object was not useful.
Light clients require full nodes to serve additional data so that they
can stay in sync with the network. This patch adds a new launch option
`--import-light-client-data` to configure what data to make available.
For now, data is only kept in memory; it is not persisted at this time.
Note that data is only locally collected, a separate patch is needed to
actually make it availble over the network. `--serve-light-client-data`
will be used for serving data, but is not functional yet outside tests.
When performing trusted node sync, historical access is limited to
states after the checkpoint.
Reindexing restores full historical access by replaying historical
blocks against the state and storing snapshots in the database.
The process can be initiated or resumed at any point in time.
This adopts the spec sections of the pre-release proposal of the libp2p
based light client sync protocol, and also adds a test runner for the
new accompanying tests. While the release version of the light client
sync protocol contains conflicting definitions, it is currently unused,
and the code specific to the pre-release proposal is marked as such.
See https://github.com/ethereum/consensus-specs/pull/2802
The spec does not provide code for validating the `fork_version` field
of `LightClientUpdate`. However, we can use our own logic for additional
validation of that field. The spec's python test suite sets up states
that do not follow the fork schedule (e.g., that use Altair fork version
before Altair fork epoch), which complicates upstreaming this as code.
* Refactor and optimize logs.
* Introduce shortLog(SyncRequest).
* Address review comment.
* make sync queue logs more consistent
Adds a few minor logging improvements:
- Fixes a typo (`was happened` -> `has happened`)
- Avoids passing `reset_slot` argument to log statement multiple times
- Uses same `rewind_to_slot` label when logging in both sync directions
- Consistent rewind point logging
Co-authored-by: cheatfate <eugene.kabanov@status.im>
In practice, the sync committee signs `LightClientUpdate` instances at
the next slot following the block. This is not correctly reflected in
the tests, where it is signed one slot early. This patch updates the
tests to use the correct slot for the computation.
This PR names and documents the concept of the archive: a range of slots
for which we have degraded functionality in terms of historical access -
in particular:
* we don't support rewinding to states in this range
* we don't keep an in-memory representation of the block dag
The archive de-facto exists in a trusted-node-synced node, but this PR
gives it a name and drops the in-memory digest index.
In order to satisfy `GetBlocksByRange` requests, we ensure that we have
blocks for the entire archive period via backfill. Future versions may
relax this further, adding a "pre-archive" period that is fully pruned.
During by-slot searches in the archive (both for libp2p and rest
requests), an extra database lookup is used to covert the given `slot`
to a `root` - future versions will avoid this using era files which
natively are indexed by `slot`. That said, the lookup is quite
fast compared to the actual block loading given how trivial the table
is - it's hard to measure, even.
A collateral benefit of this PR is that checkpoint-synced nodes will see
100-200MB memory usage savings, thanks to the dropped in-memory cache -
future pruning work will bring this benefit to full nodes as well.
* document chaindag storage architecture and assumptions
* look up parent using block id instead of full block in clearance
(future-proofing the code against a future in which blocks come from era
files)
* simplify finalized block init, always writing the backfill portion to
db at startup (to ensure lookups work as expected)
* preallocate some extra memory for finalized blocks, to avoid immediate
realloc
https://github.com/ethereum/consensus-specs/pull/2225 removed an ignore
rule that would filter out duplicate aggregates from gossip publishing -
however, this causes increased bandwidth and CPU usage as discussed in
https://github.com/ethereum/consensus-specs/issues/2183 - the intent is
to revert the removal and reinstate the rule.
This PR implements ignore filtering which cuts down on CPU usage (fewer
aggregates to validate) and bandwidth usage (less fanout of duplicates)
- as #2225 points out, this may lead to a small increase in IHAVE
messages.
Streamline lookup with Forky and BeaconBlockFork (then we can do the
same for era)
We use type to avoid conditionals, as fork is often already known at a
"higher" level.
* load blockid before loading block by root - this is needed to map root
to slot and will eventually be done via block summary table for "old"
blocks
Co-authored-by: tersec <tersec@users.noreply.github.com>
* Initial commit.
* Fix current test suite.
* Fix keymanager api test.
* Fix wss_sim.
* Add more keystore_management tests.
* Recover deleted isEmptyDir().
* Add `HttpHostUri` distinct type.
Move keymanager calls away from rest_beacon_calls to rest_keymanager_calls.
Add REST serialization of RemoteKeystore and Keystore object.
Add tests for Remote Keystore management API.
Add tests for Keystore management API (Add keystore).
Fix serialzation issues.
* Fix test to use HttpHostUri instead of Uri.
* Add links to specification in comments.
* Remove debugging echoes.
* harden and speed up block sync
The `GetBlockBy*` server implementation currently reads SSZ bytes from
database, deserializes them into a Nim object then serializes them right
back to SSZ - here, we eliminate the deser/ser steps and send the bytes
straight to the network. Unfortunately, the snappy recoding must still
be done because of differences in framing.
Also, the quota system makes one giant request for quota right before
sending all blocks - this means that a 1024 block request will be
"paused" for a long time, then all blocks will be sent at once causing a
spike in database reads which potentially will see the reading client
time out before any block is sent.
Finally, on the reading side we make several copies of blocks as they
travel through various queues - this was not noticeable before but
becomes a problem in two cases: bellatrix blocks are up to 10mb (instead
of .. 30-40kb) and when backfilling, we process a lot more of them a lot
faster.
* fix status comparisons for nodes syncing from genesis (#3327 was a bit
too hard)
* don't hit database at all for post-altair slots in GetBlock v1
requests