Commit Graph

367 Commits

Author SHA1 Message Date
Jacek Sieka f70ff38b53
enable `styleCheck:usages` (#3573)
Some upstream repos still need fixes, but this gets us close enough that
style hints can be enabled by default.

In general, "canonical" spellings are preferred even if they violate
nep-1 - this applies in particular to spec-related stuff like
`genesis_validators_root` which appears throughout the codebase.
2022-04-08 16:22:49 +00:00
Jacek Sieka 5092fc41c7
use snappy-framed format for compressing bellatrix+ database entries (#3551)
`.era` files and Req/Resp protocols use framed formats - aligning the
database with these makes for less recompression work overall as gossip
is sent only once while req/resp repeats (potentially) - this also
allows efficient pruning-to-era where snappy-recompression is the major
cycle thief.
2022-03-29 11:33:06 +00:00
tersec 9b43a76f2f
kiln beacon node (#3540)
* kiln bn

* use  version of beacon_chain_db

* have Eth1Monitor abstract more tightly over web3provider
2022-03-25 11:40:10 +00:00
Jacek Sieka e009728858
work around Nim assignment bug that breaks state pruning (#3545)
See https://github.com/nim-lang/Nim/issues/19613
2022-03-24 14:37:37 +00:00
Jacek Sieka bc80ac3be1
harden REST API `atSlot` against non-finalized blocks (#3538)
* harden validator API against pre-finalized slot requests
* check `syncHorizon` when responding to validator api requests too far
from `head`
* limit state-id based requests to one epoch ahead of `head`
* put historic data bounds on block/attestation/etc validator production API, preventing them from being used with already-finalized slots
* add validator block smoke tests
* make rest test create a new genesis with the tests running roughly in
the first epoch to allow testing a few more boundary conditions
2022-03-23 12:42:16 +01:00
Jacek Sieka 4207b127f9
era: load blocks and states (#3394)
* era: load blocks and states

Era files contain finalized history and can be thought of as an
alternative source for block and state data that allows clients to avoid
syncing this information from the P2P network - the P2P network is then
used to "top up" the client with the most recent data. They can be
freely shared in the community via whatever means (http, torrent, etc)
and serve as a permanent cold store of consensus data (and, after the
merge, execution data) for history buffs and bean counters alike.

This PR gently introduces support for loading blocks and states in two
cases: block requests from rest/p2p and frontfilling when doing
checkpoint sync.

The era files are used as a secondary source if the information is not
found in the database - compared to the database, there are a few key
differences:

* the database stores the block indexed by block root while the era file
indexes by slot - the former is used only in rest, while the latter is
used both by p2p and rest.
* when loading blocks from era files, the root is no longer trivially
available - if it is needed, it must either be computed (slow) or cached
(messy) - the good news is that for p2p requests, it is not needed
* in era files, "framed" snappy encoding is used while in the database
we store unframed snappy - for p2p2 requests, the latter requires
recompression while the former could avoid it
* front-filling is the process of using era files to replace backfilling
- in theory this front-filling could happen from any block and
front-fills with gaps could also be entertained, but our backfilling
algorithm cannot take advantage of this because there's no (simple) way
to tell it to "skip" a range.
* front-filling, as implemented, is a bit slow (10s to load mainnet): we
load the full BeaconState for every era to grab the roots of the blocks
- it would be better to partially load the state - as such, it would
also be good to be able to partially decompress snappy blobs
* lookups from REST via root are served by first looking up a block
summary in the database, then using the slot to load the block data from
the era file - however, there needs to be an option to create the
summary table from era files to fully support historical queries

To test this, `ncli_db` has an era file exporter: the files it creates
should be placed in an `era` folder next to `db` in the data directory.
What's interesting in particular about this setup is that `db` remains
as the source of truth for security purposes - it stores the latest
synced head root which in turn determines where a node "starts" its
consensus participation - the era directory however can be freely shared
between nodes / people without any (significant) security implications,
assuming the era files are consistent / not broken.

There's lots of future improvements to be had:

* we can drop the in-memory `BlockRef` index almost entirely - at this
point, resident memory usage of Nimbus should drop to a cool 500-600 mb
* we could serve era files via REST trivially: this would drop backfill
times to whatever time it takes to download the files - unlike the
current implementation that downloads block by block, downloading an era
at a time almost entirely cuts out request overhead
* we can "reasonably" recreate detailed state history from almost any
point in time, turning an O(slot) process into O(1) effectively - we'll
still need caches and indices to do this with sufficient efficiency for
the rest api, but at least it cuts the whole process down to minutes
instead of hours, for arbitrary points in time

* CI: ignore failures with Nim-1.6 (temporary)

* test fixes

Co-authored-by: Ștefan Talpalaru <stefantalpalaru@yahoo.com>
2022-03-23 09:58:17 +01:00
Jacek Sieka 361596c719
harden head update against missing parent (#3529)
in case BlockRef ends up in some lifetime leak

* fix duplicate head logging
2022-03-21 15:18:05 +01:00
Jacek Sieka 13fafe3a40
simplify unviable head pruning (#3528)
Also note bug that exists that potentially prevents states from being
pruned correctly
2022-03-21 09:20:26 +00:00
Etan Kissling fd1ffd62dd
update light client server for DAG failure modes (#3514)
Gracefully handles the new failure modes recently introduced to the DAG
as part of https://github.com/status-im/nimbus-eth2/pull/3513
Data that is deemed to exist but fails to load leads to an error log to
avoid suppressing logic errors accidentally. In `verifyFinalization`
mode, the assertions remain active.
2022-03-20 11:58:59 +01:00
Etan Kissling 04b851f775
fix light client data pruning (#3523)
When eliminating orphaned forks, light client data about blocks was also
deleted when the orphaned fork was referring to a state several slots
after the block. Linking light client data pruning with block deletion
instead of state deletion fixes this problem. Light client data always
refers to blocks and their immediate post-state.
2022-03-20 10:09:43 +01:00
Jacek Sieka ea1acd7397
fix loading when finalized checkpoint slot is missing block (#3525)
ref loop would stop one block early in this case - trying to load
everything in one loop ends up being pretty confusing..

* simplify finalizedBlocks topup by splitting it from the head loop /
query
2022-03-19 11:02:17 +00:00
Etan Kissling 637f1e2be6
simplify `computeEarliestLightClientSlot` (#3524)
Combine DAG and LC import tails in `computeEarliestLightClientSlot`.
2022-03-19 09:58:55 +01:00
Etan Kissling 18bd6df1b4
fix light client data collection for checkpoint sync (#3498)
When doing checkpoint sync, collecting light client data of known blocks
and states incorrectly assumes that `finalized_checkpoint` information
is also known. Hardens collection to only collect finalized checkpoint
data after `dag.computeEarliestLightClientSlot`.
2022-03-18 15:47:53 +01:00
Jacek Sieka d0223d1f28
fix finalized epoch ref loading on checkpoint start (#3517)
regression from #3513 that did not take tail into consideration when
loading epoch ancestor
2022-03-18 13:13:57 +01:00
Jacek Sieka b3d80827fb
tns: checkpoint wal periodically while backfilling (#3516)
Witout this, we end up with a massive .wal file that needs to be
checkpointed on first startup (which takes a few minutes) - it's much
more efficient to do smaller checkpoints, it turns out.
2022-03-18 12:32:20 +01:00
Jacek Sieka 05ffe7b2bf
Prune `BlockRef` on finalization (#3513)
Up til now, the block dag has been using `BlockRef`, a structure adapted
for a full DAG, to represent all of chain history. This is a correct and
simple design, but does not exploit the linearity of the chain once
parts of it finalize.

By pruning the in-memory `BlockRef` structure at finalization, we save,
at the time of writing, a cool ~250mb (or 25%:ish) chunk of memory
landing us at a steady state of ~750mb normal memory usage for a
validating node.

Above all though, we prevent memory usage from growing proportionally
with the length of the chain, something that would not be sustainable
over time -  instead, the steady state memory usage is roughly
determined by the validator set size which grows much more slowly. With
these changes, the core should remain sustainable memory-wise post-merge
all the way to withdrawals (when the validator set is expected to grow).

In-memory indices are still used for the "hot" unfinalized portion of
the chain - this ensure that consensus performance remains unchanged.

What changes is that for historical access, we use a db-based linear
slot index which is cache-and-disk-friendly, keeping the cost for
accessing historical data at a similar level as before, achieving the
savings at no percievable cost to functionality or performance.

A nice collateral benefit is the almost-instant startup since we no
longer load any large indicies at dag init.

The cost of this functionality instead can be found in the complexity of
having to deal with two ways of traversing the chain - by `BlockRef` and
by slot.

* use `BlockId` instead of `BlockRef` where finalized / historical data
may be required
* simplify clearance pre-advancement
* remove dag.finalizedBlocks (~50:ish mb)
* remove `getBlockAtSlot` - use `getBlockIdAtSlot` instead
* `parent` and `atSlot` for `BlockId` now require a `ChainDAGRef`
instance, unlike `BlockRef` traversal
* prune `BlockRef` parents on finality (~200:ish mb)
* speed up ChainDAG init by not loading finalized history index
* mess up light client server error handling - this need revisiting :)
2022-03-17 17:42:56 +00:00
Jacek Sieka 8a63efc413
move `BlockId` to `spec` (#3511)
The spec implicitly talks about the slot of a block in several places,
and keeping it readily available is useful in a number of context -
might as well put this implicitly refereneced helper in the spec code
directly
2022-03-16 16:00:18 +01:00
tersec 8fbcf29775
update unchanged specs/phase0/p2p-interface.md URL references from v1.1.9 to v1.1.10 (#3510) 2022-03-16 10:40:35 +00:00
Jacek Sieka c64bf045f3
remove StateData (#3507)
One more step on the journey to reduce `BlockRef` usage across the
codebase - this one gets rid of `StateData` whose job was to keep track
of which block was last assigned to a state - these duties have now been
taken over by `latest_block_root`, a fairly recent addition that
computes this block root from state data (at a small cost that should be
insignificant)

99% mechanical change.
2022-03-16 08:20:40 +01:00
Jacek Sieka a3bd01b58d
move dependent root computations to `BeaconState` / `EpochRef` (#3478)
* fewer deps on `BlockRef` traversal in anticipation of pruning
* allows identifying EpochRef:s by their shuffling as a first step of
* tighten error handling around missing blocks

using the zero hash for signalling "missing block" is fragile and easy
to miss - with checkpoint sync now, and pruning in the future, missing
blocks become "normal".
2022-03-15 09:24:55 +01:00
Etan Kissling ae408c279a
add option to collect light client data (#3474)
Light clients require full nodes to serve additional data so that they
can stay in sync with the network. This patch adds a new launch option
`--import-light-client-data` to configure what data to make available.
For now, data is only kept in memory; it is not persisted at this time.
Note that data is only locally collected, a separate patch is needed to
actually make it availble over the network. `--serve-light-client-data`
will be used for serving data, but is not functional yet outside tests.
2022-03-11 21:28:10 +01:00
tersec 21b71bd29c
update URL and document Nim bug blocking further genericizing cleanups (#3483) 2022-03-11 15:03:47 +00:00
Jacek Sieka d0183ccd77
Historical state reindex for trusted node sync (#3452)
When performing trusted node sync, historical access is limited to
states after the checkpoint.

Reindexing restores full historical access by replaying historical
blocks against the state and storing snapshots in the database.

The process can be initiated or resumed at any point in time.
2022-03-11 12:49:47 +00:00
Jacek Sieka 4363215a32
relax `BlockRef` database assumptions (#3472)
* remove `getForkedBlock(BlockRef)` which assumes block data exists but
doesn't support archive/backfilled blocks
* fix REST `/eth/v1/beacon/headers` request not returning
archive/backfilled blocks
* avoid re-encoding in REST block SSZ requests (using `getBlockSSZ`)
2022-03-11 13:08:17 +01:00
Etan Kissling 8955edf158
allow using `BlockId` as key in tables (#3467)
`BlockId` is a type that bundles a block root with its slot number.
The type can be useful as key in tables that deal with non-finalized
blocks (not uniquely identified by slot) and also support pruning
(drop data about older blocks by slot). Instead of creating a custom
type for those use cases, this patch suggests implementing `hash` for
`BlockId` to re-use the existing type.
2022-03-07 14:56:58 +01:00
tersec f0ada15dac
automated CL spec ref URL updates from v1.1.9 to v1.1.10 (#3455) 2022-03-02 10:00:21 +00:00
Jacek Sieka 12ed537f75
catch wrong-fork-blocks earlier (#3444)
Can't apply a phase0 block to a later phase state and vice versa.

Since instantiation has been a topic, pre/post c file size:

```
424K	@mspec@sstate_transition.nim.c
892K	@mspec@sstate_transition_block.nim.c
```

```
288K	@mspec@sstate_transition.nim.c
880K	@mspec@sstate_transition_block.nim.c
```
2022-02-28 12:58:34 +00:00
Jacek Sieka 40a4c01086
chaindag: don't keep backfill block table in memory (#3429)
This PR names and documents the concept of the archive: a range of slots
for which we have degraded functionality in terms of historical access -
in particular:

* we don't support rewinding to states in this range
* we don't keep an in-memory representation of the block dag

The archive de-facto exists in a trusted-node-synced node, but this PR
gives it a name and drops the in-memory digest index.

In order to satisfy `GetBlocksByRange` requests, we ensure that we have
blocks for the entire archive period via backfill. Future versions may
relax this further, adding a "pre-archive" period that is fully pruned.

During by-slot searches in the archive (both for libp2p and rest
requests), an extra database lookup is used to covert the given `slot`
to a `root` - future versions will avoid this using era files which
natively are indexed by `slot`. That said, the lookup is quite
fast compared to the actual block loading given how trivial the table
is - it's hard to measure, even.

A collateral benefit of this PR is that checkpoint-synced nodes will see
100-200MB memory usage savings, thanks to the dropped in-memory cache -
future pruning work will bring this benefit to full nodes as well.

* document chaindag storage architecture and assumptions
* look up parent using block id instead of full block in clearance
(future-proofing the code against a future in which blocks come from era
files)
* simplify finalized block init, always writing the backfill portion to
db at startup (to ensure lookups work as expected)
* preallocate some extra memory for finalized blocks, to avoid immediate
realloc
2022-02-26 19:16:19 +01:00
Jacek Sieka 92e7e288e7
Ignore seen aggregates (#3439)
https://github.com/ethereum/consensus-specs/pull/2225 removed an ignore
rule that would filter out duplicate aggregates from gossip publishing -
however, this causes increased bandwidth and CPU usage as discussed in
https://github.com/ethereum/consensus-specs/issues/2183 - the intent is
to revert the removal and reinstate the rule.

This PR implements ignore filtering which cuts down on CPU usage (fewer
aggregates to validate) and bandwidth usage (less fanout of duplicates)
- as #2225 points out, this may lead to a small increase in IHAVE
messages.
2022-02-25 17:15:39 +01:00
tersec 7de3f00f35
generic putCorruptState; {Merge=>Bellatrix}BeaconStateNoImmutableValidators (#3427) 2022-02-21 12:55:56 +01:00
Jacek Sieka adfe655b16
db: make block loading generic (#3413)
Streamline lookup with Forky and BeaconBlockFork (then we can do the
same for era)

We use type to avoid conditionals, as fork is often already known at a
"higher" level.

* load blockid before loading block by root - this is needed to map root
to slot and will eventually be done via block summary table for "old"
blocks

Co-authored-by: tersec <tersec@users.noreply.github.com>
2022-02-21 09:48:02 +01:00
tersec 79761c78a4
proc -> func, mainly in spec/state transition and adjecent modules (#3405) 2022-02-17 11:53:55 +00:00
tersec 5eecb9a21f
rename no{R=>r}eturn, no{I=>i}init, short{l=>L}og, E{T=>t}h2Node, Beacon{c=>C}hainDB (#3403) 2022-02-16 23:24:44 +01:00
Jacek Sieka 7db5647a6e
clean up / document init (#3387)
* clean up / document init

* drop `immutable_validators` data (pre-altair)
* document versions where data is first added
* avoid needlessly loading genesis block data on startup
* add a few more internal database consistency checks
* remove duplicate state root lookup on state load

* comment
2022-02-16 16:44:04 +01:00
tersec 873a8ec1e6
use isZeroMemory for Eth2Digest comparisons (#3386)
* use isZeroMemory for Eth2Digest comparisons

* use Eth2Digest.isZero abstraction
2022-02-14 05:26:19 +00:00
Jacek Sieka 40fe8f5336 fix missing backfill when restarting node
When node is restarted before backfill has started but after some blocks
have finalized with forward sync, we would not start the backfill.

* also clean up one last `SomeSome`
2022-02-11 23:08:50 +02:00
tersec d358299875
fork choice proposer boosting support (#3349)
* fork choice proposer boosting support

* detect nodeDelta underflow/overflow
2022-02-04 12:59:40 +01:00
Zahary Karadjov 215caa21ae Eth1 monitor fixes
* Fix a resource leak introduced in https://github.com/status-im/nimbus-eth2/pull/3279

* Don't restart the Eth1 syncing proggress from scratch in case of
  monitor failures during Eth2 syncing.

* Switch to the primary operator as soon as it is back online.

* Log the web3 credentials in fewer places

Other changes:

The 'web3 test' command has been enhanced to obtain and print more
data regarding the selected provider.
2022-02-03 14:01:55 +02:00
tersec 8e6a920bf4
rename MERGE_FORK_EPOCH to BELLATRIX_FORK_EPOCH (#3350)
* rename MERGE_FORK_EPOCH to BELLATRIX_FORK_EPOCH

* fix REST test rules
2022-02-02 14:06:55 +01:00
Jacek Sieka ff4f2a6b6c
better log on finalized slot failure 2022-02-01 21:23:18 +01:00
Jacek Sieka 3df9ffca9f val-mon: remove redundant `_total` suffix from counters
It turns out nim-metrics adds this suffix on its own - it also turns out
some of the names are non-conventional and need follow-up.
2022-01-31 18:51:24 +02:00
Jacek Sieka ad327a8769
Fix counters in validator monitor totals mode (#3332)
The current counters set gauges etc to the value of the _last_ validator
to be processed - as the name of the feature implies, we should be using
sums instead.

* fix missing beacon state metrics on startup, pre-first-head-selection
* fix epoch metrics not being updated on cross-epoch reorg
2022-01-31 08:36:29 +01:00
Jacek Sieka d583e8e4ac
Store finalized block roots in database (3s startup) (#3320)
* Store finalized block roots in database (3s startup)

When the chain has finalized a checkpoint, the history from that point
onwards becomes linear - this is exploited in `.era` files to allow
constant-time by-slot lookups.

In the database, we can do the same by storing finalized block roots in
a simple sparse table indexed by slot, bringing the two representations
closer to each other in terms of conceptual layout and performance.

Doing so has a number of interesting effects:

* mainnet startup time is improved 3-5x (3s on my laptop)
* the _first_ startup might take slightly longer as the new index is
being built - ~10s on the same laptop
* we no longer rely on the beacon block summaries to load the full dag -
this is a lot faster because we no longer have to look up each block by
parent root
* a collateral benefit is that we no longer need to load the full
summaries table into memory - we get the RSS benefits of #3164 without
the CPU hit.

Other random stuff:

* simplify forky block generics
* fix withManyWrites multiple evaluation
* fix validator key cache not being updated properly in chaindag
read-only mode
* drop pre-altair summaries from `kvstore`
* recreate missing summaries from altair+ blocks as well (in case
database has lost some to an involuntary restart)
* print database startup timings in chaindag load log
* avoid allocating superfluos state at startup
* use a recursive sql query to load the summaries of the unfinalized
blocks
2022-01-30 18:51:04 +02:00
tersec 29e2169585
phase 0 & altair beacon chain and altair validator spec URL updates (#3339) 2022-01-29 13:53:31 +00:00
tersec 89ffa8a1a7
spec URL & copyright year update (#3338) 2022-01-29 01:05:39 +00:00
Jacek Sieka e264276b36
keep unviables in quarantine (#3331)
they remain unviable even after a reorg
2022-01-28 11:59:55 +01:00
tersec 2b4a960270
rename On{Merge,Bellatrix}BlockAdded and Rollback{Merge,Bellatrix}HashedProc (#3321) 2022-01-26 13:21:29 +01:00
Jacek Sieka f70aceef37
Harden handling of unviable forks (#3312)
* Harden handling of unviable forks

In our current handling of unviable forks, we allow peers to send us
blocks that come from a different fork - this is not necessarily an
error as it can happen naturally, but it does open up the client to a
case where the same unviable fork keeps getting requested - rather than
allowing this to happen, we'll now give these peers a small negative
score - if it keeps happening, we'll disconnect them.

* keep track of unviable forks in quarantine, to avoid filling it with
known junk
* collect peer scores in single module
* descore peers when they send unviable blocks during sync
* don't give score for duplicate blocks
* increase quarantine size to a level that allows finality to happen
under optimal conditions - this helps avoid downloading the same blocks
over and over in case of an unviable fork
* increase initial score for new peers to make room for one more failure
before disconnection
* log and score invalid/unviable blocks in requestmanager too
* avoid ChainDAG dependency in quarantine
* reject gossip blocks with unviable parent
* continue processing unviable sync blocks in order to build unviable
dag

* docs

* Update beacon_chain/consensus_object_pools/block_pools_types.nim

* add unviable queue test
2022-01-26 13:20:08 +01:00
Jacek Sieka d076e1a11b
ncli_db: import states and blocks from era file (#3313) 2022-01-25 09:28:26 +01:00
tersec 00a347457a
dynamic sync committee subscriptions (#3308)
* dynamic sync committee subscriptions

* fast-path trivial case rather than rely on RNG with probability 1 outcome

Co-authored-by: zah <zahary@gmail.com>

* use func instead of template; avoid calling async function unnecessarily

* avoid unnecessary sync committee topic computation; use correct epoch lookahead; enforce exception/effect tracking

* don't over-optimistically update ENR syncnets; non-looping version of nearSyncCommitteePeriod

* allow separately setting --allow-all-{sub,att,sync}nets

* remove unnecessary async

Co-authored-by: zah <zahary@gmail.com>
2022-01-24 20:40:59 +00:00
tersec 351c2fd48a
rename mergeData to bellatrixData and mergeFork to bellatrixFork (#3315) 2022-01-24 16:23:13 +00:00
Jacek Sieka 61342c2449
limit by-root requests to non-finalized blocks (#3293)
* limit by-root requests to non-finalized blocks

Presently, we keep a mapping from block root to `BlockRef` in memory -
this has simplified reasoning about the dag, but is not sustainable with
the chain growing.

We can distinguish between two cases where by-root access is useful:

* unfinalized blocks - this is where the beacon chain is operating
generally, by validating incoming data as interesting for future fork
choice decisions - bounded by the length of the unfinalized period
* finalized blocks - historical access in the REST API etc - no bounds,
really

In this PR, we limit the by-root block index to the first use case:
finalized chain data can more efficiently be addressed by slot number.

Future work includes:

* limiting the `BlockRef` horizon in general - each instance is 40
bytes+overhead which adds up - this needs further refactoring to deal
with the tail vs state problem
* persisting the finalized slot-to-hash index - this one also keeps
growing unbounded (albeit slowly)

Anyway, this PR easily shaves ~128mb of memory usage at the time of
writing.

* No longer honor `BeaconBlocksByRoot` requests outside of the
non-finalized period - previously, Nimbus would generously return any
block through this libp2p request - per the spec, finalized blocks
should be fetched via `BeaconBlocksByRange` instead.
* return `Opt[BlockRef]` instead of `nil` when blocks can't be found -
this becomes a lot more common now and thus deserves more attention
* `dag.blocks` -> `dag.forkBlocks` - this index only carries unfinalized
blocks from now - `finalizedBlocks` covers the other `BlockRef`
instances
* in backfill, verify that the last backfilled block leads back to
genesis, or panic
* add backfill timings to log
* fix missing check that `BlockRef` block can be fetched with
`getForkedBlock` reliably
* shortcut doppelganger check when feature is not enabled
* in REST/JSON-RPC, fetch blocks without involving `BlockRef`

* fix dag.blocks ref
2022-01-21 13:33:16 +02:00
Dustin Brody 9699858422 rename MERGE_FORK_VERSION to BELLATRIX_FORK_VERSION 2022-01-20 19:33:05 +02:00
Jacek Sieka 570379d3d9
Backfiller (#3263)
Backfilling is the process of downloading historical blocks via P2P that
are required to fulfill `GetBlocksByRange` duties - this happens during
both trusted node and finalized checkpoint syncs.

In particular, backfilling happens after syncing to head, such that
attestation work can start as soon as possible.

* Fix SyncQueue initialization procedure.
Remove usage of `awaitne`.
Add cancellation support.
Remove unneeded `sleepAsync()` if peer's head is older than needed.
Add `direction` field to all logs.
Fix syncmanager wedge issue.
Add proper resource cleaning procedure on backward sync finish.

Co-authored-by: cheatfate <eugene.kabanov@status.im>
2022-01-20 08:25:45 +01:00
tersec 9c0c9c98ce
complete switch to beacon_chain/specs/datatypes/bellatrix (#3295) 2022-01-18 13:36:52 +00:00
Zahary Karadjov 47f1f7ff1a More efficient reward data persistance; Address review comments
The new format is based on compressed CSV files in two channels:

* Detailed per-epoch data
* Aggregated "daily" summaries

The use of append-only CSV file speeds up significantly the epoch
processing speed during data generation. The use of compression
results in smaller storage requirements overall. The use of the
aggregated files has a very minor cost in both CPU and storage,
but leads to near interactive speed for report generation.

Other changes:

- Implemented support for graceful shut downs to avoid corrupting
  the saved files.

- Fixed a memory leak caused by lacking `StateCache` clean up on each
  iteration.

- Addressed review comments

- Moved the rewards and penalties calculation code in a separate module

Required invasive changes to existing modules:

- The `data` field of the `KeyedBlockRef` type is made public to be used
  by the validator rewards monitor's Chain DAG update procedure.

- The `getForkedBlock` procedure from the `blockchain_dag.nim` module
  is made public to be used by the validator rewards monitor's Chain DAG
  update procedure.
2022-01-18 01:56:56 +02:00
Zahary Karadjov 29aad0241b Precise per-component ETH-denominated rewards tracking
This is an alternative take on https://github.com/status-im/nimbus-eth2/pull/3107
that aims for more minimal interventions in the spec modules at the expense of
duplicating more of the spec logic in ncli_db.
2022-01-18 01:56:56 +02:00
Jacek Sieka 836f6984bb
move `state_transition` to `Result` (#3284)
* better error messages in api
* avoid `BlockData` copies when replaying blocks
2022-01-17 12:19:58 +01:00
tersec d878948ed2
update sync committee gossip validation comments; spec URL updates (#3280) 2022-01-13 13:46:08 +00:00
Jacek Sieka e9486f5e5b
state_sim: clean up attestation production (#3274)
* use same naming as everywhere
* avoid iterator bug that leads to state copy
2022-01-12 21:42:03 +01:00
Jacek Sieka 805e85e1ff
time: spring cleaning (#3262)
Time in the beacon chain is expressed relative to the genesis time -
this PR creates a `beacon_time` module that collects helpers and
utilities for dealing the time units - the new module does not deal with
actual wall time (that's remains in `beacon_clock`).

Collecting the time related stuff in one place makes it easier to find,
avoids some circular imports and allows more easily identifying the code
actually needs wall time to operate.

* move genesis-time-related functionality into `spec/beacon_time`
* avoid using `chronos.Duration` for time differences - it does not
support negative values (such as when something happens earlier than it
should)
* saturate conversions between `FAR_FUTURE_XXX`, so as to avoid
overflows
* fix delay reporting in validator client so it uses the expected
deadline of the slot, not "closest wall slot"
* simplify looping over the slots of an epoch
* `compute_start_slot_at_epoch` -> `start_slot`
* `compute_epoch_at_slot` -> `epoch`

A follow-up PR will (likely) introduce saturating arithmetic for the
time units - this is merely code moves, renames and fixing of small
bugs.
2022-01-11 11:01:54 +01:00
tersec ae61512ee9
rename upgrade_to_{merge,bellatrix}; detect unchanging spec YAMLs (#3265) 2022-01-10 09:39:43 +00:00
Jacek Sieka 20e700fae4
Harden CommitteeIndex, SubnetId, SyncSubcommitteeIndex (#3259)
* Harden CommitteeIndex, SubnetId, SyncSubcommitteeIndex

Harden the use of `CommitteeIndex` et al to prevent future issues by
using a distinct type, then validating before use in several cases -
datatypes in spec are kept simple though so that invalid data still can
be read.

* fix invalid epoch used in REST
`/eth/v1/beacon/states/{state_id}/committees` committee length (could
return invalid data)
* normalize some variable names
* normalize committee index loops
* fix `RestAttesterDuty` to use `uint64` for `validator_committee_index`
* validate `CommitteeIndex` on ingress in REST API
* update rest rules with stricter parsing
* better REST serializers
* save lots of memory by not using `zip` ...at least a few bytes!
2022-01-09 01:28:49 +02:00
Jacek Sieka 6f7e0e3393
REST cleanups (#3255)
* REST cleanups

* reject out-of-range committee requests
* print all hex values as lower-case
* allow requesting state information by head state root
* turn `DomainType` into array (follow spec)
* `uint_to_bytesXX` -> `uint_to_bytes` (follow spec)
* fix wrong dependent root in `/eth/v1/validator/duties/proposer/`
* update documentation - `--subscribe-all-subnets` is no longer needed
when using the REST interface with validator clients
* more fixes
* common helpers for dependent block
* remove test rules obsoleted by more strict epoch tests
* fix trailing commas

* Update docs/the_nimbus_book/src/rest-api.md
* Update docs/the_nimbus_book/src/rest-api.md

Co-authored-by: sacha <sacha@status.im>
2022-01-08 22:06:34 +02:00
tersec bac0eaa92e
update 10 modules from using merge to bellatrix (#3257) 2022-01-07 18:10:40 +01:00
Jacek Sieka ba99c8fe4f
update era file documentation / impl (#3226)
Overhaul of era files, including documentation and reference
implementations

* store blocks, then state, then slot indices for easy lookup at low
cost
* document era file rationale
* altair+ support in era writer
2022-01-07 11:13:19 +01:00
tersec 0fd8bf7b56
spec URL updates (#3254) 2022-01-06 18:35:38 +00:00
Jacek Sieka 0a4728a241
Handle access to historical data for which there is no state (#3217)
With checkpoint sync in particular, and state pruning in the future,
loading states or state-dependent data may fail. This PR adjusts the
code to allow this to be handled gracefully.

In particular, the new availability assumption is that states are always
available for the finalized checkpoint and newer, but may fail for
anything older.

The `tail` remains the point where state loading de-facto fails, meaning
that between the tail and the finalized checkpoint, we can still get
historical data (but code should be prepared to handle this as an
error).

However, to harden the code against long replays, several operations
which are assumed to work only with non-final data (such as gossip
verification and validator duties) now limit their search horizon to
post-finalized data.

* harden several state-dependent operations by logging an error instead
of introducing a panic when state loading fails
* `withState` -> `withUpdatedState` to differentiate from the other
`withState`
* `updateStateData` can now fail if no state is found in database - it
is also hardened against excessively long replays
* `getEpochRef` can now fail when replay fails
* reject blocks with invalid target root - they would be ignored
previously
* fix recursion bug in `isProposed`
2022-01-05 19:38:04 +01:00
zah fba1f08a5e
Implement #3129 (Optimized history traversals in the REST API) (#3219)
* Fix REST some rest call signatures and implement a simple API benchmark tool

* Implement #3129 (Optimized history traversals in the REST API)

Other notable changes:

The `updateStateData` procedure in the `blockchain_dag.nim` module is
optimized to not rewind down to the last snapshot state saved in the
database if the supplied input state can be used as a starting point
instead.

* Disallow await in withStateForBlockSlot
2022-01-05 15:49:10 +01:00
tersec 5878d34117
rename forkDigests.merge to forkDigests.bellatrix (#3245) 2022-01-05 14:24:15 +00:00
tersec b81c06edab
rename Beacon{Block,State}Fork.Merge to Bellatrix; update copyright years (#3240) 2022-01-04 09:45:38 +00:00
tersec da017d2ca5
update from phase0/altair v1.1.6 URLs to v1.1.8 spec URLs (#3238) 2022-01-04 03:57:15 +00:00
Jacek Sieka c4ce59e55b
Assorted logging improvements (#3237)
* log doppelganger detection when it activates and when it causes missed
duties
* less prominent eth1 sync progress
* log in-progress sync at notice only when actually missing duties
* better detail in replay log
* don't log finalization checkpoints - this is quite verbose when
syncing and already included in "Slot start"
2022-01-03 22:18:49 +01:00
Jacek Sieka 7ec97a6b35
Fix missing checkpoint states` (#3225)
With the right sequence of events (for example a REST request or a
validation), it can happen that the first traversal across a state
checkpoint boundary is done without storing that state on disk - this
causes problens when replaying states, because now states may be missing
from the database.

Here, we simply avoid using the caches when advancing a state that will
go into the database, ensuring that the information lost during caching
always is permanently stored.

* fix recursion bug in `isProposed`
2021-12-30 12:33:03 +01:00
tersec 1a6a56bdb1
use BeaconTime instead of Slot in fork choice (#3138)
* use v1.1.6 test vectors; use BeaconTime instead of Slot in fork choice

* tick through every slot at least once

* use div INTERVALS_PER_SLOT and use precomputed constants of them

* use correct (even if numerically equal) constant
2021-12-21 18:56:08 +00:00
tersec 0d4e49f946
Merge fork gossip support (#3213)
* Merge fork gossip support

* index directly by BeaconStateFork and remove debugging log statement
2021-12-21 15:24:23 +01:00
Jacek Sieka 1021e3324e
Revert writing backfill root to database (#3215)
Introduced in #3171, it turns out we can just follow the block headers
to achieve the same effect

* leaves the constant in the code so as to avoid confusion when reading
database that had the constant written (such as the fleet nodes and
other unstable users)
2021-12-21 11:40:14 +01:00
Jacek Sieka c270ec21e4
Validator monitoring (#2925)
Validator monitoring based on and mostly compatible with the
implementation in Lighthouse - tracks additional logs and metrics for
specified validators so as to stay on top on performance.

The implementation works more or less the following way:
* Validator pubkeys are singled out for monitoring - these can be
running on the node or not
* For every action that the validator takes, we record steps in the
process such as messages being seen on the network or published in the
API
* When the dust settles at the end of an epoch, we report the
information from one epoch before that, which coincides with the
balances being updated - this is a tradeoff between being correct
(waiting for finalization) and providing relevant information in a
timely manner)
2021-12-20 20:20:31 +01:00
Jacek Sieka 03005f48e1
Backfill support for ChainDAG (#3171)
In the ChainDAG, 3 block pointers are kept: genesis, tail and head. This
PR adds one more block pointer: the backfill block which represents the
block that has been backfilled so far.

When doing a checkpoint sync, a random block is given as starting point
- this is the tail block, and we require that the tail block has a
corresponding state.

When backfilling, we end up with blocks without corresponding states,
hence we cannot use `tail` as a backfill pointer - there is no state.

Nonetheless, we need to keep track of where we are in the backfill
process between restarts, such that we can answer GetBeaconBlocksByRange
requests.

This PR adds the basic support for backfill handling - it needs to be
integrated with backfill sync, and the REST API needs to be adjusted to
take advantage of the new backfilled blocks when responding to certain
requests.

Future work will also enable moving the tail in either direction:
* pruning means moving the tail forward in time and removing states
* backwards means recreating past states from genesis, such that
intermediate states are recreated step by step all the way to the tail -
at that point, tail, genesis and backfill will match up.
* backfilling is done when backfill != genesis - later, this will be the
WSS checkpoint instead
2021-12-13 14:36:06 +01:00
Jacek Sieka 9f27f0d97c
BlockId reform (#3176)
* BlockId reform

Introduce `BlockId` that helps track a root/slot pair - this prepares
the codebase for backfilling and handling out-of-dag blocks

* move block dag code to separate module
* fix finalised state root in REST event stream
* fix finalised head computation on head update, when starting from
checkpoint
* clean up chaindag init
* revert `epochAncestor` change in introduced in #3144 that would return
an epoch ancestor from the canoncial history instead of the given
history, causing `EpochRef` keys to point to the wrong block
2021-12-09 19:06:21 +02:00
Jacek Sieka 069bccd51b
batch-verify sync messages for a small perf boost (#3151)
* batch-verify sync messages for a small perf boost

Generally reuses the same structure as attestation and aggregate
verification

* normalize `signatures` and `signature_batch` to use the same pattern
of verification
* normalize parameter names, order etc for signature stuff in general
* avoid calling `blsSign` directly - instead, go through `signatures`
consistently
2021-12-09 14:56:54 +02:00
tersec 2ca28fb861
Merge BeaconBlock gossip validation (#3165)
* Merge BeaconBlock gossip validation

* figure/ground inversion

* revert cosmetic cleanups to reduce merge conflicts
2021-12-08 17:29:22 +00:00
Etan Kissling 38e64b3441 cleanup sync subcommittee accessors
This removes some dead code from `getSubcommitteePositionsAux` which is
no longer needed since the introduction of `SyncCommitteeCache`.
This also cleans up some formatting, uses `let` instead of `var` where
possible, and uses implicit `pairs` in one case for consistency.
2021-12-07 18:17:03 +02:00
Jacek Sieka 89d6a1b403
Introduce slot->BlockRef mapping for finalized chain (#3144)
* Introduce slot->BlockRef mapping for finalized chain

The finalized chain is linear, thus we can use a seq to lookup blocks by
slot number.

Here, we introduce such a seq, even though in the future, it should
likely be backed by a database structure instead, or, more likely, a
flat era file with a flat lookup index.

This dramatically speeds up requests by slot, such as those coming from
the REST interface or GetBlocksByRange, as these are currently served by
a linear iteration from head.

* fix REST block requests to not return blocks from an earlier slot when
the given slot is empty
* fix StateId interpretation such that it doesn't treat state roots as
block roots
* don't load full block from database just to return its root
2021-12-06 20:52:35 +02:00
Jacek Sieka 1a8b7469e3
move quarantine outside of chaindag (#3124)
* move quarantine outside of chaindag

The quarantine has been part of the ChainDAG for the longest time, but
this design has a few issues:

* the function in which blocks are verified and added to the dag becomes
reentrant and therefore difficult to reason about - we're currently
using a stateful flag to work around it
* quarantined blocks bypass the processing queue leading to a processing
stampede
* the quarantine flow is unsuitable for orphaned attestations - these
should also should be quarantined eventually

Instead of processing the quarantine inside ChainDAG, this PR moves
re-queueing to `block_processor` which already is responsible for
dealing with follow-up work when a block is added to the dag

This sets the stage for keeping attestations in the quarantine as well.

Also:

* make `BlockError` `{.pure.}`
* avoid use of `ValidationResult` in block clearance (that's for gossip)
2021-12-06 10:49:01 +01:00
tersec e6921f808f
cleanups, partly from kintsugi branch (#3161)
* cleanups, partly from kintsugi branch

* re-export shortLog(EthBlock) and preserve exception messages in batchVerify and processBatch
2021-12-05 17:32:41 +00:00
tersec 4378f3f096
almost all remaining ethereum/{eth2.0-specs -> consensus-specs} (#3158) 2021-12-03 20:01:13 +00:00
tersec cc51f3fd12
v1.1.{5 -> 6} phase 0 and altair spec URL updates (#3157) 2021-12-03 17:40:23 +00:00
zah 74c63ed740
More efficient implementation of the 'POST beacon_committee_subscriptions' API (#3153) 2021-12-03 17:04:58 +02:00
Jacek Sieka aa1dea03cd
speed up gossip and sync block validation (#3143)
* avoid recomputing hash for block signature check
* check block slot match before hitting the database
2021-12-01 10:52:40 +01:00
Etan Kissling eb777a6c8b allow `withState` to be called multiple times
This allows `blockchain_dag`'s `withState` template to be called more
than once in a single function. This led to a compilation error before
because the injected variables and functions shared the same scope.
2021-11-29 15:24:12 +02:00
Mamy Ratsimbazafy 97da6e1365
Fork choice EF consensus tests (#3041)
* add EF fork choice tests to CI

* checkpoints

* compilation fixes and add test to preset dependent suite

* support longpaths on Windows CI

* skip minimal tests (long paths issue + impl detals tested)

* fix stackoverflow on some platforms

* rebase on top of https://github.com/status-im/nimbus-eth2/pull/3054

* fix stack usage
2021-11-25 19:41:39 +01:00
Jacek Sieka a223d62b07
Cleanups (#3123)
Renames and cleanups split out from the validator monitoring branch, so
as to reduce conflict area vs other PR:s

* add constants for expected message timing
* name validators after the messages they validate, mostly, to make
grepping easier
* unify field naming of EpochInfo across forks to make cross-fork code
easier
2021-11-25 13:20:36 +01:00
Jacek Sieka 9c2f43ed0e
Speed up altair block processing 2x (#3115)
* Speed up altair block processing >2x

Like #3089, this PR drastially speeds up historical REST queries and
other long state replays.

* cache sync committee validator indices
* use ~80mb less memory for validator pubkey mappings
* batch-verify sync aggregate signature (fixes #2985)
* document sync committee hack with head block vs sync message block
* add batch signature verification failure tests

Before:

```
../env.sh nim c -d:release -r ncli_db --db:mainnet_0/db bench --start-slot:-1000
All time are ms
     Average,       StdDev,          Min,          Max,      Samples,         Test
Validation is turned off meaning that no BLS operations are performed
    5830.675,        0.000,     5830.675,     5830.675,            1, Initialize DB
       0.481,        1.878,        0.215,       59.167,          981, Load block from database
    8422.566,        0.000,     8422.566,     8422.566,            1, Load state from database
       6.996,        1.678,        0.042,       14.385,          969, Advance slot, non-epoch
      93.217,        8.318,       84.192,      122.209,           32, Advance slot, epoch
      20.513,       23.665,       11.510,      201.561,          981, Apply block, no slot processing
       0.000,        0.000,        0.000,        0.000,            0, Database load
       0.000,        0.000,        0.000,        0.000,            0, Database store
```

After:

```
    7081.422,        0.000,     7081.422,     7081.422,            1, Initialize DB
       0.553,        2.122,        0.175,       66.692,          981, Load block from database
    5439.446,        0.000,     5439.446,     5439.446,            1, Load state from database
       6.829,        1.575,        0.043,       12.156,          969, Advance slot, non-epoch
      94.716,        2.749,       88.395,      100.026,           32, Advance slot, epoch
      11.636,       23.766,        4.889,      205.250,          981, Apply block, no slot processing
       0.000,        0.000,        0.000,        0.000,            0, Database load
       0.000,        0.000,        0.000,        0.000,            0, Database store
```

* add comment
2021-11-24 13:43:50 +01:00
Jacek Sieka f19a497eec
ncli_db: add putState, putBlock (#3096)
* ncli_db: add putState, putBlock

These tools allow modifying an existing nimbus database for the purpose
of recovery or reorg, moving the head, tail and genesis to arbitrary
points.

* remove potentially expensive `putState` in `BeaconStateDB`
* introduce `latest_block_root` which computes the root of the latest
applied block from the `latest_block_header` field (instead of passing
it in separately)
* avoid some unnecessary BeaconState copies during init
* discover https://github.com/nim-lang/Nim/issues/19094
* prefer `HashedBeaconState` in a few places to avoid recomputing state
root
* fetch latest block root from state when creating blocks
* harden `get_beacon_proposer_index` against invalid slots and document
* move random spec function tests to `test_spec.nim`
* avoid unnecessary state root computation before block proposal
2021-11-18 13:02:43 +01:00
tersec 9e395011d9
update 22 spec URLs to v1.1.5 (#3111) 2021-11-18 08:08:00 +00:00
Jacek Sieka 222674b203
better logging on invalid database (#3097)
* remove redundant test report
2021-11-13 17:27:28 +01:00
Jacek Sieka ec650c7fd7
Support starting from altair (#3054)
* Support starting from altair

* hide `finalized-checkpoint-` - they are incomplete and usage may cause
crashes
* remove genesis detection code (broken, obsolete)
* enable starting ChainDAG from altair checkpoints - this is a
prerequisite for checkpoint sync (TODO: backfill)
* tighten checkpoint state conditions
* show error when starting from checkpoint with existing database (not
supported)
* print rest-compatible JSON in ncli/state_sim
* altair/merge support in ncli
* more altair/merge support in ncli_db
* pre-load header to speed up loading
* fix forked block decoding
2021-11-10 13:39:08 +02:00
tersec 5c48982280
a dozen spec URL updates to v1.1.5 (#3078) 2021-11-10 08:12:41 +00:00
tersec 48eba59971
manually-verified v1.1.5 spec URL updates (#3068) 2021-11-09 08:54:59 +00:00
tersec 2e868dc2ba
mass/mechanical update of 1.1.4 phase0 and altair spec URLs to 1.1.5 (#3067) 2021-11-09 07:40:41 +00:00
tersec 2c8600e746
mass/mechanical update of 1.1.3 phase0 spec URLs to 1.1.4 in markdown (#3059) 2021-11-08 09:26:18 +00:00
tersec eb3ad25859
mass/mechanical update of 1.1.3 phase/altair spec URLs to 1.1.4 (#3058) 2021-11-08 06:18:10 +00:00
Zahary Karadjov 29e5700838 Bugfix: Avoid the aggregation of duplicate signatures when creating sync committee contributions 2021-11-07 21:41:10 +02:00
Jacek Sieka ea0a191723
Better REST/RPC error messages (#3046)
* Better REST/RPC error messages
* homogenise block logging (root first)
* homegenise message verification pipeline (verify in
`gossip_verification`, act in `eth2_processor`)
* use `subcommitteeIdx` consistently
* log each sent contribution
* fix block_sim
* fix block topic
* don't recalc root on gossip block validation
* move position loop into sync pool
2021-11-05 17:39:47 +02:00
Jacek Sieka a086cf01ac
altair fork handling cleanups (#3050)
* fix stack overflow crash in REST/debug/getStateV2
* introduce `ForkyXxx` for generic type matching of `Xxx` across
branches (SomeHashedBeaconState -> ForkyHashedBeaconState et al) -
`Some` is already used for other types of type classes
* consolidate function naming in BeaconChainDB, use some generics
* import `forks.nim` from other spec modules and move `Forked*` helpers
around to resolve circular imports
* remove `ForkedBeaconState`, use `ForkedHashedBeaconState` throughout
(less data shuffling between the types)
* fix several cases of states being stored on stack in tests, causing
random failures on some platforms
* remove reading json support from ncli - this should be ported to the
rest json reading instead (doesn't currently work because stack sizes)
2021-11-05 08:34:34 +01:00
Jacek Sieka 233d756518
Logging and startup improvements (#3038)
* Logging and startup improvements

Color support for released binaries!

* startup scripts no longer log to file by default - this only affects
source builds - released binaries don't support file logging
* add --log-stdout option to control logging to stdout (colors, json)
* detect tty:s vs redirected logs and log accordingly
* add option to disable log colors at runtime
* simplify several "common" logs, showing the most important information
earlier and more clearly
* remove line numbers / file information / tid - these take up space and
are of little use to end users
  * still enabled in debug builds and tools
* remove `testnet_servers_image` compile-time option
* server images, released binaries and compile-from-source now offer
the same behaviour and features
* fixes https://github.com/status-im/nimbus-eth2/issues/2326
* fixes https://github.com/status-im/nimbus-eth2/issues/1794
* remove instanteneous block speed from sync message, keeping only
average

before:

```
INF 2021-10-28 16:45:59.000+02:00 Slot start                                 topics="beacnde" tid=386429 file=nimbus_beacon_node.nim:884 lastSlot=2384027 wallSlot=2384028 delay=461us84ns peers=0 head=75a10ee5:3348 headEpoch=104 finalized=cd6804ba:3264 finalizedEpoch=102 sync="wwwwwwwwww:0:0.0000:0.0000:00h00m (3348)"
INF 2021-10-28 16:45:59.046+02:00 Slot end                                   topics="beacnde" tid=386429 file=nimbus_beacon_node.nim:821 slot=2384028 nextSlot=2384029 head=75a10ee5:3348 headEpoch=104 finalizedHead=cd6804ba:3264 finalizedEpoch=102 nextAttestationSlot=-1 nextProposalSlot=-1 nextActionWait=n/a
```

after:

```
INF 2021-10-28 22:43:23.033+02:00 Slot start                                 topics="beacnde" slot=2385815 epoch=74556 sync="DDPDDPUDDD:10:5.2258:01h19m (2361088)" peers=37 head=eacd2dae:2361096 finalized=73782:a4751487 delay=33ms687us715ns
INF 2021-10-28 22:43:23.291+02:00 Slot end                                   topics="beacnde" slot=2385815 nextActionWait=n/a nextAttestationSlot=-1 nextProposalSlot=-1 head=eacd2dae:2361096
```

* fix comment

* documentation updates

* mention `--log-file` may be deprecated in the future
* update various docs
2021-11-02 18:06:36 +01:00
tersec 36e37bda40
v1.1.3 spec refs URLs (#3036) 2021-10-27 18:40:17 +00:00
tersec 8307e9c601
mechanical non-merge v1.1.2 to v1.1.3 spec URL updates (#3030) 2021-10-26 16:44:23 +00:00
Jacek Sieka 421bf936ff
odds and ends (#3015)
* `allSyncCommittees` => `allSyncSubcommittees`
* simplify `_snappy` topic generation (avoid pointless string copies)
* simplify gossip id generator (avoid pointless string copies)
* avoid redundant syncnet ENR updates
* simplify topic validation (allow only validated topics)
2021-10-21 15:09:19 +02:00
Jacek Sieka 9cf32c3748 clean up sync subcommittee handling
* `SyncCommitteeIndex` -> `SyncSubcommitteeIndex`
* `syncCommitteePeriod` -> `sync_committee_period` (spec spelling)
* tighten period comparisons
* fix assert when validating committee message with non-altair state in
REST api
2021-10-20 22:59:13 +03:00
Jacek Sieka bf6ad41d7d add drop and sync committee metrics
* use storeBlock for processing API blocks
* avoid double block dump
* count all gossip metrics at the same spot
* simplify block broadcast
2021-10-20 18:20:12 +03:00
tersec c0a2f1c98e
refactor executionPayload tests; reduce HashSet creation (#3003) 2021-10-20 13:36:38 +02:00
Jacek Sieka df3fc9525f
import cleanup (#2997)
* import cleanup

...and remove some unused types

* add random imports

* more imports
2021-10-19 16:09:26 +02:00
Jacek Sieka c40cc6cec1 clean up fork enum and field names
* single naming strategy
* simplify some fork code
* simplify forked block production
2021-10-19 11:06:38 +03:00
Jacek Sieka 5b33783898
load altair state from database on startup (#2994)
* fixes replay from last phase0 block on startup
* better forked block loading
2021-10-18 14:32:54 +02:00
Jacek Sieka 4f7a8cf79d
register vc duties with subnet tracker (#2949)
* register vc duties with subnet tracker
* fix activation logging during startup
* cache slot signature to avoid duplicate signature work
* schedule aggregation duties one slot at a time to avoid CPU spike at
each epoch
* lower aggregation subnet pre-subscription time to 4 slots (lowers
bandwidth and CPU usage)
* update stability subnets in ENR on startup
* log gossip state
* perform gossip subscriptions just before the next slot starts
* document stuff
* add random include
* don't overwrite subscription state when not subscribed
* log target gossip state
* updating gossip status once is enough
* add test
* remove syncQueueLen - this one is not updated at the end of the sync
and may cause gossip to disconnect itself completely - use a simple head
distance instead
* fix gossip disconnection - if in hysteresis, node.gossipState will be
set to disabled even though we don't disable topic subscriptions
* fix extra duty registration call
2021-10-18 11:11:44 +02:00
Eugene Kabanov 7319f150e8
Number of REST fixes. (#2987)
* Initial commit.

* Fix path.

* Add validator keys to indices cache mechanism.
Move syncComitteeParticipants to common place.

* Fix sync participants order issue.

* Fix error code when state could not be found.
Refactor `state/validators` to use keysToIndices mechanism.

* Fix RestValidatorIndex to ValidatorIndex conversion TODOs.

* Address review comments.

* Fix REST test rules.
2021-10-14 12:38:38 +02:00
tersec 6cc8757930
update 18 spec URLs to v1.1.2; switch from eth2.0-specs to consensus-specs (#2990) 2021-10-14 06:30:21 +00:00
tersec 10981639f1
update 27 spec URLs to v1.1.2 (#2989) 2021-10-13 16:49:06 +00:00
Jacek Sieka f90b2b8b1f
reward accounting for altair+ (#2981)
Similar to the existing `RewardInfo`, this PR adds the infrastructure
needed to export epoch processing information from altair+. Because
accounting is done somewhat differently, the PR uses a fork-specific
object to extrct the information in order to make the cost on the spec
side low.

* RewardInfo -> EpochInfo, ForkedEpochInfo
* use array for computing new sync committee
* avoid repeated total active balance computations in block processing
* simplify proposer index check
* simplify epoch transition tests
* pre-compute base increment and reuse in epoch processing, and a few
other small optimizations

This PR introduces the type and does the heavy lifting in terms of
refactoring - the tools that use the accounting will need separate PR:s
(as well as refinements to the exportred information)
2021-10-13 16:24:36 +02:00
tersec 2eb9a608a4
add payloadId; add merge vector test script; remove consensusValidated (#2982) 2021-10-13 16:08:50 +02:00
tersec 2ad1b7366a
update 62 spec URLs to v1.1.2 (#2979) 2021-10-12 10:17:37 +00:00
tersec 0ae736f397
update 67 spec URLs to v1.1.2 (#2977) 2021-10-12 08:09:59 +00:00
Jacek Sieka fabec894dd
harden sync sub committee selection (#2965)
* harden sync sub committee selection

also turn it into an iterator

* fix test and warning
2021-10-07 13:19:47 +00:00
Ștefan Talpalaru 359937fa70
interop metrics (#2964)
https://github.com/ethereum/beacon-metrics/blob/master/metrics.md#interop-metrics
2021-10-07 06:19:07 +00:00
Etan Kissling 9ee134324b
allow `withXxx` to access fork-specific fields (#2943)
So far, `withState` and `withBlck` templates could only be used to have
convenience access to fork-agnostic BeaconState and BeaconBlock fields.
This patch:
- injects an additional `stateFork` constant that allows to use
  `when` expressions to also access Altair and Merge-specific fields.
- introduces a `withStateAndBlck` template to support operating on both
  a `BeaconState` and `BeaconBlock` at a time.
- makes sync committee related functions Merge aware.
- changes a couple if-else trees for forks into case statements so that
  forgotten future forks are promoted to compile-time errors.
2021-10-06 20:05:06 +03:00
Eugene Kabanov 0fffdaf96d
Number of REST fixes. (#2961)
* Fix /eth/v1/config/spec response.
Cache /eth/v1/node/version response.

* Fix REST test rule.

* Workaround for `state_id` issue.

* Fix /eth/v1/events `head`, `chain_reorg` and `finalized_checkpoint` responses.

* Fix client configuration object.
2021-10-06 15:43:03 +02:00
Etan Kissling 2bbffbde10
abort compile when fork epoch is forgotten (#2939)
There are a few locations in the code that compare the current epoch to
the various FORK_EPOCH constants and branch off into fork-specific code.
When a new fork is introduced, it is sometimes forgotten to update all
of those branch locations. This patch introduces a compile-time check
that ensures that all branches need to be covered exhaustively. This is
done by replacing if-elif structures with case expressions.
2021-10-04 08:31:21 +00:00
Etan Kissling b217150f1d
use forked `getAttestationsForBlock` everywhere (#2937)
There are a number of locations in the code that get attestations on a
forked beacon state. For attestation pools test, a convenience wrapper
was available to reduce clutter. This patch integrates that wrapper into
the core component so that it can also take advantage of the wrapper.
2021-10-01 01:29:32 +00:00
Etan Kissling 2e9fa87f8b
use `SyncAggregate.init()` everywhere (#2932)
The initialization of a `SyncAggregate` to its default value is not very
intuitive. There is an `init` function in `sync_committee_msg_pool` that
provides a convenience wrapper. This patch exports that initializer so
that the rest of the code base can also take advantage of it.
2021-09-30 13:56:07 +00:00
tersec 6b3bf7eb7b
merge hardfork database support (#2911)
* merge hardfork database support

* working block_sim

* recreate state transition changes
2021-09-30 01:07:24 +00:00
tersec 47a6045b05
update a dozen spec URLs (#2929) 2021-09-29 23:43:28 +00:00
tersec 1dc94aa36f
update 40 spec URLs to v1.1.0 (#2918) 2021-09-28 18:23:15 +00:00
Etan Kissling c510ffb16a
use `allSyncCommittees` everywhere (#2917)
There are still a few cases with manual loops through sync subcommittees
even though an `allSyncCommittees` iterator exists. Adjusted the few
remaining instances to also use the iterator instead.
2021-09-28 18:02:01 +00:00
tersec ca4c6b4c5c
update 30 spec URLs to v1.1.0 (#2914) 2021-09-28 14:01:46 +00:00
tersec aec5cf2b1b
update 31 spec reference URLs to v1.1.0 (#2910) 2021-09-28 07:46:06 +00:00
Etan Kissling 01a9b275ec
handle duplicate pubkeys in sync committee (#2902)
When sync committee message handling was introduced in #2830, the edge
case of the same validator being selected multiple times as part of a
sync subcommittee was not covered. Not handling that edge case makes
sync contributions have a lower-than-expected participation rate as each
sync validator is only counted up through once per subcommittee.
This patch ensures that this edge case is properly covered.
2021-09-28 07:44:20 +00:00
tersec 2b2846b468
implement forked merge state/block support (#2890)
* implement forked state/block support

* merge support for containsOrphan; import cleanup; 80-column lines

* add merge block header operations and slot sanity fixture

* add epoch state transition tests; implement is_valid_gas_limit(), is_merge_block(), is_execution_enabled(), and compute_timestamp_at_slot()

* implement process_execution_payload() and add merge deposit operations tests

* add merge block sanity tests

* add merge case to syncCommitteeParticipants

* v1.1.0-beta.5 updates

* reduce getTestStates-based memory usage; don't try to REST-serialize ExecutionPayload transactions without underlying support

* add execution payload tests; switch var to let in tests/official/
2021-09-27 14:22:58 +00:00
Jacek Sieka e47a8cbe42
fixes (#2901)
* export kvstore from beacon_chain_db
* fix rest HashList deserialization
* fix asTrusted
2021-09-27 11:24:58 +02:00
Eugene Kabanov 0c635334a2
Sync committee related REST API implementation. (#2856) 2021-09-24 01:13:25 +03:00
Eugene Kabanov b566d4657f
REST /eth/v1/events API call implementation. (#2878)
* Placing callbacks into strategic places.

* Initial events call implementation.

* Post rebase fixes.

* Change addSyncContribution() implementation.

* Add `attestation-sent` event.
Remove gcsafe, raises from callbacks implementations.
Move `attestation-received` fire at the end of attestation processing.

* Address review comments.
2021-09-22 14:17:15 +02:00
Mamy Ratsimbazafy d1cb5b7220
Parallel attestation verification (#2718)
* Add parallel attestation verification

* Update tests, batchVerify doesn't use the threadpool with only single core (nim-blscurve update)

* bump nim-blscurve

* Debug info for failing eth2 test vectors

* remove submodule eth2-testnets

* verbose debugging of make failure on Windows (libbacktrace?)

* Remove CI debug mode

* initialization convention

* Fix new altair tests
2021-09-17 03:13:52 +03:00
Dustin Brody 6638476b5f discard putative blocks from invalid hardforks 2021-09-16 16:18:40 +03:00
Zahary Karadjov 7d1efa443d Restore the sync committee pool pruning and add tests 2021-08-30 11:06:45 +03:00
tersec 166e22a43b
sync committee message pool and gossip validation (#2830) 2021-08-28 10:40:01 +00:00
tersec 9725d15a3e
update spec references from eth2.0-specs to consensus-specs and to v1.1.0-beta.2 (#2822) 2021-08-26 10:21:52 +02:00
Jacek Sieka ba06f13942
cleanups (#2809)
* cleanups

* use ForkedTrustedSignedBeaconBlock.ionit where appropriate
* move `is_aggregator` to `spec/`
* use `errReject` in a few more places
* update enr fork id when time is auspicious
* use network broadcast functions

* Return Ignore for aggregate signature validation timeouts

...consistently between aggregates and attestations.

* clean up some more reject/ignore rules
* shorten texts a bit

* errReject->checkedReject, use err helpers throughout

* get rid of quarantine in exitpool as well
2021-08-24 21:49:51 +02:00
tersec 4678a2bee7
move BeaconClock from ChainDAG to BeaconNode (#2796) 2021-08-20 08:58:15 +00:00
Jacek Sieka a7a65bce42
disentangle eth2 types from the ssz library (#2785)
* reorganize ssz dependencies

This PR continues the work in
https://github.com/status-im/nimbus-eth2/pull/2646,
https://github.com/status-im/nimbus-eth2/pull/2779 as well as past
issues with serialization and type, to disentangle SSZ from eth2 and at
the same time simplify imports and exports with a structured approach.

The principal idea here is that when a library wants to introduce SSZ
support, they do so via 3 files:

* `ssz_codecs` which imports and reexports `codecs` - this covers the
basic byte conversions and ensures no overloads get lost
* `xxx_merkleization` imports and exports `merkleization` to specialize
and get access to `hash_tree_root` and friends
* `xxx_ssz_serialization` imports and exports `ssz_serialization` to
specialize ssz for a specific library

Those that need to interact with SSZ always import the `xxx_` versions
of the modules and never `ssz` itself so as to keep imports simple and
safe.

This is similar to how the REST / JSON-RPC serializers are structured in
that someone wanting to serialize spec types to REST-JSON will import
`eth2_rest_serialization` and nothing else.

* split up ssz into a core library that is independendent of eth2 types
* rename `bytes_reader` to `codec` to highlight that it contains coding
and decoding of bytes and native ssz types
* remove tricky List init overload that causes compile issues
* get rid of top-level ssz import
* reenable merkleization tests
* move some "standard" json serializers to spec
* remove `ValidatorIndex` serialization for now
* remove test_ssz_merkleization
* add tests for over/underlong byte sequences
* fix broken seq[byte] test - seq[byte] is not an SSZ type

There are a few things this PR doesn't solve:

* like #2646 this PR is weak on how to handle root and other
dontSerialize fields that "sometimes" should be computed - the same
problem appears in REST / JSON-RPC etc

* Fix a build problem on macOS

* Another way to fix the macOS builds

Co-authored-by: Zahary Karadjov <zahary@gmail.com>
2021-08-18 20:57:58 +02:00
tersec a060985abc
unexport various parts of tests/ and remove unused code (#2794) 2021-08-18 13:58:43 +00:00
Jacek Sieka 7a622e8505
rework spec imports (#2779)
The spec imports are a mess to work with, so this branch cleans them up
a bit to ensure that we avoid generic sandwitches and that importing
stuff generally becomes easier.

* reexport crypto/digest/presets because these are part of the public
symbol set of the rest of the spec types
* don't export `merge` types from `base` - this causes circular deps
* fix circular deps in `ssz/spec_types` - this is the first step in
disentangling ssz from spec
* be explicit about phase0 vs altair - longer term, `altair` will become
the "natural" type set, then merge and so on, so no point in giving
`phase0` special preferential treatment
2021-08-12 13:08:20 +00:00
Jacek Sieka 9697b73e71
forkedbeaconstate_helpers -> forks (#2772)
Simpler module name for stuff that covers forks

* check that runtime config matches database state
* also include some assorted altair cleanups
* use "standard" genesis fork in local testnet to work around missing
runtime config support
2021-08-10 22:46:35 +02:00
Jacek Sieka a663ed65f4
Merge branch 'merge-stable' into unstable 2021-08-09 15:00:58 +02:00
tersec 2afe2802b6
altair topic switching (#2767)
* altair topic switching

* remove validate{Committee,Validator}IndexOr unused within branch
2021-08-09 12:54:45 +00:00
Jacek Sieka 7bb76a6cd1
Merge remote-tracking branch 'origin/stable' into merge-stable 2021-08-09 13:14:28 +02:00
Jacek Sieka ee79c10a7d
update validator key cache on startup (#2760)
* update validator key cache on startup

Versions prior to 1.1.0 do not write a validator key cache at all.

Versions from 1.4.0 and upwards require an immutable validator key cache
to verify blocks - normally, block verification fills the cache but that
assumes that at least one block was verified by a version that has the
key cache.

Taken together, this breaks direct upgrades from anything <1.1.0 to
1.4.0.

The fix is simply to refresh fill the cache from an existing state on
startup.

* also log serious block validation failures at info level
2021-08-05 11:26:10 +03:00
tersec a2c1b96ac9
prevent resyncing from genesis with altair head block (#2750) 2021-08-01 10:20:43 +02:00
tersec e4afc36d71
use ForkedTrustedSignedBeaconBlock (#2720)
* use ForkedTrustedSignedBeaconBlock

* remove --subscribe-all-subnets

* https://ethereum.github.io/eth2.0-APIs/#/Beacon/getBlock implementation was passing through forked beaconblocks
2021-07-14 12:18:52 +00:00
Jacek Sieka 23eea197f6
Implement split preset/config support (#2710)
* Implement split preset/config support

This is the initial bulk refactor to introduce runtime config values in
a number of places, somewhat replacing the existing mechanism of loading
network metadata.

It still needs more work, this is the initial refactor that introduces
runtime configuration in some of the places that need it.

The PR changes the way presets and constants work, to match the spec. In
particular, a "preset" now refers to the compile-time configuration
while a "cfg" or "RuntimeConfig" is the dynamic part.

A single binary can support either mainnet or minimal, but not both.
Support for other presets has been removed completely (can be readded,
in case there's need).

There's a number of outstanding tasks:

* `SECONDS_PER_SLOT` still needs fixing
* loading custom runtime configs needs redoing
* checking constants against YAML file

* yeerongpilly support

`build/nimbus_beacon_node --network=yeerongpilly --discv5:no --log-level=DEBUG`

* load fork epoch from config

* fix fork digest sent in status
* nicer error string for request failures
* fix tools

* one more

* fixup

* fixup

* fixup

* use "standard" network definition folder in local testnet

Files are loaded from their standard locations, including genesis etc,
to conform to the format used in the `eth2-networks` repo.

* fix launch scripts, allow unknown config values

* fix base config of rest test

* cleanups

* bundle mainnet config using common loader
* fix spec links and names
* only include supported preset in binary

* drop yeerongpilly, add altair-devnet-0, support boot_enr.yaml
2021-07-12 15:01:38 +02:00
zah eb2dc5cbbb
Implement the new Altair req/resp protocols (#2676)
* Implement the new Altair req/resp protocols

Also fixes the altair message-id computation by providing the correct
forkdigest prefix in `isAltairTopic`.

Co-authored-by: Tanguy Cizain <tanguycizain@gmail.com>
2021-07-07 12:09:47 +03:00
tersec ac7f719382
use isomorphicCast between beacon block types (#2698) 2021-07-06 14:32:49 +02:00
tersec 7577f8c2ef
add blockchain_dag altair database reading; add rollback tests (#2683)
* add blockchain_dag altair database reading; add rollback tests; fix some unnecessary type conversions

* remove debugging scaffolding

* proposeSignedBlock() will need to be async for merge; introduce altair types to VC
2021-06-29 15:09:29 +00:00
tersec 445def6c8b
block_clearance, ncli, and ncli_db Altair state saving (#2672)
* block_clearance, ncli, and ncli_db Altair state saving

* avoid invalidating SSZ hash caches with every assignment
2021-06-24 18:34:08 +00:00
tersec 41e0a7abc0
introduce database support for Altair (#2667)
* introduce immutable Altair BeaconState

* add database support for Altair blocks and states

* add tests for Altair get/put/contains/delete state

* enable blockchain_dag Altair state database storing

* properly return error on getting missing altair block
2021-06-24 07:11:47 +00:00
tersec ae1abf24af
add Altair support to block quarantine/clearance and block_sim (#2662)
* add Altair support to the block quarantine

* switch some spec/datatypes imports to spec/datatypes/base

* add Altair support to block_clearance

* allow runtime configuration of Altair transition slot

* enable Altair in block_sim, including in CI
2021-06-23 14:43:18 +00:00
tersec b1d5609171
remove false OnBlockAdded dependency on phase0 HashedBeaconState (#2661)
* remove false OnBlockAdded dependency on phase.HashedBeaconState

* introduce altair data types into block_clearance; update some alpha.6 spec refs to alpha.7; add get_active_validator_indices_len ForkedHashedBeaconState wrapper

* switch many modules from using datatypes (with phase0 states/blocks) to datatypes/base (fork-independent); update spec refs from alpha.6 to alpha.7 and remove rm'd G2_POINT_AT_INFINITY

* switch more modules from using datatypes (with phase0 states/blocks) to datatypes/base (fork-independent); update spec refs from alpha.6 to alpha.7

* remove unnecessary phase0-only wrapper of get_attesting_indices(); allow signatures_batch to process either fork; remove O(n^2) nested loop in process_inactivity_updates(); add altair support to getAttestationsforTestBlock()

* add Altair versions of asSigVerified(), asTrusted(), and makeBeaconBlock()

* fix spec URL to be Altair for Altair makeBeaconBlock()
2021-06-21 08:35:24 +00:00
tersec 9616220280
implement Altair attestation pool cache init (#2659)
* implement Altair attestation pool cache init

* remove code duplication around previous/current epoch updates
2021-06-17 17:13:14 +00:00
tersec 1c3314f08b
update to Altair as of v1.1.0-alpha.7 (#2649)
* update to Altair as of v1.1.0-alpha.7

* introduce Altair types into attestation pool

* avoid allocating/copying pubkeys excessively in get_next_sync_committee()
2021-06-14 17:42:46 +00:00
tersec 146fa48454
use ForkedHashedBeaconState in StateData (#2634)
* use ForkedHashedBeaconState in StateData

* fix FAR_FUTURE_EPOCH -> slot overflow; almost always use assign()

* avoid stack allocation in maybeUpgradeStateToAltair()

* create and use dispatch functions for check_attester_slashing(), check_proposer_slashing(), and check_voluntary_exit()

* use getStateRoot() instead of various state.data.hbsPhase0.root

* remove withStateVars.hashedState(), which doesn't work as a design anymore

* introduce spec/datatypes/altair into beacon_chain_db

* fix inefficient codegen for getStateField(largeStateField)

* state_transition_slots() doesn't either need/use blocks or runtime presets

* combine process_slots(HBS)/state_transition_slots(HBS) which differ only in last-slot htr optimization

* getStateField(StateData, ...) was replaced by getStateField(ForkedHashedBeaconState, ...)

* fix rollback

* switch some state_transition(), process_slots, makeTestBlocks(), etc to use ForkedHashedBeaconState

* remove state_transition(phase0.HashedBeaconState)

* remove process_slots(phase0.HashedBeaconState)

* remove state_transition_block(phase0.HashedBeaconState)

* remove unused callWithBS(); separate case expression from if statement

* switch back from nested-ref-object construction to (ref Foo)(Bar())
2021-06-11 20:51:46 +03:00
Jacek Sieka 9193be9b7b
fix epoch logging (fixes #2283) (#2642)
Also put epoch first to disambiguate vs slot
2021-06-11 01:07:16 +03:00
Jacek Sieka d859bc12f0
write uncompressed validator keys to database (#2639)
* write uncompressed validator keys to database

Loading 150k+ validator keys on startup in compressed format takes a lot
of time - better store them in uncompressed format which makes behaviour
just after startup faster / more predictable.

* refactor cached validator key access
* fix isomorphic cast to work with non-var instances
* remove cooked pubkey cache - directly use database cache in chaindag
as well (one less cache to keep in sync)
* bump blscurve, introduce loadValid for known-to-be-valid keys
2021-06-10 10:37:02 +03:00
tersec 8ebd496fbe
Altair transition tests (#2624)
* Working Altair transition tests

* with fixed upstream test vectors, remove state root workaround

* switch upgrade_to_altair() to returning a reference

* remove test_state_transition

* fix invalid fork state/block combinations error messages

* avoid memory copies by reintroducing state_transition_slots(var SomeHashedBeaconState)
2021-06-04 10:38:00 +00:00
Jacek Sieka b11da2cb34 fix state cache loading
* load the cache of the current state epoch instead of the target state
epoch, when applying states and slots
* load state cache for each slot/block (for longer slot jumps)
* load state cache after full updateStateData
* look up two state cache epochs, instead of the same epoch twice :)
2021-06-03 21:37:52 +03:00
tersec 28a5bca71a
split state_transition() into slots/block parts and use only block where appropriate (#2630) 2021-06-03 11:42:25 +02:00
Jacek Sieka 0fb02b5206 log state update duration, lower info threshold for detail logging 2021-06-01 20:43:44 +03:00
Jacek Sieka abe0d7b4ae singe validator key cache
Instead of keeping a validator key list per EpochRef, this PR introduces
a single shared validator key list in ChainDAG, and cleans up some other
ChainDAG and key-related issues.

The PR does not introduce the validator key list in the state transition
- this is because we batch-check all signatures before entering the spec
code, thus the spec code never hits the cache.

A future refactor should _probably_ remove the threadvar altogether.

There's a few other small fixes in here that make the flow easier to
read:

* fix `var ChainDAGRef` -> `ChainDAGRef`
* fix `var QuarantineRef` -> `QuarantineRef`
* consistent `dag` variable name
* avoid using threadvar pubkey cache in most cases
* better error messages in batch signature checking
2021-06-01 20:43:44 +03:00
tersec ea9ceb693a
update ChainDAG.effective_balance() to use StateData; rm ChainDAG.getBlockByPreciseSlot() (#2622)
* update ChainDAG.effective_balance() to use StateData; rm unused ChainDAG.getBlockByPreciseSlot()

* update get_effective_balances to avoid god object; avoid most memory allocation in Altair epoch reward and penalty processing
2021-06-01 12:40:13 +00:00
Jacek Sieka 9b89f58089
revert advance back to trace 2021-06-01 14:09:11 +02:00
Jacek Sieka 60df17786e avoid reading legacy db on write
* don't consider legacy database when writing state - this read is slow
on kvstore
* avoid epoch transition when there's an exact match in cache already
* simplify init to only consider checkpoint states
2021-05-30 12:32:51 +03:00
Jacek Sieka df7bc87af5 Pre-compute slot transition for clearance state
This way we perform the expensive epoch processing before the block
arrives.

Of course, this may lead to speculative misses which in turn lead to
replays - it's likely that in the case of a miss, we'll see a replay
regardless.
2021-05-30 12:04:09 +03:00
Jacek Sieka 2df8a3b28d
add more block processing durations (#2611) 2021-05-28 21:03:20 +02:00
Jacek Sieka 7f52ffb8d9
clean up block processing (#2610)
* gossip_to_consensus -> block_processor (it's processing only blocks,
but not only from gossip)
* measure queue and validation time for blocks
* measure assignment and state loading times for updateStateData
* avoid some unnecessary block copies in block sync
* warn that database is corrupt if we hit tail without a state
2021-05-28 19:34:00 +03:00
tersec 46c5a0110a
log doppelganger attestation signature; rm withState.HashedBeaconState uses (#2608) 2021-05-28 15:51:15 +03:00
Jacek Sieka eebc828778
create new database in separate file (#2596)
The V1 table structure shows great improvements in performance, but if
there's an old `kvstore` without rowid:s, these benefits are nullified:
reorgs during writes and deletes remain expensive (even if the
degradation is reduced somewhat).

This PR creates the tables in a new file instead, and uses the old file
as a read-only store - this has several interesting properties:

* the old database is left completely untouched - this guarantees that
downgrades work smooth (they'll only need to resync their missing
portions)
* starting sync after this PR means only a v1 database is created
* v0 databases stick around - no migration is performed (for now)

Future PR:s can introduce migration of the data from one database to
another - a simply copy will take hours which is downtime we want to
avoid - at that point, it might make sense to migrate straight to era
files instead.
2021-05-26 09:07:18 +02:00
tersec 0b0bfd1de0
use StateData in place of BeaconState outside state transition code (#2551)
* use StateData in place of BeaconState outside state transition code

* propagate more StateData usage

* remove withStateVars().state

* wrap get_beacon_committee(BeaconState, ...) as gbc(StateData, ...)

* switch makeAttestation() to use StateData

* use StateData wrapper/dispatcher for get_committee_count_per_slot()

* convert AttestationCache.init(), weak subjectivity functions, and updateValidatorMetrics()

* add get_shuffled_active_validator_indices(StateData) and get_block_root_at_slot(StateData)

* switch makeAttestationData() to StateData

* sync AllTests-mainnet.md after rebase
2021-05-21 09:23:28 +00:00
Jacek Sieka 97f4e1fffe
Db1 cont (#2573)
* Revert "Revert "Upgrade database schema" (#2570)"

This reverts commit 6057c2ffb4.

* ssz: fix loading empty lists into existing instances

Not a problem earlier because we didn't reuse instances

* bump nim-eth

* bump nim-web3
2021-05-17 18:37:26 +02:00
tersec 6057c2ffb4
Revert "Upgrade database schema" (#2570)
This reverts commit 22ddf74752.
2021-05-17 06:34:44 +00:00
Jacek Sieka 22ddf74752 Upgrade database schema
The `kvstore` design we're using now turns out to not be the best way to
use `sqlite` - in particular, there are some significant benefits to
using rowid in certain situations and to keep data in separate tables.

With this branch, there are massive improvements in startup time
(seconds instead of minutes) and state/block storage and pruning times
(milliseconds instead of seconds) - these improvements can in particular
be seen on slow drives and translate directly into better attestation
performance.

* update kvstore to new keyspace design
* remove `DirStoreRef` and the hidden `--state-db-kind` option - this
was an experiment to store large blobs in files, but with the new
kvstore, there's no compelling reason to do so
* remove `DbMap` - unused and would need updating for new keyspace
design
* introduce separate tables for each data type (blocks, states etc)
* remove "WITHOUT ROWID" pessimization for tables with large blobs
* close DbSeq statements explicitly (and earlier)
* store beacon block summaries in separate table, without SSZ
compression and load them all with single query on startup
* stop storing backwards compat full states
* mark genesis beacon block as trusted
* avoid faststreams when loading SSZ data
* remove `DisagreementBehavior` (unused)
2021-05-14 20:05:23 +03:00
Jacek Sieka 867d8f3223
Perform attestation check before broadcast (#2550)
Currently, we have a bit of a convoluted flow where when sending
attestations, we start broadcasting them over gossip then pass them to
the attestation validation to include them in the local attestation pool
- it should be the other way around: we should be checking attestations
_before_ gossipping them - this serves as an additional safety net to
ensure that we don't publish junk - this becomes more important when
publishing attestations from the API.

Also, the REST API was performing its own validation meaning
attestations coming from REST would be validated twice - finally, the
JSON RPC wasn't pre-validating and would happily broadcast invalid
attestations.

* Unified attestation production pipeline with the same flow for gossip,
locally and API-produced attestations: all are now validated and entered
into the pool, then broadcast/republished
* Refactor subnet handling with specific SubnetId alias, streamlining
where subnets are computed, avoiding the need to pass around the number
of active validators
* Move some of the subnet handling code to eth2_network
* Use BitArray throughout for subnet handling
2021-05-10 09:13:36 +02:00
Jacek Sieka 646923c3dd
add attestation stats tool to ncli_db (#2539)
This also makes future efforts to provide metrics and logs for
attestation efficiency easier

* Export rewards from epoch transition
* Use less memory for reward calculation (bool -> set[enum], field
alignment)
* Reuse reward memory when replaying, avoiding spike
* Allow replaying any range in ncli_db benchmark
2021-05-07 13:36:21 +02:00
tersec 1d6c8ee9ab
store full state 4x less often (#2542) 2021-05-06 07:36:18 +02:00
Jacek Sieka 427c0f307c
avoid extraneous hash root calculation (#2537)
When applying a block, we'll currently compute a state root for the
state after slot processing but before block processing - this is
unnecessary when a block is being applied because the intermediate state
root is never observed.
2021-05-05 08:54:21 +02:00
Jacek Sieka ce49da6c0a
Introduce unittest2 and junit reports (#2522)
* Introduce unittest2 and junit reports

* fix XML path

* don't combine multiple CI runs

* fixup

* public combined report also

Co-authored-by: Ștefan Talpalaru <stefantalpalaru@yahoo.com>
2021-04-28 18:41:02 +02:00
Jacek Sieka 7dba1b37dd
remove attestation/aggregate queue (#2519)
With the introduction of batching and lazy attestation aggregation, it
no longer makes sense to enqueue attestations between the signature
check and adding them to the attestation pool - this only takes up
valuable CPU without any real benefit.

* add successfully validated attestations to attestion pool directly
* avoid copying participant list around for single-vote attestations,
pass single validator index instead
* release decompressed gossip memory earlier, specially during async
message validation
* use cooked signatures in a few more places to avoid reloads and errors
* remove some Defect-raising versions of signature-loading
* release decompressed data memory before validating message
2021-04-26 22:39:44 +02:00
Jacek Sieka 54d6884c89
fix sync issue when upgrading from 1.1.0-inited db
This patch writes a full genesis state to `kvstore` if one was missing,
which fixes 1.2.0 restarting sync when upgrading from 1.1.0, or when
downgrading to a pre-1.1.0 release.
2021-04-20 16:55:18 +03:00
tersec 99fccaee6e
more abstraction over BeaconState (#2509)
* more abstraction over BeaconState

* use HashedBeaconState copy of htr
2021-04-16 08:49:37 +00:00
Jacek Sieka f1f424cc2d attestation processing speedups
* avoid creating indexed attestation just to check signatures - above
all, don't create it when not checking signatures ;)
* avoid pointer op when adding attestation to pool
* better iterator for yielding attestations
* add metric / log for attestation packing time
2021-04-14 21:51:17 +03:00
Jacek Sieka 4ed2e34a9e Revamp attestation pool
This is a revamp of the attestation pool that cleans up several aspects
of attestation processing as the network grows larger and block space
becomes more precious.

The aim is to better exploit the divide between attestation subnets and
aggregations by keeping the two kinds separate until it's time to either
produce a block or aggregate. This means we're no longer eagerly
combining single-vote attestations, but rather wait until the last
moment, and then try to add singles to all aggregates, including those
coming from the network.

Importantly, the branch improves on poor aggregate quality and poor
attestation packing in cases where block space is running out.

A basic greed scoring mechanism is used to select attestations for
blocks - attestations are added based on how much many new votes they
bring to the table.

* Collect single-vote attestations separately and store these until it's
time to make aggregates
* Create aggregates based on single-vote attestations
* Select _best_ aggregate rather than _first_ aggregate when on
aggregation duty
* Top up all aggregates with singles when it's time make the attestation
cut, thus improving the chances of grabbing the best aggregates out
there
* Improve aggregation test coverage
* Improve bitseq operations
* Simplify aggregate signature creation
* Make attestation cache temporary instead of storing it in attestation
pool - most of the time, blocks are not being produced, no need to keep
the data around
* Remove redundant aggregate storage that was used only for RPC
* Use tables to avoid some linear seeks when looking up attestation data
* Fix long cleanup on large slot jumps
* Avoid some pointers
* Speed up iterating all attestations for a slot (fixes #2490)
2021-04-13 20:24:02 +03:00
cheatfate 477decbcf5 Address #2490. 2021-04-13 17:07:41 +03:00