Commit Graph

56 Commits

Author SHA1 Message Date
Jacek Sieka 23eea197f6
Implement split preset/config support (#2710)
* Implement split preset/config support

This is the initial bulk refactor to introduce runtime config values in
a number of places, somewhat replacing the existing mechanism of loading
network metadata.

It still needs more work, this is the initial refactor that introduces
runtime configuration in some of the places that need it.

The PR changes the way presets and constants work, to match the spec. In
particular, a "preset" now refers to the compile-time configuration
while a "cfg" or "RuntimeConfig" is the dynamic part.

A single binary can support either mainnet or minimal, but not both.
Support for other presets has been removed completely (can be readded,
in case there's need).

There's a number of outstanding tasks:

* `SECONDS_PER_SLOT` still needs fixing
* loading custom runtime configs needs redoing
* checking constants against YAML file

* yeerongpilly support

`build/nimbus_beacon_node --network=yeerongpilly --discv5:no --log-level=DEBUG`

* load fork epoch from config

* fix fork digest sent in status
* nicer error string for request failures
* fix tools

* one more

* fixup

* fixup

* fixup

* use "standard" network definition folder in local testnet

Files are loaded from their standard locations, including genesis etc,
to conform to the format used in the `eth2-networks` repo.

* fix launch scripts, allow unknown config values

* fix base config of rest test

* cleanups

* bundle mainnet config using common loader
* fix spec links and names
* only include supported preset in binary

* drop yeerongpilly, add altair-devnet-0, support boot_enr.yaml
2021-07-12 15:01:38 +02:00
zah eb2dc5cbbb
Implement the new Altair req/resp protocols (#2676)
* Implement the new Altair req/resp protocols

Also fixes the altair message-id computation by providing the correct
forkdigest prefix in `isAltairTopic`.

Co-authored-by: Tanguy Cizain <tanguycizain@gmail.com>
2021-07-07 12:09:47 +03:00
Jacek Sieka 7825d12448
increase block attestation wait time (#2705)
We generally send out attestations 250 ms after the block arrives.
Recent efficiency improvements have led to a slightly increased
incidence of "slot 0" issues  where attestations are dropped by other
nodes because they have not yet had time to process the block due to
epoch processing taking time.

This PR mitigates the problem by increasing the window between receiving
the block and sending out attestations.
2021-07-06 15:11:18 +02:00
tersec 7577f8c2ef
add blockchain_dag altair database reading; add rollback tests (#2683)
* add blockchain_dag altair database reading; add rollback tests; fix some unnecessary type conversions

* remove debugging scaffolding

* proposeSignedBlock() will need to be async for merge; introduce altair types to VC
2021-06-29 15:09:29 +00:00
tersec ae1abf24af
add Altair support to block quarantine/clearance and block_sim (#2662)
* add Altair support to the block quarantine

* switch some spec/datatypes imports to spec/datatypes/base

* add Altair support to block_clearance

* allow runtime configuration of Altair transition slot

* enable Altair in block_sim, including in CI
2021-06-23 14:43:18 +00:00
tersec b1d5609171
remove false OnBlockAdded dependency on phase0 HashedBeaconState (#2661)
* remove false OnBlockAdded dependency on phase.HashedBeaconState

* introduce altair data types into block_clearance; update some alpha.6 spec refs to alpha.7; add get_active_validator_indices_len ForkedHashedBeaconState wrapper

* switch many modules from using datatypes (with phase0 states/blocks) to datatypes/base (fork-independent); update spec refs from alpha.6 to alpha.7 and remove rm'd G2_POINT_AT_INFINITY

* switch more modules from using datatypes (with phase0 states/blocks) to datatypes/base (fork-independent); update spec refs from alpha.6 to alpha.7

* remove unnecessary phase0-only wrapper of get_attesting_indices(); allow signatures_batch to process either fork; remove O(n^2) nested loop in process_inactivity_updates(); add altair support to getAttestationsforTestBlock()

* add Altair versions of asSigVerified(), asTrusted(), and makeBeaconBlock()

* fix spec URL to be Altair for Altair makeBeaconBlock()
2021-06-21 08:35:24 +00:00
tersec 53d05060c9
fix assertion in beacon block creation rollback/restore (#2655) 2021-06-17 09:22:39 +02:00
tersec 146fa48454
use ForkedHashedBeaconState in StateData (#2634)
* use ForkedHashedBeaconState in StateData

* fix FAR_FUTURE_EPOCH -> slot overflow; almost always use assign()

* avoid stack allocation in maybeUpgradeStateToAltair()

* create and use dispatch functions for check_attester_slashing(), check_proposer_slashing(), and check_voluntary_exit()

* use getStateRoot() instead of various state.data.hbsPhase0.root

* remove withStateVars.hashedState(), which doesn't work as a design anymore

* introduce spec/datatypes/altair into beacon_chain_db

* fix inefficient codegen for getStateField(largeStateField)

* state_transition_slots() doesn't either need/use blocks or runtime presets

* combine process_slots(HBS)/state_transition_slots(HBS) which differ only in last-slot htr optimization

* getStateField(StateData, ...) was replaced by getStateField(ForkedHashedBeaconState, ...)

* fix rollback

* switch some state_transition(), process_slots, makeTestBlocks(), etc to use ForkedHashedBeaconState

* remove state_transition(phase0.HashedBeaconState)

* remove process_slots(phase0.HashedBeaconState)

* remove state_transition_block(phase0.HashedBeaconState)

* remove unused callWithBS(); separate case expression from if statement

* switch back from nested-ref-object construction to (ref Foo)(Bar())
2021-06-11 20:51:46 +03:00
Jacek Sieka d859bc12f0
write uncompressed validator keys to database (#2639)
* write uncompressed validator keys to database

Loading 150k+ validator keys on startup in compressed format takes a lot
of time - better store them in uncompressed format which makes behaviour
just after startup faster / more predictable.

* refactor cached validator key access
* fix isomorphic cast to work with non-var instances
* remove cooked pubkey cache - directly use database cache in chaindag
as well (one less cache to keep in sync)
* bump blscurve, introduce loadValid for known-to-be-valid keys
2021-06-10 10:37:02 +03:00
Jacek Sieka abe0d7b4ae singe validator key cache
Instead of keeping a validator key list per EpochRef, this PR introduces
a single shared validator key list in ChainDAG, and cleans up some other
ChainDAG and key-related issues.

The PR does not introduce the validator key list in the state transition
- this is because we batch-check all signatures before entering the spec
code, thus the spec code never hits the cache.

A future refactor should _probably_ remove the threadvar altogether.

There's a few other small fixes in here that make the flow easier to
read:

* fix `var ChainDAGRef` -> `ChainDAGRef`
* fix `var QuarantineRef` -> `QuarantineRef`
* consistent `dag` variable name
* avoid using threadvar pubkey cache in most cases
* better error messages in batch signature checking
2021-06-01 20:43:44 +03:00
tersec 46c5a0110a
log doppelganger attestation signature; rm withState.HashedBeaconState uses (#2608) 2021-05-28 15:51:15 +03:00
Jacek Sieka d16da06c92 ncli_db: validator performance database tool
Record attestation performance per epoch in sqlite database
2021-05-27 19:14:26 +03:00
tersec 0b0bfd1de0
use StateData in place of BeaconState outside state transition code (#2551)
* use StateData in place of BeaconState outside state transition code

* propagate more StateData usage

* remove withStateVars().state

* wrap get_beacon_committee(BeaconState, ...) as gbc(StateData, ...)

* switch makeAttestation() to use StateData

* use StateData wrapper/dispatcher for get_committee_count_per_slot()

* convert AttestationCache.init(), weak subjectivity functions, and updateValidatorMetrics()

* add get_shuffled_active_validator_indices(StateData) and get_block_root_at_slot(StateData)

* switch makeAttestationData() to StateData

* sync AllTests-mainnet.md after rebase
2021-05-21 09:23:28 +00:00
Zahary Karadjov dc49a51654
Merge stable into unstable (take 2) 2021-05-20 13:52:09 +03:00
Zahary Karadjov b7aa30adfd
Merge stable into unstable 2021-05-20 13:50:40 +03:00
tersec d8bb91d9a9
partially integrate eth1 merge changes (#2548)
* partially integrate eth1 merge changes

* use hexToSeqByte() and validate execution engine opaque transaction length

* remove incorrect REST serialization code
2021-05-20 10:44:13 +00:00
Zahary Karadjov 2cb1396969
Log the slashing DB pruning time 2021-05-17 21:42:28 +03:00
Zahary Karadjov b9924214ab
Better error-handling for the slashingdb import/export feature
* Error when specifying an invalid --data-dir (or --validator-dir)
* Error when entering an invalid validator public key (e.g. invalid hex value)
* Warning when attempting to export a validator not present in the local database

Some unnecessary remains of the v1 mode has been removed as well
2021-05-17 21:42:23 +03:00
Jacek Sieka 97f4e1fffe
Db1 cont (#2573)
* Revert "Revert "Upgrade database schema" (#2570)"

This reverts commit 6057c2ffb4.

* ssz: fix loading empty lists into existing instances

Not a problem earlier because we didn't reuse instances

* bump nim-eth

* bump nim-web3
2021-05-17 18:37:26 +02:00
Zahary Karadjov 5c313b958e
Simplify the slashing db import/export CLI 2021-05-17 17:12:03 +03:00
tersec 6057c2ffb4
Revert "Upgrade database schema" (#2570)
This reverts commit 22ddf74752.
2021-05-17 06:34:44 +00:00
Mamy André-Ratsimbazafy dacc508992
slashing import integrated in NBC 2021-05-16 21:48:38 +03:00
Mamy André-Ratsimbazafy 0574531c43
add restore from slashing DB 2021-05-16 21:45:24 +03:00
Jacek Sieka 895ccd1c95 clean up imports (#2557) 2021-05-14 20:08:07 +03:00
Jacek Sieka 22ddf74752 Upgrade database schema
The `kvstore` design we're using now turns out to not be the best way to
use `sqlite` - in particular, there are some significant benefits to
using rowid in certain situations and to keep data in separate tables.

With this branch, there are massive improvements in startup time
(seconds instead of minutes) and state/block storage and pruning times
(milliseconds instead of seconds) - these improvements can in particular
be seen on slow drives and translate directly into better attestation
performance.

* update kvstore to new keyspace design
* remove `DirStoreRef` and the hidden `--state-db-kind` option - this
was an experiment to store large blobs in files, but with the new
kvstore, there's no compelling reason to do so
* remove `DbMap` - unused and would need updating for new keyspace
design
* introduce separate tables for each data type (blocks, states etc)
* remove "WITHOUT ROWID" pessimization for tables with large blobs
* close DbSeq statements explicitly (and earlier)
* store beacon block summaries in separate table, without SSZ
compression and load them all with single query on startup
* stop storing backwards compat full states
* mark genesis beacon block as trusted
* avoid faststreams when loading SSZ data
* remove `DisagreementBehavior` (unused)
2021-05-14 20:05:23 +03:00
Jacek Sieka 0022015a91
clean up imports (#2557) 2021-05-12 14:31:02 +02:00
Mamy Ratsimbazafy 149ff49c8e
Remove correlated queries in finalization pruning, all use indexes (#2554) 2021-05-11 10:41:37 +02:00
Mamy Ratsimbazafy e6b559a35a
Slashing db pruning [Merge only after v2 has been default for 1 noticeable release] (#2452)
* Enable slashing DB pruning

* integrate slashing DB pruning with onSlotEnd

* rebase tests
2021-05-10 16:32:28 +02:00
Jacek Sieka 867d8f3223
Perform attestation check before broadcast (#2550)
Currently, we have a bit of a convoluted flow where when sending
attestations, we start broadcasting them over gossip then pass them to
the attestation validation to include them in the local attestation pool
- it should be the other way around: we should be checking attestations
_before_ gossipping them - this serves as an additional safety net to
ensure that we don't publish junk - this becomes more important when
publishing attestations from the API.

Also, the REST API was performing its own validation meaning
attestations coming from REST would be validated twice - finally, the
JSON RPC wasn't pre-validating and would happily broadcast invalid
attestations.

* Unified attestation production pipeline with the same flow for gossip,
locally and API-produced attestations: all are now validated and entered
into the pool, then broadcast/republished
* Refactor subnet handling with specific SubnetId alias, streamlining
where subnets are computed, avoiding the need to pass around the number
of active validators
* Move some of the subnet handling code to eth2_network
* Use BitArray throughout for subnet handling
2021-05-10 09:13:36 +02:00
Jacek Sieka efdf759cc0
avoid some slashing protection queries (#2528)
This PR reduces the number of database queries for slashing protection
from 5 reads and 1 write to 2 reads and 1 write in the optimistic case.

In the process, it removes user-level support for writing the database
in the version 1 format in order to simplify the code flow, and prevent
code rot. In particular, the v1 format was not covered by any unit tests
and has no advantages over v2. The concrete code to read and write it
remains for now, in particular to support upgrades from v1 to v2.

The branch also removes the use of concepts which doesn't work with
checked exceptions - in particular, this highlights code that both
raises exceptions and returns error codes, which could be cleaned up in
the future.

* Cache internal validator ID
* Rely on unique index to check for trivial duplicate votes
* Combine two surround vote queries into one
* Combine API for checking and registering slashing into single function

The slashing DB is normally not a bottleneck, but may become one with
high attached validator counts.
2021-05-04 15:17:28 +02:00
Jacek Sieka 7dba1b37dd
remove attestation/aggregate queue (#2519)
With the introduction of batching and lazy attestation aggregation, it
no longer makes sense to enqueue attestations between the signature
check and adding them to the attestation pool - this only takes up
valuable CPU without any real benefit.

* add successfully validated attestations to attestion pool directly
* avoid copying participant list around for single-vote attestations,
pass single validator index instead
* release decompressed gossip memory earlier, specially during async
message validation
* use cooked signatures in a few more places to avoid reloads and errors
* remove some Defect-raising versions of signature-loading
* release decompressed data memory before validating message
2021-04-26 22:39:44 +02:00
tersec 99fccaee6e
more abstraction over BeaconState (#2509)
* more abstraction over BeaconState

* use HashedBeaconState copy of htr
2021-04-16 08:49:37 +00:00
Jacek Sieka f1f424cc2d attestation processing speedups
* avoid creating indexed attestation just to check signatures - above
all, don't create it when not checking signatures ;)
* avoid pointer op when adding attestation to pool
* better iterator for yielding attestations
* add metric / log for attestation packing time
2021-04-14 21:51:17 +03:00
tersec 050e3ac48b
abstract over more BeaconState usage (#2496) 2021-04-14 11:34:35 +02:00
Jacek Sieka 4ed2e34a9e Revamp attestation pool
This is a revamp of the attestation pool that cleans up several aspects
of attestation processing as the network grows larger and block space
becomes more precious.

The aim is to better exploit the divide between attestation subnets and
aggregations by keeping the two kinds separate until it's time to either
produce a block or aggregate. This means we're no longer eagerly
combining single-vote attestations, but rather wait until the last
moment, and then try to add singles to all aggregates, including those
coming from the network.

Importantly, the branch improves on poor aggregate quality and poor
attestation packing in cases where block space is running out.

A basic greed scoring mechanism is used to select attestations for
blocks - attestations are added based on how much many new votes they
bring to the table.

* Collect single-vote attestations separately and store these until it's
time to make aggregates
* Create aggregates based on single-vote attestations
* Select _best_ aggregate rather than _first_ aggregate when on
aggregation duty
* Top up all aggregates with singles when it's time make the attestation
cut, thus improving the chances of grabbing the best aggregates out
there
* Improve aggregation test coverage
* Improve bitseq operations
* Simplify aggregate signature creation
* Make attestation cache temporary instead of storing it in attestation
pool - most of the time, blocks are not being produced, no need to keep
the data around
* Remove redundant aggregate storage that was used only for RPC
* Use tables to avoid some linear seeks when looking up attestation data
* Fix long cleanup on large slot jumps
* Avoid some pointers
* Speed up iterating all attestations for a slot (fixes #2490)
2021-04-13 20:24:02 +03:00
Dustin Brody 398c151b7d
revert change 2021-04-13 18:50:06 +02:00
Dustin Brody d6fa4d06bc
abstract over more BeaconState usage 2021-04-13 18:47:44 +02:00
tersec 498c998552
abstract over most withStateVars/withState state var usage (#2484)
* abstract over most withStateVars/withState state var usage

* cleanups
2021-04-13 15:05:44 +02:00
tersec d3cad92693
remove some BeaconState use and abstract over other uses (#2482)
* remove some BeaconState use and abstract over other uses

* remove out-of-context comment
2021-04-08 08:24:25 +00:00
Mamy Ratsimbazafy 6b13cdce36
Batch attestations (#2439)
* batch attestations

* Fixes (but now need to investigate the chronos 0 .. 4095 crash similar to https://github.com/status-im/nimbus-eth2/issues/1518

* Try to remove the processing loop to no avail :/

* batch aggregates

* use resultsBuffer size for triggering deadline schedule

* pass attestation pool tests

* Introduce async gossip validators. May fix the 4096 bug (reentrancy issue?) (similar to sync unknown blocks #1518)

* Put logging at debug level, add speed info

* remove unnecessary batch info when it is known to be one

* downgrade some logs to trace level

* better comments [skip ci]

* Address most review comments

* only use ref for async proc

* fix exceptions in eth2_network

* update async exceptions in gossip_validation

* eth2_network 2nd pass

* change to sleepAsync

* Update beacon_chain/gossip_processing/batch_validation.nim

Co-authored-by: Jacek Sieka <jacek@status.im>

Co-authored-by: Jacek Sieka <jacek@status.im>
2021-04-02 16:36:43 +02:00
Jacek Sieka 74732a23fe
json cleanups (#2456)
* move json-rpc specific marshalling to rpc
* serialize Epoch/Slot with cast to avoid Defect
* avoid a few eth1 deps
* simplify imports
2021-03-26 15:11:06 +01:00
Jacek Sieka 2695cfa864
EH cleanup (#2455)
almost 100% raises in nimbus-eth2 now!

* fix some rare exception-related crashes in json-rpc
2021-03-26 07:52:01 +01:00
Zahary Karadjov 2eacfc4685 Bump modules to take advantage of the new Json format flavors support
Since quite a lot of additional procs were now compiled as generics, this lead to compiler bugs that had
to be worked-around:

* The `Domain` type was renamed to `Eth2Domain` to avoid compilation errors
  due to conflicts with `nativesockets.Domain`.
  Similarly, `eth2_network.KeyPair` was renamed to `NetKeyPair`.

* A new more robust version of `hexToByteArray` was added to stew
2021-03-25 09:37:35 +02:00
Jacek Sieka 8b76ceed52
Fix minor exception effect issues (#2448)
Makes code compatible with
https://github.com/status-im/nim-chronos/pull/166 without requiring it.
2021-03-24 17:20:55 +01:00
tersec 3076f5c3b6
rm std/random from beacon_chain and rm attestation timing randomness (#2442)
* remove added attestation timing randomness

* remove os/random from rest of beacon_chain, primarily deposit_contract

* remove scaffolding

* randomize std/random seed in beacon node and validator client

* use CSPRNG to more securely seed std/random
2021-03-23 06:57:10 +00:00
tersec 97850741a0
remove unused imports in slashing prtection (#2436) 2021-03-19 08:26:02 +00:00
Jacek Sieka 3cb31e66b4
set upper bound on EpochRef cache (#2403)
* set upper bound on EpochRef cache

* max 32 EpochRef instances
* less memory waste in BlockRef by removing EpochRef seq that is mostly
unused (~20mb)
* less memory waste in dag block lookup by not keeping an extra copy of
digest (~70mb)
* fix `==` and `$` for Eth2Digest
* remove `ChainDAG.tmpState` (~50mb?)

all in all, this branch cuts mainnet memory usage by ~160-180mb and puts
limits on EpochRef cache usage - where normally it hovered around 950mb
before, it's now sitting at 600-700mb on my machine.

* docs
2021-03-17 11:17:15 +01:00
Mamy Ratsimbazafy c47d636cb3
Split Eth2Processor in prep for batching (#2396)
* Split Eth2Processor in gossip and consensus part and materialize the shared block queue

* Update initialization in test_sync_manager
2021-03-11 11:10:57 +01:00
Mamy Ratsimbazafy f7cddcc8ab
Fix #2393 (#2395)
* Fix #2393

* check both

* Fix shortLog(int64)
2021-03-10 16:53:42 +01:00
tersec 82c300186b
annotate slashing protection v2 with uint64 -> int64 overflow conditions (#2392)
* annotate slashing protection v2 with uint64 -> int64 overflow conditions

* fix variables

* remove assertion which gets tripped by interchange tests
2021-03-10 08:35:04 +00:00