* update validator key cache on startup
Versions prior to 1.1.0 do not write a validator key cache at all.
Versions from 1.4.0 and upwards require an immutable validator key cache
to verify blocks - normally, block verification fills the cache but that
assumes that at least one block was verified by a version that has the
key cache.
Taken together, this breaks direct upgrades from anything <1.1.0 to
1.4.0.
The fix is simply to refresh fill the cache from an existing state on
startup.
* also log serious block validation failures at info level
* Implement split preset/config support
This is the initial bulk refactor to introduce runtime config values in
a number of places, somewhat replacing the existing mechanism of loading
network metadata.
It still needs more work, this is the initial refactor that introduces
runtime configuration in some of the places that need it.
The PR changes the way presets and constants work, to match the spec. In
particular, a "preset" now refers to the compile-time configuration
while a "cfg" or "RuntimeConfig" is the dynamic part.
A single binary can support either mainnet or minimal, but not both.
Support for other presets has been removed completely (can be readded,
in case there's need).
There's a number of outstanding tasks:
* `SECONDS_PER_SLOT` still needs fixing
* loading custom runtime configs needs redoing
* checking constants against YAML file
* yeerongpilly support
`build/nimbus_beacon_node --network=yeerongpilly --discv5:no --log-level=DEBUG`
* load fork epoch from config
* fix fork digest sent in status
* nicer error string for request failures
* fix tools
* one more
* fixup
* fixup
* fixup
* use "standard" network definition folder in local testnet
Files are loaded from their standard locations, including genesis etc,
to conform to the format used in the `eth2-networks` repo.
* fix launch scripts, allow unknown config values
* fix base config of rest test
* cleanups
* bundle mainnet config using common loader
* fix spec links and names
* only include supported preset in binary
* drop yeerongpilly, add altair-devnet-0, support boot_enr.yaml
* add blockchain_dag altair database reading; add rollback tests; fix some unnecessary type conversions
* remove debugging scaffolding
* proposeSignedBlock() will need to be async for merge; introduce altair types to VC
* introduce immutable Altair BeaconState
* add database support for Altair blocks and states
* add tests for Altair get/put/contains/delete state
* enable blockchain_dag Altair state database storing
* properly return error on getting missing altair block
* use ForkedHashedBeaconState in StateData
* fix FAR_FUTURE_EPOCH -> slot overflow; almost always use assign()
* avoid stack allocation in maybeUpgradeStateToAltair()
* create and use dispatch functions for check_attester_slashing(), check_proposer_slashing(), and check_voluntary_exit()
* use getStateRoot() instead of various state.data.hbsPhase0.root
* remove withStateVars.hashedState(), which doesn't work as a design anymore
* introduce spec/datatypes/altair into beacon_chain_db
* fix inefficient codegen for getStateField(largeStateField)
* state_transition_slots() doesn't either need/use blocks or runtime presets
* combine process_slots(HBS)/state_transition_slots(HBS) which differ only in last-slot htr optimization
* getStateField(StateData, ...) was replaced by getStateField(ForkedHashedBeaconState, ...)
* fix rollback
* switch some state_transition(), process_slots, makeTestBlocks(), etc to use ForkedHashedBeaconState
* remove state_transition(phase0.HashedBeaconState)
* remove process_slots(phase0.HashedBeaconState)
* remove state_transition_block(phase0.HashedBeaconState)
* remove unused callWithBS(); separate case expression from if statement
* switch back from nested-ref-object construction to (ref Foo)(Bar())
* write uncompressed validator keys to database
Loading 150k+ validator keys on startup in compressed format takes a lot
of time - better store them in uncompressed format which makes behaviour
just after startup faster / more predictable.
* refactor cached validator key access
* fix isomorphic cast to work with non-var instances
* remove cooked pubkey cache - directly use database cache in chaindag
as well (one less cache to keep in sync)
* bump blscurve, introduce loadValid for known-to-be-valid keys
* don't consider legacy database when writing state - this read is slow
on kvstore
* avoid epoch transition when there's an exact match in cache already
* simplify init to only consider checkpoint states
This reverts commit eebc828778.
Adding a separate file turns out not to be enough. This PR reverts the
separate file change.
Another theory is that the large kvstore table causes cache thrashing -
all database connections share a common page cache which would explain
the poor performance of the separate file solution.
The V1 table structure shows great improvements in performance, but if
there's an old `kvstore` without rowid:s, these benefits are nullified:
reorgs during writes and deletes remain expensive (even if the
degradation is reduced somewhat).
This PR creates the tables in a new file instead, and uses the old file
as a read-only store - this has several interesting properties:
* the old database is left completely untouched - this guarantees that
downgrades work smooth (they'll only need to resync their missing
portions)
* starting sync after this PR means only a v1 database is created
* v0 databases stick around - no migration is performed (for now)
Future PR:s can introduce migration of the data from one database to
another - a simply copy will take hours which is downtime we want to
avoid - at that point, it might make sense to migrate straight to era
files instead.
* Revert "Revert "Upgrade database schema" (#2570)"
This reverts commit 6057c2ffb4.
* ssz: fix loading empty lists into existing instances
Not a problem earlier because we didn't reuse instances
* bump nim-eth
* bump nim-web3
The `kvstore` design we're using now turns out to not be the best way to
use `sqlite` - in particular, there are some significant benefits to
using rowid in certain situations and to keep data in separate tables.
With this branch, there are massive improvements in startup time
(seconds instead of minutes) and state/block storage and pruning times
(milliseconds instead of seconds) - these improvements can in particular
be seen on slow drives and translate directly into better attestation
performance.
* update kvstore to new keyspace design
* remove `DirStoreRef` and the hidden `--state-db-kind` option - this
was an experiment to store large blobs in files, but with the new
kvstore, there's no compelling reason to do so
* remove `DbMap` - unused and would need updating for new keyspace
design
* introduce separate tables for each data type (blocks, states etc)
* remove "WITHOUT ROWID" pessimization for tables with large blobs
* close DbSeq statements explicitly (and earlier)
* store beacon block summaries in separate table, without SSZ
compression and load them all with single query on startup
* stop storing backwards compat full states
* mark genesis beacon block as trusted
* avoid faststreams when loading SSZ data
* remove `DisagreementBehavior` (unused)
This patch writes a full genesis state to `kvstore` if one was missing,
which fixes 1.2.0 restarting sync when upgrading from 1.1.0, or when
downgrading to a pre-1.1.0 release.
* Reset cached indices when resetting cache on SSZ read
When deserializing into an existing structure, the cache should be
cleared - goes for json also. Also improve error messages.
* initial immutable validator database factoring
* remove changes from chain_dag: this abstraction properly belongs in beacon_chain_db
* add merging mutable/immutable validator portions; individually test database roundtripping of immutable validators and states-sans-immutable-validators
* update test summaries
* use stew/assign2 instead of Nim assignment
* add reading/writing of immutable validators in chaindag
* remove unused import
* replace chunked k/v store of immutable validators with per-row SQL table storage
* use List instead of HashList
* un-stub some ncli_db code so that it uses
* switch HashArray to array; move BeaconStateNoImmutableValidators from datatypes to beacon_chain_db
* begin only-mutable-part state storage
* uncomment some assigns
* work around https://github.com/nim-lang/Nim/issues/17253
* fix most of the issues/oversights; local sim runs again
* fix test suite by adding missing beaconstate field to copy function
* have ncli bench also store immutable validators
* extract some immutable-validator-specific code from the beacon chain db module
* add more rigorous database state roundtripping, with changing validator sets
* adjust ncli_db to use new schema
* simplify putState/getState by moving all immutable validator accounting into beacon state DB
* remove redundant test case and move code to immutable-beacon-chain module
* more efficient, but still brute-force, mutable+immutable validator merging
* reuse BeaconState in getState
* ensure HashList/HashArray caches are cleared when reusing getState buffers; add ncli_db and a unit test to verify this
* HashList.clear() -> HashList.clearCache()
* only copy incrementally necessary immutable validators
* increase strictness of test cases and fix/work around resulting HashList cache invalidation issues
* remove explanatory scaffolding
* allow for storage of full (with all validators) states for backwards/forwards-compatibility
* adjust DbSeq type usage
* store full, with-validators, state every 64 epochs to enable reverting versions
* reduce memory allocation and intermediate objects in state storage codepath
* eliminate allocation/copying through intermediate BeaconStateNoImmutableValidators objects
* skip benchmarking initial genesis-validator-heavy state store
* always store new-style state and sometimes old-style state
* document intent behind BeaconState/Validator type-punnery
* more accurate failure message on SQLite in-memory database initialization failure
* database state storage benchmarking via ncli_db
* more cleanups from immutable validator state branch
* unexport some eth2_network constants and remove unused variables/templates
* make two PeerScore constants public
* checkpoint database at end of each slot
To avoid spending time on synchronizing with the file system while doing
processing, the manual checkpointing mode turns off fsync during
processing and instead checkpoints the database when the slot has ended.
From an sqlite perspecitve, in WAL mode this guaranees database
consistency but may lead to data loss which is fine - anything missing
from the beacon chain database can be recovered on the next startup.
* log sync status and delay in slot start message
* bump
* Handle some web3 timeouts better
* Add support for developer .env files
* Eth1 improvements; Mainnet genesis state
Notable changes:
* The deposits table have been removed from the database. The client
will no longer process all deposits on start-up.
* The network metadata now includes a "state snapshot" of the deposit
contract. This allows the client to skip syncing deposits made prior
to the snapshot (i.e. genesis). Suitable metadata added for Pyrmont
and Mainnet.
* The Eth1 monitor won't be started unless there are validators attached
to the node.
* The genesis detection code is now optional and disabled by default
* Bugfix: The client should not produce blocks that will fail validation
when it hasn't downloaded the latest deposits yet
* Bugfix: Work around the database corruption affecting Pyrmont nodes
* Remove metadata for Toledo and Medalla
This introcudes a cache for block summaries, useful for instantiating
the block dag on startup, bringing medalla startup times down from
minutes to seconds.
This is something of a temporary band-aid that would be obsoleted by a
finalized block store.
* Concentrate all sensitive writeFile/createPath calls in one place.
Fix eth2_network_simulation for Windows.
* Remove artifacts.
* fix import
Co-authored-by: Jacek Sieka <jacek@status.im>
* fix invalid state root being written to database
When rewinding state data, the wrong block reference would be used when
saving the state root - this would cause state loading to fail by
loading a different state than expected, preventing blocks to be
applied.
* refactor state loading and saving to consistently use and set
StateData block
* avoid rollback when state is missing from database (as opposed to
being partially overwritten and therefore in need of rollback)
* don't store state roots for empty slots - previously, these were used
as a cache to avoid recalculating them in state transition, but this has
been superceded by hash tree root caching
* don't attempt loading states / state roots for non-epoch slots, these
are not saved to the database
* simplify rewinder and clean up funcitions after caches have been
reworked
* fix chaindag logscope
* add database reload metric
* re-enable clearance epoch tests
* names
hash_tree_root was turning up when running beacon_node, turns out to be
repeated hash_tree_root invocations - this pr brings them back down to
normal.
this PR caches the root of a block in the SignedBeaconBlock object -
this has the potential downside that even invalid blocks will be hashed
(as part of deserialization) - later, one could imagine delaying this
until checks have passed
there's also some cleanup of the `cat=` logs which were applied randomly
and haphazardly, and to a large degree are duplicated by other
information in the log statements - in particular, topics fulfill the
same role
* cleanups
* fix ncli state root check flag
* add block dump to ncli_db
* limit ncli_db benchmark length
* tone down finalization logs
* introduce trusted blocks
We only store blocks whose signature we've verified in the database - as
such, there's no need to check it again, and most importantly, no need
to deserialize the signature when loading from database.
50x startup time improvement, 200x block load time improvement.
* fix rewinding when deposits have invalid signature
* speed up ancestor iteration by avoiding copy
* avoid deserializing signatures for trusted data
* load blocks lazily when rewinding (less memory used)
* chronicles workarounds
* document trustedbeaconblock
* 3gb vs 12gb for 4000 epochs of witti
* 3-4x sync blocks/sec performance improvement on my FDE SSD drive
* generate less quirky code for primitive types
* reorder ssz
* split into hash_trees and ssz_serialization, roughly, for hashing and
IO
* move bitseqs into ssz (from stew)
* clean up imports
* docs, imports
* fix crash when state root is present but state is missing
* fix state root removal when state is removed
* fix block pool initialization which needs tail state
* remove tail block pruning
* incomplete - fork states are not pruned
* incomplete - fork blocks are not pruned
* incomplete - empty slot states are not pruned
* unknown - tail/finalized block on empty slot might be incorrect
* simplify data storage to key-value, tries are not relevant for NBC
* locked-down version of lmdb dependency
* easier to build / maintain on various platforms
* rename compute_epoch_of_slot(...) to compute_epoch_at_slot(...)
* remove some unnecessary imports; remove some crosslink-related code and tests; complete renaming of compute_epoch_of_slot(...) to compute_epoch_at_slot(...)
* rm more transfer-related code and tests; rm more unnecessary strutils imports
* rm remaining unused imports
* remove useless get_empty_per_epoch_cache(...)/compute_start_slot_of_epoch(...) calls
* rename compute_start_slot_of_epoch(...) to compute_start_slot_at_epoch(...)
* rename ACTIVATION_EXIT_DELAY to MAX_SEED_LOOKAHEAD
* update domain types to 0.9.0
* mark AttesterSlashing, IndexedAttestation, AttestationDataAndCustodyBit, DepositData, BeaconBlockHeader, Fork, integer_squareroot(...), and process_voluntary_exit(...) as 0.9.0
* mark increase_balance(...), decrease_balance(...), get_block_root(...), CheckPoint, Deposit, PendingAttestation, HistoricalBatch, is_active_validator(...), and is_slashable_attestation_data(...) as 0.9.0
* mark compute_activation_exit_epoch(...), bls_verify(...), Validator, get_active_validator_indices(...), get_current_epoch(...), get_total_active_balance(...), and get_previous_epoch(...) as 0.9.0
* mark get_block_root_at_slot(...), ProposerSlashing, get_domain(...), VoluntaryExit, mainnet preset Gwei values, minimal preset max operations, process_block_header(...), and is_slashable_validator(...) as 0.9.0
* mark makeWithdrawalCredentials(...), get_validator_churn_limit(...), get_total_balance(...), is_valid_indexed_attestation(...), bls_aggregate_pubkeys(...), initial genesis value/constants, Attestation, get_randao_mix(...), mainnet preset max operations per block constants, minimal preset Gwei values and time parameters, process_eth1_data(...), get_shuffled_seq(...), compute_committee(...), and process_slots(...) as 0.9.0; partially update get_indexed_attestation(...) to 0.9.0 by removing crosslink refs and associated tests
* mark initiate_validator_exit(...), process_registry_updates(...), BeaconBlock, Eth1Data, compute_domain(...), process_randao(...), process_attester_slashing(...), get_base_reward(...), and process_slot(...) as 0.9.0
* signed_root -> signing_root
* implement new process_deposit; update timing parameters to 0.6.1; update Deposit to 0.6.1 and remove DepositInput
* update get_active_validator_indices and get_epoch_committee_count to 0.6.1; rm get_current_epoch_committee_count, get_previous_epoch_committee_count, and get_next_epoch_committee_count; bump state_sim default validator count a bit more
* re-introduce 0.5.1-ish get_active_validator_indices, get_epoch_committee_count as scaffolding for still-0.5.1ish shuffling
this is the beginning of tracking block-slots more precisely, so we can
the justification epoch slot bug.
* avoid asyncDiscard (swallows assertions)
* fix attestation delay
* fix several state root cache bugs
* introduce workaround for genesis epoch spec issue
* no need to pass prev_block_root when updating state
* some Slot fixes
* fix `hash_tree_root` for Slot, Epoch
* detect missing hash_tree_root type support
* remove some <0.5 state checks
* ssz: update to 0.5.1:ish
* slightly fewer seq allocations
* still a lot of potential for optimization
* fixes#174
* ssz: avoid reallocating leaves (logN merkle impl)
* begin 0.5.0 spec update: parent_root -> previous_block_root, BeaconState.justified_epoch -> BeaconState.current_justified_epoch, DOMAIN_PROPOSAL -> DOMAIN_BEACON_BLOCK, temporarily rename BeaconBlockHeader to BeaconBlockHeaderRLP to allow for gradual re-merging without disrupting RLP; mark a few unchanged functions and data types, implement get_temporary_block_header/get_empty_block/should_update_validator_registry/processBlockHeader/cacheState; update a few others
* a dozen or so more trivial mostly comment changes, finding more unchanged parts of 0.5.0
* several more trivial changes; goal is to reduce noise around the upcoming substantial changes (epoch processing, etc)
* keep track of a finalized block
* keep track of all justified blocks
* use naive spec version of LMD ghost
* cache slot number and a few more things in BlockRef
* keep track of the latest vote of each validator
* depend less on the state of node.state (it's a cache, effectively)
To enable it, comment out the 'withLibp2p' line in nim.cfg
The history was squashed in order to remove an accidentally
commited binary file.
Other changes:
* SSZ was adapted to use the common serialization framework
* gossibsup.subscribe is not using async handlers at the moment
and this allowed me to simplify it
* fix state db lookup typo
* fix randao reveal slot when proposing blocks
* only store blocks that can be applied to a state
* store state at every epoch boundary (yes, needs pruning!)
* split out state advancement function when there's no block
* default state sim to 0.9 attestation ratio
* implement in-memory block graph
* store tail block in database
* resolve unknown parents by syncing them from peers
* introduce concept of resolved blocks and attestations - those that
follow minimal protocol rules
* update state head lazily
* log more stuff
* shortHash -> shortLog
* start 9/10 beacon nodes by default, last can be started manually
* see also #134
* fix start.sh epoch length
* fix initial attestation pool on reordered attestations
* simplify db layer api
* load head block from database on startup, then load state
* significantly changes database format
* move subscriptions to separate proc's
* implement block replay from historical state
* avoid rescheduling epoch actions on block receipt (why?)
* make sure genesis block is created and used
* relax initial state sim parameters a bit
You'll need the latest versions of nim-eth-p2p, nim-serialization
and nim-json-serialization.
Before starting the simulation script, make sure to delete any previous
json files from the simulation folder:
```
rm tests/simulation/*.json
tests/simulation/start.sh
```
This should survive the creation of few blocks before diying with a
block validation error.
* spec updates
* balances move out to separate seq
* bunch of placeholders for proof-of-custody / phase1
* fix inclusion distance adjustment
* modify state in-place in `updateState` (tests spent over 80% time
copying state! now it's down to 25-50)
* document several conditions and conversations
* some renames here and there to follow spec