This is a revamp of the attestation pool that cleans up several aspects
of attestation processing as the network grows larger and block space
becomes more precious.
The aim is to better exploit the divide between attestation subnets and
aggregations by keeping the two kinds separate until it's time to either
produce a block or aggregate. This means we're no longer eagerly
combining single-vote attestations, but rather wait until the last
moment, and then try to add singles to all aggregates, including those
coming from the network.
Importantly, the branch improves on poor aggregate quality and poor
attestation packing in cases where block space is running out.
A basic greed scoring mechanism is used to select attestations for
blocks - attestations are added based on how much many new votes they
bring to the table.
* Collect single-vote attestations separately and store these until it's
time to make aggregates
* Create aggregates based on single-vote attestations
* Select _best_ aggregate rather than _first_ aggregate when on
aggregation duty
* Top up all aggregates with singles when it's time make the attestation
cut, thus improving the chances of grabbing the best aggregates out
there
* Improve aggregation test coverage
* Improve bitseq operations
* Simplify aggregate signature creation
* Make attestation cache temporary instead of storing it in attestation
pool - most of the time, blocks are not being produced, no need to keep
the data around
* Remove redundant aggregate storage that was used only for RPC
* Use tables to avoid some linear seeks when looking up attestation data
* Fix long cleanup on large slot jumps
* Avoid some pointers
* Speed up iterating all attestations for a slot (fixes#2490)
* only deserialize attestation and aggregation gossiped signatures once
* re-indent some aggregate checks into block scope
* spelling
* remove debugging assertion
* put part of gossip validation back into block context
* attestation pool test signature loading isn't so unsafe, and exportRaw isn't free
* remove more development doAsserts; don't exportRaw in loops
* set upper bound on EpochRef cache
* max 32 EpochRef instances
* less memory waste in BlockRef by removing EpochRef seq that is mostly
unused (~20mb)
* less memory waste in dag block lookup by not keeping an extra copy of
digest (~70mb)
* fix `==` and `$` for Eth2Digest
* remove `ChainDAG.tmpState` (~50mb?)
all in all, this branch cuts mainnet memory usage by ~160-180mb and puts
limits on EpochRef cache usage - where normally it hovered around 950mb
before, it's now sitting at 600-700mb on my machine.
* docs
* Deferred DAG and fork choice pruning
* fixup
* Address https://github.com/status-im/nimbus-eth2/pull/2384/files#r589448448, rely only on onSLotEnd for state pruning
* no need to store needPruning in the data structure
* lastPrunePoint is updated in pruning proc
* Split eager and LazyPruning
* enforce pruning in updateHead
* database state storage benchmarking via ncli_db
* more cleanups from immutable validator state branch
* unexport some eth2_network constants and remove unused variables/templates
* make two PeerScore constants public
* use IntSet rather than HashSet[ValidatorIndex]
* add bounds check before uint64 -> int conversion
* use intsets in block transitions
* remove superfluous Nim issue explanation/reference
* in exit pool, filter out already-packaged messages; bundle remaining messages into beaconblocks
* filter messages at block construction time
* allow adding up to intended capacity of buffers, beyond per-block limits
* document rationale/design for filtering mechanism
* remove some superfluous gcsafes
* remove getTailState (unused)
* don't store old epochrefs in blocks
* document attestation pool a bit
* remove `pcs =` cruft from log
This implements disparity, resolving a part of
https://github.com/status-im/nim-beacon-chain/issues/1367
* make BeaconTime a duration for fractional seconds
* factor out attestation/aggregate validation
* simplify recording of queued attestations
* simplify attestation signature check
* fix blocks_received metric
* add some trivial validation tests
* remove unresolved attestation table - attestations for unknown blocks
are dropped instead (cannot verify their signature)
* collect all epochrefs in specific blocks to make them easier to find
and to avoid lots of small seqs
* reuse validator key databases more aggressively by comparing keys
* make state cache available from within `withState`
* make epochRef available from within onBlockAdded callback
* integrate getEpochInfo into block resolution and epoch ref logic such
that epochrefs are created when blocks are added to pool or lazily when
needed by a getEpochRef
* fill state cache better from EpochRef, speeding up replay and
validation
* store epochRef in specific blocks to make them easier to find and
reuse
* fix database corruption when state is saved while replaying quarantine
* replay slots fully from block pool before processing state
* compare bls values more smartly
* store epoch state without block applied in database - it's recommended
to resync the node!
this branch will drastically speed up processing in times of long
non-finality, as well as cut memory usage by 10x during the recent
medalla madness.
* Bump BLSCurve
* Use unified aggregation API
* use new blscurve with unified aggregate API
* bump
* fix toRaw
* replace state_sim combine with AggregateSignature
* Fix 32-bit
* Fix 32-bit for real and test deactivating ccache for fno-tree-lopp-vectorize flag
* change compilation switches to narrow down Linux issue
* Use -fno-tree-vectorize to disable both tree-loop-vectorize and tree-slp-vectorize
* blscurve now disables both Loop and SLP vectorization
* Add tests for the miracl/milagro fallback
* Travis has max log size of 4MB
* Test with Miracl in the finalization test
* fix state_sim log level
* Coment out the slow fallback tests
* removed the BlockPool type and all of the proxy functions around it - passing the chain DAG and the quarantine explicitly where appropriately - they don't need to be bundled in a type
* fixed the build after the rebase
* limit attestations kept in attestation pool
With fork choice updated, the attestation pool only needs to keep track
of attestations that will eventually end up in blocks - we can thus
limit the horizon of attestations that we keep more aggressively.
To get here, we expose getEpochRef which gets metadata about a
particular epochref, and make sure to populate it when a block is added
- this ensures that state rewinds during block addition are minimized.
In addition, we'll use the target root/epoch when validating
attestations - this helps minimize the number of different states that
we need to rewind to, in general.
* remove CandidateChains.justifiedState
unused
* remove BlockPools.Head object
* avoid quadratic quarantine loop
* fix
* add helper for beacon committee length (used for quickly validating
attestations)
* refactor some attestation checks to do cheap checks early
* validate attestation epoch before computing active validator set
* clean up documentation / comments
* fill state cache on demand
* avoid converting from uint64 to int, and where most feasible, int type conversion at all
* .len.uint64 -> .len64
* fix 32-bit compilation
* try keeping state_sim loop variable/bounds as int for 32-bit Azure
* len64 -> lenu64
* fork choice fixes, round 3
* introduce checkpoint tracker
* split out fork choice backend that is independent of dag
* correctly update best checkpoint to use for head selection
* correctly consider wall clock when processing attestations
* preload head history only (only one history is loaded from database
anyway)
* love the DAG
* switch to fork choice v2
also remove BlockRef.children
* fix
* cache beacon committee size calculation
this fixes a bug in get_validator_churn_limit as well
* fix
* make committee counts consistently uint64
mixing feels like the worst of the two worlds
* remove cruft
* reenable fork choice and fix several issues
* in addForkChoice_v2, the `.error` field would be accessed even when
Result is ok
* remove workaround for invalid block structure in fork choice
* fix `tmpState` being used recursively in callback, causing state
corruption while processing attestation
* fix block callback being called twice per block
* pass state to callback to avoid unnecessary rewinding
* enable head select, fix another bug
* never use `get` without `isOk`
* log nil blockref in case blockref is nil
* add missing error checking
* use correct epoch when updating attestation message
hash_tree_root was turning up when running beacon_node, turns out to be
repeated hash_tree_root invocations - this pr brings them back down to
normal.
this PR caches the root of a block in the SignedBeaconBlock object -
this has the potential downside that even invalid blocks will be hashed
(as part of deserialization) - later, one could imagine delaying this
until checks have passed
there's also some cleanup of the `cat=` logs which were applied randomly
and haphazardly, and to a large degree are duplicated by other
information in the log statements - in particular, topics fulfill the
same role
* restore EpochRef and flush statecaches on epoch transitions
* more targeted cache invalidation
* remove get_empty_per_epoch_cache(); implement simpler but still faster get_beacon_proposer_index()/compute_proposer_index() approach; add some abstraction layer for accessing the shuffled validator indices cache
* reduce integer type conversions
* remove most of rest of integer type conversion in compute_proposer_index()
This also assigns precise types to the constants in the minimal
and mainnet presets in order to reduce the chance of compilation
errors when custom presets are used (previously, only the custom
presets have precisely assigned types for the constants).
* Dual headed fork choice
* fix finalizedEpoch not moving
* reduce fork choice verbosity
* Add failing tests due to pruning
* Properly handle duplicate blocks in sync
* test_block_pool also add a test for duplicate blocks
* comments addressing review
* Fix fork choice v2, was missing integrating block proposed
* remove a spurious debug writeStackTrace
* update block_sim
* Use OrderedTable to ensure that we always load parents before children in fork choice
* Load the DAG data in fork choice at init if there is some (can sync witti)
* Cluster of quarantined blocks were not properly added to the fork choice
* Workaround async gcsafe warnings
* Update blockpoool tests
* Do the callback before clearing the quarantine
* Revert OrderedTable, implement topological sort of DAG, allow forkChoice to be initialized from arbitrary finalized heads
* Make it work with latest devel - Altona readyness
* Add a recovery mechanism when forkchoice desyncs with blockpool
* add the current problematic node to the stack
* Fix rebase indentation bug (but still producing invalid block)
* Fix cache at epoch boundaries and lateBlock addition
* add tests for unviable blocks
also enable finalization tests in all test configs - they're plenty fast
now
also fix newClone for non-rvo cases. sigh.
* fixes
* collect signature production and verificaiton in one place
Signatures are made over data and domain - here we collect all such
activities in one place.
Also:
* security: fix cast-before-range-check
* log block/attestation verification consistently
* run block verification based on `getProposer` in its own history
* clean up some unused stuff
* import
* missing raises
* reorder ssz
* split into hash_trees and ssz_serialization, roughly, for hashing and
IO
* move bitseqs into ssz (from stew)
* clean up imports
* docs, imports
* in makeBeaconBlock - use rollback instead
* in tests - this helps state_sim give more accurate data and makes it
30% faster
* fix some usages of raw BeaconState
When replaying state transitions, for the slots that have a block, the
state root is taken from the block. For slots that lack a block, it's
currently calculated using hash_tree_root which is expensive.
Caching the empty slot state roots helps us avoid recalculating this
hash, meaning that for replay, hashes are never calculated. This turns
blocks into fairly lightweight "state-diffs"!
* avoid re-saving state when replaying blocks
* advance empty slots slot-by-slot and save root
* fix sim randomness
* fix sim genesis filename
* introduce `isEpoch` to check if a slot is an epoch slot
In BlockPool, we keep the head state around, so it's trivial to restore
the temporary state there and keep going as if nothing happened.
This solves 3 problems:
* stack space - the state copy on mainnet is huge
* GC scanning - using stack space for state slows down the GC
significantly
* reckless copying - the copy itself takes a long time
In state_sim, we'll do the same and allocate on heap - this helps a
little with GC - without it, the collection of the temporary strings
created with `toHex` while printing the json dominates the trace.
* refactor and fix merkle proof construction in test suite and thereby remove most remaining skipMerkleValidation flags, now unnecessary
* a few non-semantic comment update/removals
* Initial implementation of runtime bls skipping.
Add libnfuzz skipBLSValidation handling, check that it propagates.
* Rename skipBLSValidation -> skipBlsValidation, start skipStateRootValidation
* Replace skipValidation flags with more granular flags.
Also added skipBlockParentRootValidation flag
Mainly replaced with skipBlsValidation but also StateRoot or
BlockParentRootValidation flags where appropriate.
* Adjust interop test to pass when skipping merkle validation.
* Stop skipping validation for mainchain_monitor.
* Remove comment.
* Also skipMerkleValidation for test_beacon_chain_db.
* switch out quadratically scaling and wasteful attestation in state_sim to attest only to exactly the correct slots; avoid pointless committee index interconversion for 9-10x increase in state_sim speed at d:release, 60k validators, and validate=off
* remove debugechos
* rename compute_epoch_of_slot(...) to compute_epoch_at_slot(...)
* remove some unnecessary imports; remove some crosslink-related code and tests; complete renaming of compute_epoch_of_slot(...) to compute_epoch_at_slot(...)
* rm more transfer-related code and tests; rm more unnecessary strutils imports
* rm remaining unused imports
* remove useless get_empty_per_epoch_cache(...)/compute_start_slot_of_epoch(...) calls
* rename compute_start_slot_of_epoch(...) to compute_start_slot_at_epoch(...)
* rename ACTIVATION_EXIT_DELAY to MAX_SEED_LOOKAHEAD
* update domain types to 0.9.0
* mark AttesterSlashing, IndexedAttestation, AttestationDataAndCustodyBit, DepositData, BeaconBlockHeader, Fork, integer_squareroot(...), and process_voluntary_exit(...) as 0.9.0
* mark increase_balance(...), decrease_balance(...), get_block_root(...), CheckPoint, Deposit, PendingAttestation, HistoricalBatch, is_active_validator(...), and is_slashable_attestation_data(...) as 0.9.0
* mark compute_activation_exit_epoch(...), bls_verify(...), Validator, get_active_validator_indices(...), get_current_epoch(...), get_total_active_balance(...), and get_previous_epoch(...) as 0.9.0
* mark get_block_root_at_slot(...), ProposerSlashing, get_domain(...), VoluntaryExit, mainnet preset Gwei values, minimal preset max operations, process_block_header(...), and is_slashable_validator(...) as 0.9.0
* mark makeWithdrawalCredentials(...), get_validator_churn_limit(...), get_total_balance(...), is_valid_indexed_attestation(...), bls_aggregate_pubkeys(...), initial genesis value/constants, Attestation, get_randao_mix(...), mainnet preset max operations per block constants, minimal preset Gwei values and time parameters, process_eth1_data(...), get_shuffled_seq(...), compute_committee(...), and process_slots(...) as 0.9.0; partially update get_indexed_attestation(...) to 0.9.0 by removing crosslink refs and associated tests
* mark initiate_validator_exit(...), process_registry_updates(...), BeaconBlock, Eth1Data, compute_domain(...), process_randao(...), process_attester_slashing(...), get_base_reward(...), and process_slot(...) as 0.9.0
* rename get_genesis_beacon_state(...) to initialize_beacon_state_from_eth1(...); remove unused vestiges get_temporary_block_header(...) and get_empty_block(...); partly align initialize_beacon_state_from_eth1(...) with 0.8.0 version; implement CompactCommittee object type; update process_final_updates(...) to 0.8.0
* mark get_shard_delta(...) as 0.8.0
* mark get_total_active_balance(...) and epoch transition helper functions as 0.8.0
* move FORK_CHOICE_BALANCE_INCREMENT, not in spec anymore, to sole user in fork_choice
* rename slot_to_epoch(...) to compute_epoch_of_slot(...)
* rename aggregation_bitfield/custody_bitfield fields to aggregation_bits/custody_bits; rename convert_to_indexed(...) to get_indexed_attestation(...); mark BeaconBlockHeader as 0.8.0; update minimal preset MIN_ATTESTATION_INCLUSION_DELAY/time parameters in general to 0.8
* fix beacon node compilation; it used slightly different capitalizations of slot_to_epoch/slotToEpoch
* mark integer_squareroot(...), Deposit, VoluntaryExit, and Transfer as 0.8.0; rename get_epoch_start_slot(...) to compute_start_slot_of_epoch(...)
* add new Domain alias for uint64; rename bls_domain(...) to compute_domain(...), MAX_INDICES_PER_ATTESTATION to MAX_VALIDATORS_PER_COMMITTEE, and SignatureDomain to DomainType; update get_domain(...) to 0.8.0
* update/mark initiate_validator_exit(...) and Crosslink as 0.8.0; rename BeaconState.latest_block_roots -> BeaconState.block_roots; mark HistoricalBatch as 0.8.0
* migrate attestation pool tests from get_crosslink_committees_at_slot(...) to get_crosslink_committee(...)
* rm obsolete, unused get_crosslink_committees_at_slot_cached(...) and migrate tests/test_state_transition from get_crosslink_committees_at_slot(...) to get_crosslink_committee(...)
* migrate tests/testutil from get_crosslink_committees_at_slot(...) to get_crosslink_committee(...)
* use more pervasive caching infrastructure, initially of compute_committee; remove buggy (and per-index) shuffling to fix research/state_sim, which was noticing validators on multiple shard committees; rm now-unused specific get_attesting_balance_cached
* add some get_active_validator_index caching
* rm obsolete/unused get_attesting_indices_cached
* rm get_crosslink_committees_at_slot(...)
* some more 0.7 changes -- some of which can't be completed pending some data structure updates -- and shard handling changes to offset with epoch start shards
* signed_root -> signing_root
* implement new process_deposit; update timing parameters to 0.6.1; update Deposit to 0.6.1 and remove DepositInput
* update get_active_validator_indices and get_epoch_committee_count to 0.6.1; rm get_current_epoch_committee_count, get_previous_epoch_committee_count, and get_next_epoch_committee_count; bump state_sim default validator count a bit more
* re-introduce 0.5.1-ish get_active_validator_indices, get_epoch_committee_count as scaffolding for still-0.5.1ish shuffling
* rm now-superceded shuffling cache (shuffling is only called from get_crosslinks), which was badly architected due to trying to exist in state; rm one more vestige of previous light-client regime (one more to go, from datatypes)
* fix wrong shuffling list size (active validator size, not validator size) to make consistent with 0.5.1 (will be inconsistent with testnet0); fix typo and change defaults in state_sim
* doAssert a couple of constant relationships necessary to avoid underflow; rm non-spec, unused helper function get_new_recent_block_roots
* refactor separate crosslink_committee_cache and winning_root_participants_cache(s) into StateData object; remove last vestige of previous shuffling cache
* separate out caching parts of StateData to new StateCache object
* no need to pass prev_block_root when updating state
* some Slot fixes
* fix `hash_tree_root` for Slot, Epoch
* detect missing hash_tree_root type support
* remove some <0.5 state checks
* ssz: update to 0.5.1:ish
* slightly fewer seq allocations
* still a lot of potential for optimization
* fixes#174
* ssz: avoid reallocating leaves (logN merkle impl)
* rm gone-in-0.5.0 Proposal, verifyBlockSignature, and slot check which moved to spec function processBlockHeader
* mark get_attestation_participants and get_epoch_committee_count as 0.5.1; finish updating processAttestations to 0.5.1; add kludgy workaround for bug relating to get_winning_roots_etc using crosslink_data_root as index when we have that as ZERO_HASH for all leading to it confusing attesters on different shards; rm BeaconState.latest_attestations, which splits into previous_epoch_attestations and current_epoch_attestations
* Fix CI due to removed latest_attestations field
* add simple bitfield type (fixes#19)
* fix bit endianess to match spec
* consolidate attestation and block logging
* cruft cleanup
* run optimizer on tests, speeds up build
* begin 0.5.0 spec update: parent_root -> previous_block_root, BeaconState.justified_epoch -> BeaconState.current_justified_epoch, DOMAIN_PROPOSAL -> DOMAIN_BEACON_BLOCK, temporarily rename BeaconBlockHeader to BeaconBlockHeaderRLP to allow for gradual re-merging without disrupting RLP; mark a few unchanged functions and data types, implement get_temporary_block_header/get_empty_block/should_update_validator_registry/processBlockHeader/cacheState; update a few others
* a dozen or so more trivial mostly comment changes, finding more unchanged parts of 0.5.0
* several more trivial changes; goal is to reduce noise around the upcoming substantial changes (epoch processing, etc)
* fix state db lookup typo
* fix randao reveal slot when proposing blocks
* only store blocks that can be applied to a state
* store state at every epoch boundary (yes, needs pruning!)
* split out state advancement function when there's no block
* default state sim to 0.9 attestation ratio
* allow running more or fewer validators
* use deterministic key generation for tests to avoid exhausting system
RNG
* update README with simulator docs
* write the data of each validator to separate file, instead of a big
chainstart.json (makes it easier to run different validator counts)
* implement in-memory block graph
* store tail block in database
* resolve unknown parents by syncing them from peers
* introduce concept of resolved blocks and attestations - those that
follow minimal protocol rules
* update state head lazily
* log more stuff
* shortHash -> shortLog
* start 9/10 beacon nodes by default, last can be started manually
* see also #134
* fix start.sh epoch length
* convert some asserts to doAsserts to keep them in release mode builds; rename get_initial_beacon_state to get_genesis_beacon_state to track spec; switch target spec version to 0.3.0; switch references to penalize_validator to slash_validator/slashValidator to track spec; make some function returns safer by omitting 'return'
* 2x shuffling speedup by hoisting pivot calculations per https://github.com/protolambda/eth2-shuffle
* fix initial attestation pool on reordered attestations
* simplify db layer api
* load head block from database on startup, then load state
* significantly changes database format
* move subscriptions to separate proc's
* implement block replay from historical state
* avoid rescheduling epoch actions on block receipt (why?)
* make sure genesis block is created and used
* relax initial state sim parameters a bit
* move attestation pool to separate file
* combine attestations lazily when needed
* advance state when there's a gap while attesting
* compile beacon node with optimizations - it's tooo slow right now
* log when unable to keep up
* s/slot_included/inclusion_slot/; s/participation_bitfield/aggregation_bitfield/; rearrange some type definitions to match spec order
* a couple more Eth1Data-related spec syncs
You'll need the latest versions of nim-eth-p2p, nim-serialization
and nim-json-serialization.
Before starting the simulation script, make sure to delete any previous
json files from the simulation folder:
```
rm tests/simulation/*.json
tests/simulation/start.sh
```
This should survive the creation of few blocks before diying with a
block validation error.
* spec updates
* balances move out to separate seq
* bunch of placeholders for proof-of-custody / phase1
* fix inclusion distance adjustment
* modify state in-place in `updateState` (tests spent over 80% time
copying state! now it's down to 25-50)
* document several conditions and conversations
* some renames here and there to follow spec
* spec updates
* random small updates
* ssz no longer sorts by field, fix enum serialization
* rewire block processing a little to avoid a few state copies
* add a state simulation tool that writes out jsons
* ssz: finish implementation
* add object support, simplify implementation
* fix extra round of hashing in tree_hash_root
* ssz: cleanups
* work around Nim range bug for Uint24, cleanups