* removed the BlockPool type and all of the proxy functions around it - passing the chain DAG and the quarantine explicitly where appropriately - they don't need to be bundled in a type
* fixed the build after the rebase
* more fork-choice fixes
* use target block/epoch to validate attestations
* make addLocalValidators sync
* add current and previous epoch to cache before doing state transition
* update head state using clearance state as a shortcut, when possible
* use blockslot for fork choice balances
* send attestations using epochref cache
* fix invalid finalized parent being used
also simplify epoch block traversal
* single error handling style in fork choice
* import fix, remove unused async
* Lazy loading of crypto objects
* Try to fix incorrect field access by hiding fields but no luck. SSZ/Chronicles/macro bug?
* Fix incorrect blsValue access. was "aggregate" not "chronicles"
* Fix tests that rely on the internal BLSValue representation
* limit attestations kept in attestation pool
With fork choice updated, the attestation pool only needs to keep track
of attestations that will eventually end up in blocks - we can thus
limit the horizon of attestations that we keep more aggressively.
To get here, we expose getEpochRef which gets metadata about a
particular epochref, and make sure to populate it when a block is added
- this ensures that state rewinds during block addition are minimized.
In addition, we'll use the target root/epoch when validating
attestations - this helps minimize the number of different states that
we need to rewind to, in general.
* remove CandidateChains.justifiedState
unused
* remove BlockPools.Head object
* avoid quadratic quarantine loop
* fix
* add helper for beacon committee length (used for quickly validating
attestations)
* refactor some attestation checks to do cheap checks early
* validate attestation epoch before computing active validator set
* clean up documentation / comments
* fill state cache on demand
* avoid converting from uint64 to int, and where most feasible, int type conversion at all
* .len.uint64 -> .len64
* fix 32-bit compilation
* try keeping state_sim loop variable/bounds as int for 32-bit Azure
* len64 -> lenu64
* fork choice fixes, round 3
* introduce checkpoint tracker
* split out fork choice backend that is independent of dag
* correctly update best checkpoint to use for head selection
* correctly consider wall clock when processing attestations
* preload head history only (only one history is loaded from database
anyway)
* love the DAG
* switch to fork choice v2
also remove BlockRef.children
* fix
* don't kill the program if not connected to a bootstrap node within 30 seconds
* recover faster from loss of network connectivity
* connectWorker(): sleep 1s between dials
* launch_local_testnet.sh: increase BOOTSTRAP_TIMEOUT
* don't use metric value in program logic
* refactor some ungainly variable names
* cache beacon committee size calculation
this fixes a bug in get_validator_churn_limit as well
* fix
* make committee counts consistently uint64
mixing feels like the worst of the two worlds
- delete "tests/simulation/{data,validators}" by default, because old
validator keys can lead to a crash with a cryptic error message
- actually start the Prometheus daemon on `make eth2_network_simulation`
- kill any running Prometheus daemon on exit
- fix some shell script syntax incompatible with Bash
* remove cruft
* reenable fork choice and fix several issues
* in addForkChoice_v2, the `.error` field would be accessed even when
Result is ok
* remove workaround for invalid block structure in fork choice
* fix `tmpState` being used recursively in callback, causing state
corruption while processing attestation
* fix block callback being called twice per block
* pass state to callback to avoid unnecessary rewinding
* enable head select, fix another bug
* never use `get` without `isOk`
* log nil blockref in case blockref is nil
* add missing error checking
* use correct epoch when updating attestation message
When clearing blocks, a callback is called - this callback, if it uses
`tmpState`, will be corrupted because it's not fully up to date when the
callback is called - we thus introduce a specific state cache for this
purpose - ideally, it can be removed later when epoch caching is
improved.
Incidentally, this helps block sync speed a lot - without this state,
the block sync would ping-pong between attestation state and block state
which is costly.
hash_tree_root was turning up when running beacon_node, turns out to be
repeated hash_tree_root invocations - this pr brings them back down to
normal.
this PR caches the root of a block in the SignedBeaconBlock object -
this has the potential downside that even invalid blocks will be hashed
(as part of deserialization) - later, one could imagine delaying this
until checks have passed
there's also some cleanup of the `cat=` logs which were applied randomly
and haphazardly, and to a large degree are duplicated by other
information in the log statements - in particular, topics fulfill the
same role
* restore EpochRef and flush statecaches on epoch transitions
* more targeted cache invalidation
* remove get_empty_per_epoch_cache(); implement simpler but still faster get_beacon_proposer_index()/compute_proposer_index() approach; add some abstraction layer for accessing the shuffled validator indices cache
* reduce integer type conversions
* remove most of rest of integer type conversion in compute_proposer_index()
This also assigns precise types to the constants in the minimal
and mainnet presets in order to reduce the chance of compilation
errors when custom presets are used (previously, only the custom
presets have precisely assigned types for the constants).
* quick workaround for epochref cache issue
* disable assertion which doesn't work without epochref caches
* get local testnets and altona running again
htr is fast now, so hitting the database to load the state root is no
longer motivated - the more simple code is less vulnerable to database
corruption.
* update most of the remaining non-fork-choice spec refs, updating code where necessary
* revert presumably harmless compute_signing_root() change, but this way, keep things really unchanged outside inspector
* Dual headed fork choice
* fix finalizedEpoch not moving
* reduce fork choice verbosity
* Add failing tests due to pruning
* Properly handle duplicate blocks in sync
* test_block_pool also add a test for duplicate blocks
* comments addressing review
* Fix fork choice v2, was missing integrating block proposed
* remove a spurious debug writeStackTrace
* update block_sim
* Use OrderedTable to ensure that we always load parents before children in fork choice
* Load the DAG data in fork choice at init if there is some (can sync witti)
* Cluster of quarantined blocks were not properly added to the fork choice
* Workaround async gcsafe warnings
* Update blockpoool tests
* Do the callback before clearing the quarantine
* Revert OrderedTable, implement topological sort of DAG, allow forkChoice to be initialized from arbitrary finalized heads
* Make it work with latest devel - Altona readyness
* Add a recovery mechanism when forkchoice desyncs with blockpool
* add the current problematic node to the stack
* Fix rebase indentation bug (but still producing invalid block)
* Fix cache at epoch boundaries and lateBlock addition
- testnets can now be launched with a separate validator client - make altona SCRIPT_PARAMS="--separateVC"
- reverted the ctrl+C signal handler code reuse - not necessary for the VC anyway (default is good enough)
- added a bit more logging in the VC
- removed unnecessary code in the VC - connect() just parses the address & port...
- fixed a couple more VC issues - when fetching the duties for an epoch fails on the BN side ==> the VC shouldn't be left in a broken state
- documented the currently supported json-rpc endpoints
- added more checks on the BN side for the API - bounds-checking the requests & also checking if the BN itself is synced
- other cleanup
currently a local sim doesn't finalize, but participation in the altona network with a separate VC is painless and works just as well as with in-process validators in a BN
* adopt Result[void, string] in place of some bool return signatures
* string -> cstring to reduce memory allocations; ensure all err() strings are constants, with contextual information from higher-level callers
* logScope usage fixes
* homogenize err() reporting convention
* invalid signature in deposit isn't an error
* add tests for unviable blocks
also enable finalization tests in all test configs - they're plenty fast
now
also fix newClone for non-rvo cases. sigh.
* fixes
* cleanups
* fix ncli state root check flag
* add block dump to ncli_db
* limit ncli_db benchmark length
* tone down finalization logs
* introduce trusted blocks
We only store blocks whose signature we've verified in the database - as
such, there's no need to check it again, and most importantly, no need
to deserialize the signature when loading from database.
50x startup time improvement, 200x block load time improvement.
* fix rewinding when deposits have invalid signature
* speed up ancestor iteration by avoiding copy
* avoid deserializing signatures for trusted data
* load blocks lazily when rewinding (less memory used)
* chronicles workarounds
* document trustedbeaconblock
* re-add minimal preset constant checking; organize presets to better support multiple spec versions
* bump spec ref
* increase Azure timeout to 90 minutes to accomodate Nim compiler building
- better logging & retrying requests on the VC side if the BN fails for some reason
- VC now fetches the attestation duties 1 epoch in advance - in the future it will tell the BN to subscribe to the appropriate attestation topics in advance based on that info
- a bunch of other code cleanup & fixes such as better naming for consoles when using multitail, etc.
reviewed in PR #1184 - proper review of the API & VC are pending
* don't write blocks that get added to database
* don't write states
* write to folders
* add state dumping feature to `ncli_db` to get any known state from the
database
* collect signature production and verificaiton in one place
Signatures are made over data and domain - here we collect all such
activities in one place.
Also:
* security: fix cast-before-range-check
* log block/attestation verification consistently
* run block verification based on `getProposer` in its own history
* clean up some unused stuff
* import
* missing raises
* 3gb vs 12gb for 4000 epochs of witti
* 3-4x sync blocks/sec performance improvement on my FDE SSD drive
* generate less quirky code for primitive types
* random fixes
* create dump dir on startup
* don't crash on failure to write dump
* fix a few `uint64` instances being used when indexing arrays - this
should be a compile error but isn't due to compiler bugs
* fix standalone test_block_pool compilation
* add signed block processing in ncli
* reuse cache entry instead of allocating a new one
* allow for small clock disparities when validating blocks
- each BN gets half of its previous validators as inProcess and the other half goes to the respective VC for that BN - using separate data dirs where the keys are copied
- also removed a few command line options which are no longer necessary
- block proposals originating from a VC are propagated from one BN to the rest properly
- other cleanup & moving code back to since it is no longer used elsewhere
These are the 3 most important things in the life of a beacon-node -
finalization checkpoint for security, head + time for seeing that we're
synced and working.
The order given is the expected increasing order that things should be
moving in - finalization lags head, and head lags clock - the structure
helps the user remember, but also see the pattern where time is getting
out of sync with head.
A validator would also want to know when the next action is going to be
(ie next attestation / block production) but that's for another day.
* reorder ssz
* split into hash_trees and ssz_serialization, roughly, for hashing and
IO
* move bitseqs into ssz (from stew)
* clean up imports
* docs, imports
Add SeenTable to avoid continuous attempts to dead peers.
Refactor onSecond.
Block backward sync while forward sync is working.
SyncManager now checks responses according corresponding requests + tests.
SyncManager now watching for not progressing local_head_slot and resets SyncQueue.
* wip: cache
* cache lists and arrays of complex objects (5x block processing speed
on ncli_db)
trivial baseline cache that stores tree in flat memory structure
* support array of uint64
* work around type issues
* more type compiler bug workarounds
* cache balances, more type fixes
* index type
* ncli_db: add validation flag, better ux
* int64 fixes
* test fix
* "oops"
```
647.913, 0.000, 647.913, 647.913, 1,
Initialize DB
0.540, 0.402, 0.340, 9.451, 619,
Load block from database
40.268, 0.000, 40.268, 40.268, 1,
Load state from database
0.498, 0.150, 0.343, 0.930, 596,
Apply block
3.548, 11.005, 0.729, 54.022, 23,
Apply epoch block
```
* support all basic types
* cleanups
* a few more cleanups
* switch state transition caching usage to shuffled active validator indices to match EpochRef
* refactor the EpochRef -> StateCache transformation; elide pointless mapIt
* limit state passed between get_beacon_committee(...) and compute_committee(...)
* tweaks
The first offset of an SSZ object should always have a fixed constant
value. Otherwise, some unused bytes may appear between the fixed portion
and the dynamic portion.
Please note that this fix shutds down the minimal forward compatibility
currently supported by the SSZ format (and thus, the expected behavior
must be clarified in the SSZ spec).
* plumbing between block pool and state transition functions around active validator indices and committees
* have shared epochrefs followed by blockref tree while allowing for skipped slots
* factor out the epoch info extraction; document how the EpochRef follows forks
* be stricter about SSZ length prefix
* compute zeroHash list at compile time
* remove SSZ schema stuff
* move SSZ navigation to ncli
* cleanup a few leftover openArray uses
* Add ability to reset state of sync manager.
Fix bug when sync got stuck on `zero-point` reset.
Fix bug when sync got stuck when some of the workers waiting for failing one.
* Remove debugging comments and imports.
* Remove not used pendingLock.
The lack of body of `goodbye` in sync_protocol.nim was preventing
the respective LibP2P protocol to be mounted and advertised on the
network.
Adding a body fixes that, but I've also made some changes in the
P2P protocol codegen that will prevent the issue from happening
again (no body is now considered the equivalent of having an empty
body).
the cast worked around the bug at compile time by means of casting, but
introduced a runtime error, because the pre-cast type was still being
used during deserialization
- we have a new binary which connects via RPC to the respective BN and has an internal clock - waking it up on every slot
- the BN has a new option called --external-validators and currently in order to have the VC binaries to run we need to pass EXTERNAL_VALIDATORS=yes to make
- factored some code out of beacon_node.nim for easier reuse in validator_api.nim and validator_client.nim
- the VC loads its associated private keys from the datadir for its BN
- most of the validator API calls have been implemented as a stub.
- the VC polls its BN at the start of each epoch - getting a list of all active validators for the current epoch - and then continues to request blocks and sign them with its appropriate validators when necessary
* in makeBeaconBlock - use rollback instead
* in tests - this helps state_sim give more accurate data and makes it
30% faster
* fix some usages of raw BeaconState
* update more spec refs in beacon_chain/spec/presets; skip skipped constant sanity checks also from markdown reports' perspectives
* mark skipped as skipped in markdown
* Fix nbench compilation with HashedBeaconState
* Add nbench to tooling
* use newClone - fix 265e01e404 (r425198575)
* Detail advance_slot and hashTreeRoot
* Report throughput
* Fallback for ARM
* windows does not support inline ASM
The work on this was started last week while I was waiting
for a decision on the "Async Snappy" PR. It was prompted by
a failing test in the test suite, where the HashingStream
was inserting some incorrectly padded chunks that affected
the result of `hash_tree_root`. Instead of working around
the problem in the HashingStream, I've decided to implement
a planned optimisation that allows us to remove the hashing
stream altogether.
With the optimisation in place, `hash_tree_root` will now
use only stack memory and only the precise amount neccesary
to build the chunks-merging tree.
* switch state cache to use ref statedata objects to limit memory usage
* more directly initialize ref StateData
* use HashedBeaconState instead of StateData to try to fix memory leak
* switch cache to seq[ref HashedBeaconState]
* remove unused import
Co-authored-by: Ștefan Talpalaru <stefantalpalaru@yahoo.com>
* sync fixes
* fix Status message finalized info
* work around sync starting before initial status exchange
* don't fail block on deposit signature check failure (fixes#989)
* print ForkDigest and Version nicely
* dump incoming blocks
* fix crash when libp2p peer connection is closed
* update chunk size to 16 to work around missing blocks when syncing
* bump libp2p
* bump libp2p
* better deposit skip message
* fix some warnings related to beacon_node splitting; reimplement finalization verification more robustly; improve attestation pool block selection logic
* re-add missing import
* whitelist allowed state transition flags and make rollback/restore naming more consistent
* restore usage of update flags passed into skipAndUpdateState(...) in addition to the potential verifyFinalization flag
* switch rest of rollback -> restore
When replaying state transitions, for the slots that have a block, the
state root is taken from the block. For slots that lack a block, it's
currently calculated using hash_tree_root which is expensive.
Caching the empty slot state roots helps us avoid recalculating this
hash, meaning that for replay, hashes are never calculated. This turns
blocks into fairly lightweight "state-diffs"!
* avoid re-saving state when replaying blocks
* advance empty slots slot-by-slot and save root
* fix sim randomness
* fix sim genesis filename
* introduce `isEpoch` to check if a slot is an epoch slot
* remove incorrect/obsolete comment; deprecate BeaconState state transition functions
* remove deprecated state_transition(state: var BeaconState)
* add specific workarounds for state_transition() and process_slots() to nfuzz_block() and addTestBlock()
* remove near-duplicate code paths: process_slot(), process_slots(), and state_transition() for BeaconState are now wrappers around the HashedBeaconState versions
* convert tests/test_state_transition.nim to use HashedBeaconState
* convert mocking infrastructure and spec_block/epoch_processing tests to use HashedBeaconBlock, and remove thus unused process_slot*(state: var BeaconState)
* ssz: move ref support outside
Instead of allocating ref's inside SSZ, move it to separate helper:
* makes `ref` allocations explicit
* less magic inside SSZ
* `ref` in nim generally means reference whereas SSZ was loading as
value - if a type indeed used references it would get copies instead of
references to a single value on roundtrip which is unexpected
TODO: EF tests would benefit from some refactoring since they all do the
same thing practically..
Co-authored-by: Zahary Karadjov <zahary@gmail.com>
* refactor blook pool caches to directly use TableRef to avoid SSZ decoding, which was consuming 20% of profile on mainnet eth2_network_simulation
* use table's hasKeyOrPut
* bump eth2 spec reference to v0.11.1
* cache whole StateData objects and switch from expensive clear() to cheaper new object instantiation for caching
* remove scaffolding and stop re-assigning to part of StateData object
* 80-character lines
Please see the newly added 'schlesi-dev' Makefile target.
It demonstrates how the log level can be specified for individual topics.
Additionally, when connecting to testnets like 'schlesi' there will be
two additional log files produced in the working directory:
* json-log.txt
* text-log.txt (in the textblocks format)
In BlockPool, we keep the head state around, so it's trivial to restore
the temporary state there and keep going as if nothing happened.
This solves 3 problems:
* stack space - the state copy on mainnet is huge
* GC scanning - using stack space for state slows down the GC
significantly
* reckless copying - the copy itself takes a long time
In state_sim, we'll do the same and allocate on heap - this helps a
little with GC - without it, the collection of the temporary strings
created with `toHex` while printing the json dominates the trace.
* add another check for inconsistent aggregation and committee length, since ncli_transition bypasses process_attestation(...)/check_attestation(...) and calls almost directly into process_epoch(...)
* bump validator functions to v0.11.1 spec references
* bump some spec references to v0.11.1
* poke
* Add "drop by score" ability to PeerPool.
Add tests.
Fix syncmanager queue to start from most fresh data.
* Fix endless cycle at the end of syncing process.
* reduce stack space usage in process_final_updates(...) to avoid fuzzed segfault in https://github.com/status-im/nim-beacon-chain/issues/921
* document motivation behind manually constructing hash_tree_root of a HistoricalBatch
* fix remaining block pool extended validation issues and re-enable first-block-received and block-signature EV checks; enable Merkle validation in beacon_node in eth2_network_simulation; refactor some Merkle proof generation code outside tests/ as a result
* re-enable Merkle validation skipping, since while it works on make eth2_network_simulation, it has issues with local testnet
* tighten already-seen-block blockpool check; move comment closer to conceptually proximate code; queue up maybe-future-valid-blocks as pending to keep libp2p-synchronous interrupt handling time lower
* revert the cleanups, now in a separate PR
* remove the remaining merkle_minimal cleanup remnants, also moved to other PR
* restore PR to only modifying one file after rebasing
* use signatures as summary to compare block contents
* switch signature comparison to be raw byte-wise to ensure no attempts to deserialize it to valid (or not) BLS signatures first
* fix mainnet finalization and swith eth2_network_simulation to a kind of small-mainnet profile
* Fix slot reference in trace logging
* bump a couple of spec refs from v0.11.0 to v0.11.1
* bump another spec ref to v0.11.1, one more try at Jenkins test vector download CI issue
* fix other slot reference in trace logging and skip past single-block/multi-slot gaps to re-approach from ancestry side by state_transitioning, by requiring exact match on both root hash and slot for fast path
* make more precise the fast path condition
* redo logic to make uniform with BeaconChainDB; fix chronos deprecation warning
* revert not-working replacement of deprecated chronos futures `or`
* switch testnet1 to mainnet
* if we don't have validators, don't consider aggregation work
* if we do have validators, don't aggregate when we're out of sync
* when we do aggregate, use a fresh state, and not one from before
sleeping