This addresses the issues by detecting and rejecting keystores with
incorrect PBKDF2 and SCrypt params. It also bumps the version of
nim-json-serialization to include a bugfix for incorrect parsing
of json files featuring comments.
It turns out that we often save lots of states in the database that are
the result of empty slot processing only - here, we make sure to only
save a state if a block follows - this fixes several issues:
* empty slot states are not always pruned leading to state database size
explosion
* storing states is (very) slow which slows down processing in general,
so we should only do it when it's likely to be useful
* attestation processing doesn't get stuck on saving random states that
won't appear in the chain history
* in exit pool, filter out already-packaged messages; bundle remaining messages into beaconblocks
* filter messages at block construction time
* allow adding up to intended capacity of buffers, beyond per-block limits
* document rationale/design for filtering mechanism
* fixed#1663 - Interger overflow in compute_start_slot_at_epoch through RPC
* changed the way the overflow check is done - took the approach from PR #1797 - see the comment in PR #1810 for more details
this would reallocate the attestation queue on every attestation and
other call to update_time, causing quite the overhead (~10% cpu spent
when gossiping)
* addPeer() and addPeerNoWait() now returns PeerStatus, not bool.
Minor refactoring of PeerPool.
Fix tests.
* Refactor PeerPool.
Add lenSpace.
Add tests for lenSpace.
PeerPool.add procedures now return different error codes.
Fix SyncManager break/continue problem.
Fix connectWorker break/continue problem.
Refactor connectWorker and discoveryLoop.
Fix incoming/outgoing blocking problem.
* Refactor discovery loop.
Add checkPeer.
* Fix logic and compilation bugs.
* Adjust position of debugging log.
* Fix issue with maximum peers in PeerPool.
Optimize node record decoding.
* fix discoveryLoop.
* Remove aliases and fix tests using aliases.
* Bump BLST
* Test for https://github.com/supranational/blst/issues/22 regression
* Use SHA256 from BLST + bump nim-blscurve to reenable fno-tree-vectorize
* SHA256 on non-blst platforms import fixes
* import fixes again
* can't prefix with nimcrypto
* address review comment [skip ci]
* {.noInit.} on the digests
about 40% better slot processing times (with LTO enabled) - these don't
do BLS but are used
heavily during replay (state transition = slot + block transition)
tests using a recent medalla state and advancing it 1000 slots:
```
./ncli slots --preState2:state-302271-3c1dbf19-c1f944bf.ssz --slot:1000
--postState2:xx.ssz
```
pre:
```
All time are ms
Average, StdDev, Min, Max, Samples,
Test
Validation is turned off meaning that no BLS operations are performed
39.236, 0.000, 39.236, 39.236, 1,
Load state from file
0.049, 0.002, 0.046, 0.063, 968,
Apply slot
256.504, 81.008, 213.471, 591.902, 32,
Apply epoch slot
28.597, 0.000, 28.597, 28.597, 1,
Save state to file
```
cast:
```
All time are ms
Average, StdDev, Min, Max, Samples,
Test
Validation is turned off meaning that no BLS operations are performed
37.079, 0.000, 37.079, 37.079, 1,
Load state from file
0.042, 0.002, 0.040, 0.090, 968,
Apply slot
215.552, 68.763, 180.155, 500.103, 32,
Apply epoch slot
25.106, 0.000, 25.106, 25.106, 1,
Save state to file
```
cast+rewards:
```
All time are ms
Average, StdDev, Min, Max, Samples,
Test
Validation is turned off meaning that no BLS operations are performed
40.049, 0.000, 40.049, 40.049, 1,
Load state from file
0.048, 0.001, 0.045, 0.060, 968,
Apply slot
164.981, 76.273, 142.099, 477.868, 32,
Apply epoch slot
28.498, 0.000, 28.498, 28.498, 1,
Save state to file
```
cast+rewards+shr
```
All time are ms
Average, StdDev, Min, Max, Samples,
Test
Validation is turned off meaning that no BLS operations are performed
12.898, 0.000, 12.898, 12.898, 1,
Load state from file
0.039, 0.002, 0.038, 0.054, 968,
Apply slot
139.971, 68.797, 120.088, 428.844, 32,
Apply epoch slot
24.761, 0.000, 24.761, 24.761, 1,
Save state to file
```
* Slashing protection + interchange initial commit
* Restrict the when UseSlashingProtection dance in other modules
* Integrate slashing tests in other all_tests
* Add attestation slashing protection support
* Add a message that mention if built with/without slashing protection
* no op the initialization proc
* test slashing protection in Jenkins (temp)
* where to configure NIMFLAGS in Jenkins ...
* Jenkins -> ensure Built with slashing protection
* Add slashing protection complete import
* use Opt.get(otherwise)
* Don't use negation in proc name
* Turn slashing protection on by default
* Refactor peer_pool.
Fix eth2_network peer counters.
Fix PeerPool do not allow to add more peers when empty space available.
* Remove unused imports.
* Add test for a bug.
* Fix eth2_network disconnect should deletePeer not release.
More PeerPool refactoring.
* remove some superfluous gcsafes
* remove getTailState (unused)
* don't store old epochrefs in blocks
* document attestation pool a bit
* remove `pcs =` cruft from log
* skeleton of attester slashing pool & validators
* add skeleton for proposer slashings and voluntary exits; rename pool to more inclusive exit pool to stay consistent with all three; ensure is initialized by beacon_node so is safe to merge, even if it doesn't do much yet
* add ncli_db subcommand to prune database of unnecessary blocks, states, and state roots
* tweak comments
* reduce default aggressiveness in pruning old states
* move copyPrunedDatabase() to ncli_db, as it's not generally useful as part of beacon_chain_db and doesn't use any internal interfaces
* Syncing workers now not bound to peers.
Sync status is now printed in statusbar.
* Add `SyncQueue.outSlot` to statusbar too.
* Add `inRangeEvent` and `rangeAge` parameter.
* Fix rangeAge is not depends on SyncQueue latest slot.
Fix syncManager to start from latest local head slot.
* Add notInRange event.
* Remove suspects field.
Validator duties proceed slot-by-slot - we should not start a new
validator duty iteration before the previous one is gone or we might run
into consistency and voting issues
* Quick fix to prune some states, pending smarter state storage
Adverse effects might include slow rewinds - typically the protocol
doesn't ask for pre-finalized states but RPC might
* document issue, add test
* fix cache miss log
per spec, we must half-close request stream - not doing so may lead to
failure of the other end to start processing our request leading to
timeouts.
In particular, this fixes many sync problems that have been seen on
medalla.
* remove safeClose - close no longer raises
* use per-chunk timeouts in request processing
* stop discarding non-existent future epochs during epoch state transitions; remove a pointless StateCache() construction in advance_slots()
* update nbench to pass StateCache to process_slots()
* ignore sqlite WAL journal files in git; switch attestation resolved from info to debug
* promote sent attestations/blocks to notice rather than demote resolved attestations/blocks to debug
This implements disparity, resolving a part of
https://github.com/status-im/nim-beacon-chain/issues/1367
* make BeaconTime a duration for fractional seconds
* factor out attestation/aggregate validation
* simplify recording of queued attestations
* simplify attestation signature check
* fix blocks_received metric
* add some trivial validation tests
* remove unresolved attestation table - attestations for unknown blocks
are dropped instead (cannot verify their signature)
* harden beacon_pending_deposits metrics calculation
* ...
* move beacon_pending_deposits and beacon_processed_deposits_total out of specs and into chain DAG
* initial - cheaper pruning - addresses #1534
* Pass tests: update offset when pruning, proper handling of pruned parents
* Use options instead of nil for nilable newHead (finalization passing but rootcause not solved)
* First line of defense against stackoverflow in tests
* Fix compute_delta offset after pruning
* Rebase fix - medalla ready
* Remove Option[BlockRef]
Replace shuffling function with zrnt version - `get_shuffled_seq` in
particular puts more strain on the GC by allocating superfluous seq's
which turns out to have a significant impact on block processing (when
replaying blocks for example) - 4x improvement on non-epoch, 1.5x on
epoch blocks (replay is done without signature checking)
Medalla, first 10k slots - pre:
```
Loaded 68973 blocks, head slot 117077
All time are ms
Average, StdDev, Min, Max, Samples,
Test
Validation is turned off meaning that no BLS operations are performed
76855.848, 0.000, 76855.848, 76855.848, 1,
Initialize DB
1.073, 0.914, 0.071, 12.454, 7831,
Load block from database
31.382, 0.000, 31.382, 31.382, 1,
Load state from database
85.644, 30.350, 3.056, 466.136, 7519,
Apply block
506.569, 91.129, 130.654, 874.786, 312,
Apply epoch block
```
post:
```
Loaded 68973 blocks, head slot 117077
All time are ms
Average, StdDev, Min, Max, Samples,
Test
Validation is turned off meaning that no BLS operations are performed
72457.303, 0.000, 72457.303, 72457.303, 1,
Initialize DB
1.015, 0.858, 0.070, 11.231, 7831,
Load block from database
28.983, 0.000, 28.983, 28.983, 1,
Load state from database
21.725, 17.461, 2.659, 393.217, 7519,
Apply block
324.012, 33.954, 45.452, 440.532, 312,
Apply epoch block
```
When blocks and attestations arrive, they are SSZ-decoded twice: once
for validation and once for processing. This branch enqueues the decoded
block directly for processing, avoiding the second, slow
deserialization.
* move processing of blocks and attestations to queue
* ...and out from beacon_node
* split attestation processing into attestations and aggregates
* also updates metrics
* clean up logging to better follow the lifetime of gossip: arrival,
validation and processing
* drop attestations and aggregates if there are too many
* try to prioritise blocks and aggregates before single-validator
attestations
* evaluate block attestations under the epochref of the block - this is
what the state transition function does
* avoid copying attestation seq unnecessarily
* avoid unnecessary hashset for unslashed indices
* don't sort shuffled_validator_indices, just get them directly with
iteration
* grab full epoch of proposer indices while we have the data available -
they'll get cached and reused
* avoid computing active validator set when not used for logging
* collect all epochrefs in specific blocks to make them easier to find
and to avoid lots of small seqs
* reuse validator key databases more aggressively by comparing keys
* make state cache available from within `withState`
* make epochRef available from within onBlockAdded callback
* integrate getEpochInfo into block resolution and epoch ref logic such
that epochrefs are created when blocks are added to pool or lazily when
needed by a getEpochRef
* fill state cache better from EpochRef, speeding up replay and
validation
* store epochRef in specific blocks to make them easier to find and
reuse
* fix database corruption when state is saved while replaying quarantine
* replay slots fully from block pool before processing state
* compare bls values more smartly
* store epoch state without block applied in database - it's recommended
to resync the node!
this branch will drastically speed up processing in times of long
non-finality, as well as cut memory usage by 10x during the recent
medalla madness.
* add attestation processing queue so attestations don't get processed
too early
* rework justified slot delay to match spec / lighthouse better
* keep less state in fork choice
* request epochref less
* add --enable-logtrace argument to launch_local_testnet
* scan for all available logfiles
* remove specific filename references
* update v0.11.3 spec ref to v0.12.2
* Bump BLSCurve
* Use unified aggregation API
* use new blscurve with unified aggregate API
* bump
* fix toRaw
* replace state_sim combine with AggregateSignature
* Fix 32-bit
* Fix 32-bit for real and test deactivating ccache for fno-tree-lopp-vectorize flag
* change compilation switches to narrow down Linux issue
* Use -fno-tree-vectorize to disable both tree-loop-vectorize and tree-slp-vectorize
* blscurve now disables both Loop and SLP vectorization
* Add tests for the miracl/milagro fallback
* Travis has max log size of 4MB
* Test with Miracl in the finalization test
* fix state_sim log level
* Coment out the slow fallback tests
* fix invalid state root being written to database
When rewinding state data, the wrong block reference would be used when
saving the state root - this would cause state loading to fail by
loading a different state than expected, preventing blocks to be
applied.
* refactor state loading and saving to consistently use and set
StateData block
* avoid rollback when state is missing from database (as opposed to
being partially overwritten and therefore in need of rollback)
* don't store state roots for empty slots - previously, these were used
as a cache to avoid recalculating them in state transition, but this has
been superceded by hash tree root caching
* don't attempt loading states / state roots for non-epoch slots, these
are not saved to the database
* simplify rewinder and clean up funcitions after caches have been
reworked
* fix chaindag logscope
* add database reload metric
* re-enable clearance epoch tests
* names
* initial dynamic subscribe/unsubscribe for attestations to/from subnets
* implement random stability subnet and clean up
* switch from HashSet[uint64] to set[uint8]
* refactor subnet logic out from beacon_node and actual (un)subscribing
* only try to subscribe to marginally different subnets
* add assertions
* maintain ENR subnets
* assert that beacon_node and eth2_network have consistent view of subscribed subnets
* disable actual cycling
* Eliminate possibilities for range errors and overflows
* Handle more properly invalid requests for furute slots
* Eliminate the confusing surrounding the MAX_REQUEST_BLOCKS constant
Addresses https://github.com/status-im/nim-beacon-chain/issues/1366
removed in 0.12.2 - the flow, in particular when the other peer doesn't
support snappy, is hard to follow because of the trial-and-error
approach - removing it simplifies things and removes some of the
hard-to-read parts of the thunking etc
this resolves some peer counting issues that were happening because the
lifetime future in PeerInfo was unreliable (multiple PeerInfo instances
existed per peer)
In addition, this solves another race condition: when connecting to a
peer and later dialling that protocol, it is not certain that the same
connection will be used if there's a concurrent incoming peer connection
ongoing - better not make too many assumptions about who sent statuses
when.
* Allow sync manager process blocks one by one.
* Log storeBlock() and updateHead() duration.
* Calculate duration only for blocks added without any error.
* Fix float compilation error.
* Fix duration.
* Fix SyncQueue tests.
* remove one cache, add another
This cache removes the need for rewinding in most attestation validation
flow since the attestations come from one of two epochs and must be
targetting a viable block.
Additionally, it also removes all state caches which are less likely to
be used over-all - more metrics are needed to track the rewinding.
On risk is that when chains don't finalize, we'll have lots of epochrefs
in memory meaning lots of validator key databases, most being exactly
the same. This can be addressed in any number of ways. Some of the
memory usage is mitigated by the fact that we previously had lots of big
state caches and now we're keeping only keys instead.
* cleanups
* doc