* allow multiple hard fork datatypes to coexist
* update to 1.0.1
* merge recent datatypes.nim updates
* trigger rebuild now the out-of-disk-space machine offline
* fix replays stalling processing
Occasionally, attestations will arrive that vote for a target derived
either from the finalized block or earlier. In these cases, Nimbus would
replay the state transition of up to 32 epochs worth of blocks because
the finalized state has been pruned, delaying other processing and
leading to poor inclusion distance.
* put cheap attestation checks before forming EpochRef
* check that attestation target is not from an unviable history with
regards to finalization
* fix overly aggressive state pruning removing the state close to the
finalized checkpoint resulting in rare long replays for valid
attestations
* log long replays
* harden logging and traversal of nil BlockSlot
* simplify target check
no need to lookup target in chain dag again
* fixup
* fixup
* refactor slot loop
* fix attestations being sent out early when _any_ block arrives (as
opposed to the block for the "correct" slot)
* fix attestations being sent out late when block already arrived
* refactor slot processing loop
* shutdown if clock moves backwards significantly
* fix docs
* notify caller whether the block actually arrived
* fix several memory leaks due to temporaries not being reset during
init
* avoid massive main() function with lots of stuff in it
* disable nim-prompt (unused)
* reuse validator pool instance in eth2_processor
* style cleanup
* database state storage benchmarking via ncli_db
* more cleanups from immutable validator state branch
* unexport some eth2_network constants and remove unused variables/templates
* make two PeerScore constants public
* Create CLI tool for slashing export
* Use SQLite as a DB instead of a KV-store
* Keeps v1 and v2 DBs around
* Uses the same schema as Lighthouse v1.1.0
* Passes all interchange tests + skeleton of finalization pruning
* Removes tests that would violate v5 / minimal slashing DB and MinSlot rules
* Migration tool added using low-watermark scheme for faster migration of large number of validators
* force pushing to fix unstable base
* increase attestation/aggregate queue sizes
when there are many validators, many aggregates and attestations arrive
every slot - increase the queue size a bit - also do batches on each
idle loop iteration since it's fairly quick
* don't score subnets for now
* wrapping up
* refactor and cleanups
* gossip parameters fixes
* comment fix
Co-authored-by: Jacek Sieka <jacek@status.im>
* hotfix gossip scoring
* skip gossip scoring parameters validation as they violate for now (but does not matter cos we don't score)
* workaround again gossip validation
* expose node signatures
* format bitseqs as hex strings
* format trusted sigs as hex strings (same as untrusted)
* reuse rpc client sigs
* include validator index in duties
* move SyncInfo to spec
when there are many validators, many aggregates and attestations arrive
every slot - increase the queue size a bit - also do batches on each
idle loop iteration since it's fairly quick
* use IntSet rather than HashSet[ValidatorIndex]
* add bounds check before uint64 -> int conversion
* use intsets in block transitions
* remove superfluous Nim issue explanation/reference
* performance fixes
* don't mark tree cache as dirty on read-only List accesses
* store only blob in memory for keys and signatures, parse blob lazily
* compare public keys by blob instead of parsing / converting to raw
* compare Eth2Digest using non-constant-time comparison
* avoid some unnecessary validator copying
This branch will in particular speed up deposit processing which has
been slowing down block replay.
Pre (mainnet, 1600 blocks):
```
All time are ms
Average, StdDev, Min, Max, Samples, Test
Validation is turned off meaning that no BLS operations are performed
3450.269, 0.000, 3450.269, 3450.269, 1, Initialize DB
0.417, 0.822, 0.036, 21.098, 1400, Load block from database
16.521, 0.000, 16.521, 16.521, 1, Load state from database
27.906, 50.846, 8.104, 1507.633, 1350, Apply block
52.617, 37.029, 20.640, 135.938, 50, Apply epoch block
```
Post:
```
3502.715, 0.000, 3502.715, 3502.715, 1, Initialize DB
0.080, 0.560, 0.035, 21.015, 1400, Load block from database
17.595, 0.000, 17.595, 17.595, 1, Load state from database
15.706, 11.028, 8.300, 107.537, 1350, Apply block
33.217, 12.622, 17.331, 60.580, 50, Apply epoch block
```
* more perf fixes
* load EpochRef cache into StateCache more aggressively
* point out security concern with public key cache
* reuse proposer index from state when processing block
* avoid genericAssign in a few more places
* don't parse key when signature is unparseable
* fix `==` overload for Eth2Digest
* preallocate validator list when getting active validators
* speed up proposer index calculation a little bit
* reuse cache when replaying blocks in ncli_db
* avoid a few more copying loops
```
Average, StdDev, Min, Max, Samples, Test
Validation is turned off meaning that no BLS operations are performed
3279.158, 0.000, 3279.158, 3279.158, 1, Initialize DB
0.072, 0.357, 0.035, 13.400, 1400, Load block from database
17.295, 0.000, 17.295, 17.295, 1, Load state from database
5.918, 9.896, 0.198, 98.028, 1350, Apply block
15.888, 10.951, 7.902, 39.535, 50, Apply epoch block
0.000, 0.000, 0.000, 0.000, 0, Database block store
```
* clear full balance cache before processing rewards and penalties
```
All time are ms
Average, StdDev, Min, Max, Samples, Test
Validation is turned off meaning that no BLS operations are performed
3947.901, 0.000, 3947.901, 3947.901, 1, Initialize DB
0.124, 0.506, 0.026, 202.370, 363345, Load block from database
97.614, 0.000, 97.614, 97.614, 1, Load state from database
0.186, 0.188, 0.012, 99.561, 357262, Advance slot, non-epoch
14.161, 5.966, 1.099, 395.511, 11524, Advance slot, epoch
1.372, 4.170, 0.017, 276.401, 363345, Apply block, no slot processing
0.000, 0.000, 0.000, 0.000, 0, Database block store
```
* use `idleAsync` to more evenly divide cpu attention when syncing in
particular - this gives networking better latency
* more strict exception handling in eth2_processor
* allow always-on subscription to all attestation subnets when gossiping
* in subscribe-all-subnets mode, consider all subnets to be stability subnets for ENR purposes
* remove await/async from sub/unsub
* fix unsubscribe wrong key (missed _snappy)
* use the right libp2p commit hash
* remove unused async
* fix inspector
* fix subnet calculation in RPC and insert broadcast attestations into node's pool
* unify codepaths to ensure only mostly-checked-to-be-valid attestations enter the pool, even from node's own broadcasts
* update attestation pool tests for new validateAttestation param
Co-authored-by: Dustin Brody <tersec@users.noreply.github.com>
* fix subnet calculation in RPC and insert broadcast attestations into node's pool
* unify codepaths to ensure only mostly-checked-to-be-valid attestations enter the pool, even from node's own broadcasts
* update attestation pool tests for new validateAttestation param
* make subnet cycling more robust; use one stability subnet/validator; explicitly represent gossip enabled/disabled
* fix asymmetry in _snappy being used for subscriptions but not unsubscriptions
* remove redundant comment
* minimal RPC and VC support for infoming BN of subnets
* create and verify slot signatures in RPC interface and VC
* loosen old slot check
* because Slot + uint64 works but uint64 + Slot doesn't
* document assumptions for head state use; don't clear stability subnets; guard against VC not having checked an epoch ahead, fixing a crash; clarify unsigned comparison
* revert unsub fix
* checkpoint database at end of each slot
To avoid spending time on synchronizing with the file system while doing
processing, the manual checkpointing mode turns off fsync during
processing and instead checkpoints the database when the slot has ended.
From an sqlite perspecitve, in WAL mode this guaranees database
consistency but may lead to data loss which is fine - anything missing
from the beacon chain database can be recovered on the next startup.
* log sync status and delay in slot start message
* bump
* detect already-aggregate-voted condition before attestation pool; add is_aggregator tests
* replace pair of attestation-per-epoch tracking lists with single list and remove Option use
* fix attestation condition
* use safer type conversions; add more is_aggregator tests
* don't lag aggregated attestations by a slot
* don't use aggregation topic at all
* use aggregates again, but with aggressively low ATTESTATION_PROPAGATION_SLOT_RANGE; seems to hold on to LH 1.0 nodes
* clean up scaffolding and double ATTESTATION_PROPAGATION_SLOT_RANGE to 16
* increase ATTESTATION_PROPAGATION_SLOT_RANGE to 24
* increase ATTESTATION_PROPAGATION_SLOT_RANGE to 28 and isolate in only used function due to customization; remove TRAILING_DISTANCE machinery
The key change here is that `addChunksAndGenMerkleProofs` is called
with all pending deposits instead of just the deposits included in
the block. The later was effectively producing merkle proofs against
a different root.
* increase default max peers
also avoid reconnection when opening stream as this might induce
a loop
* Use dial without addresses
* dial back max peers a little
* Revert "Revert "Full "node" RPC calls implementation and fixes to peer lifetime states. (#2065)" (#2082)"
This reverts commit 7cc3dc8027.
* fix nil disconnectedFut crash
* fixes
don't resetPeer, it causes peer miscounts
* disconnect disconnecting peers
...when there's a race.
* avoid connection spamming
* never decrease SeenTable timeout
* only recover ENR for known peers
* seen only when really disconnected
* Handle some web3 timeouts better
* Add support for developer .env files
* Eth1 improvements; Mainnet genesis state
Notable changes:
* The deposits table have been removed from the database. The client
will no longer process all deposits on start-up.
* The network metadata now includes a "state snapshot" of the deposit
contract. This allows the client to skip syncing deposits made prior
to the snapshot (i.e. genesis). Suitable metadata added for Pyrmont
and Mainnet.
* The Eth1 monitor won't be started unless there are validators attached
to the node.
* The genesis detection code is now optional and disabled by default
* Bugfix: The client should not produce blocks that will fail validation
when it hasn't downloaded the latest deposits yet
* Bugfix: Work around the database corruption affecting Pyrmont nodes
* Remove metadata for Toledo and Medalla
* log when database is loading (to avoid confusion)
* generate network keys later during startup
* fix quarantine not scheduling chain of parents for download and
increase size to one epoch
* log validator count, enr and peerid more clearly on startup
* Avoid hangs when wss:// is specified for a non-secure HTTP server
* Produce an ERROR when the web3 provider is unsupported, but still launch the node
This introcudes a cache for block summaries, useful for instantiating
the block dag on startup, bringing medalla startup times down from
minutes to seconds.
This is something of a temporary band-aid that would be obsoleted by a
finalized block store.
This reverts commit 63173ab2c1.
It appears the cluster is having trouble staying connected - since the culprit is unknown, this is a first step on the way to what was stable.
Notably, this does not fully revert libp2p itself, merely the gossip version.
Validators exiting is normal, no need to scream about it
* avoid reallocating seq on big exit queue
* avoid fetching state cache when updating head (it's rarely needed)
* remove incorrectly implemented live validator counts (avoids memory
allocs)
* Add exponential rewind on MissingParent.
* Try to avoid peers which are useless for syncing.
Fix forward sync restart at proper point.
Fix getLocalWallSlot() to not return slots from the future.
* Fix incorrect logs.
* Fix logging.
Enable peer's status messages log on DEBUG level.
* Fix watch task to monitor operation progress, but not local head progress.
* Add more logging information.
Remove recurring failures detection mechanism.
* Concentrate all sensitive writeFile/createPath calls in one place.
Fix eth2_network_simulation for Windows.
* Remove artifacts.
* fix import
Co-authored-by: Jacek Sieka <jacek@status.im>
* Fix continuous sync queue rewinds on slow PCs.
Fix recurring disconnects on low peer score.
* Calculate average syncing speed, not the current one.
Move speed calculation to different task.
* Address review comments.
* Bump nim-eth to get UseDiscv51 flag
* Switch medalla to discovery v5.1, other targets to v5.0
* Bump nim-eth for better discv5.1 logging
* Bump eth2-testnets for updated medalla bootnodes
Calculating rewards/penalties is slow due to how we compute sets of
attestations validators then use the sets for inclusion checks, to see
who attested. The dominant function during validated block processing /
epoch processing is hash set building and lookup.
This PR inverts the flow by removing the sets and creating a single
large validator status list, then applying all relevant state
attestations, then updating rewards and penalties.
This provides a 10x speedup to epoch processing which in turn speeds up
both empty slot and block processing - for example, on startup, we
replay all non-finalized blocks to prime fork choice - the same when
validating attestations or replaying states on reorg.
* misc memory and perf fixes
* use EpochRef for attestation aggregation
* compress effective balances in memory (medalla unfinalized: 4gb ->
1gb)
* avoid hitting db when rewinding to head or clearance state
* avoid hitting db when blocks can be applied to in-memory state -
speeds up startup considerably
* avoid storing epochref in fork choice
* simplify and speed up beacon block creation flow - avoids state reload
thanks to head rewind optimization
* iterator-based committee and attestation participation help avoid lots
of small memory allocations throughout epoch transition (40% speedup on
epoch processing, for example during startup)
* add constant for threshold
* update ve1.0.0-rc.0 preset spec references
* remove runtime preset ETH1_FOLLOW_DISTANCE from preset files; remove two CI build items to try to keep Travis from timing out
* update attestation extended validation to v1.0.0-rc.0
* attestation block and target must be within same epoch
* remove duplicate attestation epoch/target epoch check
* in spec now