Commit Graph

120 Commits

Author SHA1 Message Date
tersec e661f7d0c7
prevent uint64 to int64-induced RangeError/RangeDefects in metrics (#2358)
* prevent uint64 to int64-induced RangeError/RangeDefects in metrics

* remove redundant min(foo, int64.high)

* adjust spacing to be consistent
2021-03-01 20:55:25 +01:00
Jacek Sieka 3f8764ee61
fix replays stalling processing (#2361)
* fix replays stalling processing

Occasionally, attestations will arrive that vote for a target derived
either from the finalized block or earlier. In these cases, Nimbus would
replay the state transition of up to 32 epochs worth of blocks because
the finalized state has been pruned, delaying other processing and
leading to poor inclusion distance.

* put cheap attestation checks before forming EpochRef
* check that attestation target is not from an unviable history with
regards to finalization
* fix overly aggressive state pruning removing the state close to the
finalized checkpoint resulting in rare long replays for valid
attestations
* log long replays
* harden logging and traversal of nil BlockSlot

* simplify target check

no need to lookup target in chain dag again

* fixup

* fixup
2021-03-01 20:50:43 +01:00
tersec 97f7284e51
bump spec refs from v1.0.0 to v1.0.1 and update copyright years (#2357) 2021-02-25 13:37:22 +00:00
tersec 8d25663681
remove several IntSet usages in lieu of seq[ValidatorIndex] (#2288)
* remove several IntSet usages in lieu of seq[ValidatorIndex]

* convert smaller types to larger types

* larger type, again
2021-02-08 08:27:30 +01:00
tersec 1bdbf099cc
use IntSet rather than HashSet[ValidatorIndex] (#2267)
* use IntSet rather than HashSet[ValidatorIndex]

* add bounds check before uint64 -> int conversion

* use intsets in block transitions

* remove superfluous Nim issue explanation/reference
2021-01-26 12:52:00 +01:00
Mamy Ratsimbazafy 70a03658e3
Block validation flow v2 + Batch (serial) sig verification (#2250)
* bump nim-blscurve

* Outline the block validation flow

* introduce the SigVerified types, pass the tests

* Split clearance/quarantine to prepare for batch crypto verif

* Add a batch signature collector

* Make clearance use SigVerified block and split verification between crypto and state transition

* Always use signedBeaconBlock for the onBlockAdded callback

* RANDAO signing_root is the epoch instead of the full block

* Support skipping BLS for testing

* Fix compilation of the validator client

* Try to fix strange errors MacOS and Jenkins (Clang, unknown type name br_hmac_drbg_context in stdlib_assertions.nim.c)

* address https://github.com/status-im/nimbus-eth2/pull/2250#discussion_r561819858

* address https://github.com/status-im/nimbus-eth2/pull/2250#discussion_r561828025

* onBlockAdded callback should use TrustedSignedBeaconBlock https://github.com/status-im/nimbus-eth2/pull/2250#discussion_r561837261

* address https://github.com/status-im/nimbus-eth2/pull/2250#discussion_r561828946

* Use the application RNG: https://github.com/status-im/nimbus-eth2/pull/2250#discussion_r561815336

* Improve codegen of conversion zero-cost)

* Quick fixes with loadWithCache after #2259 (TODO: graceful error since pubkey validations is now done first in signatures_batch)

* Graceful handle rogue pubkeys and signatures now that those are lazy-loaded
2021-01-25 20:45:48 +02:00
Jacek Sieka 5ca10d3de2 avoid copying keys for trusted signatures
this removes `is_valid_indexed_attestion` from performance benchmarks
when replaying blocks
2021-01-25 19:48:02 +02:00
Jacek Sieka 5713a3ce4c
performance fixes (#2259)
* performance fixes

* don't mark tree cache as dirty on read-only List accesses
* store only blob in memory for keys and signatures, parse blob lazily
* compare public keys by blob instead of parsing / converting to raw
* compare Eth2Digest using non-constant-time comparison
* avoid some unnecessary validator copying

This branch will in particular speed up deposit processing which has
been slowing down block replay.

Pre (mainnet, 1600 blocks):

```
All time are ms
     Average,       StdDev,          Min,          Max,      Samples,         Test
Validation is turned off meaning that no BLS operations are performed
    3450.269,        0.000,     3450.269,     3450.269,            1, Initialize DB
       0.417,        0.822,        0.036,       21.098,         1400, Load block from database
      16.521,        0.000,       16.521,       16.521,            1, Load state from database
      27.906,       50.846,        8.104,     1507.633,         1350, Apply block
      52.617,       37.029,       20.640,      135.938,           50, Apply epoch block
```

Post:

```
    3502.715,        0.000,     3502.715,     3502.715,            1, Initialize DB
       0.080,        0.560,        0.035,       21.015,         1400, Load block from database
      17.595,        0.000,       17.595,       17.595,            1, Load state from database
      15.706,       11.028,        8.300,      107.537,         1350, Apply block
      33.217,       12.622,       17.331,       60.580,           50, Apply epoch block
```

* more perf fixes

* load EpochRef cache into StateCache more aggressively
* point out security concern with public key cache
* reuse proposer index from state when processing block
* avoid genericAssign in a few more places
* don't parse key when signature is unparseable
* fix `==` overload for Eth2Digest
* preallocate validator list when getting active validators
* speed up proposer index calculation a little bit
* reuse cache when replaying blocks in ncli_db
* avoid a few more copying loops

```
     Average,       StdDev,          Min,          Max,      Samples,         Test
Validation is turned off meaning that no BLS operations are performed
    3279.158,        0.000,     3279.158,     3279.158,            1, Initialize DB
       0.072,        0.357,        0.035,       13.400,         1400, Load block from database
      17.295,        0.000,       17.295,       17.295,            1, Load state from database
       5.918,        9.896,        0.198,       98.028,         1350, Apply block
      15.888,       10.951,        7.902,       39.535,           50, Apply epoch block
       0.000,        0.000,        0.000,        0.000,            0, Database block store
```

* clear full balance cache before processing rewards and penalties

```
All time are ms
     Average,       StdDev,          Min,          Max,      Samples,         Test
Validation is turned off meaning that no BLS operations are performed
    3947.901,        0.000,     3947.901,     3947.901,            1, Initialize DB
       0.124,        0.506,        0.026,      202.370,       363345, Load block from database
      97.614,        0.000,       97.614,       97.614,            1, Load state from database
       0.186,        0.188,        0.012,       99.561,       357262, Advance slot, non-epoch
      14.161,        5.966,        1.099,      395.511,        11524, Advance slot, epoch
       1.372,        4.170,        0.017,      276.401,       363345, Apply block, no slot processing
       0.000,        0.000,        0.000,        0.000,            0, Database block store
```
2021-01-25 13:04:18 +01:00
Jacek Sieka 7d5edb4353
use new stew helpers for assignment (#2172)
* bump libp2p (reduces libp2p gossip memory usage to ~1/3)
* use "generic" assign version
2020-12-16 09:37:22 +01:00
tersec 8d1443f03c
detect already-aggregate-voted condition before attestation pool; add is_aggregator tests (#2170)
* detect already-aggregate-voted condition before attestation pool; add is_aggregator tests

* replace pair of attestation-per-epoch tracking lists with single list and remove Option use

* fix attestation condition

* use safer type conversions; add more is_aggregator tests
2020-12-14 20:58:32 +00:00
Jacek Sieka e7f2735271
fix broken metrics during replay (#2090)
* move metrics out of state transition
* add validator count metric
* remove expensive beacon_current_validators, beacon_previous_validators
metrics (they should be reimplemented with cache), add cheap
beacon_active_validators to approximate
* remove unused validator count metrics
* tidy imports/defects
2020-11-27 23:16:13 +01:00
zah 372c9b798c
Fix the corrupted database state on Pyrmont nodes; Add mainnet genesis (#2056)
* Handle some web3 timeouts better

* Add support for developer .env files

* Eth1 improvements; Mainnet genesis state

Notable changes:

* The deposits table have been removed from the database. The client
  will no longer process all deposits on start-up.

* The network metadata now includes a "state snapshot" of the deposit
  contract. This allows the client to skip syncing deposits made prior
  to the snapshot (i.e. genesis). Suitable metadata added for Pyrmont
  and Mainnet.

* The Eth1 monitor won't be started unless there are validators attached
  to the node.

* The genesis detection code is now optional and disabled by default

* Bugfix: The client should not produce blocks that will fail validation
  when it hasn't downloaded the latest deposits yet

* Bugfix: Work around the database corruption affecting Pyrmont nodes

* Remove metadata for Toledo and Medalla
2020-11-24 22:21:47 +01:00
Jacek Sieka dbcc0686ff delay pruning of cache for finalized epoch (fixes #2049) 2020-11-20 20:57:50 +02:00
Jacek Sieka a6b188bfd4
misc fixes (#2027)
* log when database is loading (to avoid confusion)
* generate network keys later during startup
* fix quarantine not scheduling chain of parents for download and
increase size to one epoch
* log validator count, enr and peerid more clearly on startup
2020-11-16 20:15:43 +01:00
tersec 21c4ce8fd4
remove superfluous TODOs/not-really-TODOs, type conversion, imports (#2025) 2020-11-16 17:10:51 +01:00
Zahary Karadjov 389c11743a
Review TODO items and self-assign the most important ones 2020-11-10 20:41:04 +02:00
Jacek Sieka 6e40ca5e15 Use separate cache for getting epochref
...else tmpState gets overwritten when producing blocks and requesting
epochRef for finalized state.
2020-11-10 17:40:32 +02:00
tersec 271df8b604
bump 1.0.0rc-0 spec refs to 1.0.0 (#1974) 2020-11-09 14:18:55 +00:00
Jacek Sieka 95f5f76180
Datatype cleanup (#1953)
* clear up spec todo

* test fix

* remove unnecessary toSszType

* type

* one more
2020-11-04 21:52:47 +00:00
Jacek Sieka fc7885b27e Store block summary in database
This introcudes a cache for block summaries, useful for instantiating
the block dag on startup, bringing medalla startup times down from
minutes to seconds.

This is something of a temporary band-aid that would be obsoleted by a
finalized block store.
2020-11-04 11:28:55 +02:00
Zahary Karadjov 18639c3eff Don't require requests that might fail on non-archive Geth nodes 2020-11-03 23:23:10 +02:00
Jacek Sieka ca63d48b82
tone down validator exit logging (#1939)
Validators exiting is normal, no need to scream about it

* avoid reallocating seq on big exit queue
* avoid fetching state cache when updating head (it's rarely needed)
* remove incorrectly implemented live validator counts (avoids memory
allocs)
2020-11-02 18:34:23 +01:00
tersec 19c51e5900
use MAXIMUM_GOSSIP_CLOCK_DISPARITY for BeaconBlock verification (#1918) 2020-10-28 14:04:21 +01:00
Jacek Sieka ee2ebc96a8
move rpc api to own folder, mimic upstream structure (#1905)
also implements a few more endpoints
2020-10-27 10:00:57 +01:00
Jacek Sieka 499e5ca991
misc memory and perf fixes (#1899)
* misc memory and perf fixes

* use EpochRef for attestation aggregation
* compress effective balances in memory (medalla unfinalized: 4gb ->
1gb)
* avoid hitting db when rewinding to head or clearance state
* avoid hitting db when blocks can be applied to in-memory state -
speeds up startup considerably
* avoid storing epochref in fork choice
* simplify and speed up beacon block creation flow - avoids state reload
thanks to head rewind optimization
* iterator-based committee and attestation participation help avoid lots
of small memory allocations throughout epoch transition (40% speedup on
epoch processing, for example during startup)

* add constant for threshold
2020-10-22 12:53:33 +02:00
tersec a136c2e95a
bump libp2p; integrate pubsub.ValidationResult into extended validation (#1893) 2020-10-20 12:31:20 +00:00
tersec 5b13a5ba10
update attestation extended validation to v1.0.0-rc.0 (#1889)
* update attestation extended validation to v1.0.0-rc.0

* attestation block and target must be within same epoch

* remove duplicate attestation epoch/target epoch check

* in spec now
2020-10-19 09:25:06 +00:00
Jacek Sieka df43b8aa8b
save some more states after all (#1887)
Don't save states when replaying history, but do save states when
applying new blocks (!)
2020-10-18 15:47:39 +00:00
Zahary Karadjov 7a577b2cef More tests for getBlockRange 2020-10-15 20:15:51 +03:00
Zahary Karadjov 080609eee1 Address #1366 Avoid uint64 overflow in getBlockRange when skipStep is large 2020-10-15 20:15:51 +03:00
Jacek Sieka 6b9419e547
fix db growth on attestation processing (#1860)
It turns out that we often save lots of states in the database that are
the result of empty slot processing only - here, we make sure to only
save a state if a block follows - this fixes several issues:

* empty slot states are not always pruned leading to state database size
explosion
* storing states is (very) slow which slows down processing in general,
so we should only do it when it's likely to be useful
* attestation processing doesn't get stuck on saving random states that
won't appear in the chain history
2020-10-15 14:28:44 +02:00
tersec 3ee2dd8da4
p2p-interface spec ref bump (except non-updated places) (#1862) 2020-10-12 14:37:14 +00:00
tersec 1994ffe5a0
update 130+ spec references from v0.12.3 to v1.0.0-rc1 (#1854) 2020-10-12 08:59:24 +00:00
Zahary Karadjov deaddc1fc0 Address review comments 2020-10-07 09:32:03 +03:00
Zahary Karadjov aed291128a Add support for starting from weak subjectivity checkpoints
Also removes the `genesis.ssz` file stored in the data folder.
The `medalla-fast-sync` target has been adapted to use the new features.
2020-10-07 09:32:03 +03:00
Mamy Ratsimbazafy 0280d6c73e
Revisiting log levels (#1788)
* Update log level - https://github.com/status-im/nim-beacon-chain/issues/1779 https://github.com/status-im/nim-beacon-chain/issues/1785

* Address review comments

* Document the logging strategy [skip ci]
2020-10-01 20:56:42 +02:00
tersec 7eaaab908c
fix output of proposer slashing test fixture (#1780)
* fix output of proposer slashing test fixture

* run make test

* a few more v0.12.3 spec refs
2020-09-30 13:12:03 +00:00
tersec f96ad87d28
switch another 50+ spec refs from v0.12.2 to v0.12.3 (#1749) 2020-09-25 11:52:50 +00:00
tersec 6398a43cc1
update 120+ beacon_chain and validator spec refs from v0.12.2 to v0.12.3 (#1740) 2020-09-24 19:04:10 +02:00
tersec 0eb53f2802
avoid unpacking phase 1 tests to reduce Azure CI disk usage (#1736) 2020-09-24 17:16:00 +02:00
Jacek Sieka f0dbebfd3f
avoid storing empty slot states (#1720)
with the improved empty slot processing, these provide relatively little
benefit, but take up lots of storage that's difficult to free
2020-09-24 09:02:03 +02:00
tersec e106549efe
keep REJECT/IGNORE of messages failing validation for libp2p scoring (#1676)
* keep REJECT/IGNORE status of messages failing validation for libp2p scoring

* fix test suite
2020-09-18 13:53:09 +02:00
Jacek Sieka c76305f824
fix some todo (#1645)
* remove some superfluous gcsafes
* remove getTailState (unused)
* don't store old epochrefs in blocks
* document attestation pool a bit
* remove `pcs =` cruft from log
2020-09-14 14:50:03 +00:00
tersec 48893f1c2e
add ncli_db subcommand to prune database of unnecessary blocks and states (#1593)
* add ncli_db subcommand to prune database of unnecessary blocks, states, and state roots

* tweak comments

* reduce default aggressiveness in pruning old states

* move copyPrunedDatabase() to ncli_db, as it's not generally useful as part of beacon_chain_db and doesn't use any internal interfaces
2020-09-11 15:20:34 +02:00
Jacek Sieka 8a5a261fcd
Quick fix to prune some states, pending smarter state storage (#1624)
* Quick fix to prune some states, pending smarter state storage

Adverse effects might include slow rewinds - typically the protocol
doesn't ask for pre-finalized states but RPC might

* document issue, add test

* fix cache miss log
2020-09-11 10:03:50 +02:00
Jacek Sieka dde26f359f
better state cache reuse (#1612) 2020-09-08 09:23:48 +02:00
tersec 3d5f24f14c
stop discarding future epochs; remove a StateCache() construction (#1610)
* stop discarding non-existent future epochs during epoch state transitions; remove a pointless StateCache() construction in advance_slots()

* update nbench to pass StateCache to process_slots()
2020-09-07 15:04:33 +00:00
Jacek Sieka d584591ded
simplify libp2p logging (#1605)
and a few other small logging fixes
2020-09-06 10:39:25 +02:00
tersec 456bdc87cd
address issue #1552 (#1601) 2020-09-04 08:39:46 +02:00
tersec ab255662df
bound block quarantine size (#1564)
* bound block quarantine size

* add additional logging for block quarantining

* re-add quarantine.add() call

* remove pre-finalization blocks; add logging for full quarantine

* clear quarantine on chain reorganization

* update block_sim and tests

* update test_attestation_pool
2020-08-31 11:00:38 +02:00