Commit Graph

59 Commits

Author SHA1 Message Date
Jacek Sieka 22998fdfd4 avoid double deserialization
When blocks and attestations arrive, they are SSZ-decoded twice: once
for validation and once for processing. This branch enqueues the decoded
block directly for processing, avoiding the second, slow
deserialization.

* move processing of blocks and attestations to queue
* ...and out from beacon_node
* split attestation processing into attestations and aggregates
  * also updates metrics
* clean up logging to better follow the lifetime of gossip: arrival,
validation and processing
* drop attestations and aggregates if there are too many
* try to prioritise blocks and aggregates before single-validator
attestations
2020-08-21 11:46:25 +03:00
Dustin Brody bbc90afa27 fix attestation aggregation broadcasting 2020-08-21 11:32:43 +03:00
Jacek Sieka 46c94a18ba rework epoch cache referencing
* collect all epochrefs in specific blocks to make them easier to find
and to avoid lots of small seqs
* reuse validator key databases more aggressively by comparing keys
* make state cache available from within `withState`
* make epochRef available from within onBlockAdded callback
* integrate getEpochInfo into block resolution and epoch ref logic such
that epochrefs are created when blocks are added to pool or lazily when
needed by a getEpochRef
* fill state cache better from EpochRef, speeding up replay and
validation
* store epochRef in specific blocks to make them easier to find and
reuse
* fix database corruption when state is saved while replaying quarantine
* replay slots fully from block pool before processing state
* compare bls values more smartly
* store epoch state without block applied in database - it's recommended
to resync the node!

this branch will drastically speed up processing in times of long
non-finality, as well as cut memory usage by 10x during the recent
medalla madness.
2020-08-19 10:09:06 +03:00
Jacek Sieka 2a36949913
use epochcache for attesting (#1478) 2020-08-10 15:21:31 +02:00
Jacek Sieka 84a501d1ff
remove one cache, add another (#1449)
* remove one cache, add another

This cache removes the need for rewinding in most attestation validation
flow since the attestations come from one of two epochs and must be
targetting a viable block.

Additionally, it also removes all state caches which are less likely to
be used over-all - more metrics are needed to track the rewinding.

On risk is that when chains don't finalize, we'll have lots of epochrefs
in memory meaning lots of validator key databases, most being exactly
the same. This can be addressed in any number of ways. Some of the
memory usage is mitigated by the fact that we previously had lots of big
state caches and now we're keeping only keys instead.

* cleanups

* doc
2020-08-06 19:48:47 +00:00
Zahary Karadjov b427c7249f More logging and metrics (incoming gossip blocks; outgoing aggregated attestations) 2020-08-05 19:28:35 +03:00
Dustin Brody c142de4b7f be more consistent about pubkeys fed to verify_foo_signature() not being separately initialized, while pubkeys, generally, used for matching purposes, elsewhere explicitly initialized 2020-08-04 23:00:33 +03:00
Jacek Sieka c6674de5d2 use epoch ref to update fork choice
this dramatically speeds up startup in long periods of non-finality
2020-08-04 20:00:31 +03:00
tersec df80071bcf
update attestation and block validation to v0.12.2; clean up getAncestorAt()/get_ancestor() (#1417)
* update attestation validation to v0.12.2; clean up getAncestorAt()/get_ancestor()

* update beacon block validation to v0.12.2
2020-08-03 19:47:42 +00:00
Viktor Kirilov 0a96e5f564
renamed CandidateChains to ChainDagRef and made the Quarantine type a ref type so there is a single instance in the beacon node (#1407) 2020-07-31 14:49:06 +00:00
tersec e0a6f58abe
convert 10 v0.12.1 spec refs to v0.12.2 (#1406) 2020-07-31 09:59:14 +00:00
Viktor Kirilov c032366547
removed the BlockPool type and all of the proxy functions around it (#1401)
* removed the BlockPool type and all of the proxy functions around it - passing the chain DAG and the quarantine explicitly where appropriately - they don't need to be bundled in a type

* fixed the build after the rebase
2020-07-30 21:18:17 +02:00
Jacek Sieka c5fecd472f
more fork-choice fixes (#1388)
* more fork-choice fixes

* use target block/epoch to validate attestations
* make addLocalValidators sync
* add current and previous epoch to cache before doing state transition
* update head state using clearance state as a shortcut, when possible
* use blockslot for fork choice balances
* send attestations using epochref cache

* fix invalid finalized parent being used

also simplify epoch block traversal

* single error handling style in fork choice

* import fix, remove unused async
2020-07-30 17:48:25 +02:00
tersec 99f2d8e06c
update 14 v0.12.1 spec refs to v0.12.2 (#1400) 2020-07-30 09:47:57 +00:00
tersec d97cc35d30
switch 14 v0.12.1 spec refs to v0.12.2 spec refs (#1395) 2020-07-29 12:47:03 +00:00
Jacek Sieka 157ddd2ac4
Fork choice fixes 5 (#1381)
* limit attestations kept in attestation pool

With fork choice updated, the attestation pool only needs to keep track
of attestations that will eventually end up in blocks - we can thus
limit the horizon of attestations that we keep more aggressively.

To get here, we expose getEpochRef which gets metadata about a
particular epochref, and make sure to populate it when a block is added
- this ensures that state rewinds during block addition are minimized.

In addition, we'll use the target root/epoch when validating
attestations - this helps minimize the number of different states that
we need to rewind to, in general.

* remove CandidateChains.justifiedState

unused

* remove BlockPools.Head object

* avoid quadratic quarantine loop

* fix
2020-07-28 13:54:32 +00:00
Jacek Sieka 48cebc7157
spec cleanups (#1379)
* add helper for beacon committee length (used for quickly validating
attestations)
* refactor some attestation checks to do cheap checks early
* validate attestation epoch before computing active validator set
* clean up documentation / comments
* fill state cache on demand
2020-07-27 16:04:44 +00:00
tersec 20a2525390
v0.12.2 beacon chain protocol update (#1378) 2020-07-27 12:59:57 +02:00
Jacek Sieka fd4d319450
Use fork v2 (#1358)
* fork choice fixes, round 3

* introduce checkpoint tracker
* split out fork choice backend that is independent of dag
* correctly update best checkpoint to use for head selection
* correctly consider wall clock when processing attestations
* preload head history only (only one history is loaded from database
anyway)
* love the DAG

* switch to fork choice v2

also remove BlockRef.children

* fix
2020-07-25 21:41:12 +02:00
Jacek Sieka e0a18a3105
cache beacon committee size calculation (#1363)
* cache beacon committee size calculation

this fixes a bug in get_validator_churn_limit as well

* fix

* make committee counts consistently uint64

mixing feels like the worst of the two worlds
2020-07-23 19:01:07 +02:00
Jacek Sieka f0720faf17
Fork choice fixes (#1350)
* remove cruft

* reenable fork choice and fix several issues

* in addForkChoice_v2, the `.error` field would be accessed even when
Result is ok
* remove workaround for invalid block structure in fork choice
* fix `tmpState` being used recursively in callback, causing state
corruption while processing attestation
* fix block callback being called twice per block
* pass state to callback to avoid unnecessary rewinding

* enable head select, fix another bug

* never use `get` without `isOk`
* log nil blockref in case blockref is nil

* add missing error checking

* use correct epoch when updating attestation message
2020-07-22 11:42:55 +02:00
tersec 6b77f3dda5
update compute_subnet_for_attestation() to use https://github.com/ethereum/eth2.0-specs/pull/1876 signature, which isn't in v0.12.1, which works with lookahead (#1346) 2020-07-22 08:04:21 +00:00
Jacek Sieka 8b01284b0e
cache block hash (#1329)
hash_tree_root was turning up when running beacon_node, turns out to be
repeated hash_tree_root invocations - this pr brings them back down to
normal.

this PR caches the root of a block in the SignedBeaconBlock object -
this has the potential downside that even invalid blocks will be hashed
(as part of deserialization) - later, one could imagine delaying this
until checks have passed

there's also some cleanup of the `cat=` logs which were applied randomly
and haphazardly, and to a large degree are duplicated by other
information in the log statements - in particular, topics fulfill the
same role
2020-07-16 15:16:51 +02:00
tersec 26e893ffc2
restore EpochRef and flush statecaches on epoch transitions (#1312)
* restore EpochRef and flush statecaches on epoch transitions

* more targeted cache invalidation

* remove get_empty_per_epoch_cache(); implement simpler but still faster get_beacon_proposer_index()/compute_proposer_index() approach; add some abstraction layer for accessing the shuffled validator indices cache

* reduce integer type conversions

* remove most of rest of integer type conversion in compute_proposer_index()
2020-07-15 12:44:18 +02:00
Zahary Karadjov 93b04bc214 Add an option for graffiti customization 2020-07-12 21:01:31 +03:00
tersec 853bd5b799
quick workaround for epochref cache issue (#1296)
* quick workaround for epochref cache issue

* disable assertion which doesn't work without epochref caches

* get local testnets and altona running again
2020-07-12 17:09:49 +02:00
Mamy André-Ratsimbazafy 344175d1a2
Deactivate dual fork choice 2020-07-10 18:47:48 +02:00
Zahary Karadjov 3ec6a02b12
Merge devel and resolve conflicts 2020-07-10 02:02:40 +03:00
Mamy Ratsimbazafy 3cdae9f6be
Dual headed fork choice [Revolution] (#1238)
* Dual headed fork choice

* fix finalizedEpoch not moving

* reduce fork choice verbosity

* Add failing tests due to pruning

* Properly handle duplicate blocks in sync

* test_block_pool also add a test for duplicate blocks

* comments addressing review

* Fix fork choice v2, was missing integrating block proposed

* remove a spurious debug writeStackTrace

* update block_sim

* Use OrderedTable to ensure that we always load parents before children in fork choice

* Load the DAG data in fork choice at init if there is some (can sync witti)

* Cluster of quarantined blocks were not properly added to the fork choice

* Workaround async gcsafe warnings

* Update blockpoool tests

* Do the callback before clearing the quarantine

* Revert OrderedTable, implement topological sort of DAG, allow forkChoice to be initialized from arbitrary finalized heads

* Make it work with latest devel - Altona readyness

* Add a recovery mechanism when forkchoice desyncs with blockpool

* add the current problematic node to the stack

* Fix rebase indentation bug (but still producing invalid block)

* Fix cache at epoch boundaries and lateBlock addition
2020-07-09 11:29:32 +02:00
Dustin Brody 4140b3b9d9 update 29 spec refs to v0.12.1 2020-07-08 20:49:25 +00:00
Zahary Karadjov 318b225ccd
Merge devel and resolve the conflicts 2020-07-08 15:36:03 +03:00
Viktor Kirilov 1482b0430d - work towards more REST API endpoints being implemented
- testnets can now be launched with a separate validator client - make altona SCRIPT_PARAMS="--separateVC"
- reverted the ctrl+C signal handler code reuse - not necessary for the VC anyway (default is good enough)
- added a bit more logging in the VC
- removed unnecessary code in the VC - connect() just parses the address & port...
- fixed a couple more VC issues - when fetching the duties for an epoch fails on the BN side ==> the VC shouldn't be left in a broken state
- documented the currently supported json-rpc endpoints
- added more checks on the BN side for the API - bounds-checking the requests & also checking if the BN itself is synced
- other cleanup

currently a local sim doesn't finalize, but participation in the altona network with a separate VC is painless and works just as well as with in-process validators in a BN
2020-07-08 13:29:03 +03:00
Zahary Karadjov c4af4e2f35
Working test suite with run-time presets 2020-07-08 02:02:14 +03:00
tersec b4db2ad693
remove v0.11.3 support; add block_sim to CI (#1253)
* remove v0.11.3 support; add block_sim to CI

* rm stray PERSISTENT_COMMITTEE_PERIOD

* remove TopicFilter.InteropAttestations

* bump two comment-spec-refs to v0.12.1
2020-06-29 18:08:58 +00:00
tersec 807b920c19
state_transition implements the spec fairly directly (#1220) 2020-06-23 13:54:24 +00:00
tersec a683656238
send and validate with v0.12.1 attestations (#1213)
* send and validate with v0.12.1 attestations

* use EpochRef instead of empty cache in attestation validation
2020-06-23 10:38:59 +00:00
Dustin Brody db870fead4 update 20 beacon chain protocol spec refs 2020-06-18 08:39:28 +00:00
Dustin Brody ffca27b45f update 24 v0.11.x spec refs to v0.12.1 2020-06-17 12:11:03 +00:00
Jacek Sieka 49e9167b28 clean up dump feature
* don't write blocks that get added to database
* don't write states
* write to folders
* add state dumping feature to `ncli_db` to get any known state from the
database
2020-06-16 13:44:37 +00:00
Jacek Sieka 78b767f645
avoid genericAssign for beacon node types (#1166)
* avoid genericAssign for beacon node types

ok, I got fed up of this function messing up cpu measurements - it's so
ridiculously slow, it's sad.

before, while syncing:

```
40,65%  beacon_node_shared_witti_0  [.]
genericAssignAux__U5DxFPRpHCCZDKWQzM9adaw
   9,02%  libc-2.31.so                [.] __memmove_avx_unaligned_erms
   7,07%  beacon_node_shared_witti_0  [.] BIG_384_58_monty
   5,19%  beacon_node_shared_witti_0  [.] BIG_384_58_mul
   2,72%  beacon_node_shared_witti_0  [.] memcpy@plt
   1,18%  [kernel]                    [k] rb_next
   1,17%  beacon_node_shared_witti_0  [.] genericReset
   1,06%  [kernel]                    [k] map_private_extent_buffer
```

after:

```
  24,88%  beacon_node_shared_witti_0  [.] BIG_384_58_monty
  20,29%  beacon_node_shared_witti_0  [.] BIG_384_58_mul
   3,15%  beacon_node_shared_witti_0  [.] BIG_384_58_norm
   2,93%  beacon_node_shared_witti_0  [.] BIG_384_58_add
   2,55%  beacon_node_shared_witti_0  [.] BIG_384_58_sqr
   1,64%  beacon_node_shared_witti_0  [.] BIG_384_58_mod
1,63%  beacon_node_shared_witti_0  [.]
sha256Transform__BJNBQtWr9bJwzqbyfKXd38Q
   1,48%  beacon_node_shared_witti_0  [.] FP_BLS381_add
   1,39%  beacon_node_shared_witti_0  [.] BIG_384_58_sub
   1,33%  beacon_node_shared_witti_0  [.] BIG_384_58_dnorm
   1,14%  beacon_node_shared_witti_0  [.] FP2_BLS381_mul
   1,05%  beacon_node_shared_witti_0  [.] BIG_384_58_cmove
1,05%  beacon_node_shared_witti_0  [.]
get_shuffled_seq__4uncAHNsSG3Pndo5H11U9aQ
```

* better field iteration
2020-06-12 21:10:22 +02:00
Zahary Karadjov cf6a869e9e Address some TODO items; Handle start-up before genesis more properly 2020-06-11 17:40:08 +03:00
Zahary Karadjov c773e10c1a Attempt to reduce the risk of dropped network connections during the loading of KeyStores 2020-06-11 17:40:08 +03:00
Zahary Karadjov fdaf419e41 Address review comments 2020-06-11 17:40:08 +03:00
Zahary Karadjov 2acda1c115 Provide a default value for secretsDir (similar to validatorsDir) 2020-06-11 17:40:08 +03:00
Zahary Karadjov a75c632f7a Fixed launch_local_testnet; Renamed validator_keygen to keystore_directories 2020-06-11 17:40:08 +03:00
Zahary Karadjov 17343442ea Implement more of the KeyStore spec and integrate it in the beacon node 2020-06-11 17:40:08 +03:00
Viktor Kirilov 3bae40ae91 - extracted the commands to run a VC into a separate run_validator.sh script
- each BN gets half of its previous validators as inProcess and the other half goes to the respective VC for that BN - using separate data dirs where the keys are copied
    - also removed a few command line options which are no longer necessary
- block proposals originating from a VC are propagated from one BN to the rest properly
- other cleanup & moving code back to  since it is no longer used elsewhere
2020-06-10 13:50:50 +03:00
Mamy Ratsimbazafy ce897fe83f
[Split fork choice PR] Derisk-ed attestation checks changes (#1154)
* Derisked attestation pool improvements

* tune down frequent logs

* VoteTracker logging
2020-06-10 08:58:12 +02:00
Jacek Sieka 68b5638da4 allow non-power-of-2 limits in hashlist
fixes "make eth2_network_simulation"

a bit sad: no test coverage except against our own tests
2020-06-05 14:51:22 +03:00
Dustin Brody 3cb7896bab 12x speedup on state sim with 100k validators sans BLS by caching get_beacon_proposer_index(...) 2020-06-04 17:07:51 +00:00