Commit Graph

133 Commits

Author SHA1 Message Date
Jacek Sieka 157ddd2ac4
Fork choice fixes 5 (#1381)
* limit attestations kept in attestation pool

With fork choice updated, the attestation pool only needs to keep track
of attestations that will eventually end up in blocks - we can thus
limit the horizon of attestations that we keep more aggressively.

To get here, we expose getEpochRef which gets metadata about a
particular epochref, and make sure to populate it when a block is added
- this ensures that state rewinds during block addition are minimized.

In addition, we'll use the target root/epoch when validating
attestations - this helps minimize the number of different states that
we need to rewind to, in general.

* remove CandidateChains.justifiedState

unused

* remove BlockPools.Head object

* avoid quadratic quarantine loop

* fix
2020-07-28 13:54:32 +00:00
Jacek Sieka 48cebc7157
spec cleanups (#1379)
* add helper for beacon committee length (used for quickly validating
attestations)
* refactor some attestation checks to do cheap checks early
* validate attestation epoch before computing active validator set
* clean up documentation / comments
* fill state cache on demand
2020-07-27 16:04:44 +00:00
tersec 90708a8287
Prefer converting int` to uint64 and switch foo.len.uint64 to .len64 (#1375)
* avoid converting from uint64 to int, and where most feasible, int type conversion at all

* .len.uint64 -> .len64

* fix 32-bit compilation

* try keeping state_sim loop variable/bounds as int for 32-bit Azure

* len64 -> lenu64
2020-07-26 20:55:48 +02:00
Jacek Sieka fd4d319450
Use fork v2 (#1358)
* fork choice fixes, round 3

* introduce checkpoint tracker
* split out fork choice backend that is independent of dag
* correctly update best checkpoint to use for head selection
* correctly consider wall clock when processing attestations
* preload head history only (only one history is loaded from database
anyway)
* love the DAG

* switch to fork choice v2

also remove BlockRef.children

* fix
2020-07-25 21:41:12 +02:00
Jacek Sieka e0a18a3105
cache beacon committee size calculation (#1363)
* cache beacon committee size calculation

this fixes a bug in get_validator_churn_limit as well

* fix

* make committee counts consistently uint64

mixing feels like the worst of the two worlds
2020-07-23 19:01:07 +02:00
Jacek Sieka f0720faf17
Fork choice fixes (#1350)
* remove cruft

* reenable fork choice and fix several issues

* in addForkChoice_v2, the `.error` field would be accessed even when
Result is ok
* remove workaround for invalid block structure in fork choice
* fix `tmpState` being used recursively in callback, causing state
corruption while processing attestation
* fix block callback being called twice per block
* pass state to callback to avoid unnecessary rewinding

* enable head select, fix another bug

* never use `get` without `isOk`
* log nil blockref in case blockref is nil

* add missing error checking

* use correct epoch when updating attestation message
2020-07-22 11:42:55 +02:00
tersec 4a9a7be271
faster syncing (#1348)
* maybe faster syncing

* 80-character lines

* remove instrumentation debugEchos; fix target attestation epoch in attestation pool validation

* use the epoch-granularity matching in attestation.addResolved(...)
2020-07-22 09:51:45 +02:00
Jacek Sieka 8b01284b0e
cache block hash (#1329)
hash_tree_root was turning up when running beacon_node, turns out to be
repeated hash_tree_root invocations - this pr brings them back down to
normal.

this PR caches the root of a block in the SignedBeaconBlock object -
this has the potential downside that even invalid blocks will be hashed
(as part of deserialization) - later, one could imagine delaying this
until checks have passed

there's also some cleanup of the `cat=` logs which were applied randomly
and haphazardly, and to a large degree are duplicated by other
information in the log statements - in particular, topics fulfill the
same role
2020-07-16 15:16:51 +02:00
Zahary Karadjov 3ec6a02b12
Merge devel and resolve conflicts 2020-07-10 02:02:40 +03:00
Mamy Ratsimbazafy 3cdae9f6be
Dual headed fork choice [Revolution] (#1238)
* Dual headed fork choice

* fix finalizedEpoch not moving

* reduce fork choice verbosity

* Add failing tests due to pruning

* Properly handle duplicate blocks in sync

* test_block_pool also add a test for duplicate blocks

* comments addressing review

* Fix fork choice v2, was missing integrating block proposed

* remove a spurious debug writeStackTrace

* update block_sim

* Use OrderedTable to ensure that we always load parents before children in fork choice

* Load the DAG data in fork choice at init if there is some (can sync witti)

* Cluster of quarantined blocks were not properly added to the fork choice

* Workaround async gcsafe warnings

* Update blockpoool tests

* Do the callback before clearing the quarantine

* Revert OrderedTable, implement topological sort of DAG, allow forkChoice to be initialized from arbitrary finalized heads

* Make it work with latest devel - Altona readyness

* Add a recovery mechanism when forkchoice desyncs with blockpool

* add the current problematic node to the stack

* Fix rebase indentation bug (but still producing invalid block)

* Fix cache at epoch boundaries and lateBlock addition
2020-07-09 11:29:32 +02:00
Zahary Karadjov c4af4e2f35
Working test suite with run-time presets 2020-07-08 02:02:14 +03:00
Jacek Sieka 1301600341
Trusted blocks (#1227)
* cleanups

* fix ncli state root check flag
* add block dump to ncli_db
* limit ncli_db benchmark length
* tone down finalization logs

* introduce trusted blocks

We only store blocks whose signature we've verified in the database - as
such, there's no need to check it again, and most importantly, no need
to deserialize the signature when loading from database.

50x startup time improvement, 200x block load time improvement.

* fix rewinding when deposits have invalid signature
* speed up ancestor iteration by avoiding copy
* avoid deserializing signatures for trusted data
* load blocks lazily when rewinding (less memory used)

* chronicles workarounds

* document trustedbeaconblock
2020-06-25 12:23:10 +02:00
Mamy Ratsimbazafy 902093f57c
Revert "Dual headed fork choice [Reloaded] (#1223)" (#1234)
This reverts commit 6836d41ebd.
2020-06-25 11:36:03 +02:00
Mamy Ratsimbazafy 6836d41ebd
Dual headed fork choice [Reloaded] (#1223)
* Dual headed fork choice

* fix finalizedEpoch not moving

* reduce fork choice verbosity

* Add failing tests due to pruning

* Properly handle duplicate blocks in sync

* test_block_pool also add a test for duplicate blocks

* comments addressing review
2020-06-24 20:24:36 +02:00
tersec a683656238
send and validate with v0.12.1 attestations (#1213)
* send and validate with v0.12.1 attestations

* use EpochRef instead of empty cache in attestation validation
2020-06-23 10:38:59 +00:00
Eugene Kabanov f60235b3e9
Attestation validator now populates list of missing blocks. (#1211) 2020-06-23 11:29:08 +02:00
Jacek Sieka 49e9167b28 clean up dump feature
* don't write blocks that get added to database
* don't write states
* write to folders
* add state dumping feature to `ncli_db` to get any known state from the
database
2020-06-16 13:44:37 +00:00
Jacek Sieka 061be037d1 ncli_db: database tool
includes a benchmark tool for now
2020-05-28 17:43:02 +00:00
Jacek Sieka f06df1cea6 remove some copies
* in makeBeaconBlock - use rollback instead
* in tests - this helps state_sim give more accurate data and makes it
30% faster
* fix some usages of raw BeaconState
2020-05-22 17:15:35 +00:00
Jacek Sieka 7fbb8c0bc2
return block result details (#1049) 2020-05-21 19:08:31 +02:00
Mamy Ratsimbazafy c014f0b301
Split quarantine (#1038)
* split blockpool into hotDB and Quarantine

* Rename hotdb -> dag/candidate chains
2020-05-19 16:18:07 +02:00
tersec 74db0f3c8d
fix some XDeclaredButNotUsed hints (#1027) 2020-05-15 14:41:00 +02:00
Jacek Sieka a605c7244e simplify libp2p snappy
* handle a few more exceptions gracefully (in libp2p also)
* unify libp2p varint parsing
* decompress directly into seq
* avoid seq slice
* stop oversized snappy processing earlier (lowers risk)
2020-05-14 16:41:19 +03:00
tersec ba1d7e2ed4
switch state cache to use ref statedata objects to limit memory usage (#1007)
* switch state cache to use ref statedata objects to limit memory usage

* more directly initialize ref StateData

* use HashedBeaconState instead of StateData to try to fix memory leak

* switch cache to seq[ref HashedBeaconState]

* remove unused import

Co-authored-by: Ștefan Talpalaru <stefantalpalaru@yahoo.com>
2020-05-12 16:26:58 +00:00
Ștefan Talpalaru a7a50824a1
more metrics (#1004) 2020-05-11 06:25:49 +00:00
tersec c498103b2f
quick/minimal mitigation of beacon_node memory usage resulting from 2*Table.defaultInitialSize pointless BeaconState objects in block pool state cache (#1002) 2020-05-10 16:31:55 +00:00
tersec 093d298b2b
Increase finalization and finalization checking robustness (#990)
* fix some warnings related to beacon_node splitting; reimplement finalization verification more robustly; improve attestation pool block selection logic

* re-add missing import

* whitelist allowed state transition flags and make rollback/restore naming more consistent

* restore usage of update flags passed into skipAndUpdateState(...) in addition to the potential verifyFinalization flag

* switch rest of rollback -> restore
2020-05-09 12:43:15 +00:00
Ștefan Talpalaru c572f61129
bump vendor/nim-metrics 2020-05-09 01:13:57 +02:00
Jacek Sieka 01e9df97cb
cleanups (#962)
* remove broken serialized_sizes
* actually use sszdump module
* avoid bitops
* fix stack_sizes module name
2020-05-04 07:38:14 +02:00
Jacek Sieka 2449d4b479
cache empty slot state root (#961)
When replaying state transitions, for the slots that have a block, the
state root is taken from the block. For slots that lack a block, it's
currently calculated using hash_tree_root which is expensive.

Caching the empty slot state roots helps us avoid recalculating this
hash, meaning that for replay, hashes are never calculated. This turns
blocks into fairly lightweight "state-diffs"!

* avoid re-saving state when replaying blocks
* advance empty slots slot-by-slot and save root
* fix sim randomness
* fix sim genesis filename
* introduce `isEpoch` to check if a slot is an epoch slot
2020-05-03 19:44:04 +02:00
Jacek Sieka a3e098cf92
block pool simulator (#956)
* block pool simulator

like state_sim, but more
2020-05-01 17:51:24 +02:00
tersec 57aba5d3a6
Switch block pool caches from BeaconChainDB to TableRefs (#945)
* refactor blook pool caches to directly use TableRef to avoid SSZ decoding, which was consuming 20% of profile on mainnet eth2_network_simulation

* use table's hasKeyOrPut

* bump eth2 spec reference to v0.11.1

* cache whole StateData objects and switch from expensive clear() to cheaper new object instantiation for caching

* remove scaffolding and stop re-assigning to part of StateData object

* 80-character lines
2020-04-29 16:58:44 +02:00
Jacek Sieka 28d6cd2524
avoid memory allocations and copies when loading states (#942)
* rolls back some of the ref changes
* adds utility to calculate stack sizes
* works around bugs in nim exception handling and rvo
2020-04-28 10:08:32 +02:00
Jacek Sieka 898df9ba45
kvstore: port to nim-eth (#938) 2020-04-27 18:36:28 +02:00
tersec 7790644e52
remove a pointless hash_tree_root(BeaconState) per node per proposed block (#933)
* remove a pointless hash_tree_root(BeaconBlock)

* use ref with putState
2020-04-27 12:47:49 +02:00
Jacek Sieka 03a147ab8d
avoid state copy in state transition (#930)
In BlockPool, we keep the head state around, so it's trivial to restore
the temporary state there and keep going as if nothing happened.

This solves 3 problems:
* stack space - the state copy on mainnet is huge
* GC scanning - using stack space for state slows down the GC
significantly
* reckless copying - the copy itself takes a long time

In state_sim, we'll do the same and allocate on heap - this helps a
little with GC - without it, the collection of the temporary strings
created with `toHex` while printing the json dominates the trace.
2020-04-26 21:13:33 +02:00
Zahary Karadjov 4e9fa51ae9 Introduce BeaconNodeRef and use it in all the right places 2020-04-26 13:04:53 +03:00
Zahary Karadjov fdcbfdff05 Pass the test suite with a BeaconState ref type 2020-04-26 13:04:53 +03:00
Jacek Sieka 494ffb63ce
eh fixes (#926)
* work around improbable exceptions in metrics / chronos
* fix unnecessary lookup in block pool
2020-04-24 09:16:11 +02:00
tersec 7e94482409
avoid rewindStateData/state_transition in fast-path (#919)
* avoid rewindStateData/state_transition totally in fast-pathed updateStateData calls

* document motivation behind BlockPool.getStateDataCached(...) and clarify naming
2020-04-23 11:59:29 +02:00
Jacek Sieka 6729d3c032
kvstore: fix raising, be harsher on database errors (#923)
* kvstore: fix raising, be harsher on database errors

* bump stew/serialization
2020-04-23 08:27:35 +02:00
Jacek Sieka 65ca74c980 req: cap requested blocks better
also cap blocks in roots request
2020-04-22 12:09:26 +03:00
tersec bdea75e4c3
Restore all blockpool extended validation checks (#886)
* fix remaining block pool extended validation issues and re-enable first-block-received and block-signature EV checks; enable Merkle validation in beacon_node in eth2_network_simulation; refactor some Merkle proof generation code outside tests/ as a result

* re-enable Merkle validation skipping, since while it works on make eth2_network_simulation, it has issues with local testnet

* tighten already-seen-block blockpool check; move comment closer to conceptually proximate code; queue up maybe-future-valid-blocks as pending to keep libp2p-synchronous interrupt handling time lower

* revert the cleanups, now in a separate PR

* remove the remaining merkle_minimal cleanup remnants, also moved to other PR

* restore PR to only modifying one file after rebasing

* use signatures as summary to compare block contents

* switch signature comparison to be raw byte-wise to ensure no attempts to deserialize it to valid (or not) BLS signatures first
2020-04-21 18:52:53 +02:00
Jacek Sieka da988e4e40 more block range request updates
* handle skip == 0 gracefully
* avoid memory allocation at expense of more complex API
* add more tests
* log block request results
2020-04-21 11:07:27 +03:00
Jacek Sieka fccce85f0d simplify block request
this matches the intended spec behaviour
2020-04-21 11:07:27 +03:00
tersec 8a72ae89b9
fix mainnet finalization (#906)
* fix mainnet finalization and swith eth2_network_simulation to a kind of small-mainnet profile

* Fix slot reference in trace logging

* bump a couple of spec refs from v0.11.0 to v0.11.1

* bump another spec ref to v0.11.1, one more try at Jenkins test vector download CI issue

* fix other slot reference in trace logging and skip past single-block/multi-slot gaps to re-approach from ancestry side by state_transitioning, by requiring exact match on both root hash and slot for fast path

* make more precise the fast path condition

* redo logic to make uniform with BeaconChainDB; fix chronos deprecation warning

* revert not-working replacement of deprecated chronos futures `or`

* switch testnet1 to mainnet
2020-04-20 19:27:52 +02:00
tersec d25d674502
Remove more warnings, both deprecations and unused imports (#884)
* fix warnings by switching from deprecated chronos API addTimer(...) to setTimer(...) and removing especially some unnecessary chronicles and extras imports from test suite modules

* update a couple v0.10.1 spec references to v0.11.1
2020-04-11 19:41:50 +02:00
Jacek Sieka 0f47c76b50
reenable block bls verification (#875)
* finalizing state_transition

* cleanup

Co-authored-by: Dustin Brody <tersec@users.noreply.github.com>
2020-04-09 09:41:02 +00:00
Dustin Brody 2771deadfc re-add test_interop to all_tests and mark several v0.10.1 phase 0 spec references as v0.11.1 2020-04-07 13:16:55 +03:00
Dustin Brody f9e45dc121 document and temporary workaround for extended validation issue 2020-04-04 13:43:04 +00:00