Commit Graph

85 Commits

Author SHA1 Message Date
Mamy Ratsimbazafy 0280d6c73e
Revisiting log levels (#1788)
* Update log level - https://github.com/status-im/nim-beacon-chain/issues/1779 https://github.com/status-im/nim-beacon-chain/issues/1785

* Address review comments

* Document the logging strategy [skip ci]
2020-10-01 20:56:42 +02:00
tersec 7eaaab908c
fix output of proposer slashing test fixture (#1780)
* fix output of proposer slashing test fixture

* run make test

* a few more v0.12.3 spec refs
2020-09-30 13:12:03 +00:00
tersec f96ad87d28
switch another 50+ spec refs from v0.12.2 to v0.12.3 (#1749) 2020-09-25 11:52:50 +00:00
tersec 6398a43cc1
update 120+ beacon_chain and validator spec refs from v0.12.2 to v0.12.3 (#1740) 2020-09-24 19:04:10 +02:00
tersec 0eb53f2802
avoid unpacking phase 1 tests to reduce Azure CI disk usage (#1736) 2020-09-24 17:16:00 +02:00
Jacek Sieka f0dbebfd3f
avoid storing empty slot states (#1720)
with the improved empty slot processing, these provide relatively little
benefit, but take up lots of storage that's difficult to free
2020-09-24 09:02:03 +02:00
tersec e106549efe
keep REJECT/IGNORE of messages failing validation for libp2p scoring (#1676)
* keep REJECT/IGNORE status of messages failing validation for libp2p scoring

* fix test suite
2020-09-18 13:53:09 +02:00
Jacek Sieka c76305f824
fix some todo (#1645)
* remove some superfluous gcsafes
* remove getTailState (unused)
* don't store old epochrefs in blocks
* document attestation pool a bit
* remove `pcs =` cruft from log
2020-09-14 14:50:03 +00:00
tersec 48893f1c2e
add ncli_db subcommand to prune database of unnecessary blocks and states (#1593)
* add ncli_db subcommand to prune database of unnecessary blocks, states, and state roots

* tweak comments

* reduce default aggressiveness in pruning old states

* move copyPrunedDatabase() to ncli_db, as it's not generally useful as part of beacon_chain_db and doesn't use any internal interfaces
2020-09-11 15:20:34 +02:00
Jacek Sieka 8a5a261fcd
Quick fix to prune some states, pending smarter state storage (#1624)
* Quick fix to prune some states, pending smarter state storage

Adverse effects might include slow rewinds - typically the protocol
doesn't ask for pre-finalized states but RPC might

* document issue, add test

* fix cache miss log
2020-09-11 10:03:50 +02:00
Jacek Sieka dde26f359f
better state cache reuse (#1612) 2020-09-08 09:23:48 +02:00
tersec 3d5f24f14c
stop discarding future epochs; remove a StateCache() construction (#1610)
* stop discarding non-existent future epochs during epoch state transitions; remove a pointless StateCache() construction in advance_slots()

* update nbench to pass StateCache to process_slots()
2020-09-07 15:04:33 +00:00
Jacek Sieka d584591ded
simplify libp2p logging (#1605)
and a few other small logging fixes
2020-09-06 10:39:25 +02:00
tersec 456bdc87cd
address issue #1552 (#1601) 2020-09-04 08:39:46 +02:00
tersec ab255662df
bound block quarantine size (#1564)
* bound block quarantine size

* add additional logging for block quarantining

* re-add quarantine.add() call

* remove pre-finalization blocks; add logging for full quarantine

* clear quarantine on chain reorganization

* update block_sim and tests

* update test_attestation_pool
2020-08-31 11:00:38 +02:00
Jacek Sieka fa1621db46
implement clock disparity for attestation validation (#1568)
This implements disparity, resolving a part of
https://github.com/status-im/nim-beacon-chain/issues/1367

* make BeaconTime a duration for fractional seconds
* factor out attestation/aggregate validation
* simplify recording of queued attestations
* simplify attestation signature check
* fix blocks_received metric
* add some trivial validation tests
* remove unresolved attestation table - attestations for unknown blocks
are dropped instead (cannot verify their signature)
2020-08-27 09:34:12 +02:00
Jacek Sieka 2081c4f505
avoid unnecessary seq allocations (#1573)
"sigh"
2020-08-27 08:32:51 +02:00
tersec 83667dddfe
harden beacon_pending_deposits metrics calculation against overflow (#1566)
* harden beacon_pending_deposits metrics calculation

* ...

* move beacon_pending_deposits and beacon_processed_deposits_total out of specs and into chain DAG
2020-08-26 17:25:39 +02:00
Mamy Ratsimbazafy 81788becfc
Fork choice - almost free pruning - fix #1534 (#1535)
* initial - cheaper pruning - addresses  #1534

* Pass tests: update offset when pruning, proper handling of pruned parents

* Use options instead of nil for nilable newHead (finalization passing but rootcause not solved)

* First line of defense against stackoverflow in tests

* Fix compute_delta offset after pruning

* Rebase fix - medalla ready

* Remove Option[BlockRef]
2020-08-26 17:23:34 +02:00
Jacek Sieka f26d6a4fd3
reuse validator key cache better (#1562)
new key cache can be used for old epochs in the same tree
2020-08-26 17:06:40 +02:00
Jacek Sieka 7de05efaaf small perf fixes
* don't sort shuffled_validator_indices, just get them directly with
iteration
* grab full epoch of proposer indices while we have the data available -
they'll get cached and reused
* avoid computing active validator set when not used for logging
2020-08-19 14:51:04 +03:00
Jacek Sieka 46c94a18ba rework epoch cache referencing
* collect all epochrefs in specific blocks to make them easier to find
and to avoid lots of small seqs
* reuse validator key databases more aggressively by comparing keys
* make state cache available from within `withState`
* make epochRef available from within onBlockAdded callback
* integrate getEpochInfo into block resolution and epoch ref logic such
that epochrefs are created when blocks are added to pool or lazily when
needed by a getEpochRef
* fill state cache better from EpochRef, speeding up replay and
validation
* store epochRef in specific blocks to make them easier to find and
reuse
* fix database corruption when state is saved while replaying quarantine
* replay slots fully from block pool before processing state
* compare bls values more smartly
* store epoch state without block applied in database - it's recommended
to resync the node!

this branch will drastically speed up processing in times of long
non-finality, as well as cut memory usage by 10x during the recent
medalla madness.
2020-08-19 10:09:06 +03:00
Jacek Sieka a1bd44f4b0
small spec cleanups (#1501)
* clean up logging a bit
* return error on indexed attestation check
2020-08-13 13:47:06 +00:00
Jacek Sieka 58d77153fc
fix invalid state root being written to database (#1493)
* fix invalid state root being written to database

When rewinding state data, the wrong block reference would be used when
saving the state root - this would cause state loading to fail by
loading a different state than expected, preventing blocks to be
applied.

* refactor state loading and saving to consistently use and set
StateData block
* avoid rollback when state is missing from database (as opposed to
being partially overwritten and therefore in need of rollback)
* don't store state roots for empty slots - previously, these were used
as a cache to avoid recalculating them in state transition, but this has
been superceded by hash tree root caching
* don't attempt loading states / state roots for non-epoch slots, these
are not saved to the database
* simplify rewinder and clean up funcitions after caches have been
reworked
* fix chaindag logscope
* add database reload metric
* re-enable clearance epoch tests

* names
2020-08-13 11:50:05 +02:00
Jacek Sieka 5da25e76be
avoid rewind in fork choice application (#1489) 2020-08-12 04:49:52 +00:00
Jacek Sieka 8b0f2cc96f
share validator keys in EpochRef (#1486) 2020-08-11 21:39:53 +02:00
Zahary Karadjov 30a8ec410d More spec compliant blocksByRange requests
* Eliminate possibilities for range errors and overflows
* Handle more properly invalid requests for furute slots
* Eliminate the confusing surrounding the MAX_REQUEST_BLOCKS constant

Addresses https://github.com/status-im/nim-beacon-chain/issues/1366
2020-08-10 22:09:13 +03:00
Jacek Sieka 2a36949913
use epochcache for attesting (#1478) 2020-08-10 15:21:31 +02:00
Jacek Sieka 3b6a8a692d
cleanup unused chaindag epoch features
these are somewhat obsoleted by the more extensive use of EpochRef
2020-08-07 19:49:52 +02:00
Jacek Sieka 84a501d1ff
remove one cache, add another (#1449)
* remove one cache, add another

This cache removes the need for rewinding in most attestation validation
flow since the attestations come from one of two epochs and must be
targetting a viable block.

Additionally, it also removes all state caches which are less likely to
be used over-all - more metrics are needed to track the rewinding.

On risk is that when chains don't finalize, we'll have lots of epochrefs
in memory meaning lots of validator key databases, most being exactly
the same. This can be addressed in any number of ways. Some of the
memory usage is mitigated by the fact that we previously had lots of big
state caches and now we're keeping only keys instead.

* cleanups

* doc
2020-08-06 19:48:47 +00:00
Jacek Sieka deaeb62de3
clean up quarantine 2020-08-05 16:19:55 +02:00
Jacek Sieka 15b99e4c11
cache beacon proposer indices (#1440)
also clear old epochrefs as they're growing unwieldy

in particular, this speeds up gossip block validation by avoiding the
rewind
2020-08-05 08:28:43 +02:00
Dustin Brody c142de4b7f be more consistent about pubkeys fed to verify_foo_signature() not being separately initialized, while pubkeys, generally, used for matching purposes, elsewhere explicitly initialized 2020-08-04 23:00:33 +03:00
Jacek Sieka ac78e75bf8
lear missing on orphan add in quarantine (#1441) 2020-08-04 19:49:25 +00:00
Jacek Sieka 70df0ad057 don't mark quarantined blocks as missing 2020-08-04 22:37:06 +03:00
Jacek Sieka c6674de5d2 use epoch ref to update fork choice
this dramatically speeds up startup in long periods of non-finality
2020-08-04 20:00:31 +03:00
tersec df80071bcf
update attestation and block validation to v0.12.2; clean up getAncestorAt()/get_ancestor() (#1417)
* update attestation validation to v0.12.2; clean up getAncestorAt()/get_ancestor()

* update beacon block validation to v0.12.2
2020-08-03 19:47:42 +00:00
Viktor Kirilov 0a96e5f564
renamed CandidateChains to ChainDagRef and made the Quarantine type a ref type so there is a single instance in the beacon node (#1407) 2020-07-31 14:49:06 +00:00
tersec e0a6f58abe
convert 10 v0.12.1 spec refs to v0.12.2 (#1406) 2020-07-31 09:59:14 +00:00
Viktor Kirilov c032366547
removed the BlockPool type and all of the proxy functions around it (#1401)
* removed the BlockPool type and all of the proxy functions around it - passing the chain DAG and the quarantine explicitly where appropriately - they don't need to be bundled in a type

* fixed the build after the rebase
2020-07-30 21:18:17 +02:00
Jacek Sieka c5fecd472f
more fork-choice fixes (#1388)
* more fork-choice fixes

* use target block/epoch to validate attestations
* make addLocalValidators sync
* add current and previous epoch to cache before doing state transition
* update head state using clearance state as a shortcut, when possible
* use blockslot for fork choice balances
* send attestations using epochref cache

* fix invalid finalized parent being used

also simplify epoch block traversal

* single error handling style in fork choice

* import fix, remove unused async
2020-07-30 17:48:25 +02:00
tersec 99f2d8e06c
update 14 v0.12.1 spec refs to v0.12.2 (#1400) 2020-07-30 09:47:57 +00:00
Jacek Sieka 157ddd2ac4
Fork choice fixes 5 (#1381)
* limit attestations kept in attestation pool

With fork choice updated, the attestation pool only needs to keep track
of attestations that will eventually end up in blocks - we can thus
limit the horizon of attestations that we keep more aggressively.

To get here, we expose getEpochRef which gets metadata about a
particular epochref, and make sure to populate it when a block is added
- this ensures that state rewinds during block addition are minimized.

In addition, we'll use the target root/epoch when validating
attestations - this helps minimize the number of different states that
we need to rewind to, in general.

* remove CandidateChains.justifiedState

unused

* remove BlockPools.Head object

* avoid quadratic quarantine loop

* fix
2020-07-28 13:54:32 +00:00
Jacek Sieka fd4d319450
Use fork v2 (#1358)
* fork choice fixes, round 3

* introduce checkpoint tracker
* split out fork choice backend that is independent of dag
* correctly update best checkpoint to use for head selection
* correctly consider wall clock when processing attestations
* preload head history only (only one history is loaded from database
anyway)
* love the DAG

* switch to fork choice v2

also remove BlockRef.children

* fix
2020-07-25 21:41:12 +02:00
Jacek Sieka fb2f742972
Fork choice fixes 2 (#1356)
* fork choice cleanup

* enable v2 pruning
* prefer `get_current_epoch`
* fix finalization check to use correct epoch

* small cleanups

* add `count_active_validators`
* remove misleading logs
* fix justified checkpoint slot calculation in rpc
2020-07-22 23:01:44 +02:00
Jacek Sieka f0720faf17
Fork choice fixes (#1350)
* remove cruft

* reenable fork choice and fix several issues

* in addForkChoice_v2, the `.error` field would be accessed even when
Result is ok
* remove workaround for invalid block structure in fork choice
* fix `tmpState` being used recursively in callback, causing state
corruption while processing attestation
* fix block callback being called twice per block
* pass state to callback to avoid unnecessary rewinding

* enable head select, fix another bug

* never use `get` without `isOk`
* log nil blockref in case blockref is nil

* add missing error checking

* use correct epoch when updating attestation message
2020-07-22 11:42:55 +02:00
tersec 4a9a7be271
faster syncing (#1348)
* maybe faster syncing

* 80-character lines

* remove instrumentation debugEchos; fix target attestation epoch in attestation pool validation

* use the epoch-granularity matching in attestation.addResolved(...)
2020-07-22 09:51:45 +02:00
Jacek Sieka 7e0bf7b1b5
Add separate state for clearance (#1352)
When clearing blocks, a callback is called - this callback, if it uses
`tmpState`, will be corrupted because it's not fully up to date when the
callback is called - we thus introduce a specific state cache for this
purpose - ideally, it can be removed later when epoch caching is
improved.

Incidentally, this helps block sync speed a lot - without this state,
the block sync would ping-pong between attestation state and block state
which is costly.
2020-07-22 08:25:13 +02:00
tersec 83abbcb917
drop get_attesting_indices()/get_unslashed_attesting_indices() from 15% to 1% of workload at block_sim at 100k validators (#1351) 2020-07-21 18:35:43 +02:00
Jacek Sieka 8b01284b0e
cache block hash (#1329)
hash_tree_root was turning up when running beacon_node, turns out to be
repeated hash_tree_root invocations - this pr brings them back down to
normal.

this PR caches the root of a block in the SignedBeaconBlock object -
this has the potential downside that even invalid blocks will be hashed
(as part of deserialization) - later, one could imagine delaying this
until checks have passed

there's also some cleanup of the `cat=` logs which were applied randomly
and haphazardly, and to a large degree are duplicated by other
information in the log statements - in particular, topics fulfill the
same role
2020-07-16 15:16:51 +02:00