Commit Graph

787 Commits

Author SHA1 Message Date
Jacek Sieka ee79c10a7d
update validator key cache on startup (#2760)
* update validator key cache on startup

Versions prior to 1.1.0 do not write a validator key cache at all.

Versions from 1.4.0 and upwards require an immutable validator key cache
to verify blocks - normally, block verification fills the cache but that
assumes that at least one block was verified by a version that has the
key cache.

Taken together, this breaks direct upgrades from anything <1.1.0 to
1.4.0.

The fix is simply to refresh fill the cache from an existing state on
startup.

* also log serious block validation failures at info level
2021-08-05 11:26:10 +03:00
Jacek Sieka 9193be9b7b
fix epoch logging (fixes #2283) (#2642)
Also put epoch first to disambiguate vs slot
2021-06-11 01:07:16 +03:00
Jacek Sieka d859bc12f0
write uncompressed validator keys to database (#2639)
* write uncompressed validator keys to database

Loading 150k+ validator keys on startup in compressed format takes a lot
of time - better store them in uncompressed format which makes behaviour
just after startup faster / more predictable.

* refactor cached validator key access
* fix isomorphic cast to work with non-var instances
* remove cooked pubkey cache - directly use database cache in chaindag
as well (one less cache to keep in sync)
* bump blscurve, introduce loadValid for known-to-be-valid keys
2021-06-10 10:37:02 +03:00
Kim De Mey 961b29ad5f
Remove no longer needed raw Exception raises for queryRandom (#2633) 2021-06-08 22:23:19 +02:00
tersec 8ebd496fbe
Altair transition tests (#2624)
* Working Altair transition tests

* with fixed upstream test vectors, remove state root workaround

* switch upgrade_to_altair() to returning a reference

* remove test_state_transition

* fix invalid fork state/block combinations error messages

* avoid memory copies by reintroducing state_transition_slots(var SomeHashedBeaconState)
2021-06-04 10:38:00 +00:00
Jacek Sieka b11da2cb34 fix state cache loading
* load the cache of the current state epoch instead of the target state
epoch, when applying states and slots
* load state cache for each slot/block (for longer slot jumps)
* load state cache after full updateStateData
* look up two state cache epochs, instead of the same epoch twice :)
2021-06-03 21:37:52 +03:00
tersec 28a5bca71a
split state_transition() into slots/block parts and use only block where appropriate (#2630) 2021-06-03 11:42:25 +02:00
Jacek Sieka abe0d7b4ae singe validator key cache
Instead of keeping a validator key list per EpochRef, this PR introduces
a single shared validator key list in ChainDAG, and cleans up some other
ChainDAG and key-related issues.

The PR does not introduce the validator key list in the state transition
- this is because we batch-check all signatures before entering the spec
code, thus the spec code never hits the cache.

A future refactor should _probably_ remove the threadvar altogether.

There's a few other small fixes in here that make the flow easier to
read:

* fix `var ChainDAGRef` -> `ChainDAGRef`
* fix `var QuarantineRef` -> `QuarantineRef`
* consistent `dag` variable name
* avoid using threadvar pubkey cache in most cases
* better error messages in batch signature checking
2021-06-01 20:43:44 +03:00
Zahary Karadjov ed2f6f753d Allow the custom testnet metadata to specify a path to the genesis file 2021-06-01 15:50:50 +03:00
tersec ea9ceb693a
update ChainDAG.effective_balance() to use StateData; rm ChainDAG.getBlockByPreciseSlot() (#2622)
* update ChainDAG.effective_balance() to use StateData; rm unused ChainDAG.getBlockByPreciseSlot()

* update get_effective_balances to avoid god object; avoid most memory allocation in Altair epoch reward and penalty processing
2021-06-01 12:40:13 +00:00
tersec a808a6755a
add Altair fork tests for upgrade_to_altair() (#2620)
* add Altair fork tests for upgrade_to_altair()

* Deposit processing ensures all keys are valid
2021-05-31 12:54:53 +00:00
tersec 8912196335
Fix Altair epoch processing and add/enable remaining epoch processing tests (#2619)
* fix Altair epoch state transitions

* re-enable and re-add near-complete Altair epoch processing test vectors: inactivity, justification & finalization, participation flag, slashings, and sync committee updates
2021-05-30 20:23:10 +00:00
tersec 820a6f65d5
use v1.1.0-alpha.6 test vectors (#2618)
* bump nim-eth2-vendors and use v1.1.0-alpha6 test vectors

* continue to download alpha 5 test vectors for const sanity checks
2021-05-30 09:51:01 +00:00
tersec e8c62553ec
reorganize SSZ-based tests to support phase0 and altair (#2612)
* reorganize SSZ-based tests to support phase0 and altair

* label Altair SSZ consensus object tests as such

* update test summary
2021-05-28 20:32:26 +00:00
Jacek Sieka 7f52ffb8d9
clean up block processing (#2610)
* gossip_to_consensus -> block_processor (it's processing only blocks,
but not only from gossip)
* measure queue and validation time for blocks
* measure assignment and state loading times for updateStateData
* avoid some unnecessary block copies in block sync
* warn that database is corrupt if we hit tail without a state
2021-05-28 19:34:00 +03:00
tersec c06ffc7804
proposed structure for altair (#2323)
* proposed structure for hf1

* refactor datatypes.nim into datatypes/{base, phase0, hf1}.nim

* hf1 is Altair

* some syncing with alpha 2

* adjust epoch processing to disambiguate access to RewardFlags

* relocate StateData to stay consistent with meaning phase 0 StateData

* passes v1.1.0 alpha 5 SSZ consensus object tests

* Altair block header test fixtures work

* fix slash_validator() so that Altair attester slashings, proposer slashings, and voluntary exit textures work

* deposit operation Altair test fixtures work

* slot sanity and all but a couple epoch transition tests switched to Altair

* attestation Altair test fixtures work

* Altair block sanity test fixtures work

* add working altair sync committee tests

* improve workarounds for sum-types-across-modules Nim bug; incorporate SignedBeaconBlock root reconstuction to SSZ byte reader
2021-05-28 15:25:58 +00:00
tersec 46c5a0110a
log doppelganger attestation signature; rm withState.HashedBeaconState uses (#2608) 2021-05-28 15:51:15 +03:00
tersec 3a59dc16c2
use v1.1.0 phase 0 test vectors (#2592)
* use v1.1.0 phase 0 test vectors

* re-export process_final_updates()
2021-05-24 10:42:40 +02:00
tersec 0b0bfd1de0
use StateData in place of BeaconState outside state transition code (#2551)
* use StateData in place of BeaconState outside state transition code

* propagate more StateData usage

* remove withStateVars().state

* wrap get_beacon_committee(BeaconState, ...) as gbc(StateData, ...)

* switch makeAttestation() to use StateData

* use StateData wrapper/dispatcher for get_committee_count_per_slot()

* convert AttestationCache.init(), weak subjectivity functions, and updateValidatorMetrics()

* add get_shuffled_active_validator_indices(StateData) and get_block_root_at_slot(StateData)

* switch makeAttestationData() to StateData

* sync AllTests-mainnet.md after rebase
2021-05-21 09:23:28 +00:00
tersec d8bb91d9a9
partially integrate eth1 merge changes (#2548)
* partially integrate eth1 merge changes

* use hexToSeqByte() and validate execution engine opaque transaction length

* remove incorrect REST serialization code
2021-05-20 10:44:13 +00:00
Eugene Kabanov 5b5ea2e813
Fix integer overflow issue in sync_manager. (#2564)
* Make Refactor rewind point assignment more concrete.

* Fix overflow issue in getRewindPoint().
Add tests.
2021-05-18 12:25:14 +02:00
Jacek Sieka 97f4e1fffe
Db1 cont (#2573)
* Revert "Revert "Upgrade database schema" (#2570)"

This reverts commit 6057c2ffb4.

* ssz: fix loading empty lists into existing instances

Not a problem earlier because we didn't reuse instances

* bump nim-eth

* bump nim-web3
2021-05-17 18:37:26 +02:00
tersec 6057c2ffb4
Revert "Upgrade database schema" (#2570)
This reverts commit 22ddf74752.
2021-05-17 06:34:44 +00:00
Jacek Sieka 22ddf74752 Upgrade database schema
The `kvstore` design we're using now turns out to not be the best way to
use `sqlite` - in particular, there are some significant benefits to
using rowid in certain situations and to keep data in separate tables.

With this branch, there are massive improvements in startup time
(seconds instead of minutes) and state/block storage and pruning times
(milliseconds instead of seconds) - these improvements can in particular
be seen on slow drives and translate directly into better attestation
performance.

* update kvstore to new keyspace design
* remove `DirStoreRef` and the hidden `--state-db-kind` option - this
was an experiment to store large blobs in files, but with the new
kvstore, there's no compelling reason to do so
* remove `DbMap` - unused and would need updating for new keyspace
design
* introduce separate tables for each data type (blocks, states etc)
* remove "WITHOUT ROWID" pessimization for tables with large blobs
* close DbSeq statements explicitly (and earlier)
* store beacon block summaries in separate table, without SSZ
compression and load them all with single query on startup
* stop storing backwards compat full states
* mark genesis beacon block as trusted
* avoid faststreams when loading SSZ data
* remove `DisagreementBehavior` (unused)
2021-05-14 20:05:23 +03:00
Jacek Sieka 0a477eb2d3
fix subnet logic (#2555)
This PR decreases the lead subscription time which should help
decrease bandwidth usage and CPU making the subscription for future
aggregation happen a bit later. There's room for more tuning here,
probably.

* fix missing negation from in #2550
* fix silly bitarray issues
* decrease subnet lead subscription time
* log all subnet switching source data
* rename subnet trackers to refer to stability and aggregate subnets
* more tests
2021-05-11 22:03:40 +02:00
Mamy Ratsimbazafy e6b559a35a
Slashing db pruning [Merge only after v2 has been default for 1 noticeable release] (#2452)
* Enable slashing DB pruning

* integrate slashing DB pruning with onSlotEnd

* rebase tests
2021-05-10 16:32:28 +02:00
Jacek Sieka 867d8f3223
Perform attestation check before broadcast (#2550)
Currently, we have a bit of a convoluted flow where when sending
attestations, we start broadcasting them over gossip then pass them to
the attestation validation to include them in the local attestation pool
- it should be the other way around: we should be checking attestations
_before_ gossipping them - this serves as an additional safety net to
ensure that we don't publish junk - this becomes more important when
publishing attestations from the API.

Also, the REST API was performing its own validation meaning
attestations coming from REST would be validated twice - finally, the
JSON RPC wasn't pre-validating and would happily broadcast invalid
attestations.

* Unified attestation production pipeline with the same flow for gossip,
locally and API-produced attestations: all are now validated and entered
into the pool, then broadcast/republished
* Refactor subnet handling with specific SubnetId alias, streamlining
where subnets are computed, avoiding the need to pass around the number
of active validators
* Move some of the subnet handling code to eth2_network
* Use BitArray throughout for subnet handling
2021-05-10 09:13:36 +02:00
Jacek Sieka 646923c3dd
add attestation stats tool to ncli_db (#2539)
This also makes future efforts to provide metrics and logs for
attestation efficiency easier

* Export rewards from epoch transition
* Use less memory for reward calculation (bool -> set[enum], field
alignment)
* Reuse reward memory when replaying, avoiding spike
* Allow replaying any range in ncli_db benchmark
2021-05-07 13:36:21 +02:00
Jacek Sieka 427c0f307c
avoid extraneous hash root calculation (#2537)
When applying a block, we'll currently compute a state root for the
state after slot processing but before block processing - this is
unnecessary when a block is being applied because the intermediate state
root is never observed.
2021-05-05 08:54:21 +02:00
Jacek Sieka efdf759cc0
avoid some slashing protection queries (#2528)
This PR reduces the number of database queries for slashing protection
from 5 reads and 1 write to 2 reads and 1 write in the optimistic case.

In the process, it removes user-level support for writing the database
in the version 1 format in order to simplify the code flow, and prevent
code rot. In particular, the v1 format was not covered by any unit tests
and has no advantages over v2. The concrete code to read and write it
remains for now, in particular to support upgrades from v1 to v2.

The branch also removes the use of concepts which doesn't work with
checked exceptions - in particular, this highlights code that both
raises exceptions and returns error codes, which could be cleaned up in
the future.

* Cache internal validator ID
* Rely on unique index to check for trivial duplicate votes
* Combine two surround vote queries into one
* Combine API for checking and registering slashing into single function

The slashing DB is normally not a bottleneck, but may become one with
high attached validator counts.
2021-05-04 15:17:28 +02:00
tersec e0f4d28116
rename initialize_beacon_state to initialize_beacon_state_from_eth1 (#2536) 2021-05-04 12:19:11 +02:00
Jacek Sieka ce49da6c0a
Introduce unittest2 and junit reports (#2522)
* Introduce unittest2 and junit reports

* fix XML path

* don't combine multiple CI runs

* fixup

* public combined report also

Co-authored-by: Ștefan Talpalaru <stefantalpalaru@yahoo.com>
2021-04-28 18:41:02 +02:00
Dustin Brody 7f42d38219 rename initialize_beacon_state{_from_eth1,}; suppress warnings when doppelganger detection disabled 2021-04-28 00:12:41 +03:00
Eugene Kabanov 0c3302b826
REST API test framework and tests. (#2492)
* REST API test framework and tests.

* Fix ValidatorIndex tests to properly handle int32, but not uint32 values.

* Fix tests to follow latest REST fixes.

* refactor restapi.sh

and add it to the test suite

* Fix issues.
Add delay timeout which is required.

* Fix restapi.sh script for Windows.

Co-authored-by: Ștefan Talpalaru <stefantalpalaru@yahoo.com>
2021-04-27 23:46:24 +03:00
Jacek Sieka 7dba1b37dd
remove attestation/aggregate queue (#2519)
With the introduction of batching and lazy attestation aggregation, it
no longer makes sense to enqueue attestations between the signature
check and adding them to the attestation pool - this only takes up
valuable CPU without any real benefit.

* add successfully validated attestations to attestion pool directly
* avoid copying participant list around for single-vote attestations,
pass single validator index instead
* release decompressed gossip memory earlier, specially during async
message validation
* use cooked signatures in a few more places to avoid reloads and errors
* remove some Defect-raising versions of signature-loading
* release decompressed data memory before validating message
2021-04-26 22:39:44 +02:00
tersec 99fccaee6e
more abstraction over BeaconState (#2509)
* more abstraction over BeaconState

* use HashedBeaconState copy of htr
2021-04-16 08:49:37 +00:00
Jacek Sieka 4ed2e34a9e Revamp attestation pool
This is a revamp of the attestation pool that cleans up several aspects
of attestation processing as the network grows larger and block space
becomes more precious.

The aim is to better exploit the divide between attestation subnets and
aggregations by keeping the two kinds separate until it's time to either
produce a block or aggregate. This means we're no longer eagerly
combining single-vote attestations, but rather wait until the last
moment, and then try to add singles to all aggregates, including those
coming from the network.

Importantly, the branch improves on poor aggregate quality and poor
attestation packing in cases where block space is running out.

A basic greed scoring mechanism is used to select attestations for
blocks - attestations are added based on how much many new votes they
bring to the table.

* Collect single-vote attestations separately and store these until it's
time to make aggregates
* Create aggregates based on single-vote attestations
* Select _best_ aggregate rather than _first_ aggregate when on
aggregation duty
* Top up all aggregates with singles when it's time make the attestation
cut, thus improving the chances of grabbing the best aggregates out
there
* Improve aggregation test coverage
* Improve bitseq operations
* Simplify aggregate signature creation
* Make attestation cache temporary instead of storing it in attestation
pool - most of the time, blocks are not being produced, no need to keep
the data around
* Remove redundant aggregate storage that was used only for RPC
* Use tables to avoid some linear seeks when looking up attestation data
* Fix long cleanup on large slot jumps
* Avoid some pointers
* Speed up iterating all attestations for a slot (fixes #2490)
2021-04-13 20:24:02 +03:00
tersec 79bb0d5379
only deserialize attestation and aggregation gossiped signatures once (#2472)
* only deserialize attestation and aggregation gossiped signatures once

* re-indent some aggregate checks into block scope

* spelling

* remove debugging assertion

* put part of gossip validation back into block context

* attestation pool test signature loading isn't so unsafe, and exportRaw isn't free

* remove more development doAsserts; don't exportRaw in loops
2021-04-09 14:59:24 +02:00
Zahary Karadjov 9776fbfe17
Merge branch 'version-1.1.0' into unstable 2021-04-08 20:50:06 +03:00
Zahary Karadjov 61669f269d Clarify that the deposit cretion tools are intended for testnets 2021-04-08 19:58:22 +03:00
Jacek Sieka 7165e0ac31
Reset cached indices when resetting cache on SSZ read (#2480)
* Reset cached indices when resetting cache on SSZ read

When deserializing into an existing structure, the cache should be
cleared - goes for json also. Also improve error messages.
2021-04-08 13:11:04 +03:00
Jacek Sieka beceb060c4
Write state diffs to separate table (and experimentally, files instead of db) (#2460) 2021-04-06 21:56:45 +03:00
Mamy Ratsimbazafy 6b13cdce36
Batch attestations (#2439)
* batch attestations

* Fixes (but now need to investigate the chronos 0 .. 4095 crash similar to https://github.com/status-im/nimbus-eth2/issues/1518

* Try to remove the processing loop to no avail :/

* batch aggregates

* use resultsBuffer size for triggering deadline schedule

* pass attestation pool tests

* Introduce async gossip validators. May fix the 4096 bug (reentrancy issue?) (similar to sync unknown blocks #1518)

* Put logging at debug level, add speed info

* remove unnecessary batch info when it is known to be one

* downgrade some logs to trace level

* better comments [skip ci]

* Address most review comments

* only use ref for async proc

* fix exceptions in eth2_network

* update async exceptions in gossip_validation

* eth2_network 2nd pass

* change to sleepAsync

* Update beacon_chain/gossip_processing/batch_validation.nim

Co-authored-by: Jacek Sieka <jacek@status.im>

Co-authored-by: Jacek Sieka <jacek@status.im>
2021-04-02 16:36:43 +02:00
Jacek Sieka f821bc878e
Remove `-d:insecure` compile option (#2468)
With metrics running on top of chronos, the metrics server no longer
needs to be compiled in conditionally - it remains disabled by default.
2021-04-01 14:44:11 +02:00
tersec 49a5667288
update some v1.1.0 alpha1 to alpha2 (#2457)
* update some v1.1.0 alpha1 to alpha2

* remove unused getDepositMessage overload and move other out of datatypes/base

* bump nim-eth2-scenarios to download v1.1.0-alpha.2 test vectors

* construct object rather than result
2021-03-29 19:17:48 +00:00
Zahary Karadjov 2eacfc4685 Bump modules to take advantage of the new Json format flavors support
Since quite a lot of additional procs were now compiled as generics, this lead to compiler bugs that had
to be worked-around:

* The `Domain` type was renamed to `Eth2Domain` to avoid compilation errors
  due to conflicts with `nativesockets.Domain`.
  Similarly, `eth2_network.KeyPair` was renamed to `NetKeyPair`.

* A new more robust version of `hexToByteArray` was added to stew
2021-03-25 09:37:35 +02:00
Kim De Mey a4a2c1c0e1
Add discovery query with forkId and attnets filter (#2420) 2021-03-24 11:48:53 +01:00
tersec 97850741a0
remove unused imports in slashing prtection (#2436) 2021-03-19 08:26:02 +00:00
Jacek Sieka 3cb31e66b4
set upper bound on EpochRef cache (#2403)
* set upper bound on EpochRef cache

* max 32 EpochRef instances
* less memory waste in BlockRef by removing EpochRef seq that is mostly
unused (~20mb)
* less memory waste in dag block lookup by not keeping an extra copy of
digest (~70mb)
* fix `==` and `$` for Eth2Digest
* remove `ChainDAG.tmpState` (~50mb?)

all in all, this branch cuts mainnet memory usage by ~160-180mb and puts
limits on EpochRef cache usage - where normally it hovered around 950mb
before, it's now sitting at 600-700mb on my machine.

* docs
2021-03-17 11:17:15 +01:00
Kim De Mey 5e3770e994
Remove unused test (#2424) 2021-03-17 06:36:48 +00:00