Commit Graph

123 Commits

Author SHA1 Message Date
tersec 87ed9e62a1
doppelganger detection walltime refactor (#2656)
* strawman doppelganger detection walltime refactor

* move DoppelgangerProtection to Eth2Processor

* increase comment precision

* document difference between broadcastStartEpoch and nodeLaunchSlot, and allow for one-slot overlap to avoid false positives on intra-slot restarts
2021-06-17 11:51:04 +00:00
Jacek Sieka a5711ecf18
remove a few obsolete raises pops (#2654) 2021-06-16 14:45:05 +02:00
Zahary Karadjov ff8212e58e
Fix #2287 2021-06-10 11:39:53 +03:00
Jacek Sieka d859bc12f0
write uncompressed validator keys to database (#2639)
* write uncompressed validator keys to database

Loading 150k+ validator keys on startup in compressed format takes a lot
of time - better store them in uncompressed format which makes behaviour
just after startup faster / more predictable.

* refactor cached validator key access
* fix isomorphic cast to work with non-var instances
* remove cooked pubkey cache - directly use database cache in chaindag
as well (one less cache to keep in sync)
* bump blscurve, introduce loadValid for known-to-be-valid keys
2021-06-10 10:37:02 +03:00
Jacek Sieka abe0d7b4ae singe validator key cache
Instead of keeping a validator key list per EpochRef, this PR introduces
a single shared validator key list in ChainDAG, and cleans up some other
ChainDAG and key-related issues.

The PR does not introduce the validator key list in the state transition
- this is because we batch-check all signatures before entering the spec
code, thus the spec code never hits the cache.

A future refactor should _probably_ remove the threadvar altogether.

There's a few other small fixes in here that make the flow easier to
read:

* fix `var ChainDAGRef` -> `ChainDAGRef`
* fix `var QuarantineRef` -> `QuarantineRef`
* consistent `dag` variable name
* avoid using threadvar pubkey cache in most cases
* better error messages in batch signature checking
2021-06-01 20:43:44 +03:00
Jacek Sieka 7f52ffb8d9
clean up block processing (#2610)
* gossip_to_consensus -> block_processor (it's processing only blocks,
but not only from gossip)
* measure queue and validation time for blocks
* measure assignment and state loading times for updateStateData
* avoid some unnecessary block copies in block sync
* warn that database is corrupt if we hit tail without a state
2021-05-28 19:34:00 +03:00
tersec 46c5a0110a
log doppelganger attestation signature; rm withState.HashedBeaconState uses (#2608) 2021-05-28 15:51:15 +03:00
Jacek Sieka 867d8f3223
Perform attestation check before broadcast (#2550)
Currently, we have a bit of a convoluted flow where when sending
attestations, we start broadcasting them over gossip then pass them to
the attestation validation to include them in the local attestation pool
- it should be the other way around: we should be checking attestations
_before_ gossipping them - this serves as an additional safety net to
ensure that we don't publish junk - this becomes more important when
publishing attestations from the API.

Also, the REST API was performing its own validation meaning
attestations coming from REST would be validated twice - finally, the
JSON RPC wasn't pre-validating and would happily broadcast invalid
attestations.

* Unified attestation production pipeline with the same flow for gossip,
locally and API-produced attestations: all are now validated and entered
into the pool, then broadcast/republished
* Refactor subnet handling with specific SubnetId alias, streamlining
where subnets are computed, avoiding the need to pass around the number
of active validators
* Move some of the subnet handling code to eth2_network
* Use BitArray throughout for subnet handling
2021-05-10 09:13:36 +02:00
Dustin Brody 7f42d38219 rename initialize_beacon_state{_from_eth1,}; suppress warnings when doppelganger detection disabled 2021-04-28 00:12:41 +03:00
Jacek Sieka 7dba1b37dd
remove attestation/aggregate queue (#2519)
With the introduction of batching and lazy attestation aggregation, it
no longer makes sense to enqueue attestations between the signature
check and adding them to the attestation pool - this only takes up
valuable CPU without any real benefit.

* add successfully validated attestations to attestion pool directly
* avoid copying participant list around for single-vote attestations,
pass single validator index instead
* release decompressed gossip memory earlier, specially during async
message validation
* use cooked signatures in a few more places to avoid reloads and errors
* remove some Defect-raising versions of signature-loading
* release decompressed data memory before validating message
2021-04-26 22:39:44 +02:00
Jacek Sieka daf98e4330 fix committee vs subnet confusion 2021-04-18 14:17:45 +03:00
Ștefan Talpalaru 85acf55ad8
remove duplicate metric (#2507) 2021-04-15 13:42:40 +02:00
cheatfate 5885068c63 Rebase and fixes. 2021-04-09 21:42:13 +03:00
tersec 79bb0d5379
only deserialize attestation and aggregation gossiped signatures once (#2472)
* only deserialize attestation and aggregation gossiped signatures once

* re-indent some aggregate checks into block scope

* spelling

* remove debugging assertion

* put part of gossip validation back into block context

* attestation pool test signature loading isn't so unsafe, and exportRaw isn't free

* remove more development doAsserts; don't exportRaw in loops
2021-04-09 14:59:24 +02:00
Jacek Sieka 10d99c166c
print attestation/aggregate drop notice once per slot (#2475)
* add metrics for queue-related drops
* avoid importing beacon node conf in processor
2021-04-06 13:59:11 +02:00
Mamy Ratsimbazafy 6b13cdce36
Batch attestations (#2439)
* batch attestations

* Fixes (but now need to investigate the chronos 0 .. 4095 crash similar to https://github.com/status-im/nimbus-eth2/issues/1518

* Try to remove the processing loop to no avail :/

* batch aggregates

* use resultsBuffer size for triggering deadline schedule

* pass attestation pool tests

* Introduce async gossip validators. May fix the 4096 bug (reentrancy issue?) (similar to sync unknown blocks #1518)

* Put logging at debug level, add speed info

* remove unnecessary batch info when it is known to be one

* downgrade some logs to trace level

* better comments [skip ci]

* Address most review comments

* only use ref for async proc

* fix exceptions in eth2_network

* update async exceptions in gossip_validation

* eth2_network 2nd pass

* change to sleepAsync

* Update beacon_chain/gossip_processing/batch_validation.nim

Co-authored-by: Jacek Sieka <jacek@status.im>

Co-authored-by: Jacek Sieka <jacek@status.im>
2021-04-02 16:36:43 +02:00
Jacek Sieka 2695cfa864
EH cleanup (#2455)
almost 100% raises in nimbus-eth2 now!

* fix some rare exception-related crashes in json-rpc
2021-03-26 07:52:01 +01:00
Jacek Sieka 3cb31e66b4
set upper bound on EpochRef cache (#2403)
* set upper bound on EpochRef cache

* max 32 EpochRef instances
* less memory waste in BlockRef by removing EpochRef seq that is mostly
unused (~20mb)
* less memory waste in dag block lookup by not keeping an extra copy of
digest (~70mb)
* fix `==` and `$` for Eth2Digest
* remove `ChainDAG.tmpState` (~50mb?)

all in all, this branch cuts mainnet memory usage by ~160-180mb and puts
limits on EpochRef cache usage - where normally it hovered around 950mb
before, it's now sitting at 600-700mb on my machine.

* docs
2021-03-17 11:17:15 +01:00
tersec 5ebf36f54d
add metric for nextActionWait (#2399)
* add metric for nextActionWait

* use toFloatSeconds
2021-03-12 09:46:26 +00:00
Mamy Ratsimbazafy c47d636cb3
Split Eth2Processor in prep for batching (#2396)
* Split Eth2Processor in gossip and consensus part and materialize the shared block queue

* Update initialization in test_sync_manager
2021-03-11 11:10:57 +01:00
Mamy Ratsimbazafy 8e28a05cea
Move pruning out of latency critical path (#2384)
* Deferred DAG and fork choice pruning

* fixup

* Address https://github.com/status-im/nimbus-eth2/pull/2384/files#r589448448, rely only on onSLotEnd for state pruning

* no need to store needPruning in the data structure

* lastPrunePoint is updated in pruning proc

* Split eager and LazyPruning

* enforce pruning in updateHead
2021-03-09 15:36:17 +01:00
Mamy Ratsimbazafy de1060e7f3
centralize p2p validation in a single file and address https://github.com/status-im/nimbus-eth2/pull/2377#issuecomment-791313118 (#2383) 2021-03-06 08:32:55 +01:00
Mamy Ratsimbazafy d47f53cd9d
Reorg (5/5) (#2377)
* Reorg things left into networking and gossip_processing

* time -> beacon_clock

* fix builds
2021-03-05 14:12:00 +01:00