Commit Graph

101 Commits

Author SHA1 Message Date
tersec 0fd8bf7b56
spec URL updates (#3254) 2022-01-06 18:35:38 +00:00
Jacek Sieka 0a4728a241
Handle access to historical data for which there is no state (#3217)
With checkpoint sync in particular, and state pruning in the future,
loading states or state-dependent data may fail. This PR adjusts the
code to allow this to be handled gracefully.

In particular, the new availability assumption is that states are always
available for the finalized checkpoint and newer, but may fail for
anything older.

The `tail` remains the point where state loading de-facto fails, meaning
that between the tail and the finalized checkpoint, we can still get
historical data (but code should be prepared to handle this as an
error).

However, to harden the code against long replays, several operations
which are assumed to work only with non-final data (such as gossip
verification and validator duties) now limit their search horizon to
post-finalized data.

* harden several state-dependent operations by logging an error instead
of introducing a panic when state loading fails
* `withState` -> `withUpdatedState` to differentiate from the other
`withState`
* `updateStateData` can now fail if no state is found in database - it
is also hardened against excessively long replays
* `getEpochRef` can now fail when replay fails
* reject blocks with invalid target root - they would be ignored
previously
* fix recursion bug in `isProposed`
2022-01-05 19:38:04 +01:00
tersec b81c06edab
rename Beacon{Block,State}Fork.Merge to Bellatrix; update copyright years (#3240) 2022-01-04 09:45:38 +00:00
tersec da017d2ca5
update from phase0/altair v1.1.6 URLs to v1.1.8 spec URLs (#3238) 2022-01-04 03:57:15 +00:00
Jacek Sieka c4ce59e55b
Assorted logging improvements (#3237)
* log doppelganger detection when it activates and when it causes missed
duties
* less prominent eth1 sync progress
* log in-progress sync at notice only when actually missing duties
* better detail in replay log
* don't log finalization checkpoints - this is quite verbose when
syncing and already included in "Slot start"
2022-01-03 22:18:49 +01:00
tersec e78d12beb9
support GOSSIP_MAX_SIZE_MERGE blocks; prevent fork choice stutter via aggregate attestations (#3230)
* support GOSSIP_MAX_SIZE_MERGE-sized blocks; prevent fork choice clock stutter via aggregate attestations

* relay max gossip size to libp2p, use tight uncompressed bounds for fixed-size messages

* Update beacon_chain/networking/eth2_network.nim

Co-authored-by: Jacek Sieka <jacek@status.im>

* Update beacon_chain/networking/eth2_network.nim

Co-authored-by: Jacek Sieka <jacek@status.im>

Co-authored-by: Jacek Sieka <jacek@status.im>
2022-01-03 16:20:15 +00:00
Jacek Sieka 6b60a774e0
Lazy aggregated batch verification (#3212)
A novel optimisation for attestation and sync committee message
validation: when batching, we look for signatures of the same message
and aggregate these before batch-validating: this results in up to 60%
fewer signature verifications on a busy server, leading to a significant
reduction in CPU usage.

* increase batch size slightly which helps finding more aggregates
* add metrics for batch verification efficiency
* use simple `blsVerify` when there is only one signature to verify in
the batch, avoiding the RNG
2021-12-29 15:28:40 +01:00
tersec 1a6a56bdb1
use BeaconTime instead of Slot in fork choice (#3138)
* use v1.1.6 test vectors; use BeaconTime instead of Slot in fork choice

* tick through every slot at least once

* use div INTERVALS_PER_SLOT and use precomputed constants of them

* use correct (even if numerically equal) constant
2021-12-21 18:56:08 +00:00
Jacek Sieka c270ec21e4
Validator monitoring (#2925)
Validator monitoring based on and mostly compatible with the
implementation in Lighthouse - tracks additional logs and metrics for
specified validators so as to stay on top on performance.

The implementation works more or less the following way:
* Validator pubkeys are singled out for monitoring - these can be
running on the node or not
* For every action that the validator takes, we record steps in the
process such as messages being seen on the network or published in the
API
* When the dust settles at the end of an epoch, we report the
information from one epoch before that, which coincides with the
balances being updated - this is a tradeoff between being correct
(waiting for finalization) and providing relevant information in a
timely manner)
2021-12-20 20:20:31 +01:00
tersec d7799ecdcc
v1.1.6 spec updates (#3206) 2021-12-17 06:56:33 +00:00
Jacek Sieka 118840d241
SyncManager cleanups for backfill support (#3189)
* SyncManager cleanups for backfill support

Cleanups, fixes and simplifications, in anticipation of backfill support
for the `SyncManager`:

* reformat sync progress indicator to show time left and % done more
prominently:
  * old: `sync="sPssPsssss:2:2.4229:00h57m (2706898)"`
  * new: `sync="14d12h31m (0.52%) 1.1378slots/s (wQQQQQDDQQ:1287520)"`
* reset average speed when going out of sync
* pass all block errors to sync manager, including duplicate/unviable
* penalize peers for reporting a head block that is outside of our
expected wall clock time (they're likely on a different network or
trying to disrupt sync)
* remove `SyncFailureKind` (unused)
* remove `inRange` (unused)
* add `Q` for sync queue requests that are in the `SyncQueue` but not
yet in the `BlockProcessor` queue
* update last slot in `SyncQueue` after getting peer status
* fix race condition between `wakeupWaiters` and `resetWait`, where
workers would not be correctly reset if block verification returned a
completed future without event loop
* log syncmanager direction

* Fix ordering issue.
Some of the requests size of which are not equal to `chunkSize` could be processed in wrong order which could lead to sync process freezes.

Co-authored-by: cheatfate <eugene.kabanov@status.im>
2021-12-16 15:57:16 +01:00
tersec 36ade1c1c6
v1.1.6 spec updates (minor, mostly URLs) (#3197) 2021-12-14 21:02:29 +00:00
tersec f09686e835
update some spec URLs to v1.1.6 (#3188) 2021-12-13 15:45:48 +00:00
Jacek Sieka 03005f48e1
Backfill support for ChainDAG (#3171)
In the ChainDAG, 3 block pointers are kept: genesis, tail and head. This
PR adds one more block pointer: the backfill block which represents the
block that has been backfilled so far.

When doing a checkpoint sync, a random block is given as starting point
- this is the tail block, and we require that the tail block has a
corresponding state.

When backfilling, we end up with blocks without corresponding states,
hence we cannot use `tail` as a backfill pointer - there is no state.

Nonetheless, we need to keep track of where we are in the backfill
process between restarts, such that we can answer GetBeaconBlocksByRange
requests.

This PR adds the basic support for backfill handling - it needs to be
integrated with backfill sync, and the REST API needs to be adjusted to
take advantage of the new backfilled blocks when responding to certain
requests.

Future work will also enable moving the tail in either direction:
* pruning means moving the tail forward in time and removing states
* backwards means recreating past states from genesis, such that
intermediate states are recreated step by step all the way to the tail -
at that point, tail, genesis and backfill will match up.
* backfilling is done when backfill != genesis - later, this will be the
WSS checkpoint instead
2021-12-13 14:36:06 +01:00
Jacek Sieka dfbd50b4d6
avoid SyncCommitteMsgPool copy (#3185)
introduced by batch verification, when verifiers were made async
2021-12-11 16:39:24 +01:00
Jacek Sieka 069bccd51b
batch-verify sync messages for a small perf boost (#3151)
* batch-verify sync messages for a small perf boost

Generally reuses the same structure as attestation and aggregate
verification

* normalize `signatures` and `signature_batch` to use the same pattern
of verification
* normalize parameter names, order etc for signature stuff in general
* avoid calling `blsSign` directly - instead, go through `signatures`
consistently
2021-12-09 14:56:54 +02:00
tersec 2ca28fb861
Merge BeaconBlock gossip validation (#3165)
* Merge BeaconBlock gossip validation

* figure/ground inversion

* revert cosmetic cleanups to reduce merge conflicts
2021-12-08 17:29:22 +00:00
Jacek Sieka 1a8b7469e3
move quarantine outside of chaindag (#3124)
* move quarantine outside of chaindag

The quarantine has been part of the ChainDAG for the longest time, but
this design has a few issues:

* the function in which blocks are verified and added to the dag becomes
reentrant and therefore difficult to reason about - we're currently
using a stateful flag to work around it
* quarantined blocks bypass the processing queue leading to a processing
stampede
* the quarantine flow is unsuitable for orphaned attestations - these
should also should be quarantined eventually

Instead of processing the quarantine inside ChainDAG, this PR moves
re-queueing to `block_processor` which already is responsible for
dealing with follow-up work when a block is added to the dag

This sets the stage for keeping attestations in the quarantine as well.

Also:

* make `BlockError` `{.pure.}`
* avoid use of `ValidationResult` in block clearance (that's for gossip)
2021-12-06 10:49:01 +01:00
tersec e6921f808f
cleanups, partly from kintsugi branch (#3161)
* cleanups, partly from kintsugi branch

* re-export shortLog(EthBlock) and preserve exception messages in batchVerify and processBatch
2021-12-05 17:32:41 +00:00
tersec 4378f3f096
almost all remaining ethereum/{eth2.0-specs -> consensus-specs} (#3158) 2021-12-03 20:01:13 +00:00
tersec cc51f3fd12
v1.1.{5 -> 6} phase 0 and altair spec URL updates (#3157) 2021-12-03 17:40:23 +00:00
Jacek Sieka 065d72fb15 move head update to storeBlock
when blocks are supplied via rest, this ensures the newly posted head is
chosen
2021-12-03 11:18:37 +02:00
Jacek Sieka aa1dea03cd
speed up gossip and sync block validation (#3143)
* avoid recomputing hash for block signature check
* check block slot match before hitting the database
2021-12-01 10:52:40 +01:00
Jacek Sieka a223d62b07
Cleanups (#3123)
Renames and cleanups split out from the validator monitoring branch, so
as to reduce conflict area vs other PR:s

* add constants for expected message timing
* name validators after the messages they validate, mostly, to make
grepping easier
* unify field naming of EpochInfo across forks to make cross-fork code
easier
2021-11-25 13:20:36 +01:00
Jacek Sieka 9c2f43ed0e
Speed up altair block processing 2x (#3115)
* Speed up altair block processing >2x

Like #3089, this PR drastially speeds up historical REST queries and
other long state replays.

* cache sync committee validator indices
* use ~80mb less memory for validator pubkey mappings
* batch-verify sync aggregate signature (fixes #2985)
* document sync committee hack with head block vs sync message block
* add batch signature verification failure tests

Before:

```
../env.sh nim c -d:release -r ncli_db --db:mainnet_0/db bench --start-slot:-1000
All time are ms
     Average,       StdDev,          Min,          Max,      Samples,         Test
Validation is turned off meaning that no BLS operations are performed
    5830.675,        0.000,     5830.675,     5830.675,            1, Initialize DB
       0.481,        1.878,        0.215,       59.167,          981, Load block from database
    8422.566,        0.000,     8422.566,     8422.566,            1, Load state from database
       6.996,        1.678,        0.042,       14.385,          969, Advance slot, non-epoch
      93.217,        8.318,       84.192,      122.209,           32, Advance slot, epoch
      20.513,       23.665,       11.510,      201.561,          981, Apply block, no slot processing
       0.000,        0.000,        0.000,        0.000,            0, Database load
       0.000,        0.000,        0.000,        0.000,            0, Database store
```

After:

```
    7081.422,        0.000,     7081.422,     7081.422,            1, Initialize DB
       0.553,        2.122,        0.175,       66.692,          981, Load block from database
    5439.446,        0.000,     5439.446,     5439.446,            1, Load state from database
       6.829,        1.575,        0.043,       12.156,          969, Advance slot, non-epoch
      94.716,        2.749,       88.395,      100.026,           32, Advance slot, epoch
      11.636,       23.766,        4.889,      205.250,          981, Apply block, no slot processing
       0.000,        0.000,        0.000,        0.000,            0, Database load
       0.000,        0.000,        0.000,        0.000,            0, Database store
```

* add comment
2021-11-24 13:43:50 +01:00
tersec 9e395011d9
update 22 spec URLs to v1.1.5 (#3111) 2021-11-18 08:08:00 +00:00
tersec 2e868dc2ba
mass/mechanical update of 1.1.4 phase0 and altair spec URLs to 1.1.5 (#3067) 2021-11-09 07:40:41 +00:00
tersec 2c8600e746
mass/mechanical update of 1.1.3 phase0 spec URLs to 1.1.4 in markdown (#3059) 2021-11-08 09:26:18 +00:00
Zahary Karadjov 29e5700838 Bugfix: Avoid the aggregation of duplicate signatures when creating sync committee contributions 2021-11-07 21:41:10 +02:00
Jacek Sieka ea0a191723
Better REST/RPC error messages (#3046)
* Better REST/RPC error messages
* homogenise block logging (root first)
* homegenise message verification pipeline (verify in
`gossip_verification`, act in `eth2_processor`)
* use `subcommitteeIdx` consistently
* log each sent contribution
* fix block_sim
* fix block topic
* don't recalc root on gossip block validation
* move position loop into sync pool
2021-11-05 17:39:47 +02:00
Jacek Sieka a086cf01ac
altair fork handling cleanups (#3050)
* fix stack overflow crash in REST/debug/getStateV2
* introduce `ForkyXxx` for generic type matching of `Xxx` across
branches (SomeHashedBeaconState -> ForkyHashedBeaconState et al) -
`Some` is already used for other types of type classes
* consolidate function naming in BeaconChainDB, use some generics
* import `forks.nim` from other spec modules and move `Forked*` helpers
around to resolve circular imports
* remove `ForkedBeaconState`, use `ForkedHashedBeaconState` throughout
(less data shuffling between the types)
* fix several cases of states being stored on stack in tests, causing
random failures on some platforms
* remove reading json support from ncli - this should be ported to the
rest json reading instead (doesn't currently work because stack sizes)
2021-11-05 08:34:34 +01:00
tersec 8307e9c601
mechanical non-merge v1.1.2 to v1.1.3 spec URL updates (#3030) 2021-10-26 16:44:23 +00:00
Jacek Sieka 9cf32c3748 clean up sync subcommittee handling
* `SyncCommitteeIndex` -> `SyncSubcommitteeIndex`
* `syncCommitteePeriod` -> `sync_committee_period` (spec spelling)
* tighten period comparisons
* fix assert when validating committee message with non-altair state in
REST api
2021-10-20 22:59:13 +03:00
Jacek Sieka bf6ad41d7d add drop and sync committee metrics
* use storeBlock for processing API blocks
* avoid double block dump
* count all gossip metrics at the same spot
* simplify block broadcast
2021-10-20 18:20:12 +03:00
Jacek Sieka c247702ebc normalize subnet logging
* call it subnet id everywhere
* log aggregate sent from VC
* log subnet with aggregate
2021-10-20 15:06:44 +03:00
tersec c0a2f1c98e
refactor executionPayload tests; reduce HashSet creation (#3003) 2021-10-20 13:36:38 +02:00
Jacek Sieka df3fc9525f
import cleanup (#2997)
* import cleanup

...and remove some unused types

* add random imports

* more imports
2021-10-19 16:09:26 +02:00
tersec 10981639f1
update 27 spec URLs to v1.1.2 (#2989) 2021-10-13 16:49:06 +00:00
tersec 2ad1b7366a
update 62 spec URLs to v1.1.2 (#2979) 2021-10-12 10:17:37 +00:00
tersec 0ae736f397
update 67 spec URLs to v1.1.2 (#2977) 2021-10-12 08:09:59 +00:00
Etan Kissling 4743807079 use errReject template everywhere
There were still a few instances that used the expansion of `errReject`
instead of using the template itself. It seems that those cases were
forgotten as part of other cleanups in #2809. Done now for readability.
2021-09-29 14:16:09 +03:00
tersec 1dc94aa36f
update 40 spec URLs to v1.1.0 (#2918) 2021-09-28 18:23:15 +00:00
tersec ca4c6b4c5c
update 30 spec URLs to v1.1.0 (#2914) 2021-09-28 14:01:46 +00:00
tersec aec5cf2b1b
update 31 spec reference URLs to v1.1.0 (#2910) 2021-09-28 07:46:06 +00:00
Etan Kissling 01a9b275ec
handle duplicate pubkeys in sync committee (#2902)
When sync committee message handling was introduced in #2830, the edge
case of the same validator being selected multiple times as part of a
sync subcommittee was not covered. Not handling that edge case makes
sync contributions have a lower-than-expected participation rate as each
sync validator is only counted up through once per subcommittee.
This patch ensures that this edge case is properly covered.
2021-09-28 07:44:20 +00:00
Etan Kissling ba3884f449
ignore instead of reject duplicate sync msgs (#2903)
The P2P spec defines how certain error classes should be handled through
either IGNORE or REJECT verdicts. For sync committee message, the spec
defines that only the first message from each validator per subcommittee
and slot shall be accepted, the rest is ignored. However, current code
rejects those messages instead of ignoring them. Fixed to match spec.
2021-09-27 14:36:28 +00:00
tersec 2b2846b468
implement forked merge state/block support (#2890)
* implement forked state/block support

* merge support for containsOrphan; import cleanup; 80-column lines

* add merge block header operations and slot sanity fixture

* add epoch state transition tests; implement is_valid_gas_limit(), is_merge_block(), is_execution_enabled(), and compute_timestamp_at_slot()

* implement process_execution_payload() and add merge deposit operations tests

* add merge block sanity tests

* add merge case to syncCommitteeParticipants

* v1.1.0-beta.5 updates

* reduce getTestStates-based memory usage; don't try to REST-serialize ExecutionPayload transactions without underlying support

* add execution payload tests; switch var to let in tests/official/
2021-09-27 14:22:58 +00:00
Etan Kissling cc30bf63b4
update gossip_processing and attestation docs (#2860)
The README file explaining gossip_processing, and the attestation_flow
docs were no longer accurate, as attestations and aggregates no longer
go through a queue (pending batching). This patch updates the docs
accordingly. It also improves some grammar and fixes some typos.
2021-09-27 15:11:10 +02:00
Eugene Kabanov b566d4657f
REST /eth/v1/events API call implementation. (#2878)
* Placing callbacks into strategic places.

* Initial events call implementation.

* Post rebase fixes.

* Change addSyncContribution() implementation.

* Add `attestation-sent` event.
Remove gcsafe, raises from callbacks implementations.
Move `attestation-received` fire at the end of attestation processing.

* Address review comments.
2021-09-22 14:17:15 +02:00
Mamy Ratsimbazafy d1cb5b7220
Parallel attestation verification (#2718)
* Add parallel attestation verification

* Update tests, batchVerify doesn't use the threadpool with only single core (nim-blscurve update)

* bump nim-blscurve

* Debug info for failing eth2 test vectors

* remove submodule eth2-testnets

* verbose debugging of make failure on Windows (libbacktrace?)

* Remove CI debug mode

* initialization convention

* Fix new altair tests
2021-09-17 03:13:52 +03:00