Commit Graph

68 Commits

Jacek Sieka b08d0ff2ab
Optimistic mode (#4262)
In optimistic mode, Nimbus will sync optimistically even when the
execution client is offline / not available.

An optimistic node is less secure because it has not validated block
transactions via the execution client and thus cannot be used for
validation duties.
2022-10-26 20:44:45 +00:00
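
The duty gating described in #4262 can be sketched in a few lines of Nim; `NodeStatus` and `mayPerformDuties` are illustrative stand-ins, not the actual nimbus-eth2 API:

```nim
type
  NodeStatus = enum
    Synced      # head has been validated by the execution client
    Optimistic  # head imported without execution-layer validation

proc mayPerformDuties(status: NodeStatus): bool =
  ## An optimistically synced node must not attest or propose, since it
  ## cannot vouch for the execution payloads it imported.
  status == Synced

when isMainModule:
  doAssert not mayPerformDuties(Optimistic)
  doAssert mayPerformDuties(Synced)
```
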
tersec fb6e6d9cf4
remove `newPayload` from block production flow (#4186)
* remove `newPayload` from block production flow

* refactor block_processor to run `newPayload` as part of `storeBlock`
2022-10-14 22:48:56 +03:00
tersec c367b14ad9
deprecate `--safe-slots-to-import-optimistically` (#4182) 2022-09-29 06:29:49 +00:00
tersec 1819d79e07
avoid potential database inconsistency after fcU `INVALID`+crash (#4192)
* avoid database race-condition inconsistency after fcU `INVALID` then crash

* ensure head doesn't fall behind finalized; add more tests for head movement/reloading DAG
2022-09-28 21:07:31 +00:00
tersec 0f6d19b4b3
implement v1.2.0 optimistic sync tests (#4174)
* implement v1.2.0 optimistic sync tests

* Update beacon_chain/consensus_object_pools/blockchain_dag.nim

Co-authored-by: Etan Kissling <etan@status.im>

* `lvh` -> `latestValidHash` and only invalidate one specific block

* `getEarliestInvalidRoot` -> `getEarliestInvalidBlockRoot`; `defaultEarliestInvalidRoot` -> `defaultEarliestInvalidBlockRoot`

Co-authored-by: Etan Kissling <etan@status.im>
2022-09-27 15:11:47 +03:00
tersec a0ead042ad
newPayload `INVALIDATED` should be `unviableFork` (#4180) 2022-09-26 21:24:32 +00:00
zah 154723947b
Don't search for the TTD block after the merge (#4152) 2022-09-20 09:17:25 +03:00
tersec ab3ac64b19
Remove optimistic sync candidate check (#4129) 2022-09-17 20:45:35 +00:00
tersec 19bf460a3b
more `withState` `state` -> `forkyState` (#4104) 2022-09-10 08:12:07 +02:00
tersec cd46af17e9
handle INVALIDATED forkchoiceUpdated better (#4081) 2022-09-07 22:54:37 +02:00
tersec bf3a014287
more efficient forkchoiceUpdated usage (#4055)
* more efficient forkchoiceUpdated usage

* await rather than asyncSpawn; ensure head update before dag.updateHead

* use action tracker rather than attached validators to check for next slot proposal; use wall slot + 1 rather than state slot + 1 to correctly check when blocks are missing

* re-add two-fcU case for when newPayload not VALID

* check dynamicFeeRecipientsStore for potential proposal

* remove duplicate checks for whether next proposer
2022-09-07 20:34:52 +02:00
Etan Kissling 613f4a9a50
accelerate EL sync with LC with `--sync-light-client` (#4041)
When the BN-embedded LC makes sync progress, pass the corresponding
execution block hash to the EL via `engine_forkchoiceUpdatedV1`.
This allows the EL to sync to wall slot while the chain DAG is behind.
Renamed `--light-client` to `--sync-light-client` for clarity, and
`--light-client-trusted-block-root` to `--trusted-block-root` for
consistency with `nimbus_light_client`.

Note that this does not work well in practice at this time:
- Geth sticks to the optimistic sync:
  "Ignoring payload while snap syncing" (when passing the LC head)
  "Forkchoice requested unknown head" (when updating to LC head)
- Nethermind syncs to LC head but does not report ancestors as VALID,
  so the main forward sync is still stuck in optimistic mode:
  "Pre-pivot block, ignored and returned Syncing"

To aid EL client teams in fixing those issues, having this available
as a hidden option is still useful.
2022-08-29 12:16:35 +00:00
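
A minimal sketch of the idea behind #4041, using simplified stand-in types (the real code naturally goes through the full engine API plumbing):

```nim
type
  Hash32 = array[32, byte]
  ForkchoiceState = object
    headBlockHash: Hash32
    safeBlockHash: Hash32
    finalizedBlockHash: Hash32

proc onLightClientHeader(execBlockHash: Hash32): ForkchoiceState =
  ## Forward the LC head's execution block hash as the fcU head so the EL
  ## can sync toward wall slot while the chain DAG is still behind; safe
  ## and finalized hashes are left zeroed in this sketch.
  ForkchoiceState(headBlockHash: execBlockHash)
```
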
Etan Kissling 64972e3c8a
set `safe_block_hash` to fork choice justified (#4010)
Implements the fork choice safe block spec, where `safe_block_hash` in
`forkChoiceUpdated` is set to justified (used to be `ZERO_HASH`).
https://github.com/ethereum/consensus-specs/blob/v1.2.0-rc.3/fork_choice/safe-block.md#get_safe_execution_payload_hash
2022-08-25 23:34:02 +00:00
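
The spec rule adopted in #4010 translates roughly as follows; `Store` and `BlockStub` are simplified stand-ins for the spec's types:

```nim
import std/tables

type
  Hash32 = array[32, byte]
  Root = array[32, byte]
  Checkpoint = object
    epoch: uint64
    root: Root
  BlockStub = object
    isExecutionBlock: bool    # has a non-default execution payload
    payloadBlockHash: Hash32
  Store = object
    justifiedCheckpoint: Checkpoint
    blocks: Table[Root, BlockStub]

proc getSafeExecutionPayloadHash(store: Store): Hash32 =
  ## The fork choice justified block serves as the safe block;
  ## pre-merge blocks map to the zero hash (the previous behavior
  ## returned ZERO_HASH unconditionally).
  let safeBlock = store.blocks[store.justifiedCheckpoint.root]
  if safeBlock.isExecutionBlock:
    safeBlock.payloadBlockHash
  else:
    default(Hash32)
```
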
Etan Kissling eec6c04d32
do not descore peer when EL connection fails (#4020)
When the EL fails to respond to `newPayload`, e.g., because the connection
to the EL got interrupted or due to misconfiguration, optimistic blocks
cannot be imported according to spec. This condition is treated the same
as if the peer had returned a block with a missing parent, which gets the
block out of our processing queue but can have nasty side effects.

For example, if the sync manager asks for validation of a block known to
be in the finalized range and receives a `MissingParent` verdict, the
peer is immediately removed from the peer pool.

```
DBG 2022-08-24 11:45:26.874+02:00 newPayload: inserting block into execution engine parentHash=e4ca7424 blockHash=36cdc198 stateRoot=cf3902c1 receiptsRoot=56e81f17 prevRandao=0b49a172 blockNumber=1518089 gasLimit=30000000 gasUsed=0 timestamp=1657980396 extraDataLen=0 baseFeePerGas=7 numTransactions=0
ERR 2022-08-24 11:45:26.875+02:00 newPayload failed                          msg="Transport is not initialised (missing a call to connect?)"
DBG 2022-08-24 11:45:26.875+02:00 Block pool rejected peer's response        topics="syncman" request=187232:32@1475 peer=16U*MsCJdx direction=forward blocks_map=xxxxxxxxxxxxxxxxxxxxxxxxxxxx.xxx blocks_count=31 ok=false unviable=false missing_parent=true sync_ident=main
ERR 2022-08-24 11:45:26.875+02:00 Unexpected missing parent at finalized epoch slot topics="syncman" request=187232:32@1475 peer=16U*MsCJdx direction=forward rewind_to_slot=187232 blocks_count=31 blocks_map=xxxxxxxxxxxxxxxxxxxxxxxxxxxx.xxx sync_ident=main
DBG 2022-08-24 11:45:26.875+02:00 Peer was removed from PeerPool due to low score topics="beacnde" peer=16U*MsCJdx peer_score=-1000 score_low_limit=0 score_high_limit=1000
DBG 2022-08-24 11:45:26.875+02:00 Lost connection to peer                    topics="networking" peer=16U*MsCJdx connections=0
```

By delaying the verdict until the EL connection is restored and
`newPayload` has run successfully, the problem should be fixed. This also
applies back pressure to the sync manager by stopping the download of new
blocks (or the re-downloading of the same block over and over again).
2022-08-24 16:55:41 +00:00
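
The delayed-verdict approach of #4020 amounts to retrying `newPayload` instead of returning a verdict while the EL is unreachable. A toy sketch using std/asyncdispatch (nimbus itself uses chronos; all names here are stand-ins):

```nim
import std/asyncdispatch

type PayloadStatus = enum
  psValid, psInvalid, psSyncing, psNoResponse  # psNoResponse: EL unreachable

proc newPayloadStub(attempt: int): Future[PayloadStatus] {.async.} =
  # Stand-in for the engine API call; pretend the EL comes back on retry 3.
  if attempt < 3: return psNoResponse
  return psSyncing

proc runNewPayloadWithRetries(): Future[PayloadStatus] {.async.} =
  ## Keep retrying instead of issuing a MissingParent-style verdict while
  ## the EL is unreachable; this stalls the sync queue and applies back
  ## pressure rather than descoring innocent peers.
  var attempt = 0
  while true:
    let status = await newPayloadStub(attempt)
    if status != psNoResponse:
      return status
    inc attempt
    await sleepAsync(1000)  # wait before retrying the EL connection

when isMainModule:
  doAssert waitFor(runNewPayloadWithRetries()) == psSyncing
```
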
tersec 1d55743ebb
allow execution clients several seconds to construct blocks (#4012) 2022-08-23 19:19:52 +03:00
tersec c65eaca1bf
update spec ref URLs (#4005) 2022-08-20 16:03:32 +00:00
tersec f537f263df
don't use empty execution payload when newPayload rejects it (#3999)
* don't use empty execution payload when newPayload rejects it

* disallow optimistic import except when accepted/syncing
2022-08-20 00:20:57 +03:00
tersec 3ad1d251ef
make newPayload/forkchoiceUpdated failures errors (#3989) 2022-08-18 12:57:32 +00:00
Miran dfd4afc9f2
compatibility with Nim 1.4+ (#3888) 2022-07-29 10:53:42 +00:00
tersec 2f77f05a1a
optimistic block gossip validation (#3876) 2022-07-21 21:39:43 +03:00
tersec f4208cfb23
opportunistically even less async optimistic sync (#3880) 2022-07-21 21:26:36 +03:00
tersec 06c8e10ae2
move consensus_manager to consensus_object_pools (#3852) 2022-07-13 14:13:54 +00:00
tersec 1250c56e32
less async optimistic sync (#3842)
* less async optimistic sync

* use asyncSpawn; adapt changes to message router
2022-07-07 16:57:52 +00:00
Jacek Sieka e1830519a4
Introduce message router (#3829)
Whether new blocks/attestations/etc are produced internally or received
via REST, their journey through the node is the same - to ensure that
they get the same treatment (logging, metrics, processing), this PR
moves the routing to a dedicated module and fixes several small
differences that existed before.

* `xxxValidator` -> `processMessageName` - the processor also was adding
messages to pools, so we want the name to reflect that action
* add missing "sent" metrics for some messages
* document ignore policy better - already-seen messages are not actually
rebroadcast by libp2p
* skip redundant signature checks for internal validators consistently
2022-07-06 16:11:44 +00:00
Etan Kissling 2a2bcea70d
group justified and finalized `Checkpoint` (#3841)
The justified and finalized `Checkpoint` are frequently passed around
together. This introduces a new `FinalityCheckpoint` data structure that
combines them into one.

Due to the heavy use of this structure in fork choice, this also takes
the opportunity to update fork choice tests to the latest v1.2.0-rc.1
spec. Many additional tests are enabled; some need more work, e.g. EL
mock blocks.
Also implemented `discard_equivocations` which was skipped in #3661,
and improved code reuse across fork choice logic while at it.
2022-07-06 13:33:02 +03:00
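
The combined structure introduced in #3841 is essentially the following; field names are assumed for illustration:

```nim
type
  Epoch = uint64
  Root = array[32, byte]
  Checkpoint = object
    epoch: Epoch
    root: Root
  FinalityCheckpoint = object
    justified: Checkpoint   # fork choice justified checkpoint
    finalized: Checkpoint   # fork choice finalized checkpoint
```

Passing one value instead of two keeps call sites in fork choice shorter and prevents the two checkpoints from drifting apart.
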
tersec 1221bb66e8
optimistic sync (#3793)
* optimistic sync

* flag that initially loaded blocks from database might need execution block root filled in

* return optimistic status in REST calls

* refactor blockslot pruning

* ensure beacon_blocks_by_{root,range} do not provide optimistic blocks

* handle forkchoice head being pre-merge with block being post-merge

* re-enable blocking head updates on validator duties

* fix is_optimistic_candidate_block per spec; don't crash with nil future

* mark blocks sans execution payloads valid during head update
2022-07-04 23:35:33 +03:00
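
The `is_optimistic_candidate_block` rule that #3793 aligns with the spec boils down to two conditions; this sketch uses a simplified signature, with the 128-slot constant being the spec's `SAFE_SLOTS_TO_IMPORT_OPTIMISTICALLY`:

```nim
const SAFE_SLOTS_TO_IMPORT_OPTIMISTICALLY = 128'u64

proc isOptimisticCandidateBlock(
    parentIsExecutionBlock: bool,
    blockSlot, currentSlot: uint64): bool =
  ## A block may be imported optimistically if its parent already has an
  ## execution payload, or if it is old enough relative to the wall slot.
  parentIsExecutionBlock or
    blockSlot + SAFE_SLOTS_TO_IMPORT_OPTIMISTICALLY <= currentSlot
```
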
Jacek Sieka 347a485b5b
bearssl: split abi (#3755) 2022-06-21 10:29:16 +02:00
tersec 2c623e5f92
don't try to fcU on pre-merge bellatrix blocks (#3773) 2022-06-18 13:39:21 +03:00
tersec d41c2a293b
rewrite merge sync (#3759) 2022-06-17 17:16:03 +03:00
tersec 8d421f3d91
keep fcU consistent with actual DAG (#3748) 2022-06-14 08:28:30 +00:00
tersec 65cecc50ca
cleanups: unused and duplicate imports, inconsistent naming conventions, URL updates (#3724) 2022-06-09 14:30:13 +00:00
tersec 62bfe97bbe
fix ExecutionPayload(Header) JSON serialization (#3679) 2022-06-01 14:57:28 +02:00
tersec dfd8cd22b7
bump nim-web3 and use engine API v1.0.0.alpha.9 (#3663) 2022-05-25 10:30:37 +00:00
tersec 1177f33363
standardize on upcoming/specified engine API timeouts (#3637) 2022-05-17 13:57:33 +00:00
tersec 104cc3053f
fcU on syncing newPayload syncing response (#3618) 2022-05-08 09:09:46 +02:00
tersec ab1fac7236
post-merge Bellatrix block proposals (#3570)
* post-merge Bellatrix block proposals

* tolerate running without an Eth1Monitor better

* remove obsolete comment

* use correct empty receipts root

* handle invalid CLI parameters in parseCmdArg overloads
2022-04-14 20:15:34 +00:00
Jacek Sieka f70ff38b53
enable `styleCheck:usages` (#3573)
Some upstream repos still need fixes, but this gets us close enough that
style hints can be enabled by default.

In general, "canonical" spellings are preferred even if they violate
nep-1 - this applies in particular to spec-related stuff like
`genesis_validators_root` which appears throughout the codebase.
2022-04-08 16:22:49 +00:00
Jacek Sieka 30eef0a369
Validator monitor polish (#3569)
* lower "Previous epoch attestation missing" to `NOTICE` for easier
filtering
* add delay logging to validator monitor logs
* simplify delay logging code post-`BeaconTime`
2022-04-06 09:23:01 +00:00
tersec 759a793764
use Eth1Monitor as abstraction; increase timeouts; handle newPayload 'accepted' (#3563) 2022-04-05 08:40:59 +00:00
tersec 9b43a76f2f
kiln beacon node (#3540)
* kiln bn

* use  version of beacon_chain_db

* have Eth1Monitor abstract more tightly over web3provider
2022-03-25 11:40:10 +00:00
Jacek Sieka c64bf045f3
remove StateData (#3507)
One more step on the journey to reduce `BlockRef` usage across the
codebase - this one gets rid of `StateData`, whose job was to keep track
of which block was last assigned to a state - these duties have now been
taken over by `latest_block_root`, a fairly recent addition that computes
this block root from state data (at a small cost that should be
insignificant).

99% mechanical change.
2022-03-16 08:20:40 +01:00
tersec 79761c78a4
proc -> func, mainly in spec/state transition and adjacent modules (#3405) 2022-02-17 11:53:55 +00:00
Jacek Sieka f70aceef37
Harden handling of unviable forks (#3312)
* Harden handling of unviable forks

In our current handling of unviable forks, we allow peers to send us
blocks that come from a different fork - this is not necessarily an
error as it can happen naturally, but it does open up the client to a
case where the same unviable fork keeps getting requested - rather than
allowing this to happen, we'll now give these peers a small negative
score - if it keeps happening, we'll disconnect them.

* keep track of unviable forks in quarantine, to avoid filling it with
known junk
* collect peer scores in single module
* descore peers when they send unviable blocks during sync
* don't give score for duplicate blocks
* increase quarantine size to a level that allows finality to happen
under optimal conditions - this helps avoid downloading the same blocks
over and over in case of an unviable fork
* increase initial score for new peers to make room for one more failure
before disconnection
* log and score invalid/unviable blocks in requestmanager too
* avoid ChainDAG dependency in quarantine
* reject gossip blocks with unviable parent
* continue processing unviable sync blocks in order to build unviable
dag

* docs

* Update beacon_chain/consensus_object_pools/block_pools_types.nim

* add unviable queue test
2022-01-26 13:20:08 +01:00
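
A toy model of the descoring policy described in #3312; the concrete score values are illustrative, not nimbus-eth2's actual constants:

```nim
type Peer = object
  score: int

const
  NewPeerScore = 300       # raised initial score: room for one more failure
  UnviablePenalty = -100   # small negative score per unviable-fork batch
  ScoreLowLimit = 0        # disconnect once the score falls below this

proc initPeer(): Peer =
  Peer(score: NewPeerScore)

proc reportUnviable(p: var Peer): bool =
  ## Apply the penalty; true means the peer keeps sending blocks from an
  ## unviable fork and should be disconnected.
  p.score += UnviablePenalty
  p.score < ScoreLowLimit
```
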
tersec 9c0c9c98ce
complete switch to beacon_chain/specs/datatypes/bellatrix (#3295) 2022-01-18 13:36:52 +00:00
Jacek Sieka 20e700fae4
Harden CommitteeIndex, SubnetId, SyncSubcommitteeIndex (#3259)
* Harden CommitteeIndex, SubnetId, SyncSubcommitteeIndex

Harden the use of `CommitteeIndex` et al to prevent future issues by
using a distinct type, then validating before use in several cases -
datatypes in the spec are kept simple though, so that invalid data can
still be read.

* fix invalid epoch used in REST
`/eth/v1/beacon/states/{state_id}/committees` committee length (could
return invalid data)
* normalize some variable names
* normalize committee index loops
* fix `RestAttesterDuty` to use `uint64` for `validator_committee_index`
* validate `CommitteeIndex` on ingress in REST API
* update rest rules with stricter parsing
* better REST serializers
* save lots of memory by not using `zip` ...at least a few bytes!
2022-01-09 01:28:49 +02:00
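
Nim's `distinct` types make the hardening in #3259 cheap at runtime; a minimal sketch, where the `init` helper is assumed and the bound is the spec's `MAX_COMMITTEES_PER_SLOT`:

```nim
type
  CommitteeIndex = distinct uint64

const MAX_COMMITTEES_PER_SLOT = 64'u64

proc init(T: type CommitteeIndex, v: uint64): CommitteeIndex =
  ## Validate on construction so invalid indices cannot circulate.
  doAssert v < MAX_COMMITTEES_PER_SLOT, "committee index out of range"
  CommitteeIndex(v)

proc `$`(x: CommitteeIndex): string = $uint64(x)

when isMainModule:
  let idx = CommitteeIndex.init(3)
  echo idx  # prints 3; CommitteeIndex.init(999) would assert
```

Because the type is distinct, a raw `uint64` cannot be passed where a `CommitteeIndex` is expected without an explicit conversion, so validation happens at the ingress boundary rather than deep inside consensus code.
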
tersec 1a6a56bdb1
use BeaconTime instead of Slot in fork choice (#3138)
* use v1.1.6 test vectors; use BeaconTime instead of Slot in fork choice

* tick through every slot at least once

* use div INTERVALS_PER_SLOT and use precomputed constants of them

* use correct (even if numerically equal) constant
2021-12-21 18:56:08 +00:00
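
A sketch of the sub-slot granularity that motivates #3138; the constants are the spec's, but the real `BeaconTime` representation differs:

```nim
const
  SECONDS_PER_SLOT = 12'u64
  INTERVALS_PER_SLOT = 3'u64   # attestation/aggregation due times

type BeaconTime = object
  ns: uint64  # nanoseconds since genesis (simplified representation)

func slot(t: BeaconTime): uint64 =
  t.ns div (SECONDS_PER_SLOT * 1_000_000_000'u64)

func intervalInSlot(t: BeaconTime): uint64 =
  ## Which third of the slot we are in (0, 1 or 2) - a Slot alone cannot
  ## express this, which is why fork choice ticks on BeaconTime.
  let nsPerInterval =
    SECONDS_PER_SLOT * 1_000_000_000'u64 div INTERVALS_PER_SLOT
  (t.ns div nsPerInterval) mod INTERVALS_PER_SLOT
```
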
Jacek Sieka c270ec21e4
Validator monitoring (#2925)
Validator monitoring based on and mostly compatible with the
implementation in Lighthouse - tracks additional logs and metrics for
specified validators so as to stay on top of performance.

The implementation works more or less the following way:
* Validator pubkeys are singled out for monitoring - these can be
running on the node or not
* For every action that the validator takes, we record steps in the
process such as messages being seen on the network or published in the
API
* When the dust settles at the end of an epoch, we report the
information from one epoch before that, which coincides with the
balances being updated - this is a tradeoff between being correct
(waiting for finalization) and providing relevant information in a
timely manner
2021-12-20 20:20:31 +01:00
Jacek Sieka 118840d241
SyncManager cleanups for backfill support (#3189)
* SyncManager cleanups for backfill support

Cleanups, fixes and simplifications, in anticipation of backfill support
for the `SyncManager`:

* reformat sync progress indicator to show time left and % done more
prominently:
  * old: `sync="sPssPsssss:2:2.4229:00h57m (2706898)"`
  * new: `sync="14d12h31m (0.52%) 1.1378slots/s (wQQQQQDDQQ:1287520)"`
* reset average speed when going out of sync
* pass all block errors to sync manager, including duplicate/unviable
* penalize peers for reporting a head block that is outside of our
expected wall clock time (they're likely on a different network or
trying to disrupt sync)
* remove `SyncFailureKind` (unused)
* remove `inRange` (unused)
* add `Q` for sync queue requests that are in the `SyncQueue` but not
yet in the `BlockProcessor` queue
* update last slot in `SyncQueue` after getting peer status
* fix race condition between `wakeupWaiters` and `resetWait`, where
workers would not be correctly reset if block verification returned a
completed future without event loop
* log syncmanager direction

* Fix ordering issue.
Requests whose size is not equal to `chunkSize` could be processed in the wrong order, which could lead to sync process freezes.

Co-authored-by: cheatfate <eugene.kabanov@status.im>
2021-12-16 15:57:16 +01:00
Jacek Sieka 03005f48e1
Backfill support for ChainDAG (#3171)
In the ChainDAG, 3 block pointers are kept: genesis, tail and head. This
PR adds one more block pointer: the backfill block which represents the
block that has been backfilled so far.

When doing a checkpoint sync, a random block is given as the starting point
- this is the tail block, and we require that the tail block has a
corresponding state.

When backfilling, we end up with blocks without corresponding states,
hence we cannot use `tail` as a backfill pointer - there is no state.

Nonetheless, we need to keep track of where we are in the backfill
process between restarts, such that we can answer GetBeaconBlocksByRange
requests.

This PR adds the basic support for backfill handling - it needs to be
integrated with backfill sync, and the REST API needs to be adjusted to
take advantage of the new backfilled blocks when responding to certain
requests.

Future work will also enable moving the tail in either direction:
* pruning means moving the tail forward in time and removing states
* moving it backwards means recreating past states from genesis, such that
intermediate states are recreated step by step all the way to the tail -
at that point, tail, genesis and backfill will match up.
* backfilling is done when backfill != genesis - later, this will be the
WSS checkpoint instead
2021-12-13 14:36:06 +01:00
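
The four pointers described in #3171, in miniature (types simplified for illustration):

```nim
type
  Root = array[32, byte]
  BlockPointer = object
    root: Root
    slot: uint64
  ChainDAGPointers = object
    genesis: BlockPointer    # slot 0, the chain's anchor
    tail: BlockPointer       # earliest block with a corresponding state
    head: BlockPointer       # current fork choice head
    backfill: BlockPointer   # earliest block backfilled so far

func backfillComplete(p: ChainDAGPointers): bool =
  ## Backfill walks from the tail back toward genesis; once it gets
  ## there, tail, genesis and backfill can match up.
  p.backfill.root == p.genesis.root
```
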
Jacek Sieka 1a8b7469e3
move quarantine outside of chaindag (#3124)
* move quarantine outside of chaindag

The quarantine has been part of the ChainDAG for the longest time, but
this design has a few issues:

* the function in which blocks are verified and added to the dag becomes
reentrant and therefore difficult to reason about - we're currently
using a stateful flag to work around it
* quarantined blocks bypass the processing queue leading to a processing
stampede
* the quarantine flow is unsuitable for orphaned attestations - these
should also be quarantined eventually

Instead of processing the quarantine inside ChainDAG, this PR moves
re-queueing to `block_processor` which already is responsible for
dealing with follow-up work when a block is added to the dag

This sets the stage for keeping attestations in the quarantine as well.

Also:

* make `BlockError` `{.pure.}`
* avoid use of `ValidationResult` in block clearance (that's for gossip)
2021-12-06 10:49:01 +01:00
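
A toy sketch of the re-queueing flow from #3124; all names are assumed:

```nim
import std/tables

type
  Root = array[32, byte]
  Blck = object
    root, parentRoot: Root
  Quarantine = object
    orphans: Table[Root, seq[Blck]]   # parent root -> blocks waiting on it

proc addOrphan(q: var Quarantine, b: Blck) =
  ## Blocks with an unknown parent wait here, outside of ChainDAG.
  q.orphans.mgetOrPut(b.parentRoot, @[]).add b

proc popChildren(q: var Quarantine, parent: Root): seq[Blck] =
  ## After `parent` is added to the dag, the block processor pulls its
  ## quarantined children and re-queues them through the normal
  ## processing queue, avoiding the reentrant add-block path.
  discard q.orphans.pop(parent, result)
```
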