Merge pull request #2770 from sigp/opt-sync-2

Optimistic Sync
Danny Ryan 2022-01-25 10:26:29 -07:00 committed by GitHub
commit d5e4828aec
3 changed files with 416 additions and 2 deletions


@ -868,6 +868,7 @@ class PySpecCommand(Command):
specs/bellatrix/fork.md
specs/bellatrix/fork-choice.md
specs/bellatrix/validator.md
sync/optimistic.md
""" """
if len(self.md_doc_paths) == 0: if len(self.md_doc_paths) == 0:
raise Exception('no markdown files specified, and spec fork "%s" is unknown', self.spec_fork) raise Exception('no markdown files specified, and spec fork "%s" is unknown', self.spec_fork)


@ -29,6 +29,7 @@ Readers should understand the Phase 0 and Altair documents and use them as a bas
- [Why was the max gossip message size increased at Bellatrix?](#why-was-the-max-gossip-message-size-increased-at-bellatrix)
- [Req/Resp](#reqresp)
- [Why was the max chunk response size increased at Bellatrix?](#why-was-the-max-chunk-response-size-increased-at-bellatrix)
- [Why allow invalid payloads on the P2P network?](#why-allow-invalid-payloads-on-the-p2p-network)
<!-- END doctoc generated TOC please keep comment here to allow auto update -->
<!-- /TOC -->
@ -85,6 +86,10 @@ The *type* of the payload of this topic changes to the (modified) `SignedBeaconB
Specifically, this type changes with the addition of `execution_payload` to the inner `BeaconBlockBody`.
See Bellatrix [state transition document](./beacon-chain.md#beaconblockbody) for further details.
Blocks with execution enabled will be permitted to propagate regardless of the
validity of the execution payload. This prevents network segregation between
[optimistic](/sync/optimistic.md) and non-optimistic nodes.
In addition to the gossip validations for this topic from prior specifications,
the following validations MUST pass before forwarding the `signed_beacon_block` on the network.
Alias `block = signed_beacon_block.message`, `execution_payload = block.body.execution_payload`.
@ -92,6 +97,15 @@ Alias `block = signed_beacon_block.message`, `execution_payload = block.body.exe
then validate the following:
- _[REJECT]_ The block's execution payload timestamp is correct with respect to the slot
  -- i.e. `execution_payload.timestamp == compute_timestamp_at_slot(state, block.slot)`.
- If `execution_payload` verification of the block's parent by an execution node is *not* complete:
  - _[REJECT]_ The block's parent (defined by `block.parent_root`) passes all
    validation (excluding execution node verification of the `block.body.execution_payload`).
- otherwise:
  - _[IGNORE]_ The block's parent (defined by `block.parent_root`) passes all
    validation (including execution node verification of the `block.body.execution_payload`).

The following gossip validation from prior specifications MUST NOT be applied if execution is enabled for the block -- i.e. `is_execution_enabled(state, block.body)`:

- _[REJECT]_ The block's parent (defined by `block.parent_root`) passes validation.
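For illustration only, a rough sketch of this conditional parent check in Python. The `consensus_checks_pass` and `parent_execution_status` helpers are hypothetical client-internal functions (not defined by this specification), and the string statuses stand in for whatever representation a client uses:

```python
def gossip_parent_check(parent: BeaconBlock) -> str:
    # Hypothetical helpers: `consensus_checks_pass(parent)` covers all
    # consensus-layer validation; `parent_execution_status(parent)` is one of
    # "PENDING", "VALID", "INVALID" as reported by the execution engine.
    if parent_execution_status(parent) == "PENDING":
        # Execution verification of the parent is not complete: apply the
        # REJECT rule against consensus-layer validation only.
        return "ACCEPT" if consensus_checks_pass(parent) else "REJECT"
    # Execution verification has completed: apply the IGNORE rule against
    # full validation, execution payload included.
    fully_valid = consensus_checks_pass(parent) and parent_execution_status(parent) == "VALID"
    return "ACCEPT" if fully_valid else "IGNORE"
```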
### Transitioning the gossip
@ -100,6 +114,14 @@ details on how to handle transitioning gossip topics for Bellatrix.
## The Req/Resp domain
Non-faulty, [optimistic](/sync/optimistic.md) nodes may send blocks which
result in an INVALID response from an execution engine. To prevent network
segregation between optimistic and non-optimistic nodes, transmission of an
INVALID execution payload via the Req/Resp domain SHOULD NOT cause a node to be
down-scored or disconnected. Transmission of a block which is invalid due to
any consensus layer rules (i.e., *not* execution layer rules) MAY result in
down-scoring or disconnection.
### Messages
#### BeaconBlocksByRange v2
@ -181,3 +203,26 @@ valid block sizes in the range of gas limits expected in the medium term.
As with both gossip and req/rsp maximum values, type-specific limits should
always be simultaneously respected.
### Why allow invalid payloads on the P2P network?
The specification allows blocks with invalid execution payloads to propagate across
gossip and via RPC calls. The reasoning for this is as follows:
1. Optimistic nodes must listen to block gossip to obtain a view of the head of
the chain.
2. Therefore, optimistic nodes must propagate gossip blocks. Otherwise, they'd
be censoring.
3. If optimistic nodes will propagate blocks via gossip, then they must respond
to requests for the parent via RPC.
4. Therefore, optimistic nodes must send optimistic blocks via RPC.
So, to prevent network segregation from optimistic nodes inadvertently sending
invalid execution payloads, nodes should never downscore/disconnect nodes due to such invalid
payloads. This does open the network to some DoS attacks from invalid execution
payloads, but the scope of actors is limited to validators who can put those
payloads in valid (and slashable) beacon blocks. Therefore, it is argued that
the DoS risk introduced is tolerable.
More complicated schemes are possible that could restrict invalid payloads from
RPC. However, it's not clear that complexity is warranted.

sync/optimistic.md Normal file

@ -0,0 +1,368 @@
# Optimistic Sync
## Introduction
In order to provide a syncing execution engine with a partial view of the head
of the chain, it may be desirable for a consensus engine to import beacon
blocks without verifying the execution payloads. This partial sync is called an
*optimistic sync*.
Optimistic sync is designed to be opt-in and backwards compatible (i.e.,
non-optimistic nodes can tolerate optimistic nodes on the network and vice
versa). Optimistic sync is not a fundamental requirement for consensus nodes.
Rather, it's a stop-gap measure to allow execution nodes to sync via
established methods until future Ethereum roadmap items are implemented (e.g.,
statelessness).
## Constants
| Name | Value | Unit |
| - | - | - |
| `SAFE_SLOTS_TO_IMPORT_OPTIMISTICALLY` | `128` | slots |
*Note: the `SAFE_SLOTS_TO_IMPORT_OPTIMISTICALLY` must be user-configurable. See
[Fork Choice Poisoning](#fork-choice-poisoning).*
## Helpers
Let `head: BeaconBlock` be the result of calling the fork choice
algorithm at the time of block production. Let `head_block_root: Root` be the
root of that block.
Let `blocks: Dict[Root, BeaconBlock]` and `block_states: Dict[Root,
BeaconState]` be the blocks (and accompanying states) that have been verified
either completely or optimistically.
Let `optimistic_roots: Set[Root]` be the set of `hash_tree_root(block)` for all
optimistically imported blocks which have only received a `SYNCING` designation
from an execution engine (i.e., they are not known to be `INVALID` or `VALID`).
Let `current_slot: Slot` be `(time - genesis_time) // SECONDS_PER_SLOT` where
`time` is the UNIX time according to the local system clock.
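For reference, a minimal sketch of that computation in the spec's Python style, assuming `time` and `genesis_time` are given in seconds:

```python
def get_current_slot(time: uint64, genesis_time: uint64) -> Slot:
    # Wall-clock slot, per the definition above.
    return Slot((time - genesis_time) // SECONDS_PER_SLOT)
```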
```python
@dataclass
class OptimisticStore(object):
    optimistic_roots: Set[Root]
    head_block_root: Root
    blocks: Dict[Root, BeaconBlock]
    block_states: Dict[Root, BeaconState]
```
```python
def is_optimistic(opt_store: OptimisticStore, block: BeaconBlock) -> bool:
    return hash_tree_root(block) in opt_store.optimistic_roots
```
```python
def latest_verified_ancestor(opt_store: OptimisticStore, block: BeaconBlock) -> BeaconBlock:
    # It is assumed that the `block` parameter is never an INVALID block.
    while True:
        if not is_optimistic(opt_store, block) or block.parent_root == Root():
            return block
        block = opt_store.blocks[block.parent_root]
```
```python
def is_execution_block(block: BeaconBlock) -> bool:
    return block.body.execution_payload != ExecutionPayload()
```
```python
def is_optimistic_candidate_block(opt_store: OptimisticStore, current_slot: Slot, block: BeaconBlock) -> bool:
    justified_root = opt_store.block_states[opt_store.head_block_root].current_justified_checkpoint.root
    # Import optimistically if the justified checkpoint has execution enabled,
    # or if the block is sufficiently old relative to the wall clock.
    justified_is_execution_block = is_execution_block(opt_store.blocks[justified_root])
    block_is_deep = block.slot + SAFE_SLOTS_TO_IMPORT_OPTIMISTICALLY <= current_slot
    return justified_is_execution_block or block_is_deep
```
Only a node which returns `is_optimistic(opt_store, head) is True` is an *optimistic
node*. Only a validator on an optimistic node is an *optimistic validator*.
When this specification only defines behaviour for an optimistic
node/validator, but *not* for the non-optimistic case, assume default
behaviours without regard for optimistic sync.
## Mechanisms
### When to optimistically import blocks
A block MAY be optimistically imported when
`is_optimistic_candidate_block(opt_store, current_slot, block)` returns
`True`. This ensures that blocks are only optimistically imported if either:
1. The justified checkpoint has execution enabled.
1. The current slot (as per the system clock) is at least
`SAFE_SLOTS_TO_IMPORT_OPTIMISTICALLY` ahead of the slot of the block being
imported.
*See [Fork Choice Poisoning](#fork-choice-poisoning) for the motivations behind
these conditions.*
### How to optimistically import blocks
To optimistically import a block:
- The [`execute_payload`](../specs/bellatrix/beacon-chain.md#execute_payload) function MUST return `True` if the execution
engine returns `SYNCING` or `VALID`. An `INVALID` response MUST return `False`.
- The [`validate_merge_block`](../specs/bellatrix/fork-choice.md#validate_merge_block)
function MUST NOT raise an assertion if both the
`pow_block` and `pow_parent` are unknown to the execution engine.
- All other assertions in [`validate_merge_block`](../specs/bellatrix/fork-choice.md#validate_merge_block)
(e.g., `TERMINAL_BLOCK_HASH`) MUST prevent an optimistic import.
- The parent of the block MUST NOT have an INVALID execution payload.
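As an illustration only, a rough sketch of the relaxed payload check described above. The `verify_payload` call and the string statuses are assumptions about a generic engine interface, not part of this specification:

```python
def execute_payload_optimistically(execution_engine: ExecutionEngine, payload: ExecutionPayload) -> bool:
    # Hypothetical call returning "VALID", "INVALID" or "SYNCING".
    status = execution_engine.verify_payload(payload)
    # SYNCING is treated like VALID for the purposes of importing the block;
    # only a definitive INVALID response prevents the import.
    return status in ("VALID", "SYNCING")
```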
In addition to this change in validation, the consensus engine MUST track which
blocks returned `SYNCING` and which returned `VALID` for subsequent processing.
Optimistically imported blocks MUST pass all verifications included in
`process_block` (notwithstanding the modifications to `execute_payload`).
A consensus engine MUST be able to retrospectively (i.e., after import) modify
the status of `SYNCING` blocks to be either `VALID` or `INVALID` based upon responses
from an execution engine. I.e., perform the following transitions:
- `SYNCING` -> `VALID`
- `SYNCING` -> `INVALID`
When a block transitions from `SYNCING` -> `VALID`, all *ancestors* of the
block MUST also transition from `SYNCING` -> `VALID`. Such a block and any previously `SYNCING` ancestors are no longer
considered "optimistically imported".
When a block transitions from `SYNCING` -> `INVALID`, all *descendants* of the
block MUST also transition from `SYNCING` -> `INVALID`.
When a block transitions from the `SYNCING` state, it is removed from the set of
`opt_store.optimistic_roots`.
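A rough sketch of these transitions against the `OptimisticStore` defined above; the `ancestors_of` and `descendants_of` helpers over the locally known block tree are assumptions, not part of this specification:

```python
def mark_block_valid(opt_store: OptimisticStore, root: Root) -> None:
    # A VALID block implies that all of its previously SYNCING ancestors are
    # also VALID, so none of them remain "optimistically imported".
    for r in [root] + ancestors_of(opt_store, root):  # hypothetical helper
        opt_store.optimistic_roots.discard(r)

def mark_block_invalid(opt_store: OptimisticStore, root: Root) -> None:
    # An INVALID block invalidates every descendant; such blocks are removed
    # from the optimistic set and from the local block tree entirely.
    for r in [root] + descendants_of(opt_store, root):  # hypothetical helper
        opt_store.optimistic_roots.discard(r)
        opt_store.blocks.pop(r, None)
        opt_store.block_states.pop(r, None)
```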
When a "merge block" (i.e. the first block which enables execution in a chain) is declared to be
`VALID` by an execution engine (either directly or indirectly), the full
[`validate_merge_block`](../specs/bellatrix/fork-choice.md#validate_merge_block)
MUST be run against the merge block. If the block
fails [`validate_merge_block`](../specs/bellatrix/fork-choice.md#validate_merge_block),
the merge block MUST be treated the same as
an `INVALID` block (i.e., it and all its descendants are invalidated and
removed from the block tree).
### Execution Engine Errors
When an execution engine returns an error or fails to respond to a payload
validity request for some block, a consensus engine:
- MUST NOT optimistically import the block.
- MUST NOT apply the block to the fork choice store.
- MAY queue the block for later processing.
### Assumptions about Execution Engine Behaviour
This specification assumes execution engines will only return `SYNCING` when
there is insufficient information available to make a `VALID` or `INVALID`
determination on the given `ExecutionPayload` (e.g., the parent payload is
unknown). Specifically, `SYNCING` responses should be fork-specific, in that
the search for a block on one chain MUST NOT trigger a `SYNCING` response for
another chain.
### Re-Orgs
The consensus engine MUST support any chain reorganisation which does *not*
affect the justified checkpoint.
If the justified checkpoint transitions from `SYNCING` -> `INVALID`, a
consensus engine MAY choose to alert the user and force the application to
exit.
## Fork Choice
Consensus engines MUST support removing blocks from fork choice that transition
from `SYNCING` to `INVALID`. Specifically, a block deemed `INVALID` at any
point MUST NOT be included in the canonical chain and the weights from those
`INVALID` blocks MUST NOT be applied to any `VALID` or `SYNCING` ancestors.
### Fork Choice Poisoning
During the merge transition it is possible for an attacker to craft a
`BeaconBlock` with an execution payload that references an
eternally-unavailable `body.execution_payload.parent_hash` (i.e., the parent
hash is random bytes). In rare circumstances, it is possible that an attacker
can build atop such a block to trigger justification. If an optimistic node
imports this malicious chain, that node will have a "poisoned" fork choice
store, such that the node is unable to produce a block that descends from the
head (due to the invalid chain of payloads) and the node is unable to produce a
block that forks around the head (due to the justification of the malicious
chain).
If an honest chain exists which justifies a higher epoch than the malicious
chain, that chain will take precedence and revive any poisoned store. Such a
chain, if imported before the malicious chain, will prevent the store from
being poisoned. Therefore, the poisoning attack is temporary if >= 2/3rds of
the network is honest and non-faulty.
The `SAFE_SLOTS_TO_IMPORT_OPTIMISTICALLY` parameter assumes that the network
will justify an honest chain within some number of slots. With this assumption,
it is acceptable to optimistically import transition blocks during the sync
process. Since there is an assumption that an honest chain with a higher
justified checkpoint exists, any fork choice poisoning will be short-lived and
resolved before that node is required to produce a block.
However, the assumption that the honest, canonical chain will always justify
within `SAFE_SLOTS_TO_IMPORT_OPTIMISTICALLY` slots is dubious. Therefore,
clients MUST provide the following command line flag to assist with manual
disaster recovery:
- `--safe-slots-to-import-optimistically`: modifies the
`SAFE_SLOTS_TO_IMPORT_OPTIMISTICALLY`.
## Checkpoint Sync (Weak Subjectivity Sync)
A consensus engine MAY assume that the `ExecutionPayload` of a block used as an
anchor for checkpoint sync is `VALID` without necessarily providing that
payload to an execution engine.
## Validator assignments
An optimistic node is *not* a full node. It is unable to produce blocks, since
an execution engine cannot produce a payload upon an unknown parent. It cannot
faithfully attest to the head block of the chain, since it has not fully
verified that block.
### Block Production
An optimistic validator MUST NOT produce a block (i.e., sign across the
`DOMAIN_BEACON_PROPOSER` domain).
### Attesting
An optimistic validator MUST NOT participate in attestation (i.e., sign across the
`DOMAIN_BEACON_ATTESTER`, `DOMAIN_SELECTION_PROOF` or
`DOMAIN_AGGREGATE_AND_PROOF` domains).
### Participating in Sync Committees
An optimistic validator MUST NOT participate in sync committees (i.e., sign across the
`DOMAIN_SYNC_COMMITTEE`, `DOMAIN_SYNC_COMMITTEE_SELECTION_PROOF` or
`DOMAIN_CONTRIBUTION_AND_PROOF` domains).
## Ethereum Beacon APIs
Consensus engines which provide an implementation of the [Ethereum Beacon
APIs](https://github.com/ethereum/beacon-APIs) must take care to avoid
presenting optimistic blocks as fully-verified blocks.
### Helpers
Let the following response types be defined as any response with the
corresponding HTTP status code:
- "Success" Response: Status Codes 200-299.
- "Not Found" Response: Status Code 404.
- "Syncing" Response: Status Code 503.
### Requests for Optimistic Blocks
When information about an optimistic block is requested, the consensus engine:
- MUST NOT respond with success.
- MAY respond with not found.
- MAY respond with syncing.
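For illustration, a sketch of how a block-by-root request might apply these rules. The integer status codes and the direct lookup in `opt_store.blocks` are simplifying assumptions, not a normative API definition:

```python
def block_request_status(opt_store: OptimisticStore, root: Root) -> int:
    block = opt_store.blocks.get(root)
    if block is None:
        return 404  # "Not Found"
    if is_optimistic(opt_store, block):
        # Never serve an optimistic block as a success; either "Not Found"
        # (404) or "Syncing" (503) is acceptable here.
        return 503
    return 200  # "Success": the block is fully verified
```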
### Requests for an Optimistic Head
When `is_optimistic(opt_store, head) is True`, the consensus engine:
- MUST NOT return an optimistic `head`.
- MAY substitute the head block with `latest_verified_ancestor(opt_store, head)`.
- MAY return syncing.
### Requests to Validators Endpoints
When `is_optimistic(opt_store, head) is True`, the consensus engine MUST return syncing to
all endpoints which match the following pattern:
- `eth/*/validator/*`
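A sketch of one way to match this pattern, assuming request paths are available as plain strings:

```python
from fnmatch import fnmatch

def is_validator_endpoint(path: str) -> bool:
    # Endpoints matching this pattern must return "Syncing" while the head
    # is optimistic.
    return fnmatch(path, "eth/*/validator/*")
```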
## Design Decision Rationale
### Why `SAFE_SLOTS_TO_IMPORT_OPTIMISTICALLY`?
Nodes can only import an optimistic block if their justified checkpoint is
verified or the block is older than `SAFE_SLOTS_TO_IMPORT_OPTIMISTICALLY`.
These restraints are applied in order to mitigate an attack where a block which
enables execution (a *transition block*) can reference a junk parent hash. This
makes it impossible for honest nodes to build atop that block. If an attacker
exploits a nuance in fork choice `filter_block_tree`, they can, in some rare
cases, produce a junk block that out-competes all locally produced blocks for
the head. This prevents a node from producing a chain of blocks, therefore
breaking liveness.
Thankfully, if 2/3rds of validators are not poisoned, they can justify an
honest chain which will un-poison all other nodes.
Notably, this attack only exists for optimistic nodes. Nodes which fully verify
the transition block will reject a block with a junk parent hash. Therefore,
liveness is unaffected if a vast majority of nodes have fully synced execution
and consensus clients before and during the transition.
Given all of this, we can say two things:
1. **BNs which are following the head during the transition shouldn't
optimistically import the transition block.** If 1/3rd of validators
optimistically import the poison block, there will not be enough remaining
validators to justify an honest chain.
2. **BNs which are syncing can optimistically import transition blocks.** In
this case a justified chain already exists. The poison block would be
quickly reverted and would have no effect on liveness.
Astute readers will notice that (2) contains a glaring assumption about network
liveness. This is necessary because a node cannot feasibly ascertain that the
transition block is justified without importing that block and risking
poisoning. Therefore, we use `SAFE_SLOTS_TO_IMPORT_OPTIMISTICALLY` to say
something along the lines of: *"if the transition block is sufficiently old,
then we can just assume that block is honest or there exists an honest
justified chain to out-compete it."*
Note the use of "feasibly" in the previous paragraph. One can imagine
mechanisms to check that a block is justified before importing it. For example,
just keep processing blocks without adding them to fork choice. However, there
are still edge-cases here (e.g., when to halt and declare there was no
justification?), and it is unclear how to mitigate the added implementation complexity. At this point,
it's important to reflect on the attack and how likely it is to happen. It
requires some rather contrived circumstances and it seems very unlikely to
occur. Therefore, we need to consider if adding complexity to avoid an
unlikely attack increases or decreases our total risk. Presently, it appears
that `SAFE_SLOTS_TO_IMPORT_OPTIMISTICALLY` sits in a sweet spot for this
trade-off.
### Transitioning from VALID -> INVALID or INVALID -> VALID
These operations are purposefully omitted. It is outside of the scope of the
specification since it's only possible with a faulty EE.
Such a scenario requires manual intervention.
### What about Light Clients?
An alternative to optimistic sync is to run a light client inside/alongside
beacon nodes that mitigates the need for optimistic sync by providing
tip-of-chain blocks to the execution engine. However, light clients come with
their own set of complexities. Relying on light clients may also restrict nodes
from syncing from genesis, if they so desire.
A notable thing about optimistic sync is that it's *optional*. Should an
implementation decide to go the light-client route, then they can just ignore
optimistic sync altogether.
### What if `TERMINAL_BLOCK_HASH` is used?
If the terminal block hash override is used (i.e., `TERMINAL_BLOCK_HASH !=
Hash32()`), the [`validate_merge_block`](../specs/bellatrix/fork-choice.md#validate_merge_block)
function will deterministically
return `True` or `False`. Whilst it's not *technically* required to
retrospectively call [`validate_merge_block`](../specs/bellatrix/fork-choice.md#validate_merge_block)
on a transition block that
matches `TERMINAL_BLOCK_HASH` after an optimistic sync, doing so will have no
effect. For simplicity, the optimistic sync specification does not define
edge-case behaviour for when `TERMINAL_BLOCK_HASH` is used.