diff --git a/setup.py b/setup.py index f11951415..ac60eb3b7 100644 --- a/setup.py +++ b/setup.py @@ -868,6 +868,7 @@ class PySpecCommand(Command): specs/bellatrix/fork.md specs/bellatrix/fork-choice.md specs/bellatrix/validator.md + sync/optimistic.md """ if len(self.md_doc_paths) == 0: raise Exception('no markdown files specified, and spec fork "%s" is unknown', self.spec_fork) diff --git a/specs/bellatrix/p2p-interface.md b/specs/bellatrix/p2p-interface.md index aefbb54c7..60a9be774 100644 --- a/specs/bellatrix/p2p-interface.md +++ b/specs/bellatrix/p2p-interface.md @@ -29,6 +29,7 @@ Readers should understand the Phase 0 and Altair documents and use them as a bas - [Why was the max gossip message size increased at Bellatrix?](#why-was-the-max-gossip-message-size-increased-at-bellatrix) - [Req/Resp](#reqresp) - [Why was the max chunk response size increased at Bellatrix?](#why-was-the-max-chunk-response-size-increased-at-bellatrix) + - [Why allow invalid payloads on the P2P network?](#why-allow-invalid-payloads-on-the-p2p-network) @@ -85,13 +86,26 @@ The *type* of the payload of this topic changes to the (modified) `SignedBeaconB Specifically, this type changes with the addition of `execution_payload` to the inner `BeaconBlockBody`. See Bellatrix [state transition document](./beacon-chain.md#beaconblockbody) for further details. +Blocks with execution enabled will be permitted to propagate regardless of the +validity of the execution payload. This prevents network segregation between +[optimistic](/sync/optimistic.md) and non-optimistic nodes. + In addition to the gossip validations for this topic from prior specifications, the following validations MUST pass before forwarding the `signed_beacon_block` on the network. Alias `block = signed_beacon_block.message`, `execution_payload = block.body.execution_payload`. - If the execution is enabled for the block -- i.e. 
`is_execution_enabled(state, block.body)` then validate the following: - - _[REJECT]_ The block's execution payload timestamp is correct with respect to the slot - -- i.e. `execution_payload.timestamp == compute_timestamp_at_slot(state, block.slot)`. + - _[REJECT]_ The block's execution payload timestamp is correct with respect to the slot + -- i.e. `execution_payload.timestamp == compute_timestamp_at_slot(state, block.slot)`. + - If `execution_payload` verification of the block's parent by an execution node is *not* complete: + - _[REJECT]_ The block's parent (defined by `block.parent_root`) passes all + validation (excluding execution node verification of the `block.body.execution_payload`). + - otherwise: + - _[IGNORE]_ The block's parent (defined by `block.parent_root`) passes all + validation (including execution node verification of the `block.body.execution_payload`). + +The following gossip validation from prior specifications MUST NOT be applied if the execution is enabled for the block -- i.e. `is_execution_enabled(state, block.body)`: + - _[REJECT]_ The block's parent (defined by `block.parent_root`) passes validation. ### Transitioning the gossip @@ -100,6 +114,14 @@ details on how to handle transitioning gossip topics for Bellatrix. ## The Req/Resp domain +Non-faulty, [optimistic](/sync/optimistic.md) nodes may send blocks which +result in an INVALID response from an execution engine. To prevent network +segregation between optimistic and non-optimistic nodes, transmission of an +INVALID execution payload via the Req/Resp domain SHOULD NOT cause a node to be +down-scored or disconnected. Transmission of a block which is invalid due to +any consensus layer rules (i.e., *not* execution layer rules) MAY result in +down-scoring or disconnection. + ### Messages #### BeaconBlocksByRange v2 @@ -181,3 +203,26 @@ valid block sizes in the range of gas limits expected in the medium term. 
As with both gossip and req/rsp maximum values, type-specific limits should always by simultaneously respected. + +### Why allow invalid payloads on the P2P network? + +The specification allows blocks with invalid execution payloads to propagate across +gossip and via RPC calls. The reasoning for this is as follows: + +1. Optimistic nodes must listen to block gossip to obtain a view of the head of + the chain. +2. Therefore, optimistic nodes must propagate gossip blocks. Otherwise, they'd + be censoring. +3. If optimistic nodes will propagate blocks via gossip, then they must respond + to requests for the parent via RPC. +4. Therefore, optimistic nodes must send optimistic blocks via RPC. + +So, to prevent network segregation from optimistic nodes inadvertently sending +invalid execution payloads, nodes should never downscore/disconnect nodes due to such invalid +payloads. This does open the network to some DoS attacks from invalid execution +payloads, but the scope of actors is limited to validators who can put those +payloads in valid (and slashable) beacon blocks. Therefore, it is argued that +the DoS risk introduced is tolerable. + +More complicated schemes are possible that could restrict invalid payloads from +RPC. However, it's not clear that the added complexity is warranted. diff --git a/sync/optimistic.md b/sync/optimistic.md new file mode 100644 index 000000000..80e9de3e9 --- /dev/null +++ b/sync/optimistic.md @@ -0,0 +1,368 @@ +# Optimistic Sync + + + +../specs/bellatrix/fork-choice.md#validate_merge_block +## Introduction + +In order to provide a syncing execution engine with a partial view of the head +of the chain, it may be desirable for a consensus engine to import beacon +blocks without verifying the execution payloads. This partial sync is called an +*optimistic sync*. + +Optimistic sync is designed to be opt-in and backwards compatible (i.e., +non-optimistic nodes can tolerate optimistic nodes on the network and vice +versa). 
Optimistic sync is not a fundamental requirement for consensus nodes. +Rather, it's a stop-gap measure to allow execution nodes to sync via +established methods until future Ethereum roadmap items are implemented (e.g., +statelessness). + +## Constants + +|Name|Value|Unit| +|---|---|---| +|`SAFE_SLOTS_TO_IMPORT_OPTIMISTICALLY`| `128` | slots | + +*Note: the `SAFE_SLOTS_TO_IMPORT_OPTIMISTICALLY` value must be user-configurable. See +[Fork Choice Poisoning](#fork-choice-poisoning).* + +## Helpers + +Let `head: BeaconBlock` be the result of calling the fork choice +algorithm at the time of block production. Let `head_block_root: Root` be the +root of that block. + +Let `blocks: Dict[Root, BeaconBlock]` and `block_states: Dict[Root, +BeaconState]` be the blocks (and accompanying states) that have been verified +either completely or optimistically. + +Let `optimistic_roots: Set[Root]` be the set of `hash_tree_root(block)` for all +optimistically imported blocks which have only received a `SYNCING` designation +from an execution engine (i.e., they are not known to be `INVALID` or `VALID`). + +Let `current_slot: Slot` be `(time - genesis_time) // SECONDS_PER_SLOT` where +`time` is the UNIX time according to the local system clock. + +```python +@dataclass +class OptimisticStore(object): + optimistic_roots: Set[Root] + head_block_root: Root + blocks: Dict[Root, BeaconBlock] + block_states: Dict[Root, BeaconState] +``` + +```python +def is_optimistic(opt_store: OptimisticStore, block: BeaconBlock) -> bool: + return hash_tree_root(block) in opt_store.optimistic_roots +``` + +```python +def latest_verified_ancestor(opt_store: OptimisticStore, block: BeaconBlock) -> BeaconBlock: + # It is assumed that the `block` parameter is never an INVALID block. 
+ while True: + if not is_optimistic(opt_store, block) or block.parent_root == Root(): + return block + block = opt_store.blocks[block.parent_root] +``` + +```python +def is_execution_block(block: BeaconBlock) -> bool: + return block.body.execution_payload != ExecutionPayload() +``` + +```python +def is_optimistic_candidate_block(opt_store: OptimisticStore, current_slot: Slot, block: BeaconBlock) -> bool: + justified_root = opt_store.block_states[opt_store.head_block_root].current_justified_checkpoint.root + justified_is_execution_block = is_execution_block(opt_store.blocks[justified_root]) + block_is_deep = block.slot + SAFE_SLOTS_TO_IMPORT_OPTIMISTICALLY <= current_slot + return justified_is_execution_block or block_is_deep +``` + +Let only a node which returns `is_optimistic(opt_store, head) is True` be an *optimistic +node*. Let only a validator on an optimistic node be an *optimistic validator*. + +When this specification only defines behaviour for an optimistic +node/validator, but *not* for the non-optimistic case, assume default +behaviours without regard for optimistic sync. + +## Mechanisms + +### When to optimistically import blocks + +A block MAY be optimistically imported when +`is_optimistic_candidate_block(opt_store, current_slot, block)` returns +`True`. This ensures that blocks are only optimistically imported if either: + +1. The justified checkpoint has execution enabled. +1. The current slot (as per the system clock) is at least + `SAFE_SLOTS_TO_IMPORT_OPTIMISTICALLY` ahead of the slot of the block being + imported. + +*See [Fork Choice Poisoning](#fork-choice-poisoning) for the motivations behind +these conditions.* + +### How to optimistically import blocks + +To optimistically import a block: + +- The [`execute_payload`](../specs/bellatrix/beacon-chain.md#execute_payload) function MUST return `True` if the execution + engine returns `SYNCING` or `VALID`. An `INVALID` response MUST return `False`. 
+- The [`validate_merge_block`](../specs/bellatrix/fork-choice.md#validate_merge_block) + function MUST NOT raise an assertion if both the +`pow_block` and `pow_parent` are unknown to the execution engine. + - All other assertions in [`validate_merge_block`](../specs/bellatrix/fork-choice.md#validate_merge_block) + (e.g., `TERMINAL_BLOCK_HASH`) MUST prevent an optimistic import. +- The parent of the block MUST NOT have an INVALID execution payload. + +In addition to this change in validation, the consensus engine MUST track which +blocks returned `SYNCING` and which returned `VALID` for subsequent processing. + +Optimistically imported blocks MUST pass all verifications included in +`process_block` (notwithstanding the modifications to `execute_payload`). + +A consensus engine MUST be able to retrospectively (i.e., after import) modify +the status of `SYNCING` blocks to be either `VALID` or `INVALID` based upon responses +from an execution engine. I.e., perform the following transitions: + +- `SYNCING` -> `VALID` +- `SYNCING` -> `INVALID` + +When a block transitions from `SYNCING` -> `VALID`, all *ancestors* of the +block MUST also transition from `SYNCING` -> `VALID`. Such a block and any previously `SYNCING` ancestors are no longer +considered "optimistically imported". + +When a block transitions from `SYNCING` -> `INVALID`, all *descendants* of the +block MUST also transition from `SYNCING` -> `INVALID`. + +When a block transitions from the `SYNCING` state, it is removed from the set of +`opt_store.optimistic_roots`. + +When a "merge block" (i.e. the first block which enables execution in a chain) is declared to be +`VALID` by an execution engine (either directly or indirectly), the full +[`validate_merge_block`](../specs/bellatrix/fork-choice.md#validate_merge_block) +MUST be run against the merge block. 
If the block +fails [`validate_merge_block`](../specs/bellatrix/fork-choice.md#validate_merge_block), +the merge block MUST be treated the same as +an `INVALID` block (i.e., it and all its descendants are invalidated and +removed from the block tree). + +### Execution Engine Errors + +When an execution engine returns an error or fails to respond to a payload +validity request for some block, a consensus engine: + +- MUST NOT optimistically import the block. +- MUST NOT apply the block to the fork choice store. +- MAY queue the block for later processing. + +### Assumptions about Execution Engine Behaviour + +This specification assumes execution engines will only return `SYNCING` when +there is insufficient information available to make a `VALID` or `INVALID` +determination on the given `ExecutionPayload` (e.g., the parent payload is +unknown). Specifically, `SYNCING` responses should be fork-specific, in that +the search for a block on one chain MUST NOT trigger a `SYNCING` response for +another chain. + +### Re-Orgs + +The consensus engine MUST support any chain reorganisation which does *not* +affect the justified checkpoint. + +If the justified checkpoint transitions from `SYNCING` -> `INVALID`, a +consensus engine MAY choose to alert the user and force the application to +exit. + +## Fork Choice + +Consensus engines MUST support removing blocks from fork choice that transition +from `SYNCING` to `INVALID`. Specifically, a block deemed `INVALID` at any +point MUST NOT be included in the canonical chain and the weights from those +`INVALID` blocks MUST NOT be applied to any `VALID` or `SYNCING` ancestors. + +### Fork Choice Poisoning + +During the merge transition it is possible for an attacker to craft a +`BeaconBlock` with an execution payload that references an +eternally-unavailable `body.execution_payload.parent_hash` (i.e., the parent +hash is random bytes). 
In rare circumstances, it is possible that an attacker +can build atop such a block to trigger justification. If an optimistic node +imports this malicious chain, that node will have a "poisoned" fork choice +store, such that the node is unable to produce a block that descends from the +head (due to the invalid chain of payloads) and the node is unable to produce a +block that forks around the head (due to the justification of the malicious +chain). + +If an honest chain exists which justifies a higher epoch than the malicious +chain, that chain will take precedence and revive any poisoned store. Such a +chain, if imported before the malicious chain, will prevent the store from +being poisoned. Therefore, the poisoning attack is temporary if >= 2/3rds of +the network is honest and non-faulty. + +The `SAFE_SLOTS_TO_IMPORT_OPTIMISTICALLY` parameter assumes that the network +will justify an honest chain within some number of slots. With this assumption, +it is acceptable to optimistically import transition blocks during the sync +process. Since there is an assumption that an honest chain with a higher +justified checkpoint exists, any fork choice poisoning will be short-lived and +resolved before that node is required to produce a block. + +However, the assumption that the honest, canonical chain will always justify +within `SAFE_SLOTS_TO_IMPORT_OPTIMISTICALLY` slots is dubious. Therefore, +clients MUST provide the following command line flag to assist with manual +disaster recovery: + +- `--safe-slots-to-import-optimistically`: modifies the + `SAFE_SLOTS_TO_IMPORT_OPTIMISTICALLY` value. + +## Checkpoint Sync (Weak Subjectivity Sync) + +A consensus engine MAY assume that the `ExecutionPayload` of a block used as an +anchor for checkpoint sync is `VALID` without necessarily providing that +payload to an execution engine. + +## Validator assignments + +An optimistic node is *not* a full node. 
It is unable to produce blocks, since +an execution engine cannot produce a payload upon an unknown parent. It cannot +faithfully attest to the head block of the chain, since it has not fully +verified that block. + +### Block Production + +An optimistic validator MUST NOT produce a block (i.e., sign across the +`DOMAIN_BEACON_PROPOSER` domain). + +### Attesting + +An optimistic validator MUST NOT participate in attestation (i.e., sign across the +`DOMAIN_BEACON_ATTESTER`, `DOMAIN_SELECTION_PROOF` or +`DOMAIN_AGGREGATE_AND_PROOF` domains). + +### Participating in Sync Committees + +An optimistic validator MUST NOT participate in sync committees (i.e., sign across the +`DOMAIN_SYNC_COMMITTEE`, `DOMAIN_SYNC_COMMITTEE_SELECTION_PROOF` or +`DOMAIN_CONTRIBUTION_AND_PROOF` domains). + +## Ethereum Beacon APIs + +Consensus engines which provide an implementation of the [Ethereum Beacon +APIs](https://github.com/ethereum/beacon-APIs) must take care to avoid +presenting optimistic blocks as fully-verified blocks. + +### Helpers + +Let the following response types be defined as any response with the +corresponding HTTP status code: + +- "Success" Response: Status Codes 200-299. +- "Not Found" Response: Status Code 404. +- "Syncing" Response: Status Code 503. + +### Requests for Optimistic Blocks + +When information about an optimistic block is requested, the consensus engine: + +- MUST NOT respond with success. +- MAY respond with not found. +- MAY respond with syncing. + +### Requests for an Optimistic Head + +When `is_optimistic(opt_store, head) is True`, the consensus engine: + +- MUST NOT return an optimistic `head`. +- MAY substitute the head block with `latest_verified_ancestor(opt_store, head)`. +- MAY return syncing. 
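
The rules for a request that targets an optimistic head can be sketched as follows. This is a minimal, self-contained illustration and not part of the specification: `Block`, `Store`, `head_response` and the `substitute` flag are hypothetical names, roots are plain `bytes`, and the returned integers stand for the "Success" (200) and "Syncing" (503) response types defined under Helpers. The ancestor walk mirrors `latest_verified_ancestor`, with one extra check because that helper can return a still-optimistic block when it reaches a zero parent root.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

Root = bytes
ZERO_ROOT: Root = b"\x00" * 32  # stands in for the spec's `Root()`


@dataclass
class Block:
    # Hypothetical, simplified block: only the fields this sketch needs.
    root: Root
    parent_root: Root


@dataclass
class Store:
    # Hypothetical stand-in for `OptimisticStore`.
    blocks: Dict[Root, Block]
    optimistic_roots: Set[Root] = field(default_factory=set)


def latest_verified_ancestor(store: Store, block: Block) -> Block:
    # Walk towards genesis until a non-optimistic block (or a zero parent) is found.
    while True:
        if block.root not in store.optimistic_roots or block.parent_root == ZERO_ROOT:
            return block
        block = store.blocks[block.parent_root]


def head_response(store: Store, head: Block, substitute: bool = True) -> Tuple[int, Optional[Root]]:
    # A fully verified head can be returned as-is.
    if head.root not in store.optimistic_roots:
        return 200, head.root
    # Optionally substitute the latest verified ancestor for the optimistic head.
    if substitute:
        ancestor = latest_verified_ancestor(store, head)
        if ancestor.root not in store.optimistic_roots:
            return 200, ancestor.root
    # Otherwise report syncing; an optimistic head is never returned.
    return 503, None
```

Whether to substitute an ancestor or report syncing is left to the implementation; the sketch only guarantees that an optimistic root is never returned with a success status.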
+ +### Requests to Validators Endpoints + +When `is_optimistic(opt_store, head) is True`, the consensus engine MUST return syncing to +all endpoints which match the following pattern: + +- `eth/*/validator/*` + +## Design Decision Rationale + +### Why `SAFE_SLOTS_TO_IMPORT_OPTIMISTICALLY`? + +Nodes can only import an optimistic block if their justified checkpoint is +verified or the block is older than `SAFE_SLOTS_TO_IMPORT_OPTIMISTICALLY`. + +These restraints are applied in order to mitigate an attack where a block which +enables execution (a *transition block*) can reference a junk parent hash. This +makes it impossible for honest nodes to build atop that block. If an attacker +exploits a nuance in fork choice `filter_block_tree`, they can, in some rare +cases, produce a junk block that out-competes all locally produced blocks for +the head. This prevents a node from producing a chain of blocks, therefore +breaking liveness. + +Thankfully, if 2/3rds of validators are not poisoned, they can justify an +honest chain which will un-poison all other nodes. + +Notably, this attack only exists for optimistic nodes. Nodes which fully verify +the transition block will reject a block with a junk parent hash. Therefore, +liveness is unaffected if a vast majority of nodes have fully synced execution +and consensus clients before and during the transition. + +Given all of this, we can say two things: + +1. **BNs which are following the head during the transition shouldn't + optimistically import the transition block.** If 1/3rd of validators + optimistically import the poison block, there will be no remaining nodes to + justify an honest chain. +2. **BNs which are syncing can optimistically import transition blocks.** In + this case a justified chain already exists. The poison block would be + quickly reverted and would have no effect on liveness. + +Astute readers will notice that (2) contains a glaring assumption about network +liveness. 
This is necessary because a node cannot feasibly ascertain that the +transition block is justified without importing that block and risking +poisoning. Therefore, we use `SAFE_SLOTS_TO_IMPORT_OPTIMISTICALLY` to say +something along the lines of: *"if the transition block is sufficiently old, +then we can just assume that block is honest or there exists an honest +justified chain to out-compete it."* + +Note the use of "feasibly" in the previous paragraph. One can imagine +mechanisms to check that a block is justified before importing it. For example, +just keep processing blocks without adding them to fork choice. However, there +are still edge-cases here (e.g., when to halt and declare there was no +justification?) and the question of how to mitigate implementation complexity. At this point, +it's important to reflect on the attack and how likely it is to happen. It +requires some rather contrived circumstances and it seems very unlikely to +occur. Therefore, we need to consider if adding complexity to avoid an +unlikely attack increases or decreases our total risk. Presently, it appears +that `SAFE_SLOTS_TO_IMPORT_OPTIMISTICALLY` sits in a sweet spot for this +trade-off. + +### Transitioning from VALID -> INVALID or INVALID -> VALID + +These operations are purposefully omitted. They are outside of the scope of the +specification since they are only possible with a faulty EE. + +Such a scenario requires manual intervention. + +## What about Light Clients? + +An alternative to optimistic sync is to run a light client inside/alongside +beacon nodes that mitigates the need for optimistic sync by providing +tip-of-chain blocks to the execution engine. However, light clients come with +their own set of complexities. Relying on light clients may also restrict nodes +from syncing from genesis, if they so desire. + +A notable thing about optimistic sync is that it's *optional*. 
Should an +implementation decide to go the light-client route, then they can just ignore +optimistic sync altogether. + +## What if `TERMINAL_BLOCK_HASH` is used? + +If the terminal block hash override is used (i.e., `TERMINAL_BLOCK_HASH != +Hash32()`), the [`validate_merge_block`](../specs/bellatrix/fork-choice.md#validate_merge_block) +function will deterministically +return `True` or `False`. Whilst it's not *technically* required to +retrospectively call [`validate_merge_block`](../specs/bellatrix/fork-choice.md#validate_merge_block) +on a transition block that +matches `TERMINAL_BLOCK_HASH` after an optimistic sync, doing so will have no +effect. For simplicity, the optimistic sync specification does not define +edge-case behaviour for when `TERMINAL_BLOCK_HASH` is used.
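
As a closing illustration, the status transitions described under "How to optimistically import blocks" can be sketched as below. This is a hedged sketch under stated assumptions rather than specified behaviour: `Status`, `BlockTree` and its methods are hypothetical names, block roots are plain strings, and only the two permitted transitions (`SYNCING` -> `VALID` for a block and its ancestors, `SYNCING` -> `INVALID` for a block and its descendants) are modelled. Transitions out of `VALID` or `INVALID` raise an error, since the specification omits them.

```python
from enum import Enum
from typing import Dict, List, Optional, Set


class Status(Enum):
    SYNCING = "SYNCING"
    VALID = "VALID"
    INVALID = "INVALID"


class BlockTree:
    """Hypothetical in-memory block tree keyed by block root."""

    def __init__(self) -> None:
        self.parent: Dict[str, Optional[str]] = {}
        self.children: Dict[str, List[str]] = {}
        self.status: Dict[str, Status] = {}

    def add(self, root: str, parent: Optional[str], status: Status = Status.SYNCING) -> None:
        # Record the block, link it to its parent, and store its initial status.
        self.parent[root] = parent
        self.children.setdefault(root, [])
        if parent is not None:
            self.children.setdefault(parent, []).append(root)
        self.status[root] = status

    def optimistic_roots(self) -> Set[str]:
        # Blocks still marked SYNCING are the optimistically imported ones.
        return {r for r, s in self.status.items() if s is Status.SYNCING}

    def mark_valid(self, root: str) -> None:
        # SYNCING -> VALID applies to the block and all of its SYNCING ancestors.
        if self.status[root] is not Status.SYNCING:
            raise ValueError("only SYNCING blocks may transition")
        cur: Optional[str] = root
        while cur is not None and self.status.get(cur) is Status.SYNCING:
            self.status[cur] = Status.VALID
            cur = self.parent[cur]

    def mark_invalid(self, root: str) -> None:
        # SYNCING -> INVALID applies to the block and all of its descendants.
        if self.status[root] is not Status.SYNCING:
            raise ValueError("only SYNCING blocks may transition")
        stack = [root]
        while stack:
            cur = stack.pop()
            if self.status[cur] is Status.SYNCING:
                self.status[cur] = Status.INVALID
            stack.extend(self.children[cur])
```

Deriving the optimistic set from the stored statuses, instead of maintaining it separately, means a block leaves that set as soon as it leaves the `SYNCING` state.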