work in progress DAS network + validator spec

protolambda 2021-01-01 16:51:24 +01:00
parent be91e59823
commit 6f0b613f08
4 changed files with 272 additions and 1 deletions


@ -68,6 +68,7 @@ We define the following Python custom types for type hinting and readability:
| - | - | - |
| `Shard` | `uint64` | A shard number |
| `BLSCommitment` | `bytes48` | A G1 curve point |
| `BLSKateProof` | `bytes48` | A G1 curve point |
## Configuration
@ -187,7 +188,7 @@ class ShardHeader(Container):
# The actual data commitment
commitment: DataCommitment
# Proof that the degree < commitment.length
degree_proof: BLSCommitment
degree_proof: BLSKateProof
```
### `SignedShardHeader`


@ -0,0 +1,65 @@
# Ethereum 2.0 Phase 1 -- Data Availability Sampling
**Notice**: This document is a work-in-progress for researchers and implementers.
## Table of contents
<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- END doctoc generated TOC please keep comment here to allow auto update -->
## Custom types
We define the following Python custom types for type hinting and readability:
| Name | SSZ equivalent | Description |
| - | - | - |
| `SampleIndex` | `uint64` | A sample index, corresponding to a chunk of extended data |
| `BLSPoint` | `uint256` | A number `x` in the range `0 <= x < MODULUS` |
## New containers
### DASSample
```python
class DASSample(Container):
slot: Slot
shard: Shard
index: SampleIndex
proof: BLSKateProof
data: Vector[BLSPoint, POINTS_PER_SAMPLE]
```
## Helper functions
```python
def recover_data(data: Sequence[Optional[Point]]) -> Sequence[Point]:
...
```
## DAS functions
```python
def extend_data(data: Sequence[Point]) -> Sequence[Point]:
...
```
```python
def unextend_data(extended_data: Sequence[Point]) -> Sequence[Point]:
...
```
```python
def sample_data(extended_data: Sequence[Point]) -> Sequence[DASSample]:
...
```
```python
def verify_sample(sample: DASSample):
...
```
```python
def reconstruct_extended_data(samples: Sequence[DASSample]) -> Sequence[Point]:
...
```
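The function bodies above are deliberately left as stubs in this work-in-progress spec. As a non-normative sketch of how they are intended to compose, assuming the stubs are implemented, the expected round-trip is:
```python
# Non-normative sketch, assuming the stub functions above are implemented.
def check_round_trip(data: Sequence[Point]) -> None:
    extended = extend_data(data)                    # erasure-code the original points
    assert unextend_data(extended) == data          # the extension is reversible
    samples = sample_data(extended)                 # chunk the extension into proved samples
    for sample in samples:
        verify_sample(sample)                       # each sample verifies independently
    assert reconstruct_extended_data(samples) == extended  # a full sample set reconstructs the extension
```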

specs/phase1/p2p-das.md

@ -0,0 +1,162 @@
# Ethereum 2.0 Phase 1 -- Network specification for Data Availability Sampling
**Notice**: This document is a work-in-progress for researchers and implementers.
## Table of contents
<!-- TOC -->
<!-- /TOC -->
## Introduction
For an introduction to DAS itself, see [the DAS section in the Phase 1 validator spec](./validator.md#data-availability-sampling).
This is not a prerequisite for the network layer, but provides valuable context.
For sampling, all nodes need to query for `k` random samples each slot.
*__TODO__: describe big picture of sampling workload size*
This is a lot of work, and ideally happens at a low latency.
To achieve quick querying, the query model is changed to *push* the samples to listeners instead, using GossipSub.
The listeners then randomly rotate their subscriptions to keep queries unpredictable,
except for a small subset of subscriptions that functions as a backbone to keep topics more stable.
Publishing can utilize the fan-out functionality in GossipSub, and is easier to split between nodes:
nodes on the horizontal networks can help by producing the same samples and fan-out publishing to their own peers.
This push model also helps to obfuscate the original source of a message:
the listeners will not have to make individual queries to some identified source.
The push model does not aim to serve "historical" queries (anything older than the most recent).
Historical queries are still required for the unhappy case, where messages are not pushed quickly enough,
and missing samples are not reconstructed by other nodes on the horizontal subnet quickly enough.
The main challenge in supporting historical queries is to target the right nodes,
without concentrating too many requests on a single node, or breaking the network/consensus identity separation.
## DAS Subnets
On a high level, the push-model roles are divided into:
- Sources: create blobs of shard block data, which are then transformed into many tiny samples.
- Sinks: continuously look for samples.
At full operation, the network has one proposer per shard per slot.
In the push-model, there are:
- *Vertical subnets*: Sinks can subscribe to indices of samples: there is a sample to subnet mapping.
- *Horizontal subnets*: Sources need to distribute samples to all vertical networks: they participate in a fanout layer.
### Horizontal subnets
The shift of the distribution responsibility to a proposer can only be achieved with amplification:
a regular proposer cannot reach every vertical subnet.
#### Publishing
To publish their work, proposers already put the shard block as a whole on a shard-block subnet.
The proposer can fan-out their work more aggressively, by using the fan-out functionality of GossipSub:
it may publish to all its peers on the subnet, instead of just those in its mesh.
#### Horizontal propagation
Peers on the horizontal subnet are expected to at least perform regular propagation of shard blocks, just as they would participate in any other topic.
*Although this may be sufficient for testnets, expect parameter changes in the spec here.*
#### Horizontal to vertical
Nodes on this same subnet can replicate the sampling efficiently (including a proof for each sample),
and distribute it to any vertical networks that are available to them.
Since the messages are content-addressed (instead of origin-stamped),
multiple publishers of the same samples on a vertical subnet do not hurt performance,
but actually improve it by shortcutting regular propagation on the vertical subnet, and thus lowering the latency to a sample.
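As a non-normative sketch of this replication step (the topic naming, `SAMPLE_SUBNET_COUNT`, and `gossip_publish` are assumptions for illustration, not spec choices):
```python
# Non-normative sketch of horizontal-to-vertical replication.
# `sample_data` and `DASSample` come from the DAS spec; the topic naming,
# SAMPLE_SUBNET_COUNT and gossip_publish(topic, message) are assumptions.
def replicate_to_vertical_subnets(extended_data: Sequence[Point]) -> None:
    for sample in sample_data(extended_data):
        # Illustrative sample-index -> vertical-subnet mapping.
        subnet = sample.index % SAMPLE_SUBNET_COUNT
        # Messages are content-addressed, so duplicate publishers only speed up propagation.
        gossip_publish(f"das_vertical_{subnet}", sample)
```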
### Vertical subnets
Vertical subnets propagate the samples to every peer that is interested.
These interests are randomly sampled and rotate quickly: although not perfect,
this is sufficient to prevent any significant number of nodes from being 100% predictable.
As soon as a sample is missing after the expected propagation time window,
nodes can fall back to the pull model, or ultimately flag it as unavailable data.
#### Slow rotation: Backbone
To allow for subscriptions to rotate quickly and randomly, a backbone is formed to help onboard peers into other topics.
This backbone is based on a pure function of the *node* identity and time:
- Nodes can be found *without additional discovery overhead*:
peers on a vertical topic can be found by searching the local peerstore for identities that hash to the desired topic(s).
- Nodes can be held accountable for contributing to the backbone:
peers that participate in DAS but are not active on the appropriate backbone topics can be scored down.
A node should anticipate which backbone topics to subscribe to based on its own identity.
These subscriptions rotate slowly, with different offsets per node identity to avoid sudden network-wide rotations.
```python
# TODO hash function: (node, time)->subnets
```
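One possible shape for this function, purely as a non-normative sketch (the constants `SLOW_ROTATION_PERIOD`, `BACKBONE_SUBNETS_PER_NODE`, and `SAMPLE_SUBNET_COUNT` are hypothetical, and the exact serialization is an assumption):
```python
# Non-normative sketch: derive slowly-rotating backbone subnets from (node, time).
# `hash` and `uint_to_bytes` are the phase 0 helpers; the constants are hypothetical.
def compute_backbone_subnets(node_id: bytes, epoch: Epoch) -> Sequence[uint64]:
    # Per-node offset so that not every node rotates at the same epoch boundary.
    offset = int.from_bytes(hash(node_id)[:8], 'little') % SLOW_ROTATION_PERIOD
    period = (int(epoch) + offset) // SLOW_ROTATION_PERIOD
    return [
        uint64(
            int.from_bytes(hash(node_id + uint_to_bytes(uint64(period)) + uint_to_bytes(uint64(i)))[:8], 'little')
            % SAMPLE_SUBNET_COUNT
        )
        for i in range(BACKBONE_SUBNETS_PER_NODE)
    ]
```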
Backbone subscription work is outlined in the [DAS validator spec](./validator.md#data-availability-sampling).
#### Quick Rotation: Sampling
A node MUST maintain `k` random subscriptions to topics, and rotate these according to the [DAS validator spec](./validator.md#data-availability-sampling).
If the node does not already have connected peers on the topic it needs to sample, it can search its peerstore for peers in the topic backbone.
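A non-normative sketch of the quick-rotation side (`SAMPLE_SUBNET_COUNT` is a hypothetical constant; the randomness is local so other nodes cannot predict the choice):
```python
import random

# Non-normative sketch: choose `k` distinct vertical subnets to sample,
# re-drawn on rotation, using local (unpredictable) randomness.
def compute_sampling_subnets(k: int) -> Sequence[uint64]:
    return [uint64(i) for i in random.sample(range(SAMPLE_SUBNET_COUNT), k)]
```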
## DAS in the Gossip domain: Push
### Topics and messages
#### Horizontal subnets
#### Vertical subnets
## DAS in the Req-Resp domain: Pull
To pull samples from nodes, in case of network instability when samples are unavailable, a new query method is added to the Req-Resp domain.
This builds on top of the protocol identification and encoding spec which was introduced in [the Phase0 network spec](../phase0/p2p-interface.md).
Note that the Phase1 DAS networking uses a different protocol prefix: `/eth2/das/req`
The result codes are extended with:
- 3: **ResourceUnavailable** -- when the request was valid but cannot be served at this point in time.
TODO: unify with phase0? Lighthouse already defined this in their response codes enum.
### Messages
#### DASQuery
**Protocol ID:** `/eth2/das/req/query/1/`
Request Content:
```
(
sample_index: SampleIndex
)
```
Response Content:
```
(
DASSample
)
```
When the sample is:
- Available: respond with a `Success` result code, and the encoded sample.
- Expected to be available, but not: respond with a `ResourceUnavailable` result code.
- Not available, but never of interest to the node: respond with an `InvalidRequest` result code.
When the node is part of the backbone and expected to have the sample, the validity of the request MUST be recognized with `Success` or `ResourceUnavailable`.
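A non-normative sketch of how a responder could map sample availability to result codes (`is_expected_to_custody` and `get_buffered_sample` are hypothetical helpers; codes 0-2 are the phase 0 result codes):
```python
# Non-normative sketch of result-code selection for a DASQuery.
SUCCESS, INVALID_REQUEST, SERVER_ERROR, RESOURCE_UNAVAILABLE = 0, 1, 2, 3

def handle_das_query(sample_index: SampleIndex) -> Tuple[int, Optional[DASSample]]:
    if not is_expected_to_custody(sample_index):
        # The sample was never of interest to this node.
        return INVALID_REQUEST, None
    sample = get_buffered_sample(sample_index)
    if sample is None:
        # Expected to be available (e.g. backbone duty), but missing right now.
        return RESOURCE_UNAVAILABLE, None
    return SUCCESS, sample
```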


@ -536,6 +536,49 @@ class SignedLightAggregateAndProof(Container):
signature: BLSSignature
```
## Data Availability Sampling
### Gossip subscriptions to maintain
#### Slow rotation: Backbone
TODO
#### Quick rotation: Sampling
TODO
### DAS during network instability
The GossipSub-based retrieval of samples may not always work.
#### Waiting on missing samples
Wait for the sample to be re-broadcast. Someone may be slow to publish, or someone else may be able to do the work.
Any node can do the following work to keep the network healthy:
- Common: Listen on a horizontal subnet, chunkify the block data in samples, and propagate the samples to vertical subnets.
- Extreme: Listen on enough vertical subnets, reconstruct the missing samples by recovery, and propagate the recovered samples.
This is not a requirement, but should improve network stability with few resources, and without any central party.
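A non-normative sketch of the "extreme" path (assumes the DAS functions from the sampling spec, plus the hypothetical `gossip_publish` helper and subnet mapping used in the network spec):
```python
# Non-normative sketch: recover missing samples from the ones already seen on
# vertical subnets, then re-publish only the previously-missing ones.
def reconstruct_and_republish(seen_samples: Sequence[DASSample]) -> None:
    extended_data = reconstruct_extended_data(seen_samples)  # recovery from partial data
    seen_indices = set(sample.index for sample in seen_samples)
    for sample in sample_data(extended_data):
        if sample.index not in seen_indices:
            gossip_publish(f"das_vertical_{sample.index % SAMPLE_SUBNET_COUNT}", sample)
```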
#### Pulling missing samples
The more realistic option, to execute when a sample is missing, is to query any node that is known to hold it.
Since *consensus identity is disconnected from network identity*, there is no direct way to contact custody holders
without explicitly asking for the data.
However, *network identities* are still used to build a backbone for each vertical subnet.
These nodes should have received the samples, and can serve a buffer of them on demand.
Although serving these is not directly incentivised, it is little work:
1. Buffer any message seen on the backbone vertical subnets, for up to two weeks.
2. Serve the samples on request. An individual sample is just expected to be `~ 0.5 KB`, and does not require any pre-processing to serve.
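A non-normative sketch of such a buffer (retention and key layout are illustrative; `SECONDS_PER_SLOT` is the phase 0 constant):
```python
# Non-normative sketch: retain samples seen on backbone vertical subnets for
# roughly two weeks and serve them on request; at ~0.5 KB per sample this is cheap.
BUFFER_SLOTS = 14 * 24 * 60 * 60 // SECONDS_PER_SLOT  # ~two weeks of slots

class SampleBuffer:
    def __init__(self) -> None:
        self.samples: Dict[Tuple[Slot, Shard, SampleIndex], DASSample] = {}

    def on_sample(self, sample: DASSample, current_slot: Slot) -> None:
        self.samples[(sample.slot, sample.shard, sample.index)] = sample
        # Prune entries older than the retention window.
        self.samples = {key: s for key, s in self.samples.items() if key[0] + BUFFER_SLOTS >= current_slot}

    def serve(self, slot: Slot, shard: Shard, index: SampleIndex) -> Optional[DASSample]:
        return self.samples.get((slot, shard, index))
```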
Pulling samples directly from nodes with a custody responsibility, without revealing their identity to the network, is an open problem.
## How to avoid slashing
Proposer and Attester slashings described in Phase 0 remain in place with the