work in progress DAS network + validator spec

protolambda 2021-01-01 16:51:24 +01:00
parent be91e59823
commit 6f0b613f08
4 changed files with 272 additions and 1 deletions


@ -68,6 +68,7 @@ We define the following Python custom types for type hinting and readability:
| - | - | - |
| `Shard` | `uint64` | A shard number |
| `BLSCommitment` | `bytes48` | A G1 curve point |
| `BLSKateProof` | `bytes48` | A G1 curve point |
## Configuration
@ -187,7 +188,7 @@ class ShardHeader(Container):
# The actual data commitment
commitment: DataCommitment
# Proof that the degree < commitment.length
degree_proof: BLSCommitment
degree_proof: BLSKateProof
```
### `SignedShardHeader`


@ -0,0 +1,65 @@
# Ethereum 2.0 Phase 1 -- Data Availability Sampling
**Notice**: This document is a work-in-progress for researchers and implementers.
## Table of contents
<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- END doctoc generated TOC please keep comment here to allow auto update -->
## Custom types
We define the following Python custom types for type hinting and readability:
| Name | SSZ equivalent | Description |
| - | - | - |
| `SampleIndex` | `uint64` | A sample index, corresponding to a chunk of extended data |
| `BLSPoint` | `uint256` | A number `x` in the range `0 <= x < MODULUS` |
## New containers
### DASSample
```python
class DASSample(Container):
slot: Slot
shard: Shard
index: SampleIndex
proof: BLSKateProof
data: Vector[BLSPoint, POINTS_PER_SAMPLE]
```
## Helper functions
```python
def recover_data(data: Sequence[Optional[Point]]) -> Sequence[Point]:
...
```
## DAS functions
```python
def extend_data(data: Sequence[Point]) -> Sequence[Point]:
...
```
```python
def unextend_data(extended_data: Sequence[Point]) -> Sequence[Point]:
...
```
```python
def sample_data(extended_data: Sequence[Point]) -> Sequence[DASSample]:
...
```
```python
def verify_sample(sample: DASSample):
...
```
```python
def reconstruct_extended_data(samples: Sequence[DASSample]) -> Sequence[Point]:
...
```
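The function bodies above are deliberately left as stubs in this work-in-progress spec. As a non-normative sketch of how they are intended to compose, assuming the stubs are implemented, the expected round-trip is:
```python
# Non-normative sketch, assuming the stub functions above are implemented.
def check_round_trip(data: Sequence[Point]) -> None:
    extended = extend_data(data)                    # erasure-code the original points
    assert unextend_data(extended) == data          # the extension is reversible
    samples = sample_data(extended)                 # chunk the extension into proved samples
    for sample in samples:
        verify_sample(sample)                       # each sample verifies independently
    assert reconstruct_extended_data(samples) == extended  # a full sample set reconstructs the extension
```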

specs/phase1/p2p-das.md

@ -0,0 +1,162 @@
# Ethereum 2.0 Phase 1 -- Network specification for Data Availability Sampling
**Notice**: This document is a work-in-progress for researchers and implementers.
## Table of contents
<!-- TOC -->
<!-- /TOC -->
## Introduction
For an introduction to DAS itself, see [the DAS section in the Phase 1 validator spec](./validator.md#data-availability-sampling).
This is not a prerequisite for the network layer, but provides valuable context.
For sampling, all nodes need to query for `k` random samples each slot.
*__TODO__: describe big picture of sampling workload size*
This is a lot of work, and ideally happens at a low latency.
To achieve quick querying, the query model is changed to *push* the samples to listeners instead, using GossipSub.
The listeners then randomly rotate their subscriptions to keep queries unpredictable,
except for a small subset of subscriptions that functions as a backbone to keep topics more stable.
Publishing can utilize the fan-out functionality in GossipSub, and is easier to split between nodes:
nodes on the horizontal networks can help by producing the same samples and fan-out publishing to their own peers.
This push model also helps to obfuscate the original source of a message:
the listeners will not have to make individual queries to some identified source.
The push model does not aim to serve "historical" queries (anything older than the most recent).
Historical queries are still required for the unhappy case, where messages are not pushed quickly enough,
and missing samples are not reconstructed by other nodes on the horizontal subnet quickly enough.
The main challenge in supporting historical queries is to target the right nodes,
without concentrating too many requests on a single node, or breaking the network/consensus identity separation.
## DAS Subnets
On a high level, the push-model roles are divided into:
- Sources: create blobs of shard block data, which are then transformed into many tiny samples.
- Sinks: continuously look for samples.
At full operation, the network has one proposer per shard per slot.
In the push-model, there are:
- *Vertical subnets*: Sinks can subscribe to indices of samples: there is a sample to subnet mapping.
- *Horizontal subnets*: Sources need to distribute samples to all vertical networks: they participate in a fanout layer.
### Horizontal subnets
The shift of the distribution responsibility to a proposer can only be achieved with amplification:
a regular proposer cannot reach every vertical subnet.
#### Publishing
To publish their work, proposers already put the shard block as a whole on a shard-block subnet.
The proposer can fan-out their work more aggressively, by using the fan-out functionality of GossipSub:
it may publish to all its peers on the subnet, instead of just those in its mesh.
#### Horizontal propagation
Peers on the horizontal subnet are expected to at least perform regular propagation of shard blocks, just as they would participate in any other topic.
*Although this may be sufficient for testnets, expect parameter changes in the spec here.*
#### Horizontal to vertical
Nodes on this same subnet can replicate the sampling efficiently (including a proof for each sample),
and distribute it to any vertical networks that are available to them.
Since the messages are content-addressed (instead of origin-stamped),
multiple publishers of the same samples on a vertical subnet do not hurt performance,
but actually improve it by shortcutting regular propagation on the vertical subnet, and thus lowering the latency to a sample.
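As a non-normative sketch of this replication step (the topic naming, `SAMPLE_SUBNET_COUNT`, and `gossip_publish` are assumptions for illustration, not spec choices):
```python
# Non-normative sketch of horizontal-to-vertical replication.
# `sample_data` and `DASSample` come from the DAS spec; the topic naming,
# SAMPLE_SUBNET_COUNT and gossip_publish(topic, message) are assumptions.
def replicate_to_vertical_subnets(extended_data: Sequence[Point]) -> None:
    for sample in sample_data(extended_data):
        # Illustrative sample-index -> vertical-subnet mapping.
        subnet = sample.index % SAMPLE_SUBNET_COUNT
        # Messages are content-addressed, so duplicate publishers only speed up propagation.
        gossip_publish(f"das_vertical_{subnet}", sample)
```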
### Vertical subnets
Vertical subnets propagate the samples to every peer that is interested.
These interests are randomly sampled and rotate quickly: although not perfect,
this is sufficient to prevent any significant number of nodes from being 100% predictable.
As soon as a sample is missing after the expected propagation time window,
nodes can fall back to the pull model, or ultimately flag it as unavailable data.
#### Slow rotation: Backbone
To allow for subscriptions to rotate quickly and randomly, a backbone is formed to help onboard peers into other topics.
This backbone is based on a pure function of the *node* identity and time:
- Nodes can be found *without additional discovery overhead*:
peers on a vertical topic can be found by searching the local peerstore for identities that hash to the desired topic(s).
- Nodes can be held accountable for contributing to the backbone:
peers that participate in DAS but are not active on the appropriate backbone topics can be scored down.
A node should anticipate which backbone topics to subscribe to based on its own identity.
These subscriptions rotate slowly, with different offsets per node identity to avoid sudden network-wide rotations.
```python
# TODO hash function: (node, time)->subnets
```
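One possible shape for this function, purely as a non-normative sketch (the constants `SLOW_ROTATION_PERIOD`, `BACKBONE_SUBNETS_PER_NODE`, and `SAMPLE_SUBNET_COUNT` are hypothetical, and the exact serialization is an assumption):
```python
# Non-normative sketch: derive slowly-rotating backbone subnets from (node, time).
# `hash` and `uint_to_bytes` are the phase 0 helpers; the constants are hypothetical.
def compute_backbone_subnets(node_id: bytes, epoch: Epoch) -> Sequence[uint64]:
    # Per-node offset so that not every node rotates at the same epoch boundary.
    offset = int.from_bytes(hash(node_id)[:8], 'little') % SLOW_ROTATION_PERIOD
    period = (int(epoch) + offset) // SLOW_ROTATION_PERIOD
    return [
        uint64(
            int.from_bytes(hash(node_id + uint_to_bytes(uint64(period)) + uint_to_bytes(uint64(i)))[:8], 'little')
            % SAMPLE_SUBNET_COUNT
        )
        for i in range(BACKBONE_SUBNETS_PER_NODE)
    ]
```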
Backbone subscription work is outlined in the [DAS validator spec](./validator.md#data-availability-sampling).
#### Quick Rotation: Sampling
A node MUST maintain `k` random subscriptions to topics, and rotate these according to the [DAS validator spec](./validator.md#data-availability-sampling).
If the node does not already have connected peers on the topic it needs to sample, it can search its peerstore for peers in the topic backbone.
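A non-normative sketch of the quick-rotation side (`SAMPLE_SUBNET_COUNT` is a hypothetical constant; the randomness is local so other nodes cannot predict the choice):
```python
import random

# Non-normative sketch: choose `k` distinct vertical subnets to sample,
# re-drawn on rotation, using local (unpredictable) randomness.
def compute_sampling_subnets(k: int) -> Sequence[uint64]:
    return [uint64(i) for i in random.sample(range(SAMPLE_SUBNET_COUNT), k)]
```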
## DAS in the Gossip domain: Push
### Topics and messages
#### Horizontal subnets
#### Vertical subnets
## DAS in the Req-Resp domain: Pull
To pull samples from nodes, in case of network instability when samples are unavailable, a new query method is added to the Req-Resp domain.
This builds on top of the protocol identification and encoding spec which was introduced in [the Phase0 network spec](../phase0/p2p-interface.md).
Note that the Phase1 DAS networking uses a different protocol prefix: `/eth2/das/req`
The result codes are extended with:
- 3: **ResourceUnavailable** -- when the request was valid but cannot be served at this point in time.
TODO: unify with phase0? Lighthouse already defined this in their response codes enum.
### Messages
#### DASQuery
**Protocol ID:** `/eth2/das/req/query/1/`
Request Content:
```
(
sample_index: SampleIndex
)
```
Response Content:
```
(
DASSample
)
```
When the sample is:
- Available: respond with a `Success` result code, and the encoded sample.
- Expected to be available, but not: respond with a `ResourceUnavailable` result code.
- Not available, but never of interest to the node: respond with an `InvalidRequest` result code.
When the node is part of the backbone and expected to have the sample, the validity of the request MUST be recognized with `Success` or `ResourceUnavailable`.
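A non-normative sketch of how a responder could map sample availability to result codes (`is_expected_to_custody` and `get_buffered_sample` are hypothetical helpers; codes 0-2 are the phase 0 result codes):
```python
# Non-normative sketch of result-code selection for a DASQuery.
SUCCESS, INVALID_REQUEST, SERVER_ERROR, RESOURCE_UNAVAILABLE = 0, 1, 2, 3

def handle_das_query(sample_index: SampleIndex) -> Tuple[int, Optional[DASSample]]:
    if not is_expected_to_custody(sample_index):
        # The sample was never of interest to this node.
        return INVALID_REQUEST, None
    sample = get_buffered_sample(sample_index)
    if sample is None:
        # Expected to be available (e.g. backbone duty), but missing right now.
        return RESOURCE_UNAVAILABLE, None
    return SUCCESS, sample
```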


@ -536,6 +536,49 @@ class SignedLightAggregateAndProof(Container):
signature: BLSSignature
```
## Data Availability Sampling
### Gossip subscriptions to maintain
#### Slow rotation: Backbone
TODO
#### Quick rotation: Sampling
TODO
### DAS during network instability
The GossipSub-based retrieval of samples may not always work.
#### Waiting on missing samples
Wait for the sample to be re-broadcast. Someone may be slow to publish, or someone else may be able to do the work.
Any node can do the following work to keep the network healthy:
- Common: Listen on a horizontal subnet, chunkify the block data in samples, and propagate the samples to vertical subnets.
- Extreme: Listen on enough vertical subnets, reconstruct the missing samples by recovery, and propagate the recovered samples.
This is not a requirement, but should improve network stability with few resources, and without any central party.
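A non-normative sketch of the "extreme" path (assumes the DAS functions from the sampling spec, plus the hypothetical `gossip_publish` helper and subnet mapping used in the network spec):
```python
# Non-normative sketch: recover missing samples from the ones already seen on
# vertical subnets, then re-publish only the previously-missing ones.
def reconstruct_and_republish(seen_samples: Sequence[DASSample]) -> None:
    extended_data = reconstruct_extended_data(seen_samples)  # recovery from partial data
    seen_indices = set(sample.index for sample in seen_samples)
    for sample in sample_data(extended_data):
        if sample.index not in seen_indices:
            gossip_publish(f"das_vertical_{sample.index % SAMPLE_SUBNET_COUNT}", sample)
```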
#### Pulling missing samples
The more realistic option, to execute when a sample is missing, is to query any node that is known to hold it.
Since *consensus identity is disconnected from network identity*, there is no direct way to contact custody holders
without explicitly asking for the data.
However, *network identities* are still used to build a backbone for each vertical subnet.
These nodes should have received the samples, and can serve a buffer of them on demand.
Although serving these is not directly incentivised, it is little work:
1. Buffer any message seen on the backbone vertical subnets, for up to two weeks.
2. Serve the samples on request. An individual sample is just expected to be `~ 0.5 KB`, and does not require any pre-processing to serve.
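A non-normative sketch of such a buffer (retention and key layout are illustrative; `SECONDS_PER_SLOT` is the phase 0 constant):
```python
# Non-normative sketch: retain samples seen on backbone vertical subnets for
# roughly two weeks and serve them on request; at ~0.5 KB per sample this is cheap.
BUFFER_SLOTS = 14 * 24 * 60 * 60 // SECONDS_PER_SLOT  # ~two weeks of slots

class SampleBuffer:
    def __init__(self) -> None:
        self.samples: Dict[Tuple[Slot, Shard, SampleIndex], DASSample] = {}

    def on_sample(self, sample: DASSample, current_slot: Slot) -> None:
        self.samples[(sample.slot, sample.shard, sample.index)] = sample
        # Prune entries older than the retention window.
        self.samples = {key: s for key, s in self.samples.items() if key[0] + BUFFER_SLOTS >= current_slot}

    def serve(self, slot: Slot, shard: Shard, index: SampleIndex) -> Optional[DASSample]:
        return self.samples.get((slot, shard, index))
```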
Pulling samples directly from nodes with a custody responsibility, without revealing their identity to the network, is an open problem.
## How to avoid slashing
Proposer and Attester slashings described in Phase 0 remain in place with the