# Data Availability Sampling -- Sampling
**Notice**: This document is a work-in-progress for researchers and implementers.
## Table of contents
<!-- TOC -->
<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
- [Data Availability Sampling](#data-availability-sampling)
- [GossipSub](#gossipsub)
  - [Horizontal subnets](#horizontal-subnets)
  - [Vertical subnets](#vertical-subnets)
    - [Slow rotation: Backbone](#slow-rotation-backbone)
    - [Quick rotation: Sampling](#quick-rotation-sampling)
  - [DAS during network instability](#das-during-network-instability)
    - [Stage 0: Waiting on missing samples](#stage-0-waiting-on-missing-samples)
    - [Stage 1: Pulling missing samples from known peers](#stage-1-pulling-missing-samples-from-known-peers)
    - [Stage 2: Pulling missing data from validators with custody.](#stage-2-pulling-missing-data-from-validators-with-custody)
<!-- END doctoc generated TOC please keep comment here to allow auto update -->
<!-- /TOC -->
## Data Availability Sampling
TODO: Summary of Data Availability problem
TODO: Summary of solution, why 2x extension, and randomized samples
## GossipSub
### Horizontal subnets
TODO
### Vertical subnets
#### Slow rotation: Backbone
TODO
#### Quick rotation: Sampling
TODO
### DAS during network instability
GossipSub-based retrieval of samples may not always work.
In that event, a node can move through the stages below until it recovers data availability.
#### Stage 0: Waiting on missing samples
Wait for the sample to be re-broadcast. Someone may be slow to publish, or someone else may be able to do the work.

Any node can do the following work to keep the network healthy:
- Common: Listen on a horizontal subnet, split the block data into samples, and propagate the samples to vertical subnets.
- Extreme: Listen on enough vertical subnets, reconstruct the missing samples by recovery, and propagate the recovered samples.

This is not a requirement, but it should improve network stability with few resources and without any central party.
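
The "Common" task above can be sketched roughly as follows. This is only an illustration under stated assumptions: the 512-byte sample size is taken from the `~ 0.5 KB` estimate in Stage 1, while `VERTICAL_SUBNET_COUNT`, the zero-padding of the last sample, and the modulo mapping from sample index to subnet are hypothetical choices that this document does not specify. The 2x extension is not modelled.

```python
from typing import Dict, List, Tuple

SAMPLE_SIZE = 512          # bytes; assumption derived from the "~ 0.5 KB" estimate
VERTICAL_SUBNET_COUNT = 8  # hypothetical value, not defined by this document


def chunkify(data: bytes, sample_size: int = SAMPLE_SIZE) -> List[bytes]:
    """Split block data into fixed-size samples, zero-padding the last one."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    # Round the length up to the next multiple of sample_size.
    padded_len = -(-len(data) // sample_size) * sample_size
    padded = data.ljust(padded_len, b"\x00")
    return [padded[i:i + sample_size] for i in range(0, padded_len, sample_size)]


def assign_to_vertical_subnets(
    samples: List[bytes],
    subnet_count: int = VERTICAL_SUBNET_COUNT,
) -> Dict[int, List[Tuple[int, bytes]]]:
    """Group (sample_index, sample) pairs by subnet.

    The modulo mapping is an illustrative assumption, not the spec's rule.
    """
    subnets: Dict[int, List[Tuple[int, bytes]]] = {}
    for index, sample in enumerate(samples):
        subnets.setdefault(index % subnet_count, []).append((index, sample))
    return subnets
```

A node doing this work would publish each `(index, sample)` pair on the topic of the subnet it was grouped under; the publishing step is left out because the subnet topics are still marked TODO above.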
#### Stage 1: Pulling missing samples from known peers
The more realistic option when a sample is missing is to query any node that is known to hold it.
Since *consensus identity is disconnected from network identity*, there is no direct way to contact custody holders
without explicitly asking for the data.
However, *network identities* are still used to build a backbone for each vertical subnet.
These nodes should have received the samples, and can serve a buffer of them on demand.
Although serving these is not directly incentivised, it is little work:

1. Buffer any message you see on the backbone vertical subnets, retaining each for up to two weeks.
2. Serve the samples on request. An individual sample is expected to be only `~ 0.5 KB`, and does not require any pre-processing to serve.

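
The two steps above might look like the following minimal sketch. The `SampleBuffer` class, its method names, and the `(slot, sample_index)` key are hypothetical and chosen only for illustration; the document fixes only the two-week retention bound. Timestamps are passed in explicitly and are assumed to be non-decreasing across calls.

```python
from collections import OrderedDict
from typing import Optional, Tuple

TWO_WEEKS_SECONDS = 14 * 24 * 60 * 60

# Hypothetical sample identifier: (slot, sample index).
SampleKey = Tuple[int, int]


class SampleBuffer:
    """Time-bounded store of samples seen on backbone vertical subnets."""

    def __init__(self, retention_seconds: int = TWO_WEEKS_SECONDS) -> None:
        self.retention_seconds = retention_seconds
        # Insertion order equals arrival order, so the oldest entry is first.
        self._entries: "OrderedDict[SampleKey, Tuple[float, bytes]]" = OrderedDict()

    def store(self, key: SampleKey, sample: bytes, now: float) -> None:
        """Record a sample seen at time `now` (seconds)."""
        self._entries.pop(key, None)  # re-insert so ordering stays by time
        self._entries[key] = (now, sample)
        self.prune(now)

    def prune(self, now: float) -> None:
        """Drop every entry older than the retention period."""
        cutoff = now - self.retention_seconds
        while self._entries:
            _, (seen_at, _) = next(iter(self._entries.items()))
            if seen_at >= cutoff:
                break
            self._entries.popitem(last=False)

    def serve(self, key: SampleKey, now: float) -> Optional[bytes]:
        """Return the stored sample as-is, or None if unknown or expired."""
        self.prune(now)
        entry = self._entries.get(key)
        return entry[1] if entry is not None else None
```

Serving returns the stored bytes unchanged, which matches the point that no pre-processing is needed.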
A validator SHOULD make `DASQuery` requests to random peers until its failures exceed the configured failure rate.
TODO: detailed failure-mode spec. Stop after trying e.g. 3 peers for any sample in a configured time window (after the gossip period).
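
Until the failure mode is specified, one possible reading of the request loop is sketched below. Everything here is an assumption: `pull_sample` and the `query_peer` callable are hypothetical stand-ins for sending a `DASQuery` (whose message format is not defined in this document), the limit of 3 failed peers is the example value from the TODO, and the time window is not modelled.

```python
import random
from typing import Callable, Hashable, Optional, Sequence

MAX_FAILED_PEERS = 3  # example value from the TODO above; not finalized


def pull_sample(
    sample_id: Hashable,
    peers: Sequence[str],
    query_peer: Callable[[str, Hashable], Optional[bytes]],
    max_failures: int = MAX_FAILED_PEERS,
    rng: Optional[random.Random] = None,
) -> Optional[bytes]:
    """Ask random peers for a sample, giving up after `max_failures` failures.

    `query_peer` is a hypothetical callable that returns the sample bytes,
    or None when the peer does not have the sample.
    """
    rng = rng or random.Random()
    candidates = list(peers)
    rng.shuffle(candidates)  # query peers in random order

    failures = 0
    for peer in candidates:
        if failures >= max_failures:
            break
        try:
            sample = query_peer(peer, sample_id)
        except OSError:
            # Connection errors and timeouts count as a failed attempt.
            sample = None
        if sample is not None:
            return sample
        failures += 1
    return None  # caller falls through to the next stage
```

A `None` result signals that Stage 1 did not recover the sample, so the node would move on to Stage 2.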
#### Stage 2: Pulling missing data from validators with custody.
Pulling samples directly from nodes with validators that have a custody responsibility,
without revealing their identity to the network, is an open problem.