Add `recover_matrix` and remove unused `FlatExtendedMatrix` type

This commit is contained in:
Hsiao-Wei Wang 2024-02-02 01:45:02 +08:00
parent 428c166283
commit c47d5f3578
No known key found for this signature in database
GPG Key ID: AE3D6B174F971DE4
7 changed files with 83 additions and 26 deletions

View File

@ -18,7 +18,7 @@
- [Helper functions](#helper-functions) - [Helper functions](#helper-functions)
- [`get_custody_columns`](#get_custody_columns) - [`get_custody_columns`](#get_custody_columns)
- [`compute_extended_data`](#compute_extended_data) - [`compute_extended_data`](#compute_extended_data)
- [`compute_extended_matrix`](#compute_extended_matrix) - [`recover_matrix`](#recover_matrix)
- [`get_data_column_sidecars`](#get_data_column_sidecars) - [`get_data_column_sidecars`](#get_data_column_sidecars)
- [Custody](#custody) - [Custody](#custody)
- [Custody requirement](#custody-requirement) - [Custody requirement](#custody-requirement)
@ -47,7 +47,6 @@ We define the following Python custom types for type hinting and readability:
| - | - | - | | - | - | - |
| `DataColumn` | `List[Cell, MAX_BLOBS_PER_BLOCK]` | The data of each column in EIP-7594 | | `DataColumn` | `List[Cell, MAX_BLOBS_PER_BLOCK]` | The data of each column in EIP-7594 |
| `ExtendedMatrix` | `List[Cell, MAX_BLOBS_PER_BLOCK * NUMBER_OF_COLUMNS]` | The full data of one-dimensional erasure coding extended blobs (in row major format) | | `ExtendedMatrix` | `List[Cell, MAX_BLOBS_PER_BLOCK * NUMBER_OF_COLUMNS]` | The full data of one-dimensional erasure coding extended blobs (in row major format) |
| `FlatExtendedMatrix` | `List[BLSFieldElement, FIELD_ELEMENTS_PER_CELL * MAX_BLOBS_PER_BLOCK * NUMBER_OF_COLUMNS]` | The flattened format of `ExtendedMatrix` |
## Configuration ## Configuration
@ -122,12 +121,29 @@ def compute_extended_data(data: Sequence[BLSFieldElement]) -> Sequence[BLSFieldE
... ...
``` ```
#### `compute_extended_matrix` #### `recover_matrix`
```python ```python
def compute_extended_matrix(blobs: Sequence[Blob]) -> FlatExtendedMatrix: def recover_matrix(cells_dict: Dict[Tuple[BlobIndex, CellID], Cell], blob_count: uint64) -> ExtendedMatrix:
matrix = [compute_extended_data(blob) for blob in blobs] """
return FlatExtendedMatrix(matrix) Return the recovered ``ExtendedMatrix``.
This helper demonstrate how to apply ``recover_polynomial``.
The data structure for storing cells is implementation-dependent.
"""
extended_matrix = []
for blob_index in range(blob_count):
cell_ids = [cell_id for b_index, cell_id in cells_dict.keys() if b_index == blob_index]
cells = [cells_dict[(blob_index, cell_id)] for cell_id in cell_ids]
cells_bytes = [[bls_field_to_bytes(element) for element in cell] for cell in cells]
full_polynomial = recover_polynomial(cell_ids, cells_bytes)
cells_from_full_polynomial = [
full_polynomial[i * FIELD_ELEMENTS_PER_CELL:(i + 1) * FIELD_ELEMENTS_PER_CELL]
for i in range(CELLS_PER_BLOB)
]
extended_matrix.extend(cells_from_full_polynomial)
return ExtendedMatrix(extended_matrix)
``` ```
#### `get_data_column_sidecars` #### `get_data_column_sidecars`
@ -204,7 +220,7 @@ To custody a particular column, a node joins the respective gossip subnet. Verif
### Reconstruction and cross-seeding ### Reconstruction and cross-seeding
If the node obtains 50%+ of all the columns, they can reconstruct the full data matrix via `recover_samples_impl` helper. If the node obtains 50%+ of all the columns, they can reconstruct the full data matrix via `recover_matrix` helper.
If a node fails to sample a peer or fails to get a column on the column subnet, a node can utilize the Req/Resp message to query the missing column from other peers. If a node fails to sample a peer or fails to get a column on the column subnet, a node can utilize the Req/Resp message to query the missing column from other peers.
@ -218,7 +234,7 @@ Once the node obtain the column, the node should send the missing columns to the
## Peer sampling ## Peer sampling
At each slot, a node makes (locally randomly determined) `SAMPLES_PER_SLOT` queries for samples from their peers via `DataColumnSidecarByRoot` request. A node utilizes `get_custody_columns` helper to determine which peer(s) to request from. If a node has enough good/honest peers across all rows and columns, this has a high chance of success. At each slot, a node makes (locally randomly determined) `SAMPLES_PER_SLOT` queries for samples from their peers via `DataColumnSidecarsByRoot` request. A node utilizes `get_custody_columns` helper to determine which peer(s) to request from. If a node has enough good/honest peers across all rows and columns, this has a high chance of success.
## Peer scoring ## Peer scoring
@ -240,7 +256,7 @@ The fork choice rule (essentially a DA filter) is *orthogonal to a given DAS des
In any DAS design, there are probably a few degrees of freedom around timing, acceptability of short-term re-orgs, etc. In any DAS design, there are probably a few degrees of freedom around timing, acceptability of short-term re-orgs, etc.
For example, the fork choice rule might require validators to do successful DAS on slot N to be able to include block of slot `N` in its fork choice. That's the tightest DA filter. But trailing filters are also probably acceptable, knowing that there might be some failures/short re-orgs but that they don't hurt the aggregate security. For example, the rule could be — DAS must be completed for slot N-1 for a child block in N to be included in the fork choice. For example, the fork choice rule might require validators to do successful DAS on slot `N` to be able to include block of slot `N` in its fork choice. That's the tightest DA filter. But trailing filters are also probably acceptable, knowing that there might be some failures/short re-orgs but that they don't hurt the aggregate security. For example, the rule could be — DAS must be completed for slot N-1 for a child block in N to be included in the fork choice.
Such trailing techniques and their analysis will be valuable for any DAS construction. The question is — can you relax how quickly you need to do DA and in the worst case not confirm unavailable data via attestations/finality, and what impact does it have on short-term re-orgs and fast confirmation rules. Such trailing techniques and their analysis will be valuable for any DAS construction. The question is — can you relax how quickly you need to do DA and in the worst case not confirm unavailable data via attestations/finality, and what impact does it have on short-term re-orgs and fast confirmation rules.

View File

@ -20,6 +20,7 @@
- [BLS12-381 helpers](#bls12-381-helpers) - [BLS12-381 helpers](#bls12-381-helpers)
- [`hash_to_bls_field`](#hash_to_bls_field) - [`hash_to_bls_field`](#hash_to_bls_field)
- [`bytes_to_bls_field`](#bytes_to_bls_field) - [`bytes_to_bls_field`](#bytes_to_bls_field)
- [`bls_field_to_bytes`](#bls_field_to_bytes)
- [`validate_kzg_g1`](#validate_kzg_g1) - [`validate_kzg_g1`](#validate_kzg_g1)
- [`bytes_to_kzg_commitment`](#bytes_to_kzg_commitment) - [`bytes_to_kzg_commitment`](#bytes_to_kzg_commitment)
- [`bytes_to_kzg_proof`](#bytes_to_kzg_proof) - [`bytes_to_kzg_proof`](#bytes_to_kzg_proof)
@ -170,6 +171,12 @@ def bytes_to_bls_field(b: Bytes32) -> BLSFieldElement:
return BLSFieldElement(field_element) return BLSFieldElement(field_element)
``` ```
#### `bls_field_to_bytes`
```python
def bls_field_to_bytes(x: BLSFieldElement) -> Bytes32:
return int.to_bytes(x % BLS_MODULUS, 32, KZG_ENDIANNESS)
```
#### `validate_kzg_g1` #### `validate_kzg_g1`

View File

@ -32,10 +32,6 @@ def bls_add_one(x):
) )
def field_element_bytes(x):
return int.to_bytes(x % BLS_MODULUS, 32, "big")
@with_deneb_and_later @with_deneb_and_later
@spec_test @spec_test
@single_phase @single_phase
@ -43,7 +39,7 @@ def test_verify_kzg_proof(spec):
""" """
Test the wrapper functions (taking bytes arguments) for computing and verifying KZG proofs. Test the wrapper functions (taking bytes arguments) for computing and verifying KZG proofs.
""" """
x = field_element_bytes(3) x = spec.bls_field_to_bytes(3)
blob = get_sample_blob(spec) blob = get_sample_blob(spec)
commitment = spec.blob_to_kzg_commitment(blob) commitment = spec.blob_to_kzg_commitment(blob)
proof, y = spec.compute_kzg_proof(blob, x) proof, y = spec.compute_kzg_proof(blob, x)
@ -58,7 +54,7 @@ def test_verify_kzg_proof_incorrect_proof(spec):
""" """
Test the wrapper function `verify_kzg_proof` fails on an incorrect proof. Test the wrapper function `verify_kzg_proof` fails on an incorrect proof.
""" """
x = field_element_bytes(3465) x = spec.bls_field_to_bytes(3465)
blob = get_sample_blob(spec) blob = get_sample_blob(spec)
commitment = spec.blob_to_kzg_commitment(blob) commitment = spec.blob_to_kzg_commitment(blob)
proof, y = spec.compute_kzg_proof(blob, x) proof, y = spec.compute_kzg_proof(blob, x)

View File

@ -0,0 +1,44 @@
import random
from eth2spec.test.context import (
spec_test,
single_phase,
with_eip7594_and_later,
)
from eth2spec.test.helpers.sharding import (
get_sample_blob,
)
@with_eip7594_and_later
@spec_test
@single_phase
def test_recover_matrix(spec):
rng = random.Random(5566)
# Number of samples we will be recovering from
N_SAMPLES = spec.CELLS_PER_BLOB // 2
blob_count = 2
cells_dict = {}
original_cells = []
for blob_index in range(blob_count):
# Get the data we will be working with
blob = get_sample_blob(spec, rng=rng)
# Extend data with Reed-Solomon and split the extended data in cells
cells = spec.compute_cells(blob)
original_cells.append(cells)
cell_ids = []
# First figure out just the indices of the cells
for _ in range(N_SAMPLES):
cell_id = rng.randint(0, spec.CELLS_PER_BLOB - 1)
while cell_id in cell_ids:
cell_id = rng.randint(0, spec.CELLS_PER_BLOB - 1)
cell_ids.append(cell_id)
cell = cells[cell_id]
cells_dict[(blob_index, cell_id)] = cell
assert len(cell_ids) == N_SAMPLES
# Recover the matrix
recovered_matrix = spec.recover_matrix(cells_dict, blob_count)
flatten_original_cells = [cell for cells in original_cells for cell in cells]
assert recovered_matrix == flatten_original_cells

View File

@ -10,10 +10,6 @@ from eth2spec.test.helpers.sharding import (
from eth2spec.utils.bls import BLS_MODULUS from eth2spec.utils.bls import BLS_MODULUS
def field_element_bytes(x):
return int.to_bytes(x % BLS_MODULUS, 32, "big")
@with_eip7594_and_later @with_eip7594_and_later
@spec_test @spec_test
@single_phase @single_phase
@ -39,7 +35,7 @@ def test_verify_cell_proof(spec):
commitment = spec.blob_to_kzg_commitment(blob) commitment = spec.blob_to_kzg_commitment(blob)
cells, proofs = spec.compute_cells_and_proofs(blob) cells, proofs = spec.compute_cells_and_proofs(blob)
cells_bytes = [[field_element_bytes(element) for element in cell] for cell in cells] cells_bytes = [[spec.bls_field_to_bytes(element) for element in cell] for cell in cells]
cell_id = 0 cell_id = 0
assert spec.verify_cell_proof(commitment, cell_id, cells_bytes[cell_id], proofs[cell_id]) assert spec.verify_cell_proof(commitment, cell_id, cells_bytes[cell_id], proofs[cell_id])
@ -54,7 +50,7 @@ def test_verify_cell_proof_batch(spec):
blob = get_sample_blob(spec) blob = get_sample_blob(spec)
commitment = spec.blob_to_kzg_commitment(blob) commitment = spec.blob_to_kzg_commitment(blob)
cells, proofs = spec.compute_cells_and_proofs(blob) cells, proofs = spec.compute_cells_and_proofs(blob)
cells_bytes = [[field_element_bytes(element) for element in cell] for cell in cells] cells_bytes = [[spec.bls_field_to_bytes(element) for element in cell] for cell in cells]
assert len(cells) == len(proofs) assert len(cells) == len(proofs)
@ -83,15 +79,15 @@ def test_recover_polynomial(spec):
# Extend data with Reed-Solomon and split the extended data in cells # Extend data with Reed-Solomon and split the extended data in cells
cells = spec.compute_cells(blob) cells = spec.compute_cells(blob)
cells_bytes = [[field_element_bytes(element) for element in cell] for cell in cells] cells_bytes = [[spec.bls_field_to_bytes(element) for element in cell] for cell in cells]
# Compute the cells we will be recovering from # Compute the cells we will be recovering from
cell_ids = [] cell_ids = []
# First figure out just the indices of the cells # First figure out just the indices of the cells
for i in range(N_SAMPLES): for i in range(N_SAMPLES):
j = rng.randint(0, spec.CELLS_PER_BLOB) j = rng.randint(0, spec.CELLS_PER_BLOB - 1)
while j in cell_ids: while j in cell_ids:
j = rng.randint(0, spec.CELLS_PER_BLOB) j = rng.randint(0, spec.CELLS_PER_BLOB - 1)
cell_ids.append(j) cell_ids.append(j)
# Now the cells themselves # Now the cells themselves
known_cells_bytes = [cells_bytes[cell_id] for cell_id in cell_ids] known_cells_bytes = [cells_bytes[cell_id] for cell_id in cell_ids]

View File

@ -12,8 +12,6 @@ def run_get_custody_columns(spec, peer_count, custody_subnet_count):
columns_per_subnet = spec.NUMBER_OF_COLUMNS // spec.config.DATA_COLUMN_SIDECAR_SUBNET_COUNT columns_per_subnet = spec.NUMBER_OF_COLUMNS // spec.config.DATA_COLUMN_SIDECAR_SUBNET_COUNT
for assignment in assignments: for assignment in assignments:
assert len(assignment) == custody_subnet_count * columns_per_subnet assert len(assignment) == custody_subnet_count * columns_per_subnet
print('assignment', assignment)
print('set(assignment)', set(assignment))
assert len(assignment) == len(set(assignment)) assert len(assignment) == len(set(assignment))