From b848ca6dc7b5dc7d2b83e577ec13d79901067eff Mon Sep 17 00:00:00 2001 From: Csaba Kiraly Date: Tue, 28 May 2024 12:15:36 +0200 Subject: [PATCH 01/15] improved sampling description - describe sample selection - describe sample queries Signed-off-by: Csaba Kiraly --- specs/_features/eip7594/das-core.md | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/specs/_features/eip7594/das-core.md b/specs/_features/eip7594/das-core.md index dc50365b1..df5ae81f5 100644 --- a/specs/_features/eip7594/das-core.md +++ b/specs/_features/eip7594/das-core.md @@ -239,7 +239,19 @@ To custody a particular column, a node joins the respective gossip subnet. Verif ## Peer sampling -A node SHOULD maintain a diverse set of peers for each column and each slot by verifying responsiveness to sample queries. At each slot, a node makes `SAMPLES_PER_SLOT` queries for samples from their peers via `DataColumnSidecarsByRoot` request. A node utilizes `get_custody_columns` helper to determine which peer(s) to request from. If a node has enough good/honest peers across all rows and columns, this has a high chance of success. +### Sample selection + +At each slot, a node SHOULD select at least `SAMPLES_PER_SLOT` column IDs for sampling. It is recommended to use uniform random selection without replacement based on local randomness. Sampling is considered successful if the node manages to retrieve all selected columns. + +### Sample queries + +A node SHOULD maintain a diverse set of peers for each column and each slot by verifying responsiveness to sample queries. + +A node SHOULD query for samples from their peers via `DataColumnSidecarsByRoot` request. A node utilizes `get_custody_columns` helper to determine which peer(s) it could request from. If more candidate peers are found, a node SHOULD randomize it's peer selection to distribute sample query load in the network. 
Nodes MAY use peer scoring to tune this selection (for example, by using weighted selection or by using a cut-off threshold). + +If a node already has a column because of custody, it is not required to send out queries for that column. + +If a node has enough good/honest peers across all columns, and the data is being made available, the above procedure has a high chance of success. ## Peer scoring From 8d332788b9aa70b4b307d58504ac02d0ced4862a Mon Sep 17 00:00:00 2001 From: Csaba Kiraly Date: Tue, 28 May 2024 12:24:25 +0200 Subject: [PATCH 02/15] clarify the use of LossyDAS Clarify that what matters is the false positive threshold, allowing different sampling strategies as protocol compliant behavior. Signed-off-by: Csaba Kiraly --- specs/_features/eip7594/das-core.md | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/specs/_features/eip7594/das-core.md b/specs/_features/eip7594/das-core.md index df5ae81f5..668b5095c 100644 --- a/specs/_features/eip7594/das-core.md +++ b/specs/_features/eip7594/das-core.md @@ -243,6 +243,14 @@ To custody a particular column, a node joins the respective gossip subnet. Verif At each slot, a node SHOULD select at least `SAMPLES_PER_SLOT` column IDs for sampling. It is recommended to use uniform random selection without replacement based on local randomness. Sampling is considered successful if the node manages to retrieve all selected columns. +Alternatively, a node MAY use LossyDAS selecting more than `SAMPLES_PER_SLOT` columns while allowing some missing, respecting the same target false positive threshold (the probability of successful sampling of an unavailable block) as dictated by `SAMPLES_PER_SLOT`. The table below shows the number of samples and the number of allowed missing columns for this threshold. 
+ +| Allowed missing (L) | 0| 1| 2| 3| 4| 5| 6| 7| 8| +|------------------------------- |--|--|--|--|--|--|--|--|--| +| Samples (S) for target threshold 5e-6 |16|20|23|26|29|32|34|37|39| + +Sampling is considered successful if any `S - L` columns are retrieved successfully. + ### Sample queries A node SHOULD maintain a diverse set of peers for each column and each slot by verifying responsiveness to sample queries. From 7b4d23c0ba4cbdc995ad673ead4d32e3eb242dd4 Mon Sep 17 00:00:00 2001 From: Hsiao-Wei Wang Date: Tue, 28 May 2024 22:57:52 +0800 Subject: [PATCH 03/15] fix toc --- specs/_features/eip7594/das-core.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/specs/_features/eip7594/das-core.md b/specs/_features/eip7594/das-core.md index 668b5095c..a51435898 100644 --- a/specs/_features/eip7594/das-core.md +++ b/specs/_features/eip7594/das-core.md @@ -30,6 +30,8 @@ - [Column gossip](#column-gossip) - [Parameters](#parameters) - [Peer sampling](#peer-sampling) + - [Sample selection](#sample-selection) + - [Sample queries](#sample-queries) - [Peer scoring](#peer-scoring) - [Reconstruction and cross-seeding](#reconstruction-and-cross-seeding) - [DAS providers](#das-providers) From 4e1d566c43eb9175205348a58367fdc4de2a07c1 Mon Sep 17 00:00:00 2001 From: Csaba Kiraly Date: Wed, 29 May 2024 11:01:50 +0200 Subject: [PATCH 04/15] improve candidate peer text Signed-off-by: Csaba Kiraly --- specs/_features/eip7594/das-core.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/specs/_features/eip7594/das-core.md b/specs/_features/eip7594/das-core.md index a51435898..b99f2fc12 100644 --- a/specs/_features/eip7594/das-core.md +++ b/specs/_features/eip7594/das-core.md @@ -257,7 +257,9 @@ Sampling is considered successful if any `S - L` columns are retrieved successfu A node SHOULD maintain a diverse set of peers for each column and each slot by verifying responsiveness to sample queries. 
-A node SHOULD query for samples from their peers via `DataColumnSidecarsByRoot` request. A node utilizes `get_custody_columns` helper to determine which peer(s) it could request from. If more candidate peers are found, a node SHOULD randomize it's peer selection to distribute sample query load in the network. Nodes MAY use peer scoring to tune this selection (for example, by using weighted selection or by using a cut-off threshold). +A node SHOULD query for samples from selected peers via `DataColumnSidecarsByRoot` request. A node utilizes `get_custody_columns` helper to determine which peer(s) it could request from, identifying a list of candidate peers for each selected column. + +If more than one candidate peer is found for a given column, a node SHOULD randomize its peer selection to distribute sample query load in the network. Nodes MAY use peer scoring to tune this selection (for example, by using weighted selection or by using a cut-off threshold). If possible, it is also recommended to avoid requesting many columns from the same peer in order to avoid relying on and exposing the sample selection to a single peer. If a node already has a column because of custody, it is not required to send out queries for that column. 
From a04cd87c38b82c91d574deddfb41e242e53209d8 Mon Sep 17 00:00:00 2001 From: Csaba Kiraly Date: Wed, 29 May 2024 11:04:10 +0200 Subject: [PATCH 05/15] fix the (source-view) formatting of the table Signed-off-by: Csaba Kiraly --- specs/_features/eip7594/das-core.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/specs/_features/eip7594/das-core.md b/specs/_features/eip7594/das-core.md index b99f2fc12..e07b1bae3 100644 --- a/specs/_features/eip7594/das-core.md +++ b/specs/_features/eip7594/das-core.md @@ -247,8 +247,8 @@ At each slot, a node SHOULD select at least `SAMPLES_PER_SLOT` column IDs for sa Alternatively, a node MAY use LossyDAS selecting more than `SAMPLES_PER_SLOT` columns while allowing some missing, respecting the same target false positive threshold (the probability of successful sampling of an unavailable block) as dictated by `SAMPLES_PER_SLOT`. The table below shows the number of samples and the number of allowed missing columns for this threshold. -| Allowed missing (L) | 0| 1| 2| 3| 4| 5| 6| 7| 8| -|------------------------------- |--|--|--|--|--|--|--|--|--| +| Allowed missing (L) | 0| 1| 2| 3| 4| 5| 6| 7| 8| +|----------------------------------------|--|--|--|--|--|--|--|--|--| | Samples (S) for target threshold 5e-6 |16|20|23|26|29|32|34|37|39| Sampling is considered successful if any `S - L` columns are retrieved successfully. From 5f3beca87121f6f121e41b9a61ea1a971dcb7cd7 Mon Sep 17 00:00:00 2001 From: Csaba Kiraly Date: Wed, 29 May 2024 11:08:52 +0200 Subject: [PATCH 06/15] remove LossyDAS naming from spec While the technique was introduced as LossyDAS, we don't need the name in the specification itself. 
Signed-off-by: Csaba Kiraly --- specs/_features/eip7594/das-core.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/specs/_features/eip7594/das-core.md b/specs/_features/eip7594/das-core.md index e07b1bae3..b9674b177 100644 --- a/specs/_features/eip7594/das-core.md +++ b/specs/_features/eip7594/das-core.md @@ -245,7 +245,7 @@ To custody a particular column, a node joins the respective gossip subnet. Verif At each slot, a node SHOULD select at least `SAMPLES_PER_SLOT` column IDs for sampling. It is recommended to use uniform random selection without replacement based on local randomness. Sampling is considered successful if the node manages to retrieve all selected columns. -Alternatively, a node MAY use LossyDAS selecting more than `SAMPLES_PER_SLOT` columns while allowing some missing, respecting the same target false positive threshold (the probability of successful sampling of an unavailable block) as dictated by `SAMPLES_PER_SLOT`. The table below shows the number of samples and the number of allowed missing columns for this threshold. +Alternatively, a node MAY use a method that selects more than `SAMPLES_PER_SLOT` columns while allowing some missing, respecting the same target false positive threshold (the probability of successful sampling of an unavailable block) as dictated by `SAMPLES_PER_SLOT`. The table below shows the number of samples and the number of allowed missing columns for this threshold. 
| Allowed missing (L) | 0| 1| 2| 3| 4| 5| 6| 7| 8| |----------------------------------------|--|--|--|--|--|--|--|--|--| From 436e58e3f8d2e3839e91864bde016f656cff31d1 Mon Sep 17 00:00:00 2001 From: Csaba Kiraly Date: Wed, 29 May 2024 14:25:33 +0200 Subject: [PATCH 07/15] add get_extended_sample_count helper function add LossyDAS sample count generation helper function Signed-off-by: Csaba Kiraly --- specs/_features/eip7594/das-core.md | 26 +++++++++++++++++++++----- 1 file changed, 21 insertions(+), 5 deletions(-) diff --git a/specs/_features/eip7594/das-core.md b/specs/_features/eip7594/das-core.md index b9674b177..7d5710a2d 100644 --- a/specs/_features/eip7594/das-core.md +++ b/specs/_features/eip7594/das-core.md @@ -22,6 +22,7 @@ - [`compute_extended_matrix`](#compute_extended_matrix) - [`recover_matrix`](#recover_matrix) - [`get_data_column_sidecars`](#get_data_column_sidecars) + - [`get_extended_sample_count`](#get_extended_sample_count) - [Custody](#custody) - [Custody requirement](#custody-requirement) - [Public, deterministic selection](#public-deterministic-selection) @@ -199,6 +200,21 @@ def get_data_column_sidecars(signed_block: SignedBeaconBlock, return sidecars ``` +#### `get_extended_sample_count` + +```python +# from scipy.stats import hypergeom +def get_extended_sample_count(samples_per_slot: uint64, allowed_failures: uint64) -> uint64: + assert 0 <= allowed_failures <= NUMBER_OF_COLUMNS // 2 + + worst_case_missing = NUMBER_OF_COLUMNS // 2 + 1 + false_positive_threshold = hypergeom.cdf(0, NUMBER_OF_COLUMNS, worst_case_missing, samples_per_slot) + for sample_count in range(samples_per_slot, NUMBER_OF_COLUMNS + 1): + if hypergeom.cdf(allowed_failures, NUMBER_OF_COLUMNS, worst_case_missing, sample_count) <= false_positive_threshold: + break + return sample_count +``` + ## Custody ### Custody requirement @@ -245,13 +261,13 @@ To custody a particular column, a node joins the respective gossip subnet. 
Verif At each slot, a node SHOULD select at least `SAMPLES_PER_SLOT` column IDs for sampling. It is recommended to use uniform random selection without replacement based on local randomness. Sampling is considered successful if the node manages to retrieve all selected columns. -Alternatively, a node MAY use a method that selects more than `SAMPLES_PER_SLOT` columns while allowing some missing, respecting the same target false positive threshold (the probability of successful sampling of an unavailable block) as dictated by `SAMPLES_PER_SLOT`. The table below shows the number of samples and the number of allowed missing columns for this threshold. +Alternatively, a node MAY use a method that selects more than `SAMPLES_PER_SLOT` columns while allowing some missing, respecting the same target false positive threshold (the probability of successful sampling of an unavailable block) as dictated by the `SAMPLES_PER_SLOT` parameter. A node can use the `get_extended_sample_count(samples_per_slot, allowed_failures) -> sample_count` helper function to determine the sample count for any selected number of allowed failures. Sampling is then considered successful if any `sample_count - allowed_failures` columns are retrieved successfully. -| Allowed missing (L) | 0| 1| 2| 3| 4| 5| 6| 7| 8| -|----------------------------------------|--|--|--|--|--|--|--|--|--| -| Samples (S) for target threshold 5e-6 |16|20|23|26|29|32|34|37|39| +For reference, the table below shows the number of samples and the number of allowed missing columns assuming `NUMBER_OF_COLUMNS = 128` and `SAMPLES_PER_SLOT = 16`. -Sampling is considered successful if any `S - L` columns are retrieved successfully. 
+| Allowed missing | 0| 1| 2| 3| 4| 5| 6| 7| 8| +|-----------------|--|--|--|--|--|--|--|--|--| +| Sample count |16|20|24|27|29|32|35|37|40| ### Sample queries From 4c57399887e22aebffd325fe55db19b659263749 Mon Sep 17 00:00:00 2001 From: Csaba Kiraly Date: Tue, 4 Jun 2024 09:38:09 +0200 Subject: [PATCH 08/15] self-contained get_extended_sample_count Importing scipy is not preferred. This is a self-contained version. Eventually an import of math and use of math.comb makes it simpler. Solving other formatting issues as well. Signed-off-by: Csaba Kiraly --- specs/_features/eip7594/das-core.md | 21 +++++++++++++++++---- 1 file changed, 17 insertions(+), 4 deletions(-) diff --git a/specs/_features/eip7594/das-core.md b/specs/_features/eip7594/das-core.md index 7d5710a2d..f0f302a59 100644 --- a/specs/_features/eip7594/das-core.md +++ b/specs/_features/eip7594/das-core.md @@ -203,15 +203,28 @@ def get_data_column_sidecars(signed_block: SignedBeaconBlock, #### `get_extended_sample_count` ```python -# from scipy.stats import hypergeom def get_extended_sample_count(samples_per_slot: uint64, allowed_failures: uint64) -> uint64: assert 0 <= allowed_failures <= NUMBER_OF_COLUMNS // 2 + def math_comb(n, k): + if not 0 <= k <= n: + return 0 + r = 1 + for i in range(min(k, n - k)): + r = r * (n - i) // (i + 1) + return r + + def hypergeom_cdf(k, M, n, N): + return sum([math_comb(n, i) * math_comb(M - n, N - i) / math_comb(M, N) + for i in range(k + 1)]) + worst_case_missing = NUMBER_OF_COLUMNS // 2 + 1 - false_positive_threshold = hypergeom.cdf(0, NUMBER_OF_COLUMNS, worst_case_missing, samples_per_slot) + false_positive_threshold = hypergeom_cdf(0, NUMBER_OF_COLUMNS, + worst_case_missing, samples_per_slot) for sample_count in range(samples_per_slot, NUMBER_OF_COLUMNS + 1): - if hypergeom.cdf(allowed_failures, NUMBER_OF_COLUMNS, worst_case_missing, sample_count) <= false_positive_threshold: - break + if hypergeom_cdf(allowed_failures, NUMBER_OF_COLUMNS, + worst_case_missing, 
sample_count) <= false_positive_threshold: + break return sample_count ``` From 2ab4f1e12e9f59584326a475c3a1da9e98375802 Mon Sep 17 00:00:00 2001 From: Csaba Kiraly Date: Mon, 10 Jun 2024 11:16:06 +0200 Subject: [PATCH 09/15] get_extended_sample_count: use SAMPLES_PER_SLOT constant Signed-off-by: Csaba Kiraly --- specs/_features/eip7594/das-core.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/specs/_features/eip7594/das-core.md b/specs/_features/eip7594/das-core.md index f0f302a59..442409ce4 100644 --- a/specs/_features/eip7594/das-core.md +++ b/specs/_features/eip7594/das-core.md @@ -203,7 +203,7 @@ def get_data_column_sidecars(signed_block: SignedBeaconBlock, #### `get_extended_sample_count` ```python -def get_extended_sample_count(samples_per_slot: uint64, allowed_failures: uint64) -> uint64: +def get_extended_sample_count(allowed_failures: uint64) -> uint64: assert 0 <= allowed_failures <= NUMBER_OF_COLUMNS // 2 def math_comb(n, k): @@ -220,8 +220,8 @@ def get_extended_sample_count(samples_per_slot: uint64, allowed_failures: uint64 worst_case_missing = NUMBER_OF_COLUMNS // 2 + 1 false_positive_threshold = hypergeom_cdf(0, NUMBER_OF_COLUMNS, - worst_case_missing, samples_per_slot) - for sample_count in range(samples_per_slot, NUMBER_OF_COLUMNS + 1): + worst_case_missing, SAMPLES_PER_SLOT) + for sample_count in range(SAMPLES_PER_SLOT, NUMBER_OF_COLUMNS + 1): if hypergeom_cdf(allowed_failures, NUMBER_OF_COLUMNS, worst_case_missing, sample_count) <= false_positive_threshold: break From fb020456cba20ffa76d5c3566c8240b8206a47f5 Mon Sep 17 00:00:00 2001 From: Hsiao-Wei Wang Date: Wed, 19 Jun 2024 02:18:29 +0800 Subject: [PATCH 10/15] Add `get_extended_sample_count` unit tests --- specs/_features/eip7594/das-core.md | 6 ++- .../test/eip7594/unittests/das/test_das.py | 52 +++++++++++++++++++ 2 files changed, 57 insertions(+), 1 deletion(-) diff --git a/specs/_features/eip7594/das-core.md b/specs/_features/eip7594/das-core.md index 
442409ce4..aec3a276e 100644 --- a/specs/_features/eip7594/das-core.md +++ b/specs/_features/eip7594/das-core.md @@ -215,7 +215,11 @@ def get_extended_sample_count(allowed_failures: uint64) -> uint64: return r def hypergeom_cdf(k, M, n, N): - return sum([math_comb(n, i) * math_comb(M - n, N - i) / math_comb(M, N) + k = int(k) + M = int(M) + n = int(n) + N = int(N) + return sum([math_comb(n, i) * math_comb(M - n, N - i) // math_comb(M, N) for i in range(k + 1)]) worst_case_missing = NUMBER_OF_COLUMNS // 2 + 1 diff --git a/tests/core/pyspec/eth2spec/test/eip7594/unittests/das/test_das.py b/tests/core/pyspec/eth2spec/test/eip7594/unittests/das/test_das.py index cdbfad9ff..d61f2e7cd 100644 --- a/tests/core/pyspec/eth2spec/test/eip7594/unittests/das/test_das.py +++ b/tests/core/pyspec/eth2spec/test/eip7594/unittests/das/test_das.py @@ -1,5 +1,6 @@ import random from eth2spec.test.context import ( + expect_assertion_error, spec_test, single_phase, with_eip7594_and_later, @@ -67,3 +68,54 @@ def test_recover_matrix(spec): recovered_matrix = spec.recover_matrix(cells_dict, blob_count) flatten_original_cells = [cell for cells in original_cells for cell in cells] assert recovered_matrix == flatten_original_cells + + +@with_eip7594_and_later +@spec_test +@single_phase +def test_get_extended_sample_count__1(spec): + rng = random.Random(1111) + allowed_failures = rng.randint(0, spec.config.NUMBER_OF_COLUMNS // 2) + spec.get_extended_sample_count(allowed_failures) + + +@with_eip7594_and_later +@spec_test +@single_phase +def test_get_extended_sample_count__2(spec): + rng = random.Random(2222) + allowed_failures = rng.randint(0, spec.config.NUMBER_OF_COLUMNS // 2) + spec.get_extended_sample_count(allowed_failures) + + +@with_eip7594_and_later +@spec_test +@single_phase +def test_get_extended_sample_count__3(spec): + rng = random.Random(3333) + allowed_failures = rng.randint(0, spec.config.NUMBER_OF_COLUMNS // 2) + spec.get_extended_sample_count(allowed_failures) + + 
+@with_eip7594_and_later +@spec_test +@single_phase +def test_get_extended_sample_count__lower_bound(spec): + allowed_failures = 0 + spec.get_extended_sample_count(allowed_failures) + + +@with_eip7594_and_later +@spec_test +@single_phase +def test_get_extended_sample_count__upper_bound(spec): + allowed_failures = spec.config.NUMBER_OF_COLUMNS // 2 + spec.get_extended_sample_count(allowed_failures) + + +@with_eip7594_and_later +@spec_test +@single_phase +def test_get_extended_sample_count__upper_bound_exceed(spec): + allowed_failures = spec.config.NUMBER_OF_COLUMNS // 2 + 1 + expect_assertion_error(lambda: spec.get_extended_sample_count(allowed_failures)) From beedf852cb81827772fcff7fc78137601864e204 Mon Sep 17 00:00:00 2001 From: Hsiao-Wei Wang Date: Tue, 25 Jun 2024 16:15:16 +0800 Subject: [PATCH 11/15] Revert division change and add comments --- specs/_features/eip7594/das-core.md | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/specs/_features/eip7594/das-core.md b/specs/_features/eip7594/das-core.md index aec3a276e..fe01a5e86 100644 --- a/specs/_features/eip7594/das-core.md +++ b/specs/_features/eip7594/das-core.md @@ -206,7 +206,7 @@ def get_data_column_sidecars(signed_block: SignedBeaconBlock, def get_extended_sample_count(allowed_failures: uint64) -> uint64: assert 0 <= allowed_failures <= NUMBER_OF_COLUMNS // 2 - def math_comb(n, k): + def math_comb(n: int, k: int) -> int: if not 0 <= k <= n: return 0 r = 1 @@ -214,12 +214,14 @@ def get_extended_sample_count(allowed_failures: uint64) -> uint64: r = r * (n - i) // (i + 1) return r - def hypergeom_cdf(k, M, n, N): + def hypergeom_cdf(k: uint64, M: uint64, n: uint64, N: uint64) -> float: + # NOTE: It contains float-point computations. + # Convert uint64 to Python integers before computations. 
k = int(k) M = int(M) n = int(n) N = int(N) - return sum([math_comb(n, i) * math_comb(M - n, N - i) // math_comb(M, N) + return sum([math_comb(n, i) * math_comb(M - n, N - i) / math_comb(M, N) for i in range(k + 1)]) worst_case_missing = NUMBER_OF_COLUMNS // 2 + 1 From ee977381de036cee998a17f0b65e7ebdc8bd0413 Mon Sep 17 00:00:00 2001 From: Hsiao-Wei Wang Date: Tue, 25 Jun 2024 16:48:38 +0800 Subject: [PATCH 12/15] Add `test_get_extended_sample_count__table_in_spec` to verify the table content in the spec --- .../test/eip7594/unittests/das/test_das.py | 27 +++++++++++++++++++ 1 file changed, 27 insertions(+) diff --git a/tests/core/pyspec/eth2spec/test/eip7594/unittests/das/test_das.py b/tests/core/pyspec/eth2spec/test/eip7594/unittests/das/test_das.py index 283a1e425..7110f2373 100644 --- a/tests/core/pyspec/eth2spec/test/eip7594/unittests/das/test_das.py +++ b/tests/core/pyspec/eth2spec/test/eip7594/unittests/das/test_das.py @@ -3,6 +3,7 @@ from eth2spec.test.context import ( expect_assertion_error, spec_test, single_phase, + with_config_overrides, with_eip7594_and_later, ) from eth2spec.test.helpers.sharding import ( @@ -116,3 +117,29 @@ def test_get_extended_sample_count__upper_bound(spec): def test_get_extended_sample_count__upper_bound_exceed(spec): allowed_failures = spec.config.NUMBER_OF_COLUMNS // 2 + 1 expect_assertion_error(lambda: spec.get_extended_sample_count(allowed_failures)) + + +@with_eip7594_and_later +@spec_test +@with_config_overrides({ + 'NUMBER_OF_COLUMNS': 128, + 'SAMPLES_PER_SLOT': 16, +}) +@single_phase +def test_get_extended_sample_count__table_in_spec(spec): + table = dict( + # (allowed_failures, expected_extended_sample_count) + { + 0: 16, + 1: 20, + 2: 24, + 3: 27, + 4: 29, + 5: 32, + 6: 35, + 7: 37, + 8: 40, + } + ) + for allowed_failures, expected_extended_sample_count in table.items(): + assert spec.get_extended_sample_count(allowed_failures=allowed_failures) == expected_extended_sample_count From 
17dfb9ae5763b53101fc3fc9fba1e26b27de8750 Mon Sep 17 00:00:00 2001 From: Csaba Kiraly Date: Thu, 27 Jun 2024 09:30:39 +0200 Subject: [PATCH 13/15] fix get_extended_sample_count proc signature Co-authored-by: Pop Chunhapanya --- specs/_features/eip7594/das-core.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/specs/_features/eip7594/das-core.md b/specs/_features/eip7594/das-core.md index bd86b80d8..9310a62ee 100644 --- a/specs/_features/eip7594/das-core.md +++ b/specs/_features/eip7594/das-core.md @@ -304,7 +304,7 @@ To custody a particular column, a node joins the respective gossip subnet. Verif At each slot, a node SHOULD select at least `SAMPLES_PER_SLOT` column IDs for sampling. It is recommended to use uniform random selection without replacement based on local randomness. Sampling is considered successful if the node manages to retrieve all selected columns. -Alternatively, a node MAY use a method that selects more than `SAMPLES_PER_SLOT` columns while allowing some missing, respecting the same target false positive threshold (the probability of successful sampling of an unavailable block) as dictated by the `SAMPLES_PER_SLOT` parameter. A node can use the `get_extended_sample_count(samples_per_slot, allowed_failures) -> sample_count` helper function to determine the sample count for any selected number of allowed failures. Sampling is then considered successful if any `sample_count - allowed_failures` columns are retrieved successfully. +Alternatively, a node MAY use a method that selects more than `SAMPLES_PER_SLOT` columns while allowing some missing, respecting the same target false positive threshold (the probability of successful sampling of an unavailable block) as dictated by the `SAMPLES_PER_SLOT` parameter. A node can use the `get_extended_sample_count(allowed_failures) -> sample_count` helper function to determine the sample count for any selected number of allowed failures. 
Sampling is then considered successful if any `sample_count - allowed_failures` columns are retrieved successfully. For reference, the table below shows the number of samples and the number of allowed missing columns assuming `NUMBER_OF_COLUMNS = 128` and `SAMPLES_PER_SLOT = 16`. From 78b583d8b062aacec77b5e522f9e883e760e756a Mon Sep 17 00:00:00 2001 From: Csaba Kiraly Date: Thu, 27 Jun 2024 09:48:32 +0200 Subject: [PATCH 14/15] clarify use of get_extended_sample_count Here we assume uniform random selection without replacement. If other methods are used, the target false positive threshold is the main rule to follow. Signed-off-by: Csaba Kiraly --- specs/_features/eip7594/das-core.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/specs/_features/eip7594/das-core.md b/specs/_features/eip7594/das-core.md index 9310a62ee..2e6226646 100644 --- a/specs/_features/eip7594/das-core.md +++ b/specs/_features/eip7594/das-core.md @@ -304,7 +304,7 @@ To custody a particular column, a node joins the respective gossip subnet. Verif At each slot, a node SHOULD select at least `SAMPLES_PER_SLOT` column IDs for sampling. It is recommended to use uniform random selection without replacement based on local randomness. Sampling is considered successful if the node manages to retrieve all selected columns. -Alternatively, a node MAY use a method that selects more than `SAMPLES_PER_SLOT` columns while allowing some missing, respecting the same target false positive threshold (the probability of successful sampling of an unavailable block) as dictated by the `SAMPLES_PER_SLOT` parameter. A node can use the `get_extended_sample_count(allowed_failures) -> sample_count` helper function to determine the sample count for any selected number of allowed failures. Sampling is then considered successful if any `sample_count - allowed_failures` columns are retrieved successfully. 
+Alternatively, a node MAY use a method that selects more than `SAMPLES_PER_SLOT` columns while allowing some missing, respecting the same target false positive threshold (the probability of successful sampling of an unavailable block) as dictated by the `SAMPLES_PER_SLOT` parameter. If using uniform random selection without replacement, a node can use the `get_extended_sample_count(allowed_failures) -> sample_count` helper function to determine the sample count (number of unique column IDs) for any selected number of allowed failures. Sampling is then considered successful if any `sample_count - allowed_failures` columns are retrieved successfully. For reference, the table below shows the number of samples and the number of allowed missing columns assuming `NUMBER_OF_COLUMNS = 128` and `SAMPLES_PER_SLOT = 16`. From 1ad381dccb9c923e81d0c7c409677a9b02eb97d1 Mon Sep 17 00:00:00 2001 From: Csaba Kiraly Date: Thu, 27 Jun 2024 10:28:21 +0200 Subject: [PATCH 15/15] adding get_extended_sample_count docstring Signed-off-by: Csaba Kiraly --- specs/_features/eip7594/das-core.md | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/specs/_features/eip7594/das-core.md b/specs/_features/eip7594/das-core.md index 2e6226646..293e85ad7 100644 --- a/specs/_features/eip7594/das-core.md +++ b/specs/_features/eip7594/das-core.md @@ -229,6 +229,14 @@ def get_data_column_sidecars(signed_block: SignedBeaconBlock, ```python def get_extended_sample_count(allowed_failures: uint64) -> uint64: assert 0 <= allowed_failures <= NUMBER_OF_COLUMNS // 2 + """ + Return the sample count if allowing failures. + + This helper demonstrates how to calculate the number of columns to query per slot when + allowing given number of failures, assuming uniform random selection without replacement. + Nested functions are direct replacements of Python library functions math.comb and + scipy.stats.hypergeom.cdf, with the same signatures. + """ def math_comb(n: int, k: int) -> int: if not 0 <= k <= n: