From 07bb16a97d660094df50142c0eaef4f94644415e Mon Sep 17 00:00:00 2001
From: Jimmy Debe <91767824+jimstir@users.noreply.github.com>
Date: Thu, 3 Oct 2024 00:13:02 -0400
Subject: [PATCH] Update erasue-coding.md

---
 specs/erasue-coding.md | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/specs/erasue-coding.md b/specs/erasue-coding.md
index 06d4f8d..5a9d2e7 100644
--- a/specs/erasue-coding.md
+++ b/specs/erasue-coding.md
@@ -29,14 +29,12 @@ The keywords “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL N

 The Codex client performerasure coding locally before provding dataset to the marketplace.

-### Flow
-
 Before data is provided to storage providers on the marketplace, clients must do the following:

 1. Prepare dataset
 2. Encode data with Reed Solomon erasue coding, more explained below
-3. Derive an CID from encoded chunks share on the marketplace
+3. Derive an CID from encoded chunks, share on the marketplace
 4. Error correction by validator nodes once storage contract begins

 ### Preparing Data
@@ -61,7 +59,6 @@ With Reed-Solomon algorithm, extra data chunks need to be created for the datase
 Parity blocks is added to the chucks of data before encoding.
 Once data is encoded, it is prepared to be transmitted.

-
 Below is the content of the dag-pb protobuf message

 ```protobuf
@@ -88,28 +85,32 @@ Below is the content of the dag-pb protobuf message
 }
 ```

-## Decode Data
+### Decode Data

+Decoding occurs after a dataset is downloaded by storage providers and
+and proofs of storage are required.
 There are two node roles that will need to decode data.
+
 - Client nodes to read data
 - Validator nodes to verfiy storage providers are storing data as per the marketplace

-To ensure data is being stored by by storage providers, data will need to be decoded when vaildator nodes need download data slots.
-
+To ensure data is being stored by storage providers, the smart contracts REQUIRES proof of storage to be submitted.
+If a window is missed, vaildators can
 ## Security Considerations

-### Encoding Problem
+### Adversarial Attack

 An adversarial storage provider can remove only the first element from more than half of the block, and the slot data can no longer be recovered from the data that the host stores.
 For example, with 1TB of slot data erasure coded into 256 data and parity shards, an adversary could strategically remove 129 bytes, and the data can no longer be fully recovered with the erasure coded data that is present on the host.

-### Recommended Solution
-
-we should perform our checks on entire shards to protect against adversarial erasure.
+The RECOMMENDED solution should perform checks on entire shards to protect against adversarial erasure.
 In the Merkle storage proofs, we need to hash the entire shard, and then check that hash with a Merkle proof.
 Effectively the block size for Merkle proofs should equal the shard size of the erasure coding interleaving.
 Hashing large amounts of data will be expensive to perform in a SNARK, which is used to compress proofs in size in Codex.

+### Data Encryption
+
+If data is not encryted before entering the encoding process, nodes, including storage providers, will be able to access the data. This may lead to privacy concerns and the misuse of data.

 ## Copyright