Erasure Coding Module Specification
1. Purpose and Scope
Purpose
The erasure coding module provides data redundancy and recovery capabilities. It encodes original data blocks into erasure-protected datasets using systematic Reed-Solomon erasure coding, enabling data recovery even when some blocks are unavailable or corrupted.
The Codex implementation uses the Leopard library, but alternative Reed-Solomon implementations may be used as well.
Scope
- Encoding manifests into erasure-protected manifests using K data blocks + M parity blocks
- Decoding erasure-protected manifests back to original data
- Repairing incomplete or damaged datasets
- Merkle tree generation and verification for data integrity
Boundaries and Limitations
- Requires at least K blocks per encoding group (column) to reconstruct the original data
- The dataset must be padded (to a multiple of K blocks) so that blocks align for interleaving
- Maximum recovery capability limited to M missing blocks per column
- All blocks must be of uniform size within a manifest
2. Interfaces
| Interface | Description | Input | Output |
|---|---|---|---|
| `encode()` | Encodes a manifest into erasure-protected format | `manifest: Manifest`, `blocks: Natural` (K), `parity: Natural` (M), `strategy: StrategyType` (default: `SteppedStrategy`) | `Future[?!Manifest]` - protected manifest with erasure coding metadata |
| `decode()` | Decodes a protected manifest back to the original | `encoded: Manifest` (must be protected) | `Future[?!Manifest]` - original manifest reconstructed from available blocks |
| `repair()` | Repairs a damaged protected manifest by reconstructing the full dataset | `encoded: Manifest` (protected) | `Future[?!void]` - success/failure status |
| `new()` | Creates a new `Erasure` instance | `store: BlockStore`, `encoderProvider: EncoderProvider`, `decoderProvider: DecoderProvider`, `taskPool: Taskpool` | `Erasure` instance |
| `start()` | Starts the erasure service | None | `Future[void]` |
| `stop()` | Stops the erasure service | None | `Future[void]` |
Internal Helper Interfaces
| Interface | Description | Input | Output |
|---|---|---|---|
| `asyncEncode()` | Performs async encoding using the thread pool | `blockSize: int`, `blocksLen: int`, `parityLen: int`, `blocks: ref seq[seq[byte]]`, `parity: ptr UncheckedArray` | `Future[?!void]` |
| `asyncDecode()` | Performs async decoding using the thread pool | `blockSize: int`, `blocksLen: int`, `parityLen: int`, `blocks, parity: ref seq[seq[byte]]`, `recovered: ptr UncheckedArray` | `Future[?!void]` |
| `getPendingBlocks()` | Creates an async iterator for block retrieval | `manifest: Manifest`, `indices: seq[int]` | `AsyncIter[(?!Block, int)]` |
3. Functional Requirements
3.1 Systematic Reed-Solomon Erasure Coding
- Generate M parity blocks from K data blocks using Reed-Solomon erasure coding
- Use interleaving to encode blocks at the same position across shards
- Support recovery with any K blocks from K+M total blocks
- Maintain systematic property where original data is readable without decoding
- Ensure deterministic encoding
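The recovery principle behind these requirements can be seen in miniature with a single XOR parity block. The sketch below is a toy analogy only (one parity block, i.e. M = 1), not the Reed-Solomon code used by this module; Reed-Solomon generalizes the same idea to arbitrary M while keeping the systematic property.

```nim
# Toy single-parity analogy (M = 1), NOT Reed-Solomon: it only illustrates the
# systematic property (data blocks stay readable as-is) and the idea of
# rebuilding a missing block from the surviving blocks plus parity.
proc xorParity(blocks: seq[seq[byte]]): seq[byte] =
  ## XOR all blocks together element-wise; blocks must have uniform size.
  result = newSeq[byte](blocks[0].len)
  for blk in blocks:
    for i, b in blk:
      result[i] = result[i] xor b

when isMainModule:
  let data = @[@[byte 1, 2, 3], @[byte 4, 5, 6], @[byte 7, 8, 9]]
  let parity = xorParity(data)
  # "Lose" data[1] and recover it from the remaining blocks + parity.
  let recovered = xorParity(@[data[0], data[2], parity])
  echo recovered == data[1]   # true
```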
3.2 Block Organization
Erasure coding uses interleaving (SteppedStrategy) to encode data blocks. This works by taking blocks from the same position across multiple data shards and encoding them together:
Interleaving Process
--------------------
Data shards (K=4 in this example):
------------- ------------- ------------- -------------
|x| | | | | | |x| | | | | | |x| | | | | | |x| | | | | |
------------- ------------- ------------- -------------
| / / |
\___________ | _____________/ |
\ | / ____________________________/
| | | /
v v v v
--------- ---------
data |x|x|x|x| --> |p|p|p|p| parity
--------- ---------
| | | |
_____________________________/ / | \_________
/ _____________/ | \
| / / |
v v v v
------------- ------------- ------------- -------------
|p| | | | | | |p| | | | | | |p| | | | | | |p| | | | | |
------------- ------------- ------------- -------------
Parity shards (M parity shards generated)
Key concepts:
- K: Number of original data shards
- M: Number of parity shards
- N: Total shards (K + M)
- Steps: Number of encoding iterations, one for each data block position
The encoded dataset is organized as follows (see the sketch below):
- Rows: K + M in total, one per shard
- Columns: B blocks per row, where B = Steps
- Total blocks: (K + M) × B in the encoded dataset
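As a concrete illustration of this organization, the following sketch computes which block indices form one encoding group and where its parity blocks land. It assumes the row-major numbering shown in section 5.4 (shard s holds indices s×B … (s+1)×B−1); the helper names are hypothetical, not the Codex implementation.

```nim
# Sketch only: assumes row-major numbering, i.e. shard s holds indices
# s*B .. (s+1)*B - 1, where B = Steps (see section 5.4).
proc dataIndices(step, ecK, steps: int): seq[int] =
  ## Indices of the K data blocks encoded together in `step`.
  for shard in 0 ..< ecK:
    result.add(shard * steps + step)

proc parityIndices(step, ecK, ecM, steps: int): seq[int] =
  ## Indices where the M parity blocks produced in `step` are placed.
  for shard in ecK ..< ecK + ecM:
    result.add(shard * steps + step)

when isMainModule:
  # K = 4 data shards, M = 2 parity shards, B = 3 steps
  echo dataIndices(1, ecK = 4, steps = 3)              # @[1, 4, 7, 10]
  echo parityIndices(1, ecK = 4, ecM = 2, steps = 3)   # @[13, 16]
```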
3.3 Data Recovery
- Reconstruct missing data blocks from any K available blocks
- Support partial column recovery - up to M blocks missing per column
- Verify recovered data against original tree root (originalTreeCid)
- Complete partial block downloads when recovery succeeds
- Store recovered blocks back to BlockStore for future access
3.4 Manifest Management
- Transform unprotected manifests to protected manifests with erasure coding metadata
- Preserve original metadata (filename, mimetype, dimensions)
- Generate and store merkle tree proofs for all blocks
- Support verifiable manifests with slot roots for proof generation
- Track both original and encoded dataset sizes
4. Non-Functional Requirements
Performance
- Latency: Yields control periodically (10 ms sleep) to avoid blocking the async event loop
- Scalability: Linear complexity in dataset size, O(n)
Reliability
- Error Handling:
- Detailed error types (ErasureError, InsufficientBlocksError)
- Proper error propagation through Result types
- Data Integrity:
- Merkle tree verification for all operations
- CID-based content addressing
- Tree root comparison for validation
Scalability
- Handle datasets of arbitrary size - limited only by storage
- Support configurable erasure coding parameters (K, M)
- Thread pool size configurable based on system resources
- Streaming block retrieval to avoid memory exhaustion
5. Internal Behavior
5.1 Encoding Workflow
flowchart TD
A[Start Encoding] --> B{Validate Parameters}
B -->|Invalid| C[Return InsufficientBlocksError]
B -->|Valid| D[Calculate Dimensions]
D --> E[Initialize EncodingParams]
E --> F[Call encodeData]
F --> G[For each step 0..steps-1]
G --> H[prepareEncodingData]
H --> I[Retrieve K blocks via strategy]
I --> J[Pad with empty blocks if needed]
J --> K[asyncEncode]
K --> L[Spawn leopardEncodeTask]
L --> M[Wait for thread completion]
M --> N[Store M parity blocks]
N --> O{More steps?}
O -->|Yes| G
O -->|No| P[Build Merkle Tree]
P --> Q[Store tree proofs]
Q --> R[Create Protected Manifest]
R --> S[Return Success]
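The "Calculate Dimensions" and "Initialize EncodingParams" steps boil down to arithmetic over the original block count and the K/M parameters. The sketch below is an assumption inferred from the EncodingParams fields in section 7.1 (rounded, steps, blocksCount), not the literal Codex code.

```nim
# Sketch (assumption, see section 7.1): pad the dataset to a multiple of K,
# derive the number of encoding steps, and add M parity blocks per step.
proc encodingDimensions(originalBlocks, ecK, ecM: int): tuple[rounded, steps, blocksCount: int] =
  let rounded = ((originalBlocks + ecK - 1) div ecK) * ecK  # round up to multiple of K
  let steps = rounded div ecK                               # encoding iterations
  let blocksCount = rounded + steps * ecM                   # data + parity blocks
  (rounded, steps, blocksCount)

when isMainModule:
  # 10 original blocks, K = 4, M = 2 -> rounded = 12, steps = 3, total = 18
  echo encodingDimensions(10, 4, 2)
```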
5.2 Decoding Workflow
flowchart TD
A[Start Decoding] --> B[Parse Protected Manifest]
B --> C[Call decodeInternal]
C --> D[For each step 0..steps-1]
D --> E[prepareDecodingData]
E --> F[Retrieve available blocks]
F --> G{dataPieces >= K?}
G -->|Yes| H[Skip decoding]
G -->|No| I[asyncDecode]
I --> J[Spawn leopardDecodeTask]
J --> K[Wait for completion]
K --> L[Store recovered blocks]
H --> M{More steps?}
L --> M
M -->|Yes| D
M -->|No| N[Build Original Tree]
N --> O{Verify Tree Root}
O -->|Match| P[Store proofs]
O -->|Mismatch| Q[Return Error]
P --> R[Create Decoded Manifest]
R --> S[Return Success]
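The "dataPieces >= K?" branch reflects the per-column recovery condition: decoding is skipped when all K data pieces are present, and recovery is possible only when at least K of the K + M blocks in a column survive. A minimal sketch of that check, with hypothetical helper names:

```nim
# Hypothetical helpers expressing the per-column recovery condition.
proc needsDecoding(dataPieces, ecK: int): bool =
  ## Decoding is skipped when all K data pieces are already available.
  dataPieces < ecK

proc canRecover(dataPieces, parityPieces, ecK: int): bool =
  ## A column is recoverable when at least K of its K + M blocks survive.
  dataPieces + parityPieces >= ecK

when isMainModule:
  echo needsDecoding(dataPieces = 4, ecK = 4)                 # false: skip decode
  echo canRecover(dataPieces = 2, parityPieces = 2, ecK = 4)  # true: 4 of 6 present
  echo canRecover(dataPieces = 1, parityPieces = 2, ecK = 4)  # false: too many losses
```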
5.3 Repair Workflow
flowchart TD
A[Start Repair] --> B[Call decodeInternal]
B --> C[Recover all blocks]
C --> D[Build Original Tree]
D --> E{Verify Original Tree Root}
E -->|Mismatch| F[Return Error]
E -->|Match| G[Store all proofs]
G --> H[Create Decoded Manifest]
H --> I[Call encode with same params]
I --> J[Re-encode dataset]
J --> K{Verify Repaired Tree}
K -->|Mismatch| L[Return Error]
K -->|Match| M[Return Success]
5.4 Data organization with interleaving
The encoded dataset uses interleaving, where data blocks at the same position across shards are processed together:
Logical Organization with interleaving:
Position 0 Position 1 ... Position B-1
----------------------------------------
Shard 0 | Block 0 | Block 1 | ... | Block B-1 | Data
Shard 1 | Block B | Block B+1 | ... | Block 2B-1 | Data
... | ... | ... | ... | ... | Data
Shard K-1| Block (K-1)×B | ... | ... | Block K×B-1 | Data
|----------|-----------|-----|----------------|
Shard K | Parity 0 | Parity 1 | ... | Parity B-1 | Parity
Shard K+1| Parity B | Parity B+1| ... | Parity 2B-1 | Parity
... | ... | ... | ... | ... | Parity
Shard K+M-1| Parity (M-1)×B |...| ... | Parity M×B-1 | Parity
where:
- K = number of data shards
- M = number of parity shards
- B = number of positions (steps) per shard
- Each column represents one encoding step
- Elements at the same position form an encoding group
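The following sketch (hypothetical helper, same row-major numbering assumption as in section 3.2) maps a global block index back to its shard, position, and role.

```nim
type BlockRole = enum
  DataBlock, ParityBlock

proc locate(index, ecK, ecM, steps: int): tuple[shard, position: int, role: BlockRole] =
  ## Assumes the row-major layout shown above: shard s covers indices
  ## s*B .. (s+1)*B - 1, shards 0..K-1 hold data, shards K..K+M-1 hold parity.
  assert index < (ecK + ecM) * steps   # index must lie inside the encoded dataset
  let shard = index div steps
  let position = index mod steps
  let role = if shard < ecK: DataBlock else: ParityBlock
  (shard: shard, position: position, role: role)

when isMainModule:
  # K = 4, M = 2, B = 3: index 13 lives in parity shard 4, position 1
  echo locate(13, ecK = 4, ecM = 2, steps = 3)
```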
6. Dependencies
6.1 External Libraries
| Dependency | Purpose | Usage |
|---|---|---|
| `pkg/chronos` | Async/await runtime, `Future` types | Threading, async operations |
| `pkg/libp2p` | CID, multicodec, multihash operations | Content addressing |
| `pkg/taskpools` | Thread pool management | Parallel processing |
| `pkg/stew` | Utility functions (byteutils) | Helper operations |
| `pkg/leopard` | Reed-Solomon erasure coding | Erasure coding module |
6.2 Internal Components
| Component | Purpose | Interface |
|---|---|---|
| `BlockStore` | Block storage and retrieval | `getBlock(cid/treeCid, index/address)`, `putBlock(blk, ttl)`, `completeBlock(address, blk)`, `putCidAndProof()`, `getCidAndProof()`, `hasBlock()`, `delBlock()` |
| `Manifest` | Dataset metadata representation | Verifiable/protected/unprotected manifests with erasure coding metadata |
| `IndexingStrategy` | Block organization strategies | `getIndices(iteration)`, `init(strategy, firstIndex, lastIndex, iterations)` |
| `Backend` | Leopard erasure coding | `encode()`, `decode()` interfaces provided via `EncoderProvider`/`DecoderProvider` |
| `CodexTree` | Merkle tree operations | Tree generation, proof creation, root CID calculation |
| `MerkleTree[H, K]` | Generic merkle tree | `getProof(index)`, `reconstructRoot()`, `verify()` with configurable hash function |
| `BlockType` | Block data structure | CID-based block representation with data payload |
6.3 Helper Functions
| Function | Purpose | Input | Output |
|---|---|---|---|
| `putSomeProofs()` | Store merkle proofs for specific indices | `store: BlockStore`, `tree: CodexTree`, `iter: Iter[int/Natural]` | `Future[?!void]` |
| `putAllProofs()` | Store all merkle proofs for a tree | `store: BlockStore`, `tree: CodexTree` | `Future[?!void]` |
7. Data Models
7.1 Core Types
Erasure Object
type Erasure* = ref object
taskPool: Taskpool
encoderProvider*: EncoderProvider
decoderProvider*: DecoderProvider
store*: BlockStore
EncodingParams
type EncodingParams = object
ecK: Natural # Number of data blocks (K)
ecM: Natural # Number of parity blocks (M)
rounded: Natural # Dataset rounded to multiple of K
steps: Natural # Number of encoding iterations (steps)
blocksCount: Natural # Total blocks after encoding
strategy: StrategyType # Indexing strategy used
7.2 Task Types
EncodeTask
type EncodeTask = object
success: Atomic[bool] # Operation success flag
erasure: ptr Erasure # Erasure instance
blocks: ptr UncheckedArray[ptr UncheckedArray[byte]] # Input data blocks
parity: ptr UncheckedArray[ptr UncheckedArray[byte]] # Output parity blocks
blockSize: int # Size of each block
blocksLen: int # Number of data blocks (K)
parityLen: int # Number of parity blocks (M)
signal: ThreadSignalPtr # Completion signal
DecodeTask
type DecodeTask = object
success: Atomic[bool] # Operation success flag
erasure: ptr Erasure # Erasure instance
blocks: ptr UncheckedArray[ptr UncheckedArray[byte]] # Available data blocks
parity: ptr UncheckedArray[ptr UncheckedArray[byte]] # Available parity blocks
recovered: ptr UncheckedArray[ptr UncheckedArray[byte]] # Recovered blocks output
blockSize: int # Size of each block
blocksLen: int # Number of data blocks (K)
parityLen: int # Number of parity blocks (M)
recoveredLen: int # Number of recovered blocks
signal: ThreadSignalPtr # Completion signal
7.3 Error Types
type
ErasureError* = object of CodexError
# Base error type for erasure coding operations
InsufficientBlocksError* = object of ErasureError
# Raised when insufficient blocks for encoding
minSize*: NBytes # Minimum dataset size required
7.4 Manifest
Protected and Verifiable Manifest Fields
case protected: bool
of true:
ecK: int # Number of data blocks
ecM: int # Number of parity blocks
originalTreeCid: Cid # CID of original dataset tree
originalDatasetSize: NBytes # Size before erasure coding
protectedStrategy: StrategyType # Strategy used for encoding
case verifiable: bool
of true:
verifyRoot: Cid # Root of verification tree
slotRoots: seq[Cid] # Individual slot roots
cellSize: NBytes # Size of verification cells
verifiableStrategy: StrategyType # Strategy for verification
7.5 Indexing Strategy
type
StrategyType* = enum
LinearStrategy # Sequential block grouping
SteppedStrategy # Interleaved block grouping
IndexingStrategy* = object
strategyType*: StrategyType
firstIndex*: int # Start of index range
lastIndex*: int # End of index range
iterations*: int # Number of iteration steps
step*: int # Step size between indices
groupCount*: int # Number of groups
padBlockCount*: int # Padding blocks per group
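To make the two strategies concrete, the sketch below shows one plausible reading of how each selects indices for a given iteration over firstIndex..lastIndex. It is an illustration inferred from the field descriptions above, not the exact IndexingStrategy implementation.

```nim
import std/sequtils

# Illustrative only: contiguous chunks for LinearStrategy vs. every
# `iterations`-th index for SteppedStrategy, over firstIndex..lastIndex.
proc linearIndices(firstIndex, lastIndex, iterations, iteration: int): seq[int] =
  let groupLen = (lastIndex - firstIndex + iterations) div iterations
  let start = firstIndex + iteration * groupLen
  toSeq(start .. min(start + groupLen - 1, lastIndex))

proc steppedIndices(firstIndex, lastIndex, iterations, iteration: int): seq[int] =
  toSeq(countup(firstIndex + iteration, lastIndex, iterations))

when isMainModule:
  # Index range 0..11 split into 3 iterations
  echo linearIndices(0, 11, 3, 1)    # @[4, 5, 6, 7]
  echo steppedIndices(0, 11, 3, 1)   # @[1, 4, 7, 10]
```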
7.6 Supporting Types
BlockAddress
type BlockAddress* = object
case leaf*: bool
of true:
treeCid*: Cid # CID of the merkle tree
index*: Natural # Index of block in the tree
of false:
cid*: Cid # Direct CID reference
Empty Block Handling
proc emptyCid*(version: CidVersion, hcodec: MultiCodec, dcodec: MultiCodec): ?!Cid
# Returns CID representing empty content for padding
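Related to the "Pad with empty blocks if needed" step in the encoding workflow, the sketch below (hypothetical helper) shows the idea of topping up an encoding group to K zero-filled blocks of uniform size; in the actual module such padding content is addressed via emptyCid.

```nim
# Hypothetical sketch: fill an encoding group up to K blocks with
# zero-filled blocks of the uniform block size before encoding.
proc padToK(group: seq[seq[byte]], ecK, blockSize: int): seq[seq[byte]] =
  result = group
  while result.len < ecK:
    result.add(newSeq[byte](blockSize))   # zero-filled padding block

when isMainModule:
  let group = @[@[byte 1, 2], @[byte 3, 4]]
  echo padToK(group, ecK = 4, blockSize = 2).len   # 4
```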
Merkle Tree Types
type
CodexTree* = MerkleTree[Poseidon2Hash, PoseidonKeysEnum]
CodexProof* = MerkleProof[Poseidon2Hash, PoseidonKeysEnum]
MerkleTree*[H, K] = ref object of RootObj
layers*: seq[seq[H]] # Tree layers from leaves to root
compress*: CompressFn[H, K] # Hash compression function
zero*: H # Zero/empty value
MerkleProof*[H, K] = ref object of RootObj
index*: int # Leaf index, starting from 0
path*: seq[H] # Proof path from bottom to top
nleaves*: int # Total number of leaves
compress*: CompressFn[H, K] # Compress function
zero*: H # Zero value
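As an illustration of how reconstructRoot()/verify() can operate on the MerkleProof shape above, the sketch below walks the proof path from a leaf to the root with a pluggable compress function. The toy compress function is for demonstration only (it is not Poseidon2), and the helper is a sketch rather than the module's implementation.

```nim
type CompressFn[H] = proc(left, right: H): H

proc reconstructRoot[H](leaf: H, index: int, path: seq[H], compress: CompressFn[H]): H =
  ## Walk the proof path bottom-up: at each layer, combine the current node
  ## with its sibling, ordered by whether the node is a left or right child.
  var node = leaf
  var idx = index
  for sibling in path:
    node = if idx mod 2 == 0: compress(node, sibling)
           else: compress(sibling, node)
    idx = idx div 2
  node

when isMainModule:
  # Toy compress function for demonstration only (NOT Poseidon2).
  let compress: CompressFn[int] = proc(a, b: int): int = a * 31 + b
  let leaves = @[10, 20, 30, 40]
  let root = compress(compress(leaves[0], leaves[1]), compress(leaves[2], leaves[3]))
  # Proof for leaf index 2: its sibling leaf, then the root of the left subtree.
  let proof = @[leaves[3], compress(leaves[0], leaves[1])]
  echo reconstructRoot(leaves[2], 2, proof, compress) == root   # true
```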
7.7 System Constants
- DefaultBlockSize: 65536 bytes
- DefaultCellSize: 2048 bytes
- DefaultMaxSlotDepth: 32
- DefaultMaxDatasetDepth: 8
- DefaultBlockDepth: 5
- DefaultCellElms: 67
- DefaultSamplesNum: 5
7.8 Supported Hash Codecs
- Sha256HashCodec: SHA2-256 hash function
- Sha512HashCodec: SHA2-512 hash function
- Pos2Bn128SpngCodec: Poseidon2 sponge construction
- Pos2Bn128MrklCodec: Poseidon2 merkle tree construction
7.9 Codex-Specific Codecs
- ManifestCodec: For manifest encoding
- DatasetRootCodec: For dataset root CIDs
- BlockCodec: For block encoding
- SlotRootCodec: For slot root CIDs
- SlotProvingRootCodec: For proving root CIDs
- CodexSlotCellCodec: For slot cell encoding