wip readme

Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
2024-01-19 07:53:34 +01:00 · 2024-01-19 07:53:34 +01:00 · 6cd84b180c
parent 41a5fa1348
commit 6cd84b180c
1 changed files with 32 additions and 20 deletions
--- a/codex/erasure/erasure.nim
+++ b/codex/erasure/erasure.nim
@@ -38,26 +38,38 @@ logScope:
 type
  ## Encode a manifest into one that is erasure protected.
  ##
-  ## The new manifest has K `blocks` that are encoded into
+  ## A layer of erasure protection is added on top of an 
-  ## additional M `parity` blocks. The resulting dataset
+  ## existing (possibly already protected) manifest, using
-  ## is padded with empty blocks if it doesn't have a square
+  ## a Reed Solomon code. The RS code is applied with parameters
-  ## shape.
+  ## K (original blocks) and M (parity blocks), with a block
-  ##
+  ## level interleaving of I.
-  ## NOTE: The padding blocks could be excluded
+  ##  
-  ## from transmission, but they aren't for now.
+  ## For every column i = 0 ..< I, we apply the erasure
-  ##
+  ## code over the K blocks with indices i + j * I, j = 0 ..< K.
-  ## The resulting dataset is logically divided into rows
+  ## The resulting M parity blocks are assigned new indices
-  ## where a row is made up of B blocks. There are then,
+  ## in the resulting manifest:
-  ## K + M = N rows in total, each of length B blocks. Rows
+  ##   newIndex(i,j) = i + j * I, j = K ..< K+M
-  ## are assumed to be of the same number of (B) blocks.
+  ## 
-  ##
+  ## The above procedure encodes exactly I * K blocks into 
-  ## The encoding is systematic and the rows can be
+  ## I * (K+M) blocks. This can also be viewed as the original data
-  ## read sequentially by any node without decoding.
+  ## being in an I * K matrix with I columns and K rows. Each column
-  ##
+  ## is then encoded individually, generating M new rows.
-  ## Decoding is possible with any K rows or partial K
+  ## 
-  ## columns (with up to M blocks missing per column),
+  ## If the original data is of a size different from I * K, it is 
-  ## or any combination there of.
+  ## padded to multiples of I * K, and encoded in multiple steps.
-  ##
+  ## 
  ## With b original blocks, we get
  ##  - steps: (b + I * K - 1) div (I * K), i.e. b / (I * K) rounded up
  ##  - original block i is encoded in
  ##    - step: s = i div (I * K)
  ##    - column: c = i mod I
  ##    - code position: p = (i div I) mod K
  ##  - original block i is mapped to 
  ##    - newIndex(s, c, v) = s * I * (K+N) + 
  ## 
  ## 
  ## If the original data has more than
  ## K * I blocks, the procedure is repeated multiple times.
  EncoderProvider* = proc(size, blocks, parity: int): EncoderBackend
    {.raises: [Defect], noSideEffect.}
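
The interleaved layout described in the doc comment can be sketched for a single encoding step of I * K blocks. This is only an illustration of the index arithmetic stated in the comment, not code from the commit: the helper names `columnDataIndices` and `columnParityIndices` are hypothetical, and the mapping across multiple steps is left out because the comment does not finish specifying it.

```nim
# Sketch only: index layout for ONE encoding step of interleave * k blocks.
# Both procs are hypothetical helpers, not part of codex/erasure/erasure.nim.

proc columnDataIndices(col, k, interleave: int): seq[int] =
  ## Indices of the k original blocks in column `col`:
  ## col + j * interleave, for j = 0 ..< k.
  for j in 0 ..< k:
    result.add(col + j * interleave)

proc columnParityIndices(col, k, m, interleave: int): seq[int] =
  ## Indices given to the m parity blocks of column `col`:
  ## col + j * interleave, for j = k ..< k + m.
  for j in k ..< k + m:
    result.add(col + j * interleave)

when isMainModule:
  # Example parameters, chosen arbitrarily for illustration.
  let (k, m, interleave) = (3, 2, 4)
  for col in 0 ..< interleave:
    echo "column ", col,
      ": data ", columnDataIndices(col, k, interleave),
      " parity ", columnParityIndices(col, k, m, interleave)
```

With K = 3, M = 2 and I = 4, column 0 covers original blocks 0, 4 and 8, and its parity blocks receive indices 12 and 16, so the I * K = 12 original blocks keep their positions and the parity rows are appended after them.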