Storage proofs & erasure coding
===============================

Erasure coding is used for multiple purposes in Codex:

- To restore data when a host drops from the network; other hosts can restore
  the data that the missing host was storing.
- To speed up downloads
- To increase the probability of detecting missing data on a host

For the first two items we'll use a different erasure coding scheme than we do
for the last. In this document we focus on the last item: an erasure coding
scheme that makes it easier to detect missing or corrupted data on a host
through storage proofs.

Storage proofs
--------------

Our proofs of storage allow a host to prove that they are still in possession of
the data that they promised to hold. A proof is generated by sampling a number
of blocks and providing a Merkle proof for those blocks. The Merkle proof is
generated inside a SNARK to compress it to a small size to allow for
cost-effective verification on a blockchain.

These storage proofs depend on erasure coding to ensure that a large part of the
data needs to be missing before the original dataset can no longer be restored.
This makes it easier to detect when a dataset is no longer recoverable.

Consider this example without erasure coding:

    -------------------------------------
    |///|///|///|///|///|///|///|   |///|
    -------------------------------------
                                  ^
                                  |
                               missing

When we query a block from this dataset, we have a low chance of detecting the
missing block. But the dataset is no longer recoverable, because a single block
is missing.

When we add erasure coding:

    ---------------------------------     ---------------------------------
    |   |///|   |///|   |   |///|   |     |///|///|   |///|///|   |///|   |
    ---------------------------------     ---------------------------------
              original data                          parity data

In this example, more than 50% of the erasure coded data needs to be missing
before the dataset can no longer be recovered. When we now query a block from
this dataset, we have a more than 50% chance of detecting a missing block. And
when we query multiple blocks, the odds of detecting a missing block increase
dramatically.
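
The odds work out as follows: if a fraction of the blocks is missing and we
sample blocks uniformly at random, the chance of detecting the loss is
1 - (1 - fraction)^samples. A minimal sketch (the block counts used here are
illustrative, not Codex parameters):

```python
def detection_probability(missing_fraction, samples):
    """Chance that at least one sampled block is missing, assuming blocks are
    sampled independently and uniformly at random."""
    return 1 - (1 - missing_fraction) ** samples

print(detection_probability(1 / 256, 1))   # no erasure coding, 1 of 256 blocks lost: ~0.4%
print(detection_probability(0.5, 1))       # erasure coded, at the 50% loss threshold: 50%
print(detection_probability(0.5, 10))      # ten sampled blocks: ~99.9%
```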

Erasure coding
--------------

Reed-Solomon erasure coding works by representing data as a polynomial, and then
sampling parity data from that polynomial.

           __            __            __            __            __
          /  \          /  \          /  \          /  \          /  \
         /    \        /    \        /    \        /    \        /    \
        /      \      /      \      /      \      /      \      /      \
       /        \____/        \____/        \____/        \____/        \

      ^  ^  ^  ^  ^  ^  ^  ^              |  |  |  |  |  |  |  |
      |  |  |  |  |  |  |  |              v  v  v  v  v  v  v  v

    -------------------------           -------------------------
    |//|//|//|//|//|//|//|//|           |//|//|//|//|//|//|//|//|
    -------------------------           -------------------------

          original data                          parity

This only works for small amounts of data. When the polynomial is, for instance,
defined over byte-sized elements from a Galois field of 2^8, you can only encode
2^8 = 256 bytes (data and parity combined).
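
To make the "data as a polynomial" idea concrete, here is a toy sketch. It uses
the prime field GF(65537) and Lagrange interpolation instead of the Galois field
GF(2^8) and the FFT-based algorithms that real implementations use, but the
principle is the same: the data symbols fix a polynomial, the parity symbols are
extra samples of it, and any k of the n symbols recover everything.

```python
P = 65537  # toy prime field; a real coder would use GF(2^8) or GF(2^16)

def interpolate(points, x, p=P):
    """Evaluate at x the unique polynomial passing through the given
    (xi, yi) points (Lagrange interpolation over GF(p))."""
    result = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % p
                den = den * (xi - xj) % p
        result = (result + yi * num * pow(den, -1, p)) % p
    return result

def encode(data, parity_count):
    """Data symbols are samples of a polynomial at x = 0..k-1; parity symbols
    are additional samples of the same polynomial."""
    points = list(enumerate(data))
    k = len(data)
    return [interpolate(points, x) for x in range(k, k + parity_count)]

def recover(survivors, total):
    """Rebuild all symbols from any k surviving (x, symbol) pairs."""
    return [interpolate(survivors, x) for x in range(total)]

data = [3, 1, 4, 1]                    # k = 4 data symbols
parity = encode(data, 4)               # 4 parity symbols
survivors = [(1, data[1]), (3, data[3]), (4, parity[0]), (6, parity[2])]
assert recover(survivors, 8) == data + parity   # any 4 of the 8 symbols suffice
```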

Interleaving
------------

To encode larger pieces of data with erasure coding, interleaving is used. This
works by taking larger blocks of data, and encoding smaller elements from these
blocks.

                             data blocks

    -------------   -------------   -------------   -------------
    |x| | | | | |   |x| | | | | |   |x| | | | | |   |x| | | | | |
    -------------   -------------   -------------   -------------
     |               |              /                |
      \______________ \    ________/                 |
                     \ \  /   ______________________/
                      | | |  /
                      v v v v

                     ---------        ---------
              data   |x|x|x|x|  -->   |p|p|p|p|   parity
                     ---------        ---------

                                       | | | |
      ________________________________/ /  |  \______
     /                _________________/   |         \
     |               |                ____/          |
     v               v               v               v
    -------------   -------------   -------------   -------------
    |p| | | | | |   |p| | | | | |   |p| | | | | |   |p| | | | | |
    -------------   -------------   -------------   -------------

                            parity blocks
This is repeated for each element inside the blocks.
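
In code, the interleaving loop might look like the following minimal sketch. The
`rs_encode` parameter is a hypothetical stand-in for any symbol-level erasure
coder, not a specific Codex API; the toy `encode` sketched earlier has this
shape.

```python
def interleave_encode(data_blocks, parity_count, rs_encode):
    """data_blocks: k equally sized blocks (lists of symbols).
    Returns parity_count parity blocks: element i of every parity block is
    computed by erasure coding element i of every data block."""
    block_size = len(data_blocks[0])
    parity_blocks = [[0] * block_size for _ in range(parity_count)]
    for i in range(block_size):
        column = [block[i] for block in data_blocks]   # i-th element of each block
        for j, parity_symbol in enumerate(rs_encode(column, parity_count)):
            parity_blocks[j][i] = parity_symbol
    return parity_blocks
```

Note that each column is encoded independently of the others, which is exactly
what the next section exploits.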

Adversarial erasure
-------------------

The disadvantage of interleaving is that it weakens the protection against
adversarial erasure that Reed-Solomon provides.

An adversary can now strategically remove only the first element from more than
half of the blocks, and the dataset will be damaged beyond repair. For example,
with a dataset of 1TB erasure coded into 256 data and parity blocks, an
adversary could strategically remove just 129 bytes (the first byte of 129 of
the 256 blocks), and the data can no longer be fully recovered.

Implications for storage proofs
-------------------------------

This means that when we check for missing data, we should perform our checks on
entire blocks to protect against adversarial erasure. In the case of our Merkle
storage proofs, this means that we need to hash the entire block, and then check
that hash with a Merkle proof. This is unfortunate, because hashing large
amounts of data is rather expensive to perform in a SNARK, which is what we use
to compress proofs in size.

A large amount of input data in a SNARK leads to a larger circuit, and to more
iterations of the hashing algorithm, which also leads to a larger circuit. A
larger circuit means longer computation and higher memory consumption.

Ideally, we'd like to have small blocks to keep Merkle proofs inside SNARKs
relatively performant, but we are limited by the maximum number of blocks that a
particular Reed-Solomon algorithm supports. For instance, the [leopard][1]
library can create at most 65536 blocks, because it uses a Galois field of 2^16.
Should we use this to encode a 1TB file, we'd end up with blocks of 16MB, far
too large to be practical in a SNARK.

Design space
------------

This limits the choices that we can make. The limiting factors seem to be:

- Maximum number of blocks, determined by the field size of the erasure coding
  algorithm
- Number of blocks per proof, which determines how likely we are to detect
  missing blocks
- Capacity of the SNARK algorithm; how many bytes can we hash in a reasonable
  time inside the SNARK

From these limiting factors we can derive:

- Block size
- Maximum slot size; the maximum amount of data that can be verified with a
  proof
- Erasure coding memory requirements

For example, when we use the leopard library, with a Galois field of 2^16, and
require 80 blocks to be sampled per proof, and we can implement a SNARK that can
hash 80*64K bytes, then we have:

- Block size: 64KB
- Maximum slot size: 4GB (2^16 * 64KB)
- Erasure coding memory: > 128KB (2^16 * 16 bits)

This has the disadvantage of a rather low maximum slot size of 4GB. When
we want to improve on this to support e.g. 1TB slot sizes, we'll need to either
increase the capacity of the SNARK, increase the field size of the erasure
coding algorithm, or decrease the durability guarantees.

> The [accompanying spreadsheet][4] allows you to explore the design space
> yourself.
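
As a rough cross-check of the arithmetic, the derived quantities can also be
computed directly. This is a minimal sketch assuming exactly the relationships
described in this section; the function and its name are illustrative, and the
memory figure is the same lower bound (one field element per block) quoted
above:

```python
def derive(field_bits, blocks_per_proof, snark_capacity_bytes):
    max_blocks = 2 ** field_bits                      # limited by the Galois field
    block_size = snark_capacity_bytes // blocks_per_proof
    max_slot_size = max_blocks * block_size
    coding_memory = max_blocks * field_bits // 8      # lower bound: one field element per block
    return block_size, max_slot_size, coding_memory

# The example above: GF(2^16), 80 sampled blocks, a SNARK that hashes 80 * 64KB
print(derive(16, 80, 80 * 64 * 1024))   # (65536, 4294967296, 131072) = 64KB, 4GB, 128KB
```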

Increasing SNARK capacity
-------------------------

Increasing the computational capacity of SNARKs is an active field of study, but
it is unlikely that we'll see an implementation of SNARKs that is 100-1000x
faster before we launch Codex. Better hashing algorithms are also being designed
for use in SNARKs, but it is equally unlikely that we'll see such a speedup here
either.

Decreasing durability guarantees
--------------------------------

We could reduce the durability guarantees by requiring e.g. 20 instead of 80
blocks per proof. This would still give us a probability of detecting missing
data of 1 - 0.5^20, which is 0.999999046, or "six nines". Arguably this is still
good enough. Choosing 20 blocks per proof allows for slots up to 16GB:

- Block size: 256KB
- Maximum slot size: 16GB (2^16 * 256KB)
- Erasure coding memory: > 128KB (2^16 * 16 bits)
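
The figures above are easy to verify with the same assumptions as before:

```python
print(1 - 0.5 ** 20)         # 0.9999990463256836, i.e. "six nines"
print(2 ** 16 * 256 * 1024)  # 17179869184 bytes = 16GB maximum slot size
```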

Erasure coding field size
-------------------------

If we could perform erasure coding on a field of around 2^20 to 2^30, then this
would allow us to get to larger slots. For instance, with a field of at least
size 2^24, we could support slot sizes up to 1TB:

- Block size: 64KB
- Maximum slot size: 1TB (2^24 * 64KB)
- Erasure coding memory: > 48MB (2^24 * 24 bits)

We are, however, unaware of any implementations of Reed-Solomon that use a field
size larger than 2^16 while remaining efficient (O(N log N)). [FastECC][2] uses a
prime field of 20 bits, but it lacks a decoder and a byte encoding scheme. The
paper ["An Efficient (n,k) Information Dispersal Algorithm Based on Fermat
Number Transforms"][3] describes a scheme that uses Proth fields of 2^30, but
lacks an implementation.

If we were to adopt an erasure coding scheme with a large field, it is likely
that we'll have to implement one ourselves.

Conclusion
----------

It is likely that with the current state of the art in SNARK design and erasure
coding implementations we can only support slot sizes up to 4GB. The most
promising way to increase the supported slot sizes seems to be to implement an
erasure coding algorithm using a field size of around 2^24.

[1]: https://github.com/catid/leopard
[2]: https://github.com/Bulat-Ziganshin/FastECC
[3]: https://ieeexplore.ieee.org/abstract/document/6545355
[4]: ./proof-erasure-coding.ods