Merkle trees and hashes: add a final proposal, plus other minor improvements
parent 05aa48c95d
commit 55f53e0037
design/Merkle.md: 115 changed lines

@@ -4,6 +4,9 @@ Merkle tree API proposal (WIP draft)

Let's collect the possible problems and solutions with constructing Merkle trees.

See [section "Final proposal"](#Final-proposal) at the bottom for the concrete
version we decided to implement.

### Vocabulary

A Merkle tree, built on a hash function `H`, produces a Merkle root of type `T`.

@@ -73,9 +76,11 @@ Traditional (linear) hash functions usually solve the analogous problems by clev

### Domain separation

It's a good practice in general to ensure that different constructions using the same
underlying hash function will never produce the same output. This is called "domain separation",
and it is very loosely reminiscent of _multihash_; however, instead of adding extra bits of
information to a hash (and thus increasing its size), we just compress the extra information
into the hash itself. So the information itself is lost, but collisions between different
domains are prevented.

A simple example would be using `H(dom|H(...))` instead of `H(...)`. The solutions below
can be interpreted as an application of this idea, where we want to separate the different
lengths `n`.
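As a quick sketch of the `H(dom|H(...))` idea (using SHA-256 as `H`; the domain tags below are made up for illustration):

```python
import hashlib

def domain_separated_hash(dom: bytes, msg: bytes) -> bytes:
    # H(dom | H(msg)): the domain tag is compressed into the digest itself,
    # so the output size does not grow, but equal messages hashed under
    # different domains can no longer collide (assuming H is collision
    # resistant)
    inner = hashlib.sha256(msg).digest()
    return hashlib.sha256(dom + inner).digest()
```

For example, `domain_separated_hash(b"leaf", msg)` and `domain_separated_hash(b"node", msg)` give unrelated digests for the same `msg`.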

@@ -135,10 +140,11 @@ a Merkle proof, you still need to know whether the element you prove is the last
odd element, or not. However, instead of submitting the length, you can encode this
into a single bit (not sure if that's much better though).

**Solution 5.** Use a different tree shape, where the left subtree is always a complete
(full) binary tree with `2^floor(log2(n-1))` leaves, and the right subtree is
constructed recursively. Then the shape of the tree encodes the number of inputs `n`.
The Blake3 hash uses such a strategy internally. This however complicates the Merkle
proofs (they won't have uniform size anymore).
TODO: think more about this!

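The split point of such a tree is easy to compute; a minimal sketch (`left_size` and `shape` are illustrative names, not from the proposal):

```python
def left_size(n: int) -> int:
    # for n >= 2 leaves: the left subtree is a complete binary tree
    # with 2^floor(log2(n-1)) leaves; the remaining leaves go right
    assert n >= 2
    return 1 << ((n - 1).bit_length() - 1)

def shape(n: int):
    # recursively describe the tree shape; a single leaf is None
    if n == 1:
        return None
    k = left_size(n)
    return (shape(k), shape(n - k))
```

Note that `shape(n)` is distinct for each `n` (the leaf count is recoverable from it), which is exactly the "shape encodes the length" property.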
### Keyed compression functions

@@ -146,17 +152,17 @@ TODO: think more about this!

How can we have many different compression functions? Consider three case studies:

**Poseidon.** The Poseidon family of hashes is built on a (fixed) permutation
`perm : F^t -> F^t`, where `F` is a (large) finite field. For simplicity, consider the case `t=3`.
The standard compression function is then defined as:

    compress(x,y) := let (u,_,_) = perm(x,y,0) in u

That is, we take the triple `(x,y,0)`, apply the permutation to get another triple `(u,v,w)`, and
extract the field element `u` (we could use `v` or `w` too, it shouldn't matter).
Now we can see that it is in fact very easy to generalize this to a _keyed_ (or _indexed_)
compression function:

    compress_k(x,y) := let (u,_,_) = perm(x,y,k) in u

where `k` is the key. Note that there is no overhead in doing this. And since `F` is pretty
big (in our case, about 253 bits), there is plenty of information we can encode in the key `k`.
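The wiring can be made concrete with a toy stand-in permutation. To be clear: the map below is NOT Poseidon; the field size, round count, and round structure are arbitrary placeholders (chosen only to be invertible). Only the pattern matters: the key goes into the third state slot, and `compress_k(x,y)` is the first component of `perm(x,y,k)`.

```python
P = 2**31 - 1  # toy prime, standing in for the ~253-bit BN254 scalar field

def toy_perm(state):
    # a toy invertible map on F_P^3 (each step is invertible given the
    # other two words); merely a placeholder for the Poseidon permutation
    x, y, z = state
    for r in range(8):
        x = (x + y + r) % P
        y = (y + z + x * x) % P
        z = (z + x) % P
    return (x, y, z)

def compress(x, y, key=0):
    # keyed compression: put the key into the third ("capacity") slot
    # and keep one output word; zero overhead compared to the unkeyed case
    u, _, _ = toy_perm((x, y, key))
    return u
```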

@@ -186,19 +192,10 @@ works perfectly well with no overhead compared to `SHA256(x|y)`.

**MiMC.** MiMC is another arithmetic construction; however, in this
case the starting point is a _block cipher_, that is, we start with
a keyed permutation! Unfortunately MiMC-p/p is a (keyed) permutation
of `F`, which is not very useful for us; however, in Feistel mode we
get a keyed permutation of `F^2`, and we can just take the first
component of its output as the compressed output.

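The Feistel trick above can be sketched as follows. The cubing round function mimics MiMC's `x^3`, but the field, round count, and round constants here are toy placeholders, not the real MiMC parameters:

```python
P = 2**31 - 1  # toy prime field

def feistel_perm(x, y, key, rounds=8):
    # each round (x, y) -> (y + (x + key + c_r)^3, x) is invertible,
    # so this is a keyed permutation of F^2 built from a keyed map of F
    for r in range(rounds):
        x, y = (y + pow(x + key + r, 3, P)) % P, x
    return x, y

def compress(x, y, key):
    # take the first component of the keyed permutation of F^2
    return feistel_perm(x, y, key)[0]
```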
### Tree padding proposal

It seems to me that, whatever way we try to solve problem 2) without pre-hashing, we need
to include the length (or at least one bit of information about the length) in the Merkle
proofs. So maybe we should just live with that.

Then from the above choices, right now solution **4c**, or some variation of it,
looks the nicest to me.

### Making `deserialize` injective

Consider the following simple algorithm to deserialize a sequence of bytes into chunks of

@@ -211,16 +208,82 @@ Consider the following simple algorithm to deserialize a sequence of bytes into
The problem with this is that, for example, `0x123456`, `0x12345600` and `0x1234560000`
all result in the same output.

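The collision is easy to demonstrate with a naive zero-padding deserializer (a sketch; `naive_chunks` is an illustrative name):

```python
def naive_chunks(data: bytes) -> list[bytes]:
    # naive deserialization: zero-pad to a multiple of 31 bytes, then chunk
    padded = data + bytes(-len(data) % 31)
    return [padded[i:i + 31] for i in range(0, len(padded), 31)]

# three distinct inputs, one and the same chunk sequence: a trivial collision
a = naive_chunks(bytes.fromhex("123456"))
b = naive_chunks(bytes.fromhex("12345600"))
c = naive_chunks(bytes.fromhex("1234560000"))
assert a == b == c
```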
#### About padding in general

Let's take a step back and meditate a little on what the meaning of padding is.

What is padding? It's a mapping from a set of sequences into a subset. In our case
we have an arbitrary sequence of bytes, and we want to map it into the subset of sequences
whose length is divisible by 31.

Why do we want padding? Because we want to apply an algorithm (in this case a hash function)
to arbitrary sequences, but the algorithm can only handle a subset of all sequences.
In our case we first map the arbitrary sequence of bytes into a sequence of bytes
whose length is divisible by 31, and then map that into a sequence of finite field
elements.

What properties do we want from padding? Well, that depends on what properties we
want from the resulting algorithm. In this case we do hashing, so we definitely want
to avoid collisions. This means that our padding should never map two different input
sequences into the same padded sequence (because that would create a trivial collision).
In mathematics, we call such functions "injective".

How do you prove that a function is injective? You provide an inverse function,
which takes a padded sequence and outputs the original one.

In summary, we need to come up with an injective padding strategy for arbitrary byte
sequences, which always results in a byte sequence whose length is divisible by 31.

#### Some possible solutions:

- prepend the length (number of input bytes) to the input, say as a 64-bit little-endian integer (8 bytes),
  before padding as above
- or append the length instead of prepending, then pad (note: appending is streaming-friendly; prepending is not)
- or first pad with zero bytes, but leave 8 bytes for the length (so that when we finally append
  the length, the result will be divisible by 31). This is _almost_ exactly what SHA2 does.
- use the following padding strategy: _always_ add a single `0x01` byte, then enough `0x00` bytes so that the length
  is divisible by 31. This is usually called the `10*` padding strategy, abusing regexp notation.
  Why does this work? Well, consider an already padded sequence. It's very easy to recover the
  original byte sequence by 1) first removing all trailing zeros; and 2) after that, removing the
  single trailing `0x01` byte. This proves that the padding is an injective function.
- one can easily come up with many similar padding strategies. For example SHA3/Keccak uses `10*1`
  (but on bits, not bytes), and SHA2 uses a combination of `10*` and appending the bit length of the
  original input.

Remark: Any safe padding strategy will result in at least one extra field element
if the input length was already divisible by 31. This is both unavoidable in general,
and not an issue in practice (as the size of the input grows, the overhead becomes
negligible). The same thing happens when you SHA256 hash an integer multiple of 64 bytes.

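The `10*` strategy and its inverse (which is exactly the injectivity proof from above) can be sketched in a few lines; function names are illustrative:

```python
def pad(data: bytes) -> bytes:
    # 10* padding: always append 0x01, then 0..30 zero bytes,
    # so that the total length becomes divisible by 31
    tail = data + b"\x01"
    return tail + bytes(-len(tail) % 31)

def unpad(padded: bytes) -> bytes:
    # the inverse function: strip the trailing zeros,
    # then the single 0x01 marker byte
    stripped = padded.rstrip(b"\x00")
    assert stripped.endswith(b"\x01"), "invalid padding"
    return stripped[:-1]
```

Note that `pad` of a 31-byte input is 62 bytes long, illustrating the "one extra chunk" remark above.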

### Final proposal

We decided to implement the following version.

- pad byte sequences (to have length divisible by 31) with the `10*` padding strategy; that is,
  always append a single `0x01` byte, and after that add a number of zero bytes (between 0 and 30),
  so that the resulting sequence has length divisible by 31
- when converting an (already padded) byte sequence to a sequence of field elements,
  split it up into 31-byte chunks, interpret those as little-endian 248-bit unsigned
  integers, and finally interpret those integers as field elements in the BN254 prime
  field (using the standard mapping `Z -> Z/p`).
- when using the Poseidon2 sponge construction to compute a linear hash out of
  a sequence of field elements, we use the BN254 field, `t=3` and `(0,0,domsep)`
  as the initial state, where `domsep := 2^64 + 256*t + rate` is the domain separation
  IV. Note that because `t=3`, we can only have `rate=1` or `rate=2`. We need
  a padding strategy here too (since the input length must be divisible by `rate`):
  we use `10*` again, but here on field elements.
  Remark: For `rate=1` this makes things always a tiny bit slower, but we plan to use
  `rate=2` anyway (as it's twice as fast), and it's better not to have exceptional cases.
- when using Poseidon2 to build a binary Merkle tree, we use "solution #3" from above.
  That is, we use a keyed compression function, with the key being one of `{0,1,2,3}`
  (two bits). The lowest bit is 1 in the bottom-most (that is, the widest) layer,
  and 0 otherwise; the other bit is 1 if it's both the last element of the layer
  _and_ the layer has an odd number of elements; 0 otherwise. In odd-length layers,
  we also add an extra 0 field element to make the length even. This is also valid
  for the singleton input: in that case it's both odd and bottom-most, so the root
  of a singleton input `[x]` will be `H_{key=3}(x|0)`.
- we will use the same strategy when constructing binary Merkle trees with the
  SHA256 hash; in that case, the compression function will be `SHA256(x|y|key)`.
  Note: since SHA256 already uses padding internally, adding the key does not
  result in any overhead.
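The SHA256 variant of the tree can be sketched as follows. This is a sketch, not the reference implementation: encoding the key as a single byte and representing the extra 0 element as a 32-byte zero block are assumptions the text above leaves open.

```python
import hashlib

def compress(x: bytes, y: bytes, key: int) -> bytes:
    # keyed compression SHA256(x|y|key); encoding the key as one
    # byte is an assumption (the proposal does not fix the encoding)
    return hashlib.sha256(x + y + bytes([key])).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    assert leaves, "the empty input is not handled in this sketch"
    layer = list(leaves)
    bottom = 1  # lowest key bit: set only in the bottom-most (widest) layer
    while True:
        odd = len(layer) % 2 == 1
        if odd:
            layer.append(bytes(32))  # pad odd-length layers with a zero element
        nxt = []
        for i in range(0, len(layer), 2):
            last = i + 2 == len(layer)
            # second key bit: set only for the last pair of an odd-length layer
            key = bottom | (2 if (last and odd) else 0)
            nxt.append(compress(layer[i], layer[i + 1], key))
        layer, bottom = nxt, 0
        if len(layer) == 1:
            return layer[0]
```

For a singleton input `[x]` this returns `SHA256(x | 0^32 | 0x03)`, matching the `H_{key=3}(x|0)` rule above.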