Merkle tree API proposal (WIP draft)
------------------------------------

Let's collect the possible problems and solutions with constructing Merkle trees.

### Vocabulary

A Merkle tree, built on a hash function `H`, produces a Merkle root of type `T`.
This is usually the same type as the output of the hash function. Some examples:

- SHA1: `T` is 160 bits
- SHA256: `T` is 256 bits
- Poseidon: `T` is one (or a few) finite field element(s)

The hash function `H` can also have different types `S` of inputs. For example:

- SHA1 / SHA256 / SHA3: `S` is an arbitrary sequence of bits
  - some less-conforming implementations of these could take a sequence of bytes instead
- Poseidon: `S` is a sequence of finite field elements
  - Poseidon compression function: at most `t-1` field elements (in our case `t=3`, so
    that's two field elements)
- A naive Merkle tree implementation could for example accept only a power-of-two
  sized sequence of `T`

Notation: Let's denote a sequence of `T` by `[T]`.

### Merkle tree API

We usually need at least two types of Merkle tree APIs:

- one which takes a sequence `S = [T]` of length `n` as input, and produces an
  output (Merkle root) of type `T`
- and one which takes a sequence of bytes (or even bits, but in practice we probably
  only need bytes): `S = [byte]`

We can decompose the latter into the composition of a function
`deserialize : [byte] -> [T]` and the former.

### Naive Merkle tree implementation

A straightforward implementation of a binary Merkle tree `merkleRoot : [T] -> T`
could be for example:

- if the input has length 1, it's the root
- if the input has even length `2*k`, group it into pairs, apply a
  `compress : (T,T) -> T` compression function, producing the next layer of size `k`
- if the input has odd length `2*k+1`, pad it with an extra element `dummy` of
  type `T`, then apply the procedure for even length, producing the next layer of size `k+1`

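As a concrete (hypothetical) sketch of the naive algorithm, here SHA-256 stands in for the generic `compress : (T,T) -> T`, and `dummy` is an all-zero 32-byte leaf; names and choices are mine, not a fixed API:

```python
import hashlib

DUMMY = b"\x00" * 32  # the `dummy` padding element of type T (an arbitrary choice)

def compress(x: bytes, y: bytes) -> bytes:
    # compress(x,y) := H(x|y), with H = SHA-256 here
    return hashlib.sha256(x + y).digest()

def merkle_root(leaves: list) -> bytes:
    layer = list(leaves)
    while len(layer) > 1:
        if len(layer) % 2 == 1:           # odd length 2*k+1: pad with dummy
            layer.append(DUMMY)
        # even length 2*k: compress pairs into the next layer of size k
        layer = [compress(layer[i], layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]                        # length 1: it's the root
```

Note that, exactly as the attacks below point out, this sketch makes no attempt at domain separation.
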
The compression function could be implemented in several ways:

- when `S` and `T` are just sequences of bits or bytes (as in the case of classical hash
  functions like SHA256), we can just concatenate the two leaves of the node and apply the
  hash: `compress(x,y) := H(x|y)`
- in the case of hash functions based on the sponge construction (like Poseidon or Keccak/SHA3),
  we can just fill the "capacity part" of the state with a constant (say 0), the "absorbing
  part" of the state with the two inputs, apply the permutation, and extract a single `T`

### Attacks

When implemented without enough care (like the above naive algorithm), there are several
possible attacks producing hash collisions or second preimages:

1. The root of any particular layer is the same as the root of the input
2. The root of `[x_0,x_1,...,x_(2*k)]` (length is `n=2*k+1`) is the same as the root of
   `[x_0,x_1,...,x_(2*k),dummy]` (length is `n=2*k+2`)
3. when using bytes as the input, already `deserialize` can have similar collision attacks
4. The root of a singleton sequence is itself

Traditional (linear) hash functions usually solve the analogous problems by clever padding.

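Attacks 1), 2) and 4) can be checked mechanically against the naive construction. The self-contained sketch below is hypothetical (SHA-256 standing in for `H`, an all-zero `dummy`), but exhibits each collision concretely:

```python
import hashlib

DUMMY = b"\x00" * 32

def compress(x, y):
    return hashlib.sha256(x + y).digest()

def merkle_root(leaves):
    layer = list(leaves)
    while len(layer) > 1:
        if len(layer) % 2 == 1:
            layer.append(DUMMY)
        layer = [compress(layer[i], layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]

leaves = [bytes([i]) * 32 for i in range(4)]

# Attack 1: the second layer, fed back in as a fresh input, has the same root
layer2 = [compress(leaves[0], leaves[1]), compress(leaves[2], leaves[3])]
assert merkle_root(layer2) == merkle_root(leaves)

# Attack 2: an odd-length input collides with its dummy-padded extension
assert merkle_root(leaves[:3]) == merkle_root(leaves[:3] + [DUMMY])

# Attack 4: a singleton is its own root
assert merkle_root([leaves[0]]) == leaves[0]
```
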
### Domain separation

It's a good practice in general to ensure that different constructions using the same
underlying hash functions will never produce the same output. This is called "domain separation",
and it's a bit similar to _multihash_; however, instead of adding extra bits of information to a hash
(and thus increasing its size), we just compress the extra information into the hash itself.
A simple example would be using `H(dom|H(...))` instead of `H(...)`. The solutions below
can be interpreted as an application of this idea, where we want to separate the different
lengths `n`.

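The `H(dom|H(...))` idea can be sketched in two lines (a minimal illustration; the domain tags are arbitrary):

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def H_dom(dom: bytes, data: bytes) -> bytes:
    # domain-separated variant: H(dom | H(...)); output size is unchanged
    return H(dom + H(data))
```

The same payload hashed under two different domains yields unrelated digests, so two constructions tagged differently can never be confused for one another.
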
### Possible solutions (for the tree attacks)

While the third problem (`deserialize` may not be injective) is similar to the second problem,
let's deal first with the tree problems, and come back to `deserialize` (see below) later.

**Solution 0b.** Pre-hash each input element. This solves 2) and 4) (if we choose `dummy` to be
something we don't expect anybody to find a preimage of), but does not solve 1); it also
doubles the computation time.

**Solution 1.** Just prepend the data with the length `n` of the input sequence. Note that any
cryptographic hash function needs an output size of at least 160 bits (and usually at least
256 bits), so we can always embed the length (surely less than `2^64`) into `T`. This solves
both problems 1) and 2) (the height of the tree is a deterministic function of the length),
and 4) too.
However, a typical application of a Merkle tree is the case where the length of the input
`n=2^d` is a power of two; in this case it looks a little bit "inelegant" to increase the size
to `n=2^d+1`, though the overhead with the above even-odd construction is only `log2(n)`.
An advantage is that you can _prove_ the size of the input with a standard Merkle inclusion proof.
Alternative version: append instead of prepend; then the indexing of the leaves does not change.

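Solution 1 can be sketched on top of a naive SHA-256 tree (a hypothetical sketch; the name `merkle_root_len` and the 32-byte little-endian length leaf are my choices, not a fixed format):

```python
import hashlib

DUMMY = b"\x00" * 32

def compress(x, y):
    return hashlib.sha256(x + y).digest()

def merkle_root(leaves):
    layer = list(leaves)
    while len(layer) > 1:
        if len(layer) % 2 == 1:
            layer.append(DUMMY)
        layer = [compress(layer[i], layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]

def merkle_root_len(leaves):
    # Solution 1: prepend the length n, embedded into one leaf of type T
    # (here T is 32 bytes, so n < 2^64 certainly fits)
    n_leaf = len(leaves).to_bytes(32, "little")
    return merkle_root([n_leaf] + list(leaves))
```

With the length prefixed, the dummy-padding collision of attack 2) disappears, because the two inputs now carry different length leaves.
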
**Solution 2.** Apply an extra compression step at the very end including the length `n`,
calculating `newRoot = compress(n,origRoot)`. This again solves all 3 problems. However, it
makes the code a bit less regular; and you have to submit the length as part of Merkle proofs.

**Solution 3a.** Use two different compression functions, one for the bottom layer (by bottom
I mean the one closest to the input) and another for all the other layers. For example you can
use `compress(x,y) := H(isBottomLayer|x|y)`. This solves problem 1).

**Solution 3b.** Use two different compression functions, one for the even nodes, and another
for the odd nodes (that is, those with a single child instead of two). Similarly to the
previous case, you can use for example `compress(x,y) := H(isOddNode|x|y)` (note that for
the odd nodes, we will have `y=dummy`). This solves problem 2). Remark: the extra bits of
information (odd/even) added to the last nodes (one in each layer) are exactly the binary
expansion of the length `n`. A disadvantage is that for verifying a Merkle proof, we need to
know for each node whether it's the last one or not, so we need to include the length `n` in
any Merkle proof here too.

**Solution 3.** Combining **3a** and **3b**, we can solve both problems 1) and 2); so here we add
two bits of information to each node (that is, we need 4 different compression functions).
Problem 4) can always be solved by adding a final compression call.

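A sketch of the combined Solution 3, under the assumption that the two domain-separation flags are each encoded as one byte prepended to the hash input (any injective encoding would do):

```python
import hashlib

DUMMY = b"\x00" * 32

def compress(is_bottom: bool, is_odd: bool, x: bytes, y: bytes) -> bytes:
    # one of the 4 compression functions, selected by the two flags:
    # H(isBottomLayer | isOddNode | x | y)
    return hashlib.sha256(bytes([is_bottom, is_odd]) + x + y).digest()

def merkle_root(leaves):
    layer = list(leaves)
    is_bottom = True
    while len(layer) > 1:
        odd = len(layer) % 2 == 1
        if odd:
            layer.append(DUMMY)
        layer = [
            # the last node of an odd layer gets the isOddNode flag
            compress(is_bottom, odd and i == len(layer) - 2, layer[i], layer[i + 1])
            for i in range(0, len(layer), 2)
        ]
        is_bottom = False
    # problem 4 would still need an extra final compression call for singletons
    return layer[0]
```
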
**Solution 4a.** Replace each input element `x_i` with `compress(i,x_i)`. This solves
both problems again (and 4) too), but doubles the amount of computation.

**Solution 4b.** Only in the bottom layer, use `H(1|isOddNode|i|x_{2i}|x_{2i+1})` for
compression (note that for the odd node we have `x_{2i+1}=dummy`). This is similar to
the previous solution, but does not increase the amount of computation.

**Solution 4c.** Only in the bottom layer, use `H(i|j|x_i|x_j)` for even nodes
(with `i=2*k` and `j=2*k+1`), and `H(i|0|x_i|0)` for the odd node (or alternatively
we could also use `H(i|i|x_i|x_i)` for the odd node). Note: when verifying
a Merkle proof, you still need to know whether the element you prove is the last _and_
odd element, or not. However, instead of submitting the length, you can encode this
into a single bit (not sure if that's much better though).

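Solution 4c could be sketched as follows (hypothetical; 8-byte big-endian index encoding and the all-zero fillers are my assumptions, the inner layers reuse the plain `H(x|y)` compression):

```python
import hashlib

DUMMY = b"\x00" * 32

def sha(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def compress(x, y):  # inner layers: plain H(x|y)
    return sha(x + y)

def merkle_root_4c(leaves):
    # Bottom layer per Solution 4c: H(i|j|x_i|x_j) for pairs (i=2k, j=2k+1),
    # and H(i|0|x_i|0) for a trailing unpaired element.
    layer = []
    for k in range(0, len(leaves) - 1, 2):
        i, j = k, k + 1
        layer.append(sha(i.to_bytes(8, "big") + j.to_bytes(8, "big") + leaves[i] + leaves[j]))
    if len(leaves) % 2 == 1:
        i = len(leaves) - 1
        layer.append(sha(i.to_bytes(8, "big") + bytes(8) + leaves[i] + bytes(32)))
    # inner layers proceed naively; a dummy here is harmless, since DUMMY has
    # no known preimage under the indexed bottom-layer hash
    while len(layer) > 1:
        if len(layer) % 2 == 1:
            layer.append(DUMMY)
        layer = [compress(layer[i], layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]
```
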
**Solution 5 (??).** Use a different tree shape, where the left subtree is always a complete
(full) binary tree with `2^floor(log2(n-1))` leaves, and the right subtree is
constructed recursively. Then the shape of the tree encodes the number of inputs `n`.
This however complicates the Merkle proofs (they won't have uniform size).
TODO: think more about this!

### Keyed compression functions

How can we have many different compression functions? Consider three case studies:

**Poseidon.** The Poseidon family of hashes is built on a (fixed) permutation
`pi : F^t -> F^t`, where `F` is a (large) finite field. For simplicity consider the case `t=3`.
The standard compression function is then defined as:

    compress(x,y) := let (u,_,_) = pi(x,y,0) in u

That is, we take the triple `(x,y,0)`, apply the permutation to get another triple `(u,v,w)`, and
extract the field element `u` (we could use `v` or `w` too, it shouldn't matter).
Now we can see that it is in fact very easy to generalize this to a _keyed_ (or _indexed_)
compression function:

    compress_k(x,y) := let (u,_,_) = pi(x,y,k) in u

where `k` is the key. Note that there is no overhead in doing this. And since `F` is pretty
big (in our case, about 253 bits), there is plenty of information we can encode in the key `k`.

Note: we probably lose a few bits of security here, if somebody looks for a preimage among
_all_ keys; however, in our constructions the keys have a fixed structure, so it's probably
not that dangerous. If we want to be extra safe, we could use `t=4` and `pi(x,y,k,0)`
instead (but that has some computation overhead).

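The wiring of the keyed compression can be illustrated with a toy stand-in for `pi` (emphatically NOT the real Poseidon permutation and not cryptographically secure; just an invertible map over a small prime field, so the key-in-the-third-slot structure is visible):

```python
# Toy stand-in for pi : F^3 -> F^3 over a small prime field.
P = 2**31 - 1  # a Mersenne prime; the real F is about 253 bits

def pi(state):
    x, y, z = state
    for r in range(5):
        # linear mixing plus an x^5 "S-box"; 5 is coprime to P-1, and the
        # linear part is invertible, so each round is a bijection of F^3
        x, y, z = (x + y + r) % P, (y + z) % P, pow((x + z) % P, 5, P)
    return (x, y, z)

def compress_keyed(k, x, y):
    # compress_k(x,y) := let (u,_,_) = pi(x,y,k) in u
    # the key rides in the third state slot, so keying costs nothing extra
    u, _, _ = pi((x, y, k))
    return u
```

Since `pi` is a bijection, feeding the same `(x,y)` with two different keys is guaranteed to produce two different output states.
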
**SHA256.** When using SHA256 as our hash function, normally the compression function is
defined as `compress(x,y) := SHA256(x|y)`, that is, concatenate the (bitstring representations of the)
two elements, and apply SHA256 to the resulting (bit)string. Normally `x` and `y` are both
256 bits long, and so is the result. If we look into the details of how SHA256 is specified,
this is actually wasteful. That's because while SHA256 processes the input in 512-bit chunks,
it also prescribes a mandatory nonempty padding. So when calling SHA256 on an input of size
512 bits (64 bytes), it will actually process two chunks, the second chunk consisting purely
of padding. When constructing a binary Merkle tree using a compression function like before,
the input is always of the same size, so this padding is unnecessary; nevertheless, people
usually prefer to follow the standardized SHA256 call. But if we are processing 1024 bits
anyway, we have a lot of free space to include our key `k`! In fact we can add up to
`512-64-1=447` bits of additional information; so for example

    compress_k(x,y) := SHA256(k|x|y)

works perfectly well with no overhead compared to `SHA256(x|y)`.

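A minimal sketch of this keyed SHA256 compression (the 55-byte key bound is the byte-aligned version of the 447-bit budget: 55 + 64 = 119 bytes still fits the same two 64-byte blocks, since SHA-256 padding adds at least 9 bytes):

```python
import hashlib

def compress_k(k: bytes, x: bytes, y: bytes) -> bytes:
    # compress_k(x,y) := SHA256(k|x|y); with 32-byte x and y, a key of up to
    # 55 bytes keeps the call at the same two compression-function invocations
    # that plain SHA256(x|y) already costs (its second block is pure padding)
    assert len(k) <= 55 and len(x) == 32 and len(y) == 32
    return hashlib.sha256(k + x + y).digest()
```

With the empty key this degenerates to the standard `SHA256(x|y)`, while any two distinct keys act as distinct compression functions.
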
**MiMC.** MiMC is another arithmetic construction; however, in this
case the starting point is a _block cipher_, that is, we start with
a keyed permutation! Unfortunately MiMC-p/p is a (keyed) permutation
of `F`, which is not very useful for us; however, in Feistel mode we
get a keyed permutation of `F^2`, and we can just take the first
component of the output of that as the compressed output.

### Tree padding proposal

It seems to me that, whatever way we try to solve problem 2) without pre-hashing, we need
to include the length (or at least one bit of information about the length) in the Merkle
proofs. So maybe we should just live with that.

Then from the above choices, right now maybe solution **4c**, or some variation of it,
looks the nicest to me.

### Making `deserialize` injective

Consider the following simple algorithm to deserialize a sequence of bytes into chunks of
31 bytes:

- pad the input with at most 30 zero bytes such that the padded length becomes divisible
  by 31
- split the padded sequence into `ceil(n/31)` chunks, each 31 bytes.

The problem with this is that, for example, `0x123456`, `0x12345600` and `0x1234560000`
all result in the same output.

Some possible solutions:

- prepend the length (number of input bytes) to the input, say as a 64-bit little-endian integer (8 bytes),
  before padding as above
- or append the length instead of prepending, then pad
- or first pad with zero bytes, but leave 8 bytes for the length (so that when we finally append
  the length, the result will be divisible by 31)
- use the following padding strategy: _always_ add a single 0x01 byte, then enough 0x00 bytes so that the length
  is divisible by 31. Why does this work? Well, consider an already padded sequence. Count
  the number of zero bytes from the end: you get a number `0 <= m < 31`. This number
  determines the residue class of the original length `n` modulo 31; then this class,
  together with the padded length, fully determines the original length.
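
The last padding strategy is easy to sketch and to invert (a hypothetical sketch; `pad`/`unpad`/`deserialize` are my names):

```python
def pad(data: bytes) -> bytes:
    # always append a single 0x01 marker byte, then 0x00 bytes until the
    # length is divisible by 31
    padded = data + b"\x01"
    return padded + b"\x00" * (-len(padded) % 31)

def unpad(padded: bytes) -> bytes:
    # strip the trailing zeros, then the mandatory 0x01 marker; any trailing
    # zeros of the *original* data sit safely before the marker
    return padded.rstrip(b"\x00")[:-1]

def deserialize(data: bytes) -> list:
    p = pad(data)
    return [p[i:i + 31] for i in range(0, len(p), 31)]
```

Because the 0x01 marker is always present, `pad` is injective: the three previously colliding inputs `0x123456`, `0x12345600` and `0x1234560000` now deserialize to different chunk sequences.
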