Merkle tree API proposal (WIP draft)
Let's collect the possible problems with constructing Merkle trees, and their possible solutions.
Vocabulary
A Merkle tree, built on a hash function `H`, produces a Merkle root of type `T`.
This is usually the same type as the output of the hash function. Some examples:

- SHA1: `T` is 160 bits
- SHA256: `T` is 256 bits
- Poseidon: `T` is one (or a few) finite field element(s)
The hash function `H` can also have different types `S` of inputs. For example:

- SHA1 / SHA256 / SHA3: `S` is an arbitrary sequence of bits - some less-conforming implementations of these could take a sequence of bytes instead
- Poseidon: `S` is a sequence of finite field elements
- Poseidon compression function: at most `t-1` field elements (in our case `t=3`, so that's two field elements)
- A naive Merkle tree implementation could for example accept only a power-of-two sized sequence of `T`
Notation: Let's denote a sequence of `T` by `[T]`.
Merkle tree API
We usually need at least two types of Merkle tree APIs:

- one which takes a sequence `S = [T]` of length `n` as input, and produces an output (Merkle root) of type `T`
- and one which takes a sequence of bytes (or even bits, but in practice we probably only need bytes): `S = [byte]`

We can decompose the latter into the composition of a function `deserialize : [byte] -> [T]` and the former.
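As a rough illustration (not part of the proposal itself), such a decomposition could look like this in Python, with `T` taken to be `bytes` (e.g. a 32-byte digest); `deserialize` and `merkle_root` are placeholders sketched further below.

```python
from typing import Callable

def merkle_root_of_bytes(
    data: bytes,
    deserialize: Callable[[bytes], list[bytes]],
    merkle_root: Callable[[list[bytes]], bytes],
) -> bytes:
    # the byte-oriented API is just the composition of `deserialize`
    # and the element-oriented API
    return merkle_root(deserialize(data))
```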
Naive Merkle tree implementation
A straightforward implementation of a binary Merkle tree `merkleRoot : [T] -> T` could be for example:

- if the input has length 1, it's the root
- if the input has even length `2*k`, group it into pairs, apply a `compress : (T,T) -> T` compression function, producing the next layer of size `k`
- if the input has odd length `2*k+1`, pad it with an extra element `dummy` of type `T`, then apply the procedure for even length, producing the next layer of size `k+1`
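A minimal Python sketch of this naive algorithm, assuming `T` is represented as `bytes`, some compression function `compress` is available (see the sketch after the next list), and `dummy` is an arbitrary fixed element:

```python
DUMMY = b"\x00" * 32  # the padding element `dummy` of type T (arbitrary choice)

def merkle_root(layer: list[bytes]) -> bytes:
    if len(layer) == 1:          # a single element is already the root
        return layer[0]
    if len(layer) % 2 == 1:      # odd length: pad with the dummy element
        layer = layer + [DUMMY]
    # group into pairs and compress each pair to get the next (half-size) layer
    next_layer = [compress(layer[i], layer[i + 1]) for i in range(0, len(layer), 2)]
    return merkle_root(next_layer)
```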
The compression function could be implemented in several ways:

- when `S` and `T` are just sequences of bits or bytes (as in the case of classical hash functions like SHA256), we can just concatenate the two children of the node and apply the hash: `compress(x,y) := H(x|y)`
- in the case of hash functions based on the sponge construction (like Poseidon or Keccak/SHA3), we can just fill the "capacity part" of the state with a constant (say 0), the "absorbing part" of the state with the two inputs, apply the permutation, and extract a single `T`
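Two corresponding sketches of `compress`: a byte-oriented one using SHA256, and a sponge-style one assuming some hypothetical width-3 permutation `poseidon_permutation` over field elements (represented here as Python integers):

```python
import hashlib

def compress(x: bytes, y: bytes) -> bytes:
    # byte-oriented compression: hash the concatenation of the two children
    return hashlib.sha256(x + y).digest()

def compress_sponge(x: int, y: int) -> int:
    # sponge-style compression: the rate part of the state holds the two inputs,
    # the capacity part holds the constant 0; apply the permutation and extract
    # a single field element
    u, _, _ = poseidon_permutation(x, y, 0)  # hypothetical permutation
    return u
```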
Attacks
When implemented without enough care (like the above naive algorithm), there are several possible attacks producing hash collisions or second preimages:
1. The root of any particular layer is the same as the root of the input
2. The root of `[x_0,x_1,...,x_(2*k)]` (length `n=2*k+1`) is the same as the root of `[x_0,x_1,...,x_(2*k),dummy]` (length `n=2*k+2`)
3. When using bytes as the input, already `deserialize` can have similar collision attacks
4. The root of a singleton sequence is itself
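For example, attacks 1) and 2) can be demonstrated directly against the naive sketch above (the leaf values are arbitrary):

```python
leaves = [bytes([i]) * 32 for i in range(3)]

# attack 2): an odd-length input collides with the same input explicitly
# extended by the dummy element
assert merkle_root(leaves) == merkle_root(leaves + [DUMMY])

# attack 1): the first intermediate layer of a 4-leaf tree has the same root
# as the 4-leaf input itself
leaves4 = [bytes([i]) * 32 for i in range(4)]
layer1 = [compress(leaves4[0], leaves4[1]), compress(leaves4[2], leaves4[3])]
assert merkle_root(layer1) == merkle_root(leaves4)
```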
Traditional (linear) hash functions usually solve the analogous problems by clever padding.
Domain separation
It's a good practice in general to ensure that different constructions using the same
underlying hash functions will never produce the same output. This is called "domain separation",
and it's a bit similar to multihash; however instead of adding extra bits of information to a hash
(and thus increasing its size), we just compress the extra information into the hash itself.
A simple example would be using `H(dom|H(...))` instead of `H(...)`. The below solutions can be interpreted as an application of this idea, where we want to separate the different lengths `n`.
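A minimal sketch of this idea with SHA256 (the domain tags and their encoding are arbitrary choices for illustration):

```python
import hashlib

def domain_separated_hash(dom: bytes, data: bytes) -> bytes:
    # H(dom|H(...)): the domain tag is compressed into the hash itself,
    # so the output size does not grow
    return hashlib.sha256(dom + hashlib.sha256(data).digest()).digest()

# the same data hashed in two different domains gives unrelated outputs
assert domain_separated_hash(b"leaf", b"data") != domain_separated_hash(b"node", b"data")
```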
Possible solutions (for the tree attacks)
While the third problem (`deserialize` may not be injective) is similar to the second one, let's deal with the tree problems first, and come back to `deserialize` later (see below).
Solution 0b. Pre-hash each input element. This solves 2) and 4) (if we choose `dummy` to be something for which we don't expect anybody to find a preimage), but does not solve 1); also, it doubles the computation time.
Solution 1. Just prepend the data with the length `n` of the input sequence. Note that any cryptographic hash function needs an output size of at least 160 bits (and usually at least 256 bits), so we can always embed the length (surely less than `2^64`) into `T`. This solves both problems 1) and 2) (the height of the tree is a deterministic function of the length), and 4) too.

However, a typical application of a Merkle tree is the case where the length of the input `n=2^d` is a power of two; in this case it looks a little bit "inelegant" to increase the size to `n=2^d+1`, though the overhead with the above even-odd construction is only `log2(n)` extra compression calls.

An advantage is that you can prove the size of the input with a standard Merkle inclusion proof.

Alternative version: append instead of prepend; then the indexing of the leaves does not change.
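A sketch of Solution 1 on top of the naive `merkle_root` above, encoding the length as a 32-byte little-endian element (an arbitrary concrete choice):

```python
def merkle_root_with_length(leaves: list[bytes]) -> bytes:
    # prepend the length n, embedded into an element of type T,
    # and run the ordinary Merkle tree on the extended sequence
    n_as_element = len(leaves).to_bytes(32, "little")
    return merkle_root([n_as_element] + leaves)
```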
Solution 2. Apply an extra compression step at the very end including the length `n`, calculating `newRoot = compress(n,origRoot)`. This again solves all three tree problems (1, 2 and 4). However, it makes the code a bit less regular; and you have to submit the length as part of Merkle proofs.
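A corresponding sketch of Solution 2, again with the length encoded as a 32-byte element:

```python
def merkle_root_with_final_length(leaves: list[bytes]) -> bytes:
    # compute the ordinary Merkle root first, then apply one extra
    # compression step that mixes in the length n
    orig_root = merkle_root(leaves)
    return compress(len(leaves).to_bytes(32, "little"), orig_root)
```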
Solution 3a. Use two different compression functions, one for the bottom layer (by bottom I mean the one closest to the input) and another one for all the other layers. For example you can use `compress(x,y) := H(isBottomLayer|x|y)`. This solves problem 1).
Solution 3b. Use two different compression functions, one for the even nodes, and another one for the odd nodes (that is, those with a single child instead of two). Similarly to the previous case, you can use for example `compress(x,y) := H(isOddNode|x|y)` (note that for the odd nodes, we will have `y=dummy`). This solves problem 2). Remark: The extra bits of information (odd/even) added to the last nodes (one in each layer) are exactly the binary expansion of the length `n`. A disadvantage is that for verifying a Merkle proof, we need to know for each node whether it's the last one or not, so we need to include the length `n` in any Merkle proof here too.
Solution 3. Combining 3a and 3b, we can solve both problems 1) and 2); so here we add two bits of information to each node (that is, we need 4 different compression functions). 4) can always be solved by adding a final compression call.
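A sketch of the combined Solution 3, with the two key bits encoded as a 2-byte prefix for SHA256 (an arbitrary concrete choice; the extra final compression call for problem 4) is omitted):

```python
def keyed_compress(is_bottom: bool, is_odd: bool, x: bytes, y: bytes) -> bytes:
    # 4 different compression functions, selected by two bits of key information
    key = bytes([int(is_bottom), int(is_odd)])
    return hashlib.sha256(key + x + y).digest()

def merkle_root_v3(layer: list[bytes], is_bottom: bool = True) -> bytes:
    if len(layer) == 1:
        return layer[0]
    padded = layer if len(layer) % 2 == 0 else layer + [DUMMY]
    next_layer = [
        # the "odd" flag is set only for the node whose right child is the dummy
        keyed_compress(is_bottom, i + 1 == len(layer), padded[i], padded[i + 1])
        for i in range(0, len(padded), 2)
    ]
    return merkle_root_v3(next_layer, is_bottom=False)
```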
Solution 4a. Replace each input element `x_i` with `compress(i,x_i)`. This solves both problems again (and 4) too), but doubles the amount of computation.
Solution 4b. Only in the bottom layer, use `H(1|isOddNode|i|x_{2i}|x_{2i+1})` for compression (note that for the odd node we have `x_{2i+1}=dummy`). This is similar to the previous solution, but does not increase the amount of computation.
Solution 4c. Only in the bottom layer, use `H(i|j|x_i|x_j)` for even nodes (with `i=2*k` and `j=2*k+1`), and `H(i|0|x_i|0)` for the odd node (or alternatively we could also use `H(i|i|x_i|x_i)` for the odd node). Note: when verifying a Merkle proof, you still need to know whether the element you are proving is the last, odd element or not. However, instead of submitting the length, you can encode this in a single bit (not sure if that's much better though).
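A sketch of the bottom layer of Solution 4c (using 8-byte big-endian index encodings and a zero element in place of the missing child; both are arbitrary concrete choices). The upper layers can then use the ordinary `merkle_root` from above on its output.

```python
def bottom_layer_4c(leaves: list[bytes]) -> list[bytes]:
    out = []
    for i in range(0, len(leaves), 2):
        if i + 1 < len(leaves):
            # even node: H(i|j|x_i|x_j) with j = i+1
            data = i.to_bytes(8, "big") + (i + 1).to_bytes(8, "big") + leaves[i] + leaves[i + 1]
        else:
            # odd node: H(i|0|x_i|0)
            data = i.to_bytes(8, "big") + b"\x00" * 8 + leaves[i] + b"\x00" * 32
        out.append(hashlib.sha256(data).digest())
    return out

# e.g.: root = merkle_root(bottom_layer_4c(leaves))
```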
Solution 5 (??). Use a different tree shape, where the left subtree is always a complete (full) binary tree with `2^floor(log2(n-1))` leaves, and the right subtree is constructed recursively. Then the shape of the tree encodes the number of inputs `n`. This however complicates the Merkle proofs (they won't have uniform size).

TODO: think more about this!
Keyed compression functions
How can we have many different compression functions? Consider three case studies:
Poseidon. The Poseidon family of hashes is built on a (fixed) permutation `pi : F^t -> F^t`, where `F` is a (large) finite field. For simplicity consider the case `t=3`.

The standard compression function is then defined as:

    compress(x,y) := let (u,_,_) = pi(x,y,0) in u

That is, we take the triple `(x,y,0)`, apply the permutation to get another triple `(u,v,w)`, and extract the field element `u` (we could use `v` or `w` too, it shouldn't matter).
Now we can see that it is in fact very easy to generalize this to a keyed (or indexed) compression function:

    compress_k(x,y) := let (u,_,_) = pi(x,y,k) in u

where `k` is the key. Note that there is no overhead in doing this. And since `F` is pretty big (in our case, about 253 bits), there is plenty of information we can encode in the key `k`.
Note: We probably lose a few bits of security here, if somebody looks for a preimage among all keys; however, in our constructions the keys have a fixed structure, so it's probably not that dangerous. If we want to be extra safe, we could use `t=4` and `pi(x,y,k,0)` instead (but that has some computational overhead).
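A sketch of the keyed version, again assuming a hypothetical width-3 permutation `poseidon_permutation` (field elements as Python integers); the second function shows how, for example, the two flag bits of Solution 3 could be packed into the key:

```python
def compress_keyed(x: int, y: int, key: int) -> int:
    # the key goes into the third slot of the state;
    # key = 0 recovers the standard compression function, at no extra cost
    u, _, _ = poseidon_permutation(x, y, key)
    return u

def compress_v3(is_bottom: bool, is_odd: bool, x: int, y: int) -> int:
    # e.g. encode the (isBottomLayer, isOddNode) bits of Solution 3 into the key
    return compress_keyed(x, y, key=2 * int(is_bottom) + int(is_odd))
```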
SHA256. When using SHA256 as our hash function, the compression function is normally defined as `compress(x,y) := SHA256(x|y)`, that is, we concatenate the (bitstring representations of the) two elements, and apply SHA256 to the resulting (bit)string. Normally `x` and `y` are both 256 bits long, and so is the result. If we look into the details of how SHA256 is specified, this is actually wasteful. That's because while SHA256 processes the input in 512-bit chunks, it also prescribes a mandatory nonempty padding. So when calling SHA256 on an input of size 512 bits (64 bytes), it will actually process two chunks, the second chunk consisting purely of padding. When constructing a binary Merkle tree using a compression function like the above, the input is always of the same size, so this padding is unnecessary; nevertheless, people usually prefer to follow the standardized SHA256 call. But if we are processing 1024 bits anyway, we have a lot of free space to include our key `k`! In fact we can add up to `512-64-1=447` bits of additional information; so for example

    compress_k(x,y) := SHA256(k|x|y)

works perfectly well with no overhead compared to `SHA256(x|y)`.
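A sketch of this with Python's hashlib (the 32-byte key size is an arbitrary choice within the 447-bit budget):

```python
import hashlib

def sha256_compress_keyed(key: bytes, x: bytes, y: bytes) -> bytes:
    # x and y are 32 bytes each; with a 32-byte key the total input (96 bytes)
    # still fits into the same two 512-bit blocks that SHA256(x|y) already uses
    assert len(x) == 32 and len(y) == 32 and len(key) <= 32
    return hashlib.sha256(key + x + y).digest()
```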
MiMC. MiMC is another arithmetic construction; however, in this case the starting point is a block cipher, that is, we start with a keyed permutation! Unfortunately MiMC-p/p is a (keyed) permutation of `F`, which is not very useful for us; however, in Feistel mode we get a keyed permutation of `F^2`, and we can just take the first component of the output of that as the compressed output.
Tree padding proposal
It seems to me that whatever way we try to solve problem 2) without pre-hashing, we need to include the length (or at least one bit of information about the length) in the Merkle proofs. So maybe we should just live with that.

Then from the above choices, right now solution 4c, or some variation of it, looks the nicest to me.
Making `deserialize` injective
Consider the following simple algorithm to deserialize a sequence of bytes into chunks of 31 bytes:

- pad the input with at most 30 zero bytes such that the padded length becomes divisible by 31
- split the padded sequence into `ceil(n/31)` chunks, each 31 bytes long

The problem with this is that for example `0x123456`, `0x12345600` and `0x1234560000` all result in the same output.
Some possible solutions:

- prepend the length (number of input bytes) to the input, say as a 64-bit little-endian integer (8 bytes), before padding as above
- or append the length instead of prepending it, then pad
- or first pad with zero bytes, but leave 8 bytes for the length (so that when we finally append the length, the result will be divisible by 31)
- or use the following padding strategy: always add a single `0x01` byte, then enough `0x00` bytes so that the length is divisible by 31. Why does this work? Well, consider an already padded sequence. Count the number of zero bytes at the end: you get a number `0 <= m < 31`. This number determines the residue class of the original length `n` modulo 31; this class, together with the padded length, fully determines the original length.
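A sketch of the last padding strategy, with a corresponding `unpad` showing that the padding (and hence `deserialize`) is invertible:

```python
def deserialize(data: bytes) -> list[bytes]:
    # append a single 0x01 byte, then 0x00 bytes until the length is divisible by 31
    padded = data + b"\x01"
    padded += b"\x00" * (-len(padded) % 31)
    return [padded[i:i + 31] for i in range(0, len(padded), 31)]

def unpad(chunks: list[bytes]) -> bytes:
    padded = b"".join(chunks)
    m = len(padded) - len(padded.rstrip(b"\x00"))  # number of trailing zero bytes
    return padded[: len(padded) - m - 1]           # drop them and the 0x01 byte

assert unpad(deserialize(b"\x12\x34\x56")) == b"\x12\x34\x56"
assert deserialize(b"\x12\x34\x56") != deserialize(b"\x12\x34\x56\x00")
```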