Merkle tree API proposal (WIP draft)

Let's collect the possible problems and solutions with constructing Merkle trees.

Vocabulary

A Merkle tree, built on a hash function H, produces a Merkle root of type T. This is usually the same type as the output of the hash function. Some examples:

  • SHA1: T is 160 bits
  • SHA256: T is 256 bits
  • Poseidon: T is one (or a few) finite field element(s)

The hash function H can also have different types S of inputs. For example:

  • SHA1 / SHA256 / SHA3: S is an arbitrary sequence of bits
  • some less-conforming implementation of these could take a sequence of bytes instead
  • Poseidon: S is a sequence of finite field elements
  • Poseidon compression function: at most t-1 field elements (in our case t=3, so that's two field elements)
  • A naive Merkle tree implementation could for example accept only a power-of-two sized sequence of T

Notation: Let's denote a sequence of T by [T].

Merkle tree API

We usually need at least two types of Merkle tree APIs:

  • one which takes a sequence S = [T] of length n as input, and produces an output (Merkle root) of type T
  • and one which takes a sequence of bytes (or even bits, but in practice we probably only need bytes): S = [byte]

We can decompose the latter into the composition of a function deserialize : [byte] -> [T] and the former.

Naive Merkle tree implementation

A straightforward implementation of a binary Merkle tree merkleRoot : [T] -> T could be for example:

  • if the input has length 1, it's the root
  • if the input has even length 2*k, group it into pairs, apply a compress : (T,T) -> T compression function, producing the next layer of size k
  • if the input has odd length 2*k+1, pad it with an extra element dummy of type T, then apply the procedure for even length, producing the next layer of size k+1

The compression function could be implemented in several ways:

  • when S and T are just sequences of bits or bytes (as in the case of classical hash functions like SHA256), we can just concatenate the two leaves of the node and apply the hash: compress(x,y) := H(x|y)
  • in case of hash functions based on the sponge construction (like Poseidon or Keccak/SHA3), we can just fill the "capacity part" of the state with a constant (say 0), the "absorbing part" of the state with the two inputs, apply the permutation, and extract a single T
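The naive procedure above, combined with the first compression option, can be sketched roughly as follows. This is only an illustration: the choice of SHA256, the all-zero `DUMMY` value, and the names `compress` and `merkle_root_naive` are assumptions made for the example, and T is taken to be a 32-byte string.

```python
import hashlib
from typing import List

# Assumed padding element; the text leaves `dummy` unspecified.
DUMMY = bytes(32)

def compress(x: bytes, y: bytes) -> bytes:
    """compress(x,y) := H(x|y), here with H = SHA256."""
    return hashlib.sha256(x + y).digest()

def merkle_root_naive(leaves: List[bytes]) -> bytes:
    """Naive binary Merkle root of a non-empty list of 32-byte values."""
    if not leaves:
        raise ValueError("the empty input is not covered by the description")
    layer = list(leaves)
    while len(layer) > 1:
        if len(layer) % 2 == 1:
            layer.append(DUMMY)  # odd length: pad with the dummy element
        # even length 2*k: compress pairs, producing the next layer of size k
        layer = [compress(layer[i], layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]
```

Note that a single-element input is returned unchanged, exactly as the first bullet of the procedure says.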

Attacks

When implemented without enough care (like the above naive algorithm), there are several possible attacks producing hash collisions or second preimages:

  1. The root of any particular layer is the same as the root of the input
  2. The root of [x_0,x_1,...,x_(2*k)] (length n=2*k+1) is the same as the root of [x_0,x_1,...,x_(2*k),dummy] (length n=2*k+2)
  3. When using bytes as the input, deserialize alone can already have similar collision attacks
  4. The root of a singleton sequence is itself

Traditional (linear) hash functions usually solve the analogous problems by clever padding.
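To make tree attacks 1), 2) and 4) concrete, here is a small self-contained sketch that reproduces them against a naive implementation. SHA256, the all-zero `DUMMY`, and the helper names are assumptions chosen for the demonstration.

```python
import hashlib

DUMMY = bytes(32)  # assumed dummy element

def compress(x, y):
    return hashlib.sha256(x + y).digest()

def next_layer(layer):
    """One step of the naive procedure: pad if odd, then compress pairs."""
    if len(layer) % 2 == 1:
        layer = layer + [DUMMY]
    return [compress(layer[i], layer[i + 1]) for i in range(0, len(layer), 2)]

def naive_root(leaves):
    layer = list(leaves)
    while len(layer) > 1:
        layer = next_layer(layer)
    return layer[0]

# Five arbitrary 32-byte leaves.
xs = [hashlib.sha256(bytes([i])).digest() for i in range(5)]

# Attack 1: an inner layer, fed in as the input, gives the same root.
collision_1 = naive_root(next_layer(xs)) == naive_root(xs)
# Attack 2: appending the dummy explicitly to an odd-length input.
collision_2 = naive_root(xs + [DUMMY]) == naive_root(xs)
# Attack 4: the root of a singleton is the element itself.
collision_4 = naive_root([xs[0]]) == xs[0]
```

All three flags evaluate to `True`, since in each case the two computations go through exactly the same compression calls.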

Domain separation

It's a good practice in general to ensure that different constructions using the same underlying hash functions will never produce the same output. This is called "domain separation", and it's a bit similar to multihash; however instead of adding extra bits of information to a hash (and thus increasing its size), we just compress the extra information into the hash itself. A simple example would be using H(dom|H(...)) instead of H(...). The below solutions can be interpreted as an application of this idea, where we want to separate the different lengths n.
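A minimal sketch of the H(dom|H(...)) pattern, assuming SHA256 as H and arbitrary byte strings as domain tags (the tag values in the usage note are hypothetical):

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def H_dom(dom: bytes, data: bytes) -> bytes:
    """Domain-separated hash H(dom|H(data)).

    The inner hash always has 32 bytes, so the domain tag is everything
    before the last 32 bytes of the outer input and cannot be confused
    with the data.
    """
    return H(dom + H(data))
```

For example, `H_dom(b"merkle-leaf", m)` and `H_dom(b"merkle-node", m)` differ for the same `m`, and the output is still a single 32-byte hash.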

Possible solutions (for the tree attacks)

While the third problem (deserialize may not be injective) is similar to the second problem, let's first deal with the tree problems, and come back to deserialize later (see below).

Solution 0b. Pre-hash each input element. This solves 2) and 4) (if we choose dummy to be something for which we don't expect anybody to find a preimage), but does not solve 1); it also doubles the computation time.

Solution 1. Just prepend the data with the length n of the input sequence. Note that any cryptographic hash function needs an output size of at least 160 bits (and usually at least 256 bits), so we can always embed the length (surely less than 2^64) into T. This solves both problems 1) and 2) (the height of the tree is a deterministic function of the length), and 4) too. However, a typical application of a Merkle tree is the case where the length of the input n=2^d is a power of two; in this case it looks a little bit "inelegant" to increase the size to n=2^d+1, though the overhead with above even-odd construction is only log2(n). An advantage is that you can prove the size of the input with a standard Merkle inclusion proof. Alternative version: append instead of prepend; then the indexing of the leaves does not change.

Solution 2. Apply an extra compression step at the very end including the length n, calculating newRoot = compress(n,origRoot). This again solves all three tree problems 1), 2) and 4). However, it makes the code a bit less regular; and you have to submit the length as part of Merkle proofs.
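As a rough sketch of Solution 2, assuming SHA256, an all-zero `DUMMY`, and a 32-byte big-endian embedding of n into T (all three are choices made for this example only):

```python
import hashlib

DUMMY = bytes(32)  # assumed dummy element

def compress(x, y):
    return hashlib.sha256(x + y).digest()

def next_layer(layer):
    if len(layer) % 2 == 1:
        layer = layer + [DUMMY]
    return [compress(layer[i], layer[i + 1]) for i in range(0, len(layer), 2)]

def orig_root(leaves):
    """The naive root, without any length information."""
    layer = list(leaves)
    while len(layer) > 1:
        layer = next_layer(layer)
    return layer[0]

def new_root(leaves):
    """newRoot = compress(n, origRoot), with n embedded into 32 bytes."""
    n = len(leaves).to_bytes(32, "big")  # assumed encoding of the length
    return compress(n, orig_root(leaves))
```

With this, an odd-length input and its explicitly padded version have the same `orig_root` but different `new_root`, because the two lengths differ.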

Solution 3a. Use two different compression functions, one for the bottom layer (by bottom I mean the closest to the input) and another for all the other layers. For example you can use compress(x,y) := H(isBottomLayer|x|y). This solves problem 1).

Solution 3b. Use two different compression functions, one for the even nodes, and another for the odd nodes (that is, those with a single child instead of two). Similarly to the previous case, you can use for example compress(x,y) := H(isOddNode|x|y) (note that for the odd nodes, we will have y=dummy). This solves problem 2). Remark: The extra bits of information (odd/even) added to the last nodes (one in each layer) are exactly the binary expansion of the length n. A disadvantage is that for verifying a Merkle proof, we need to know for each node whether it's the last or not, so we need to include the length n in any Merkle proof here too.

Solution 3. Combining 3a and 3b, we can solve both problems 1) and 2); so here we add two bits of information to each node (that is, we need 4 different compression functions). 4) can always be solved by adding a final compression call.
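One possible way to sketch the combined Solution 3 is shown below. The assumptions are: SHA256 as H, the two flag bits packed into a single prefix byte, an all-zero `DUMMY`, and the singleton case handled by always applying at least one layer (which is just one reading of the "final compression call").

```python
import hashlib

DUMMY = bytes(32)  # assumed dummy element

def compress_keyed(is_bottom, is_odd, x, y):
    """H(isBottomLayer|isOddNode|x|y), flags packed into one prefix byte."""
    key = bytes([2 * int(is_bottom) + int(is_odd)])
    return hashlib.sha256(key + x + y).digest()

def merkle_root_keyed(leaves):
    if not leaves:
        raise ValueError("empty input is not handled in this sketch")
    layer = list(leaves)
    is_bottom = True
    # Always process the bottom layer once, even for a single leaf.
    while is_bottom or len(layer) > 1:
        nxt = []
        for i in range(0, len(layer), 2):
            if i + 1 < len(layer):
                nxt.append(compress_keyed(is_bottom, False, layer[i], layer[i + 1]))
            else:
                # last node of an odd-length layer: single child plus dummy
                nxt.append(compress_keyed(is_bottom, True, layer[i], DUMMY))
        layer = nxt
        is_bottom = False
    return layer[0]
```

Here the odd-length input and its explicitly padded version differ already in the last node of the bottom layer, since one uses the odd key and the other the even key.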

Solution 4a. Replace each input element x_i with compress(i,x_i). This solves both problems again (and 4) too), but doubles the amount of computation.

Solution 4b. Only in the bottom layer, use H(1|isOddNode|i|x_{2i}|x_{2i+1}) for compression (note that for the odd node we have x_{2i+1}=dummy). This is similar to the previous solution, but does not increase the amount of computation.

Solution 4c. Only in the bottom layer, use H(i|j|x_i|x_j) for even nodes (with i=2*k and j=2*k+1), and H(i|0|x_i|0) for the odd node (or alternatively we could also use H(i|i|x_i|x_i) for the odd node). Note: when verifying a Merkle proof, you still need to know whether the element you prove is the last and odd element, or not. However instead of submitting the length, you can encode this into a single bit (not sure if that's much better though).
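The bottom layer of Solution 4c could look roughly like this. The sketch assumes SHA256 as H, indices encoded as 8-byte big-endian integers, and the 0 in the odd case encoded as 32 zero bytes; the names `index_bytes` and `bottom_layer_4c` are made up for the example. The upper layers are not shown.

```python
import hashlib

ZERO = bytes(32)  # assumed encoding of the "0" element in H(i|0|x_i|0)

def index_bytes(i: int) -> bytes:
    return i.to_bytes(8, "big")  # assumed fixed-width index encoding

def bottom_layer_4c(leaves):
    """First layer above the leaves, with leaf indices bound into the hash."""
    out = []
    for i in range(0, len(leaves), 2):
        if i + 1 < len(leaves):
            j = i + 1
            # even node: H(i|j|x_i|x_j) with i=2*k and j=2*k+1
            data = index_bytes(i) + index_bytes(j) + leaves[i] + leaves[j]
        else:
            # odd node: H(i|0|x_i|0); j=0 never occurs for an even node
            data = index_bytes(i) + index_bytes(0) + leaves[i] + ZERO
        out.append(hashlib.sha256(data).digest())
    return out
```

Since an even node always has j >= 1, the odd-node input with j = 0 cannot coincide with any even-node input, which is what separates a length 2*k+1 input from its padded length 2*k+2 version.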

Solution 5 (??). Use a different tree shape, where the left subtree is always a complete (full) binary tree with 2^floor(log2(n-1)) leaves, and the right subtree is constructed recursively. Then the shape of the tree encodes the number of inputs n. This however complicates the Merkle proofs (they won't have uniform size). TODO: think more about this!
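To get a feeling for this tree shape, the following sketch computes the split point, the root, and the proof length for each leaf. It only illustrates the shape: it assumes SHA256 with plain concatenation as the compression function and does not by itself address problems 1) and 4). All names are hypothetical.

```python
import hashlib

def compress(x, y):
    return hashlib.sha256(x + y).digest()

def left_size(n: int) -> int:
    """2^floor(log2(n-1)): the largest power of two strictly below n (n >= 2)."""
    if n < 2:
        raise ValueError("need at least two leaves to split")
    return 1 << ((n - 1).bit_length() - 1)

def root_unbalanced(leaves):
    n = len(leaves)
    if n == 1:
        return leaves[0]
    k = left_size(n)
    # left subtree: complete tree on k leaves; right subtree: recursive
    return compress(root_unbalanced(leaves[:k]), root_unbalanced(leaves[k:]))

def proof_length(n: int, index: int) -> int:
    """Number of sibling hashes in an inclusion proof for leaf `index`."""
    if n == 1:
        return 0
    k = left_size(n)
    if index < k:
        return 1 + proof_length(k, index)
    return 1 + proof_length(n - k, index - k)
```

For n=5 the proof lengths are 3 for the first four leaves and 1 for the last one, which shows the non-uniform proof sizes mentioned above.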

Keyed compression functions

How can we have many different compression functions? Consider three case studies:

Poseidon. The Poseidon family of hashes is built on a (fixed) permutation pi : F^t -> F^t, where F is a (large) finite field. For simplicity consider the case t=3. The standard compression function is then defined as:

compress(x,y) := let (u,_,_) = pi(x,y,0) in u

That is, we take the triple (x,y,0), apply the permutation to get another triple (u,v,w), and extract the field element u (we could use v or w too, it shouldn't matter). Now we can see that it is in fact very easy to generalize this to a keyed (or indexed) compression function:

compress_k(x,y) := let (u,_,_) = pi(x,y,k) in u

where k is the key. Note that there is no overhead in doing this. And since F is pretty big (in our case, about 253 bits), there is plenty of information we can encode in the key k.

Note: We probably lose a few bits of security here, if somebody looks for a preimage among all keys; however in our constructions the keys have a fixed structure, so it's probably not that dangerous. If we want to be extra safe, we could use t=4 and pi(x,y,k,0) instead (but that has some computation overhead).

SHA256. When using SHA256 as our hash function, normally the compression function is defined as compress(x,y) := SHA256(x|y), that is, concatenate the (bitstring representation of the) two elements, and apply SHA256 to the resulting (bit)string. Normally x and y are both 256 bits long, and so is the result. If we look into the details of how SHA256 is specified, this is actually wasteful. That's because while SHA256 processes the input in 512 bit chunks, it also prescribes a mandatory nonempty padding. So when calling SHA256 on an input of size 512 bit (64 bytes), it will actually process two chunks, the second chunk consisting purely of padding. When constructing a binary Merkle tree using a compression function like before, the input is always of the same size, so this padding is unnecessary; nevertheless, people usually prefer to follow the standardized SHA256 call. But, if we are processing 1024 bits anyway, we have a lot of free space to include our key k! In fact we can add up to 512-64-1=447 bits of additional information; so for example

compress_k(x,y) := SHA256(k|x|y)

works perfectly well with no overhead compared to SHA256(x|y).
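The block counting behind this argument can be checked with a few lines. The formula follows the SHA256 padding rule (one mandatory 1 bit, zero fill, and a 64-bit length field); the fixed 8-byte key width in `compress_k` is an assumption for the example, chosen well below the 447-bit limit.

```python
import hashlib

def sha256_blocks(message_bits: int) -> int:
    """Number of 512-bit blocks SHA256 processes for a message of this size."""
    # message + one mandatory 1 bit + 64-bit length field, rounded up to 512
    return (message_bits + 1 + 64 + 511) // 512

KEY_BYTES = 8  # assumed fixed key width (64 bits, well below 447)

def compress_k(k: int, x: bytes, y: bytes) -> bytes:
    """compress_k(x,y) := SHA256(k|x|y) for 32-byte x and y."""
    if len(x) != 32 or len(y) != 32:
        raise ValueError("x and y must be 32 bytes each")
    return hashlib.sha256(k.to_bytes(KEY_BYTES, "big") + x + y).digest()
```

`sha256_blocks(512)` and `sha256_blocks(512 + 447)` are both 2, while `sha256_blocks(512 + 448)` is 3, matching the 447-bit bound. Keeping the key at a fixed width avoids ambiguity about where k ends and x begins.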

MiMC. MiMC is another arithmetic construction, however in this case the starting point is a block cipher, that is, we start with a keyed permutation! Unfortunately MiMC-p/p is a (keyed) permutation of F, which is not very useful for us; however in Feistel mode we get a keyed permutation of F^2, and we can just take the first component of the output of that as the compressed output.

Tree padding proposal

It seems to me that, whatever way we try to solve problem 2) without pre-hashing, we need to include the length (or at least one bit of information about the length) in the Merkle proofs. So maybe we should just live with that.

Then from the above choices, right now maybe solution 4c, or some variation of it looks the nicest to me.

Making deserialize injective

Consider the following simple algorithm to deserialize a sequence of bytes into chunks of 31 bytes:

  • pad the input with at most 30 zero bytes such that the padded length becomes divisible by 31
  • split the padded sequence into ceil(n/31) chunks, each 31 bytes.

The problem with this is that, for example, 0x123456, 0x12345600 and 0x1234560000 all result in the same output.
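This collision is easy to reproduce. A minimal sketch of the zero-padding algorithm (the function name is made up for the example):

```python
def deserialize_zero_pad(data: bytes, chunk: int = 31):
    """Pad with zero bytes to a multiple of `chunk`, then split into chunks."""
    pad = (-len(data)) % chunk  # at most chunk-1 zero bytes
    padded = data + bytes(pad)
    return [padded[i:i + chunk] for i in range(0, len(padded), chunk)]

a = deserialize_zero_pad(bytes.fromhex("123456"))
b = deserialize_zero_pad(bytes.fromhex("12345600"))
c = deserialize_zero_pad(bytes.fromhex("1234560000"))
same_output = (a == b == c)
```

Here `same_output` is `True`: all three inputs become the single chunk 0x123456 followed by 28 zero bytes.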

Some possible solutions:

  • prepend the length (number of input bytes) to the input, say as a 64-bit little-endian integer (8 bytes), before padding as above
  • or append the length instead of prepending, then pad
  • or first pad with zero bytes, but leave 8 bytes for the length (so that when we finally append the length, the result will be divisible by 31).
  • use the following padding strategy: always add a single 0x01 byte, then enough 0x00 bytes so that the length is divisible by 31. Why does this work? Well, consider an already padded sequence. Count the number of zero bytes from the end: you get a number 0 <= m < 31. This number determines the residue class of the original length n modulo 31; then this class, together with the padded length fully determines the original length.
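The last strategy (a single 0x01 byte followed by zero bytes) can be sketched together with its inverse, which is what shows injectivity. The function names are assumptions for the example.

```python
CHUNK = 31

def pad_10(data: bytes) -> bytes:
    """Append 0x01, then zero bytes until the length is divisible by 31."""
    padded = data + b"\x01"
    return padded + bytes((-len(padded)) % CHUNK)

def deserialize(data: bytes):
    padded = pad_10(data)
    return [padded[i:i + CHUNK] for i in range(0, len(padded), CHUNK)]

def unpad_10(padded: bytes) -> bytes:
    """Recover the original bytes from a padded sequence."""
    stripped = padded.rstrip(b"\x00")  # drop the trailing padding zeros
    if len(padded) % CHUNK != 0 or not stripped.endswith(b"\x01"):
        raise ValueError("not a validly padded sequence")
    return stripped[:-1]  # drop the 0x01 marker
```

Because the marker byte is non-zero, stripping trailing zeros stops exactly at the marker even when the original data itself ends in zero bytes, so `unpad_10(pad_10(d)) == d` for every input `d`.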