Merkle tree doc: fix some typos and other mistakes

This commit is contained in:
Balazs Komuves 2023-12-20 23:23:58 +01:00
parent 796b4937c8
commit 570666d110
GPG Key ID: F63B7AEF18435562


@@ -26,7 +26,7 @@ The hash function `H` can also have different types `S` of inputs. For example:
- A naive Merkle tree implementation could for example accept only a power-of-two
sized sequence of `T`

Notation: Let's denote a sequence of `T`-s by `[T]`.

### Merkle tree API
@@ -65,7 +65,7 @@ The compression function could be implemented in several ways:
When implemented without enough care (like the above naive algorithm), there are several
possible attacks producing hash collisions or second preimages:

1. The root of any particular layer is the same as the root of the input
2. The root of `[x_0,x_1,...,x_(2*k)]` (length is `n=2*k+1`) is the same as the root of
`[x_0,x_1,...,x_(2*k),dummy]` (length is `n=2*k+2`)
3. when using bytes as the input, `deserialize` itself can already have similar collision attacks
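To make attacks 1 and 2 concrete, here is a minimal sketch of such a naive implementation; SHA-256 standing in for the generic hash `H` and an all-zero `dummy` leaf are illustrative assumptions, not choices made by this document.

```python
import hashlib

def H(data: bytes) -> bytes:
    # stand-in for the generic hash function `H`
    return hashlib.sha256(data).digest()

def naive_root(leaves: list[bytes]) -> bytes:
    # naive Merkle root: pad odd layers with a dummy leaf, no domain separation
    dummy = b"\x00" * 32
    layer = list(leaves)
    while len(layer) > 1:
        if len(layer) % 2 == 1:
            layer.append(dummy)
        layer = [H(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]

xs = [H(bytes([i])) for i in range(3)]
dummy = b"\x00" * 32

# attack 2: length n=2*k+1 collides with the dummy-extended length n=2*k+2
assert naive_root(xs) == naive_root(xs + [dummy])

# attack 1: the root of an inner layer equals the root of the original input
layer1 = [H(xs[0] + xs[1]), H(xs[2] + dummy)]
assert naive_root(layer1) == naive_root(xs)
```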
@@ -76,10 +76,11 @@ Traditional (linear) hash functions usually solve the analogous problems by clev
### Domain separation

It's a good practice in general to ensure that different constructions using the same
underlying hash function will never (or at least with very high probability not) produce the same output.
This is called "domain separation", and it can very loosely remind one of _multihash_; however,
instead of adding extra bits of information to a hash (and thus increasing its size), we just
compress the extra information into the hash itself. So the information itself is lost,
however collisions between different domains are prevented.
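A minimal sketch of compressing a domain tag into the hash itself; the one-byte tags and SHA-256 as the underlying hash are illustrative assumptions:

```python
import hashlib

def H(data: bytes) -> bytes:
    # stand-in for the underlying hash function
    return hashlib.sha256(data).digest()

DOM_LEAF = b"\x00"   # hypothetical tag for one construction
DOM_NODE = b"\x01"   # hypothetical tag for another

def leaf_hash(data: bytes) -> bytes:
    # H(dom|H(...)): the tag is absorbed into the output, not appended to it
    return H(DOM_LEAF + H(data))

def node_hash(data: bytes) -> bytes:
    return H(DOM_NODE + H(data))

# the two constructions can no longer agree on the same input
assert leaf_hash(b"payload") != node_hash(b"payload")
```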
A simple example would be using `H(dom|H(...))` instead of `H(...)`. The below solutions
can be interpreted as an application of this idea, where we want to separate the different
@@ -90,9 +91,9 @@ lengths `n`.
While the third problem (`deserialize` may not be injective) is similar to the second problem,
let's deal first with the tree problems, and come back to `deserialize` (see below) later.

**Solution 0.** Pre-hash each input element. This solves 1), 2) and also 4) (at least
if we choose `dummy` to be something for which we don't expect anybody to find a preimage),
but it doubles the computation time.

**Solution 1.** Just prepend the data with the length `n` of the input sequence. Note that any
cryptographic hash function needs an output size of at least 160 bits (and usually at least
@@ -103,17 +104,20 @@ However, a typical application of a Merkle tree is the case where the length of
`n=2^d` is a power of two; in this case it looks a little bit "inelegant" to increase the size
to `n=2^d+1`, though the overhead with the above even-odd construction is only `log2(n)`.
An advantage is that you can _prove_ the size of the input with a standard Merkle inclusion proof.

Alternative version: append the length instead of prepending it; then the indexing of the leaves does not change.
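One possible reading of Solution 1, sketched below; SHA-256 standing in for the compression function and the 8-byte little-endian length encoding are illustrative assumptions:

```python
import hashlib

def compress(x: bytes, y: bytes) -> bytes:
    # stand-in compression function
    return hashlib.sha256(x + y).digest()

def root_with_length(leaves: list[bytes]) -> bytes:
    # Solution 1 sketch: prepend the length n as an extra leaf,
    # then build the tree naively (dummy-padding odd layers)
    dummy = b"\x00" * 32
    layer = [len(leaves).to_bytes(8, "little")] + list(leaves)
    while len(layer) > 1:
        if len(layer) % 2 == 1:
            layer.append(dummy)
        layer = [compress(layer[i], layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]

# the length leaf now distinguishes n=2*k+1 from the dummy-extended n=2*k+2
xs = [hashlib.sha256(bytes([i])).digest() for i in range(3)]
assert root_with_length(xs) != root_with_length(xs + [b"\x00" * 32])
```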
**Solution 2.** Apply an extra compression step at the very end including the length `n`,
calculating `newRoot = compress(n,origRoot)`. This again solves all 3 problems. However, it
makes the code a bit less regular; and you have to submit the length as part of Merkle proofs
(but it seems hard to avoid that anyway).
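A sketch of Solution 2's final step; SHA-256 as the compression function and the 32-byte length encoding are illustrative assumptions:

```python
import hashlib

def compress(x: bytes, y: bytes) -> bytes:
    # stand-in compression function
    return hashlib.sha256(x + y).digest()

def finalize(n: int, orig_root: bytes) -> bytes:
    # Solution 2 sketch: one extra compression mixing the length n into the root
    return compress(n.to_bytes(32, "little"), orig_root)

# two inputs whose naive roots collide are now separated by their lengths
orig_root = b"\x11" * 32  # placeholder for a naive Merkle root
assert finalize(3, orig_root) != finalize(4, orig_root)
```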
**Solution 3a.** Use two different compression functions, one for the bottom layer (by bottom
I mean the one next to the input, which is the same as the widest one) and another for all
the other layers. For example you can use `compress(x,y) := H(isBottomLayer|x|y)`.
This solves problem 1).
**Solution 3b.** Use two different compression functions, one for the even nodes, and another
for the odd nodes (that is, those with a single child instead of two). Similarly to the
previous case, you can use for example `compress(x,y) := H(isOddNode|x|y)` (note that for
the odd nodes, we will have `y=dummy`). This solves problem 2). Remark: The extra bits of
@@ -130,7 +134,7 @@ two bits of information to each node (that is, we need 4 different compression f
both problems again (and 4) too), but doubles the amount of computation.

**Solution 4b.** Only in the bottom layer, use `H(1|isOddNode|i|x_{2i}|x_{2i+1})` for
compression (note that for the odd node we have `x_{2i+1}=dummy`). This is similar to
the previous solution, but does not increase the amount of computation.
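The flag-keyed compression of Solutions 3a and 3b can be combined into one sketch; the one-byte flag encoding and SHA-256 standing in for `H` are illustrative assumptions:

```python
import hashlib

def H(data: bytes) -> bytes:
    # stand-in for the underlying hash function
    return hashlib.sha256(data).digest()

dummy = b"\x00" * 32

def compress(is_bottom: bool, is_odd: bool, x: bytes, y: bytes) -> bytes:
    # key the compression with the isBottomLayer and isOddNode flags
    return H(bytes([is_bottom, is_odd]) + x + y)

def root(leaves: list[bytes]) -> bytes:
    layer = list(leaves)
    bottom = True
    while len(layer) > 1:
        nxt = []
        for i in range(0, len(layer), 2):
            odd = (i + 1 == len(layer))
            y = dummy if odd else layer[i + 1]
            nxt.append(compress(bottom, odd, layer[i], y))
        layer = nxt
        bottom = False
    return layer[0]

xs = [H(bytes([i])) for i in range(3)]
# problem 2 fixed: odd length no longer collides with dummy-extended even length
assert root(xs) != root(xs + [dummy])
# problem 1 fixed: feeding an inner layer back in gives a different root
layer1 = [compress(True, False, xs[0], xs[1]), compress(True, True, xs[2], dummy)]
assert root(layer1) != root(xs)
```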
**Solution 4c.** Only in the bottom layer, use `H(i|j|x_i|x_j)` for even nodes
@@ -210,7 +214,7 @@ all results in the same output.
#### About padding in general

Let's take a step back, and meditate a little bit about the meaning of padding.

What is padding? It's a mapping from a set of sequences into a subset. In our case
we have an arbitrary sequence of bytes, and we want to map it into the subset of sequences
@@ -241,8 +245,8 @@ sequences, which always results in a byte sequence whose length is divisible by
- or append the length instead of prepending, then pad (note: appending is streaming-friendly; prepending is not)
- or first pad with zero bytes, but leave 8 bytes for the length (so that when we finally append
the length, the result will be divisible by 31). This is _almost_ exactly what SHA2 does.
- use the following padding strategy: _always_ add a single `0x01` byte, then enough `0x00` bytes (possibly none)
so that the length is divisible by 31. This is usually called the `10*` padding strategy, abusing regexp notation.

Why does this work? Well, consider an already padded sequence. It's very easy to recover the
original byte sequence by 1) first removing all trailing zeros; and 2) after that, removing the single
trailing `0x01` byte. This proves that the padding is an injective function.
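The `10*` padding and its inverse can be sketched directly; round-tripping a few inputs (including ones that end in zero bytes) demonstrates the injectivity argument above:

```python
def pad(data: bytes, block: int = 31) -> bytes:
    # `10*` padding: always append a single 0x01 byte, then enough 0x00
    # bytes (possibly none) to make the length divisible by `block`
    out = data + b"\x01"
    return out + b"\x00" * (-len(out) % block)

def unpad(padded: bytes) -> bytes:
    # recover the original: drop trailing zeros, then the single 0x01 byte
    out = padded.rstrip(b"\x00")
    assert out.endswith(b"\x01"), "not a validly padded sequence"
    return out[:-1]

# round-trip check, including inputs ending in 0x00
for msg in [b"", b"abc", b"x" * 31, b"tail\x00\x00"]:
    assert len(pad(msg)) % 31 == 0
    assert unpad(pad(msg)) == msg
```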
@@ -265,10 +269,10 @@ We decided to implement the following version.
so that the resulting sequence has length divisible by 31
- when converting an (already padded) byte sequence to a sequence of field elements,
split it up into 31 byte chunks, interpret those as little-endian 248-bit unsigned
integers, and finally interpret those integers as field elements in the BN254 scalar
prime field (using the standard mapping `Z -> Z/r`).
- when using the Poseidon2 sponge construction to compute a linear hash out of
a sequence of field elements, we use the BN254 scalar field, `t=3` and `(0,0,domsep)`
as the initial state, where `domsep := 2^64 + 256*t + rate` is the domain separation
IV. Note that because `t=3`, we can only have `rate=1` or `rate=2`. We need
a padding strategy here too (since the input length must be divisible by `rate`):
@@ -286,4 +290,4 @@ We decided to implement the following version.
- we will use the same strategy when constructing binary Merkle trees with the
SHA256 hash; in that case, the compression function will be `SHA256(x|y|key)`.
Note: since SHA256 already uses padding internally, adding the key does not
result in any overhead.
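A sketch of such a keyed SHA256 compression function; the concrete one-byte keys are illustrative assumptions. With two 32-byte children, SHA256's internal padding already forces two 64-byte blocks, so the extra key byte costs no additional compression call:

```python
import hashlib

def compress(x: bytes, y: bytes, key: bytes) -> bytes:
    # keyed compression for a binary Merkle tree: SHA256(x|y|key)
    return hashlib.sha256(x + y + key).digest()

# hypothetical one-byte keys separating node kinds (illustrative only)
KEY_EVEN = b"\x00"
KEY_ODD = b"\x01"

x, y = b"\xaa" * 32, b"\xbb" * 32
# different keys give domain-separated outputs on the same children
assert compress(x, y, KEY_EVEN) != compress(x, y, KEY_ODD)
```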