diff --git a/design/Merkle.md b/design/Merkle.md
new file mode 100644
index 0000000..89e91f6
--- /dev/null
+++ b/design/Merkle.md
@@ -0,0 +1,226 @@
+
+Merkle tree API proposal (WIP draft)
+------------------------------------
+
+Let's collect the possible problems and solutions with constructing Merkle trees.
+
+### Vocabulary
+
+A Merkle tree, built on a hash function `H`, produces a Merkle root of type `T`. 
+This is usually the same type as the output of the hash function. Some examples:
+
+- SHA1: `T` is 160 bits
+- SHA256: `T` is 256 bits
+- Poseidon: `T` is one (or a few) finite field element(s)
+
+The hash function `H` can also have different types `S` of inputs. For example:
+
+- SHA1 / SHA256 / SHA3: `S` is an arbitrary sequence of bits
+- some less-conforming implementations of these could take a sequence of bytes instead
+- Poseidon: `S` is a sequence of finite field elements
+- Poseidon compression function: at most `t-1` field elements (in our case `t=3`, so 
+  that's two field elements)
+- A naive Merkle tree implementation could for example accept only a power-of-two 
+  sized sequence of `T`
+
+Notation: Let's denote a sequence of `T` by `[T]`.
+
+### Merkle tree API
+
+We usually need at least two types of Merkle tree APIs:
+
+- one which takes a sequence `S = [T]` of length `n` as input, and produces an 
+  output (Merkle root) of type `T`
+- and one which takes a sequence of bytes (or even bits, but in practice we probably 
+  only need bytes): `S = [byte]`
+
+We can decompose the latter into the composition of a function 
+`deserialize : [byte] -> [T]` and the former.
+
+### Naive Merkle tree implementation
+
+A straightforward implementation of a binary Merkle tree `merkleRoot : [T] -> T` 
+could be for example:
+
+- if the input has length 1, it's the root
+- if the input has even length `2*k`, group it into pairs, apply a 
+  `compress : (T,T) -> T` compression function, producing the next layer of size `k`
+- if the input has odd length `2*k+1`, pad it with an extra element `dummy` of 
+  type `T`, then apply the procedure for even length, producing the next layer of size `k+1`
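+
+The three rules above can be sketched as follows. This is only an illustration, assuming
+SHA256 as `H`, `compress(x,y) := H(x|y)` and an all-zero `dummy`; the names `DUMMY`,
+`compress` and `naive_merkle_root` are made up for this example.
+
+```python
+import hashlib
+
+DUMMY = bytes(32)  # assumed padding element; the text leaves `dummy` unspecified
+
+def compress(x: bytes, y: bytes) -> bytes:
+    """compress(x,y) := H(x|y), with H = SHA256 (an assumption of this sketch)."""
+    return hashlib.sha256(x + y).digest()
+
+def naive_merkle_root(leaves: list[bytes]) -> bytes:
+    """Naive binary Merkle root, following the three rules literally."""
+    if not leaves:
+        raise ValueError("empty input is not covered by the rules above")
+    layer = list(leaves)
+    while len(layer) > 1:
+        if len(layer) % 2 == 1:
+            layer.append(DUMMY)  # odd length: pad with the dummy element
+        # even length: compress neighbouring pairs into the next layer
+        layer = [compress(layer[i], layer[i + 1]) for i in range(0, len(layer), 2)]
+    return layer[0]
+```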
+
+The compression function could be implemented in several ways:
+
+- when `S` and `T` are just sequences of bits or bytes (as in the case of classical hash
+  functions like SHA256), we can just concatenate the two leaves of the node and apply the
+  hash: `compress(x,y) := H(x|y)`
+- in case of hash functions based on the sponge construction (like Poseidon or Keccak/SHA3), 
+  we can just fill the "capacity part" of the state with a constant (say 0), the "absorbing 
+  part" of the state with the two inputs, apply the permutation, and extract a single `T`.
+
+### Attacks
+
+When implemented without enough care (like the above naive algorithm), there are several 
+possible attacks producing hash collisions or second preimages:
+
+1. The root of any particular layer is the same as the root of the input
+2. The root of `[x_0,x_1,...,x_(2*k)]` (length `n=2*k+1`) is the same as the root of 
+   `[x_0,x_1,...,x_(2*k),dummy]` (length `n=2*k+2`)
+3. When using bytes as the input, `deserialize` itself can already have similar collision attacks
+4. The root of a singleton sequence is itself
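+
+A small sketch reproducing attacks 1, 2 and 4 against a naive tree. It assumes SHA256 with
+`compress(x,y) := H(x|y)` and an all-zero `dummy`; all names are illustrative only.
+
+```python
+import hashlib
+
+DUMMY = bytes(32)  # assumed value of `dummy`
+
+def compress(x, y):
+    return hashlib.sha256(x + y).digest()
+
+def next_layer(layer):
+    """One step of the naive construction: pad if odd, then compress pairs."""
+    if len(layer) % 2 == 1:
+        layer = layer + [DUMMY]
+    return [compress(layer[i], layer[i + 1]) for i in range(0, len(layer), 2)]
+
+def naive_root(layer):
+    while len(layer) > 1:
+        layer = next_layer(layer)
+    return layer[0]
+
+xs = [bytes([i]) * 32 for i in range(1, 6)]  # five distinct 32-byte leaves
+
+attack1 = naive_root(next_layer(xs)) == naive_root(xs)  # a layer has the same root
+attack2 = naive_root(xs + [DUMMY]) == naive_root(xs)    # n=5 versus n=6
+attack4 = naive_root([xs[0]]) == xs[0]                  # a singleton is its own root
+```
+
+All three comparisons hold, which is exactly the problem.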
+
+Traditional (linear) hash functions usually solve the analogous problems by clever padding.
+
+### Domain separation
+
+It's a good practice in general to ensure that different constructions using the same 
+underlying hash functions will never produce the same output. This is called "domain separation",
+and it's a bit similar to _multihash_; however instead of adding extra bits of information to a hash
+(and thus increasing its size), we just compress the extra information into the hash itself.
+A simple example would be using `H(dom|H(...))` instead of `H(...)`. The solutions below
+can be interpreted as an application of this idea, where we want to separate the different
+lengths `n`.
+
+### Possible solutions (for the tree attacks)
+
+While the third problem (`deserialize` may not be injective) is similar to the second problem,
+let's deal first with the tree problems, and come back to `deserialize` (see below) later.
+
+**Solution 0b.** Pre-hash each input element. This solves 2) and 4) (if we choose `dummy` to be
+something for which we don't expect anybody to find a preimage), but does not solve 1); also it
+doubles the computation time.
+
+**Solution 1.** Just prepend the data with the length `n` of the input sequence. Note that any
+cryptographic hash function needs an output size of at least 160 bits (and usually at least 
+256 bits), so we can always embed the length (surely less than `2^64`) into `T`. This solves
+both problems 1) and 2) (the height of the tree is a deterministic function of the length),
+and 4) too.
+However, a typical application of a Merkle tree is the case where the length of the input
+`n=2^d` is a power of two; in this case it looks a little bit "inelegant" to increase the size
+to `n=2^d+1`, though the overhead with the above even-odd construction is only about `log2(n)` extra compression calls.
+An advantage is that you can _prove_ the size of the input with a standard Merkle inclusion proof.
+Alternative version: append instead of prepend; then the indexing of the leaves does not change.
+
+**Solution 2.** Apply an extra compression step at the very end including the length `n`, 
+calculating `newRoot = compress(n,origRoot)`. This again solves problems 1), 2) and 4). However, it 
+makes the code a bit less regular; and you have to submit the length as part of Merkle proofs.
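+
+A minimal sketch of this idea, assuming SHA256, `compress(x,y) := H(x|y)`, an all-zero
+`dummy`, and `n` encoded as a 32-byte big-endian integer so that it fits into `T`
+(the encoding and the function names are choices made for this example only):
+
+```python
+import hashlib
+
+DUMMY = bytes(32)  # assumed value of `dummy`
+
+def compress(x, y):
+    return hashlib.sha256(x + y).digest()
+
+def naive_root(leaves):
+    """Plain even/odd tree without any length information."""
+    layer = list(leaves)
+    while len(layer) > 1:
+        if len(layer) % 2 == 1:
+            layer.append(DUMMY)
+        layer = [compress(layer[i], layer[i + 1]) for i in range(0, len(layer), 2)]
+    return layer[0]
+
+def root_with_length(leaves):
+    """newRoot = compress(n, origRoot), with n embedded into a 32-byte value."""
+    n = len(leaves).to_bytes(32, "big")
+    return compress(n, naive_root(leaves))
+```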
+
+**Solution 3a.** Use two different compression functions, one for the bottom layer (by bottom
+I mean the closest to the input) and another for all the other layers. For example you can 
+use `compress(x,y) := H(isBottomLayer|x|y)`. This solves problem 1).
+
+**Solution 3b.** Use two different compression functions, one for the even nodes, and another
+for the odd nodes (that is, those with a single child instead of two). Similarly to the 
+previous case, you can use for example `compress(x,y) := H(isOddNode|x|y)` (note that for 
+the odd nodes, we will have `y=dummy`). This solves problem 2). Remark: The extra bits of 
+information (odd/even) added to the last nodes (one in each layer) are exactly the binary 
+expansion of the length `n`. A disadvantage is that for verifying a Merkle proof, we need to 
+know for each node whether it's the last or not, so we need to include the length `n` into 
+any Merkle proof here too.
+
+**Solution 3.** Combining **3a** and **3b**, we can solve both problems 1) and 2); so here we add
+two bits of information to each node (that is, we need 4 different compression functions).
+Problem 4) can always be solved by adding a final compression call.
+
+**Solution 4a.** Replace each input element `x_i` with `compress(i,x_i)`. This solves
+both problems again (and 4) too), but doubles the amount of computation.
+
+**Solution 4b.** Only in the bottom layer, use `H(1|isOddNode|i|x_{2i}|x_{2i+1})` for 
+compression (note that for the odd node we have `x_{2i+1}=dummy`). This is similar to 
+the previous solution, but does not increase the amount of computation.
+
+**Solution 4c.** Only in the bottom layer, use `H(i|j|x_i|x_j)` for even nodes
+(with `i=2*k` and `j=2*k+1`), and `H(i|0|x_i|0)` for the odd node (or alternatively
+we could also use `H(i|i|x_i|x_i)` for the odd node). Note: when verifying
+a Merkle proof, you still need to know whether the element you prove is the last _and_
+odd element, or not. However instead of submitting the length, you can encode this
+into a single bit (not sure if that's much better though).
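+
+A sketch of the bottom layer under **4c**, assuming SHA256 as `H`, 32-byte leaves, and
+indices encoded as 8-byte big-endian integers (these are assumptions of this example, as
+are the names `enc` and `bottom_layer_4c`):
+
+```python
+import hashlib
+
+ZERO = bytes(32)  # stands in for the trailing `0` in H(i|0|x_i|0)
+
+def enc(i):
+    """Fixed-width index encoding (an assumption of this sketch)."""
+    return i.to_bytes(8, "big")
+
+def bottom_layer_4c(xs):
+    """H(i|j|x_i|x_j) for full pairs, H(i|0|x_i|0) for a trailing odd element."""
+    out = []
+    for i in range(0, len(xs), 2):
+        if i + 1 < len(xs):
+            data = enc(i) + enc(i + 1) + xs[i] + xs[i + 1]
+        else:
+            data = enc(i) + enc(0) + xs[i] + ZERO
+        out.append(hashlib.sha256(data).digest())
+    return out
+```
+
+The layers above the bottom one would use the ordinary compression function.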
+
+**Solution 5 (??).** Use a different tree shape, where the left subtree is always a complete
+(full) binary tree with `2^floor(log2(n-1))` leaves, and the right subtree is
+constructed recursively. Then the shape of the tree encodes the number of inputs `n`.
+This however complicates the Merkle proofs (they won't have uniform size).
+TODO: think more about this!
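+
+A rough sketch of the tree shape only, assuming SHA256 and `compress(x,y) := H(x|y)`;
+the names `split_point` and `shaped_root` are hypothetical, and nothing here addresses
+the other attacks.
+
+```python
+import hashlib
+
+def compress(x, y):
+    return hashlib.sha256(x + y).digest()
+
+def split_point(n):
+    """2^floor(log2(n-1)): the largest power of two strictly below n (needs n >= 2)."""
+    if n < 2:
+        raise ValueError("split_point needs at least two leaves")
+    return 1 << ((n - 1).bit_length() - 1)
+
+def shaped_root(leaves):
+    """Left subtree is a complete tree; the right subtree is built recursively."""
+    if not leaves:
+        raise ValueError("empty input")
+    if len(leaves) == 1:
+        return leaves[0]
+    k = split_point(len(leaves))
+    return compress(shaped_root(leaves[:k]), shaped_root(leaves[k:]))
+```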
+
+### Keyed compression functions
+
+How can we have many different compression functions? Consider three case studies:
+
+**Poseidon.** The Poseidon family of hashes is built on a (fixed) permutation 
+`pi : F^t -> F^t`, where `F` is a (large) finite field. For simplicity consider the case `t=3`. 
+The standard compression function is then defined as:
+
+    compress(x,y) := let (u,_,_) = pi(x,y,0) in u
+
+That is, we take the triple `(x,y,0)`, apply the permutation to get another triple `(u,v,w)`, and
+extract the field element `u` (we could use `v` or `w` too, it shouldn't matter).
+Now we can see that it is in fact very easy to generalize this to a _keyed_ (or _indexed_)
+compression function:
+
+    compress_k(x,y) := let (u,_,_) = pi(x,y,k) in u
+
+where `k` is the key. Note that there is no overhead in doing this. And since `F` is pretty
+big (in our case, about 253 bits), there is plenty of information we can encode in the key `k`.
+
+Note: We probably lose a few bits of security here, if somebody looks for a preimage among
+_all_ keys; however in our constructions the keys have a fixed structure, so it's probably
+not that dangerous. If we want to be extra safe, we could use `t=4` and `pi(x,y,k,0)`
+instead (but that has some computation overhead).
+
+**SHA256.** When using SHA256 as our hash function, normally the compression function is
+defined as `compress(x,y) := SHA256(x|y)`, that is, concatenate the (bitstring representation of the)
+two elements, and apply SHA256 to the resulting (bit)string. Normally `x` and `y` are both
+256 bits long, and so is the result. If we look into the details of how SHA256 is specified,
+this is actually wasteful. That's because while SHA256 processes the input in 512 bit chunks,
+it also prescribes a mandatory nonempty padding. So when calling SHA256 on an input of size 
+512 bit (64 bytes), it will actually process two chunks, the second chunk consisting purely
+of padding. When constructing a binary Merkle tree using a compression function like before,
+the input is always of the same size, so this padding is unnecessary; nevertheless, people 
+usually prefer to follow the standardized SHA256 call. But, if we are processing 1024 bits
+anyway, we have a lot of free space to include our key `k`! In fact we can add up to 
+`512-64-1=447` bits of additional information; so for example
+
+    compress_k(x,y) := SHA256(k|x|y)
+
+works perfectly well with no overhead compared to `SHA256(x|y)`.
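+
+The block count can be checked with a short sketch. It assumes byte-aligned keys, so at
+most `447 // 8 = 55` key bytes; the names `sha256_blocks` and `MAX_KEY_BYTES` are made up
+for this example.
+
+```python
+import hashlib
+
+MAX_KEY_BYTES = 447 // 8  # 55 whole bytes fit into the otherwise wasted second block
+
+def sha256_blocks(msg_len_bytes):
+    """Number of 512-bit blocks SHA256 processes: message + 1 bit + 64-bit length."""
+    return (msg_len_bytes * 8 + 1 + 64 + 511) // 512
+
+def compress_k(k, x, y):
+    """compress_k(x,y) := SHA256(k|x|y) with a byte-string key k."""
+    if len(k) > MAX_KEY_BYTES or len(x) != 32 or len(y) != 32:
+        raise ValueError("expected a key of at most 55 bytes and two 32-byte inputs")
+    return hashlib.sha256(k + x + y).digest()
+```
+
+Here `sha256_blocks(64)` and `sha256_blocks(64 + 55)` both give 2, while one more key
+byte would need a third block.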
+
+**MiMC.** MiMC is another arithmetic construction, however in this
+case the starting point is a _block cipher_, that is, we start with
+a keyed permutation! Unfortunately MiMC-p/p is a (keyed) permutation 
+of `F`, which is not very useful for us; however in Feistel mode we
+get a keyed permutation of `F^2`, and we can just take the first
+component of the output of that as the compressed output.
+
+### Tree padding proposal
+
+It seems to me that whatever way we try to solve problem 2) without pre-hashing, we need
+to include the length (or at least one bit of information about the length) into the Merkle
+proofs. So maybe we should just live with that.
+
+Then from the above choices, right now maybe solution **4c**, or some variation of it
+looks the nicest to me.
+
+### Making `deserialize` injective
+
+Consider the following simple algorithm to deserialize a sequence of bytes into chunks of
+31 bytes:
+
+- pad the input with at most 30 zero bytes such that the padded length becomes divisible
+  by 31
+- split the padded sequence into `ceil(n/31)` chunks, each 31 bytes.
+
+The problem with this is that, for example, `0x123456`, `0x12345600` and `0x1234560000` 
+all result in the same output.
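+
+The collision is easy to reproduce. A sketch of the zero-padding algorithm above, with
+`zero_pad_chunks` being a made-up name:
+
+```python
+CHUNK = 31
+
+def zero_pad_chunks(data):
+    """Pad with zero bytes to a multiple of 31, then split into 31-byte chunks."""
+    padded = data + bytes(-len(data) % CHUNK)
+    return [padded[i:i + CHUNK] for i in range(0, len(padded), CHUNK)]
+
+inputs = [bytes.fromhex(h) for h in ("123456", "12345600", "1234560000")]
+outputs = [zero_pad_chunks(d) for d in inputs]
+all_equal = outputs[0] == outputs[1] == outputs[2]  # three inputs, one output
+```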
+
+Some possible solutions:
+
+- prepend the length (number of input bytes) to the input, say as a 64-bit little-endian integer (8 bytes),
+  before padding as above
+- or append the length instead of prepending, then pad
+- or first pad with zero bytes, but leave 8 bytes for the length (so that when we finally append
+  the length, the result will be divisible by 31).
+- use the following padding strategy: _always_ add a single 0x01 byte, then enough 0x00 bytes so that the length
+  is divisible by 31. Why does this work? Well, consider an already padded sequence. Count
+  the number of zero bytes from the end: you get a number `0 <= m < 31`. This number 
+  determines the residue class of the original length `n` modulo 31; then this class,
+  together with the padded length fully determines the original length. 
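+
+The last strategy can be sketched as below; `pad`, `unpad` and this particular
+`deserialize` are illustrative names, not an existing API. The existence of `unpad`
+is what shows that the padding is injective.
+
+```python
+CHUNK = 31
+
+def pad(data):
+    """Always append one 0x01 byte, then 0x00 bytes up to a multiple of 31."""
+    padded = data + b"\x01"
+    return padded + bytes(-len(padded) % CHUNK)
+
+def deserialize(data):
+    """Split the padded input into 31-byte chunks."""
+    padded = pad(data)
+    return [padded[i:i + CHUNK] for i in range(0, len(padded), CHUNK)]
+
+def unpad(padded):
+    """Inverse of `pad`: drop the trailing zeros, then the 0x01 marker."""
+    stripped = padded.rstrip(b"\x00")
+    if len(padded) % CHUNK != 0 or not stripped.endswith(b"\x01"):
+        raise ValueError("not a validly padded sequence")
+    return stripped[:-1]
+```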
+