From db6d7888c78caae924eb857035690d82c3c2652e Mon Sep 17 00:00:00 2001
From: Yuriy Glukhov
Date: Wed, 6 Feb 2019 10:57:08 +0100
Subject: [PATCH] Added docs

---
 doc/bloom.md   |  32 +++++
 doc/keyfile.md |   5 +
 doc/keys.md    |   8 ++
 doc/p2p.md     | 279 ++++++++++++++++++++++++++++++++++++++++
 doc/rlp.md     | 142 +++++++++++++++++++++
 doc/trie.md    | 338 +++++++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 804 insertions(+)
 create mode 100644 doc/bloom.md
 create mode 100644 doc/keyfile.md
 create mode 100644 doc/keys.md
 create mode 100644 doc/p2p.md
 create mode 100644 doc/rlp.md
 create mode 100644 doc/trie.md

diff --git a/doc/bloom.md b/doc/bloom.md
new file mode 100644
index 0000000..c840254
--- /dev/null
+++ b/doc/bloom.md
@@ -0,0 +1,32 @@
+# eth_bloom: an Ethereum Bloom Filter
+
+## Introduction
+
+A Nim implementation of the bloom filter used by Ethereum.
+
+## Description
+
+[Bloom filters](https://en.wikipedia.org/wiki/Bloom_filter) are data structures that use hash functions to test whether an element is a member of a set. They are probabilistic in nature: they allow false positive matches but never false negatives. In exchange, they use considerably less storage space than exact set representations.
+
+Ethereum bloom filters are implemented with the Keccak-256 cryptographic hash function.
+
+To see the bloom filter used in the context of Ethereum, please refer to the [Ethereum Yellow Paper](https://ethereum.github.io/yellowpaper/paper.pdf).
+
+
+## Installation
+```
+$ nimble install eth_bloom
+```
+
+## Usage
+```nim
+import eth_bloom, stint
+var f: BloomFilter
+f.incl("test1")
+assert("test1" in f)
+assert("test2" notin f)
+f.incl("test2")
+assert("test2" in f)
+assert(f.value.toHex == "80000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000200000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000200000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000040000000000000000000000000000000000000000000000000000000000000000000")
+```
+
diff --git a/doc/keyfile.md b/doc/keyfile.md
new file mode 100644
index 0000000..7146e0c
--- /dev/null
+++ b/doc/keyfile.md
@@ -0,0 +1,5 @@
+# eth_keyfile
+
+## Introduction
+This library is a Nim reimplementation of [ethereum/eth-keyfile](https://github.com/ethereum/eth-keyfile): tools for creating, loading, and handling the Ethereum `keyfile` format, which is used to store private keys. Currently, the library supports only the PBKDF2 key derivation method; Scrypt is not yet supported.
+
diff --git a/doc/keys.md b/doc/keys.md
new file mode 100644
index 0000000..9c05194
--- /dev/null
+++ b/doc/keys.md
@@ -0,0 +1,8 @@
+# eth_keys
+
+This library is a Nim re-implementation of [eth-keys](https://github.com/ethereum/eth-keys): the common API for working with Ethereum's public and private keys, signatures, and addresses.
+
+By default, Nim eth-keys uses Bitcoin's [libsecp256k1](https://github.com/bitcoin-core/secp256k1) as a backend. Make sure libsecp256k1 is available on your system.
+
+An experimental pure Nim backend (Warning ⚠: do not use in production) is available with the compilation switch `-d:backend_native`.
+
diff --git a/doc/p2p.md b/doc/p2p.md
new file mode 100644
index 0000000..61367c3
--- /dev/null
+++ b/doc/p2p.md
@@ -0,0 +1,279 @@
+# eth_p2p
+
+## Introduction
+
+This library implements the DevP2P family of networking protocols used
+in the Ethereum world.
+
+## Connecting to the Ethereum network
+
+A connection to the Ethereum network can be created by instantiating
+the `EthereumNode` type:
+
+``` nim
+proc newEthereumNode*(keys: KeyPair,
+                      listeningAddress: Address,
+                      networkId: uint,
+                      chain: AbstractChainDB,
+                      clientId = "nim-eth-p2p",
+                      addAllCapabilities = true): EthereumNode =
+```
+
+#### Parameters:
+
+`keys`:
+  A pair of public and private keys used to authenticate the node
+  on the network and to determine its node ID.
+  See the [eth_keys](https://github.com/status-im/nim-eth-keys)
+  library for utilities that will help you generate and manage
+  such keys.
+
+`listeningAddress`:
+  The network interface and port where your client will be
+  accepting incoming connections.
+
+`networkId`:
+  The Ethereum network ID. The client will disconnect immediately
+  from any peers that are not on the same network.
+
+`chain`:
+  An abstract instance of the Ethereum blockchain associated
+  with the node. This library allows you to plug in any instance
+  conforming to the abstract interface defined in the
+  [eth_common](https://github.com/status-im/nim-eth-common)
+  package.
+
+`clientId`:
+  A name used to identify the software package connecting
+  to the network (similar to the `User-Agent` string
+  in a browser).
+
+`addAllCapabilities`:
+  By default, the node will support all RLPx protocols imported in
+  your project. You can specify `false` if you prefer to create a
+  node with a more limited set of protocols. Use one or more calls
+  to `node.addCapability` to specify the desired set:
+
+  ```nim
+  node.addCapability(eth)
+  node.addCapability(shh)
+  ```
+
+  Each supplied protocol identifier is the name of a protocol introduced
+  with the `p2pProtocol` macro discussed later in this document.
+
+Instantiating an `EthereumNode` does not immediately connect you to
+the network. To start the connection process, call `node.connectToNetwork`:
+
+``` nim
+proc connectToNetwork*(node: var EthereumNode,
+                       bootstrapNodes: openarray[ENode],
+                       startListening = true,
+                       enableDiscovery = true)
+```
+
+The `EthereumNode` will automatically find and maintain a pool of peers
+using the Ethereum node discovery protocol. You can access the pool as
+`node.peers`.
+
+## Communicating with Peers using RLPx
+
+[RLPx](https://github.com/ethereum/devp2p/blob/master/rlpx.md) is the
+high-level protocol for exchanging messages between peers in the Ethereum
+network. Most client code built on this library need not be concerned
+with the implementation details of the underlying protocols and should use
+the high-level APIs described in this section.
+
+The RLPx protocols are defined as a collection of strongly-typed messages,
+which are grouped into sub-protocols multiplexed over the same TCP connection.
+
+This library represents each such message as a regular Nim function call
+over the `Peer` object. Certain messages act only as notifications, while
+others fit the request/response pattern.
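+
+Putting the connection steps above together, a minimal sketch of bringing up a
+node might look like the following (the keys, listening address, chain instance
+and bootstrap nodes are assumed to be supplied by the caller, and the exact
+module names in the imports may differ in your setup):
+
+``` nim
+import eth_p2p, eth_keys, eth_common
+
+proc startNode(keys: KeyPair, listeningAddress: Address,
+               chain: AbstractChainDB, bootstrapNodes: seq[ENode]) =
+  # create the node; all RLPx protocols imported in the project are enabled
+  var node = newEthereumNode(keys, listeningAddress, networkId = 1, chain = chain)
+  # join the network; discovery will then maintain `node.peers`
+  node.connectToNetwork(bootstrapNodes)
+```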
+
+To understand more about how messages are defined and used, let's look at
+the definition of an RLPx protocol:
+
+### RLPx sub-protocols
+
+The sub-protocols are defined with the `p2pProtocol` macro. It accepts
+a 3-letter identifier for the protocol and the current protocol version.
+
+Here is what the [DevP2P wire protocol](https://github.com/ethereum/wiki/wiki/%C3%90%CE%9EVp2p-Wire-Protocol) might look like:
+
+``` nim
+p2pProtocol p2p(version = 0):
+  proc hello(peer: Peer,
+             version: uint,
+             clientId: string,
+             capabilities: openarray[Capability],
+             listenPort: uint,
+             nodeId: P2PNodeId) =
+    peer.id = nodeId
+
+  proc disconnect(peer: Peer, reason: DisconnectionReason)
+
+  proc ping(peer: Peer) =
+    await peer.pong()
+
+  proc pong(peer: Peer) =
+    echo "received pong from ", peer.id
+```
+
+As seen in the example above, a protocol definition determines both the
+available messages that can be sent to another peer (e.g. as in `peer.pong()`)
+and the asynchronous code responsible for handling the incoming messages.
+
+### Protocol state
+
+The protocol implementations are expected to maintain a state and to act
+like a state machine handling the incoming messages. You are allowed to
+define an arbitrary state type that can be specified in the `peerState`
+protocol option. Later, instances of the state object can be obtained
+through the `state` pseudo-field of the `Peer` object:
+
+``` nim
+type AbcPeerState = object
+  receivedMsgsCount: int
+
+p2pProtocol abc(version = 1,
+                peerState = AbcPeerState):
+
+  proc incomingMessage(p: Peer) =
+    p.state.receivedMsgsCount += 1
+
+```
+
+Besides the per-peer state demonstrated above, there is also support
+for maintaining a network-wide state. It's enabled by specifying the
+`networkState` option of the protocol; the state object can then be obtained
+through an accessor of the same name.
+
+The state objects are initialized to zero by default, but you can modify
+this behaviour by overriding the following procs for your state types:
+
+```nim
+proc initProtocolState*(state: MyPeerState, p: Peer)
+proc initProtocolState*(state: MyNetworkState, n: EthereumNode)
+```
+
+Sometimes, you'll need to access the state of another protocol.
+To do this, pass the protocol identifier to the `state` accessor:
+
+``` nim
+  echo "ABC protocol messages: ", peer.state(abc).receivedMsgsCount
+```
+
+While the state machine approach may be a particularly robust way of
+implementing sub-protocols (it is more amenable to proving the correctness
+of the implementation through formal verification methods), sometimes it may
+be more convenient to use a more imperative style of communication where the
+code is able to wait for a particular response after sending a particular
+request. The library provides two mechanisms for achieving this:
+
+### Waiting for particular messages with `nextMsg`
+
+The `nextMsg` helper proc can be used to pause the execution of an async
+proc until a particular incoming message from a peer arrives:
+
+``` nim
+proc handshakeExample(peer: Peer) {.async.} =
+  ...
+  # send a hello message
+  peer.hello(...)
+
+  # wait for a matching hello response
+  let response = await peer.nextMsg(p2p.hello)
+  echo response.clientId # print the name of the Ethereum client
+                         # used by the other peer (Geth, Parity, Nimbus, etc)
+```
+
+There are a few things to note in the above example:
+
+1. The `p2pProtocol` definition created a pseudo-variable named after the
+   protocol that holds various properties of the protocol.
+
+2. Each message defined in the protocol received a corresponding type name,
+   matching the message name (e.g. `p2p.hello`). This type will have fields
+   matching the parameter names of the message. If the message has `openarray`
+   params, these will be remapped to `seq` types.
+
+If the designated message also has an attached handler, the future returned
+by `nextMsg` will be resolved only after the handler has been fully executed
+(so you can count on any side effects produced by the handler having taken
+place). If there are multiple outstanding calls to `nextMsg`, they will
+complete together. Any other messages received in the meantime will still
+be dispatched to their respective handlers.
+
+### `requestResponse` pairs
+
+``` nim
+p2pProtocol les(version = 2):
+  ...
+
+  requestResponse:
+    proc getProofs(p: Peer, proofs: openarray[ProofRequest])
+    proc proofs(p: Peer, BV: uint, proofs: openarray[Blob])
+
+  ...
+```
+
+Two or more messages within the protocol may be grouped into a
+`requestResponse` block. The last message in the group is assumed
+to be the response while all other messages are considered requests.
+
+When a request message is sent, the return type will be a `Future`
+that will be completed once the response is received. Please note
+that there is a mandatory timeout parameter, so the actual return
+type is `Future[Option[MessageType]]`. The `timeout` parameter can
+be specified for each individual call, and the default value can be
+overridden at the level of an individual message or of the entire protocol:
+
+``` nim
+p2pProtocol abc(version = 1,
+                useRequestIds = false,
+                timeout = 5000): # value in milliseconds
+  requestResponse:
+    proc myReq(dataId: int, timeout = 3000)
+    proc myRes(data: string)
+```
+
+By default, the library will take care of inserting a hidden `reqId`
+parameter as used in the [LES protocol](https://github.com/zsfelfoldi/go-ethereum/wiki/Light-Ethereum-Subprotocol-%28LES%29),
+but you can disable this behavior by overriding the protocol setting
+`useRequestIds`.
+
+### Implementing handshakes and reacting to other events
+
+Besides message definitions and implementations, a protocol specification may
+also include handlers for certain important events such as newly connected
+peers or misbehaving or disconnecting peers:
+
+``` nim
+p2pProtocol les(version = 2):
+  onPeerConnected do (peer: Peer):
+    asyncCheck peer.status [
+      "networkId": rlp.encode(1),
+      "keyGenesisHash": rlp.encode(peer.network.chain.genesisHash)
+      ...
+    ]
+
+    let otherPeerStatus = await peer.nextMsg(les.status)
+    ...
+
+  onPeerDisconnected do (peer: Peer, reason: DisconnectionReason):
+    debug "peer disconnected", peer
+```
+
+### Checking the other peer's supported sub-protocols
+
+Upon establishing a connection, RLPx will automatically negotiate the list of
+sub-protocols mutually supported by the peers. To check whether a particular peer
+supports a particular sub-protocol, use the following code:
+
+``` nim
+if peer.supports(les): # `les` is the identifier of the light client sub-protocol
+  peer.getReceipts(nextReqId(), neededReceipts())
+
+```
+
diff --git a/doc/rlp.md b/doc/rlp.md
new file mode 100644
index 0000000..4fa85c7
--- /dev/null
+++ b/doc/rlp.md
@@ -0,0 +1,142 @@
+# rlp
+
+## Introduction
+
+A Nim implementation of the Recursive Length Prefix encoding (RLP) as specified
+in Ethereum's [Yellow Paper](https://ethereum.github.io/yellowpaper/paper.pdf)
+and [Wiki](https://github.com/ethereum/wiki/wiki/RLP).
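+
+As a quick illustration of the encoding itself (a minimal sketch, assuming the
+package is imported as `rlp` and using the `encode` proc described later in
+this document): a short string is encoded as a one-byte length prefix
+(`0x80` plus the length) followed by the raw bytes of the string.
+
+``` nim
+import rlp
+
+# "dog" encodes to the 4 bytes [0x83, 'd', 'o', 'g']
+let encoded = encode("dog")
+assert encoded.len == 4
+assert encoded[0] == byte(0x83)
+```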
+
+
+## Reading RLP data
+
+The `Rlp` type provided by this library represents a cursor over an RLP-encoded
+byte stream. Before instantiating such a cursor, you must convert your
+input data to a `BytesRange` value provided by the [nim-ranges][RNG] library,
+which represents an immutable and thus cheap-to-copy sub-range view over an
+underlying `seq[byte]` instance:
+
+[RNG]: https://github.com/status-im/nim-ranges
+
+``` nim
+proc rlpFromBytes*(data: BytesRange): Rlp
+```
+
+### Streaming API
+
+Once created, the `Rlp` object will offer procs such as `isList`, `isBlob`,
+`getType`, `listLen`, `blobLen` to determine the type of the value under
+the cursor. The contents of blobs can be extracted with procs such as
+`toString`, `toBytes` and `toInt` without advancing the cursor.
+
+Lists can be traversed with the standard `items` iterator, which will advance
+the cursor to each sub-item position and yield the `Rlp` object at that point.
+As an alternative, `listElem` can return a new `Rlp` object adjusted to a
+particular sub-item position without advancing the original cursor.
+Keep in mind that copying `Rlp` objects is cheap and you can create as many
+cursors pointing to different positions in the RLP stream as necessary.
+
+`skipElem` will advance the cursor to the next position in the current list.
+`hasData` will indicate whether there are more bytes in the stream that can
+be consumed.
+
+Another way to extract data from the stream is through the universal `read`
+proc that accepts a type as a parameter. You can pass any supported type
+such as `string`, `int`, `seq[T]`, etc., including composite user-defined
+types (see [Object Serialization](#object-serialization)). The cursor
+will be advanced just past the end of the consumed object.
+
+The `toXX` and `read` family of procs may raise a `RlpTypeMismatch` in case
+of a type mismatch with the stream contents under the cursor. A corrupted
+RLP stream or an attempt to read past the stream end will be signaled
+with the `MalformedRlpError` exception. If the RLP stream includes data
+that cannot be processed on the current platform (e.g. an integer value
+that is too large), the library will raise an `UnsupportedRlpError` exception.
+
+### DOM API
+
+Calling `Rlp.toNodes` at any position within the stream will return a tree
+of `RlpNode` objects representing the collection of values beginning at that
+position:
+
+``` nim
+type
+  RlpNodeType* = enum
+    rlpBlob
+    rlpList
+
+  RlpNode* = object
+    case kind*: RlpNodeType
+    of rlpBlob:
+      bytes*: BytesRange
+    of rlpList:
+      elems*: seq[RlpNode]
+```
+
+As a short-cut, you can also call `decode` directly on a byte sequence to
+avoid creating a `Rlp` object when obtaining the nodes.
+For debugging purposes, you can also create a human-readable representation
+of the Rlp nodes by calling the `inspect` proc:
+
+``` nim
+proc inspect*(self: Rlp, indent = 0): string
+```
+
+## Creating RLP data
+
+The `RlpWriter` type can be used to encode RLP data. Instances are created
+with the `initRlpWriter` proc. This should be followed by one or more calls
+to `append`, which is overloaded to accept arbitrary values. Finally, you can
+call `finish` to obtain the final `BytesRange`.
+
+If the end result should be an RLP list of a particular length, you can replace
+the initial call to `initRlpWriter` with `initRlpList(n)`. Calling `finish`
+before writing a sufficient number of elements will then result in a
+`PrematureFinalizationError`.
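+
+For example, a minimal sketch of the writer API described above (the values
+appended here are arbitrary):
+
+``` nim
+import rlp
+
+# build an RLP list with exactly two elements
+var writer = initRlpList(2)
+writer.append("foo")
+writer.append(42)
+let bytes = writer.finish()  # the encoded result as a BytesRange
+```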
+
+As an alternative short-cut, you can also call `encode` on an arbitrary value
+(including sequences and user-defined types) to execute all of the steps at
+once and directly obtain the final RLP bytes. `encodeList(varargs)` is another
+short-cut for creating RLP lists.
+
+## Object serialization
+
+As previously explained, generic procs such as `read`, `append`, `encode` and
+`decode` can be used with arbitrary user-defined object types. By default, the
+library will serialize all of the fields of the object using the `fields`
+iterator, but you can include only a subset of the fields, or modify the
+order of serialization, by employing the `rlpIgnore` pragma or the
+`rlpFields` macro:
+
+``` nim
+macro rlpFields*(T: typedesc, fields: varargs[untyped])
+
+## example usage:
+
+type
+  Transaction = object
+    amount: int
+    time: DateTime
+    sender: string
+    receiver: string
+
+rlpFields Transaction,
+  sender, receiver, amount
+
+...
+
+var t1 = rlp.read(Transaction)
+var bytes = encode(t1)
+var t2 = bytes.decode(Transaction)
+```
+
+By default, sub-fields within objects are wrapped in RLP lists. You can avoid this
+behavior by adding the custom pragma `rlpInline` on a particular field. In rare
+circumstances, you may need to serialize the same field type differently depending
+on the enclosing object type. You can use the `rlpCustomSerialization` pragma to
+achieve this.
+
+## Contributing / Testing
+
+To test the correctness of any modifications to the library, please execute
+`nimble test` at the root of the repo.
+
diff --git a/doc/trie.md b/doc/trie.md
new file mode 100644
index 0000000..686ab12
--- /dev/null
+++ b/doc/trie.md
@@ -0,0 +1,338 @@
+# nim-trie
+Nim Implementation of the Ethereum Trie structure
+---
+
+## Hexary Trie
+
+## Binary Trie
+
+Binary-trie is a dictionary-like data structure used to store key-value pairs.
+Much like its sibling Hexary-trie, the key-value pairs are stored in a key-value flat-db.
+The primary difference from Hexary-trie is that each Binary-trie node has only one or two children,
+while a Hexary-trie node can contain up to 16 or 17 child nodes.
+
+Unlike Hexary-trie, Binary-trie stores its data in the flat-db without using RLP encoding.
+Binary-trie stores its values using a simple **Node-Types** encoding.
+The encoded node is hashed with keccak_256 and the hash value becomes the key in the flat-db.
+Each entry in the flat-db looks like:
+
+| key                  | value                                      |
+|----------------------|--------------------------------------------|
+| 32-bytes-keccak-hash | encoded-node(KV or BRANCH or LEAF encoded) |
+
+### Node-Types
+* KV = [0, encoded-key-path, 32 bytes hash of child]
+* BRANCH = [1, 32 bytes hash of left child, 32 bytes hash of right child]
+* LEAF = [2, value]
+
+The KV node can have a BRANCH node or a LEAF node as its child, but not another KV node.
+The internal algorithm will merge a KV(parent)->KV(child) pair into one KV node.
+Every KV node contains an encoded keypath to reduce the number of blank nodes.
+
+The BRANCH node can have KV, BRANCH, or LEAF nodes as its children.
+
+The LEAF node is a terminal node; it contains the value of a key.
+
+### encoded-key-path
+
+While Hexary-trie encodes the path using Hex-Prefix encoding, Binary-trie
+encodes the path using a binary encoding; the scheme is shown in the table below.
+
+```text
+ |--------- odd --------|
+  00mm yyyy xxxx xxxx xxxx xxxx
+       |------ even -----|
+  1000 00mm yyyy xxxx xxxx xxxx
+```
+
+| symbol | explanation                                           |
+|--------|-------------------------------------------------------|
+| xxxx   | nibble of binary keypath in bits, 0 = left, 1 = right  |
+| yyyy   | nibble containing 0-3 bits of padding + binary keypath |
+| mm     | number of binary keypath bits modulo 4 (0-3)           |
+| 00     | zero zero prefix                                       |
+| 1000   | even numbered nibbles prefix                           |
+
+If there is no padding, then the yyyy bit sequence is absent and mm is also zero.
+yyyy = mm bits + padding bits, and must be 4 bits in length.
+
+### The API
+
+The primary API for Binary-trie consists of `set` and `get`.
+* set(key, value) --- _store a value associated with a key_
+* get(key): value --- _get a value using a key_
+
+Both `key` and `value` are of `BytesRange` type, and they cannot have zero length.
+You can also use the convenience `get` and `set` overloads which accept
+`Bytes` or `string` (a `string` is conceptually wrong in this context
+and may be costlier than a `BytesRange`, but it is good for testing purposes).
+
+Getting a non-existent key will return a zero-length BytesRange.
+
+Binary-trie also provides a dictionary-syntax API for `set` and `get`.
+* trie[key] = value -- same as `set`
+* value = trie[key] -- same as `get`
+* contains(key) a.k.a. `in` operator
+
+Additional APIs are:
+ * exists(key) -- returns `bool`, to check key-value existence -- same as contains
+ * delete(key) -- remove a key-value from the trie
+ * deleteSubtrie(key) -- remove a key-value from the trie plus all of its subtrie
+   that starts with the same key prefix
+ * rootNode() -- get the root node
+ * rootNode(node) -- replace the root node
+ * getRootHash(): `KeccakHash` with `BytesRange` type
+ * getDB(): `DB` -- get the flat-db pointer
+
+Constructor API:
+ * initBinaryTrie(DB, rootHash[optional]) -- rootHash has `BytesRange` or KeccakHash type
+ * init(BinaryTrie, DB, rootHash[optional])
+
+Normally you would not set the rootHash when constructing an empty Binary-trie.
+Setting the rootHash is for the scenario where you have a populated DB
+with an existing trie structure, you know the rootHash,
+and you want to continue/resume the trie operations.
+
+## Examples
+
+```Nim
+import
+  eth_trie/[db, binary, utils]
+
+var db = newMemoryDB()
+var trie = initBinaryTrie(db)
+trie.set("key1", "value1")
+trie.set("key2", "value2")
+assert trie.get("key1") == "value1".toRange
+assert trie.get("key2") == "value2".toRange
+
+# delete all subtries with key prefix "key"
+trie.deleteSubtrie("key")
+assert trie.get("key1") == zeroBytesRange
+assert trie.get("key2") == zeroBytesRange
+
+trie["moon"] = "sun"
+assert "moon" in trie
+assert trie["moon"] == "sun".toRange
+```
+
+Remember, `set` and `get` are trie operations. A single `set` operation may invoke
+more than one store/lookup operation into the underlying DB. The same also applies
+to the `get` operation: it can do more than one flat-db lookup before it returns the requested value.
+
+## The truth behind a lie
+
+What kind of lie? Actually, `delete` and `deleteSubtrie` don't remove the
+'deleted' nodes from the underlying DB. They only make the nodes inaccessible
+to the user of the trie. The same happens if you update the value of a key:
+the old value node is not removed from the underlying DB.
+A more subtle lie also happens when you add a new entry into the trie using the `set` operation.
+The previous hash of the affected branch becomes obsolete and is replaced by a new hash;
+the old hash becomes inaccessible to the user.
+You may think that is a waste of storage space.
+Luckily, we also provide some utilities to deal with this situation: the branch utils.
+
+## The branch utils
+
+The branch utils consist of these APIs:
+ * checkIfBranchExist(DB; rootHash; keyPrefix): bool
+ * getBranch(DB; rootHash; key): branch
+ * isValidBranch(branch, rootHash, key, value): bool
+ * getWitness(DB; nodeHash; key): branch
+ * getTrieNodes(DB; nodeHash): branch
+
+`keyPrefix`, `key`, and `value` are byte containers with length greater than zero.
+They can be BytesRange, Bytes, or string (again, for convenience and testing purposes).
+
+`rootHash` and `nodeHash` are also byte containers,
+but they have a constraint: they must be 32 bytes in length, and must be keccak_256 hash values.
+
+`branch` is a list of nodes, in this case a seq[BytesRange].
+A list? Yes, the structure is stored along with the encoded nodes.
+Therefore a list is enough to reconstruct the entire trie/branch.
+
+```Nim
+import
+  eth_trie/[db, binary, utils]
+
+var db = newMemoryDB()
+var trie = initBinaryTrie(db)
+trie.set("key1", "value1")
+trie.set("key2", "value2")
+
+assert checkIfBranchExist(db, trie.getRootHash(), "key") == true
+assert checkIfBranchExist(db, trie.getRootHash(), "key1") == true
+assert checkIfBranchExist(db, trie.getRootHash(), "ken") == false
+assert checkIfBranchExist(db, trie.getRootHash(), "key123") == false
+```
+
+The trie will look like:
+```text
+      root ---> A(kvnode, *common key prefix*)
+                         |
+                         |
+                         |
+                 B(branchnode)
+                /             \
+               /               \
+              /                 \
+C1(kvnode, *remaining keypath*)  C2(kvnode, *remaining keypath*)
+      |                                |
+      |                                |
+      |                                |
+D1(leafnode, b'value1')          D2(leafnode, b'value2')
+```
+
+```Nim
+var branchA = getBranch(db, trie.getRootHash(), "key1")
+# ==> [A, B, C1, D1]
+
+var branchB = getBranch(db, trie.getRootHash(), "key2")
+# ==> [A, B, C2, D2]
+
+assert isValidBranch(branchA, trie.getRootHash(), "key1", "value1") == true
+# wrong key, returns zero bytes
+assert isValidBranch(branchA, trie.getRootHash(), "key5", "") == true
+
+assert isValidBranch(branchB, trie.getRootHash(), "key1", "value1") # InvalidNode
+
+var x = getBranch(db, trie.getRootHash(), "key")
+# ==> [A]
+
+x = getBranch(db, trie.getRootHash(), "key123") # InvalidKeyError
+x = getBranch(db, trie.getRootHash(), "key5") # there is still a branch for a non-existent key
+# ==> [A]
+
+var branch = getWitness(db, trie.getRootHash(), "key1")
+# equivalent to `getBranch(db, trie.getRootHash(), "key1")`
+# ==> [A, B, C1, D1]
+
+branch = getWitness(db, trie.getRootHash(), "key")
+# this will include additional nodes of "key2"
+# ==> [A, B, C1, D1, C2, D2]
+
+var wholeTrie = getWitness(db, trie.getRootHash(), "")
+# this will return the whole trie
+# ==> [A, B, C1, D1, C2, D2]
+
+var node = branch[1] # B
+let nodeHash = keccak256.digest(node.baseAddr, uint(node.len))
+var nodes = getTrieNodes(db, nodeHash)
+assert nodes.len == wholeTrie.len - 1
+# ==> [B, C1, D1, C2, D2]
+```
+
+## Remember the lie?
+
+Because the trie `delete`, `deleteSubtrie` and `set` operations create inaccessible nodes in the underlying DB,
+we may need to remove them. We have already seen that `wholeTrie = getWitness(db, trie.getRootHash(), "")`
+returns the whole trie, a list of accessible nodes.
+We can then write the clean trie into a new DB instance to replace the old one.
+
+
+## Sparse Merkle Trie
+
+Sparse Merkle Trie (SMT) is a variant of Binary Trie which uses a binary encoding to
+represent the path during trie traversal.
+While Binary Trie uses three types of nodes,
+SMT uses only one type of node, without any additional special encoding to store its key-path.
+
+Actually, it doesn't even store its key-path anywhere the way Binary Trie does;
+the key-path is stored implicitly in the trie structure during key-value insertion.
+
+Because the key-path is not encoded in any special way, the bits can be extracted directly from
+the key without any conversion.
+
+However, the key is restricted to a fixed length because the algorithm demands a fixed-height trie
+to work properly. In this case, the trie height is limited to 160 levels,
+i.e. the key has a fixed length of 20 bytes (8 bits x 20 = 160).
+
+To be able to use variable-length keys, the algorithm can be adapted slightly by hashing the key before
+constructing the binary key-path. For example, if keccak256 is used as the hashing function,
+then the height of the tree will be 256, but the key itself can be of any length.
+
+### The API
+
+The primary API for Sparse Merkle Trie consists of `set` and `get`.
+* set(key, value, rootHash[optional]) --- _store a value associated with a key_
+* get(key, rootHash[optional]): value --- _get a value using a key_
+
+Both `key` and `value` are of `BytesRange` type, and they cannot have zero length.
+You can also use the convenience `get` and `set` overloads which accept
+`Bytes` or `string` (a `string` is conceptually wrong in this context
+and may be costlier than a `BytesRange`, but it is good for testing purposes).
+
+`rootHash` is an optional parameter. When given, `get` will get a key from the specified root,
+and `set` will also set a key at the specified root.
+
+Getting a non-existent key will return a zero-length BytesRange, i.e. `zeroBytesRange`.
+
+Sparse Merkle Trie also provides a dictionary-syntax API for `set` and `get`.
+ * trie[key] = value -- same as `set`
+ * value = trie[key] -- same as `get`
+ * contains(key) a.k.a. `in` operator
+
+Additional APIs are:
+ * exists(key) -- returns `bool`, to check key-value existence -- same as contains
+ * delete(key) -- remove a key-value from the trie
+ * getRootHash(): `KeccakHash` with `BytesRange` type
+ * getDB(): `DB` -- get the flat-db pointer
+ * prove(key, rootHash[optional]): proof -- useful for merkle proofing
+
+Constructor API:
+ * initSparseBinaryTrie(DB, rootHash[optional])
+ * init(SparseBinaryTrie, DB, rootHash[optional])
+
+Normally you would not set the rootHash when constructing an empty Sparse Merkle Trie.
+Setting the rootHash is for the scenario where you have a populated DB
+with an existing trie structure, you know the rootHash,
+and you want to continue/resume the trie operations.
+
+## Examples
+
+```Nim
+import
+  eth_trie/[db, sparse_binary, utils]
+
+var
+  db = newMemoryDB()
+  trie = initSparseBinaryTrie(db)
+
+let
+  key1 = "01234567890123456789"
+  key2 = "abcdefghijklmnopqrst"
+
+trie.set(key1, "value1")
+trie.set(key2, "value2")
+assert trie.get(key1) == "value1".toRange
+assert trie.get(key2) == "value2".toRange
+
+trie.delete(key1)
+assert trie.get(key1) == zeroBytesRange
+
+trie.delete(key2)
+assert trie[key2] == zeroBytesRange
+```
+
+Remember, `set` and `get` are trie operations. A single `set` operation may invoke
+more than one store/lookup operation into the underlying DB. The same also applies
+to the `get` operation: it can do more than one flat-db lookup before it returns the requested value.
+While Binary Trie performs a variable number of lookup and store operations, Sparse Merkle Trie
+will do a constant number of lookup and store operations for each `get` and `set` operation.
+
+## Merkle Proofing
+
+Using the `prove` and `verifyProof` APIs, we can do some merkle proofing with SMT.
+
+```Nim
+  let
+    value1 = "hello world"
+    badValue = "bad value"
+
+  trie[key1] = value1
+  var proof = trie.prove(key1)
+
+  assert verifyProof(proof, trie.getRootHash(), key1, value1) == true
+  assert verifyProof(proof, trie.getRootHash(), key1, badValue) == false
+  assert verifyProof(proof, trie.getRootHash(), key2, value1) == false
+```