mirror of https://github.com/status-im/nim-eth.git
Added docs
# eth_bloom: an Ethereum Bloom Filter

# Introduction

A Nim implementation of the bloom filter used by Ethereum.

# Description

[Bloom filters](https://en.wikipedia.org/wiki/Bloom_filter) are data structures that use hash functions to test whether an element is a member of a set. Unlike conventional set data structures, they are probabilistic: they allow false positive matches but never false negatives. In exchange, bloom filters use far less storage space.

Ethereum bloom filters are implemented with the Keccak-256 cryptographic hash function.

To see the bloom filter used in the context of Ethereum, please refer to the [Ethereum Yellow Paper](https://ethereum.github.io/yellowpaper/paper.pdf).

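For context, here is a hedged sketch (not part of the eth_bloom API) of how the Yellow Paper derives the three bit positions that a value sets in the 2048-bit filter; `keccak256` below stands in for any proc returning the 32-byte digest and is a hypothetical helper:

```nim
proc bloomBits(data: openarray[byte]): array[3, int] =
  # hypothetical helper: `keccak256` returns the 32-byte Keccak-256 digest of `data`
  let h = keccak256(data)
  for i in 0 .. 2:
    # the low-order 11 bits of byte pairs (0,1), (2,3) and (4,5) each select a bit in [0, 2047]
    result[i] = ((int(h[2 * i]) shl 8) or int(h[2 * i + 1])) and 2047
```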
# Installation

```
$ nimble install eth_bloom
```

# Usage

```nim
import eth_bloom, stint
var f: BloomFilter
f.incl("test1")
assert("test1" in f)
assert("test2" notin f)
f.incl("test2")
assert("test2" in f)
assert(f.value.toHex == "80000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000200000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000200000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000040000000000000000000000000000000000000000000000000000000000000000000")
```
# eth_keyfile

## Introduction

This library is a Nim reimplementation of [ethereum/eth-keyfile](https://github.com/ethereum/eth-keyfile): it provides tools for creating and loading files in the Ethereum `keyfile` format, which is used to store private keys. Currently, the library supports only the PBKDF2 key derivation method; Scrypt is not supported.

# eth_keys

This library is a Nim re-implementation of [eth-keys](https://github.com/ethereum/eth-keys): the common API for working with Ethereum's public and private keys, signatures, and addresses.

By default, Nim eth-keys uses Bitcoin's [libsecp256k1](https://github.com/bitcoin-core/secp256k1) as a backend. Make sure libsecp256k1 is available on your system.

An experimental pure-Nim backend (warning ⚠: do not use in production) is available with the compilation switch `-d:backend_native`.

# eth_p2p

## Introduction

This library implements the DevP2P family of networking protocols used
in the Ethereum world.

## Connecting to the Ethereum network

A connection to the Ethereum network can be created by instantiating
the `EthereumNode` type:

``` nim
proc newEthereumNode*(keys: KeyPair,
                      listeningAddress: Address,
                      networkId: uint,
                      chain: AbstractChainDB,
                      clientId = "nim-eth-p2p",
                      addAllCapabilities = true): EthereumNode =
```

#### Parameters:

`keys`:
  A pair of public and private keys used to authenticate the node
  on the network and to determine its node ID.
  See the [eth_keys](https://github.com/status-im/nim-eth-keys)
  library for utilities that will help you generate and manage
  such keys.

`listeningAddress`:
  The network interface and port where your client will be
  accepting incoming connections.

`networkId`:
  The Ethereum network ID. The client will disconnect immediately
  from any peers who don't use the same network.

`chain`:
  An abstract instance of the Ethereum blockchain associated
  with the node. This library allows you to plug any instance
  conforming to the abstract interface defined in the
  [eth_common](https://github.com/status-im/nim-eth-common)
  package.

`clientId`:
  A name used to identify the software package connecting
  to the network (i.e. similar to the `User-Agent` string
  in a browser).

`addAllCapabilities`:
  By default, the node will support all RLPx protocols imported in
  your project. You can specify `false` if you prefer to create a
  node with a more limited set of protocols. Use one or more calls
  to `node.addCapability` to specify the desired set:

```nim
node.addCapability(eth)
node.addCapability(shh)
```

Each supplied protocol identifier is the name of a protocol introduced
by the `p2pProtocol` macro discussed later in this document.

Instantiating an `EthereumNode` does not immediately connect you to
the network. To start the connection process, call `node.connectToNetwork`:

``` nim
proc connectToNetwork*(node: var EthereumNode,
                       bootstrapNodes: openarray[ENode],
                       startListening = true,
                       enableDiscovery = true)
```

The `EthereumNode` will automatically find and maintain a pool of peers
using the Ethereum node discovery protocol. You can access the pool as
`node.peers`.

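For illustration, here is a hedged sketch of bringing a node online with the two procs above; `myKeys`, `myAddress`, `myChain` and `knownBootnodes` are placeholders that have to be obtained elsewhere (e.g. keys via the eth_keys library):

```nim
# build the node from previously obtained keys, address and chain instance
var node = newEthereumNode(myKeys, myAddress, networkId = 1, chain = myChain)
node.connectToNetwork(knownBootnodes)
# once discovery has found peers, they are accessible through `node.peers`
```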
## Communicating with Peers using RLPx

[RLPx](https://github.com/ethereum/devp2p/blob/master/rlpx.md) is the
high-level protocol for exchanging messages between peers in the Ethereum
network. Most of the client code of this library should not be concerned
with the implementation details of the underlying protocols and should use
the high-level APIs described in this section.

The RLPx protocols are defined as a collection of strongly-typed messages,
which are grouped into sub-protocols multiplexed over the same TCP connection.

This library represents each such message as a regular Nim function call
over the `Peer` object. Certain messages act only as notifications, while
others fit the request/response pattern.

To understand more about how messages are defined and used, let's look at
the definition of an RLPx protocol:

### RLPx sub-protocols

The sub-protocols are defined with the `p2pProtocol` macro. It accepts
a 3-letter identifier for the protocol and the current protocol version.

Here is how the [DevP2P wire protocol](https://github.com/ethereum/wiki/wiki/%C3%90%CE%9EVp2p-Wire-Protocol) might look:

``` nim
p2pProtocol p2p(version = 0):
  proc hello(peer: Peer,
             version: uint,
             clientId: string,
             capabilities: openarray[Capability],
             listenPort: uint,
             nodeId: P2PNodeId) =
    peer.id = nodeId

  proc disconnect(peer: Peer, reason: DisconnectionReason)

  proc ping(peer: Peer) =
    await peer.pong()

  proc pong(peer: Peer) =
    echo "received pong from ", peer.id
```

As seen in the example above, a protocol definition determines both the
available messages that can be sent to another peer (e.g. as in `peer.pong()`)
and the asynchronous code responsible for handling the incoming messages.

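A hedged sketch (not from the original README): once the `p2p` protocol above is compiled, each of its messages becomes available as an async proc on `Peer`; `somePeer` below is a placeholder for an already-connected peer:

```nim
proc greet(somePeer: Peer) {.async.} =
  # sends the `ping` message; the remote peer's handler will answer with `pong`
  await somePeer.ping()
```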
### Protocol state

The protocol implementations are expected to maintain a state and to act
like a state machine handling the incoming messages. You are allowed to
define an arbitrary state type that can be specified in the `peerState`
protocol option. Later, instances of the state object can be obtained
through the `state` pseudo-field of the `Peer` object:

``` nim
type AbcPeerState = object
  receivedMsgsCount: int

p2pProtocol abc(version = 1,
                peerState = AbcPeerState):

  proc incomingMessage(p: Peer) =
    p.state.receivedMsgsCount += 1
```

Besides the per-peer state demonstrated above, there is also support
for maintaining a network-wide state. It's enabled by specifying the
`networkState` option of the protocol, and the state object can be obtained
through an accessor of the same name.

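A hedged sketch of a network-wide state, mirroring the per-peer example above; it assumes the `networkState` accessor described above is available on the `Peer` object inside message handlers, and the counter field is purely illustrative:

```nim
type AbcNetworkState = object
  totalMsgsCount: int   # illustrative field shared across all peers

p2pProtocol abc(version = 1,
                networkState = AbcNetworkState):

  proc incomingMessage(p: Peer) =
    # assumed accessor name per the description above
    p.networkState.totalMsgsCount += 1
```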
The state objects are initialized to zero by default, but you can modify
this behaviour by overriding the following procs for your state types:

```nim
proc initProtocolState*(state: MyPeerState, p: Peer)
proc initProtocolState*(state: MyNetworkState, n: EthereumNode)
```

Sometimes, you'll need to access the state of another protocol.
To do this, specify the protocol identifier to the `state` accessors:

``` nim
echo "ABC protocol messages: ", peer.state(abc).receivedMsgsCount
```

While the state machine approach may be a particularly robust way of
implementing sub-protocols (it is more amenable to proving the correctness
of the implementation through formal verification methods), sometimes it may
be more convenient to use a more imperative style of communication where the
code is able to wait for a particular response after sending a particular
request. The library provides two mechanisms for achieving this:

### Waiting for particular messages with `nextMsg`

The `nextMsg` helper proc can be used to pause the execution of an async
proc until a particular incoming message from a peer arrives:

``` nim
proc handshakeExample(peer: Peer) {.async.} =
  ...
  # send a hello message
  peer.hello(...)

  # wait for a matching hello response
  let response = await peer.nextMsg(p2p.hello)
  echo response.clientId # print the name of the Ethereum client
                         # used by the other peer (Geth, Parity, Nimbus, etc)
```

There are a few things to note in the above example:

1. The `p2pProtocol` definition created a pseudo-variable named after the
   protocol, holding various properties of the protocol.

2. Each message defined in the protocol received a corresponding type name,
   matching the message name (e.g. `p2p.hello`). This type will have fields
   matching the parameter names of the message. If a message has `openarray`
   params, these will be remapped to `seq` types.

If the designated message also has an attached handler, the future returned
by `nextMsg` will be resolved only after the handler has been fully executed
(so you can count on any side effects produced by the handler to have taken
place). If there are multiple outstanding calls to `nextMsg`, they will
complete together. Any other messages received in the meantime will still
be dispatched to their respective handlers.

### `requestResponse` pairs

``` nim
p2pProtocol les(version = 2):
  ...

  requestResponse:
    proc getProofs(p: Peer, proofs: openarray[ProofRequest])
    proc proofs(p: Peer, BV: uint, proofs: openarray[Blob])

  ...
```

Two or more messages within the protocol may be grouped into a
`requestResponse` block. The last message in the group is assumed
to be the response while all other messages are considered requests.

When a request message is sent, the return type will be a `Future`
that will be completed once the response is received. Please note
that there is a mandatory timeout parameter, so the actual return
type is `Future[Option[MessageType]]`. The `timeout` parameter can
be specified for each individual call and the default value can be
overridden at the level of an individual message or the entire protocol:

``` nim
p2pProtocol abc(version = 1,
                useRequestIds = false,
                timeout = 5000): # value in milliseconds
  requestResponse:
    proc myReq(dataId: int, timeout = 3000)
    proc myRes(data: string)
```

By default, the library will take care of inserting a hidden `reqId`
parameter as used in the [LES protocol](https://github.com/zsfelfoldi/go-ethereum/wiki/Light-Ethereum-Subprotocol-%28LES%29),
but you can disable this behavior by overriding the protocol setting
`useRequestIds`.

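A hedged sketch of issuing a request from the `les` example above; the `ProofRequest` values are placeholders, the response field name follows the parameter-to-field mapping described earlier, and the `Option` result reflects the possibility of a timeout:

```nim
import options

proc fetchProofs(peer: Peer, reqs: seq[ProofRequest]) {.async.} =
  # `getProofs` is the request defined in the `requestResponse` block above
  let response = await peer.getProofs(reqs)
  if response.isSome:
    echo "received ", response.get.proofs.len, " proofs"
  else:
    echo "getProofs timed out"
```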
### Implementing handshakes and reacting to other events

Besides message definitions and implementations, a protocol specification may
also include handlers for certain important events such as newly connected
peers or misbehaving or disconnecting peers:

``` nim
p2pProtocol les(version = 2):
  onPeerConnected do (peer: Peer):
    asyncCheck peer.status [
      "networkId": rlp.encode(1),
      "keyGenesisHash": rlp.encode(peer.network.chain.genesisHash)
      ...
    ]

    let otherPeerStatus = await peer.nextMsg(les.status)
    ...

  onPeerDisconnected do (peer: Peer, reason: DisconnectionReason):
    debug "peer disconnected", peer
```

### Checking the other peer's supported sub-protocols

Upon establishing a connection, RLPx will automatically negotiate the list of
protocols mutually supported by the peers. To check whether a particular peer
supports a particular sub-protocol, use the following code:

``` nim
if peer.supports(les): # `les` is the identifier of the light client sub-protocol
  peer.getReceipts(nextReqId(), neededReceipts())
```

# rlp

## Introduction

A Nim implementation of the Recursive Length Prefix encoding (RLP) as specified
in Ethereum's [Yellow Paper](https://ethereum.github.io/yellowpaper/paper.pdf)
and [Wiki](https://github.com/ethereum/wiki/wiki/RLP).

## Reading RLP data

The `Rlp` type provided by this library represents a cursor over an RLP-encoded
byte stream. Before instantiating such a cursor, you must convert your
input data to a `BytesRange` value provided by the [nim-ranges][RNG] library,
which represents an immutable and thus cheap-to-copy sub-range view over an
underlying `seq[byte]` instance:

[RNG]: https://github.com/status-im/nim-ranges

``` nim
proc rlpFromBytes*(data: BytesRange): Rlp
```

### Streaming API

Once created, the `Rlp` object will offer procs such as `isList`, `isBlob`,
`getType`, `listLen`, `blobLen` to determine the type of the value under
the cursor. The contents of blobs can be extracted with procs such as
`toString`, `toBytes` and `toInt` without advancing the cursor.

Lists can be traversed with the standard `items` iterator, which will advance
the cursor to each sub-item position and yield the `Rlp` object at that point.
As an alternative, `listElem` can return a new `Rlp` object adjusted to a
particular sub-item position without advancing the original cursor.
Keep in mind that copying `Rlp` objects is cheap and you can create as many
cursors pointing to different positions in the RLP stream as necessary.

`skipElem` will advance the cursor to the next position in the current list.
`hasData` will indicate whether there are any remaining bytes in the stream
to be consumed.

Another way to extract data from the stream is through the universal `read`
proc that accepts a type as a parameter. You can pass any supported type
such as `string`, `int`, `seq[T]`, etc, including composite user-defined
types (see [Object Serialization](#object-serialization)). The cursor
will be advanced just past the end of the consumed object.

The `toXX` and `read` family of procs may raise a `RlpTypeMismatch` in case
of type mismatch with the stream contents under the cursor. A corrupted
RLP stream or an attempt to read past the stream end will be signaled
with the `MalformedRlpError` exception. If the RLP stream includes data
that cannot be processed on the current platform (e.g. an integer value
that is too large), the library will raise an `UnsupportedRlpError` exception.

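A short sketch (not from the original text) tying the reading procs above together; it assumes the value produced by `encodeList` (described in the "Creating RLP data" section below) can be passed directly to `rlpFromBytes`:

```nim
import rlp

let bytes = encodeList(1, 2, 3)     # RLP encoding of the list [1, 2, 3]
var reader = rlpFromBytes(bytes)

assert reader.isList
assert reader.listLen == 3
let values = reader.read(seq[int])  # consumes the list; the cursor ends past it
assert values == @[1, 2, 3]
```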
### DOM API

Calling `Rlp.toNodes` at any position within the stream will return a tree
of `RlpNode` objects representing the collection of values beginning at that
position:

``` nim
type
  RlpNodeType* = enum
    rlpBlob
    rlpList

  RlpNode* = object
    case kind*: RlpNodeType
    of rlpBlob:
      bytes*: BytesRange
    of rlpList:
      elems*: seq[RlpNode]
```

As a short-cut, you can also call `decode` directly on a byte sequence to
avoid creating a `Rlp` object when obtaining the nodes.
For debugging purposes, you can also create a human-readable representation
of the Rlp nodes by calling the `inspect` proc:

``` nim
proc inspect*(self: Rlp, indent = 0): string
```

## Creating RLP data

The `RlpWriter` type can be used to encode RLP data. Instances are created
with the `initRlpWriter` proc. This should be followed by one or more calls
to `append`, which is overloaded to accept arbitrary values. Finally, you can
call `finish` to obtain the final `BytesRange`.

If the end result should be an RLP list of a particular length, you can replace
the initial call to `initRlpWriter` with `initRlpList(n)`. Calling `finish`
before writing a sufficient number of elements will then result in a
`PrematureFinalizationError`.

As an alternative short-cut, you can also call `encode` on an arbitrary value
(including sequences and user-defined types) to execute all of the steps at
once and directly obtain the final RLP bytes. `encodeList(varargs)` is another
short-cut for creating RLP lists.

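A hedged sketch of the writer API described above, using only the procs already named in this section:

```nim
import rlp

# Build an RLP list with exactly two elements.
var writer = initRlpList(2)
writer.append("foo")
writer.append(42)
let encoded = writer.finish()     # the final RLP bytes (a BytesRange)

# The one-shot alternative mentioned above.
let direct = encodeList("foo", 42)
```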
## Object serialization

As previously explained, generic procs such as `read`, `append`, `encode` and
`decode` can be used with arbitrary user-defined object types. By default, the
library will serialize all of the fields of the object using the `fields`
iterator, but you can also include only a subset of the fields or modify the
order of serialization by employing the `rlpIgnore` pragma or the
`rlpFields` macro:

``` nim
macro rlpFields*(T: typedesc, fields: varargs[untyped])

## example usage:

type
  Transaction = object
    amount: int
    time: DateTime
    sender: string
    receiver: string

rlpFields Transaction,
  sender, receiver, amount

...

var t1 = rlp.read(Transaction)
var bytes = encode(t1)
var t2 = bytes.decode(Transaction)
```

By default, sub-fields within objects are wrapped in RLP lists. You can avoid this
behavior by adding the custom pragma `rlpInline` on a particular field. In rare
circumstances, you may need to serialize the same field type differently depending
on the enclosing object type. You can use the `rlpCustomSerialization` pragma to
achieve this.

## Contributing / Testing

To test the correctness of any modifications to the library, please execute
`nimble test` at the root of the repo.

# nim-trie

Nim implementation of the Ethereum Trie structure

---

## Hexary Trie

## Binary Trie

Binary-trie is a dictionary-like data structure for storing key-value pairs.
Much like its sibling Hexary-trie, the key-value pairs are stored in a key-value flat-db.
The primary difference is that each Binary-trie node has only one or two children,
while a Hexary-trie node can contain up to 16 or 17 child nodes.

Unlike Hexary-trie, Binary-trie stores its data in the flat-db without using RLP encoding;
values are stored using a simple **Node-Types** encoding.
The encoded node is hashed with keccak_256 and the hash value becomes the flat-db key.
Each entry in the flat-db looks like:

| key                  | value                                      |
|----------------------|--------------------------------------------|
| 32-bytes-keccak-hash | encoded-node(KV or BRANCH or LEAF encoded) |

### Node-Types
* KV = [0, encoded-key-path, 32 bytes hash of child]
* BRANCH = [1, 32 bytes hash of left child, 32 bytes hash of right child]
* LEAF = [2, value]

A KV node can have a BRANCH node or a LEAF node as its child, but never another KV node:
the internal algorithm merges a KV(parent)->KV(child) pair into a single KV node.
Every KV node contains an encoded keypath to reduce the number of blank nodes.

A BRANCH node can have KV, BRANCH, or LEAF nodes as its children.

A LEAF node is a terminal node; it contains the value of a key.

### encoded-key-path

While Hexary-trie encodes the path using Hex-Prefix encoding, Binary-trie
encodes the path using a binary encoding; the scheme is shown in the table below.

```text
|--------- odd --------|
00mm yyyy xxxx xxxx xxxx xxxx
     |------ even -----|
1000 00mm yyyy xxxx xxxx xxxx
```

| symbol | explanation |
|--------|--------------------------|
| xxxx | nibble of binary keypath in bits, 0 = left, 1 = right|
| yyyy | nibble containing 0-3 bits of padding + binary keypath |
| mm | number of binary keypath bits modulo 4 (0-3) |
| 00 | zero zero prefix |
| 1000 | even numbered nibbles prefix |

If there is no padding, the yyyy bit sequence is absent and mm is also zero.
Otherwise, yyyy (the padding bits plus the first `mm` keypath bits) must be exactly 4 bits long.

### The API
|
||||
|
||||
The primary API for Binary-trie is `set` and `get`.
|
||||
* set(key, value) --- _store a value associated with a key_
|
||||
* get(key): value --- _get a value using a key_
|
||||
|
||||
Both `key` and `value` are of `BytesRange` type. And they cannot have zero length.
|
||||
You can also use convenience API `get` and `set` which accepts
|
||||
`Bytes` or `string` (a `string` is conceptually wrong in this context
|
||||
and may costlier than a `BytesRange`, but it is good for testing purpose).
|
||||
|
||||
Getting a non-existent key will return zero length BytesRange.
|
||||
|
||||
Binary-trie also provide dictionary syntax API for `set` and `get`.
|
||||
* trie[key] = value -- same as `set`
|
||||
* value = trie[key] -- same as `get`
|
||||
* contains(key) a.k.a. `in` operator
|
||||
|
||||
Additional APIs are:
|
||||
* exists(key) -- returns `bool`, to check key-value existence -- same as contains
|
||||
* delete(key) -- remove a key-value from the trie
|
||||
* deleteSubtrie(key) -- remove a key-value from the trie plus all of it's subtrie
|
||||
that starts with the same key prefix
|
||||
* rootNode() -- get root node
|
||||
* rootNode(node) -- replace the root node
|
||||
* getRootHash(): `KeccakHash` with `BytesRange` type
|
||||
* getDB(): `DB` -- get flat-db pointer
|
||||
|
||||
Constructor API:
|
||||
* initBinaryTrie(DB, rootHash[optional]) -- rootHash has `BytesRange` or KeccakHash type
|
||||
* init(BinaryTrie, DB, rootHash[optional])
|
||||
|
||||
Normally you would not set the rootHash when constructing an empty Binary-trie.
|
||||
Setting the rootHash occured in a scenario where you have a populated DB
|
||||
with existing trie structure and you know the rootHash,
|
||||
and then you want to continue/resume the trie operations.
|
||||
|
||||
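A minimal sketch of that resume scenario, using only the constructor and accessors listed above (here the "existing" DB is simply the one populated a few lines earlier):

```Nim
import
  eth_trie/[db, binary, utils]

var db = newMemoryDB()
var trie = initBinaryTrie(db)
trie.set("key1", "value1")
let savedRoot = trie.getRootHash()

# ... later, resume operations over the same (still populated) DB
var resumed = initBinaryTrie(db, savedRoot)
assert resumed.get("key1") == "value1".toRange
```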
## Examples

```Nim
import
  eth_trie/[db, binary, utils]

var db = newMemoryDB()
var trie = initBinaryTrie(db)
trie.set("key1", "value1")
trie.set("key2", "value2")
assert trie.get("key1") == "value1".toRange
assert trie.get("key2") == "value2".toRange

# delete all subtrie entries with key prefix "key"
trie.deleteSubtrie("key")
assert trie.get("key1") == zeroBytesRange
assert trie.get("key2") == zeroBytesRange

trie["moon"] = "sun"
assert "moon" in trie
assert trie["moon"] == "sun".toRange
```

Remember, `set` and `get` are trie operations. A single `set` operation may invoke
more than one store/lookup operation on the underlying DB. The same applies to `get`:
it may perform more than one flat-db lookup before returning the requested value.

## The truth behind a lie

What kind of lie? Actually, `delete` and `deleteSubtrie` don't remove the
'deleted' nodes from the underlying DB. They only make the nodes inaccessible
to the user of the trie. The same happens when you update the value of a key:
the old value node is not removed from the underlying DB.
A more subtle lie also occurs when you add a new entry into the trie using `set`:
the previous hashes of the affected branch become obsolete, replaced by new hashes,
and the old nodes become inaccessible to the user.
You may think that is a waste of storage space.
Luckily, we also provide some utilities to deal with this situation: the branch utils.

## The branch utils

The branch utils consist of these APIs:
* checkIfBranchExist(DB; rootHash; keyPrefix): bool
* getBranch(DB; rootHash; key): branch
* isValidBranch(branch, rootHash, key, value): bool
* getWitness(DB; nodeHash; key): branch
* getTrieNodes(DB; nodeHash): branch

`keyPrefix`, `key`, and `value` are byte containers with length greater than zero.
They can be `BytesRange`, `Bytes`, or `string` (again, for convenience and testing purposes).

`rootHash` and `nodeHash` are also byte containers, but with a constraint:
they must be 32 bytes long and must be keccak_256 hash values.

`branch` is a list of nodes, in this case a `seq[BytesRange]`.
A list? Yes, the structure is stored along with the encoded nodes,
so a list is enough to reconstruct the entire trie/branch.

```Nim
import
  eth_trie/[db, binary, utils]

var db = newMemoryDB()
var trie = initBinaryTrie(db)
trie.set("key1", "value1")
trie.set("key2", "value2")

assert checkIfBranchExist(db, trie.getRootHash(), "key") == true
assert checkIfBranchExist(db, trie.getRootHash(), "key1") == true
assert checkIfBranchExist(db, trie.getRootHash(), "ken") == false
assert checkIfBranchExist(db, trie.getRootHash(), "key123") == false
```

The tree will look like:
```text
       root ---> A(kvnode, *common key prefix*)
                           |
                           |
                           |
                     B(branchnode)
                     /           \
                    /             \
                   /               \
C1(kvnode, *remaining keypath*)    C2(kvnode, *remaining keypath*)
           |                                    |
           |                                    |
           |                                    |
D1(leafnode, b'value1')            D2(leafnode, b'value2')
```

```Nim
var branchA = getBranch(db, trie.getRootHash(), "key1")
# ==> [A, B, C1, D1]

var branchB = getBranch(db, trie.getRootHash(), "key2")
# ==> [A, B, C2, D2]

assert isValidBranch(branchA, trie.getRootHash(), "key1", "value1") == true
# wrong key, returns zero bytes
assert isValidBranch(branchA, trie.getRootHash(), "key5", "") == true

assert isValidBranch(branchB, trie.getRootHash(), "key1", "value1") # InvalidNode

var x = getBranch(db, trie.getRootHash(), "key")
# ==> [A]

x = getBranch(db, trie.getRootHash(), "key123") # InvalidKeyError
x = getBranch(db, trie.getRootHash(), "key5") # there is still a branch for a non-existent key
# ==> [A]

var branch = getWitness(db, trie.getRootHash(), "key1")
# equivalent to `getBranch(db, trie.getRootHash(), "key1")`
# ==> [A, B, C1, D1]

branch = getWitness(db, trie.getRootHash(), "key")
# this will include the additional nodes of "key2"
# ==> [A, B, C1, D1, C2, D2]

var wholeTrie = getWitness(db, trie.getRootHash(), "")
# this will return the whole trie
# ==> [A, B, C1, D1, C2, D2]

var node = branch[1] # B
let nodeHash = keccak256.digest(node.baseAddr, uint(node.len))
var nodes = getTrieNodes(db, nodeHash)
assert nodes.len == wholeTrie.len - 1
# ==> [B, C1, D1, C2, D2]
```

## Remember the lie?

Because the trie `delete`, `deleteSubtrie` and `set` operations create inaccessible nodes in the underlying DB,
we need to remove them if necessary. We have already seen that `wholeTrie = getWitness(db, trie.getRootHash(), "")`
returns the whole trie, a list of accessible nodes.
We can then write this clean tree into a new DB instance to replace the old one.

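A hedged sketch of that clean-up idea: the `put` proc used to copy nodes into the fresh DB is a hypothetical name (the flat-db write API is not described here), while the digest call mirrors the branch utils example above.

```Nim
let reachable = getWitness(db, trie.getRootHash(), "")   # every accessible node
var cleanDb = newMemoryDB()
for node in reachable:
  let h = keccak256.digest(node.baseAddr, uint(node.len))
  # hypothetical flat-db write: store each node under its own hash
  cleanDb.put(h.data, node)
var cleanTrie = initBinaryTrie(cleanDb, trie.getRootHash())
```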
## Sparse Merkle Trie

Sparse Merkle Trie (SMT) is a variant of Binary Trie which uses a binary encoding to
represent the path during trie traversal. Where Binary Trie uses three types of node,
SMT uses only one type of node, without any additional special encoding to store its key-path.

In fact, it doesn't store its key-path anywhere explicitly like Binary Trie does;
the key-path is stored implicitly in the trie structure during key-value insertion.

Because the key-path is not encoded in any special way, the bits can be extracted directly from
the key without any conversion.

However, the key is restricted to a fixed length because the algorithm demands a fixed-height trie
to work properly. In this case, the trie height is limited to 160 levels,
i.e. the key has a fixed length of 20 bytes (8 bits x 20 = 160).

To support variable-length keys, the algorithm can be adapted slightly by hashing the key before
constructing the binary key-path. For example, using keccak256 as the hashing function
makes the tree 256 levels high, while the key itself can be of any length.

### The API

The primary API for Sparse Merkle Trie is `set` and `get`.
* set(key, value, rootHash[optional]) --- _store a value associated with a key_
* get(key, rootHash[optional]): value --- _get a value using a key_

Both `key` and `value` are of `BytesRange` type, and they cannot have zero length.
You can also use the convenience `get` and `set` overloads which accept
`Bytes` or `string` (a `string` is conceptually wrong in this context
and may be costlier than a `BytesRange`, but it is good for testing purposes).

`rootHash` is an optional parameter. When it is provided, `get` will read the key
from that specific root, and `set` will set the key starting from that specific root.

Getting a non-existent key will return a zero-length `BytesRange` (i.e. `zeroBytesRange`).

Sparse Merkle Trie also provides a dictionary-syntax API for `set` and `get`.
* trie[key] = value -- same as `set`
* value = trie[key] -- same as `get`
* contains(key) a.k.a. `in` operator

Additional APIs are:
* exists(key) -- returns `bool`, to check key-value existence -- same as contains
* delete(key) -- remove a key-value from the trie
* getRootHash(): `KeccakHash` with `BytesRange` type
* getDB(): `DB` -- get the flat-db pointer
* prove(key, rootHash[optional]): proof -- useful for merkling

Constructor API:
* initSparseBinaryTrie(DB, rootHash[optional])
* init(SparseBinaryTrie, DB, rootHash[optional])

Normally you would not set the rootHash when constructing an empty Sparse Merkle Trie.
Setting the rootHash is for the scenario where you have a populated DB
with an existing trie structure, you know the rootHash,
and you want to continue/resume the trie operations.

## Examples

```Nim
import
  eth_trie/[db, sparse_binary, utils]

var
  db = newMemoryDB()
  trie = initSparseBinaryTrie(db)

let
  key1 = "01234567890123456789"
  key2 = "abcdefghijklmnopqrst"

trie.set(key1, "value1")
trie.set(key2, "value2")
assert trie.get(key1) == "value1".toRange
assert trie.get(key2) == "value2".toRange

trie.delete(key1)
assert trie.get(key1) == zeroBytesRange

trie.delete(key2)
assert trie[key2] == zeroBytesRange
```

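Building on the example above, here is a hedged sketch of the optional `rootHash` parameter: reading the trie as of an older root. It assumes `trie` and `key1` from the previous snippet are still in scope.

```Nim
trie.set(key1, "value1")
let oldRoot = trie.getRootHash()

trie.set(key1, "changed")
assert trie.get(key1) == "changed".toRange          # current root
assert trie.get(key1, oldRoot) == "value1".toRange  # state as of the older root
```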
Remember, `set` and `get` are trie operations. A single `set` operation may invoke
more than one store/lookup operation on the underlying DB. The same applies to `get`:
it may perform more than one flat-db lookup before returning the requested value.
While Binary Trie performs a variable number of lookup and store operations, Sparse Merkle Trie
performs a constant number of lookup and store operations for each `get` and `set`.

## Merkle Proofing

Using the `prove` and `verifyProof` APIs, we can do some merkling with SMT.

```Nim
let
  value1 = "hello world"
  badValue = "bad value"

trie[key1] = value1
var proof = trie.prove(key1)

assert verifyProof(proof, trie.getRootHash(), key1, value1) == true
assert verifyProof(proof, trie.getRootHash(), key1, badValue) == false
assert verifyProof(proof, trie.getRootHash(), key2, value1) == false
```