mirror of https://github.com/status-im/nim-eth.git
# eth_bloom: an Ethereum Bloom Filter

# Introduction

A Nim implementation of the bloom filter used by Ethereum.

# Description

[Bloom filters](https://en.wikipedia.org/wiki/Bloom_filter) are probabilistic data structures that use hash functions to test whether an element is a member of a set. They may report false positives, but never false negatives; in exchange, they use considerably less storage space than exact data structures such as hash tables.

Ethereum bloom filters are implemented with the Keccak-256 cryptographic hash function.

To see the bloom filter used in the context of Ethereum, please refer to the [Ethereum Yellow Paper](https://ethereum.github.io/yellowpaper/paper.pdf).
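
The underlying principle can be sketched in a few lines of Nim. This is an illustrative toy only — not the eth_bloom implementation, which uses Keccak-256; the filter size and the salted-hash scheme below are arbitrary choices for demonstration:

```nim
import hashes

const M = 2048            # toy filter size in bits (arbitrary)
var bits: array[M, bool]

proc bitPositions(s: string): seq[int] =
  # derive three bit positions from salted hashes (illustrative only)
  for salt in 0 .. 2:
    result.add abs(hash($salt & s)) mod M

proc put(s: string) =
  for p in bitPositions(s): bits[p] = true

proc mightContain(s: string): bool =
  for p in bitPositions(s):
    if not bits[p]: return false
  true  # probably present; false positives are possible

put("test1")
assert mightContain("test1")   # an added item is always found
```

An item that was never added can still map to positions that happen to be set, which is exactly the false-positive behaviour described above.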

# Installation

```
$ nimble install eth_bloom
```

# Usage

```nim
import eth_bloom, stint

var f: BloomFilter
f.incl("test1")
assert("test1" in f)
assert("test2" notin f)
f.incl("test2")
assert("test2" in f)
assert(f.value.toHex == "80000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000200000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000200000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000040000000000000000000000000000000000000000000000000000000000000000000")
```

# eth_keyfile

## Introduction

This library is a Nim reimplementation of [ethereum/eth-keyfile](https://github.com/ethereum/eth-keyfile): tools for creating, loading, and handling the Ethereum `keyfile` format, which is used to store private keys. Currently, the library supports only the PBKDF2 key derivation method; Scrypt is not supported.

# eth_keys

This library is a Nim re-implementation of [eth-keys](https://github.com/ethereum/eth-keys): the common API for working with Ethereum's public and private keys, signatures, and addresses.

By default, Nim eth-keys uses Bitcoin's [libsecp256k1](https://github.com/bitcoin-core/secp256k1) as a backend. Make sure libsecp256k1 is available on your system.

An experimental pure Nim backend (Warning ⚠: do not use in production) is available with the compilation switch `-d:backend_native`.

# eth_p2p

## Introduction

This library implements the DevP2P family of networking protocols used
in the Ethereum world.

## Connecting to the Ethereum network

A connection to the Ethereum network can be created by instantiating
the `EthereumNode` type:

``` nim
proc newEthereumNode*(keys: KeyPair,
                      listeningAddress: Address,
                      networkId: uint,
                      chain: AbstractChainDB,
                      clientId = "nim-eth-p2p",
                      addAllCapabilities = true): EthereumNode =
```

#### Parameters:

`keys`:
A pair of public and private keys used to authenticate the node
on the network and to determine its node ID.
See the [eth_keys](https://github.com/status-im/nim-eth-keys)
library for utilities that will help you generate and manage
such keys.

`listeningAddress`:
The network interface and port where your client will be
accepting incoming connections.

`networkId`:
The Ethereum network ID. The client will disconnect immediately
from any peers who don't use the same network.

`chain`:
An abstract instance of the Ethereum blockchain associated
with the node. This library allows you to plug any instance
conforming to the abstract interface defined in the
[eth_common](https://github.com/status-im/nim-eth-common)
package.

`clientId`:
A name used to identify the software package connecting
to the network (i.e. similar to the `User-Agent` string
in a browser).

`addAllCapabilities`:
By default, the node will support all RLPx protocols imported in
your project. You can specify `false` if you prefer to create a
node with a more limited set of protocols. Use one or more calls
to `node.addCapability` to specify the desired set:

```nim
node.addCapability(eth)
node.addCapability(shh)
```

Each supplied protocol identifier is the name of a protocol introduced
by the `p2pProtocol` macro discussed later in this document.

Instantiating an `EthereumNode` does not immediately connect you to
the network. To start the connection process, call `node.connectToNetwork`:

``` nim
proc connectToNetwork*(node: var EthereumNode,
                       bootstrapNodes: openarray[ENode],
                       startListening = true,
                       enableDiscovery = true)
```

The `EthereumNode` will automatically find and maintain a pool of peers
using the Ethereum node discovery protocol. You can access the pool as
`node.peers`.
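
Putting these calls together, a minimal startup sequence might look like the sketch below. `myKeys`, `myAddress`, `myChain`, and `knownBootnodes` are application-supplied placeholders, not names exported by this library:

```nim
import eth_p2p, eth_keys

# Placeholders: a KeyPair from eth_keys, the Address to listen on,
# an AbstractChainDB instance, and a seq[ENode] of bootstrap nodes.
var node = newEthereumNode(myKeys, myAddress,
                           networkId = 1, chain = myChain)

# Starts listening and begins discovering and connecting to peers.
node.connectToNetwork(knownBootnodes)
echo "peers in pool: ", node.peers.len
```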

## Communicating with Peers using RLPx

[RLPx](https://github.com/ethereum/devp2p/blob/master/rlpx.md) is the
high-level protocol for exchanging messages between peers in the Ethereum
network. Most of the client code of this library should not be concerned
with the implementation details of the underlying protocols and should use
the high-level APIs described in this section.

The RLPx protocols are defined as a collection of strongly-typed messages,
which are grouped into sub-protocols multiplexed over the same TCP connection.

This library represents each such message as a regular Nim function call
over the `Peer` object. Certain messages act only as notifications, while
others fit the request/response pattern.

To understand more about how messages are defined and used, let's look at
the definition of an RLPx protocol:

### RLPx sub-protocols

The sub-protocols are defined with the `p2pProtocol` macro, which accepts
a 3-letter identifier for the protocol and the current protocol version.

Here is how the [DevP2P wire protocol](https://github.com/ethereum/wiki/wiki/%C3%90%CE%9EVp2p-Wire-Protocol) might look:

``` nim
p2pProtocol p2p(version = 0):
  proc hello(peer: Peer,
             version: uint,
             clientId: string,
             capabilities: openarray[Capability],
             listenPort: uint,
             nodeId: P2PNodeId) =
    peer.id = nodeId

  proc disconnect(peer: Peer, reason: DisconnectionReason)

  proc ping(peer: Peer) =
    await peer.pong()

  proc pong(peer: Peer) =
    echo "received pong from ", peer.id
```

As seen in the example above, a protocol definition determines both the
available messages that can be sent to another peer (e.g. as in `peer.pong()`)
and the asynchronous code responsible for handling the incoming messages.

### Protocol state

The protocol implementations are expected to maintain a state and to act
like a state machine handling the incoming messages. You are allowed to
define an arbitrary state type that can be specified in the `peerState`
protocol option. Later, instances of the state object can be obtained
through the `state` pseudo-field of the `Peer` object:

``` nim
type AbcPeerState = object
  receivedMsgsCount: int

p2pProtocol abc(version = 1,
                peerState = AbcPeerState):

  proc incomingMessage(p: Peer) =
    p.state.receivedMsgsCount += 1
```

Besides the per-peer state demonstrated above, there is also support
for maintaining a network-wide state. It's enabled by specifying the
`networkState` option of the protocol, and the state object can be obtained
through an accessor of the same name.

The state objects are initialized to zero by default, but you can modify
this behaviour by overriding the following procs for your state types:

```nim
proc initProtocolState*(state: MyPeerState, p: Peer)
proc initProtocolState*(state: MyNetworkState, n: EthereumNode)
```

Sometimes, you'll need to access the state of another protocol.
To do this, specify the protocol identifier to the `state` accessors:

``` nim
echo "ABC protocol messages: ", peer.state(abc).receivedMsgsCount
```

While the state machine approach may be a particularly robust way of
implementing sub-protocols (it is more amenable to proving the correctness
of the implementation through formal verification methods), sometimes it may
be more convenient to use a more imperative style of communication, where the
code is able to wait for a particular response after sending a particular
request. The library provides two mechanisms for achieving this:

### Waiting for particular messages with `nextMsg`

The `nextMsg` helper proc can be used to pause the execution of an async
proc until a particular incoming message from a peer arrives:

``` nim
proc handshakeExample(peer: Peer) {.async.} =
  ...
  # send a hello message
  peer.hello(...)

  # wait for a matching hello response
  let response = await peer.nextMsg(p2p.hello)
  echo response.clientId # print the name of the Ethereum client
                         # used by the other peer (Geth, Parity, Nimbus, etc)
```

There are a few things to note in the above example:

1. The `p2pProtocol` definition created a pseudo-variable named after the
   protocol, holding various properties of the protocol.

2. Each message defined in the protocol received a corresponding type name,
   matching the message name (e.g. `p2p.hello`). This type will have fields
   matching the parameter names of the message. If the message has `openarray`
   params, these will be remapped to `seq` types.

If the designated message also has an attached handler, the future returned
by `nextMsg` will be resolved only after the handler has been fully executed
(so you can count on any side effects produced by the handler to have taken
place). If there are multiple outstanding calls to `nextMsg`, they will
complete together. Any other messages received in the meantime will still
be dispatched to their respective handlers.

### `requestResponse` pairs

``` nim
p2pProtocol les(version = 2):
  ...

  requestResponse:
    proc getProofs(p: Peer, proofs: openarray[ProofRequest])
    proc proofs(p: Peer, BV: uint, proofs: openarray[Blob])

  ...
```

Two or more messages within the protocol may be grouped into a
`requestResponse` block. The last message in the group is assumed
to be the response, while all other messages are considered requests.

When a request message is sent, the return type will be a `Future`
that will be completed once the response is received. Please note
that there is a mandatory timeout parameter, so the actual return
type is `Future[Option[MessageType]]`. The `timeout` parameter can
be specified for each individual call, and the default value can be
overridden at the level of an individual message or of the entire protocol:

``` nim
p2pProtocol abc(version = 1,
                useRequestIds = false,
                timeout = 5000): # value in milliseconds
  requestResponse:
    proc myReq(dataId: int, timeout = 3000)
    proc myRes(data: string)
```

By default, the library will take care of inserting a hidden `reqId`
parameter as used in the [LES protocol](https://github.com/zsfelfoldi/go-ethereum/wiki/Light-Ethereum-Subprotocol-%28LES%29),
but you can disable this behavior by overriding the protocol setting
`useRequestIds`.
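
A call site for the `myReq`/`myRes` pair above might look like the following sketch. Since the response future yields an `Option`, a timed-out request can be detected with `isSome` (the surrounding proc and the `data` field access are assumptions based on the definitions above):

```nim
proc fetchData(peer: Peer) {.async.} =
  # Awaits the response or times out after one second; the result is
  # an Option[abc.myRes], so `none` signals a timeout.
  let res = await peer.myReq(dataId = 1, timeout = 1000)
  if res.isSome:
    echo "received: ", res.get.data
  else:
    echo "request timed out"
```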

### Implementing handshakes and reacting to other events

Besides message definitions and implementations, a protocol specification may
also include handlers for certain important events, such as newly connected
peers or misbehaving or disconnecting peers:

``` nim
p2pProtocol les(version = 2):
  onPeerConnected do (peer: Peer):
    asyncCheck peer.status [
      "networkId": rlp.encode(1),
      "keyGenesisHash": rlp.encode(peer.network.chain.genesisHash)
      ...
    ]

    let otherPeerStatus = await peer.nextMsg(les.status)
    ...

  onPeerDisconnected do (peer: Peer, reason: DisconnectionReason):
    debug "peer disconnected", peer
```

### Checking the other peer's supported sub-protocols

Upon establishing a connection, RLPx will automatically negotiate the list of
sub-protocols mutually supported by the peers. To check whether a particular peer
supports a particular sub-protocol, use the following code:

``` nim
if peer.supports(les): # `les` is the identifier of the light client sub-protocol
  peer.getReceipts(nextReqId(), neededReceipts())
```

# rlp

## Introduction

A Nim implementation of the Recursive Length Prefix encoding (RLP) as specified
in Ethereum's [Yellow Paper](https://ethereum.github.io/yellowpaper/paper.pdf)
and [Wiki](https://github.com/ethereum/wiki/wiki/RLP).
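
For short payloads (under 56 bytes), the RLP rules referenced above can be sketched in a few lines. This is written from the specification for illustration only; the library's `encode` proc covers all cases, including long payloads and arbitrary Nim types:

```nim
# Minimal RLP for short items only (< 56 bytes), per the spec:
# a single byte below 0x80 encodes as itself; a short string gets
# a 0x80+len prefix; a short list gets a 0xc0+len prefix.
proc rlpStr(s: string): string =
  if s.len == 1 and ord(s[0]) < 0x80:
    s
  else:
    char(0x80 + s.len) & s

proc rlpList(items: varargs[string]): string =
  var payload = ""
  for it in items:
    payload.add rlpStr(it)
  char(0xc0 + payload.len) & payload

assert rlpStr("dog") == "\x83dog"
assert rlpList("cat", "dog") == "\xc8\x83cat\x83dog"
```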

## Reading RLP data

The `Rlp` type provided by this library represents a cursor over an RLP-encoded
byte stream. Before instantiating such a cursor, you must convert your
input data to a `BytesRange` value provided by the [nim-ranges][RNG] library,
which represents an immutable and thus cheap-to-copy sub-range view over an
underlying `seq[byte]` instance:

[RNG]: https://github.com/status-im/nim-ranges

``` nim
proc rlpFromBytes*(data: BytesRange): Rlp
```

### Streaming API

Once created, the `Rlp` object will offer procs such as `isList`, `isBlob`,
`getType`, `listLen`, `blobLen` to determine the type of the value under
the cursor. The contents of blobs can be extracted with procs such as
`toString`, `toBytes` and `toInt` without advancing the cursor.

Lists can be traversed with the standard `items` iterator, which will advance
the cursor to each sub-item position and yield the `Rlp` object at that point.
As an alternative, `listElem` can return a new `Rlp` object adjusted to a
particular sub-item position without advancing the original cursor.
Keep in mind that copying `Rlp` objects is cheap and you can create as many
cursors pointing to different positions in the RLP stream as necessary.

`skipElem` will advance the cursor to the next position in the current list.
`hasData` will indicate whether there are any more bytes in the stream that can
be consumed.

Another way to extract data from the stream is through the universal `read`
proc that accepts a type as a parameter. You can pass any supported type
such as `string`, `int`, `seq[T]`, etc, including composite user-defined
types (see [Object Serialization](#object-serialization)). The cursor
will be advanced just past the end of the consumed object.
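
Putting the streaming procs together, inspecting an encoded value might look like the following sketch (assuming `encoded` is a `BytesRange` produced elsewhere, e.g. by `encode`):

```nim
var rlp = rlpFromBytes(encoded)
if rlp.isList:
  # each iteration yields a cursor positioned at one sub-item
  for item in rlp:
    echo item.inspect
else:
  # a blob can be consumed directly; the cursor advances past it
  let s = rlp.read(string)
  echo s
```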

The `toXX` and `read` family of procs may raise a `RlpTypeMismatch` in case
of type mismatch with the stream contents under the cursor. A corrupted
RLP stream or an attempt to read past the stream end will be signaled
with the `MalformedRlpError` exception. If the RLP stream includes data
that cannot be processed on the current platform (e.g. an integer value
that is too large), the library will raise an `UnsupportedRlpError` exception.

### DOM API

Calling `Rlp.toNodes` at any position within the stream will return a tree
of `RlpNode` objects representing the collection of values beginning at that
position:

``` nim
type
  RlpNodeType* = enum
    rlpBlob
    rlpList

  RlpNode* = object
    case kind*: RlpNodeType
    of rlpBlob:
      bytes*: BytesRange
    of rlpList:
      elems*: seq[RlpNode]
```

As a short-cut, you can also call `decode` directly on a byte sequence to
avoid creating a `Rlp` object when obtaining the nodes.
For debugging purposes, you can also create a human-readable representation
of the RLP nodes by calling the `inspect` proc:

``` nim
proc inspect*(self: Rlp, indent = 0): string
```

## Creating RLP data

The `RlpWriter` type can be used to encode RLP data. Instances are created
with the `initRlpWriter` proc. This should be followed by one or more calls
to `append`, which is overloaded to accept arbitrary values. Finally, you can
call `finish` to obtain the final `BytesRange`.

If the end result should be an RLP list of a particular length, you can replace
the initial call to `initRlpWriter` with `initRlpList(n)`. Calling `finish`
before writing a sufficient number of elements will then result in a
`PrematureFinalizationError`.

As an alternative short-cut, you can also call `encode` on an arbitrary value
(including sequences and user-defined types) to execute all of the steps at
once and directly obtain the final RLP bytes. `encodeList(varargs)` is another
short-cut for creating RLP lists.

## Object serialization

As previously explained, generic procs such as `read`, `append`, `encode` and
`decode` can be used with arbitrary user-defined object types. By default, the
library will serialize all of the fields of the object using the `fields`
iterator, but you can also include only a subset of the fields or modify the
order of serialization by employing the `rlpIgnore` pragma or by using the
`rlpFields` macro:

``` nim
macro rlpFields*(T: typedesc, fields: varargs[untyped])

## example usage:

type
  Transaction = object
    amount: int
    time: DateTime
    sender: string
    receiver: string

rlpFields Transaction,
  sender, receiver, amount

...

var t1 = rlp.read(Transaction)
var bytes = encode(t1)
var t2 = bytes.decode(Transaction)
```

By default, sub-fields within objects are wrapped in RLP lists. You can avoid this
behavior by adding the custom pragma `rlpInline` on a particular field. In rare
circumstances, you may need to serialize the same field type differently depending
on the enclosing object type. You can use the `rlpCustomSerialization` pragma to
achieve this.

## Contributing / Testing

To test the correctness of any modifications to the library, please execute
`nimble test` at the root of the repo.

# nim-trie

Nim Implementation of the Ethereum Trie structure
---

## Hexary Trie

## Binary Trie

The Binary-trie is a dictionary-like data structure used to store key-value pairs.
Much like its sibling the Hexary-trie, the key-value pairs are stored in a key-value flat-db.
The primary difference from the Hexary-trie is that each node of a Binary-trie has only one or two children,
while a Hexary-trie node can contain up to 16 or 17 child-nodes.

Unlike the Hexary-trie, the Binary-trie stores its data in the flat-db without using RLP encoding.
The Binary-trie stores its values using a simple **Node-Types** encoding.
The encoded node is hashed with keccak_256, and the hash value becomes the key into the flat-db.
Each entry in the flat-db looks like:

| key                  | value                                      |
|----------------------|--------------------------------------------|
| 32-bytes-keccak-hash | encoded-node(KV or BRANCH or LEAF encoded) |

### Node-Types
* KV = [0, encoded-key-path, 32 bytes hash of child]
* BRANCH = [1, 32 bytes hash of left child, 32 bytes hash of right child]
* LEAF = [2, value]

A KV node can have a BRANCH node or a LEAF node as its child, but not another KV node:
the internal algorithm will merge a KV(parent)->KV(child) pair into one KV node.
Every KV node contains an encoded keypath to reduce the number of blank nodes.

A BRANCH node can have KV, BRANCH, or LEAF nodes as its children.

The LEAF node is a terminal node; it contains the value for a key.
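
The Node-Types layout above can be sketched as follows. This is an illustration of the structure only; the library's actual byte-level format may differ in details:

```nim
type BinNodeKind = enum
  kvNode = 0, branchNode = 1, leafNode = 2

# LEAF = [2, value]
proc encodeLeaf(value: string): string =
  char(ord(leafNode)) & value

# BRANCH = [1, hash of left child, hash of right child]
proc encodeBranch(leftHash, rightHash: string): string =
  assert leftHash.len == 32 and rightHash.len == 32
  char(ord(branchNode)) & leftHash & rightHash

# KV = [0, encoded-key-path, hash of child]; the hash occupies the
# last 32 bytes, so the variable-length path remains decodable.
proc encodeKV(encodedPath, childHash: string): string =
  assert childHash.len == 32
  char(ord(kvNode)) & encodedPath & childHash
```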

### encoded-key-path

While the Hexary-trie encodes the path using Hex-Prefix encoding, the Binary-trie
encodes the path using a binary encoding; the scheme is shown in the table below.

```text
|--------- odd --------|
00mm yyyy xxxx xxxx xxxx xxxx

     |------ even -----|
1000 00mm yyyy xxxx xxxx xxxx
```

| symbol | explanation |
|--------|--------------------------|
| xxxx   | nibble of binary keypath in bits, 0 = left, 1 = right |
| yyyy   | nibble containing 0-3 padding bits + binary keypath bits |
| mm     | number of binary keypath bits modulo 4 (0-3) |
| 00     | zero zero prefix |
| 1000   | even-numbered-nibbles prefix |

If there is no padding, the yyyy bit sequence is absent and mm is also zero.
Otherwise yyyy = padding bits + mm keypath bits, and must be 4 bits in length.

### The API

The primary API for the Binary-trie is `set` and `get`.
* set(key, value) --- _store a value associated with a key_
* get(key): value --- _get a value using a key_

Both `key` and `value` are of `BytesRange` type, and they cannot have zero length.
You can also use the convenience overloads of `get` and `set` which accept
`Bytes` or `string` (a `string` is conceptually wrong in this context
and may be costlier than a `BytesRange`, but it is good for testing purposes).

Getting a non-existent key will return a zero-length BytesRange.

The Binary-trie also provides a dictionary-syntax API for `set` and `get`:
* trie[key] = value -- same as `set`
* value = trie[key] -- same as `get`
* contains(key), a.k.a. the `in` operator

Additional APIs are:
* exists(key) -- returns `bool`, to check key-value existence -- same as contains
* delete(key) -- remove a key-value from the trie
* deleteSubtrie(key) -- remove a key-value from the trie plus all of its subtrie
  that starts with the same key prefix
* rootNode() -- get the root node
* rootNode(node) -- replace the root node
* getRootHash(): `KeccakHash` with `BytesRange` type
* getDB(): `DB` -- get the flat-db pointer

Constructor API:
* initBinaryTrie(DB, rootHash[optional]) -- rootHash has `BytesRange` or KeccakHash type
* init(BinaryTrie, DB, rootHash[optional])

Normally you would not set the rootHash when constructing an empty Binary-trie.
Setting the rootHash occurs in the scenario where you have a populated DB
with an existing trie structure, you know the rootHash,
and you want to continue/resume the trie operations.

## Examples

```Nim
import
  eth_trie/[db, binary, utils]

var db = newMemoryDB()
var trie = initBinaryTrie(db)
trie.set("key1", "value1")
trie.set("key2", "value2")
assert trie.get("key1") == "value1".toRange
assert trie.get("key2") == "value2".toRange

# delete all subtries with key prefix "key"
trie.deleteSubtrie("key")
assert trie.get("key1") == zeroBytesRange
assert trie.get("key2") == zeroBytesRange

trie["moon"] = "sun"
assert "moon" in trie
assert trie["moon"] == "sun".toRange
```

Remember, `set` and `get` are trie operations. A single `set` operation may invoke
more than one store/lookup operation on the underlying DB. The same applies to the `get` operation:
it may perform more than one flat-db lookup before it returns the requested value.

## The truth behind a lie

What kind of lie? Actually, `delete` and `deleteSubtrie` don't remove the
'deleted' nodes from the underlying DB; they only make the nodes inaccessible
to the user of the trie. The same happens when you update the value of a key:
the old value node is not removed from the underlying DB.
A more subtle lie happens when you add a new entry into the trie using the `set` operation:
the previous hash of the affected branch becomes obsolete and is replaced by a new hash,
and the node under the old hash becomes inaccessible to the user.
You may think this is a waste of storage space.
Luckily, we also provide some utilities to deal with this situation: the branch utils.

## The branch utils

The branch utils consist of these APIs:
* checkIfBranchExist(DB; rootHash; keyPrefix): bool
* getBranch(DB; rootHash; key): branch
* isValidBranch(branch, rootHash, key, value): bool
* getWitness(DB; nodeHash; key): branch
* getTrieNodes(DB; nodeHash): branch

`keyPrefix`, `key`, and `value` are byte containers with a length greater than zero.
They can be BytesRange, Bytes, or string (again, for convenience and testing purposes).

`rootHash` and `nodeHash` are also byte containers,
but with a constraint: they must be 32 bytes in length, and they must be keccak_256 hash values.

`branch` is a list of nodes, in this case a seq[BytesRange].
A list? Yes — because the structure is stored along with the encoded nodes,
a flat list is enough to reconstruct the entire trie/branch.

```Nim
import
  eth_trie/[db, binary, utils]

var db = newMemoryDB()
var trie = initBinaryTrie(db)
trie.set("key1", "value1")
trie.set("key2", "value2")

assert checkIfBranchExist(db, trie.getRootHash(), "key") == true
assert checkIfBranchExist(db, trie.getRootHash(), "key1") == true
assert checkIfBranchExist(db, trie.getRootHash(), "ken") == false
assert checkIfBranchExist(db, trie.getRootHash(), "key123") == false
```

The tree will look like:
```text
root ---> A(kvnode, *common key prefix*)
                        |
                        |
                        |
                  B(branchnode)
                   /         \
                  /           \
                 /             \
C1(kvnode, *remaining keypath*) C2(kvnode, *remaining keypath*)
        |                              |
        |                              |
        |                              |
D1(leafnode, b'value1')      D2(leafnode, b'value2')
```
|
||||||
|
|
||||||
|
```Nim
var branchA = getBranch(db, trie.getRootHash(), "key1")
# ==> [A, B, C1, D1]

var branchB = getBranch(db, trie.getRootHash(), "key2")
# ==> [A, B, C2, D2]

assert isValidBranch(branchA, trie.getRootHash(), "key1", "value1") == true
# wrong key, returns zero bytes
assert isValidBranch(branchA, trie.getRootHash(), "key5", "") == true

assert isValidBranch(branchB, trie.getRootHash(), "key1", "value1") # InvalidNode

var x = getBranch(db, trie.getRootHash(), "key")
# ==> [A]

x = getBranch(db, trie.getRootHash(), "key123") # InvalidKeyError
x = getBranch(db, trie.getRootHash(), "key5") # there is still a branch for a non-existent key
# ==> [A]

var branch = getWitness(db, trie.getRootHash(), "key1")
# equivalent to `getBranch(db, trie.getRootHash(), "key1")`
# ==> [A, B, C1, D1]

branch = getWitness(db, trie.getRootHash(), "key")
# this will also include the nodes of "key2"
# ==> [A, B, C1, D1, C2, D2]

var wholeTrie = getWitness(db, trie.getRootHash(), "")
# this will return the whole trie
# ==> [A, B, C1, D1, C2, D2]

var node = branch[1] # B
let nodeHash = keccak256.digest(node.baseAddr, uint(node.len))
var nodes = getTrieNodes(db, nodeHash)
assert nodes.len == wholeTrie.len - 1
# ==> [B, C1, D1, C2, D2]
```
## Remember the lie?

Because the trie `delete`, `deleteSubtrie`, and `set` operations create inaccessible nodes in the underlying DB,
we may need to remove them. We have already seen that `wholeTrie = getWitness(db, trie.getRootHash(), "")`
returns the whole trie: a list of accessible nodes.
We can then write this clean trie into a new DB instance to replace the old one.
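The clean-up idea above can be sketched in a few lines. This is a minimal illustration, not the eth_trie API: `compact`, `children`, and the toy node encoding are hypothetical, and `sha3_256` stands in for keccak256.

```python
import hashlib

def h(data: bytes) -> bytes:
    # sha3_256 stands in here for Ethereum's keccak256
    return hashlib.sha3_256(data).digest()

def compact(old_db: dict, root_hash: bytes, children) -> dict:
    """Copy only the nodes reachable from root_hash into a fresh DB."""
    new_db, stack = {}, [root_hash]
    while stack:
        node_hash = stack.pop()
        if node_hash in new_db:
            continue
        node = old_db[node_hash]
        new_db[node_hash] = node       # accessible nodes survive
        stack.extend(children(node))   # garbage is simply never visited
    return new_db

# Toy node encoding: b"L" + value for leaves,
# b"B" + concatenated 32-byte child hashes for branches.
def children(node: bytes) -> list:
    if node[:1] != b"B":
        return []
    body = node[1:]
    return [body[i:i + 32] for i in range(0, len(body), 32)]

leaf1, leaf2 = b"L" + b"value1", b"L" + b"value2"
branch = b"B" + h(leaf1) + h(leaf2)
orphan = b"L" + b"inaccessible"       # left behind by an earlier delete/set
old_db = {h(branch): branch, h(leaf1): leaf1, h(leaf2): leaf2, h(orphan): orphan}

new_db = compact(old_db, h(branch), children)
print(len(old_db), len(new_db))   # 4 3: the orphan node is gone
```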
## Sparse Merkle Trie

The Sparse Merkle Trie (SMT) is a variant of the Binary Trie that uses a binary encoding to
represent the path during trie traversal. While the Binary Trie uses three types of node,
the SMT uses only one type of node, without any additional special encoding to store its key-path.

In fact, unlike the Binary Trie, it does not store its key-path anywhere;
the key-path is implicit in the trie structure built up during key-value insertion.

Because the key-path is not encoded in any special way, its bits can be extracted directly from
the key without any conversion.

However, the key is restricted to a fixed length, because the algorithm demands a fixed-height trie
to work properly. In this case, the trie height is limited to 160 levels,
i.e. the key has a fixed length of 20 bytes (8 bits x 20 = 160).

To support variable-length keys, the algorithm can be adapted slightly by hashing the key before
constructing the binary key-path. For example, using keccak256 as the hash function,
the height of the trie becomes 256, but the key itself can be of any length.
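To make the key-path idea concrete, here is a small sketch. It is illustrative only: `path_bits` and `hashed_path_bits` are not part of the library, and `sha3_256` stands in for keccak256.

```python
import hashlib

def path_bits(key: bytes) -> list:
    # A fixed 20-byte key yields exactly 160 bits, one per trie level;
    # the bits come straight from the key with no extra encoding.
    assert len(key) == 20, "the fixed-height SMT demands a fixed-length key"
    return [(byte >> (7 - i)) & 1 for byte in key for i in range(8)]

def hashed_path_bits(key: bytes) -> list:
    # Hashing the key first permits any key length; the trie height then
    # equals the digest size in bits. sha3_256 stands in for keccak256.
    digest = hashlib.sha3_256(key).digest()
    return [(byte >> (7 - i)) & 1 for byte in digest for i in range(8)]

print(len(path_bits(b"01234567890123456789")))      # 160
print(len(hashed_path_bits(b"any length at all")))  # 256
```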
### The API

The primary API for the Sparse Merkle Trie is `set` and `get`.

* set(key, value, rootHash[optional]) --- _store a value associated with a key_
* get(key, rootHash[optional]): value --- _get a value using a key_

Both `key` and `value` are of `BytesRange` type, and they cannot have zero length.
You can also use the convenience `get` and `set` overloads that accept
`Bytes` or `string` (a `string` is conceptually wrong in this context
and may be costlier than a `BytesRange`, but it is convenient for testing).

`rootHash` is an optional parameter. When given, `get` will read the key from that specific root,
and `set` will likewise apply the key at that specific root.

Getting a non-existent key returns a zero-length `BytesRange`, or a `zeroBytesRange`.

The Sparse Merkle Trie also provides a dictionary-syntax API for `set` and `get`:

* trie[key] = value -- same as `set`
* value = trie[key] -- same as `get`
* contains(key), a.k.a. the `in` operator

Additional APIs are:

* exists(key) -- returns `bool`, to check key-value existence -- same as `contains`
* delete(key) -- remove a key-value pair from the trie
* getRootHash(): `KeccakHash` with `BytesRange` type
* getDB(): `DB` -- get the flat-db pointer
* prove(key, rootHash[optional]): proof -- useful for merkling

Constructor API:

* initSparseBinaryTrie(DB, rootHash[optional])
* init(SparseBinaryTrie, DB, rootHash[optional])

Normally you would not set the rootHash when constructing an empty Sparse Merkle Trie.
Setting the rootHash is for the scenario where you have a populated DB
with an existing trie structure, you know the rootHash,
and you want to continue/resume trie operations.
## Examples

```Nim
import
  eth_trie/[db, sparse_binary, utils]

var
  db = newMemoryDB()
  trie = initSparseBinaryTrie(db)

let
  key1 = "01234567890123456789"
  key2 = "abcdefghijklmnopqrst"

trie.set(key1, "value1")
trie.set(key2, "value2")
assert trie.get(key1) == "value1".toRange
assert trie.get(key2) == "value2".toRange

trie.delete(key1)
assert trie.get(key1) == zeroBytesRange

trie.delete(key2)
assert trie[key2] == zeroBytesRange
```
Remember, `set` and `get` are trie operations. A single `set` may perform
more than one store/lookup in the underlying DB, and the same holds for `get`:
it may perform more than one flat-db lookup before returning the requested value.
While the Binary Trie performs a variable number of lookups and stores, the Sparse Merkle Trie
performs a constant number of lookups and stores per `get` and `set` operation.
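The constant-cost claim can be illustrated with a toy fixed-height trie. This is a sketch, not the library's implementation: `smt_get` and the node layout are hypothetical, and `sha3_256` stands in for keccak256.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha3_256(data).digest()

def smt_get(db, root_hash, bits):
    # Follow one bit per level: the number of DB reads equals the trie
    # height, no matter how many keys are stored.
    node_hash, reads = root_hash, 0
    for bit in bits:
        left, right = db[node_hash]   # one flat-db read per level
        reads += 1
        node_hash = right if bit else left
    return node_hash, reads

# Build a toy height-2 trie over 2-bit keys holding four values.
db = {}
leaves = [h(v) for v in (b"a", b"b", b"c", b"d")]

def node(left: bytes, right: bytes) -> bytes:
    node_hash = h(left + right)
    db[node_hash] = (left, right)
    return node_hash

root = node(node(leaves[0], leaves[1]), node(leaves[2], leaves[3]))

print(smt_get(db, root, [1, 0]) == (leaves[2], 2))  # True: always 2 reads
```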
## Merkle Proofing

Using the `prove` and `verifyProof` APIs, we can do some merkling with the SMT.

```Nim
let
  value1 = "hello world"
  badValue = "bad value"

trie[key1] = value1
var proof = trie.prove(key1)

assert verifyProof(proof, trie.getRootHash(), key1, value1) == true
assert verifyProof(proof, trie.getRootHash(), key1, badValue) == false
assert verifyProof(proof, trie.getRootHash(), key2, value1) == false
```
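Under the hood, verifying an SMT proof amounts to rehashing from the leaf up to the root, combining in one sibling hash per level. The sketch below shows the idea only; it is not the library's actual node or proof encoding, and `sha3_256` stands in for keccak256.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha3_256(data).digest()

def key_bits(key: bytes) -> list:
    return [(b >> (7 - i)) & 1 for b in key for i in range(8)]

def verify_proof(proof, root, key, value) -> bool:
    # proof holds one sibling hash per level, top-down; rebuild the
    # root bottom-up and compare it with the expected root.
    node = h(value)
    for bit, sibling in zip(reversed(key_bits(key)), reversed(proof)):
        node = h(sibling + node) if bit == 1 else h(node + sibling)
    return node == root

# Toy single-entry trie over a 1-byte key (height 8): every sibling on
# the path is the hash of an empty subtree of the matching height.
empty = [h(b"")]
for _ in range(7):
    empty.append(h(empty[-1] + empty[-1]))

key, value = b"\xa5", b"hello world"
node, siblings = h(value), []
for level, bit in enumerate(reversed(key_bits(key))):
    sibling = empty[level]
    siblings.append(sibling)
    node = h(sibling + node) if bit == 1 else h(node + sibling)
root, proof = node, list(reversed(siblings))   # proof is top-down

print(verify_proof(proof, root, key, value))          # True
print(verify_proof(proof, root, key, b"bad value"))   # False
```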