Experimental MP-trie (#1573)

* Experimental MP-trie

why:
  Deleting records is infeasible with the current structure

* Added vertex ID recycling management

Todo:
  Provide some unit tests

* DB layout update

why:
  The main change is the separation of `Merkle` hashes into an extra table.

details:
  The code fragments cover conversion between compact MPT records and
  Aristo DB records as well as some rudimentary cache handling for
  the `Merkle` hashes (i.e. the extra table entries.)

todo:
  Add some simple unit tests for the descriptor record (currently used
  for vertex ID management only.)

* Updated vertex ID recycling management

details:
  added simple unit tests (mainly testing ABI)

* docu update
Jordan Hrycaj 2023-05-11 15:25:29 +01:00 committed by GitHub
parent 2871dbfddf
commit 605739ef4c
11 changed files with 1772 additions and 0 deletions

nimbus/db/aristo/.gitignore

@ -0,0 +1 @@
README.html

nimbus/db/aristo/README.md

@ -0,0 +1,290 @@
Aristo Trie -- a Patricia Trie with Merkle hash labeled edges
=============================================================
These data structures allow overlaying the *Patricia Trie* with *Merkle
Trie* hashes. With a particular layout, the structure is called an
*Aristo Trie* (Patricia = Roman Aristocrat, Patrician.)
This description assumes familiarity with the abstract notion of a hexary
*Merkle Patricia [Trie](https://en.wikipedia.org/wiki/Trie)*. Suffice it to
say that the state of a valid *Merkle Patricia Tree* is uniquely verified by
its top level vertex.
1. Deleting entries in a compact *Merkle Patricia Tree*
-------------------------------------------------------
The main feature of the *Aristo Trie* representation is that no node is used
twice in any sub-trie, as happens with the representation as a
[compact Merkle Patricia Tree](http://archive.is/TinyK). For example,
consider the following state data for the latter.
leaf = (0xf,0x12345678) (1)
branch = (a,a,a,,, ..) with a = hash(leaf)
root = hash(branch)
These two nodes, called *leaf* and *branch*, together with the *root* hash
represent a state (i.e. a set of key-value pairs) as a *compact Merkle
Patricia Tree*. The actual state is
0x0f ==> 0x12345678
0x1f ==> 0x12345678
0x2f ==> 0x12345678
The elements from *(1)* can be organised in a key-value table with the *Merkle*
hashes as lookup keys
a -> leaf
root -> branch
This is a space efficient way of keeping data as there is no duplication of
the sub-trees made up by the *Leaf* node with the same payload *0x12345678*
and path snippet *0xf*. One can imagine how this property applies to more
general sub-trees in a similar fashion.
Now delete some key-value pair of the state, e.g. for the key *0x0f*. This
amounts to removing the first of the three *a* hashes from the *branch*
record. The new state of the *Merkle Patricia Tree* will look like
leaf = (0xf,0x12345678) (2)
branch1 = (,a,a,,, ..)
root1 = hash(branch1)
a -> leaf
root1 -> branch1
A problem arises when all keys referencing the *leaf* data record have been
deleted: one has to find out, in general, when the record itself can be
deleted, too. It might be unknown whether the previous states leading up to
this point had only a single *Branch* record referencing this *leaf* data
record. Finding a stale data record can be achieved by a *mark and sweep*
algorithm, but that becomes too clumsy to be useful on a large state (i.e.
database). Reference counts come to mind, but maintaining these is generally
error prone when actors concurrently manipulate the state (i.e. database).
2. *Patricia Trie* example with *Merkle hash* labelled edges
------------------------------------------------------------
Continuing with the example from chapter 1, the *branch* node is extended by
an additional set of structural identifiers *w, x, y, z*. This allows handling
the deletion of entries in a more benign way while keeping the *Merkle hashes*
for validating sub-trees.
A solution for the deletion problem is to represent the situation *(1)* as
leaf-a = (0xf,0x12345678) copy of leaf from (1) (3)
leaf-b = (0xf,0x12345678) copy of leaf from (1)
leaf-c = (0xf,0x12345678) copy of leaf from (1)
branch2 = ((x,y,z,,, ..)(a,b,c,,, ..))
root2 = (w,root) with root from (1)
where
a = hash(leaf-a) same as a from (1)
b = hash(leaf-b) same as a from (1)
c = hash(leaf-c) same as a from (1)
w,x,y,z numbers, mutually different
The records above are stored in a key-value database as
w -> branch2
x -> leaf-a
y -> leaf-b
z -> leaf-c
Then this structure encodes the key-value pairs as before
0x0f ==> 0x12345678
0x1f ==> 0x12345678
0x2f ==> 0x12345678
Deleting the data for key *0x0f* now results in the new state
leaf-b = (0xf,0x12345678) (4)
leaf-c = (0xf,0x12345678)
branch3 = ((,y,z,,, ..)(,b,c,,, ..))
w -> branch3
y -> leaf-b
z -> leaf-c
Due to the duplication of the *leaf* node in *(3)*, no reference count is
needed in order to detect stale records cleanly when deleting key *0x0f*.
Removing this key allows removing hash *a* from *branch2* as well as the
structural key *x*, which is consequently deleted from the lookup table.
A minor observation is that when manipulating a state entry, e.g. changing the
payload associated with key *0x0f* to
0x0f ==> 0x987654321
the structural layout of the above trie will not change, that is, the indexes
*w, x, y, z* of the table that holds the data records as values stay the same.
Only the values change.
leaf-d = (0xf,0x987654321) (5)
leaf-b = (0xf,0x12345678)
leaf-c = (0xf,0x12345678)
branch4 = ((x,y,z,,, ..)(d,b,c,,, ..))
root4 = (w,hash(d,b,c,,, ..))
3. Discussion of the examples *(1)* and *(3)*
---------------------------------------------
Examples *(1)* and *(3)* differ in that the structural *Patricia Trie*
information from *(1)* has been removed from the *Merkle hash* instances and
implemented as separate table lookup IDs (called *vertexID*s later on.) The
values of these lookup IDs are arbitrary as long as they are all different.
In fact, the [Erigon](http://archive.is/6MJV7) project discusses a similar
situation in **Separation of keys and the structure**, albeit aiming at
another scenario with the goal of using mostly flat data lookup structures.
A graph for the example *(1)* would look like

            |
           root
            |
      +-------------+
      |   branch    |
      +-------------+
        |    |    |
        a    a    a
        |    |    |
           leaf
while example *(3)* has

          (root)                                  (6)
            |
            w
            |
      +-------------+
      |   branch2   |
      | (a) (b) (c) |
      +-------------+
        /    |    \
       x     y     z
      /      |      \
   leaf-a  leaf-b  leaf-c
The labels on the edges indicate the downward target of an edge while the
round brackets enclose separated *Merkle hash* information.
This last example *(6)* can be completely split into a structural trie and a
Merkle hash mapping.

     structural trie                 hash map             (7)
     ---------------                 --------
           |                         (root) -> w
           w                         (a)    -> x
           |                         (b)    -> y
     +-------------+                 (c)    -> z
     |   branch2   |
     +-------------+
       /    |    \
      x     y     z
     /      |      \
  leaf-a  leaf-b  leaf-c
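For readers who prefer code: the split in *(7)* boils down to two key-value
tables, one holding the structural vertices and one holding the *Merkle hash*
labels. A minimal Nim sketch of that idea (the table names `sTab` and `kMap`
match the fields of the `AristoDbRef` descriptor in `aristo_desc.nim`; the
vertex payload is simplified to a string for illustration):

    import std/tables

    type
      VertexID = uint64            # structural key: w, x, y, z in (7)
      NodeKey  = array[32, byte]   # Merkle hash label

    var
      sTab: Table[VertexID, string]   # structural trie: vertexID -> vertex
      kMap: Table[VertexID, NodeKey]  # hash map:        vertexID -> Merkle hash

    # The structural part is usable on its own ...
    sTab[1'u64] = "branch2"
    sTab[2'u64] = "leaf-a"
    # ... while the Merkle labels can be attached (or re-computed) separately.
    kMap[2'u64] = default(array[32, byte])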
4. *Patricia Trie* node serialisation with *Merkle hash* labelled edges
-----------------------------------------------------------------------
The data structure for the *Aristo Trie* follows example *(7)* by keeping
structural information separate from the Merkle hash labels. As for terminology,
* an *Aristo Trie* is a pair *(structural trie, hash map)* where
* the *structural trie* realises a hexary *Patricia Trie* containing the payload
values in the leaf records
* the *hash map* contains the hash information so that this trie operates as a
*Merkle Patricia Tree*.
In order to accommodate the additional structural elements, a non RLP-based
data layout is used for the *Branch*, *Extension*, and *Leaf* containers stored
in the key-value table that implements the *Patricia Trie*. With this
particular data layout, the structure is called an *Aristo Trie*.
The structural keys *w, x, y, z* from the example *(3)* are called *vertexID*s
and implemented as 64-bit values, stored *Big Endian* in the serialisation.
### Branch record serialisation
0 +--+--+--+--+--+--+--+--+--+
| | -- first vertexID
8 +--+--+--+--+--+--+--+--+--+
... -- more vertexIDs
+--+--+
| | -- access(16) bitmap
+--+--+
|| | -- marker(2) + unused(6)
+--+
where
marker(2) is the double bit array 00
For a given index *n* between *0..15*, if the bit at position *n* of the bit
vector *access(16)* is reset to zero, then there is no *n*-th structural
*vertexID*. Otherwise one calculates
the n-th vertexID is at position Vn * 8
where Vn is the number of set bits in the positions 0..(n-1) of access(16)
Note that data are stored *Big Endian*, so the bits *0..7* of *access* are
stored in the right byte of the serialised bitmap.
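A decoder would locate the *n*-th *vertexID* by counting the set bits below
position *n*. A minimal sketch of that calculation (illustrative only; the
actual decoder lives in `aristo_transcode.nim`):

    import std/bitops

    proc vtxIdOffset(access: uint16; n: int): int =
      ## Byte offset of the n-th vertexID in a serialised Branch record,
      ## or -1 if the access bitmap has no entry at position n.
      if not access.testBit(n):
        return -1
      8 * countSetBits(access and ((1u16 shl n) - 1))

    # bits 0 and 5 set: the vertexID for index 5 starts at byte offset 8
    doAssert vtxIdOffset(0b0000_0000_0010_0001'u16, 5) == 8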
### Extension record serialisation
0 +--+--+--+--+--+--+--+--+--+
| | -- vertexID
8 +--+--+--+--+--+--+--+--+--+
| | ... -- path segment
+--+
|| | -- marker(2) + pathSegmentLen(6)
+--+
where
marker(2) is the double bit array 10
The path segment of the *Extension* record is compact encoded. So it has at
least one byte. The first byte *P0* has bit 5 reset, i.e. *P0 and 0x20* is
zero (bit 4 is set if the right nibble is the first part of the path.)
Note that the *pathSegmentLen(6)* is redundant as it is determined by the length
of the extension record (as *recordLen - 9*.)
### Leaf record serialisation
0 +-- ..
... -- payload (may be empty)
+--+
| | ... -- path segment
+--+
|| | -- marker(2) + pathSegmentLen(6)
+--+
where
marker(2) is the double bit array 11
A *Leaf* record path segment is compact encoded. So it has at least one byte.
The first byte *P0* has bit 5 set, i.e. *P0 and 0x20* is non-zero (bit 4 is
also set if the right nibble is the first part of the path.)
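As a worked example of the first path byte *P0*, take the single nibble path
*0xf* (odd length) from the examples above:

    Extension: P0 = 0x1f   -- bit 5 clear (0x1f and 0x20 == 0), bit 4 set (odd length)
    Leaf:      P0 = 0x3f   -- bit 5 set   (0x3f and 0x20 != 0), bit 4 set (odd length)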
### Descriptor record serialisation
0 +-- ..
... -- recycled vertexIDs
+--+--+--+--+--+--+--+--+--+
| | -- bottom of unused vertexIDs
+--+--+--+--+--+--+--+--+--+
|| | -- marker(2) + unused(6)
+--+
where
marker(2) is the double bit array 01
Currently, the descriptor record only contains data for producing unique
vertexID values that can be used as structural keys. If this descriptor is
missing, the value `(0x40000000,0x01)` is assumed. The last vertexID in the
descriptor list has the property that all values greater than or equal to
this value can be used as vertexID.
The vertexIDs in the descriptor record must all be non-zero and the record
itself should be stored in the structural table under the zero key.
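As a worked example, a descriptor holding the single recycled *vertexID* *5*
and the bottom-of-unused value *9* serialises to 17 bytes:

    0x0000000000000005   -- recycled vertexID 5
    0x0000000000000009   -- bottom of unused vertexIDs
    0x40                 -- marker(2) = 01, unused(6)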


@ -0,0 +1,176 @@
# nimbus-eth1
# Copyright (c) 2021 Status Research & Development GmbH
# Licensed under either of
# * Apache License, version 2.0, ([LICENSE-APACHE](LICENSE-APACHE) or
# http://www.apache.org/licenses/LICENSE-2.0)
# * MIT license ([LICENSE-MIT](LICENSE-MIT) or
# http://opensource.org/licenses/MIT)
# at your option. This file may not be copied, modified, or distributed
# except according to those terms.
{.push raises: [].}
import
std/tables,
eth/common,
stew/results,
../../sync/snap/range_desc,
"."/[aristo_desc, aristo_error, aristo_transcode]
# ------------------------------------------------------------------------------
# Private helpers
# ------------------------------------------------------------------------------
proc convertPartially(
db: AristoDbRef;
vtx: VertexRef;
nd: var NodeRef;
): seq[VertexID] =
## Convert the argument vertex to a node, filling in the `Merkle` hashes
## from the cache where available. This function does not recurse. It
## returns the vertex IDs that are still missing for completing the
## conversion in a single step (an empty list means fully converted.)
case vtx.vType:
of Leaf:
nd = NodeRef(
vType: Leaf,
lPfx: vtx.lPfx,
lData: vtx.lData)
of Extension:
nd = NodeRef(
vType: Extension,
ePfx: vtx.ePfx,
eVtx: vtx.eVtx)
db.kMap.withValue(vtx.eVtx, keyPtr):
nd.key[0] = keyPtr[]
return
result.add vtx.eVtx
of Branch:
nd = NodeRef(
vType: Branch,
bVtx: vtx.bVtx)
for n in 0..15:
if vtx.bVtx[n].isZero:
continue
db.kMap.withValue(vtx.bVtx[n], kPtr):
nd.key[n] = kPtr[]
continue
result.add vtx.bVtx[n]
proc convertPartiallyOk(
db: AristoDbRef;
vtx: VertexRef;
nd: var NodeRef;
): bool =
## Variant of `convertPartially()`, shortcut for `convertPartially().len == 0`.
case vtx.vType:
of Leaf:
nd = NodeRef(
vType: Leaf,
lPfx: vtx.lPfx,
lData: vtx.lData)
result = true
of Extension:
nd = NodeRef(
vType: Extension,
ePfx: vtx.ePfx,
eVtx: vtx.eVtx)
db.kMap.withValue(vtx.eVtx, keyPtr):
nd.key[0] = keyPtr[]
result = true
of Branch:
nd = NodeRef(
vType: Branch,
bVtx: vtx.bVtx)
result = true
for n in 0..15:
if not vtx.bVtx[n].isZero:
db.kMap.withValue(vtx.bVtx[n], kPtr):
nd.key[n] = kPtr[]
continue
return false
proc cachedVID(db: AristoDbRef; nodeKey: NodeKey): VertexID =
## Get vertex ID from reverse cache
db.pAmk.withValue(nodeKey, vidPtr):
return vidPtr[]
result = VertexID.new(db)
db.pAmk[nodeKey] = result
db.kMap[result] = nodeKey
# ------------------------------------------------------------------------------
# Public functions for `VertexID` => `NodeKey` mapping
# ------------------------------------------------------------------------------
proc pal*(db: AristoDbRef; vid: VertexID): NodeKey =
## Retrieve the cached `Merkle` hash (aka `NodeKey` object) associated with
## the `VertexID` argument `vid`. Return a zero `NodeKey` if there is none.
##
## If the vertex ID `vid` is not found in the cache, then the structural
## table is checked whether the cache can be updated.
if not db.isNil:
db.kMap.withValue(vid, keyPtr):
return keyPtr[]
db.sTab.withValue(vid, vtxPtr):
var node: NodeRef
if db.convertPartiallyOk(vtxPtr[],node):
var w = initRlpWriter()
w.append node
result = w.finish.keccakHash.data.NodeKey
db.kMap[vid] = result
# ------------------------------------------------------------------------------
# Public functions extending/completing vertex records
# ------------------------------------------------------------------------------
proc updated*(nd: NodeRef; db: AristoDbRef): NodeRef =
## Return a copy of the argument node `nd` with updated missing vertex IDs.
##
## For a `Leaf` node, the payload data `PayloadRef` type reference is *not*
## duplicated and returned as-is.
##
## This function will not complain if all `Merkle` hashes (aka `NodeKey`
## objects) are zero for either `Extension` or `Leaf` nodes.
if not nd.isNil:
case nd.vType:
of Leaf:
result = NodeRef(
vType: Leaf,
lPfx: nd.lPfx,
lData: nd.lData)
of Extension:
result = NodeRef(
vType: Extension,
ePfx: nd.ePfx)
if not nd.key[0].isZero:
result.eVtx = db.cachedVID nd.key[0]
result.key[0] = nd.key[0]
of Branch:
result = NodeRef(
vType: Branch,
key: nd.key)
for n in 0..15:
if not nd.key[n].isZero:
result.bVtx[n] = db.cachedVID nd.key[n]
proc asNode*(vtx: VertexRef; db: AristoDbRef): NodeRef =
## Return a `NodeRef` object by augmenting missing `Merkle` hashes (aka
## `NodeKey` objects) from the cache or from calculated cached vertex
## entries, if available.
##
## If not all `Merkle` hashes are available in a single lookup, then the
## result object is a wrapper around an error code.
if not db.convertPartiallyOk(vtx, result):
return NodeRef(error: CacheMissingNodekeys)
proc asNode*(rc: Result[VertexRef,AristoError]; db: AristoDbRef): NodeRef =
## Variant of `asNode()`.
if rc.isErr:
return NodeRef(error: rc.error)
rc.value.asNode(db)
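# A minimal usage sketch (illustrative only, not part of the module API):
# converting a vertex whose child hash is not in the `kMap` cache yields a
# node that wraps an error code.
when isMainModule:
  let
    db = AristoDbRef()
    vtx = VertexRef(vType: Extension, eVtx: 5.VertexID)
    node = vtx.asNode(db)          # the hash for vertex ID 5 is not cached
  doAssert node.isError
  doAssert node.error == CacheMissingNodekeys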
# ------------------------------------------------------------------------------
# End
# ------------------------------------------------------------------------------


@ -0,0 +1,175 @@
# nimbus-eth1
# Copyright (c) 2021 Status Research & Development GmbH
# Licensed under either of
# * Apache License, version 2.0, ([LICENSE-APACHE](LICENSE-APACHE) or
# http://www.apache.org/licenses/LICENSE-2.0)
# * MIT license ([LICENSE-MIT](LICENSE-MIT) or
# http://opensource.org/licenses/MIT)
# at your option. This file may not be copied, modified, or distributed
# except according to those terms.
{.push raises: [].}
import
std/[sequtils, strutils],
eth/[common, trie/nibbles],
stew/byteutils,
../../sync/snap/range_desc,
"."/aristo_desc
const
EMPTY_ROOT_KEY = EMPTY_ROOT_HASH.to(NodeKey)
EMPTY_CODE_KEY = EMPTY_CODE_HASH.to(NodeKey)
# ------------------------------------------------------------------------------
# Private functions
# ------------------------------------------------------------------------------
proc keyVidUpdate(db: AristoDbRef, key: NodeKey, vid: VertexID): string =
if not key.isZero and
not vid.isZero and
not db.isNil:
db.pAmk.withValue(key, vidRef):
if vidRef[] != vid:
result = "(!)"
return
db.xMap.withValue(key, vidRef):
if vidRef[] == vid:
result = "(!)"
return
db.xMap[key] = vid
proc squeeze(s: string; hex = false; ignLen = false): string =
## For long strings print `begin..end` only
if hex:
let n = (s.len + 1) div 2
result = if s.len < 20: s else: s[0 .. 5] & ".." & s[s.len-8 .. s.len-1]
if not ignLen:
result &= "[" & (if 0 < n: "#" & $n else: "") & "]"
elif s.len <= 30:
result = s
else:
result = if (s.len and 1) == 0: s[0 ..< 8] else: "0" & s[0 ..< 7]
if not ignLen:
result &= "..(" & $s.len & ")"
result &= ".." & s[s.len-16 ..< s.len]
proc stripZeros(a: string): string =
for n in 0 ..< a.len:
if a[n] != '0':
return a[n .. ^1]
return a
proc ppVid(vid: VertexID): string =
if vid.isZero: "ø" else: "$" & vid.uint64.toHex.stripZeros
proc ppKey(key: NodeKey, db = AristoDbRef(nil)): string =
if key.isZero:
return "ø"
if key == EMPTY_ROOT_KEY:
return "£r"
if key == EMPTY_CODE_KEY:
return "£c"
if not db.isNil:
db.pAmk.withValue(key, pRef):
return "£" & $pRef[]
db.xMap.withValue(key, xRef):
return "£" & $xRef[]
"%" & ($key).squeeze(hex=true,ignLen=true)
proc ppRootKey(a: NodeKey, db = AristoDbRef(nil)): string =
if a != EMPTY_ROOT_KEY:
return a.ppKey(db)
proc ppCodeKey(a: NodeKey, db = AristoDbRef(nil)): string =
if a != EMPTY_CODE_KEY:
return a.ppKey(db)
# ------------------------------------------------------------------------------
# Public functions
# ------------------------------------------------------------------------------
proc keyToVtxID*(db: AristoDbRef, key: NodeKey): VertexID =
## Associate a vertex ID with the argument `key` for pretty printing.
if not key.isZero and
key != EMPTY_ROOT_KEY and
key != EMPTY_CODE_KEY and
not db.isNil:
db.xMap.withValue(key, vidPtr):
return vidPtr[]
result = VertexID.new db
db.xMap[key] = result
proc pp*(vid: openArray[VertexID]): string =
"[" & vid.mapIt(it.ppVid).join(",") & "]"
proc pp*(p: PayloadRef, db = AristoDbRef(nil)): string =
if p.isNil:
result = "n/a"
else:
case p.pType:
of BlobData:
result &= p.blob.toHex.squeeze(hex=true)
of AccountData:
result = "("
result &= $p.account.nonce & ","
result &= $p.account.balance & ","
result &= p.account.storageRoot.to(NodeKey).ppRootKey(db) & ","
result &= p.account.codeHash.to(NodeKey).ppCodeKey(db) & ")"
proc pp*(nd: VertexRef, db = AristoDbRef(nil)): string =
if nd.isNil:
result = "n/a"
else:
result = ["l(", "x(", "b("][nd.vType.ord]
case nd.vType:
of Leaf:
result &= $nd.lPfx & "," & nd.lData.pp(db)
of Extension:
result &= $nd.ePfx & "," & nd.eVtx.ppVid
of Branch:
result &= "["
for n in 0..15:
if not nd.bVtx[n].isZero:
result &= nd.bVtx[n].ppVid
result &= ","
result[^1] = ']'
result &= ")"
proc pp*(nd: NodeRef, db = AristoDbRef(nil)): string =
if nd.isNil:
result = "n/a"
elif nd.isError:
result = "(!" & $nd.error
else:
result = ["L(", "X(", "B("][nd.vType.ord]
case nd.vType:
of Leaf:
result &= $nd.lPfx & "," & nd.lData.pp(db)
of Extension:
result &= $nd.ePfx & "," & nd.eVtx.ppVid & "," & nd.key[0].ppKey
of Branch:
result &= "["
for n in 0..15:
if not nd.bVtx[n].isZero or not nd.key[n].isZero:
result &= nd.bVtx[n].ppVid
result &= db.keyVidUpdate(nd.key[n], nd.bVtx[n]) & ","
result[^1] = ']'
result &= ",["
for n in 0..15:
if not nd.bVtx[n].isZero or not nd.key[n].isZero:
result &= nd.key[n].ppKey(db)
result &= ","
result[^1] = ']'
result &= ")"
# ------------------------------------------------------------------------------
# End
# ------------------------------------------------------------------------------


@ -0,0 +1,231 @@
# nimbus-eth1
# Copyright (c) 2021 Status Research & Development GmbH
# Licensed under either of
# * Apache License, version 2.0, ([LICENSE-APACHE](LICENSE-APACHE) or
# http://www.apache.org/licenses/LICENSE-2.0)
# * MIT license ([LICENSE-MIT](LICENSE-MIT) or
# http://opensource.org/licenses/MIT)
# at your option. This file may not be copied, modified, or distributed
# except according to those terms.
## Aristo DB -- a Patricia Trie with labeled edges
## ===============================================
##
## These data structures allow overlaying the *Patricia Trie* with *Merkle
## Trie* hashes. See the `README.md` in the `aristo` folder for documentation.
{.push raises: [].}
import
std/tables,
eth/[common, trie/nibbles],
stew/results,
../../sync/snap/range_desc,
./aristo_error
type
VertexID* = distinct uint64 ## Tip of edge towards child, also table key
VertexType* = enum ## Type of Patricia Trie node
Leaf
Extension
Branch
PayloadType* = enum ## Type of leaf data (to be extended)
BlobData
AccountData
PayloadRef* = ref object
case pType*: PayloadType
of BlobData:
blob*: Blob ## Opaque data value reference
of AccountData:
account*: Account ## Expanded accounting data
VertexRef* = ref object of RootRef
## Vertex for building a hexary Patricia or Merkle Patricia Trie
case vType*: VertexType
of Leaf:
lPfx*: NibblesSeq ## Portion of path segment
lData*: PayloadRef ## Reference to data payload
of Extension:
ePfx*: NibblesSeq ## Portion of path segment
eVtx*: VertexID ## Edge to vertex with ID `eVtx`
of Branch:
bVtx*: array[16,VertexID] ## Edge list with vertex IDs
NodeRef* = ref object of VertexRef
## Combined record for a *traditional* `Merkle Patricia Tree` node merged
## with a structural `VertexRef` type object.
error*: AristoError ## Can be used for error signalling
key*: array[16,NodeKey] ## Merkle hash(es) for Branch & Extension vtx
PathStep* = object
## For constructing a tree traversal path
# key*: NodeKey ## Node label ??
node*: VertexRef ## Refers to data record
nibble*: int8 ## Branch node selector (if any)
depth*: int ## May indicate path length (typically 64)
Path* = object
root*: VertexID ## Root node needed when `path.len == 0`
path*: seq[PathStep] ## Chain of nodes
tail*: NibblesSeq ## Stands for non completed leaf path
LeafSpecs* = object
## Temporarily stashed leaf data (as for an account.) Proper records
## have non-empty payload. Records with empty payload are administrative
## items, e.g. lower boundary records.
pathTag*: NodeTag ## `Patricia Trie` key path
nodeVtx*: VertexID ## Table lookup vertex ID (if any)
payload*: PayloadRef ## Reference to data payload
GetFn* = proc(key: openArray[byte]): Blob
{.gcsafe, raises: [CatchableError].}
## Persistent database `get()` function. For read-only cases, this
## function can be seen as the persistent alternative to `tab[]` on
## a `HexaryTreeDbRef` descriptor.
AristoDbRef* = ref object of RootObj
## Hexary trie plus helper structures
sTab*: Table[VertexID,NodeRef] ## Structural vertex table making up a trie
kMap*: Table[VertexID,NodeKey] ## Merkle hash key mapping
pAmk*: Table[NodeKey,VertexID] ## Reverse mapper for data import
vidGen*: seq[VertexID] ## Unique vertex ID generator
# Debugging data below, might go away in future
xMap*: Table[NodeKey,VertexID] ## Mapper for pretty printing, extends `pAmk`
static:
# Not that there is no doubt about this ...
doAssert NodeKey.default.ByteArray32.initNibbleRange.len == 64
# ------------------------------------------------------------------------------
# Public helpers: `VertexID` scalar data model
# ------------------------------------------------------------------------------
proc `<`*(a, b: VertexID): bool {.borrow.}
proc `==`*(a, b: VertexID): bool {.borrow.}
proc cmp*(a, b: VertexID): int {.borrow.}
proc `$`*(a: VertexID): string = $a.uint64
# ------------------------------------------------------------------------------
# Public functions for `VertexID` management
# ------------------------------------------------------------------------------
proc new*(T: type VertexID; db: AristoDbRef): T =
## Create a new `VertexID`. Reusable *ID*s are kept in a list where the top
## entry *ID0* has the property that any other *ID* larger than *ID0* is
## also not used on the database.
case db.vidGen.len:
of 0:
db.vidGen = @[2.VertexID]
result = 1.VertexID
of 1:
result = db.vidGen[^1]
db.vidGen = @[(result.uint64 + 1).VertexID]
else:
result = db.vidGen[^2]
db.vidGen[^2] = db.vidGen[^1]
db.vidGen.setLen(db.vidGen.len-1)
proc peek*(T: type VertexID; db: AristoDbRef): T =
## Like `new()` without consuming this *ID*. It will return the *ID* that
## would be returned by the `new()` function.
if db.vidGen.len == 0: 1.VertexID else: db.vidGen[^1]
proc dispose*(db: AristoDbRef; vtxID: VertexID) =
## Recycle the argument `vtxID` which is useful after deleting entries from
## the vertex table, in order to keep the `VertexID` key values small.
if db.vidGen.len == 0:
db.vidGen = @[vtxID]
else:
let topID = db.vidGen[^1]
# No need to store smaller numbers: all numbers larger than `topID`
# are free numbers
if vtxID < topID:
db.vidGen[^1] = vtxID
db.vidGen.add topID
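# A minimal usage sketch of the ID management above (illustrative only, not
# part of the module): fresh IDs are consecutive, disposed IDs are recycled
# before new ones are generated.
when isMainModule:
  let db = AristoDbRef()
  let a = VertexID.new db          # 1, `vidGen` becomes @[2]
  let b = VertexID.new db          # 2, `vidGen` becomes @[3]
  db.dispose a                     # recycle 1, `vidGen` becomes @[1, 3]
  doAssert VertexID.new(db) == a   # the recycled ID is re-used first
  doAssert VertexID.new(db) == (b.uint64 + 1).VertexID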
# ------------------------------------------------------------------------------
# Public helpers: `NodeRef` and `PayloadRef`
# ------------------------------------------------------------------------------
proc `==`*(a, b: PayloadRef): bool =
## Beware, potential deep comparison
if a.isNil:
return b.isNil
if b.isNil:
return false
if unsafeAddr(a) != unsafeAddr(b):
if a.pType != b.pType:
return false
case a.pType:
of BlobData:
if a.blob != b.blob:
return false
of AccountData:
if a.account != b.account:
return false
true
proc `==`*(a, b: VertexRef): bool =
## Beware, potential deep comparison
if a.isNil:
return b.isNil
if b.isNil:
return false
if unsafeAddr(a[]) != unsafeAddr(b[]):
if a.vType != b.vType:
return false
case a.vType:
of Leaf:
if a.lPfx != b.lPfx or a.lData != b.lData:
return false
of Extension:
if a.ePfx != b.ePfx or a.eVtx != b.eVtx:
return false
of Branch:
for n in 0..15:
if a.bVtx[n] != b.bVtx[n]:
return false
true
proc `==`*(a, b: NodeRef): bool =
## Beware, potential deep comparison
if a.VertexRef != b.VertexRef:
return false
case a.vType:
of Extension:
if a.key[0] != b.key[0]:
return false
of Branch:
for n in 0..15:
if a.bVtx[n] != 0.VertexID and a.key[n] != b.key[n]:
return false
else:
discard
true
# ------------------------------------------------------------------------------
# Public helpers, miscellaneous functions
# ------------------------------------------------------------------------------
proc isZero*[T: NodeKey|VertexID](a: T): bool =
a == typeof(a).default
proc isError*(a: NodeRef): bool =
a.error != AristoError(0)
proc convertTo*(payload: PayloadRef; T: type Blob): T =
## Potentially lossy conversion as the payload type discriminator (`pType`) is lost
case payload.pType:
of BlobData:
result = payload.blob
of AccountData:
result = rlp.encode payload.account
# ------------------------------------------------------------------------------
# End
# ------------------------------------------------------------------------------


@ -0,0 +1,50 @@
# nimbus-eth1
# Copyright (c) 2021 Status Research & Development GmbH
# Licensed under either of
# * Apache License, version 2.0, ([LICENSE-APACHE](LICENSE-APACHE) or
# http://www.apache.org/licenses/LICENSE-2.0)
# * MIT license ([LICENSE-MIT](LICENSE-MIT) or
# http://opensource.org/licenses/MIT)
# at your option. This file may not be copied, modified, or distributed
# except according to those terms.
type
AristoError* = enum
NothingSerious = 0
# Rlp decoder, `fromRlpRecord()`
Rlp2Or17ListEntries
RlpBlobExpected
RlpBranchLinkExpected
RlpExtPathEncoding
RlpNonEmptyBlobExpected
RlpEmptyBlobExpected
RlpRlpException
RlpOtherException
# Db record decoder, `fromDbRecord()`
DbrNilArgument
DbrUnknown
DbrTooShort
DbrBranchTooShort
DbrBranchSizeGarbled
DbrBranchInxOutOfRange
DbrExtTooShort
DbrExtSizeGarbled
DbrExtGotLeafPrefix
DbrLeafSizeGarbled
DbrLeafGotExtPrefix
# Db admin data decoder, `fromAristoDb()`
ADbGarbledSize
ADbWrongType
# Db record encoder, `toDbRecord()`
VtxExPathOverflow
VtxLeafPathOverflow
# Converter `asNode()`
CacheMissingNodekeys
# End


@ -0,0 +1,322 @@
# nimbus-eth1
# Copyright (c) 2021 Status Research & Development GmbH
# Licensed under either of
# * Apache License, version 2.0, ([LICENSE-APACHE](LICENSE-APACHE) or
# http://www.apache.org/licenses/LICENSE-2.0)
# * MIT license ([LICENSE-MIT](LICENSE-MIT) or
# http://opensource.org/licenses/MIT)
# at your option. This file may not be copied, modified, or distributed
# except according to those terms.
{.push raises: [].}
import
std/[bitops, sequtils],
eth/[common, trie/nibbles],
stew/results,
../../sync/snap/range_desc,
"."/[aristo_desc, aristo_error]
const
EmptyBlob = seq[byte].default
## Useful shortcut (borrowed from `sync/snap/constants.nim`)
# ------------------------------------------------------------------------------
# Private functions
# ------------------------------------------------------------------------------
proc aristoError(error: AristoError): NodeRef =
## Allows returning a `NodeRef` that transports an error code
NodeRef(vType: Leaf, error: error)
# ------------------------------------------------------------------------------
# Public RLP transcoder mixins
# ------------------------------------------------------------------------------
proc read*(
rlp: var Rlp;
T: type NodeRef;
): T {.gcsafe, raises: [RlpError]} =
## Mixin for RLP reader, see `fromRlpRecord()` for a decoder with detailed
## error return code (if needed.) This reader is a jazzed up version which
## reports some particular errors via the `error` field of the returned node.
if not rlp.isList:
# Otherwise `rlp.items` would raise a `Defect`
return aristoError(Rlp2Or17ListEntries)
var
blobs = newSeq[Blob](2) # temporary, cache
links: array[16,NodeKey] # reconstruct branch node
top = 0 # count entries and positions
# Collect lists of either 2 or 17 blob entries.
for w in rlp.items:
case top
of 0, 1:
if not w.isBlob:
return aristoError(RlpBlobExpected)
blobs[top] = rlp.read(Blob)
of 2 .. 15:
if not links[top].init(rlp.read(Blob)):
return aristoError(RlpBranchLinkExpected)
of 16:
if not w.isBlob:
return aristoError(RlpBlobExpected)
if 0 < rlp.read(Blob).len:
return aristoError(RlpEmptyBlobExpected)
else:
return aristoError(Rlp2Or17ListEntries)
top.inc
# Verify extension data
case top
of 2:
if blobs[0].len == 0:
return aristoError(RlpNonEmptyBlobExpected)
let (isLeaf, pathSegment) = hexPrefixDecode blobs[0]
if isLeaf:
return NodeRef(
vType: Leaf,
lPfx: pathSegment,
lData: PayloadRef(
pType: BlobData,
blob: blobs[1]))
else:
var node = NodeRef(
vType: Extension,
ePfx: pathSegment)
if not node.key[0].init(blobs[1]):
return aristoError(RlpExtPathEncoding)
return node
of 17:
for n in [0,1]:
if not links[n].init(blobs[n]):
return aristoError(RlpBranchLinkExpected)
return NodeRef(
vType: Branch,
key: links)
else:
discard
aristoError(Rlp2Or17ListEntries)
proc append*(writer: var RlpWriter; node: NodeRef) =
## Mixin for RLP writer. Note that a node transporting an error code is
## encoded as an empty list.
proc addNodeKey(writer: var RlpWriter; key: NodeKey) =
if key.isZero:
writer.append EmptyBlob
else:
writer.append key.to(Hash256)
if node.isError:
writer.startList(0)
else:
case node.vType:
of Branch:
writer.startList(17)
for n in 0..15:
writer.addNodeKey node.key[n]
writer.append EmptyBlob
of Extension:
writer.startList(2)
writer.append node.ePfx.hexPrefixEncode(isleaf = false)
writer.addNodeKey node.key[0]
of Leaf:
writer.startList(2)
writer.append node.lPfx.hexPrefixEncode(isleaf = true)
writer.append node.lData.convertTo(Blob)
# ------------------------------------------------------------------------------
# Public db record transcoders
# ------------------------------------------------------------------------------
proc blobify*(node: VertexRef; data: var Blob): AristoError =
## This function serialises the node argument to a database record. Contrary
## to RLP based serialisation, these records aim to align on fixed byte
## boundaries.
## ::
## Branch:
## uint64, ... -- list of up to 16 child nodes lookup keys
## uint16 -- index bitmap
## 0x00 -- marker(2) + unused(6)
##
## Extension:
## uint64 -- child node lookup key
## Blob -- hex encoded partial path (at least one byte)
## 0x80 -- marker(2) + partialPathLen(6)
##
## Leaf:
## Blob -- opaque leaf data payload (might be zero length)
## Blob -- hex encoded partial path (at least one byte)
## 0xc0 -- marker(2) + partialPathLen(6)
##
## For a branch record, the bits of the `access` bitmap indicate which of
## the 16 possible `vertexID` entries are present. The `vertexID` with
## index `n` is then found at byte position
## ::
## 8 * (number of set bits at positions 0 ..< n of access)
##
case node.vType:
of Branch:
var
top = 0u64
access = 0u16
refs: Blob
keys: Blob
for n in 0..15:
if not node.bVtx[n].isZero:
access = access or (1u16 shl n)
refs &= node.bVtx[n].uint64.toBytesBE.toSeq
data = refs & access.toBytesBE.toSeq & @[0u8]
of Extension:
let
pSegm = node.ePfx.hexPrefixEncode(isleaf = false)
psLen = pSegm.len.byte
if psLen == 0 or 33 < pslen:
return VtxExPathOverflow
data = node.eVtx.uint64.toBytesBE.toSeq & pSegm & @[0x80u8 or psLen]
of Leaf:
let
pSegm = node.lPfx.hexPrefixEncode(isleaf = true)
psLen = pSegm.len.byte
if psLen == 0 or 33 < psLen:
return VtxLeafPathOverflow
data = node.lData.convertTo(Blob) & pSegm & @[0xC0u8 or psLen]
proc blobify*(node: VertexRef): Result[Blob, AristoError] =
## Variant of `blobify()`
var
data: Blob
info = node.blobify data
if info != AristoError(0):
return err(info)
ok(data)
proc blobify*(db: AristoDbRef; data: var Blob) =
## This function serialises some maintenance data for the `AristoDb`
## descriptor. At the moment, this contains the recycling table for the
## `VertexID` values, only.
##
## This data record is supposed to be stored as the table value with the
## zero key for persistent tables.
## ::
## Admin:
## uint64, ... -- list of IDs
## 0x40
##
data.setLen(0)
for w in db.vidGen:
data &= w.uint64.toBytesBE.toSeq
data.add 0x40u8
proc blobify*(db: AristoDbRef): Blob =
## Variant of `blobify()`
db.blobify result
proc deblobify*(record: Blob; vtx: var VertexRef): AristoError =
## De-serialise a data record encoded with `blobify()`. The second
## argument `vtx` can be `nil`.
if record.len < 3: # minimum `Leaf` record
return DbrTooShort
case record[^1] shr 6:
of 0: # `Branch` node
if record.len < 19: # at least two edges
return DbrBranchTooShort
if (record.len mod 8) != 3:
return DbrBranchSizeGarbled
let
maxOffset = record.len - 11
aInx = record.len - 3
aIny = record.len - 2
var
offs = 0
access = uint16.fromBytesBE record[aInx..aIny] # bitmap
vtxList: array[16,VertexID]
while access != 0:
if maxOffset < offs:
return DbrBranchInxOutOfRange
let n = access.firstSetBit - 1
access.clearBit n
vtxList[n] = (uint64.fromBytesBE record[offs ..< offs+8]).VertexID
offs += 8
# End `while`
vtx = VertexRef(
vType: Branch,
bVtx: vtxList)
of 2: # `Extension` node
let
sLen = record[^1].int and 0x3f # length of path segment
rlen = record.len - 1 # `vertexID` + path segm
if record.len < 10:
return DbrExtTooShort
if 8 + sLen != rlen: # => slen is at least 1
return DbrExtSizeGarbled
let (isLeaf, pathSegment) = hexPrefixDecode record[8 ..< rLen]
if isLeaf:
return DbrExtGotLeafPrefix
vtx = VertexRef(
vType: Extension,
eVtx: (uint64.fromBytesBE record[0 ..< 8]).VertexID,
ePfx: pathSegment)
of 3: # `Leaf` node
let
sLen = record[^1].int and 0x3f # length of path segment
rlen = record.len - 1 # payload + path segment
pLen = rLen - sLen # payload length
if rlen < sLen:
return DbrLeafSizeGarbled
let (isLeaf, pathSegment) = hexPrefixDecode record[pLen ..< rLen]
if not isLeaf:
return DbrLeafGotExtPrefix
vtx = VertexRef(
vType: Leaf,
lPfx: pathSegment,
lData: PayloadRef(
pType: BlobData,
blob: record[0 ..< plen]))
else:
return DbrUnknown
proc deblobify*(data: Blob; db: var AristoDbRef): AristoError =
## De-serialise the data record encoded with `blobify()`. The second
## argument `db` can be `nil` in which case a new `AristoDbRef` type
## descriptor will be created.
if db.isNil:
db = AristoDbRef()
if data.len == 0:
db.vidGen = @[1.VertexID]
else:
if (data.len mod 8) != 1:
return ADbGarbledSize
if data[^1] shr 6 != 1:
return ADbWrongType
for n in 0 ..< (data.len div 8):
let w = n * 8
db.vidGen.add (uint64.fromBytesBE data[w ..< w + 8]).VertexID
proc deblobify*[W: VertexRef|AristoDbRef](
record: Blob;
T: type W;
): Result[T,AristoError] =
## Variant of `deblobify()` for either `VertexRef` or `AristoDbRef`
var obj: T # isNil, will be auto-initialised
let info = record.deblobify obj
if info != AristoError(0):
return err(info)
ok(obj)
proc deblobify*(record: Blob): Result[VertexRef,AristoError] =
## Default variant of `deblobify()` for `VertexRef`.
record.deblobify VertexRef
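# A minimal round-trip sketch (illustrative only, not part of the module):
# serialise a `Branch` vertex and decode it again via the `Result[]` based
# variants.
when isMainModule:
  var edges: array[16,VertexID]
  edges[0] = 1.VertexID
  edges[5] = 7.VertexID
  let
    vtx = VertexRef(vType: Branch, bVtx: edges)
    blob = vtx.blobify.value       # 2 x 8 byte IDs + 2 byte bitmap + 1 marker byte
    back = blob.deblobify.value
  doAssert blob.len == 19
  doAssert back == vtx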
# ------------------------------------------------------------------------------
# End
# ------------------------------------------------------------------------------


@ -12,6 +12,7 @@ import ../test_macro
cliBuilder:
import ./test_code_stream,
./test_accounts_cache,
./test_aristo,
./test_custom_network,
./test_sync_snap,
./test_rocksdb_timing,

tests/test_aristo.nim

@ -0,0 +1,221 @@
# Nimbus - Types, data structures and shared utilities used in network sync
#
# Copyright (c) 2018-2021 Status Research & Development GmbH
# Licensed under either of
# * Apache License, version 2.0, ([LICENSE-APACHE](LICENSE-APACHE) or
# http://www.apache.org/licenses/LICENSE-2.0)
# * MIT license ([LICENSE-MIT](LICENSE-MIT) or
# http://opensource.org/licenses/MIT)
# at your option. This file may not be copied, modified, or
# distributed except according to those terms.
## Re-invented implementation of the Merkle Patricia Tree, named Aristo Trie
import
std/[os, strformat, strutils],
chronicles,
eth/[common, p2p],
rocksdb,
unittest2,
../nimbus/db/select_backend,
../nimbus/db/aristo/[aristo_desc],
../nimbus/core/chain,
../nimbus/sync/snap/worker/db/[
hexary_desc, rocky_bulk_load, snapdb_accounts, snapdb_desc],
./replay/[pp, undump_accounts],
./test_sync_snap/[snap_test_xx, test_accounts, test_types],
./test_aristo/[test_transcode]
const
baseDir = [".", "..", ".."/"..", $DirSep]
repoDir = [".", "tests", "nimbus-eth1-blobs"]
subDir = ["replay", "test_sync_snap", "replay"/"snap"]
# Reference file for finding the database directory
sampleDirRefFile = "sample0.txt.gz"
# Standard test samples
accSample = snapTest0
# Number of database slots available
nTestDbInstances = 9
# Dormant (may be set if persistent database causes problems)
disablePersistentDB = false
type
TestDbs = object
## Provide enough spare empty databases
persistent: bool
dbDir: string
baseDir: string # for cleanup
subDir: string # for cleanup
cdb: array[nTestDbInstances,ChainDb]
# ------------------------------------------------------------------------------
# Helpers
# ------------------------------------------------------------------------------
proc findFilePath(
file: string;
baseDir: openArray[string] = baseDir;
repoDir: openArray[string] = repoDir;
subDir: openArray[string] = subDir;
): Result[string,void] =
for dir in baseDir:
if dir.dirExists:
for repo in repoDir:
if (dir / repo).dirExists:
for sub in subDir:
if (dir / repo / sub).dirExists:
let path = dir / repo / sub / file
if path.fileExists:
return ok(path)
echo "*** File not found \"", file, "\"."
err()
proc getTmpDir(sampleDir = sampleDirRefFile): string =
sampleDir.findFilePath.value.splitFile.dir
proc setTraceLevel {.used.} =
discard
when defined(chronicles_runtime_filtering) and loggingEnabled:
setLogLevel(LogLevel.TRACE)
proc setErrorLevel {.used.} =
discard
when defined(chronicles_runtime_filtering) and loggingEnabled:
setLogLevel(LogLevel.ERROR)
# ------------------------------------------------------------------------------
# Private functions
# ------------------------------------------------------------------------------
proc to(sample: AccountsSample; T: type seq[UndumpAccounts]): T =
## Convert test data into usable in-memory format
let file = sample.file.findFilePath.value
var root: Hash256
for w in file.undumpNextAccount:
let n = w.seenAccounts - 1
if n < sample.firstItem:
continue
if sample.lastItem < n:
break
if sample.firstItem == n:
root = w.root
elif w.root != root:
break
result.add w
proc flushDbDir(s: string; subDir = "") =
if s != "":
let baseDir = s / "tmp"
for n in 0 ..< nTestDbInstances:
let instDir = if subDir == "": baseDir / $n else: baseDir / subDir / $n
if (instDir / "nimbus" / "data").dirExists:
# Typically under Windows: there might be stale file locks.
try: instDir.removeDir except CatchableError: discard
try: (baseDir / subDir).removeDir except CatchableError: discard
block dontClearUnlessEmpty:
for w in baseDir.walkDir:
break dontClearUnlessEmpty
try: baseDir.removeDir except CatchableError: discard
proc flushDbs(db: TestDbs) =
if db.persistent:
for n in 0 ..< nTestDbInstances:
if db.cdb[n].rocksStoreRef.isNil:
break
db.cdb[n].rocksStoreRef.store.db.rocksdb_close
db.baseDir.flushDbDir(db.subDir)
proc testDbs(
workDir: string;
subDir: string;
instances: int;
persistent: bool;
): TestDbs =
if disablePersistentDB or workDir == "" or not persistent:
result.persistent = false
result.dbDir = "*notused*"
else:
result.persistent = true
result.baseDir = workDir
result.subDir = subDir
if subDir != "":
result.dbDir = workDir / "tmp" / subDir
else:
result.dbDir = workDir / "tmp"
if result.persistent:
workDir.flushDbDir(subDir)
for n in 0 ..< min(result.cdb.len, instances):
result.cdb[n] = (result.dbDir / $n).newChainDB
proc snapDbRef(cdb: ChainDb; pers: bool): SnapDbRef =
if pers: SnapDbRef.init(cdb) else: SnapDbRef.init(newMemoryDB())
proc snapDbAccountsRef(cdb:ChainDb; root:Hash256; pers:bool):SnapDbAccountsRef =
SnapDbAccountsRef.init(cdb.snapDbRef(pers), root, Peer())
# ------------------------------------------------------------------------------
# Test Runners: accounts and accounts storages
# ------------------------------------------------------------------------------
proc trancodeRunner(noisy = true; sample = accSample; stopAfter = high(int)) =
let
accLst = sample.to(seq[UndumpAccounts])
root = accLst[0].root
tmpDir = getTmpDir()
db = tmpDir.testDbs(sample.name & "-accounts", instances=2, persistent=true)
info = if db.persistent: &"persistent db on \"{db.baseDir}\""
else: "in-memory db"
fileInfo = sample.file.splitPath.tail.replace(".txt.gz","")
defer:
db.flushDbs
suite &"Aristo: transcoding {fileInfo} accounts and proofs for {info}":
test &"Trancoding VertexID recyling lists (seed={accLst.len})":
noisy.test_transcodeVidRecycleLists(accLst.len)
# New common descriptor for this sub-group of tests
let
desc = db.cdb[0].snapDbAccountsRef(root, db.persistent)
hexaDb = desc.hexaDb
getFn = desc.getAccountFn
dbg = if noisy: hexaDb else: nil
# Borrowed from `test_sync_snap/test_accounts.nim`
test &"Importing {accLst.len} list items to persistent database":
if db.persistent:
accLst.test_accountsImport(desc, true)
else:
skip()
test "Trancoding database records: RLP, NodeRef, Blob, VertexRef":
noisy.showElapsed("test_transcoder()"):
noisy.test_transcodeAccounts(db.cdb[0].rocksStoreRef, stopAfter)
# ------------------------------------------------------------------------------
# Main function(s)
# ------------------------------------------------------------------------------
proc aristoMain*(noisy = defined(debug)) =
noisy.trancodeRunner()
when isMainModule:
const
noisy = defined(debug) or true
# Borrowed from `test_sync_snap.nim`
when true: # and false:
for n,sam in snapTestList:
noisy.trancodeRunner(sam)
for n,sam in snapTestStorageList:
noisy.trancodeRunner(sam)
# ------------------------------------------------------------------------------
# End
# ------------------------------------------------------------------------------


@ -0,0 +1,73 @@
# Nimbus - Types, data structures and shared utilities used in network sync
#
# Copyright (c) 2018-2021 Status Research & Development GmbH
# Licensed under either of
# * Apache License, version 2.0, ([LICENSE-APACHE](LICENSE-APACHE) or
# http://www.apache.org/licenses/LICENSE-2.0)
# * MIT license ([LICENSE-MIT](LICENSE-MIT) or
# http://opensource.org/licenses/MIT)
# at your option. This file may not be copied, modified, or
# distributed except according to those terms.
import
std/sequtils,
eth/common,
rocksdb,
../../nimbus/db/kvstore_rocksdb,
../../nimbus/sync/snap/constants,
../replay/pp
# ------------------------------------------------------------------------------
# Public helpers
# ------------------------------------------------------------------------------
proc say*(noisy = false; pfx = "***"; args: varargs[string, `$`]) =
if noisy:
if args.len == 0:
echo "*** ", pfx
elif 0 < pfx.len and pfx[^1] != ' ':
echo pfx, " ", args.toSeq.join
else:
echo pfx, args.toSeq.join
# ------------------------------------------------------------------------------
# Public iterators
# ------------------------------------------------------------------------------
iterator walkAllDb*(rocky: RocksStoreRef): (int,Blob,Blob) =
## Walk over all key-value pairs of the database (`RocksDB` only.)
let
rop = rocky.store.readOptions
rit = rocky.store.db.rocksdb_create_iterator(rop)
defer:
rit.rocksdb_iter_destroy()
rit.rocksdb_iter_seek_to_first()
var count = -1
while rit.rocksdb_iter_valid() != 0:
count.inc
# Read key-value pair
var
kLen, vLen: csize_t
let
kData = rit.rocksdb_iter_key(addr kLen)
vData = rit.rocksdb_iter_value(addr vLen)
# Fetch data
let
key = if kData.isNil: EmptyBlob
else: kData.toOpenArrayByte(0,int(kLen)-1).toSeq
value = if vData.isNil: EmptyBlob
else: vData.toOpenArrayByte(0,int(vLen)-1).toSeq
yield (count, key, value)
# Update Iterator (might overwrite kData/vdata)
rit.rocksdb_iter_next()
# End while
# ------------------------------------------------------------------------------
# End
# ------------------------------------------------------------------------------


@ -0,0 +1,232 @@
# Nimbus - Types, data structures and shared utilities used in network sync
#
# Copyright (c) 2018-2021 Status Research & Development GmbH
# Licensed under either of
# * Apache License, version 2.0, ([LICENSE-APACHE](LICENSE-APACHE) or
# http://www.apache.org/licenses/LICENSE-2.0)
# * MIT license ([LICENSE-MIT](LICENSE-MIT) or
# http://opensource.org/licenses/MIT)
# at your option. This file may not be copied, modified, or
# distributed except according to those terms.
## Aristo (aka Patricia) DB transcoder test
import
eth/common,
stew/byteutils,
unittest2,
../../nimbus/db/kvstore_rocksdb,
../../nimbus/db/aristo/[
aristo_desc, aristo_cache, aristo_debug, aristo_error, aristo_transcode],
../../nimbus/sync/snap/range_desc,
./test_helpers
type
TesterDesc = object
prng: uint32 ## random state
# ------------------------------------------------------------------------------
# Private helpers
# ------------------------------------------------------------------------------
proc posixPrngRand(state: var uint32): byte =
## POSIX.1-2001 example of a rand() implementation, see manual page rand(3).
state = state * 1103515245 + 12345;
let val = (state shr 16) and 32767 # mod 2^31
(val shr 8).byte # Extract second byte
proc rand[W: SomeInteger|VertexID](ap: var TesterDesc; T: type W): T =
var a: array[sizeof T,byte]
for n in 0 ..< sizeof T:
a[n] = ap.prng.posixPrngRand().byte
when sizeof(T) == 1:
let w = uint8.fromBytesBE(a).T
elif sizeof(T) == 2:
let w = uint16.fromBytesBE(a).T
elif sizeof(T) == 4:
let w = uint32.fromBytesBE(a).T
else:
let w = uint64.fromBytesBE(a).T
when T is SomeUnsignedInt:
# That way, `fromBytesBE()` can be applied to `uint`
result = w
else:
# That way the result is independent of endianness
(addr result).copyMem(unsafeAddr w, sizeof w)
proc vidRand(td: var TesterDesc; bits = 19): VertexID =
if bits < 64:
let
mask = (1u64 shl max(1,bits)) - 1
rval = td.rand uint64
(rval and mask).VertexID
else:
td.rand VertexID
proc init(T: type TesterDesc; seed: int): TesterDesc =
result.prng = (seed and 0x7fffffff).uint32
# -----
proc getOrEmpty(rc: Result[Blob,AristoError]; noisy = true): Blob =
if rc.isOk:
return rc.value
noisy.say "***", "error=", rc.error
proc `+`(a: VertexID, b: int): VertexID =
(a.uint64 + b.uint64).VertexID
# ------------------------------------------------------------------------------
# Public test function
# ------------------------------------------------------------------------------
proc test_transcodeAccounts*(
noisy = true;
rocky: RocksStoreRef;
stopAfter = high(int);
) =
## Transcoder tests on accounts database
var
adb = AristoDbRef()
count = -1
for (n, key,value) in rocky.walkAllDb():
if stopAfter < n:
break
count = n
# RLP <-> NIM object mapping
let node0 = value.decode(NodeRef)
block:
let blob0 = rlp.encode node0
if value != blob0:
check value.len == blob0.len
check value == blob0
noisy.say "***", "count=", count, " value=", value.rlpFromBytes.inspect
noisy.say "***", "count=", count, " blob0=", blob0.rlpFromBytes.inspect
# Provide DbRecord with dummy links and expanded payload. Registering the
# node as vertex and re-converting it does the job
var node = node0.updated(adb)
if node.isError:
check node.error == AristoError(0)
else:
case node.vType:
of aristo_desc.Leaf:
let account = node.lData.blob.decode(Account)
node.lData = PayloadRef(pType: AccountData, account: account)
discard adb.keyToVtxID node.lData.account.storageRoot.to(NodeKey)
discard adb.keyToVtxID node.lData.account.codeHash.to(NodeKey)
of aristo_desc.Extension:
# key <-> vtx correspondence
check node.key[0] == node0.key[0]
check not node.eVtx.isZero
of aristo_desc.Branch:
for n in 0..15:
# key[n] <-> vtx[n] correspondence
check node.key[n] == node0.key[n]
check node.key[n].isZero == node.bVtx[n].isZero
# This NIM object must match to the same RLP encoded byte stream
block:
var blob1 = rlp.encode node
if value != blob1:
check value.len == blob1.len
check value == blob1
noisy.say "***", "count=", count, " value=", value.rlpFromBytes.inspect
noisy.say "***", "count=", count, " blob1=", blob1.rlpFromBytes.inspect
# NIM object <-> DbRecord mapping
let dbr = node.blobify.getOrEmpty(noisy)
var node1 = dbr.deblobify.asNode(adb)
if node1.isError:
check node1.error == AristoError(0)
block:
# `deblobify()` will always decode to `BlobData` type payload
if node1.vType == aristo_desc.Leaf:
let account = node1.lData.blob.decode(Account)
node1.lData = PayloadRef(pType: AccountData, account: account)
if node != node1:
check node == node1
noisy.say "***", "count=", count, " node=", node.pp(adb)
noisy.say "***", "count=", count, " node1=", node1.pp(adb)
# Serialise back with expanded `AccountData` type payload (if any)
let dbr1 = node1.blobify.getOrEmpty(noisy)
block:
if dbr != dbr1:
check dbr == dbr1
noisy.say "***", "count=", count, " dbr=", dbr.toHex
noisy.say "***", "count=", count, " dbr1=", dbr1.toHex
# Serialise back as is
let dbr2 = dbr.deblobify.asNode(adb).blobify.getOrEmpty(noisy)
block:
if dbr != dbr2:
check dbr == dbr2
noisy.say "***", "count=", count, " dbr=", dbr.toHex
noisy.say "***", "count=", count, " dbr2=", dbr2.toHex
noisy.say "***", "records visited: ", count + 1
proc test_transcodeVidRecycleLists*(noisy = true; seed = 42) =
## Transcode VID lists held in `AristoDb` descriptor
var td = TesterDesc.init seed
let db = AristoDbRef()
# Add some random numbers
block:
let first = td.vidRand()
db.dispose first
var
expectedVids = 1
count = 1
# Feed in some numbers, some used and some discarded
while expectedVids < 5 or count < 5 + expectedVids:
count.inc
let vid = td.vidRand()
expectedVids += (vid < first).ord
db.dispose vid
check db.vidGen.len == expectedVids
noisy.say "***", "vids=", db.vidGen.len, " discarded=", count-expectedVids
# Serialise/deserialise
block:
let dbBlob = db.blobify
# Deserialise
let db1 = block:
let rc = dbBlob.deblobify AristoDbRef
if rc.isErr:
check rc.isOk
rc.get(otherwise = AristoDbRef())
check db.vidGen == db1.vidGen
# Make sure that recycled numbers are fetched first
let topVid = db.vidGen[^1]
while 1 < db.vidGen.len:
let w = VertexID.new(db)
check w < topVid
check db.vidGen.len == 1 and db.vidGen[0] == topVid
# Get some consecutive vertex IDs
for n in 0 .. 5:
let w = VertexID.new(db)
check w == topVid + n
check db.vidGen.len == 1
# Repeat last test after clearing the cache
db.vidGen.setLen(0)
for n in 0 .. 5:
let w = VertexID.new(db)
check w == 1.VertexID + n
check db.vidGen.len == 1
# ------------------------------------------------------------------------------
# End
# ------------------------------------------------------------------------------