Commit Graph

27 Commits

Author SHA1 Message Date
Jordan Hrycaj ffa8ad2246
Core db use differential tx layers for aristo and kvt (#1949)
* Fix kvt headers

* Provide differential layers for KVT transaction stack

why:
  Significant performance improvement

* Provide abstraction layer for database top cache layer

why:
  This will eventually implemented as a differential database layers
  or transaction layers. The latter is needed to improve performance.

behavioural changes:
  Zero vertex and keys (i.e. delete requests) are not optimised out
  until the last layer is written to the database.

* Provide differential layers for Aristo transaction stack

why:
  Significant performance improvement
2023-12-19 12:39:23 +00:00
Jordan Hrycaj 657379f484
Aristo db update merkle hasher (#1925)
* Register paths for added leafs because of trie re-balancing

why:
  While the payload would not change, the prefix in the leaf vertex
  would. So it needs to be flagged for hash recompilation for the
  `hashify()` module.

also:
  Make sure that `Hike` paths which might have vertex links into the
  backend filter are replaced by vertex copies before manipulating.
  Otherwise the vertices on the immutable filter might be involuntarily
  changed.

* Also check for paths where the leaf vertex is on the backend, already

why:
  A a path can have dome vertices on the top layer cache with the
  `Leaf` vertex on  the backend.

* Re-define a void `HashLabel` type.

why:
  A `HashLabel` type is a pair `(root-vertex-ID, Keccak-hash)`. Previously,
  a valid `HashLabel` consisted of a non-empty hash and a non-zero vertex
  ID. This definition leads to a non-unique representation of a void
  `HashLabel` with either root-ID or has void. This has been changed to
  the unique void `HashLabel` exactly if the hash entry is void.

* Update consistency checkers

* Re-org `hashify()` procedure

why:
  Syncing against block chain showed serious deficiencies which produced
  wrong hashes or simply bailed out with error.

  So all fringe cases (mainly due to deleted entries) could be integrated
  into the labelling schedule rather than handling separate fringe cases.
2023-12-04 20:39:26 +00:00
Jordan Hrycaj c47f021596
Core db and aristo updates for destructor and tx logic (#1894)
* Disable `TransactionID` related functions from `state_db.nim`

why:
  Functions `getCommittedStorage()` and `updateOriginalRoot()` from
  the `state_db` module are nowhere used. The emulation of a legacy
  `TransactionID` type functionality is administratively expensive to
  provide by `Aristo` (the legacy DB version is only partially
  implemented, anyway).

  As there is no other place where `TransactionID`s are used, they will
  not be provided by the `Aristo` variant of the `CoreDb`. For the
  legacy DB API, nothing will change.

* Fix copyright headers in source code

* Get rid of compiler warning

* Update Aristo code, remove unused `merge()` variant, export `hashify()`

why:
  Adapt to upcoming `CoreDb` wrapper

* Remove synced tx feature from `Aristo`

why:
+ This feature allowed to synchronise transaction methods like begin,
  commit, and rollback for a group of descriptors.
+ The feature is over engineered and not needed for `CoreDb`, neither
  is it complete (some convergence features missing.)

* Add debugging helpers to `Kvt`

also:
  Update database iterator, add count variable yield argument similar
  to `Aristo`.

* Provide optional destructors for `CoreDb` API

why;
  For the upcoming Aristo wrapper, this allows to control when certain
  smart destruction and update can take place. The auto destructor works
  fine in general when the storage/cache strategy is known and acceptable
  when creating descriptors.

* Add update option for `CoreDb` API function `hash()`

why;
  The hash function is typically used to get the state root of the MPT.
  Due to lazy hashing, this might be not available on the `Aristo` DB.
  So the `update` function asks for re-hashing the gurrent state changes
  if needed.

* Update API tracking log mode: `info` => `debug

* Use shared `Kvt` descriptor in new Ledger API

why:
  No need to create a new descriptor all the time
2023-11-16 19:35:03 +00:00
Jordan Hrycaj 6e0397e276
Aristo and ledger small updates (#1888)
* Fix debug noise in `hashify()` for perfectly normal situation

why:
  Was previously considered a fixable error

* Fix test sample file names

why:
  The larger test file `goerli68161.txt.gz` is already in the local
  archive. So there is no need to use the smaller one from the external
  repo.

* Activate `accounts_cache` module from `db/ledger`

why:
  A copy of the original `accounts_cache.nim` source to be integrated
  into the `Ledger` module wrapper which allows to switch between
  different `accounts_cache` implementations unser tha same API.

details:
  At a later state, the `db/accounts_cache.nim` wrapper will be
  removed so that there is only one access to that module via
  `db/ledger/accounts_cache.nim`.

* Fix copyright headers in source code
2023-11-08 16:52:25 +00:00
Jordan Hrycaj 4feaa2cfab
Aristo db update for short nodes key edge cases (#1887)
* Aristo: Provide key-value list signature calculator

detail:
  Simple wrappers around `Aristo` core functionality

* Update new API for `CoreDb`

details:
+ Renamed new API functions `contains()` => `hasKey()` or `hasPath()`
  which disables the `in` operator on non-boolean 	`contains()` functions
+ The functions `get()` and `fetch()` always return a not-found error if
  there is no item, available. The new functions `getOrEmpty()` and
  `mergeOrEmpty()` return an an empty `Blob` if there is no such key
  found.

* Rewrite `core_apps.nim` using new API from `CoreDb`

* Use `Aristo` functionality for calculating Merkle signatures

details:
  For debugging, the `VerifyAristoForMerkleRootCalc` can be set so
  that `Aristo` results will be verified against the legacy versions.

* Provide general interface for Merkle signing key-value tables

details:
  Export `Aristo` wrappers

* Activate `CoreDb` tests

why:
  Now, API seems to be stable enough for general tests.

* Update `toHex()` usage

why:
  Byteutils' `toHex()` is superior to `toSeq.mapIt(it.toHex(2)).join`

* Split `aristo_transcode` => `aristo_serialise` + `aristo_blobify`

why:
+ Different modules for different purposes
+ `aristo_serialise`: RLP encoding/decoding
+ `aristo_blobify`: Aristo database encoding/decoding

* Compacted representation of small nodes' links instead of Keccak hashes

why:
  Ethereum MPTs use Keccak hashes as node links if the size of an RLP
  encoded node is at least 32 bytes. Otherwise, the RLP encoded node
  value is used as a pseudo node link (rather than a hash.) Such a node
  is nor stored on key-value database. Rather the RLP encoded node value
  is stored instead of a lode link in a parent node instead. Only for
  the root hash, the top level node is always referred to by the hash.

  This feature needed an abstraction of the `HashKey` object which is now
  either a hash or a blob of length at most 31 bytes. This leaves two
  ways of representing an empty/void `HashKey` type, either as an empty
  blob of zero length, or the hash of an empty blob.

* Update `CoreDb` interface (mainly reducing logger noise)

* Fix copyright years (to make `Lint` happy)
2023-11-08 12:18:32 +00:00
Jordan Hrycaj 6d132811ba
Core db update providing additional results code interface (#1776)
* Split `core_db/base.nim` into several sources

* Rename `core_db/legacy.nim` => `core_db/legacy_db.nim`

* Update `CoreDb` API, dual methods returning `Result[]` or plain value

detail:
  Plain value methods implemet the legacy API, they defect on error results

* Redesign `CoreDB` direct backend access

why:
  Made the `backend` directive integral part of the API

* Discontinue providing unused or otherwise available functions

details:
+ setTransactionID() removed, not used and not easily replicable in Aristo
+ maybeGet() removed, available via direct backend access
+ newPhk() removed, never used & was experimental anyway

* Update/reorg backend API

why:
+ Added error print function `$$()`
+ General descriptor completion (and optional validation) via `bless()`

* Update `Aristo`/`Kvt` exception handling

why:
  Avoid `CatchableError` exceptions, rather pass them as error code where
  appropriate.

* More `CoreDB` compliant `Aristo` and `Kvt` methods

details:
+ Providing functions like `contains()`, `getVtxRc()` (returns `Result[]`).
+ Additional error code: `NotImplemented`

* Rewrite/reorg of Aristo DB constructor

why:
  Previously used global object `DefaultQidLayoutRef` as default
  initialiser. This object was created at compile time which lead to
  non-gc safe functions.

* Update nimbus/db/core_db/legacy_db.nim

Co-authored-by: Kim De Mey <kim.demey@gmail.com>

* Update nimbus/db/aristo/aristo_transcode.nim

Co-authored-by: Kim De Mey <kim.demey@gmail.com>

* Update nimbus/db/core_db/legacy_db.nim

Co-authored-by: Kim De Mey <kim.demey@gmail.com>

---------

Co-authored-by: Kim De Mey <kim.demey@gmail.com>
2023-09-26 10:21:13 +01:00
Jordan Hrycaj cd1d370543
Aristo db api extensions for use as core db backend (#1754)
* Update docu

* Update Aristo/Kvt constructor prototype

why:
  Previous version used an `enum` value to indicate what backend is to
  be used. This was replaced by using the backend object type.

* Rewrite `hikeUp()` return code into `Result[Hike,(Hike,AristoError)]`

why:
  Better code maintenance. Previously, the `Hike` object was returned. It
  had an internal error field so partial success was also available on
  a failure. This error field has been removed.

* Use `openArray[byte]` rather than `Blob` in functions prototypes

* Provide synchronised multi instance transactions

why:
  The `CoreDB` object was geared towards the legacy DB which used a single
  transaction for the key-value backend DB. Different state roots are
  provided by the backend database, so all instances work directly on the
  same backend.

  Aristo db instances have different in-memory mappings (aka different
  state roots) and the transactions are on top of there mappings. So each
  instance might run different transactions.

  Multi instance transactions are a compromise to converge towards the
  legacy behaviour. The synchronised transactions span over all instances
  available at the time when base transaction was opened. Instances
  created later are unaffected.

* Provide key-value pair database iterator

why:
  Needed in `CoreDB` for `replicate()` emulation

also:
  Some update of internal code

* Extend API (i.e. prototype variants)

why:
  Needed for `CoreDB` geared towards the legacy backend which has a more
  basic API than Aristo.
2023-09-15 16:23:53 +01:00
Jordan Hrycaj 8e46953390
Aristo db state root repos and reorg (#1744)
* Reorg of distributed backend access

details:
  Now handled via API provided in `aristo_desc`.

* Rename `checkCache()` => `checkTop()`

why:
  Better naming for top layer cache checker

also:
  Provide cascaded fifos checker

* Provide `eq` directive for finding filter by exact filter ID (think block number)

* Some code beautification (for better code reading)

* State root reposition and reorg

details:
  Repositioning is supported by forking a new descriptor. Reorg is then
  accomplished by writing this forked state on the backend database.
2023-09-11 21:38:49 +01:00
Jordan Hrycaj f177f5bf11
Aristo db extend filter storage scheduler api (#1725)
* Add backwards index `[]` operator into fifo

also:
  Need another maintenance instruction: The last overflow queue must
  irrevocably delete some item in order to make space for a new one.

* Add re-org scheduler

details:
  Generates instructions how to extract and merge some leading entries

* Add filter ID selector

details:
  This allows to find the next filter now newer that a given filter ID

* Message update
2023-08-30 18:08:39 +01:00
Jordan Hrycaj 465d694834
Aristo db implement filter storage scheduler (#1713)
* Rename FilterID => QueueID

why:
  The current usage does not identify a particular filter but uses it as
  storage tag to manage it on the database (to be organised in a set of
  FIFOs or queues.)

* Split `aristo_filter` source into sub-files

why:
  Make space for filter management API

* Store filter queue IDs in pairs on the backend

why:
  Any pair will will describe a FIFO accessed by bottom/top IDs

* Reorg some source file names

why:
  The "aristo_" prefix for make local/private files is tedious to
  use, so removed.

* Implement filter slot scheduler

details:
  Filters will be stored on the database on cascaded FIFOs. When a FIFO
  queue is full, some filter items are bundled together and stored on the
  next FIFO.
2023-08-25 23:53:59 +01:00
Jordan Hrycaj b9a4fd3137
Aristo db update serialisation (#1700)
* Remove unused unit test sources

* Redefine and document serialised data records for Aristo backend

why:
  Unique record types determined by marker byte, i.e. the last byte of a
  serialisation record. This just needed some tweaking after adding new
  record types.
2023-08-21 19:18:06 +01:00
Jordan Hrycaj 4c9141ffac
Aristo db implement filter serialisation for storage (#1695)
* Remove concept of empty/blind filters

why:
  Not needed. A non-existent filter is is coded as a nil reference.

* Slightly generalised backend iterators

why:
 * VertexID as key for the ID generator state makes no sense
 * there will be more tables addressed by non-VertexID keys

* Store serialised/blobified vertices on memory backend

why:
  This is more in line with the RocksDB backend so more appropriate
  for testing when comparing behaviour. For a speedy memory database,
  a backend-less variant should be used.

* Drop the `Aristo` prefix from names `AristoLayerRef`, etc.

* Suppress compiler warning

why:
  duplicate imports

* Add filter serialisation transcoder

why:
  Will be used as storage format
2023-08-18 20:46:55 +01:00
Jordan Hrycaj 3078c207ca
Aristo db implement distributed backend access (#1688)
* Fix hashing algorithm

why:
  Particular case where a sub-tree is on the backend, linked by an
  Extension vertex to the top level.

* Update backend verification to report `dirty` top layer

* Implement distributed merge of backend filters

* Implement distributed backend access management

details:
  Implemented and tested as described in chapter 5 of the `README.md`
  file.
2023-08-17 14:42:01 +01:00
Jordan Hrycaj 01fe172738
Aristo db integrate hashify into tx (#1679)
* Renamed type `NoneBackendRef` => `VoidBackendRef`

* Clarify names: `BE=filter+backend` and `UBE=backend (unfiltered)`

why:
  Most functions used full names as `getVtxUnfilteredBackend()` or
  `getKeyBackend()`. After defining abbreviations (and its meaning) it
   seems easier to use `getVtxUBE()` and `getKeyBE()`.

* Integrate `hashify()` process into transaction logic

why:
  Is now transparent unless explicitly controlled.

details:
  Cache changes imply setting a `dirty` flag which in turn triggers
  `hashify()` processing in transaction and `pack()` directives.

* Removed `aristo_tx.exec()` directive

why:
  Inconsistent implementation, functionality will be provided with a
  different paradigm.
2023-08-11 18:23:57 +01:00
Jordan Hrycaj 09fabd04eb
Aristo db use filter betw backend and tx cache (#1678)
* Provide deep copy for each transaction layer

why:
  Localising changes. Selective deep copy was just overlooked.

* Generalise vertex ID generator state reorg function `vidReorg()`

why:
  makes it somewhat easier to handle when saving layers.

* Provide dummy back end descriptor `NoneBackendRef`

* Optional read-only filter between backend and transaction cache

why:
  Some staging area for accumulating changes to the backend DB. This
  will eventually be an access layer for emulating a backend with
  multiple/historic state roots.

* Re-factor `persistent()` with filter between backend/tx-cache => `stow()`

why:
  The filter provides an abstraction from the physically stored data on
  disk. So, there can be several MPT instances using the same disk data
  with different state roots. Of course, all the MPT instances should
  not differ too much for practical reasons :).

TODO:
  Filter administration tools need to be provided.
2023-08-10 21:01:28 +01:00
Jordan Hrycaj 71c91e2280
Aristo db refactor tx paradim (#1674)
* Better error handling

why:
  Bail out on some error as early as possible before any changes.

* Implement `fetch()` as opposite of `merge()`

rationale:
  In the `Aristo` realm, the action named `fetch()` and `merge()` indicate
  leaf value related actions on the MPT, while actions `get()` and `put()`
   handle vertex or hash key related operations that constitute the MPT.

* Re-factor `merge()` prototypes

why:
  The most used variant of `merge()` should have the simplest prototype.

* Persistent DB constructor needs to import `aristo/aristo_init/persistent`

why:
  Most applications use memory DB anyway. This avoids linking `-lrocksdb`
  or any other back end libraries by default.

* Re-factor transaction module

why:
  Got the paradigm wrong. The transaction descriptor did replace the
  database one but should be handled separately.
2023-08-07 18:45:23 +01:00
Jordan Hrycaj ccf639fc3c
Aristo db transaction based interface (#1628)
* Provide transaction based interface for standard operations

* Provide unit tests for new Aristo interface using transactions

details:
  These new tests combine and replace several single-purpose tests.
  The now unused test sources will be kept for a while to be eventually
  removed.
2023-07-05 14:50:11 +01:00
Jordan Hrycaj ff6673beac
Aristo db tidy up a bit (#1625)
* Slightly tighten some self-check conditions

* Redefined the database descriptor object as reference (to the object)

why:
  The upcoming transaction wrapper will work with a database reference
  rather than the object itself

* Append state before `save()` to the Aristo descriptor

why:
  This stae was previously returned by the function. Appending it to
  a field of the Aristo descriptor seems easier to handle.
2023-07-04 19:24:03 +01:00
Jordan Hrycaj 83dbe87159
Aristo db update foreground caching (#1605)
* Fix vertex ID generator state handling for rocksdb backend

why:
 * Key error in walk iterator
 * Needs to be loaded when opening the database

* Use non-zero sub-table prefixes for rocksdb

why:
  Handy for debugging

* Fix error code for missing key on rocksdb backend

why:
  Previously returned `VOID_HASH_KEY` rather than `GetKeyNotFound`

* Explicitly copy vertex data between internal table and function/result argument

why:
  Function argument or return reference may still refer to the same data
  object.

* Updated error symbols

why:
  Error symbol names for the hike module now start with the prefix `Hike`.

* Write back modified branch node into local top layer cache

why:
  With the backend available, the source of the branch node references
  might not be the top layer cache. So any change must be explicitely
  recorded.
2023-06-22 12:13:24 +01:00
Jordan Hrycaj 4b66f93274
Aristo db with storage backends (#1603)
* Generalised Aristo DB constructor for any type of backend

details:
  * Records to be deleted are represented as key-void (rather than
    key-value) pairs by the put-function arguments
  * Allow direct driver access, iterators as example implementation and
    for testing.

* Provide backend storage interface

details:
  Stores the top layer onto backend tables

* Implemented Rocks DB backend

details:
  Transaction based `put()` functionality
  Iterators (based on direct RocksDB access)
2023-06-20 14:26:25 +01:00
Jordan Hrycaj d7f40516a7
Detach from snap/sync declarations & definitions (#1601)
why:
  Tests and some basic components were originally borrowed from the
  snap/sync implementation. These have fully been re-implemented.
2023-06-12 19:16:03 +01:00
Jordan Hrycaj 0308dfac4f
Aristo db address sup trie items properly (#1600)
* Fix include

why:
  Eth67 not default yet so that got missed

* Rename `LeafKey` => `LeafTie`

why:
  Name is a pen picture of what this object is for. Also, it avoids the
  ubiquitous term `key`.

* Provided `getOrVoid()` wrapper for `getOrDefault()`

also:
  Provide `isValid()` syntactic sugar for `.isNil.not`, `!= 0` etc.
  Reorg descriptor source, split into sub-sources

* Bundled `NodeKey` objects with root ID and called it `HashLabel`

why:
  `NodeKey` (aka repurposed Hash265) objects are unique only within a
  particular sub-trie (e.g. storage slots) which are kept separated
  (i.e non-interleaved) by design. This is not applied to the backend
  as the map VertexID->NodeKey labelling the nodes needs not be injective.

  For the in-memory database (transaction) layers, the injective map
  VertexID->(VertexID,NodeKey) is used where the first field of the image
  tuple is the root ID of the sub-trie the `NodeKey` object is valid. So
  identical storage tries for different accounts can be represented.
2023-06-12 14:48:47 +01:00
Jordan Hrycaj 932a2140f2
Aristo db supporting forest and layered tx architecture (#1598)
* Exclude some storage tests

why:
  These test running on external dumps slipped through. The particular
  dumps were reported earlier as somehow dodgy.

  This was changed in `#1457` but having a second look, the change on
  hexary_interpolate.nim(350) might be incorrect.

* Redesign `Aristo DB` descriptor for transaction based layers

why:
  Previous descriptor layout made it cumbersome to push/pop
  database delta layers.

  The new architecture keeps each layer with the full delta set
  relative to the database backend.

* Keep root ID as part of the `Patricia Trie` leaf path

why;
  That way, forests are supported
2023-06-09 12:17:37 +01:00
Jordan Hrycaj 2fc349feb9
Aristo db merkle hashify functionality added (#1593)
* Keep vertex ID generator state with each db-layer

why:
  The vertex ID generator state is part of the difference to the below
  layer

* Move otherwise unused source to test directory

* Add Merkle hash generator

also:
  * Verification facility for debugging
  * Empty Merkle key hashes encoded as `EMPTY_ROOT_HASH`
2023-05-30 22:21:15 +01:00
Jordan Hrycaj cd78458123
Add items to `Aristo Trie` database (#1586)
details:
1. Merging a leaf vertex merges a `Patricia Trie` path (while
   adding/modiying vertices) and adds a leaf node with payload
2. Merging a Merkel node merges a single vertex to the `Patricia Trie`
   and registers merkel hashes
3. Action 2 can be used before action 1 in order to construct a
   Merkel proof as required for handling `snap/1` data.
4. Unit tests show that action 3 is benign for now :)
2023-05-30 12:47:47 +01:00
Jordan Hrycaj ff0fc98fdf
Multi layer architecture 4 aristo db (#1581)
* Cosmetics, renamed fields (eVtx, bVtx) -> (eVid, bVid)

* Multilayered delta architecture for Aristo DB

details:
  Any VertexID or data retrieval needs to go down the rabbit hole and
  fetch/get/manipulate the bottom layer -- even without explicit
  backend.

* Direct reference to backend from top-level layer

why:
  Some services as the vid management needs to be synchronised among all
  layers. So access is optimised.
2023-05-14 18:43:01 +01:00
Jordan Hrycaj 605739ef4c
Experimental MP-trie (#1573)
* Experimental MP-trie

why:
  Deleting records is a infeasible with the current structure

* Added vertex ID recycling management

Todo:
  Provide some unit tests

* DB layout update

why:
  Main news is the separation of `Merkel` hashes into an extra table.

details:
  The code fragments cover conversion between compact MPT records and
  Aristo DB records as well as some rudimentary cache handling for
  the `Merkel` hashes (i.e. the extra table entries.)

todo:
  Add some simple unit test for the descriptor record (currently used
  for vertex ID management, only.)

* Updated vertex ID recycling management

details:
  added simple unit tests (mainly testing ABI)

* docu update
2023-05-11 15:25:29 +01:00