Commit Graph

52 Commits

Author SHA1 Message Date
Jacek Sieka 2fe8cc4551
leaf cache fixes (#2637)
* Add missing leaf cache update when a leaf turns to a branch with two
leaves (on merge) and vice versa (on delete) - this could lead to stale
leaves being returned from the cache causing validation failures - it
didn't happen because the leaf caches were not being used efficiently :)
* Replace `seq` with `ArrayBuf` in `Hike` allowing it to become
allocation-free - this PR also works around an inefficiency in nim in
returning large types via a `var` parameter
* Use the leaf cache instead of `getVtxRc` to fetch recent leaves - this
makes the vertex cache more efficient at caching branches because fewer
leaf requests pass through it.
2024-09-19 10:39:06 +02:00
Jacek Sieka 5c1e2e7d3b
Migrate `keyed_queue` to `minilru` (#2608)
Compared to `keyed_queue`, `minilru` uses significantly less memory, in
particular for the 32-byte hash keys where `kq` stores several copies of
the key redundantly.
2024-09-13 15:47:50 +02:00
Jordan Hrycaj ce713d95fc
Aristo lazily delete larger subtrees (#2560)
* Extract sub-tree deletion functions into separate sub-modules

* Move/rename `aristo_desc.accLruSize` => `aristo_constants.ACC_LRU_SIZE`

* Lazily delete sub-trees

why:
  This gives some control of the memory used to keep the deleted vertices
  in the cached layers. For larger sub-trees, keys and vertices might be
  on the persistent backend to a large extend. This would pull an amount
  of extra information from the backend into the cached layer.

  For lazy deleting it is enough to remember sub-trees by a small set of
  (at most 16) sub-roots to be processed when storing persistent data.
  Marking the tree root deleted immediately allows to let most of the code
  base work as before.

* Comments and cosmetics

* No need to import all for `Aristo` here

* Kludge to make `chronicle` usage in sub-modules work with `fluffy`

why:
  That `fluffy` would not run with any logging in `core_deb` is a problem
  I have known for a while. Up to now, logging was only used for debugging.

  With the current `Aristo` PR, there are cases where logging might be
  wanted but this works only if `chronicles` runs without the
  `json[dynamic]` sinks.

  So this should be re-visited.

* More of a kludge
2024-08-14 08:54:44 +00:00
Jordan Hrycaj 72c3ab8ced
Provide partial tree support for preloading tests (#2536)
* Implement partial trees

why:
  This is currently needed for unit tests to pre-load the database
  with test data similar to `proof` node pre-load.

  The basic features for `snap-sync` boundary proofs are available
  as well for future use. What is missing is the final proof verification
  and a complete storage data load/merge function (stub is available.)

* Cosmetics, clean up
2024-07-29 20:15:17 +00:00
Jordan Hrycaj 5ac362fe6f
Aristo and kvt balancer management update (#2504)
* Aristo: Merge `delta_siblings` module into `deltaPersistent()`

* Aristo: Add `isEmpty()` for canonical checking whether a layer is empty

* Aristo: Merge `LayerDeltaRef` into `LayerObj`

why:
  No need to maintain nested object refs anymore. Previously the
 `LayerDeltaRef` object had a companion `LayerFinalRef` which held
  non-delta layer information.

* Kvt: Merge `LayerDeltaRef` into `LayerRef`

why:
  No need to maintain nested object refs (as with `Aristo`)

* Kvt: Re-write balancer logic similar to `Aristo`

why:
  Although `Kvt` was a cheap copy of `Aristo` it sort of got out of
  sync and the balancer code was wrong.

* Update iterator over forked peers

why:
  Yield additional field `isLast` indicating that the last iteration
  cycle was approached.

* Optimise balancer calculation.

why:
  One can often avoid providing a new object containing the merge of two
  layers for the balancer. This avoids copying tables. In some cases this
  is replaced by `hasKey()` look ups though. One uses one of the two
  to combine and merges the other into the first.

  Of course, this needs some checks for making sure that none of the
  components to merge is eventually shared with something else.

* Fix copyright year
2024-07-18 21:32:32 +00:00
Jacek Sieka df4a21c910
Store cached hash at the layer corresponding to the source data (#2492)
When lazily verifying state roots, we may end up with an entire state
without roots that gets computed for the whole database - in the current
design, that would result in hashes for the entire trie being held in
memory.

Since the hash depends only on the data in the vertex, we can store it
directly at the top-most level derived from the verticies it depends on
- be that memory or database - this makes the memory usage broadly
linear with respect to the already-existing in-memory change set stored
in the layers.

It also ensures that if we have multiple forks in memory, hashes get
cached in the correct layer maximising reuse between forks.

The same layer numbering scheme as elsewhere is reused, where -2 is the
backend, -1 is the balancer, then 0+ is the top of the stack and stack.

A downside of this approach is that we create many small batches - a
future improvement could be to collect all such writes in a single
batch, though the memory profile of this approach should be examined
first (where is the batch kept, exactly?).
2024-07-18 09:13:56 +02:00
Jordan Hrycaj a84a2131cd
No ext update (#2494)
* Imported/rebase from `no-ext`, PR #2485

  Store extension nodes together with the branch

  Extension nodes must be followed by a branch - as such, it makes sense
  to store the two together both in the database and in memory:

  * fewer reads, writes and updates to traverse the tree
  * simpler logic for maintaining the node structure
  * less space used, both memory and storage, because there are fewer
    nodes overall

  There is also a downside: hashes can no longer be cached for an
  extension - instead, only the extension+branch hash can be cached - this
  seems like a fine tradeoff since computing it should be fast.

  TODO: fix commented code

* Fix merge functions and `toNode()`

* Update `merkleSignCommit()` prototype

why:
  Result is always a 32bit hash

* Update short Merkle hash key generation

details:
  Ethereum reference MPTs use Keccak hashes as node links if the size of
  an RLP encoded node is at least 32 bytes. Otherwise, the RLP encoded
  node value is used as a pseudo node link (rather than a hash.) This is
  specified in the yellow paper, appendix D.

  Different to the `Aristo` implementation, the reference MPT would not
  store such a node on the key-value database. Rather the RLP encoded node value is stored instead of a node link in a parent node
  is stored as a node link on the parent database.

  Only for the root hash, the top level node is always referred to by the
  hash.

* Fix/update `Extension` sections

why:
  Were commented out after removal of a dedicated `Extension` type which
  left the system disfunctional.

* Clean up unused error codes

* Update unit tests

* Update docu

---------

Co-authored-by: Jacek Sieka <jacek@status.im>
2024-07-16 19:47:59 +00:00
Jacek Sieka 9d91191154
storage hike cache (#2484)
This PR adds a storage hike cache similar to the account hike cache
already present - this cache is less efficient because account storage
is already partically cached in the account ledger but nonetheless helps
keep hiking down.

Notably, there's an opportunity to optimise this cache and the others so
that they cooperate better insteado of overlapping, which is left for a
future PR.

This PR also fixes an O(N) memory usage for storage slots where the
delete would keep the full storage in a work list which on mainnet can
grow very large - the work list is replaced with a more conventional
recursive `O(log N)` approach.
2024-07-14 19:12:10 +02:00
Jacek Sieka f3a56002ca
Turn payload into value type (#2483)
The Vertex type unifies branches, extensions and leaves into a single
memory area where the larges member is the branch (128 bytes + overhead) -
the payloads we have are all smaller than 128 thus wrapping them in an
extra layer of `ref` is wasteful from a memory usage perspective.

Further, the ref:s must be visited during the M&S phase of garbage
collection - since we keep millions of these, many of them
short-lived, this takes up significant CPU time.

```
Function	CPU Time: Total	CPU Time: Self	Module	Function (Full)	Source File	Start Address
system::markStackAndRegisters	10.0%	4.922s	nimbus	system::markStackAndRegisters(var<system::GcHeap>).constprop.0	gc.nim	0x701230`
```
2024-07-14 12:02:05 +02:00
Jacek Sieka 01ab209497
cache account payload (#2478)
Instead of caching just the storage id, we can cache the full payload
which further reduces expensive hikes
2024-07-12 15:08:26 +02:00
Jacek Sieka a6764670f0
merge: avoid hike allocations (#2472)
hike allocations (and the garbage collection maintenance that follows)
are responsible for some 10% of cpu time (not wall time!) at this point
- this PR avoids them by stepping through the layers one step at a time,
simplifying the code at the same time.
2024-07-11 13:26:46 +02:00
Jacek Sieka 81e75622cf
storage: store root id together with vid, for better locality of refe… (#2449)
The state and account MPT:s currenty share key space in the database
based on that vertex id:s are assigned essentially randomly, which means
that when two adjacent slot values from the same contract are accessed,
they might reside at large distance from each other.

Here, we prefix each vertex id by its root causing them to be sorted
together thus bringing all data belonging to a particular contract
closer together - the same effect also happens for the main state MPT
whose nodes now end up clustered together more tightly.

In the future, the prefix given to the storage keys can also be used to
perform range operations such as reading all the storage at once and/or
deleting an account with a batch operation.

Notably, parts of the API already supported this rooting concept while
parts didn't - this PR makes the API consistent by always working with a
root+vid.
2024-07-04 15:46:52 +02:00
Jacek Sieka 443c6d1f8e
Cache account path storage id (#2443)
The storage id is frequently accessed when executing contract code and
finding the path via the database requires several hops making the
process slow - here, we add a cache to keep the most recently used
account storage id:s in memory.

A possible future improvement would be to cache all account accesses so
that for example updating the balance doesn't cause several hikes.
2024-07-03 17:58:25 +02:00
Jordan Hrycaj e7be0d185c
Aristo uses pre classified tree types cont2 (#2397)
* Provide dedicated functions for fetching accounts and storage trees

why:
  Different prototypes for each class `account`, `generic` and
  `storage`.

* Remove `fetchPayload()` and other cruft from API, `aristo_fetch`, etc.

* Fix typos, debugging left overs, comments
2024-06-19 12:40:00 +00:00
Jordan Hrycaj 5a5cc6295e
Triggered write event for kvt (#2351)
* bump rockdb

* Rename `KVT` objects related to filters according to `Aristo` naming

details:
  filter* => delta*
  roFilter => balancer

* Compulsory error handling if `persistent()` fails

* Add return code to `reCentre()`

why:
  Might eventually fail if re-centring is blocked. Some logic will be
  added in subsequent patch sets.

* Add column families from earlier session to rocksdb in opening procedure

why:
  All previously used CFs must be declared when re-opening an existing
  database.

* Update `init()` and add rocksdb `reinit()` methods for changing parameters

why:
  Opening a set column families (with different open options) must span
  at least the ones that are already on disk.

* Provide write-trigger-event interface into `Aristo` backend

why:
  This allows to save data from a guest application (think `KVT`) to
  get synced with the write cycle so the guest and `Aristo` save all
  atomically.

* Use `KVT` with new column family interface from `Aristo`

* Remove obsolete guest interface

* Implement `KVT` piggyback on `Aristo` backend

* CoreDb: Add separate `KVT`/`Aristo` backend mode for debugging

* Remove `rocks_db` import from `persist()` function

why:
  Some systems (i.p `fluffy` and friends) use the `Aristo` memory
  backend emulation and do not link against rocksdb when building the
  application. So this should fix that problem.
2024-06-13 18:15:11 +00:00
Jordan Hrycaj 69a158864c
Remove vid recycling feature (#2294) 2024-06-04 15:05:13 +00:00
Jordan Hrycaj f926222fec
Aristo cull journal related stuff (#2288)
* Remove all journal related stuff

* Refactor function names journal*() => delta*(), filter*() => delta*()

* remove `trg` fileld from `FilterRef`

why:
  Same as `kMap[$1]`

* Re-type FilterRef.src as `HashKey`

why:
  So it is directly comparable to `kMap[$1]`

* Moved `vGen[]` field from `LayerFinalRef` to `LayerDeltaRef`

why:
  Then a separate `FilterRef` type is not needed, anymore

* Rename `roFilter` field in `AristoDbRef` => `balancer`

why:
  New name more appropriate.

* Replace `FilterRef` by `LayerDeltaRef` type

why:
  This allows to avoid copying into the `balancer` (see next patch set)
  most of the time. Typically, only one instance is running on the backend
  and the `balancer` is only used as a stage before saving data.

* Refactor way how to store data persistently

why:
  Avoid useless copy when staging `top` layer for persistently saving to
  backend.

* Fix copyright header?
2024-06-03 20:10:35 +00:00
Jacek Sieka f38c5e631e
trivial memory-based speedups (#2205)
* trivial memory-based speedups

* HashKey becomes non-ref
* use openArray instead of seq in lots of places
* avoid sequtils.reversed when unnecessary
* add basic perf stats to test_coredb

* copyright
2024-05-23 17:37:51 +02:00
Jordan Hrycaj 961f63358e
Core db+aristo update recovery journal management (#2156)
* Aristo: Allow to define/set `FilterID` for journal filter records

why:
  After some changes, the `FilterID` is isomorphic to the `BlockNumber`
  scalar (well, the first 2^64 entries of a `BlockNumber`.)

  The needed change for `FilterID` is that the `FilterID(0)` value is
  valid part of the `FilterID` scalar. A non-valid `FilterID` entry is
  represented by `none(FilterID)`.

* Aristo: Split off function `persist()` as persistent version of `stow()`

why:
  In production, `stow(persistent=false,..)` is currently unused. So,
  using `persist()` rather than `stow(persistent=true,..)` improves
  readability and is better to maintain.

* CoreDb+Aristo: Store block numbers in journal records

why:
  This makes journal records searchable by block numbers

* Aristo: Rename some journal related functions

why:
  The name *journal* is more appropriate to api functions than something
   with *fifo* or *filter*.

* CoreDb+Aristo: Update last/oldest journal state retrieval

* CoreDb+Aristo: Register block number with state root in journal

why:
  No need anymore for extra lookup table `stRootToBlockNum` which maps
  a storage root -> block number.

* Aristo: Remove unused function `getFilUbe()` from api

* CoreDb: Remove now unused virtual table `stRootToBlockNum`

why:
  Was used to map a state root to a block number. This functionality
  is now embedded into the recovery journal backend.

* Turn of API tracking (will fail on `fluffy`)
2024-04-29 20:17:17 +00:00
Jordan Hrycaj 99238ce0e4
Core db maintenance update (#2087)
* CoreDb+Aristo: Fix handler code

* Aristo+Kvt: Remove cruft

* Aristo+Kvt: The function `forkTop()` always provides a single transaction

why:
  Previously it provided a single squashed tx only if there were any. Now
  it will provide a blind one if there were none.

* Fix Copyright header
2024-03-20 15:15:56 +00:00
Jordan Hrycaj 0d73637f14
Core db simplify new api storage modes (#2075)
* Aristo+Kvt: Fix backend `dup()` function in api setup

why:
  Backend object is subject to an inheritance cascade which was not
  taken care of, before. Only the base object was duplicated.

* Kvt: Simplify DB clone/peers management

* Aristo: Simplify DB clone/peers management

* Aristo: Adjust unit test for working with memory DB only

why:
  This currently causes some memory corruption persumably in the
  `libc` background layer.

* CoredDb+Kvt: Simplify API for KVT

why:
  Simplified storage models (was over engineered) for better performance
  and code maintenance.

* CoredDb+Aristo: Simplify API for `Aristo`

why:
  Only single database state needed here. Accessing a similar state will
  be implemented from outside this module using a context layer. This
  gives better performance and improves code maintenance.

* Fix Copyright headers

* CoreDb: Turn off API tracking

why:
  CI would ot go through. Was accidentally turned on.
2024-03-14 22:17:43 +00:00
Jordan Hrycaj 8e18e85288
Aristodb remove obsolete and time consuming admin features (#2048)
* Aristo: Reorg `hashify()` using different schedule algorithm

why:
  Directly calculating the search tree top down from the roots turns
  out to be faster than using the cached structures left over by `merge()`
  and `delete()`.
  Time gains is short of 20%

* Aristo: Remove `lTab[]` leaf entry object type

why:
  Not used anymore. It was previously needed to build the schedule for
  `hashify()`.

* Aristo: Avoid unnecessary re-org of the vertex ID recycling list

why:
  This list can become quite large so a heuristic is employed whether
  it makes sense to re-org.

  Also, re-org check is only done by `delete()` functions.

* Aristo: Remove key/reverse lookup table from tx layers

why:
  It is ignored except for handling proof nodes and costs unnecessary
  run time resources.

  This feature was originally needed to accommodate the mental transition
  from the legacy MPT to the `Aristo` trie :).

* Fix copyright year
2024-02-22 08:24:58 +00:00
Jordan Hrycaj 1b4a43c140
Aristo db remove over engineered object type (#2027)
* CoreDb: update test suite

* Aristo: Simplify reverse key map

why:
  The reverse key map `pAmk: (root,key) -> {vid,..}` as been simplified to
  `pAmk: key -> {vid,..}` as the state `root` domain argument is not used,
  anymore

* Aristo: Remove `HashLabel` object type and replace it by `HashKey`

why:
  The `HashLabel` object attaches a root hash to a hash key. This is
  nowhere used, anymore.

* Fix copyright
2024-02-14 19:11:59 +00:00
Jordan Hrycaj 3b306a9689
Aristo: Update unit test suite (#2002)
* Aristo: Update unit test suite

* Aristo/Kvt: Fix iterators

why:
  Generic iterators were not properly updated after backend change

* Aristo: Add sub-trie deletion functionality

why:
  For storage tries linked to an account payload vertex ID, a the
  whole storage trie needs to be deleted with the account.

* Aristo: Reserve vertex ID numbers for static custom state roots

why:
  Static custom state roots may be controlled by an application,
  e.g. for a receipt or a transaction root. The `Aristo` functions
  are agnostic of what the static state roots are when different
  from the internal tree vertex ID 1.

details;
  The `merge()` function applied to a non-static state root (assumed
  to be a storage root) will check the payload of an accounts leaf
  and mark its Merkle keys to be re-checked.

* Aristo: Correct error code symbol

* Aristo: Update error code symbols

* Aristo: Code cosmetics/comments

* Aristo: Fix hashify schedule calculator

why:
  Had a tendency to stop early leaving an incomplete job
2024-02-01 21:27:48 +00:00
Jordan Hrycaj a1161b537b
Core db update storage root management for sub tries (#1964)
* Aristo: Re-phrase `LayerDelta` and `LayerFinal` as object references

why:
  Avoids copying in some cases

* Fix copyright header

* Aristo: Verify `leafTie.root` function argument for `merge()` proc

why:
  Zero root will lead to inconsistent DB entry

* Aristo: Update failure condition for hash labels compiler `hashify()`

why:
  Node need not be rejected as long as links are on the schedule. In
  that case, `redo[]` is to become `wff.base[]` at a later stage.

  This amends an earlier fix, part of #1952 by also testing against
  the target nodes of the `wff.base[]` sets.

* Aristo: Add storage root glue record to `hashify()` schedule

why:
  An account leaf node might refer to a non-resolvable storage root ID.
  Storage root node chains will end up at the storage root. So the link
  `storage-root->account-leaf` needs an extra item in the schedule.

* Aristo: fix error code returned by `fetchPayload()`

details:
  Final error code is implied by the error code form the `hikeUp()`
  function.

* CoreDb: Discard `createOk` argument in API `getRoot()` function

why:
  Not needed for the legacy DB. For the `Arsto` DB, a lazy approach is
  implemented where a stprage root node is created on-the-fly.

* CoreDb: Prevent `$$` logging in some cases

why:
  Logging the function `$$` is not useful when it is used for internal
  use, i.e. retrieving an an error text for logging.

* CoreDb: Add `tryHashFn()` to API for pretty printing

why:
  Pretty printing must not change the hashification status for the
  `Aristo` DB. So there is an independent API wrapper for getting the
  node hash which never updated the hashes.

* CoreDb: Discard `update` argument in API `hash()` function

why:
  When calling the API function `hash()`, the latest state is always
  wanted. For a version that uses the current state as-is without checking,
  the function `tryHash()` was added to the backend.

* CoreDb: Update opaque vertex ID objects for the `Aristo` backend

why:
  For `Aristo`, vID objects encapsulate a numeric `VertexID`
  referencing a vertex (rather than a node hash as used on the
  legacy backend.) For storage sub-tries, there might be no initial
  vertex known when the descriptor is created. So opaque vertex ID
  objects are supported without a valid `VertexID` which will be
  initalised on-the-fly when the first item is merged.

* CoreDb: Add pretty printer for opaque vertex ID objects

* Cosmetics, printing profiling data

* CoreDb: Fix segfault in `Aristo` backend when creating MPT descriptor

why:
  Missing initialisation  error

* CoreDb: Allow MPT to inherit shared context on `Aristo` backend

why:
  Creates descriptors with different storage roots for the same
  shared `Aristo` DB descriptor.

* Cosmetics, update diagnostic message items for `Aristo` backend

* Fix Copyright year
2024-01-11 19:11:38 +00:00
Jordan Hrycaj ffa8ad2246
Core db use differential tx layers for aristo and kvt (#1949)
* Fix kvt headers

* Provide differential layers for KVT transaction stack

why:
  Significant performance improvement

* Provide abstraction layer for database top cache layer

why:
  This will eventually implemented as a differential database layers
  or transaction layers. The latter is needed to improve performance.

behavioural changes:
  Zero vertex and keys (i.e. delete requests) are not optimised out
  until the last layer is written to the database.

* Provide differential layers for Aristo transaction stack

why:
  Significant performance improvement
2023-12-19 12:39:23 +00:00
Jordan Hrycaj 657379f484
Aristo db update merkle hasher (#1925)
* Register paths for added leafs because of trie re-balancing

why:
  While the payload would not change, the prefix in the leaf vertex
  would. So it needs to be flagged for hash recompilation for the
  `hashify()` module.

also:
  Make sure that `Hike` paths which might have vertex links into the
  backend filter are replaced by vertex copies before manipulating.
  Otherwise the vertices on the immutable filter might be involuntarily
  changed.

* Also check for paths where the leaf vertex is on the backend, already

why:
  A a path can have dome vertices on the top layer cache with the
  `Leaf` vertex on  the backend.

* Re-define a void `HashLabel` type.

why:
  A `HashLabel` type is a pair `(root-vertex-ID, Keccak-hash)`. Previously,
  a valid `HashLabel` consisted of a non-empty hash and a non-zero vertex
  ID. This definition leads to a non-unique representation of a void
  `HashLabel` with either root-ID or has void. This has been changed to
  the unique void `HashLabel` exactly if the hash entry is void.

* Update consistency checkers

* Re-org `hashify()` procedure

why:
  Syncing against block chain showed serious deficiencies which produced
  wrong hashes or simply bailed out with error.

  So all fringe cases (mainly due to deleted entries) could be integrated
  into the labelling schedule rather than handling separate fringe cases.
2023-12-04 20:39:26 +00:00
Jordan Hrycaj c47f021596
Core db and aristo updates for destructor and tx logic (#1894)
* Disable `TransactionID` related functions from `state_db.nim`

why:
  Functions `getCommittedStorage()` and `updateOriginalRoot()` from
  the `state_db` module are nowhere used. The emulation of a legacy
  `TransactionID` type functionality is administratively expensive to
  provide by `Aristo` (the legacy DB version is only partially
  implemented, anyway).

  As there is no other place where `TransactionID`s are used, they will
  not be provided by the `Aristo` variant of the `CoreDb`. For the
  legacy DB API, nothing will change.

* Fix copyright headers in source code

* Get rid of compiler warning

* Update Aristo code, remove unused `merge()` variant, export `hashify()`

why:
  Adapt to upcoming `CoreDb` wrapper

* Remove synced tx feature from `Aristo`

why:
+ This feature allowed to synchronise transaction methods like begin,
  commit, and rollback for a group of descriptors.
+ The feature is over engineered and not needed for `CoreDb`, neither
  is it complete (some convergence features missing.)

* Add debugging helpers to `Kvt`

also:
  Update database iterator, add count variable yield argument similar
  to `Aristo`.

* Provide optional destructors for `CoreDb` API

why;
  For the upcoming Aristo wrapper, this allows to control when certain
  smart destruction and update can take place. The auto destructor works
  fine in general when the storage/cache strategy is known and acceptable
  when creating descriptors.

* Add update option for `CoreDb` API function `hash()`

why;
  The hash function is typically used to get the state root of the MPT.
  Due to lazy hashing, this might be not available on the `Aristo` DB.
  So the `update` function asks for re-hashing the gurrent state changes
  if needed.

* Update API tracking log mode: `info` => `debug

* Use shared `Kvt` descriptor in new Ledger API

why:
  No need to create a new descriptor all the time
2023-11-16 19:35:03 +00:00
Jordan Hrycaj 6e0397e276
Aristo and ledger small updates (#1888)
* Fix debug noise in `hashify()` for perfectly normal situation

why:
  Was previously considered a fixable error

* Fix test sample file names

why:
  The larger test file `goerli68161.txt.gz` is already in the local
  archive. So there is no need to use the smaller one from the external
  repo.

* Activate `accounts_cache` module from `db/ledger`

why:
  A copy of the original `accounts_cache.nim` source to be integrated
  into the `Ledger` module wrapper which allows to switch between
  different `accounts_cache` implementations unser tha same API.

details:
  At a later state, the `db/accounts_cache.nim` wrapper will be
  removed so that there is only one access to that module via
  `db/ledger/accounts_cache.nim`.

* Fix copyright headers in source code
2023-11-08 16:52:25 +00:00
Jordan Hrycaj 4feaa2cfab
Aristo db update for short nodes key edge cases (#1887)
* Aristo: Provide key-value list signature calculator

detail:
  Simple wrappers around `Aristo` core functionality

* Update new API for `CoreDb`

details:
+ Renamed new API functions `contains()` => `hasKey()` or `hasPath()`
  which disables the `in` operator on non-boolean 	`contains()` functions
+ The functions `get()` and `fetch()` always return a not-found error if
  there is no item, available. The new functions `getOrEmpty()` and
  `mergeOrEmpty()` return an an empty `Blob` if there is no such key
  found.

* Rewrite `core_apps.nim` using new API from `CoreDb`

* Use `Aristo` functionality for calculating Merkle signatures

details:
  For debugging, the `VerifyAristoForMerkleRootCalc` can be set so
  that `Aristo` results will be verified against the legacy versions.

* Provide general interface for Merkle signing key-value tables

details:
  Export `Aristo` wrappers

* Activate `CoreDb` tests

why:
  Now, API seems to be stable enough for general tests.

* Update `toHex()` usage

why:
  Byteutils' `toHex()` is superior to `toSeq.mapIt(it.toHex(2)).join`

* Split `aristo_transcode` => `aristo_serialise` + `aristo_blobify`

why:
+ Different modules for different purposes
+ `aristo_serialise`: RLP encoding/decoding
+ `aristo_blobify`: Aristo database encoding/decoding

* Compacted representation of small nodes' links instead of Keccak hashes

why:
  Ethereum MPTs use Keccak hashes as node links if the size of an RLP
  encoded node is at least 32 bytes. Otherwise, the RLP encoded node
  value is used as a pseudo node link (rather than a hash.) Such a node
  is nor stored on key-value database. Rather the RLP encoded node value
  is stored instead of a lode link in a parent node instead. Only for
  the root hash, the top level node is always referred to by the hash.

  This feature needed an abstraction of the `HashKey` object which is now
  either a hash or a blob of length at most 31 bytes. This leaves two
  ways of representing an empty/void `HashKey` type, either as an empty
  blob of zero length, or the hash of an empty blob.

* Update `CoreDb` interface (mainly reducing logger noise)

* Fix copyright years (to make `Lint` happy)
2023-11-08 12:18:32 +00:00
Jordan Hrycaj 6d132811ba
Core db update providing additional results code interface (#1776)
* Split `core_db/base.nim` into several sources

* Rename `core_db/legacy.nim` => `core_db/legacy_db.nim`

* Update `CoreDb` API, dual methods returning `Result[]` or plain value

detail:
  Plain value methods implemet the legacy API, they defect on error results

* Redesign `CoreDB` direct backend access

why:
  Made the `backend` directive integral part of the API

* Discontinue providing unused or otherwise available functions

details:
+ setTransactionID() removed, not used and not easily replicable in Aristo
+ maybeGet() removed, available via direct backend access
+ newPhk() removed, never used & was experimental anyway

* Update/reorg backend API

why:
+ Added error print function `$$()`
+ General descriptor completion (and optional validation) via `bless()`

* Update `Aristo`/`Kvt` exception handling

why:
  Avoid `CatchableError` exceptions, rather pass them as error code where
  appropriate.

* More `CoreDB` compliant `Aristo` and `Kvt` methods

details:
+ Providing functions like `contains()`, `getVtxRc()` (returns `Result[]`).
+ Additional error code: `NotImplemented`

* Rewrite/reorg of Aristo DB constructor

why:
  Previously used global object `DefaultQidLayoutRef` as default
  initialiser. This object was created at compile time which lead to
  non-gc safe functions.

* Update nimbus/db/core_db/legacy_db.nim

Co-authored-by: Kim De Mey <kim.demey@gmail.com>

* Update nimbus/db/aristo/aristo_transcode.nim

Co-authored-by: Kim De Mey <kim.demey@gmail.com>

* Update nimbus/db/core_db/legacy_db.nim

Co-authored-by: Kim De Mey <kim.demey@gmail.com>

---------

Co-authored-by: Kim De Mey <kim.demey@gmail.com>
2023-09-26 10:21:13 +01:00
Jordan Hrycaj cd1d370543
Aristo db api extensions for use as core db backend (#1754)
* Update docu

* Update Aristo/Kvt constructor prototype

why:
  Previous version used an `enum` value to indicate what backend is to
  be used. This was replaced by using the backend object type.

* Rewrite `hikeUp()` return code into `Result[Hike,(Hike,AristoError)]`

why:
  Better code maintenance. Previously, the `Hike` object was returned. It
  had an internal error field so partial success was also available on
  a failure. This error field has been removed.

* Use `openArray[byte]` rather than `Blob` in functions prototypes

* Provide synchronised multi instance transactions

why:
  The `CoreDB` object was geared towards the legacy DB which used a single
  transaction for the key-value backend DB. Different state roots are
  provided by the backend database, so all instances work directly on the
  same backend.

  Aristo db instances have different in-memory mappings (aka different
  state roots) and the transactions are on top of there mappings. So each
  instance might run different transactions.

  Multi instance transactions are a compromise to converge towards the
  legacy behaviour. The synchronised transactions span over all instances
  available at the time when base transaction was opened. Instances
  created later are unaffected.

* Provide key-value pair database iterator

why:
  Needed in `CoreDB` for `replicate()` emulation

also:
  Some update of internal code

* Extend API (i.e. prototype variants)

why:
  Needed for `CoreDB` geared towards the legacy backend which has a more
  basic API than Aristo.
2023-09-15 16:23:53 +01:00
Jordan Hrycaj 8e46953390
Aristo db state root repos and reorg (#1744)
* Reorg of distributed backend access

details:
  Now handled via API provided in `aristo_desc`.

* Rename `checkCache()` => `checkTop()`

why:
  Better naming for top layer cache checker

also:
  Provide cascaded fifos checker

* Provide `eq` directive for finding filter by exact filter ID (think block number)

* Some code beautification (for better code reading)

* State root reposition and reorg

details:
  Repositioning is supported by forking a new descriptor. Reorg is then
  accomplished by writing this forked state on the backend database.
2023-09-11 21:38:49 +01:00
Jordan Hrycaj f177f5bf11
Aristo db extend filter storage scheduler api (#1725)
* Add backwards index `[]` operator into fifo

also:
  Need another maintenance instruction: The last overflow queue must
  irrevocably delete some item in order to make space for a new one.

* Add re-org scheduler

details:
  Generates instructions how to extract and merge some leading entries

* Add filter ID selector

details:
  This allows to find the next filter now newer that a given filter ID

* Message update
2023-08-30 18:08:39 +01:00
Jordan Hrycaj 465d694834
Aristo db implement filter storage scheduler (#1713)
* Rename FilterID => QueueID

why:
  The current usage does not identify a particular filter but uses it as
  storage tag to manage it on the database (to be organised in a set of
  FIFOs or queues.)

* Split `aristo_filter` source into sub-files

why:
  Make space for filter management API

* Store filter queue IDs in pairs on the backend

why:
  Any pair will will describe a FIFO accessed by bottom/top IDs

* Reorg some source file names

why:
  The "aristo_" prefix for make local/private files is tedious to
  use, so removed.

* Implement filter slot scheduler

details:
  Filters will be stored on the database on cascaded FIFOs. When a FIFO
  queue is full, some filter items are bundled together and stored on the
  next FIFO.
2023-08-25 23:53:59 +01:00
Jordan Hrycaj b9a4fd3137
Aristo db update serialisation (#1700)
* Remove unused unit test sources

* Redefine and document serialised data records for Aristo backend

why:
  Unique record types determined by marker byte, i.e. the last byte of a
  serialisation record. This just needed some tweaking after adding new
  record types.
2023-08-21 19:18:06 +01:00
Jordan Hrycaj 4c9141ffac
Aristo db implement filter serialisation for storage (#1695)
* Remove concept of empty/blind filters

why:
  Not needed. A non-existent filter is is coded as a nil reference.

* Slightly generalised backend iterators

why:
 * VertexID as key for the ID generator state makes no sense
 * there will be more tables addressed by non-VertexID keys

* Store serialised/blobified vertices on memory backend

why:
  This is more in line with the RocksDB backend so more appropriate
  for testing when comparing behaviour. For a speedy memory database,
  a backend-less variant should be used.

* Drop the `Aristo` prefix from names `AristoLayerRef`, etc.

* Suppress compiler warning

why:
  duplicate imports

* Add filter serialisation transcoder

why:
  Will be used as storage format
2023-08-18 20:46:55 +01:00
Jordan Hrycaj 3078c207ca
Aristo db implement distributed backend access (#1688)
* Fix hashing algorithm

why:
  Particular case where a sub-tree is on the backend, linked by an
  Extension vertex to the top level.

* Update backend verification to report `dirty` top layer

* Implement distributed merge of backend filters

* Implement distributed backend access management

details:
  Implemented and tested as described in chapter 5 of the `README.md`
  file.
2023-08-17 14:42:01 +01:00
Jordan Hrycaj 01fe172738
Aristo db integrate hashify into tx (#1679)
* Renamed type `NoneBackendRef` => `VoidBackendRef`

* Clarify names: `BE=filter+backend` and `UBE=backend (unfiltered)`

why:
  Most functions used full names as `getVtxUnfilteredBackend()` or
  `getKeyBackend()`. After defining abbreviations (and its meaning) it
   seems easier to use `getVtxUBE()` and `getKeyBE()`.

* Integrate `hashify()` process into transaction logic

why:
  Is now transparent unless explicitly controlled.

details:
  Cache changes imply setting a `dirty` flag which in turn triggers
  `hashify()` processing in transaction and `pack()` directives.

* Removed `aristo_tx.exec()` directive

why:
  Inconsistent implementation, functionality will be provided with a
  different paradigm.
2023-08-11 18:23:57 +01:00
Jordan Hrycaj 09fabd04eb
Aristo db use filter betw backend and tx cache (#1678)
* Provide deep copy for each transaction layer

why:
  Localising changes. Selective deep copy was just overlooked.

* Generalise vertex ID generator state reorg function `vidReorg()`

why:
  makes it somewhat easier to handle when saving layers.

* Provide dummy back end descriptor `NoneBackendRef`

* Optional read-only filter between backend and transaction cache

why:
  Some staging area for accumulating changes to the backend DB. This
  will eventually be an access layer for emulating a backend with
  multiple/historic state roots.

* Re-factor `persistent()` with filter between backend/tx-cache => `stow()`

why:
  The filter provides an abstraction from the physically stored data on
  disk. So, there can be several MPT instances using the same disk data
  with different state roots. Of course, all the MPT instances should
  not differ too much for practical reasons :).

TODO:
  Filter administration tools need to be provided.
2023-08-10 21:01:28 +01:00
Jordan Hrycaj 71c91e2280
Aristo db refactor tx paradim (#1674)
* Better error handling

why:
  Bail out on some error as early as possible before any changes.

* Implement `fetch()` as opposite of `merge()`

rationale:
  In the `Aristo` realm, the action named `fetch()` and `merge()` indicate
  leaf value related actions on the MPT, while actions `get()` and `put()`
   handle vertex or hash key related operations that constitute the MPT.

* Re-factor `merge()` prototypes

why:
  The most used variant of `merge()` should have the simplest prototype.

* Persistent DB constructor needs to import `aristo/aristo_init/persistent`

why:
  Most applications use memory DB anyway. This avoids linking `-lrocksdb`
  or any other back end libraries by default.

* Re-factor transaction module

why:
  Got the paradigm wrong. The transaction descriptor did replace the
  database one but should be handled separately.
2023-08-07 18:45:23 +01:00
Jordan Hrycaj ccf639fc3c
Aristo db transaction based interface (#1628)
* Provide transaction based interface for standard operations

* Provide unit tests for new Aristo interface using transactions

details:
  These new tests combine and replace several single-purpose tests.
  The now unused test sources will be kept for a while to be eventually
  removed.
2023-07-05 14:50:11 +01:00
Jordan Hrycaj ff6673beac
Aristo db tidy up a bit (#1625)
* Slightly tighten some self-check conditions

* Redefined the database descriptor object as reference (to the object)

why:
  The upcoming transaction wrapper will work with a database reference
  rather than the object itself

* Append state before `save()` to the Aristo descriptor

why:
  This stae was previously returned by the function. Appending it to
  a field of the Aristo descriptor seems easier to handle.
2023-07-04 19:24:03 +01:00
Jordan Hrycaj 83dbe87159
Aristo db update foreground caching (#1605)
* Fix vertex ID generator state handling for rocksdb backend

why:
 * Key error in walk iterator
 * Needs to be loaded when opening the database

* Use non-zero sub-table prefixes for rocksdb

why:
  Handy for debugging

* Fix error code for missing key on rocksdb backend

why:
  Previously returned `VOID_HASH_KEY` rather than `GetKeyNotFound`

* Explicitly copy vertex data between internal table and function/result argument

why:
  Function argument or return reference may still refer to the same data
  object.

* Updated error symbols

why:
  Error symbol names for the hike module now start with the prefix `Hike`.

* Write back modified branch node into local top layer cache

why:
  With the backend available, the source of the branch node references
  might not be the top layer cache. So any change must be explicitely
  recorded.
2023-06-22 12:13:24 +01:00
Jordan Hrycaj 4b66f93274
Aristo db with storage backends (#1603)
* Generalised Aristo DB constructor for any type of backend

details:
  * Records to be deleted are represented as key-void (rather than
    key-value) pairs by the put-function arguments
  * Allow direct driver access, iterators as example implementation and
    for testing.

* Provide backend storage interface

details:
  Stores the top layer onto backend tables

* Implemented Rocks DB backend

details:
  Transaction based `put()` functionality
  Iterators (based on direct RocksDB access)
2023-06-20 14:26:25 +01:00
Jordan Hrycaj d7f40516a7
Detach from snap/sync declarations & definitions (#1601)
why:
  Tests and some basic components were originally borrowed from the
  snap/sync implementation. These have fully been re-implemented.
2023-06-12 19:16:03 +01:00
Jordan Hrycaj 0308dfac4f
Aristo db address sup trie items properly (#1600)
* Fix include

why:
  Eth67 not default yet so that got missed

* Rename `LeafKey` => `LeafTie`

why:
  Name is a pen picture of what this object is for. Also, it avoids the
  ubiquitous term `key`.

* Provided `getOrVoid()` wrapper for `getOrDefault()`

also:
  Provide `isValid()` syntactic sugar for `.isNil.not`, `!= 0` etc.
  Reorg descriptor source, split into sub-sources

* Bundled `NodeKey` objects with root ID and called it `HashLabel`

why:
  `NodeKey` (aka repurposed Hash265) objects are unique only within a
  particular sub-trie (e.g. storage slots) which are kept separated
  (i.e non-interleaved) by design. This is not applied to the backend
  as the map VertexID->NodeKey labelling the nodes needs not be injective.

  For the in-memory database (transaction) layers, the injective map
  VertexID->(VertexID,NodeKey) is used where the first field of the image
  tuple is the root ID of the sub-trie the `NodeKey` object is valid. So
  identical storage tries for different accounts can be represented.
2023-06-12 14:48:47 +01:00
Jordan Hrycaj 932a2140f2
Aristo db supporting forest and layered tx architecture (#1598)
* Exclude some storage tests

why:
  These test running on external dumps slipped through. The particular
  dumps were reported earlier as somehow dodgy.

  This was changed in `#1457` but having a second look, the change on
  hexary_interpolate.nim(350) might be incorrect.

* Redesign `Aristo DB` descriptor for transaction based layers

why:
  Previous descriptor layout made it cumbersome to push/pop
  database delta layers.

  The new architecture keeps each layer with the full delta set
  relative to the database backend.

* Keep root ID as part of the `Patricia Trie` leaf path

why;
  That way, forests are supported
2023-06-09 12:17:37 +01:00
Jordan Hrycaj 2fc349feb9
Aristo db merkle hashify functionality added (#1593)
* Keep vertex ID generator state with each db-layer

why:
  The vertex ID generator state is part of the difference to the below
  layer

* Move otherwise unused source to test directory

* Add Merkle hash generator

also:
  * Verification facility for debugging
  * Empty Merkle key hashes encoded as `EMPTY_ROOT_HASH`
2023-05-30 22:21:15 +01:00
Jordan Hrycaj cd78458123
Add items to `Aristo Trie` database (#1586)
details:
1. Merging a leaf vertex merges a `Patricia Trie` path (while
   adding/modiying vertices) and adds a leaf node with payload
2. Merging a Merkel node merges a single vertex to the `Patricia Trie`
   and registers merkel hashes
3. Action 2 can be used before action 1 in order to construct a
   Merkel proof as required for handling `snap/1` data.
4. Unit tests show that action 3 is benign for now :)
2023-05-30 12:47:47 +01:00