Commit Graph

24 Commits

Author SHA1 Message Date
Jacek Sieka 5c1e2e7d3b
Migrate `keyed_queue` to `minilru` (#2608)
Compared to `keyed_queue`, `minilru` uses significantly less memory, in
particular for the 32-byte hash keys where `kq` stores several copies of
the key redundantly.
2024-09-13 15:47:50 +02:00
Jordan Hrycaj cbe5131927
Simplify aristo tree deletion functionality (#2563)
* Cleaning up, removing cruft and debugging statements

* Make `aristo_delta` fluffy compatible

why:
  A sub-module that uses `chronicles` must import all possible
  modules used by a parent module that imports the sub-module.

* update TODO
2024-08-14 12:09:30 +00:00
Jordan Hrycaj ce713d95fc
Aristo lazily delete larger subtrees (#2560)
* Extract sub-tree deletion functions into separate sub-modules

* Move/rename `aristo_desc.accLruSize` => `aristo_constants.ACC_LRU_SIZE`

* Lazily delete sub-trees

why:
  This gives some control of the memory used to keep the deleted vertices
  in the cached layers. For larger sub-trees, keys and vertices might be
  on the persistent backend to a large extend. This would pull an amount
  of extra information from the backend into the cached layer.

  For lazy deleting it is enough to remember sub-trees by a small set of
  (at most 16) sub-roots to be processed when storing persistent data.
  Marking the tree root deleted immediately allows to let most of the code
  base work as before.

* Comments and cosmetics

* No need to import all for `Aristo` here

* Kludge to make `chronicle` usage in sub-modules work with `fluffy`

why:
  That `fluffy` would not run with any logging in `core_deb` is a problem
  I have known for a while. Up to now, logging was only used for debugging.

  With the current `Aristo` PR, there are cases where logging might be
  wanted but this works only if `chronicles` runs without the
  `json[dynamic]` sinks.

  So this should be re-visited.

* More of a kludge
2024-08-14 08:54:44 +00:00
Jacek Sieka c364426422
Smaller in-database representations (#2436)
These representations use ~15-20% less data compared to the status quo,
mainly by removing redundant zeroes in the integer encodings - a
significant effect of this change is that the various rocksdb caches see
better efficiency since more items fit in the same amount of space.

* use RLP encoding for `VertexID` and `UInt256` wherever it appears
* pack `VertexRef`/`PayloadRef` more tightly
2024-07-02 20:25:06 +02:00
Jordan Hrycaj 2c87fd1636
Aristo code cosmetics and tests update (#2434)
* Update some docu

* Resolve obsolete compile time option

why:
  Not optional anymore

* Update checks

why:
  The notion of what constitutes a valid `Aristo` db has changed due to
  (even more) lazy calculating Merkle hash keys.

* Disable redundant unit test for production
2024-07-01 10:59:18 +00:00
Jacek Sieka 6b68ff92d3
Allocation-free nibbles buffer (#2406)
This buffer eleminates a large part of allocations during MPT traversal,
reducing overall memory usage and GC pressure.

Ideally, we would use it throughout in the API instead of
`openArray[byte]` since the built-in length limit appropriately exposes
the natural 64-nibble depth constraint that `openArray` fails to
capture.
2024-06-22 22:33:37 +02:00
Jordan Hrycaj 392088e5e9
Coredb fix storage tree issues (#2317)
* Code cosmetics

* Re-org `aristo_merge`, internally split into sub-modules

why:
  Became a burden for maintenance because it hosts two different
  functionalities under the same merge paradigm: account/data merge
  and snap proof merge where the latter produces a partial trie.

* Fix CoreDb tracer

* Ledger: fix potential account vs. storage tree sync problems

* Remove bound on the size of removable whole storage trees

* Activate `test_tracer_json`
2024-06-07 10:56:31 +00:00
Jordan Hrycaj f926222fec
Aristo cull journal related stuff (#2288)
* Remove all journal related stuff

* Refactor function names journal*() => delta*(), filter*() => delta*()

* remove `trg` fileld from `FilterRef`

why:
  Same as `kMap[$1]`

* Re-type FilterRef.src as `HashKey`

why:
  So it is directly comparable to `kMap[$1]`

* Moved `vGen[]` field from `LayerFinalRef` to `LayerDeltaRef`

why:
  Then a separate `FilterRef` type is not needed, anymore

* Rename `roFilter` field in `AristoDbRef` => `balancer`

why:
  New name more appropriate.

* Replace `FilterRef` by `LayerDeltaRef` type

why:
  This allows to avoid copying into the `balancer` (see next patch set)
  most of the time. Typically, only one instance is running on the backend
  and the `balancer` is only used as a stage before saving data.

* Refactor way how to store data persistently

why:
  Avoid useless copy when staging `top` layer for persistently saving to
  backend.

* Fix copyright header?
2024-06-03 20:10:35 +00:00
Jordan Hrycaj 0d4ef023ed
Update aristo journal functionality (#2155)
* Aristo: Code cosmetics, e.g. update some CamelCase names

* CoreDb+Aristo: Provide oldest known state root implied

details:
  The Aristo journal allows to recover earlier but not all state roots.

* Aristo: Fix journal backward index operator, e.g. `[^1]`

* Aristo: Fix journal updater

why:
  The `fifosStore()` store function slightly misinterpreted the update
  instructions when translation is to database `put()` functions. The
  effect was that the journal was ever growing due to stale entries which
  were never deleted.

* CoreDb+Aristo: Provide utils for purging stale data from the KVT

details:
  See earlier patch, not all state roots are available. This patch
  provides a mapping from some state root to a block number and allows to
  remove all KVT data related to a particular block number

* Aristo+Kvt: Implement a clean up schedule for expired data in KVT

why:
  For a single state ledger like `Aristo`, there is only a limited
  backlog of states. So KVT data (i.e. headers etc.) are cleaned up
  regularly

* Fix copyright year
2024-04-26 13:43:52 +00:00
Jordan Hrycaj 8e18e85288
Aristodb remove obsolete and time consuming admin features (#2048)
* Aristo: Reorg `hashify()` using different schedule algorithm

why:
  Directly calculating the search tree top down from the roots turns
  out to be faster than using the cached structures left over by `merge()`
  and `delete()`.
  Time gains is short of 20%

* Aristo: Remove `lTab[]` leaf entry object type

why:
  Not used anymore. It was previously needed to build the schedule for
  `hashify()`.

* Aristo: Avoid unnecessary re-org of the vertex ID recycling list

why:
  This list can become quite large so a heuristic is employed whether
  it makes sense to re-org.

  Also, re-org check is only done by `delete()` functions.

* Aristo: Remove key/reverse lookup table from tx layers

why:
  It is ignored except for handling proof nodes and costs unnecessary
  run time resources.

  This feature was originally needed to accommodate the mental transition
  from the legacy MPT to the `Aristo` trie :).

* Fix copyright year
2024-02-22 08:24:58 +00:00
Jordan Hrycaj 1b4a43c140
Aristo db remove over engineered object type (#2027)
* CoreDb: update test suite

* Aristo: Simplify reverse key map

why:
  The reverse key map `pAmk: (root,key) -> {vid,..}` as been simplified to
  `pAmk: key -> {vid,..}` as the state `root` domain argument is not used,
  anymore

* Aristo: Remove `HashLabel` object type and replace it by `HashKey`

why:
  The `HashLabel` object attaches a root hash to a hash key. This is
  nowhere used, anymore.

* Fix copyright
2024-02-14 19:11:59 +00:00
Jordan Hrycaj 9e50af839f
Core db+aristo update storage trie handling (#2023)
* CoreDb: Test module with additional sample selector cmd line options

* Aristo: Do not automatically remove a storage trie with the account

why:
  This is an unnecessary side effect. Rather than using an automatism, a
  a storage root must be deleted manually.

* Aristo: Can handle stale storage root vertex IDs as empty IDs.

why:
  This is currently needed for the ledger API supporting both, a legacy
  and the `Aristo` database backend.

  This feature can be disabled at compile time by re-setting the
  `LOOSE_STORAGE_TRIE_COUPLING` flag in the `aristo_constants` module.

* CoreDb+Aristo: Flush/delete storage trie when deleting account

why:
  On either backend, a deleted account leave a dangling storage trie on
  the database.

  For consistency nn the legacy backend, storage tries must not be
  deleted as they might be shared by several accounts whereas on `Aristo`
  they are always unique.
2024-02-12 19:37:00 +00:00
Jordan Hrycaj 3b306a9689
Aristo: Update unit test suite (#2002)
* Aristo: Update unit test suite

* Aristo/Kvt: Fix iterators

why:
  Generic iterators were not properly updated after backend change

* Aristo: Add sub-trie deletion functionality

why:
  For storage tries linked to an account payload vertex ID, a the
  whole storage trie needs to be deleted with the account.

* Aristo: Reserve vertex ID numbers for static custom state roots

why:
  Static custom state roots may be controlled by an application,
  e.g. for a receipt or a transaction root. The `Aristo` functions
  are agnostic of what the static state roots are when different
  from the internal tree vertex ID 1.

details;
  The `merge()` function applied to a non-static state root (assumed
  to be a storage root) will check the payload of an accounts leaf
  and mark its Merkle keys to be re-checked.

* Aristo: Correct error code symbol

* Aristo: Update error code symbols

* Aristo: Code cosmetics/comments

* Aristo: Fix hashify schedule calculator

why:
  Had a tendency to stop early leaving an incomplete job
2024-02-01 21:27:48 +00:00
Jordan Hrycaj 657379f484
Aristo db update merkle hasher (#1925)
* Register paths for added leafs because of trie re-balancing

why:
  While the payload would not change, the prefix in the leaf vertex
  would. So it needs to be flagged for hash recompilation for the
  `hashify()` module.

also:
  Make sure that `Hike` paths which might have vertex links into the
  backend filter are replaced by vertex copies before manipulating.
  Otherwise the vertices on the immutable filter might be involuntarily
  changed.

* Also check for paths where the leaf vertex is on the backend, already

why:
  A a path can have dome vertices on the top layer cache with the
  `Leaf` vertex on  the backend.

* Re-define a void `HashLabel` type.

why:
  A `HashLabel` type is a pair `(root-vertex-ID, Keccak-hash)`. Previously,
  a valid `HashLabel` consisted of a non-empty hash and a non-zero vertex
  ID. This definition leads to a non-unique representation of a void
  `HashLabel` with either root-ID or has void. This has been changed to
  the unique void `HashLabel` exactly if the hash entry is void.

* Update consistency checkers

* Re-org `hashify()` procedure

why:
  Syncing against block chain showed serious deficiencies which produced
  wrong hashes or simply bailed out with error.

  So all fringe cases (mainly due to deleted entries) could be integrated
  into the labelling schedule rather than handling separate fringe cases.
2023-12-04 20:39:26 +00:00
Jordan Hrycaj 6e0397e276
Aristo and ledger small updates (#1888)
* Fix debug noise in `hashify()` for perfectly normal situation

why:
  Was previously considered a fixable error

* Fix test sample file names

why:
  The larger test file `goerli68161.txt.gz` is already in the local
  archive. So there is no need to use the smaller one from the external
  repo.

* Activate `accounts_cache` module from `db/ledger`

why:
  A copy of the original `accounts_cache.nim` source to be integrated
  into the `Ledger` module wrapper which allows to switch between
  different `accounts_cache` implementations unser tha same API.

details:
  At a later state, the `db/accounts_cache.nim` wrapper will be
  removed so that there is only one access to that module via
  `db/ledger/accounts_cache.nim`.

* Fix copyright headers in source code
2023-11-08 16:52:25 +00:00
Jordan Hrycaj 4feaa2cfab
Aristo db update for short nodes key edge cases (#1887)
* Aristo: Provide key-value list signature calculator

detail:
  Simple wrappers around `Aristo` core functionality

* Update new API for `CoreDb`

details:
+ Renamed new API functions `contains()` => `hasKey()` or `hasPath()`
  which disables the `in` operator on non-boolean 	`contains()` functions
+ The functions `get()` and `fetch()` always return a not-found error if
  there is no item, available. The new functions `getOrEmpty()` and
  `mergeOrEmpty()` return an an empty `Blob` if there is no such key
  found.

* Rewrite `core_apps.nim` using new API from `CoreDb`

* Use `Aristo` functionality for calculating Merkle signatures

details:
  For debugging, the `VerifyAristoForMerkleRootCalc` can be set so
  that `Aristo` results will be verified against the legacy versions.

* Provide general interface for Merkle signing key-value tables

details:
  Export `Aristo` wrappers

* Activate `CoreDb` tests

why:
  Now, API seems to be stable enough for general tests.

* Update `toHex()` usage

why:
  Byteutils' `toHex()` is superior to `toSeq.mapIt(it.toHex(2)).join`

* Split `aristo_transcode` => `aristo_serialise` + `aristo_blobify`

why:
+ Different modules for different purposes
+ `aristo_serialise`: RLP encoding/decoding
+ `aristo_blobify`: Aristo database encoding/decoding

* Compacted representation of small nodes' links instead of Keccak hashes

why:
  Ethereum MPTs use Keccak hashes as node links if the size of an RLP
  encoded node is at least 32 bytes. Otherwise, the RLP encoded node
  value is used as a pseudo node link (rather than a hash.) Such a node
  is nor stored on key-value database. Rather the RLP encoded node value
  is stored instead of a lode link in a parent node instead. Only for
  the root hash, the top level node is always referred to by the hash.

  This feature needed an abstraction of the `HashKey` object which is now
  either a hash or a blob of length at most 31 bytes. This leaves two
  ways of representing an empty/void `HashKey` type, either as an empty
  blob of zero length, or the hash of an empty blob.

* Update `CoreDb` interface (mainly reducing logger noise)

* Fix copyright years (to make `Lint` happy)
2023-11-08 12:18:32 +00:00
Jordan Hrycaj 465d694834
Aristo db implement filter storage scheduler (#1713)
* Rename FilterID => QueueID

why:
  The current usage does not identify a particular filter but uses it as
  storage tag to manage it on the database (to be organised in a set of
  FIFOs or queues.)

* Split `aristo_filter` source into sub-files

why:
  Make space for filter management API

* Store filter queue IDs in pairs on the backend

why:
  Any pair will will describe a FIFO accessed by bottom/top IDs

* Reorg some source file names

why:
  The "aristo_" prefix for make local/private files is tedious to
  use, so removed.

* Implement filter slot scheduler

details:
  Filters will be stored on the database on cascaded FIFOs. When a FIFO
  queue is full, some filter items are bundled together and stored on the
  next FIFO.
2023-08-25 23:53:59 +01:00
Jordan Hrycaj 124ac064c6
Aristo db store filters on backend (#1703)
* Simplify RocksDB sub-tables iterator

* Implement `filter` storage on backend db

details:
  Unit tests working
2023-08-22 19:44:54 +01:00
Jordan Hrycaj 56d5c382d7
Aristo db traversal helpers (#1638)
* Misc fixes

detail:
* Fix de-serialisation for account leafs
* Update node recovery from unit tests

* Remove `LegacyAccount` from `PayloadRef` object

why:
  Legacy accounts use a hash key as storage root which is detrimental
  to the working of the Aristo database which uses a vertex ID.

* Dissolve `hashify_helper` into `aristo_utils` and `aristo_transcode`

why:
  Functions are of general interest so they should live in first level
  code files.

* Added left/right iterators over leaf nodes

* Some helper/wrapper functions that might be useful
2023-07-13 00:03:14 +01:00
Jordan Hrycaj 93a72025a1
Extended data Payload specs for the backend. (#1630)
why:
  For the main tree with root vertex ID 1, the leaf nodes hold the
  account data. These accounts may link to sub trees the storage root
  node ID of which must be registered here. There is no reverse key
  lookup on the backend.

note:
  These definitions are experimental. Also, there are some tests missing
  for validating Payload data conversions.
2023-07-05 21:27:48 +01:00
Jordan Hrycaj 83dbe87159
Aristo db update foreground caching (#1605)
* Fix vertex ID generator state handling for rocksdb backend

why:
 * Key error in walk iterator
 * Needs to be loaded when opening the database

* Use non-zero sub-table prefixes for rocksdb

why:
  Handy for debugging

* Fix error code for missing key on rocksdb backend

why:
  Previously returned `VOID_HASH_KEY` rather than `GetKeyNotFound`

* Explicitly copy vertex data between internal table and function/result argument

why:
  Function argument or return reference may still refer to the same data
  object.

* Updated error symbols

why:
  Error symbol names for the hike module now start with the prefix `Hike`.

* Write back modified branch node into local top layer cache

why:
  With the backend available, the source of the branch node references
  might not be the top layer cache. So any change must be explicitely
  recorded.
2023-06-22 12:13:24 +01:00
Jordan Hrycaj d7f40516a7
Detach from snap/sync declarations & definitions (#1601)
why:
  Tests and some basic components were originally borrowed from the
  snap/sync implementation. These have fully been re-implemented.
2023-06-12 19:16:03 +01:00
Jordan Hrycaj 0308dfac4f
Aristo db address sup trie items properly (#1600)
* Fix include

why:
  Eth67 not default yet so that got missed

* Rename `LeafKey` => `LeafTie`

why:
  Name is a pen picture of what this object is for. Also, it avoids the
  ubiquitous term `key`.

* Provided `getOrVoid()` wrapper for `getOrDefault()`

also:
  Provide `isValid()` syntactic sugar for `.isNil.not`, `!= 0` etc.
  Reorg descriptor source, split into sub-sources

* Bundled `NodeKey` objects with root ID and called it `HashLabel`

why:
  `NodeKey` (aka repurposed Hash265) objects are unique only within a
  particular sub-trie (e.g. storage slots) which are kept separated
  (i.e non-interleaved) by design. This is not applied to the backend
  as the map VertexID->NodeKey labelling the nodes needs not be injective.

  For the in-memory database (transaction) layers, the injective map
  VertexID->(VertexID,NodeKey) is used where the first field of the image
  tuple is the root ID of the sub-trie the `NodeKey` object is valid. So
  identical storage tries for different accounts can be represented.
2023-06-12 14:48:47 +01:00
Jordan Hrycaj ff0fc98fdf
Multi layer architecture 4 aristo db (#1581)
* Cosmetics, renamed fields (eVtx, bVtx) -> (eVid, bVid)

* Multilayered delta architecture for Aristo DB

details:
  Any VertexID or data retrieval needs to go down the rabbit hole and
  fetch/get/manipulate the bottom layer -- even without explicit
  backend.

* Direct reference to backend from top-level layer

why:
  Some services as the vid management needs to be synchronised among all
  layers. So access is optimised.
2023-05-14 18:43:01 +01:00