Commit Graph

25 Commits

Author SHA1 Message Date
Jacek Sieka 81e75622cf
storage: store root id together with vid, for better locality of refe… (#2449)
The state and account MPT:s currenty share key space in the database
based on that vertex id:s are assigned essentially randomly, which means
that when two adjacent slot values from the same contract are accessed,
they might reside at large distance from each other.

Here, we prefix each vertex id by its root causing them to be sorted
together thus bringing all data belonging to a particular contract
closer together - the same effect also happens for the main state MPT
whose nodes now end up clustered together more tightly.

In the future, the prefix given to the storage keys can also be used to
perform range operations such as reading all the storage at once and/or
deleting an account with a batch operation.

Notably, parts of the API already supported this rooting concept while
parts didn't - this PR makes the API consistent by always working with a
root+vid.
2024-07-04 15:46:52 +02:00
Jordan Hrycaj ea7c756a9d
Core db reorg (#2444)
* CoreDb: Merged all sub-descriptors into `base_desc` module

* Dissolve `aristo_db/common_desc.nim`

* No need to export `Aristo` methods in `CoreDb`

* Resolve/tighten methods in `aristo_db` sub-moduled

why:
  So they can be straihgt implemented into the `base` module

* Moved/re-implemented `KVT` methods into `base` module

* Moved/re-implemented `MPT` methods into `base` module

* Moved/re-implemented account methods into `base` module

* Moved/re-implemented `CTX` methods into `base` module

* Moved/re-implemented `handler_{aristo,kvt}` into `aristo_db` module

* Moved/re-implemented `TX` methods into `base` module

* Moved/re-implemented base methods into `base` module

* Replaced `toAristoSavedStateBlockNumber()` by proper base method

why:
  Was the last for keeping reason for keeping low level backend access
  methods

* Remove dedicated low level access to `Aristo` backend

why:
  Not needed anymore, for debugging the descriptors can be accessed
  directly

also:
  some clean up stuff

* Re-factor `CoreDb` descriptor layout and adjust base methods

* Moved/re-implemented iterators into `base_iterator*` modules

* Update docu
2024-07-03 15:50:27 +00:00
Jacek Sieka 3d3831dde8
Small cleanups (#2435)
* avoid costly hike memory allocations for operations that don't need to
re-traverse it
* avoid unnecessary state checks (which might trigger unwanted state
root computations)
* disable optimize-for-hits due to the MPT no longer being complete at
all times
2024-07-01 14:07:39 +02:00
Jordan Hrycaj 2c87fd1636
Aristo code cosmetics and tests update (#2434)
* Update some docu

* Resolve obsolete compile time option

why:
  Not optional anymore

* Update checks

why:
  The notion of what constitutes a valid `Aristo` db has changed due to
  (even more) lazy calculating Merkle hash keys.

* Disable redundant unit test for production
2024-07-01 10:59:18 +00:00
Jordan Hrycaj 8dd038144b
Some cleanups (#2428)
* Remove `dirty` set from structural objects

why:
  Not used anymore, the tree is dirty by default.

* Rename `aristo_hashify` -> `aristo_compute`

* Remove cruft, update comments, cosmetics, etc.

* Simplify `SavedState` object

why:
  The key chaining have become obsolete after extra lazy hashing. There
  is some available space for a state hash to be maintained in future.

details:
  Accept the legacy `SavedState` object serialisation format for a
  while (which will be overwritten by new format.)
2024-06-28 18:43:04 +00:00
Jordan Hrycaj 14c3772545
On demand mpt revisited (#2426)
* rebased from `github/on-demand-mpt`

ackn:
  wip: on-demand mpt construction

  Given that actual data is stored in the `Vertex` structure, it's useful
  to think of the MPT as a cache for computing roots rather than being a
  functional requirement on its own.

  This PR engenders this line of thinking by incrementally computing the
  MPT only when it's needed, ie when a state (or similar) root is needed.

  This has the effect of siginficantly reducing memory usage as well as
  improving performance:

  * no need for dirty-mpt-node book-keeping
  * no need to build complex forest of upcoming hashing work
  * only hashes that are functionally needed are ever computed -
  intermediate nodes whose MTP root is not observed are never computed /
  processed

* Unit test hot fixes

* Unit test hot fixes cont.

(somehow lost that part)

---------

Co-authored-by: Jacek Sieka <jacek@status.im>
2024-06-28 15:03:12 +00:00
Jordan Hrycaj 61bbf40014
Update storage tree admin (#2419)
* Tighten `CoreDb` API for accounts

why:
  Apart from cruft, the way to fetch the accounts state root via a
  `CoreDbColRef` record was unnecessarily complicated.

* Extend `CoreDb` API for accounts to cover storage tries

why:
  In future, this will make the notion of column objects obsolete. Storage
  trees will then be indexed by the account address rather than the vertex
  ID equivalent like a `CoreDbColRef`.

* Apply new/extended accounts API to ledger and tests

details:
  This makes the `distinct_ledger` module obsolete

* Remove column object constructors

why:
  They were needed as an abstraction of MPT sub-trees including storage
  trees. Now, storage trees are handled by the account (e.g. via address)
  they belong to and all other trees can be identified by a constant well
  known vertex ID. So there is no need for column objects anymore.

  Still there are some left-over column object methods wnich will be
  removed next.

* Remove `serialise()` and `PayloadRef` from default Aristo API

why:
  Not needed. `PayloadRef` was used for unstructured/unknown payload
  formats (account or blob) and `serialise()` was used for decodng
  `PayloadRef`. Now it is known in advance what the payload looks
  like.

* Added query function `hasStorageData()` whether a storage area exists

why:
  Useful for supporting `slotStateEmpty()` of the `CoreDb` API

* In the `Ledger` replace `storage.stateEmpty()` by 	`slotStateEmpty()`

* On Aristo, hide the storage root/vertex ID in the `PayloadRef`

why:
  The storage vertex ID is fully controlled by Aristo while the
  `AristoAccount` object is controlled by the application. With the
  storage root part of the `AristoAccount` object, there was a useless
  administrative burden to keep that storage root field up to date.

* Remove cruft, update comments etc.

* Update changed MPT access paradigms

why:
  Fixes verified proxy tests

* Fluffy cosmetics
2024-06-27 09:01:26 +00:00
Jacek Sieka c8cdffa775
Small cleanups (#2414)
* remove unnecessary / expensive error checking
* avoid some trivial memory allocs
* work around table move bug
2024-06-26 09:25:09 +02:00
Jacek Sieka 6b68ff92d3
Allocation-free nibbles buffer (#2406)
This buffer eleminates a large part of allocations during MPT traversal,
reducing overall memory usage and GC pressure.

Ideally, we would use it throughout in the API instead of
`openArray[byte]` since the built-in length limit appropriately exposes
the natural 64-nibble depth constraint that `openArray` fails to
capture.
2024-06-22 22:33:37 +02:00
Jordan Hrycaj e7be0d185c
Aristo uses pre classified tree types cont2 (#2397)
* Provide dedicated functions for fetching accounts and storage trees

why:
  Different prototypes for each class `account`, `generic` and
  `storage`.

* Remove `fetchPayload()` and other cruft from API, `aristo_fetch`, etc.

* Fix typos, debugging left overs, comments
2024-06-19 12:40:00 +00:00
Jordan Hrycaj 8727307ef4
Aristo uses pre classified tree types cont1 (#2389)
* Provide dedicated functions for deleteing accounts and storage trees

why:
  Storage trees are always linked to an account, so there is no need
  for an application to fiddle about (e.g. re-cycling, unlinking)
  storage tree vertex IDs.

* Remove `delete()` and other cruft from API, `aristo_delete`, etc.

* clean up delete functions

details:
  The delete implementations `deleteImpl()` and `delTreeImpl()` do not
  need to be super generic anymore as all the edge cases are covered by
  the specialised `deleteAccountPayload()`, `deleteGenericData()`, etc.

* Avoid unnecessary re-calculations of account keys

why:
  The function `registerAccountForUpdate()` did extract the storage ID
  (if any) and automatically marked the Merkle keys along the account
  path for re-hashing.

  This would also apply if there was later detected that the account
  or the storage tree did not need to be updated.

  So the `registerAccountForUpdate()` function was split into a part
  which retrieved the storage ID, and another one which marked the
  Merkle keys for re-calculation to be applied only when needed.
2024-06-18 19:30:01 +00:00
Jordan Hrycaj 51f02090b8
Aristo uses pre classified tree types (#2385)
* Remove unused `merge*()` functions (for production)

details:
  Some functionality moved to test suite

* Make sure that only `AccountData` leaf type is exactly used on VertexID(1)

* clean up payload type

* Provide dedicated functions for merging accounts and storage trees

why:
  Storage trees are always linked to an account, so there is no need
  for an application to fiddle about (e.e. creating, re-cycling) with
  storage tree vertex IDs.

* CoreDb: Disable tracer functionality

why:
  Must be updated to accommodate new/changed `Aristo` functions.

* CoreDb: Use new `mergeXXX()` functions

why:
  Makes explicit vertex ID management obsolete for creating new
  storage trees.

* Remove `mergePayload()` and other cruft from API, `aristo_merge`, etc.

* clean up merge functions

details:
  The merge implementation `mergePayloadImpl()` does not need to be super
  generic anymore as all the edge cases are covered by the specialised
  functions `mergeAccountPayload()`, `mergeGenericData()`, and
  `mergeStorageData()`.

* No tracer available at the moment, so disable offending tests
2024-06-18 11:14:02 +00:00
Jordan Hrycaj 0f430c70fd
Aristo avoid storage trie update race conditions (#2251)
* Update TDD suite logger output format choices

why:
  New format is not practical for TDD as it just dumps data across a wide
  range (considerably larder than 80 columns.)

  So the new format can be turned on by function argument.

* Update unit tests samples configuration

why:
  Slightly changed the way to find the `era1` directory

* Remove compiler warnings (fix deprecated expressions and phrases)

* Update `Aristo` debugging tools

* Always update the `storageID` field of account leaf vertices

why:
  Storage tries are weekly linked to an account leaf object in that
  the `storageID` field is updated by the application.

  Previously, `Aristo` verified that leaf objects make sense when passed
  to the database. As a consequence
  * the database was inconsistent for a short while
  * the burden for correctness was all on the application which led
    to delayed error handling which is hard to debug.

  So `Aristo` will internally update the account leaf objects so that
  there are no race conditions due to the storage trie handling

* Aristo: Let `stow()`/`persist()` bail out unless there is a `VertexID(1)`

why:
  The journal and filter logic depends on the hash of the `VertexID(1)`
  which is commonly known as the state root. This implies that all
  changes to the database are somehow related to that.

* Make sure that a `Ledger` account does not overwrite the storage trie reference

why:
  Due to the abstraction of a sub-trie (now referred to as column with a
  hash describing its state) there was a weakness in the `Aristo` handler
  where an account leaf could be overwritten though changing the validity
  of the database. This has been changed and the database will now reject
  such changes.

  This patch fixes the behaviour on the application layer. In particular,
  the column handle returned by the `CoreDb` needs to be updated by
  the `Aristo` database state. This mitigates the problem that a storage
  trie might have vanished or re-apperaed with a different vertex ID.

* Fix sub-trie deletion test

why:
  Was originally hinged on `VertexID(1)` which cannot be wholesale
  deleted anymore after the last Aristo update. Also, running with
  `VertexID(2)` needs an artificial `VertexID(1)` for making `stow()`
  or `persist()` work.

* Cosmetics

* Activate `test_generalstate_json`

* Temporarily `deactivate test_tracer_json`

* Fix copyright header

---------

Co-authored-by: jordan <jordan@dry.pudding>
Co-authored-by: Jacek Sieka <jacek@status.im>
2024-05-30 17:48:38 +00:00
Jordan Hrycaj 8e18e85288
Aristodb remove obsolete and time consuming admin features (#2048)
* Aristo: Reorg `hashify()` using different schedule algorithm

why:
  Directly calculating the search tree top down from the roots turns
  out to be faster than using the cached structures left over by `merge()`
  and `delete()`.
  Time gains is short of 20%

* Aristo: Remove `lTab[]` leaf entry object type

why:
  Not used anymore. It was previously needed to build the schedule for
  `hashify()`.

* Aristo: Avoid unnecessary re-org of the vertex ID recycling list

why:
  This list can become quite large so a heuristic is employed whether
  it makes sense to re-org.

  Also, re-org check is only done by `delete()` functions.

* Aristo: Remove key/reverse lookup table from tx layers

why:
  It is ignored except for handling proof nodes and costs unnecessary
  run time resources.

  This feature was originally needed to accommodate the mental transition
  from the legacy MPT to the `Aristo` trie :).

* Fix copyright year
2024-02-22 08:24:58 +00:00
andri lim 7c1af9a78f
Add style check to config.nims and fix styles in source code (#2038)
* Add style check to config.nims and fix styles in source code

* Fix copyright year
2024-02-20 10:07:38 +07:00
Jordan Hrycaj 1b4a43c140
Aristo db remove over engineered object type (#2027)
* CoreDb: update test suite

* Aristo: Simplify reverse key map

why:
  The reverse key map `pAmk: (root,key) -> {vid,..}` as been simplified to
  `pAmk: key -> {vid,..}` as the state `root` domain argument is not used,
  anymore

* Aristo: Remove `HashLabel` object type and replace it by `HashKey`

why:
  The `HashLabel` object attaches a root hash to a hash key. This is
  nowhere used, anymore.

* Fix copyright
2024-02-14 19:11:59 +00:00
Jordan Hrycaj 9e50af839f
Core db+aristo update storage trie handling (#2023)
* CoreDb: Test module with additional sample selector cmd line options

* Aristo: Do not automatically remove a storage trie with the account

why:
  This is an unnecessary side effect. Rather than using an automatism, a
  a storage root must be deleted manually.

* Aristo: Can handle stale storage root vertex IDs as empty IDs.

why:
  This is currently needed for the ledger API supporting both, a legacy
  and the `Aristo` database backend.

  This feature can be disabled at compile time by re-setting the
  `LOOSE_STORAGE_TRIE_COUPLING` flag in the `aristo_constants` module.

* CoreDb+Aristo: Flush/delete storage trie when deleting account

why:
  On either backend, a deleted account leave a dangling storage trie on
  the database.

  For consistency nn the legacy backend, storage tries must not be
  deleted as they might be shared by several accounts whereas on `Aristo`
  they are always unique.
2024-02-12 19:37:00 +00:00
Jordan Hrycaj 2c35390bdf
Core db and aristo maintenance update (#2014)
* Aristo: Update error return code

why:
  Failing of `Aristo` function `delete()` might fail because there is
  no such data item on the db. This must return a single error code
  as is done with `fetch()`.

* Ledger: Better error handling

why:
  The `expect()` clauses have been replaced by raising asserts indicating
  the error from the database backend.

   Also, `delete()` failures are legitimate if the item to delete does not
   exist.

* Aristo: Delete function must always leave a label on DB for `hashify()`

why:
  The `hashify()` uses the labels left bu `merge()` and `delete()` to
  compile (and optimise) a scheduler for subsequent hashing.

  Originally, the labels were not used for deleted entries and `delete()`
  still had some edge case where the deletion label was not properly
  handled.

* Aristo: Update `hashify()` scheduler, remove buggy optimisation

why:
  Was left over from version without virtual state roots which did not
  know about account payload leaf vertices referring to storage roots.

* Aristo: Label storage trie account in `delete()` similar to `merge()`

details;
  The `delete()` function applied to a non-static state root (assumed
  to be a storage root) will check the payload of an accounts leaf
  and mark its Merkle keys to be re-checked when runninh `hashify()`

* Aristo: Clean up and re-org recycled vertex IDs in `hashify()`

why:
  Re-organising the recycled vertex IDs list intends to reduce the size of the
  list.

  This list is organised as a LIFO (or stack.) By reorganising it in a way
  so that the least vertex ID numbers are on top, the list will be kept
  smaller as observed on some examples (less than 30%.)

* CoreDb: Accept storage trie deletion requests in non-initialised state

why:
  Due to lazy initialisation, the root vertex ID might not yet exist. So
  the `Aristo` database handlers would reject this call with an error and
  this condition needs to be handled by the API (which realises the lazy
  feature.)

* Cosmetics & code massage, prettify logging

* fix missing import
2024-02-08 16:32:16 +00:00
Jordan Hrycaj 3b306a9689
Aristo: Update unit test suite (#2002)
* Aristo: Update unit test suite

* Aristo/Kvt: Fix iterators

why:
  Generic iterators were not properly updated after backend change

* Aristo: Add sub-trie deletion functionality

why:
  For storage tries linked to an account payload vertex ID, a the
  whole storage trie needs to be deleted with the account.

* Aristo: Reserve vertex ID numbers for static custom state roots

why:
  Static custom state roots may be controlled by an application,
  e.g. for a receipt or a transaction root. The `Aristo` functions
  are agnostic of what the static state roots are when different
  from the internal tree vertex ID 1.

details;
  The `merge()` function applied to a non-static state root (assumed
  to be a storage root) will check the payload of an accounts leaf
  and mark its Merkle keys to be re-checked.

* Aristo: Correct error code symbol

* Aristo: Update error code symbols

* Aristo: Code cosmetics/comments

* Aristo: Fix hashify schedule calculator

why:
  Had a tendency to stop early leaving an incomplete job
2024-02-01 21:27:48 +00:00
Jordan Hrycaj ffa8ad2246
Core db use differential tx layers for aristo and kvt (#1949)
* Fix kvt headers

* Provide differential layers for KVT transaction stack

why:
  Significant performance improvement

* Provide abstraction layer for database top cache layer

why:
  This will eventually implemented as a differential database layers
  or transaction layers. The latter is needed to improve performance.

behavioural changes:
  Zero vertex and keys (i.e. delete requests) are not optimised out
  until the last layer is written to the database.

* Provide differential layers for Aristo transaction stack

why:
  Significant performance improvement
2023-12-19 12:39:23 +00:00
Jordan Hrycaj 657379f484
Aristo db update merkle hasher (#1925)
* Register paths for added leafs because of trie re-balancing

why:
  While the payload would not change, the prefix in the leaf vertex
  would. So it needs to be flagged for hash recompilation for the
  `hashify()` module.

also:
  Make sure that `Hike` paths which might have vertex links into the
  backend filter are replaced by vertex copies before manipulating.
  Otherwise the vertices on the immutable filter might be involuntarily
  changed.

* Also check for paths where the leaf vertex is on the backend, already

why:
  A a path can have dome vertices on the top layer cache with the
  `Leaf` vertex on  the backend.

* Re-define a void `HashLabel` type.

why:
  A `HashLabel` type is a pair `(root-vertex-ID, Keccak-hash)`. Previously,
  a valid `HashLabel` consisted of a non-empty hash and a non-zero vertex
  ID. This definition leads to a non-unique representation of a void
  `HashLabel` with either root-ID or has void. This has been changed to
  the unique void `HashLabel` exactly if the hash entry is void.

* Update consistency checkers

* Re-org `hashify()` procedure

why:
  Syncing against block chain showed serious deficiencies which produced
  wrong hashes or simply bailed out with error.

  So all fringe cases (mainly due to deleted entries) could be integrated
  into the labelling schedule rather than handling separate fringe cases.
2023-12-04 20:39:26 +00:00
Jordan Hrycaj 6e0397e276
Aristo and ledger small updates (#1888)
* Fix debug noise in `hashify()` for perfectly normal situation

why:
  Was previously considered a fixable error

* Fix test sample file names

why:
  The larger test file `goerli68161.txt.gz` is already in the local
  archive. So there is no need to use the smaller one from the external
  repo.

* Activate `accounts_cache` module from `db/ledger`

why:
  A copy of the original `accounts_cache.nim` source to be integrated
  into the `Ledger` module wrapper which allows to switch between
  different `accounts_cache` implementations unser tha same API.

details:
  At a later state, the `db/accounts_cache.nim` wrapper will be
  removed so that there is only one access to that module via
  `db/ledger/accounts_cache.nim`.

* Fix copyright headers in source code
2023-11-08 16:52:25 +00:00
Jordan Hrycaj 4feaa2cfab
Aristo db update for short nodes key edge cases (#1887)
* Aristo: Provide key-value list signature calculator

detail:
  Simple wrappers around `Aristo` core functionality

* Update new API for `CoreDb`

details:
+ Renamed new API functions `contains()` => `hasKey()` or `hasPath()`
  which disables the `in` operator on non-boolean 	`contains()` functions
+ The functions `get()` and `fetch()` always return a not-found error if
  there is no item, available. The new functions `getOrEmpty()` and
  `mergeOrEmpty()` return an an empty `Blob` if there is no such key
  found.

* Rewrite `core_apps.nim` using new API from `CoreDb`

* Use `Aristo` functionality for calculating Merkle signatures

details:
  For debugging, the `VerifyAristoForMerkleRootCalc` can be set so
  that `Aristo` results will be verified against the legacy versions.

* Provide general interface for Merkle signing key-value tables

details:
  Export `Aristo` wrappers

* Activate `CoreDb` tests

why:
  Now, API seems to be stable enough for general tests.

* Update `toHex()` usage

why:
  Byteutils' `toHex()` is superior to `toSeq.mapIt(it.toHex(2)).join`

* Split `aristo_transcode` => `aristo_serialise` + `aristo_blobify`

why:
+ Different modules for different purposes
+ `aristo_serialise`: RLP encoding/decoding
+ `aristo_blobify`: Aristo database encoding/decoding

* Compacted representation of small nodes' links instead of Keccak hashes

why:
  Ethereum MPTs use Keccak hashes as node links if the size of an RLP
  encoded node is at least 32 bytes. Otherwise, the RLP encoded node
  value is used as a pseudo node link (rather than a hash.) Such a node
  is nor stored on key-value database. Rather the RLP encoded node value
  is stored instead of a lode link in a parent node instead. Only for
  the root hash, the top level node is always referred to by the hash.

  This feature needed an abstraction of the `HashKey` object which is now
  either a hash or a blob of length at most 31 bytes. This leaves two
  ways of representing an empty/void `HashKey` type, either as an empty
  blob of zero length, or the hash of an empty blob.

* Update `CoreDb` interface (mainly reducing logger noise)

* Fix copyright years (to make `Lint` happy)
2023-11-08 12:18:32 +00:00
Jordan Hrycaj 395580ff9d
Aristo and core db updates (#1800)
* Aristo: remove obsolete functions

* Aristo: Fix error code for non-available hash keys

why:
  Must not return `not-found` when the key is not available (i.e. the
  current changes were not hashified, yet.)

* CoreDB: Provide TDD and test framework
2023-10-03 12:56:13 +01:00
Jordan Hrycaj 56d5c382d7
Aristo db traversal helpers (#1638)
* Misc fixes

detail:
* Fix de-serialisation for account leafs
* Update node recovery from unit tests

* Remove `LegacyAccount` from `PayloadRef` object

why:
  Legacy accounts use a hash key as storage root which is detrimental
  to the working of the Aristo database which uses a vertex ID.

* Dissolve `hashify_helper` into `aristo_utils` and `aristo_transcode`

why:
  Functions are of general interest so they should live in first level
  code files.

* Added left/right iterators over leaf nodes

* Some helper/wrapper functions that might be useful
2023-07-13 00:03:14 +01:00