Commit Graph

26 Commits

Author SHA1 Message Date
Jacek Sieka df4a21c910
Store cached hash at the layer corresponding to the source data (#2492)
When lazily verifying state roots, we may end up with an entire state
without roots that gets computed for the whole database - in the current
design, that would result in hashes for the entire trie being held in
memory.

Since the hash depends only on the data in the vertex, we can store it
directly at the top-most level derived from the verticies it depends on
- be that memory or database - this makes the memory usage broadly
linear with respect to the already-existing in-memory change set stored
in the layers.

It also ensures that if we have multiple forks in memory, hashes get
cached in the correct layer maximising reuse between forks.

The same layer numbering scheme as elsewhere is reused, where -2 is the
backend, -1 is the balancer, then 0+ is the top of the stack and stack.

A downside of this approach is that we create many small batches - a
future improvement could be to collect all such writes in a single
batch, though the memory profile of this approach should be examined
first (where is the batch kept, exactly?).
2024-07-18 09:13:56 +02:00
Jordan Hrycaj a84a2131cd
No ext update (#2494)
* Imported/rebase from `no-ext`, PR #2485

  Store extension nodes together with the branch

  Extension nodes must be followed by a branch - as such, it makes sense
  to store the two together both in the database and in memory:

  * fewer reads, writes and updates to traverse the tree
  * simpler logic for maintaining the node structure
  * less space used, both memory and storage, because there are fewer
    nodes overall

  There is also a downside: hashes can no longer be cached for an
  extension - instead, only the extension+branch hash can be cached - this
  seems like a fine tradeoff since computing it should be fast.

  TODO: fix commented code

* Fix merge functions and `toNode()`

* Update `merkleSignCommit()` prototype

why:
  Result is always a 32bit hash

* Update short Merkle hash key generation

details:
  Ethereum reference MPTs use Keccak hashes as node links if the size of
  an RLP encoded node is at least 32 bytes. Otherwise, the RLP encoded
  node value is used as a pseudo node link (rather than a hash.) This is
  specified in the yellow paper, appendix D.

  Different to the `Aristo` implementation, the reference MPT would not
  store such a node on the key-value database. Rather the RLP encoded node value is stored instead of a node link in a parent node
  is stored as a node link on the parent database.

  Only for the root hash, the top level node is always referred to by the
  hash.

* Fix/update `Extension` sections

why:
  Were commented out after removal of a dedicated `Extension` type which
  left the system disfunctional.

* Clean up unused error codes

* Update unit tests

* Update docu

---------

Co-authored-by: Jacek Sieka <jacek@status.im>
2024-07-16 19:47:59 +00:00
Jacek Sieka 81e75622cf
storage: store root id together with vid, for better locality of refe… (#2449)
The state and account MPT:s currenty share key space in the database
based on that vertex id:s are assigned essentially randomly, which means
that when two adjacent slot values from the same contract are accessed,
they might reside at large distance from each other.

Here, we prefix each vertex id by its root causing them to be sorted
together thus bringing all data belonging to a particular contract
closer together - the same effect also happens for the main state MPT
whose nodes now end up clustered together more tightly.

In the future, the prefix given to the storage keys can also be used to
perform range operations such as reading all the storage at once and/or
deleting an account with a batch operation.

Notably, parts of the API already supported this rooting concept while
parts didn't - this PR makes the API consistent by always working with a
root+vid.
2024-07-04 15:46:52 +02:00
Jacek Sieka 1f60e8e453
Use `Hash256` directly for account path (#2439)
Account paths are always a hash - passing it around as such helps avoid
confusion as to how long it is
2024-07-03 10:14:26 +02:00
Jacek Sieka 3d3831dde8
Small cleanups (#2435)
* avoid costly hike memory allocations for operations that don't need to
re-traverse it
* avoid unnecessary state checks (which might trigger unwanted state
root computations)
* disable optimize-for-hits due to the MPT no longer being complete at
all times
2024-07-01 14:07:39 +02:00
Jordan Hrycaj 6dc2773957
Only use pre hashed addresses as account keys (#2424)
* Normalised storage tree addressing in function prototypes

detail:
  Argument list is always `<db> <account-path> <slot-path> ..` with
  both path arguments as `openArray[]`

* Remove cruft

* CoreDb internally Use full account paths rather than addresses

* Update API logging

* Use hashed account address only in prototypes

why:
  This avoids unnecessary repeated hashing of the same account address.
  The burden of doing that is upon the application. In the case here,
  the ledger caches all kinds of stuff anyway so it is common sense to
  exploit that for account address hashes.

caveat:
  Using `openArray[byte]` argument types for hashed accounts is inherently
  fragile. In non-release mode, a length verification `doAssert` is
  enabled by default.

* No accPath in data record (use `AristoAccount` as `CoreDbAccount`)

* Remove now unused `eAddr` field from ledger `AccountRef` type

why:
  Is duplicate of lookup key

* Avoid merging the account record/statement in the ledger twice.
2024-06-27 19:21:01 +00:00
Jacek Sieka 6b68ff92d3
Allocation-free nibbles buffer (#2406)
This buffer eleminates a large part of allocations during MPT traversal,
reducing overall memory usage and GC pressure.

Ideally, we would use it throughout in the API instead of
`openArray[byte]` since the built-in length limit appropriately exposes
the natural 64-nibble depth constraint that `openArray` fails to
capture.
2024-06-22 22:33:37 +02:00
Jordan Hrycaj 143f2e99f5
Core db+aristo fixes and tx handling updates (#2164)
* Aristo: Rename journal related sources and functions

why:
  Previously, the naming was hinged on the phrases `fifo`, `filter` etc.
  which reflect the inner workings of cascaded filters. This was
  unfortunate for reading/understanding the source code for actions where
  the focus is the journal as a whole.

* Aristo: Fix buffer overflow (path length truncating error)

* Aristo: Tighten `hikeUp()` stop check, update error code

why:
  Detect dangling vertex links. These are legit with `snap` sync
  processing but not with regular processing.

* Aristo: Raise assert in regular mode `merge()` at a dangling link/edge

why:
  With `snap` sync processing, partial trees are ok and can be amended.
  Not so in regular mode.

  Previously there was only a debug message when a non-legit dangling edge
  was encountered.

* Aristo: Make sure that vertices are copied before modification

why:
  Otherwise vertices from lower layers might also be modified

* Aristo: Fix relaxed mode for validity checker `check()`

* Remove cruft

* Aristo: Update API for transaction handling

details:
+ Split `aristo_tx.nim` into sub-modules
+ Split `forkWith()` into `findTx()` + `forkTx()`
+ Removed `forkTop()`, `forkBase()` (now superseded by new `forkTx()`)

* CoreDb+Aristo: Fix initialiser (missing methods)
2024-05-03 17:38:17 +00:00
Jordan Hrycaj 2c35390bdf
Core db and aristo maintenance update (#2014)
* Aristo: Update error return code

why:
  Failing of `Aristo` function `delete()` might fail because there is
  no such data item on the db. This must return a single error code
  as is done with `fetch()`.

* Ledger: Better error handling

why:
  The `expect()` clauses have been replaced by raising asserts indicating
  the error from the database backend.

   Also, `delete()` failures are legitimate if the item to delete does not
   exist.

* Aristo: Delete function must always leave a label on DB for `hashify()`

why:
  The `hashify()` uses the labels left bu `merge()` and `delete()` to
  compile (and optimise) a scheduler for subsequent hashing.

  Originally, the labels were not used for deleted entries and `delete()`
  still had some edge case where the deletion label was not properly
  handled.

* Aristo: Update `hashify()` scheduler, remove buggy optimisation

why:
  Was left over from version without virtual state roots which did not
  know about account payload leaf vertices referring to storage roots.

* Aristo: Label storage trie account in `delete()` similar to `merge()`

details;
  The `delete()` function applied to a non-static state root (assumed
  to be a storage root) will check the payload of an accounts leaf
  and mark its Merkle keys to be re-checked when runninh `hashify()`

* Aristo: Clean up and re-org recycled vertex IDs in `hashify()`

why:
  Re-organising the recycled vertex IDs list intends to reduce the size of the
  list.

  This list is organised as a LIFO (or stack.) By reorganising it in a way
  so that the least vertex ID numbers are on top, the list will be kept
  smaller as observed on some examples (less than 30%.)

* CoreDb: Accept storage trie deletion requests in non-initialised state

why:
  Due to lazy initialisation, the root vertex ID might not yet exist. So
  the `Aristo` database handlers would reject this call with an error and
  this condition needs to be handled by the API (which realises the lazy
  feature.)

* Cosmetics & code massage, prettify logging

* fix missing import
2024-02-08 16:32:16 +00:00
Jordan Hrycaj 3b306a9689
Aristo: Update unit test suite (#2002)
* Aristo: Update unit test suite

* Aristo/Kvt: Fix iterators

why:
  Generic iterators were not properly updated after backend change

* Aristo: Add sub-trie deletion functionality

why:
  For storage tries linked to an account payload vertex ID, a the
  whole storage trie needs to be deleted with the account.

* Aristo: Reserve vertex ID numbers for static custom state roots

why:
  Static custom state roots may be controlled by an application,
  e.g. for a receipt or a transaction root. The `Aristo` functions
  are agnostic of what the static state roots are when different
  from the internal tree vertex ID 1.

details;
  The `merge()` function applied to a non-static state root (assumed
  to be a storage root) will check the payload of an accounts leaf
  and mark its Merkle keys to be re-checked.

* Aristo: Correct error code symbol

* Aristo: Update error code symbols

* Aristo: Code cosmetics/comments

* Aristo: Fix hashify schedule calculator

why:
  Had a tendency to stop early leaving an incomplete job
2024-02-01 21:27:48 +00:00
Jordan Hrycaj a1161b537b
Core db update storage root management for sub tries (#1964)
* Aristo: Re-phrase `LayerDelta` and `LayerFinal` as object references

why:
  Avoids copying in some cases

* Fix copyright header

* Aristo: Verify `leafTie.root` function argument for `merge()` proc

why:
  Zero root will lead to inconsistent DB entry

* Aristo: Update failure condition for hash labels compiler `hashify()`

why:
  Node need not be rejected as long as links are on the schedule. In
  that case, `redo[]` is to become `wff.base[]` at a later stage.

  This amends an earlier fix, part of #1952 by also testing against
  the target nodes of the `wff.base[]` sets.

* Aristo: Add storage root glue record to `hashify()` schedule

why:
  An account leaf node might refer to a non-resolvable storage root ID.
  Storage root node chains will end up at the storage root. So the link
  `storage-root->account-leaf` needs an extra item in the schedule.

* Aristo: fix error code returned by `fetchPayload()`

details:
  Final error code is implied by the error code form the `hikeUp()`
  function.

* CoreDb: Discard `createOk` argument in API `getRoot()` function

why:
  Not needed for the legacy DB. For the `Arsto` DB, a lazy approach is
  implemented where a stprage root node is created on-the-fly.

* CoreDb: Prevent `$$` logging in some cases

why:
  Logging the function `$$` is not useful when it is used for internal
  use, i.e. retrieving an an error text for logging.

* CoreDb: Add `tryHashFn()` to API for pretty printing

why:
  Pretty printing must not change the hashification status for the
  `Aristo` DB. So there is an independent API wrapper for getting the
  node hash which never updated the hashes.

* CoreDb: Discard `update` argument in API `hash()` function

why:
  When calling the API function `hash()`, the latest state is always
  wanted. For a version that uses the current state as-is without checking,
  the function `tryHash()` was added to the backend.

* CoreDb: Update opaque vertex ID objects for the `Aristo` backend

why:
  For `Aristo`, vID objects encapsulate a numeric `VertexID`
  referencing a vertex (rather than a node hash as used on the
  legacy backend.) For storage sub-tries, there might be no initial
  vertex known when the descriptor is created. So opaque vertex ID
  objects are supported without a valid `VertexID` which will be
  initalised on-the-fly when the first item is merged.

* CoreDb: Add pretty printer for opaque vertex ID objects

* Cosmetics, printing profiling data

* CoreDb: Fix segfault in `Aristo` backend when creating MPT descriptor

why:
  Missing initialisation  error

* CoreDb: Allow MPT to inherit shared context on `Aristo` backend

why:
  Creates descriptors with different storage roots for the same
  shared `Aristo` DB descriptor.

* Cosmetics, update diagnostic message items for `Aristo` backend

* Fix Copyright year
2024-01-11 19:11:38 +00:00
Jordan Hrycaj ffa8ad2246
Core db use differential tx layers for aristo and kvt (#1949)
* Fix kvt headers

* Provide differential layers for KVT transaction stack

why:
  Significant performance improvement

* Provide abstraction layer for database top cache layer

why:
  This will eventually implemented as a differential database layers
  or transaction layers. The latter is needed to improve performance.

behavioural changes:
  Zero vertex and keys (i.e. delete requests) are not optimised out
  until the last layer is written to the database.

* Provide differential layers for Aristo transaction stack

why:
  Significant performance improvement
2023-12-19 12:39:23 +00:00
Jordan Hrycaj 657379f484
Aristo db update merkle hasher (#1925)
* Register paths for added leafs because of trie re-balancing

why:
  While the payload would not change, the prefix in the leaf vertex
  would. So it needs to be flagged for hash recompilation for the
  `hashify()` module.

also:
  Make sure that `Hike` paths which might have vertex links into the
  backend filter are replaced by vertex copies before manipulating.
  Otherwise the vertices on the immutable filter might be involuntarily
  changed.

* Also check for paths where the leaf vertex is on the backend, already

why:
  A a path can have dome vertices on the top layer cache with the
  `Leaf` vertex on  the backend.

* Re-define a void `HashLabel` type.

why:
  A `HashLabel` type is a pair `(root-vertex-ID, Keccak-hash)`. Previously,
  a valid `HashLabel` consisted of a non-empty hash and a non-zero vertex
  ID. This definition leads to a non-unique representation of a void
  `HashLabel` with either root-ID or has void. This has been changed to
  the unique void `HashLabel` exactly if the hash entry is void.

* Update consistency checkers

* Re-org `hashify()` procedure

why:
  Syncing against block chain showed serious deficiencies which produced
  wrong hashes or simply bailed out with error.

  So all fringe cases (mainly due to deleted entries) could be integrated
  into the labelling schedule rather than handling separate fringe cases.
2023-12-04 20:39:26 +00:00
Jordan Hrycaj 6e0397e276
Aristo and ledger small updates (#1888)
* Fix debug noise in `hashify()` for perfectly normal situation

why:
  Was previously considered a fixable error

* Fix test sample file names

why:
  The larger test file `goerli68161.txt.gz` is already in the local
  archive. So there is no need to use the smaller one from the external
  repo.

* Activate `accounts_cache` module from `db/ledger`

why:
  A copy of the original `accounts_cache.nim` source to be integrated
  into the `Ledger` module wrapper which allows to switch between
  different `accounts_cache` implementations unser tha same API.

details:
  At a later state, the `db/accounts_cache.nim` wrapper will be
  removed so that there is only one access to that module via
  `db/ledger/accounts_cache.nim`.

* Fix copyright headers in source code
2023-11-08 16:52:25 +00:00
Jordan Hrycaj 3fe0a49a5e
Aristo db allow shorter than 64 nibbles path keys (#1864)
* Aristo: Single `FetchPathNotFound` error in `fetchXxx()` and `hasPath()`

why:
  Missing path hike returns too many detailed reasons why it failed
  which becomes cumbersome to handle.

also:
  Renamed `contains()` => `hasPath()` which disables the `in` operator on
  non-boolean 	`contains()` functions

* Kvt: Renamed `contains()` => `hasKey()`

why:
  which disables the `in` operator on non-boolean 	`contains()` functions

* Aristo: Generalising `HashID` by variable length `PathID`

why:
  There are cases when the `Aristo` database is to be used with
  shorter than 64 nibbles keys when handling transactions indexes
  with sequence IDs.

caveat:
  This patch only works reliable for full length `PathID` values. Tests
  for shorter `PathID` values are currently missing.
2023-10-27 22:36:51 +01:00
Jordan Hrycaj cd1d370543
Aristo db api extensions for use as core db backend (#1754)
* Update docu

* Update Aristo/Kvt constructor prototype

why:
  Previous version used an `enum` value to indicate what backend is to
  be used. This was replaced by using the backend object type.

* Rewrite `hikeUp()` return code into `Result[Hike,(Hike,AristoError)]`

why:
  Better code maintenance. Previously, the `Hike` object was returned. It
  had an internal error field so partial success was also available on
  a failure. This error field has been removed.

* Use `openArray[byte]` rather than `Blob` in functions prototypes

* Provide synchronised multi instance transactions

why:
  The `CoreDB` object was geared towards the legacy DB which used a single
  transaction for the key-value backend DB. Different state roots are
  provided by the backend database, so all instances work directly on the
  same backend.

  Aristo db instances have different in-memory mappings (aka different
  state roots) and the transactions are on top of there mappings. So each
  instance might run different transactions.

  Multi instance transactions are a compromise to converge towards the
  legacy behaviour. The synchronised transactions span over all instances
  available at the time when base transaction was opened. Instances
  created later are unaffected.

* Provide key-value pair database iterator

why:
  Needed in `CoreDB` for `replicate()` emulation

also:
  Some update of internal code

* Extend API (i.e. prototype variants)

why:
  Needed for `CoreDB` geared towards the legacy backend which has a more
  basic API than Aristo.
2023-09-15 16:23:53 +01:00
Jordan Hrycaj 8e00143313
Aristo db code massage n cosmetics (#1745)
* Rewrite remaining `AristoError` return code into `Result[void,AristoError]`

why:
  Better code maintenance

* Update import sections

* Update Aristo DB paths

why:
 More systematic so directory can be shared with other DB types

* More cosmetcs

* Update unit tests runners

why:
  Proper handling of persistent and mem-only DB. The latter can be
  consistently triggered by an empty DB path.
2023-09-12 19:45:12 +01:00
Jordan Hrycaj 01fe172738
Aristo db integrate hashify into tx (#1679)
* Renamed type `NoneBackendRef` => `VoidBackendRef`

* Clarify names: `BE=filter+backend` and `UBE=backend (unfiltered)`

why:
  Most functions used full names as `getVtxUnfilteredBackend()` or
  `getKeyBackend()`. After defining abbreviations (and its meaning) it
   seems easier to use `getVtxUBE()` and `getKeyBE()`.

* Integrate `hashify()` process into transaction logic

why:
  Is now transparent unless explicitly controlled.

details:
  Cache changes imply setting a `dirty` flag which in turn triggers
  `hashify()` processing in transaction and `pack()` directives.

* Removed `aristo_tx.exec()` directive

why:
  Inconsistent implementation, functionality will be provided with a
  different paradigm.
2023-08-11 18:23:57 +01:00
Jordan Hrycaj ff6673beac
Aristo db tidy up a bit (#1625)
* Slightly tighten some self-check conditions

* Redefined the database descriptor object as reference (to the object)

why:
  The upcoming transaction wrapper will work with a database reference
  rather than the object itself

* Append state before `save()` to the Aristo descriptor

why:
  This stae was previously returned by the function. Appending it to
  a field of the Aristo descriptor seems easier to handle.
2023-07-04 19:24:03 +01:00
Jordan Hrycaj 83dbe87159
Aristo db update foreground caching (#1605)
* Fix vertex ID generator state handling for rocksdb backend

why:
 * Key error in walk iterator
 * Needs to be loaded when opening the database

* Use non-zero sub-table prefixes for rocksdb

why:
  Handy for debugging

* Fix error code for missing key on rocksdb backend

why:
  Previously returned `VOID_HASH_KEY` rather than `GetKeyNotFound`

* Explicitly copy vertex data between internal table and function/result argument

why:
  Function argument or return reference may still refer to the same data
  object.

* Updated error symbols

why:
  Error symbol names for the hike module now start with the prefix `Hike`.

* Write back modified branch node into local top layer cache

why:
  With the backend available, the source of the branch node references
  might not be the top layer cache. So any change must be explicitely
  recorded.
2023-06-22 12:13:24 +01:00
Jordan Hrycaj 4b66f93274
Aristo db with storage backends (#1603)
* Generalised Aristo DB constructor for any type of backend

details:
  * Records to be deleted are represented as key-void (rather than
    key-value) pairs by the put-function arguments
  * Allow direct driver access, iterators as example implementation and
    for testing.

* Provide backend storage interface

details:
  Stores the top layer onto backend tables

* Implemented Rocks DB backend

details:
  Transaction based `put()` functionality
  Iterators (based on direct RocksDB access)
2023-06-20 14:26:25 +01:00
Jordan Hrycaj d7f40516a7
Detach from snap/sync declarations & definitions (#1601)
why:
  Tests and some basic components were originally borrowed from the
  snap/sync implementation. These have fully been re-implemented.
2023-06-12 19:16:03 +01:00
Jordan Hrycaj 0308dfac4f
Aristo db address sup trie items properly (#1600)
* Fix include

why:
  Eth67 not default yet so that got missed

* Rename `LeafKey` => `LeafTie`

why:
  Name is a pen picture of what this object is for. Also, it avoids the
  ubiquitous term `key`.

* Provided `getOrVoid()` wrapper for `getOrDefault()`

also:
  Provide `isValid()` syntactic sugar for `.isNil.not`, `!= 0` etc.
  Reorg descriptor source, split into sub-sources

* Bundled `NodeKey` objects with root ID and called it `HashLabel`

why:
  `NodeKey` (aka repurposed Hash265) objects are unique only within a
  particular sub-trie (e.g. storage slots) which are kept separated
  (i.e non-interleaved) by design. This is not applied to the backend
  as the map VertexID->NodeKey labelling the nodes needs not be injective.

  For the in-memory database (transaction) layers, the injective map
  VertexID->(VertexID,NodeKey) is used where the first field of the image
  tuple is the root ID of the sub-trie the `NodeKey` object is valid. So
  identical storage tries for different accounts can be represented.
2023-06-12 14:48:47 +01:00
Jordan Hrycaj 932a2140f2
Aristo db supporting forest and layered tx architecture (#1598)
* Exclude some storage tests

why:
  These test running on external dumps slipped through. The particular
  dumps were reported earlier as somehow dodgy.

  This was changed in `#1457` but having a second look, the change on
  hexary_interpolate.nim(350) might be incorrect.

* Redesign `Aristo DB` descriptor for transaction based layers

why:
  Previous descriptor layout made it cumbersome to push/pop
  database delta layers.

  The new architecture keeps each layer with the full delta set
  relative to the database backend.

* Keep root ID as part of the `Patricia Trie` leaf path

why;
  That way, forests are supported
2023-06-09 12:17:37 +01:00
Jordan Hrycaj 2fc349feb9
Aristo db merkle hashify functionality added (#1593)
* Keep vertex ID generator state with each db-layer

why:
  The vertex ID generator state is part of the difference to the below
  layer

* Move otherwise unused source to test directory

* Add Merkle hash generator

also:
  * Verification facility for debugging
  * Empty Merkle key hashes encoded as `EMPTY_ROOT_HASH`
2023-05-30 22:21:15 +01:00
Jordan Hrycaj cd78458123
Add items to `Aristo Trie` database (#1586)
details:
1. Merging a leaf vertex merges a `Patricia Trie` path (while
   adding/modiying vertices) and adds a leaf node with payload
2. Merging a Merkel node merges a single vertex to the `Patricia Trie`
   and registers merkel hashes
3. Action 2 can be used before action 1 in order to construct a
   Merkel proof as required for handling `snap/1` data.
4. Unit tests show that action 3 is benign for now :)
2023-05-30 12:47:47 +01:00