nimbus-eth1

Commit Graph

Author	SHA1	Message	Date
Jacek Sieka	df4a21c910	Store cached hash at the layer corresponding to the source data (#2492) When lazily verifying state roots, we may end up with an entire state without roots that gets computed for the whole database - in the current design, that would result in hashes for the entire trie being held in memory. Since the hash depends only on the data in the vertex, we can store it directly at the top-most level derived from the vertices it depends on - be that memory or database - this makes the memory usage broadly linear with respect to the already-existing in-memory change set stored in the layers. It also ensures that if we have multiple forks in memory, hashes get cached in the correct layer maximising reuse between forks. The same layer numbering scheme as elsewhere is reused, where -2 is the backend, -1 is the balancer, then 0+ is the top of the stack and stack. A downside of this approach is that we create many small batches - a future improvement could be to collect all such writes in a single batch, though the memory profile of this approach should be examined first (where is the batch kept, exactly?).	2024-07-18 09:13:56 +02:00
Jacek Sieka	9d91191154	storage hike cache (#2484) This PR adds a storage hike cache similar to the account hike cache already present - this cache is less efficient because account storage is already partially cached in the account ledger but nonetheless helps keep hiking down. Notably, there's an opportunity to optimise this cache and the others so that they cooperate better instead of overlapping, which is left for a future PR. This PR also fixes an O(N) memory usage for storage slots where the delete would keep the full storage in a work list which on mainnet can grow very large - the work list is replaced with a more conventional recursive `O(log N)` approach.	2024-07-14 19:12:10 +02:00
Jacek Sieka	f3a56002ca	Turn payload into value type (#2483) The Vertex type unifies branches, extensions and leaves into a single memory area where the largest member is the branch (128 bytes + overhead) - the payloads we have are all smaller than 128 bytes, thus wrapping them in an extra layer of `ref` is wasteful from a memory usage perspective. Further, the ref:s must be visited during the M&S phase of garbage collection - since we keep millions of these, many of them short-lived, this takes up significant CPU time. ``` Function: system::markStackAndRegisters | CPU Time: Total: 10.0% | CPU Time: Self: 4.922s | Module: nimbus | Function (Full): system::markStackAndRegisters(var<system::GcHeap>).constprop.0 | Source File: gc.nim | Start Address: 0x701230 ```	2024-07-14 12:02:05 +02:00
Jacek Sieka	01ab209497	cache account payload (#2478) Instead of caching just the storage id, we can cache the full payload which further reduces expensive hikes	2024-07-12 15:08:26 +02:00
Jacek Sieka	81e75622cf	storage: store root id together with vid, for better locality of refe… (#2449) The state and account MPT:s currently share key space in the database, with vertex id:s being assigned essentially randomly, which means that when two adjacent slot values from the same contract are accessed, they might reside at a large distance from each other. Here, we prefix each vertex id by its root causing them to be sorted together thus bringing all data belonging to a particular contract closer together - the same effect also happens for the main state MPT whose nodes now end up clustered together more tightly. In the future, the prefix given to the storage keys can also be used to perform range operations such as reading all the storage at once and/or deleting an account with a batch operation. Notably, parts of the API already supported this rooting concept while parts didn't - this PR makes the API consistent by always working with a root+vid.	2024-07-04 15:46:52 +02:00
Jacek Sieka	b23795ab39	remove pPrf, fRpp (#2445) No longer used now that hashify is gone	2024-07-03 22:21:57 +02:00
Jacek Sieka	443c6d1f8e	Cache account path storage id (#2443) The storage id is frequently accessed when executing contract code and finding the path via the database requires several hops making the process slow - here, we add a cache to keep the most recently used account storage id:s in memory. A possible future improvement would be to cache all account accesses so that for example updating the balance doesn't cause several hikes.	2024-07-03 17:58:25 +02:00
Jordan Hrycaj	8dd038144b	Some cleanups (#2428) * Remove `dirty` set from structural objects why: Not used anymore, the tree is dirty by default. * Rename `aristo_hashify` -> `aristo_compute` * Remove cruft, update comments, cosmetics, etc. * Simplify `SavedState` object why: The key chaining has become obsolete after extra lazy hashing. There is some available space for a state hash to be maintained in future. details: Accept the legacy `SavedState` object serialisation format for a while (which will be overwritten by new format.)	2024-06-28 18:43:04 +00:00
Jordan Hrycaj	14c3772545	On demand mpt revisited (#2426) * rebased from `github/on-demand-mpt` ackn: wip: on-demand mpt construction Given that actual data is stored in the `Vertex` structure, it's useful to think of the MPT as a cache for computing roots rather than being a functional requirement on its own. This PR engenders this line of thinking by incrementally computing the MPT only when it's needed, ie when a state (or similar) root is needed. This has the effect of significantly reducing memory usage as well as improving performance: * no need for dirty-mpt-node book-keeping * no need to build complex forest of upcoming hashing work * only hashes that are functionally needed are ever computed - intermediate nodes whose MPT root is not observed are never computed / processed * Unit test hot fixes * Unit test hot fixes cont. (somehow lost that part) --------- Co-authored-by: Jacek Sieka <jacek@status.im>	2024-06-28 15:03:12 +00:00
Jordan Hrycaj	e7be0d185c	Aristo uses pre classified tree types cont2 (#2397) * Provide dedicated functions for fetching accounts and storage trees why: Different prototypes for each class `account`, `generic` and `storage`. * Remove `fetchPayload()` and other cruft from API, `aristo_fetch`, etc. * Fix typos, debugging leftovers, comments	2024-06-19 12:40:00 +00:00
Jacek Sieka	eb041abba7	avoid unnecessary memory allocations and lookups (#2334) * use `withValue` instead of `hasKey` + `[]` * avoid `@` et al * parse database data inside `onData` instead of making seq then parsing	2024-06-11 11:38:58 +02:00
Jordan Hrycaj	69a158864c	Remove vid recycling feature (#2294)	2024-06-04 15:05:13 +00:00
Jordan Hrycaj	f926222fec	Aristo cull journal related stuff (#2288) * Remove all journal related stuff * Refactor function names journal() => delta(), filter() => delta() * remove `trg` field from `FilterRef` why: Same as `kMap[$1]` * Re-type FilterRef.src as `HashKey` why: So it is directly comparable to `kMap[$1]` * Moved `vGen[]` field from `LayerFinalRef` to `LayerDeltaRef` why: Then a separate `FilterRef` type is not needed anymore * Rename `roFilter` field in `AristoDbRef` => `balancer` why: New name more appropriate. * Replace `FilterRef` by `LayerDeltaRef` type why: This avoids copying into the `balancer` (see next patch set) most of the time. Typically, only one instance is running on the backend and the `balancer` is only used as a stage before saving data. * Refactor how data is stored persistently why: Avoid useless copy when staging `top` layer for persistently saving to backend. * Fix copyright header?	2024-06-03 20:10:35 +00:00
tersec	34ac68990f	fix warnings around unused imports of std/algorithm; proc -> func (#2220)	2024-05-25 21:01:28 +02:00
Jacek Sieka	0a49833d69	avoid a few more copies (#2215)	2024-05-24 11:27:17 +02:00
Jacek Sieka	f38c5e631e	trivial memory-based speedups (#2205) * trivial memory-based speedups * HashKey becomes non-ref * use openArray instead of seq in lots of places * avoid sequtils.reversed when unnecessary * add basic perf stats to test_coredb * copyright	2024-05-23 17:37:51 +02:00
Jordan Hrycaj	8e18e85288	Aristodb remove obsolete and time consuming admin features (#2048) * Aristo: Reorg `hashify()` using different schedule algorithm why: Directly calculating the search tree top down from the roots turns out to be faster than using the cached structures left over by `merge()` and `delete()`. The time gain is short of 20% * Aristo: Remove `lTab[]` leaf entry object type why: Not used anymore. It was previously needed to build the schedule for `hashify()`. * Aristo: Avoid unnecessary re-org of the vertex ID recycling list why: This list can become quite large so a heuristic is employed whether it makes sense to re-org. Also, re-org check is only done by `delete()` functions. * Aristo: Remove key/reverse lookup table from tx layers why: It is ignored except for handling proof nodes and costs unnecessary run time resources. This feature was originally needed to accommodate the mental transition from the legacy MPT to the `Aristo` trie :). * Fix copyright year	2024-02-22 08:24:58 +00:00
Jordan Hrycaj	1b4a43c140	Aristo db remove over engineered object type (#2027) * CoreDb: update test suite * Aristo: Simplify reverse key map why: The reverse key map `pAmk: (root,key) -> {vid,..}` has been simplified to `pAmk: key -> {vid,..}` as the state `root` domain argument is not used anymore * Aristo: Remove `HashLabel` object type and replace it by `HashKey` why: The `HashLabel` object attaches a root hash to a hash key. This is not used anywhere anymore. * Fix copyright	2024-02-14 19:11:59 +00:00
Jordan Hrycaj	3b306a9689	Aristo: Update unit test suite (#2002) * Aristo: Update unit test suite * Aristo/Kvt: Fix iterators why: Generic iterators were not properly updated after backend change * Aristo: Add sub-trie deletion functionality why: For storage tries linked to an account payload vertex ID, the whole storage trie needs to be deleted with the account. * Aristo: Reserve vertex ID numbers for static custom state roots why: Static custom state roots may be controlled by an application, e.g. for a receipt or a transaction root. The `Aristo` functions are agnostic of what the static state roots are when different from the internal tree vertex ID 1. details: The `merge()` function applied to a non-static state root (assumed to be a storage root) will check the payload of an accounts leaf and mark its Merkle keys to be re-checked. * Aristo: Correct error code symbol * Aristo: Update error code symbols * Aristo: Code cosmetics/comments * Aristo: Fix hashify schedule calculator why: Had a tendency to stop early leaving an incomplete job	2024-02-01 21:27:48 +00:00
Jordan Hrycaj	a1161b537b	Core db update storage root management for sub tries (#1964) * Aristo: Re-phrase `LayerDelta` and `LayerFinal` as object references why: Avoids copying in some cases * Fix copyright header * Aristo: Verify `leafTie.root` function argument for `merge()` proc why: Zero root will lead to inconsistent DB entry * Aristo: Update failure condition for hash labels compiler `hashify()` why: Node need not be rejected as long as links are on the schedule. In that case, `redo[]` is to become `wff.base[]` at a later stage. This amends an earlier fix, part of #1952 by also testing against the target nodes of the `wff.base[]` sets. * Aristo: Add storage root glue record to `hashify()` schedule why: An account leaf node might refer to a non-resolvable storage root ID. Storage root node chains will end up at the storage root. So the link `storage-root->account-leaf` needs an extra item in the schedule. * Aristo: fix error code returned by `fetchPayload()` details: Final error code is implied by the error code from the `hikeUp()` function. * CoreDb: Discard `createOk` argument in API `getRoot()` function why: Not needed for the legacy DB. For the `Aristo` DB, a lazy approach is implemented where a storage root node is created on-the-fly. * CoreDb: Prevent `$$` logging in some cases why: Logging the function `$$` is not useful when it is used for internal use, i.e. retrieving an error text for logging. * CoreDb: Add `tryHashFn()` to API for pretty printing why: Pretty printing must not change the hashification status for the `Aristo` DB. So there is an independent API wrapper for getting the node hash which never updates the hashes. * CoreDb: Discard `update` argument in API `hash()` function why: When calling the API function `hash()`, the latest state is always wanted. For a version that uses the current state as-is without checking, the function `tryHash()` was added to the backend. * CoreDb: Update opaque vertex ID objects for the `Aristo` backend why: For `Aristo`, vID objects encapsulate a numeric `VertexID` referencing a vertex (rather than a node hash as used on the legacy backend.) For storage sub-tries, there might be no initial vertex known when the descriptor is created. So opaque vertex ID objects are supported without a valid `VertexID` which will be initialised on-the-fly when the first item is merged. * CoreDb: Add pretty printer for opaque vertex ID objects * Cosmetics, printing profiling data * CoreDb: Fix segfault in `Aristo` backend when creating MPT descriptor why: Missing initialisation error * CoreDb: Allow MPT to inherit shared context on `Aristo` backend why: Creates descriptors with different storage roots for the same shared `Aristo` DB descriptor. * Cosmetics, update diagnostic message items for `Aristo` backend * Fix Copyright year	2024-01-11 19:11:38 +00:00
Jordan Hrycaj	43e5f428af	Aristo db kvt maintenance update (#1952) * Update KVT layers abstraction details: modelled after Aristo layers * Simplified KVT database iterators (removed item counters) why: Not needed for production functions * Simplify KVT merge function `layersCc()` * Simplified Aristo database iterators (removed item counters) why: Not needed for production functions * Update failure condition for hash labels compiler `hashify()` why: Node need not be rejected as long as links are on the schedule. In that case, `redo[]` is to become `wff.base[]` at a later stage. * Update merging layers and label update functions why: + Merging a stack of layers with `layersCc()` could be simplified + Merging layers will optimise the reverse `kMap[]` table maps `pAmk: label->{vid, ..}` by deleting empty mappings `label->{}` where they are redundant. + Updated `layersPutLabel()` for optimising `pAmk[]` tables	2023-12-20 16:19:00 +00:00
Jordan Hrycaj	ffa8ad2246	Core db use differential tx layers for aristo and kvt (#1949) * Fix kvt headers * Provide differential layers for KVT transaction stack why: Significant performance improvement * Provide abstraction layer for database top cache layer why: This will eventually be implemented as differential database layers or transaction layers. The latter is needed to improve performance. behavioural changes: Zero vertex and keys (i.e. delete requests) are not optimised out until the last layer is written to the database. * Provide differential layers for Aristo transaction stack why: Significant performance improvement	2023-12-19 12:39:23 +00:00
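
The key-prefixing idea from commit 81e75622cf (storing the root id together with the vid) can be illustrated with a small Python sketch. The fixed-width 8-byte big-endian layout and the names `vertex_key` and `keys_for_root` are assumptions made for this illustration only; the encoding actually used by the database is not described in the commit message.

```python
import struct

def vertex_key(root: int, vid: int) -> bytes:
    # Hypothetical layout: 8-byte big-endian root id followed by 8-byte big-endian vid.
    # Big-endian packing makes byte-wise ordering match numeric ordering.
    return struct.pack(">QQ", root, vid)

def keys_for_root(sorted_keys, root):
    # Select every key that carries the given root prefix; in a sorted
    # key-value store these keys form one contiguous range.
    prefix = struct.pack(">Q", root)
    return [k for k in sorted_keys if k.startswith(prefix)]

# Vertices of the main state trie (root 1) and two storage tries (roots 7 and 9),
# listed in an interleaved order such as random vid assignment might produce.
entries = [(7, 1042), (1, 5), (9, 88), (7, 3), (1, 977), (9, 2), (7, 500)]

sorted_keys = sorted(vertex_key(r, v) for r, v in entries)
roots_in_order = [struct.unpack(">QQ", k)[0] for k in sorted_keys]
storage_of_7 = keys_for_root(sorted_keys, 7)
```

After sorting, all keys with the same root sit next to each other, which is the locality effect the commit message describes, and selecting by prefix hints at the range operations it mentions as future work.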

22 Commits
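
The hash placement described in commit df4a21c910 can be sketched roughly as follows. This is a simplified Python model, not the project's code: `Layer`, `lookup` and `compute_key` are hypothetical names, SHA-256 stands in for the real node hash, higher layer numbers are assumed to shadow lower ones, and the layers are assumed not to change once hashing starts, so cache invalidation is left out.

```python
import hashlib

BACKEND, BALANCER = -2, -1  # numbering quoted in the commit message; 0 and up are in memory

class Layer:
    def __init__(self, vertices=None):
        self.vtx = dict(vertices or {})  # vid -> bytes (leaf) or tuple of child vids (branch)
        self.key = {}                    # vid -> cached hash

def lookup(layers, table, vid):
    # Search from the highest-numbered layer downwards; return (value, level) or None.
    for level in sorted(layers, reverse=True):
        entries = getattr(layers[level], table)
        if vid in entries:
            return entries[vid], level
    return None

def compute_key(layers, vid):
    # Return (hash, level), caching the hash at the highest level it depends on.
    cached = lookup(layers, "key", vid)
    if cached is not None:
        return cached
    vtx, level = lookup(layers, "vtx", vid)
    if isinstance(vtx, bytes):
        payload = vtx
    else:
        payload = b""
        for child in vtx:
            child_key, child_level = compute_key(layers, child)
            payload += child_key
            level = max(level, child_level)
    digest = hashlib.sha256(payload).digest()  # stand-in hash, an assumption of this sketch
    layers[level].key[vid] = digest
    return digest, level

layers = {
    BACKEND: Layer({1: (2, 3), 2: b"alice", 3: (4, 5), 4: b"bob", 5: b"carol"}),
    BALANCER: Layer(),
    0: Layer({5: b"carol-v2"}),  # one in-memory change shadowing vertex 5
}
root_key, root_level = compute_key(layers, 1)
```

In this model only the hashes on the path from the changed vertex to the root (vertices 5, 3 and 1) are held in the in-memory layer, while hashes of untouched subtrees (vertices 2 and 4) are stored with the backend, so in-memory hash storage grows with the change set rather than with the whole trie.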