nimbus-eth1

Commit Graph

Author	SHA1	Message	Date
Jacek Sieka	66ad5497d9	Unroll nibble ops (#2894 ) A bit unexpectedly, nibble handling shows up in the profiler mainly because the current impl is tuned towards slicing while the most common operation is prefix comparison - since the code is simple, might as well get rid of some of the excess fat by always aligning the nibbles to the byte buffer.	2024-12-09 08:15:04 +01:00
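The prefix comparison mentioned in #2894 can be illustrated with a minimal Python sketch (not the repository's Nim code). It assumes both paths are stored byte-aligned, two nibbles per byte, with an even number of nibbles; `shared_prefix_len` is a hypothetical helper name.

```python
def shared_prefix_len(a: bytes, b: bytes) -> int:
    """Count the leading nibbles shared by two byte-aligned nibble paths.

    Assumption: each byte holds two nibbles (high nibble first) and both
    paths contain a whole number of bytes.
    """
    n = 0
    for x, y in zip(a, b):
        if x == y:
            # Whole byte matches: two more shared nibbles.
            n += 2
            continue
        if (x >> 4) == (y >> 4):
            # Only the high nibble matches.
            n += 1
        break
    return n
```

Because the nibbles are aligned to the byte buffer, most of the comparison happens a whole byte at a time and only the first differing byte needs to be split into nibbles.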
Jacek Sieka	f034af422a	Pre-allocate vids for branches (#2882 ) Each branch node may have up to 16 sub-items - currently, these are assigned a VertexID when they are first needed, leading to a mostly-random order of vertexid for each subitem. Here, we pre-allocate all 16 vertex ids such that when a branch subitem is filled, it already has a vertexid waiting for it. This brings several important benefits: * subitems are sorted and "close" in their id sequencing - this means that when rocksdb stores them, they are likely to end up in the same data block thus improving read efficiency * because the ids are consecutive, we can store just the starting id and a bitmap representing which subitems are in use - this reduces disk space usage for branches allowing more of them to fit into a single disk read, further improving disk read and caching performance - disk usage at block 18M is down from 84 to 78gb! * the in-memory footprint of VertexRef reduced allowing more instances to fit into caches and less memory to be used overall. Because of the increased locality of reference, it turns out that we no longer need to iterate over the entire database to efficiently generate the hash key database because the normal computation is now faster - this significantly benefits "live" chain processing as well where each dirtied key must be accompanied by a read of all branch subitems next to it - most of the performance benefit in this branch comes from this locality-of-reference improvement. On a sample resync, there's already ~20% improvement with later blocks seeing increasing benefit (because the trie is deeper in later blocks leading to more benefit from branch read perf improvements) ``` blocks: 18729664, baseline: 190h43m49s, contender: 153h59m0s Time (total): -36h44m48s, -19.27% ``` Note: clients need to be resynced as the PR changes the on-disk format R.I.P. little bloom filter - your life in the repo was short but valuable	2024-12-04 11:42:04 +01:00
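The "starting id plus bitmap" representation described in #2882 can be sketched as follows. This is an illustrative Python model under stated assumptions, not the actual on-disk format: `encode_branch` and `decode_branch` are hypothetical names, and child slot `i` is assumed to always hold the id `start + i`.

```python
from typing import List, Optional, Tuple


def encode_branch(children: List[Optional[int]]) -> Tuple[int, int]:
    """Pack 16 child slots into (start_vid, bitmap).

    Assumption: a used slot i always holds vertex id start_vid + i,
    which is what pre-allocating 16 consecutive ids guarantees.
    """
    assert len(children) == 16
    used = [(i, vid) for i, vid in enumerate(children) if vid is not None]
    if not used:
        return 0, 0
    first_slot, first_vid = used[0]
    start = first_vid - first_slot
    bitmap = 0
    for i, vid in used:
        if vid != start + i:
            raise ValueError("child ids are not consecutive")
        bitmap |= 1 << i  # bit i set means slot i is in use
    return start, bitmap


def decode_branch(start: int, bitmap: int) -> List[Optional[int]]:
    """Rebuild the 16 child slots from (start_vid, bitmap)."""
    return [start + i if bitmap & (1 << i) else None for i in range(16)]
```

In this model a branch needs one id and 16 bits instead of up to 16 separate ids, which is the source of the space saving the commit message describes.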
Jacek Sieka	01ca415721	Store keys together with node data (#2849 ) Currently, computed hash keys are stored in a separate column family with respect to the MPT data they're generated from - this has several disadvantages: * A lot of space is wasted because the lookup key (`RootedVertexID`) is repeated in both tables - this is 30% of the `AriKey` content! * rocksdb must maintain in-memory bloom filters and LRU caches for said keys, doubling its "minimal efficient cache size" * An extra disk traversal must be made to check for existence of cached hash key * Doubles the amount of files on disk due to each column family being its own set of files Here, the two CFs are joined such that both key and data are stored in `AriVtx`. This means: * we save ~30% disk space on repeated lookup keys * we save ~2gb of memory overhead that can be used to cache data instead of indices * we can skip storing hash keys for MPT leaf nodes - these are trivial to compute and waste a lot of space - previously they had to be present in the `AriKey` CF to avoid having to look in two tables on the happy path. 
* There is a small increase in write amplification because when a hash value is updated for a branch node, we must write both key and branch data - previously we would write only the key * There's a small shift in CPU usage - instead of performing lookups in the database, hashes for leaf nodes are (re)-computed on the fly * We can return to slightly smaller on-disk SST files since there's fewer of them, which should reduce disk traffic a bit Internally, there are also other advantages: * when clearing keys, we no longer have to store a zero hash in memory - instead, we deduce staleness of the cached key from the presence of an updated VertexRef - this saves ~1gb of mem overhead during import * hash key cache becomes dedicated to branch keys since leaf keys are no longer stored in memory, reducing churn * key computation is a lot faster thanks to the skipped second disk traversal - a key computation for mainnet can be completed in 11 hours instead of ~2 days (!) thanks to better cache usage and less read amplification - with additional improvements to the on-disk format, we can probably get rid of the initial full traversal method of seeding the key cache on first start after import All in all, this PR reduces the size of a mainnet database from 160gb to 110gb and the peak memory footprint during import by ~1-2gb.	2024-11-20 09:56:27 +01:00
Jacek Sieka	58cde36656	Remove `RawData` from possible leaf payload types (#2794 ) This kind of data is not used except in tests where it is used only to create databases that don't match actual usage of aristo. Removing simplifies future optimizations that can focus on processing specific leaf types more efficiently. A casualty of this removal is some test code as well as some proof generation code that is unused - on the surface, it looks like it should be possible to port both of these to the more specific data types - doing so would ensure that a database written by one part of the codebase can interact with the other - as it stands, there is confusion on this point since using the proof generation code will result in a database of a shape that is incompatible with the rest of eth1.	2024-11-02 10:29:16 +01:00
Jordan Hrycaj	5b6ccddaa0	Db folder sources and related remove compiler warnings (#2673 ) * Aristo: Rename `Hash256` -> `Hash32` * CoreDb: Rename `Hash256` -> `Hash32` * Ledger: Rename `Hash256` -> `Hash32` * StorageTypes: Rename `Hash256` -> `Hash32` * Aristo: Rename `Blob` -> `seq[byte]`, `keccakHash` -> `keccak256` * Kvt: Rename `Blob` -> `seq[byte]` * CoreDb: Rename `Blob` -> `seq[byte]`, `keccakHash` -> `keccak256` * Ledger: Rename `Blob` -> `seq[byte]`, `keccakHash` -> `keccak256` * CoreDb: Rename `BlockHeader` -> `Header`, `BlockNonce` -> `Bytes8` * Misc: Rename `StorageKey` -> `Bytes32` * Tracer: `Hash256` -> `Hash32`, `BlockHeader` -> `Header`, etc. * Fix copyright header	2024-10-01 21:03:10 +00:00
Jacek Sieka	c210885b73	eth: bump to new types (#2660 ) This is a minimal set of changes to make things work with the new types in nim-eth - this is the minimal PR that merely resolves incompatibilities while the full change set would include more cleanup and migration.	2024-09-29 14:37:09 +02:00
Jacek Sieka	2fe8cc4551	leaf cache fixes (#2637 ) * Add missing leaf cache update when a leaf turns to a branch with two leaves (on merge) and vice versa (on delete) - this could lead to stale leaves being returned from the cache causing validation failures - it didn't happen because the leaf caches were not being used efficiently :) * Replace `seq` with `ArrayBuf` in `Hike` allowing it to become allocation-free - this PR also works around an inefficiency in nim in returning large types via a `var` parameter * Use the leaf cache instead of `getVtxRc` to fetch recent leaves - this makes the vertex cache more efficient at caching branches because fewer leaf requests pass through it.	2024-09-19 10:39:06 +02:00
Jacek Sieka	5c1e2e7d3b	Migrate `keyed_queue` to `minilru` (#2608 ) Compared to `keyed_queue`, `minilru` uses significantly less memory, in particular for the 32-byte hash keys where `kq` stores several copies of the key redundantly.	2024-09-13 15:47:50 +02:00
Jordan Hrycaj	ce713d95fc	Aristo lazily delete larger subtrees (#2560 ) * Extract sub-tree deletion functions into separate sub-modules * Move/rename `aristo_desc.accLruSize` => `aristo_constants.ACC_LRU_SIZE` * Lazily delete sub-trees why: This gives some control of the memory used to keep the deleted vertices in the cached layers. For larger sub-trees, keys and vertices might be on the persistent backend to a large extent. This would pull an amount of extra information from the backend into the cached layer. For lazy deleting it is enough to remember sub-trees by a small set of (at most 16) sub-roots to be processed when storing persistent data. Marking the tree root deleted immediately allows to let most of the code base work as before. * Comments and cosmetics * No need to import all for `Aristo` here * Kludge to make `chronicles` usage in sub-modules work with `fluffy` why: That `fluffy` would not run with any logging in `core_deb` is a problem I have known for a while. Up to now, logging was only used for debugging. With the current `Aristo` PR, there are cases where logging might be wanted but this works only if `chronicles` runs without the `json[dynamic]` sinks. So this should be re-visited. * More of a kludge	2024-08-14 08:54:44 +00:00
Jordan Hrycaj	38572bd8ea	Cache a storage root ID forever in the leaf payload of an account (#2551 ) details: Stale root IDs are marked disabled while the ID is kept in the leaf payload. why: This might lead to further caching advantages.	2024-08-07 13:28:01 +00:00
Jacek Sieka	9d91191154	storage hike cache (#2484 ) This PR adds a storage hike cache similar to the account hike cache already present - this cache is less efficient because account storage is already partially cached in the account ledger but nonetheless helps keep hiking down. Notably, there's an opportunity to optimise this cache and the others so that they cooperate better instead of overlapping, which is left for a future PR. This PR also fixes an O(N) memory usage for storage slots where the delete would keep the full storage in a work list which on mainnet can grow very large - the work list is replaced with a more conventional recursive `O(log N)` approach.	2024-07-14 19:12:10 +02:00
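The recursive deletion mentioned in #2484 can be pictured with a small Python sketch. It is a toy model, not the Aristo implementation: the trie is assumed to be a plain dict mapping a vertex id to its list of child ids, and `delete_subtree` is a hypothetical name.

```python
from typing import Dict, List


def delete_subtree(vertices: Dict[int, List[int]], vid: int) -> int:
    """Delete vid and everything below it; return the number of vertices removed.

    Only the current root-to-leaf path is held on the call stack, so the
    extra memory is proportional to the trie depth rather than to the
    number of vertices, unlike a work list that holds the whole sub-tree.
    """
    children = vertices.pop(vid, None)
    if children is None:
        return 0
    removed = 1
    for child in children:
        removed += delete_subtree(vertices, child)
    return removed
```

For a reasonably balanced trie the depth grows with `log N`, which matches the `O(log N)` figure quoted in the commit message.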
Jacek Sieka	f3a56002ca	Turn payload into value type (#2483 ) The Vertex type unifies branches, extensions and leaves into a single memory area where the largest member is the branch (128 bytes + overhead) - the payloads we have are all smaller than 128 thus wrapping them in an extra layer of `ref` is wasteful from a memory usage perspective. Further, the ref:s must be visited during the M&S phase of garbage collection - since we keep millions of these, many of them short-lived, this takes up significant CPU time. ``` Function CPU Time: Total CPU Time: Self Module Function (Full) Source File Start Address system::markStackAndRegisters 10.0% 4.922s nimbus system::markStackAndRegisters(var<system::GcHeap>).constprop.0 gc.nim 0x701230` ```	2024-07-14 12:02:05 +02:00
Jordan Hrycaj	b924fdcaa7	Separate config for core db and ledger (#2479 ) * Updates and corrections * Extract `CoreDb` configuration from `base.nim` into separate module why: This makes it easier to avoid circular imports, in particular when the capture journal (aka tracer) is revived. * Extract `Ledger` configuration from `base.nim` into separate module why: This makes it easier to avoid circular imports (if any.) also: Move `accounts_ledger.nim` file to sub-folder `backend`. That way the layout resembles that of the `core_db`.	2024-07-12 13:12:25 +00:00
Jacek Sieka	01ab209497	cache account payload (#2478 ) Instead of caching just the storage id, we can cache the full payload which further reduces expensive hikes	2024-07-12 15:08:26 +02:00
Jacek Sieka	a6764670f0	merge: avoid hike allocations (#2472 ) hike allocations (and the garbage collection maintenance that follows) are responsible for some 10% of cpu time (not wall time!) at this point - this PR avoids them by stepping through the layers one step at a time, simplifying the code at the same time.	2024-07-11 13:26:46 +02:00
Jacek Sieka	7d78fd97d5	avoid allocations for slot storage (#2455 ) Introduce a new `StoData` payload type similar to `AccountData` * slightly more efficient storage format * typed api * fewer seqs * fix encoding docs - it wasn't rlp after all :)	2024-07-04 23:48:45 +00:00
Jacek Sieka	81e75622cf	storage: store root id together with vid, for better locality of refe… (#2449 ) The state and account MPT:s currently share key space in the database based on that vertex id:s are assigned essentially randomly, which means that when two adjacent slot values from the same contract are accessed, they might reside at large distance from each other. Here, we prefix each vertex id by its root causing them to be sorted together thus bringing all data belonging to a particular contract closer together - the same effect also happens for the main state MPT whose nodes now end up clustered together more tightly. In the future, the prefix given to the storage keys can also be used to perform range operations such as reading all the storage at once and/or deleting an account with a batch operation. Notably, parts of the API already supported this rooting concept while parts didn't - this PR makes the API consistent by always working with a root+vid.	2024-07-04 15:46:52 +02:00
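The effect of prefixing each vertex id with its root (#2449) can be demonstrated with a short Python sketch. The encoding here is an assumption made for illustration (two fixed-width big-endian 64-bit integers); the real database key layout may differ, and `rooted_key` and `scan_root` are hypothetical names.

```python
import bisect
import struct
from typing import List


def rooted_key(root: int, vid: int) -> bytes:
    """Build a sort key with the root first, so keys of one trie sort together.

    Assumption: fixed-width big-endian fields, so that byte-wise ordering
    equals numeric ordering on (root, vid).
    """
    return struct.pack(">QQ", root, vid)


def scan_root(sorted_keys: List[bytes], root: int) -> List[bytes]:
    """Return all keys belonging to one root from a sorted key list."""
    prefix = struct.pack(">Q", root)
    start = bisect.bisect_left(sorted_keys, prefix)
    found = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # keys of this root are contiguous, so we can stop here
        found.append(key)
    return found
```

With such keys, a sorted store keeps all vertices of one storage trie adjacent, and the range operations the commit message anticipates reduce to a prefix scan like `scan_root`.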
Jacek Sieka	b23795ab39	remove pPrf, fRpp (#2445 ) No longer used now that hashify is gone	2024-07-03 22:21:57 +02:00
Jacek Sieka	443c6d1f8e	Cache account path storage id (#2443 ) The storage id is frequently accessed when executing contract code and finding the path via the database requires several hops making the process slow - here, we add a cache to keep the most recently used account storage id:s in memory. A possible future improvement would be to cache all account accesses so that for example updating the balance doesn't cause several hikes.	2024-07-03 17:58:25 +02:00
Jacek Sieka	1f60e8e453	Use `Hash256` directly for account path (#2439 ) Account paths are always a hash - passing it around as such helps avoid confusion as to how long it is	2024-07-03 10:14:26 +02:00
Jordan Hrycaj	8dd038144b	Some cleanups (#2428 ) * Remove `dirty` set from structural objects why: Not used anymore, the tree is dirty by default. * Rename `aristo_hashify` -> `aristo_compute` * Remove cruft, update comments, cosmetics, etc. * Simplify `SavedState` object why: The key chaining have become obsolete after extra lazy hashing. There is some available space for a state hash to be maintained in future. details: Accept the legacy `SavedState` object serialisation format for a while (which will be overwritten by new format.)	2024-06-28 18:43:04 +00:00
Jordan Hrycaj	6dc2773957	Only use pre hashed addresses as account keys (#2424 ) * Normalised storage tree addressing in function prototypes detail: Argument list is always `<db> <account-path> <slot-path> ..` with both path arguments as `openArray[]` * Remove cruft * CoreDb internally Use full account paths rather than addresses * Update API logging * Use hashed account address only in prototypes why: This avoids unnecessary repeated hashing of the same account address. The burden of doing that is upon the application. In the case here, the ledger caches all kinds of stuff anyway so it is common sense to exploit that for account address hashes. caveat: Using `openArray[byte]` argument types for hashed accounts is inherently fragile. In non-release mode, a length verification `doAssert` is enabled by default. * No accPath in data record (use `AristoAccount` as `CoreDbAccount`) * Remove now unused `eAddr` field from ledger `AccountRef` type why: Is duplicate of lookup key * Avoid merging the account record/statement in the ledger twice.	2024-06-27 19:21:01 +00:00
Jordan Hrycaj	61bbf40014	Update storage tree admin (#2419 ) * Tighten `CoreDb` API for accounts why: Apart from cruft, the way to fetch the accounts state root via a `CoreDbColRef` record was unnecessarily complicated. * Extend `CoreDb` API for accounts to cover storage tries why: In future, this will make the notion of column objects obsolete. Storage trees will then be indexed by the account address rather than the vertex ID equivalent like a `CoreDbColRef`. * Apply new/extended accounts API to ledger and tests details: This makes the `distinct_ledger` module obsolete * Remove column object constructors why: They were needed as an abstraction of MPT sub-trees including storage trees. Now, storage trees are handled by the account (e.g. via address) they belong to and all other trees can be identified by a constant well known vertex ID. So there is no need for column objects anymore. Still there are some left-over column object methods which will be removed next. * Remove `serialise()` and `PayloadRef` from default Aristo API why: Not needed. `PayloadRef` was used for unstructured/unknown payload formats (account or blob) and `serialise()` was used for decoding `PayloadRef`. Now it is known in advance what the payload looks like. * Added query function `hasStorageData()` whether a storage area exists why: Useful for supporting `slotStateEmpty()` of the `CoreDb` API * In the `Ledger` replace `storage.stateEmpty()` by `slotStateEmpty()` * On Aristo, hide the storage root/vertex ID in the `PayloadRef` why: The storage vertex ID is fully controlled by Aristo while the `AristoAccount` object is controlled by the application. With the storage root part of the `AristoAccount` object, there was a useless administrative burden to keep that storage root field up to date. * Remove cruft, update comments etc. * Update changed MPT access paradigms why: Fixes verified proxy tests * Fluffy cosmetics	2024-06-27 09:01:26 +00:00
Jordan Hrycaj	e7be0d185c	Aristo uses pre classified tree types cont2 (#2397 ) * Provide dedicated functions for fetching accounts and storage trees why: Different prototypes for each class `account`, `generic` and `storage`. * Remove `fetchPayload()` and other cruft from API, `aristo_fetch`, etc. * Fix typos, debugging left overs, comments	2024-06-19 12:40:00 +00:00
Jordan Hrycaj	8727307ef4	Aristo uses pre classified tree types cont1 (#2389 ) * Provide dedicated functions for deleting accounts and storage trees why: Storage trees are always linked to an account, so there is no need for an application to fiddle about (e.g. re-cycling, unlinking) storage tree vertex IDs. * Remove `delete()` and other cruft from API, `aristo_delete`, etc. * clean up delete functions details: The delete implementations `deleteImpl()` and `delTreeImpl()` do not need to be super generic anymore as all the edge cases are covered by the specialised `deleteAccountPayload()`, `deleteGenericData()`, etc. * Avoid unnecessary re-calculations of account keys why: The function `registerAccountForUpdate()` did extract the storage ID (if any) and automatically marked the Merkle keys along the account path for re-hashing. This would also apply if there was later detected that the account or the storage tree did not need to be updated. So the `registerAccountForUpdate()` function was split into a part which retrieved the storage ID, and another one which marked the Merkle keys for re-calculation to be applied only when needed.	2024-06-18 19:30:01 +00:00
Jordan Hrycaj	51f02090b8	Aristo uses pre classified tree types (#2385 ) * Remove unused `merge()` functions (for production) details: Some functionality moved to test suite Make sure that only `AccountData` leaf type is exactly used on VertexID(1) * clean up payload type * Provide dedicated functions for merging accounts and storage trees why: Storage trees are always linked to an account, so there is no need for an application to fiddle about (e.g. creating, re-cycling) with storage tree vertex IDs. * CoreDb: Disable tracer functionality why: Must be updated to accommodate new/changed `Aristo` functions. * CoreDb: Use new `mergeXXX()` functions why: Makes explicit vertex ID management obsolete for creating new storage trees. * Remove `mergePayload()` and other cruft from API, `aristo_merge`, etc. * clean up merge functions details: The merge implementation `mergePayloadImpl()` does not need to be super generic anymore as all the edge cases are covered by the specialised functions `mergeAccountPayload()`, `mergeGenericData()`, and `mergeStorageData()`. * No tracer available at the moment, so disable offending tests	2024-06-18 11:14:02 +00:00
Jordan Hrycaj	392088e5e9	Coredb fix storage tree issues (#2317 ) * Code cosmetics * Re-org `aristo_merge`, internally split into sub-modules why: Became a burden for maintenance because it hosts two different functionalities under the same merge paradigm: account/data merge and snap proof merge where the latter produces a partial trie. * Fix CoreDb tracer * Ledger: fix potential account vs. storage tree sync problems * Remove bound on the size of removable whole storage trees * Activate `test_tracer_json`	2024-06-07 10:56:31 +00:00
Jordan Hrycaj	0f430c70fd	Aristo avoid storage trie update race conditions (#2251 ) * Update TDD suite logger output format choices why: New format is not practical for TDD as it just dumps data across a wide range (considerably larger than 80 columns.) So the new format can be turned on by function argument. * Update unit tests samples configuration why: Slightly changed the way to find the `era1` directory * Remove compiler warnings (fix deprecated expressions and phrases) * Update `Aristo` debugging tools * Always update the `storageID` field of account leaf vertices why: Storage tries are weakly linked to an account leaf object in that the `storageID` field is updated by the application. Previously, `Aristo` verified that leaf objects make sense when passed to the database. As a consequence * the database was inconsistent for a short while * the burden for correctness was all on the application which led to delayed error handling which is hard to debug. So `Aristo` will internally update the account leaf objects so that there are no race conditions due to the storage trie handling * Aristo: Let `stow()`/`persist()` bail out unless there is a `VertexID(1)` why: The journal and filter logic depends on the hash of the `VertexID(1)` which is commonly known as the state root. This implies that all changes to the database are somehow related to that. * Make sure that a `Ledger` account does not overwrite the storage trie reference why: Due to the abstraction of a sub-trie (now referred to as column with a hash describing its state) there was a weakness in the `Aristo` handler where an account leaf could be overwritten though changing the validity of the database. This has been changed and the database will now reject such changes. This patch fixes the behaviour on the application layer. In particular, the column handle returned by the `CoreDb` needs to be updated by the `Aristo` database state. 
This mitigates the problem that a storage trie might have vanished or re-appeared with a different vertex ID. * Fix sub-trie deletion test why: Was originally hinged on `VertexID(1)` which cannot be wholesale deleted anymore after the last Aristo update. Also, running with `VertexID(2)` needs an artificial `VertexID(1)` for making `stow()` or `persist()` work. * Cosmetics * Activate `test_generalstate_json` * Temporarily deactivate `test_tracer_json` * Fix copyright header --------- Co-authored-by: jordan <jordan@dry.pudding> Co-authored-by: Jacek Sieka <jacek@status.im>	2024-05-30 17:48:38 +00:00
Jordan Hrycaj	143f2e99f5	Core db+aristo fixes and tx handling updates (#2164 ) * Aristo: Rename journal related sources and functions why: Previously, the naming was hinged on the phrases `fifo`, `filter` etc. which reflect the inner workings of cascaded filters. This was unfortunate for reading/understanding the source code for actions where the focus is the journal as a whole. * Aristo: Fix buffer overflow (path length truncating error) * Aristo: Tighten `hikeUp()` stop check, update error code why: Detect dangling vertex links. These are legit with `snap` sync processing but not with regular processing. * Aristo: Raise assert in regular mode `merge()` at a dangling link/edge why: With `snap` sync processing, partial trees are ok and can be amended. Not so in regular mode. Previously there was only a debug message when a non-legit dangling edge was encountered. * Aristo: Make sure that vertices are copied before modification why: Otherwise vertices from lower layers might also be modified * Aristo: Fix relaxed mode for validity checker `check()` * Remove cruft * Aristo: Update API for transaction handling details: + Split `aristo_tx.nim` into sub-modules + Split `forkWith()` into `findTx()` + `forkTx()` + Removed `forkTop()`, `forkBase()` (now superseded by new `forkTx()`) * CoreDb+Aristo: Fix initialiser (missing methods)	2024-05-03 17:38:17 +00:00
Jordan Hrycaj	1502014e36	Core db+aristo re org tracer (#2123 ) * Kvt: Update API hooks * Aristo: Generalised merging snap proofs, now for multiple state roots why: This accommodates pre-loading partial tries for unit tests * Aristo: Update some unit tests * CoreDb+Aristo: Re-factor tracer why: Was bonkers anyway. The main change is that the trace journal is now kept in a way similar to a transaction layer so that it can predictably interact with DB transactions. * Ledger: Debugging helper * Update tracer unit test applicable for `Aristo` * Fix copyright year * Disable `dump()` function as compile time default why: This needs to pull in the `rocks_db` library at compile time.	2024-04-03 15:48:35 +00:00
Jordan Hrycaj	8ed40c78e0	Core db+aristo provides tracer functionality (#2089 ) * Aristo: Provide descriptor fork based on search in transaction stack details: Try to find the tx that has a particular pair `(vertex-id,hash-key)`, and by extension try filter and backend if the former fails. * Cleanup & docu * CoreDb+Aristo: Implement context re-position to earlier in-memory state why: It is an easy way to explore how there can be concurrent access to the same backend storage DB with different view states. This one can access an earlier state from the transaction stack. * CoreDb+Aristo: Populate tracer stubs with real functionality * Update `tracer.nim` to new API why: Legacy API does not sufficiently support `Aristo` * Fix logging problems in tracer details: Debug logging turned off by default * Fix function prototypes * Add Copyright header * Add tables import why: For older compiler versions on CI	2024-03-21 10:45:57 +00:00
andri lim	c41206be39	Fix styles and reduce compiler warnings (#2086 ) * Fix styles and reduce compiler warnings * Fix copyright year	2024-03-20 14:35:38 +07:00
Jordan Hrycaj	587ca3abbe	Coredb use stackable api for aristo backend (#2060 ) * Aristo/Kvt: Provide function hooks APIs why: These APIs can be used for installing tracers, profiling functionality, and other niceties on the databases. * Aristo: Provide optional API profiling details: It basically is a re-implementation of the `CoreDb` profiling implementation * Kvt: Provide optional API profiling similar to `Aristo` * CoreDb: Re-implementing profiling using `aristo_profile` * Ledger: Re-implementing profiling using `aristo_profile` * CoreDb: Update unit tests for maintainability * update copyright dates	2024-02-29 21:10:24 +00:00
Jordan Hrycaj	8e18e85288	Aristodb remove obsolete and time consuming admin features (#2048 ) * Aristo: Reorg `hashify()` using different schedule algorithm why: Directly calculating the search tree top down from the roots turns out to be faster than using the cached structures left over by `merge()` and `delete()`. Time gains is short of 20% * Aristo: Remove `lTab[]` leaf entry object type why: Not used anymore. It was previously needed to build the schedule for `hashify()`. * Aristo: Avoid unnecessary re-org of the vertex ID recycling list why: This list can become quite large so a heuristic is employed whether it makes sense to re-org. Also, re-org check is only done by `delete()` functions. * Aristo: Remove key/reverse lookup table from tx layers why: It is ignored except for handling proof nodes and costs unnecessary run time resources. This feature was originally needed to accommodate the mental transition from the legacy MPT to the `Aristo` trie :). * Fix copyright year	2024-02-22 08:24:58 +00:00
Jordan Hrycaj	1b4a43c140	Aristo db remove over engineered object type (#2027 ) * CoreDb: update test suite * Aristo: Simplify reverse key map why: The reverse key map `pAmk: (root,key) -> {vid,..}` has been simplified to `pAmk: key -> {vid,..}` as the state `root` domain argument is not used, anymore * Aristo: Remove `HashLabel` object type and replace it by `HashKey` why: The `HashLabel` object attaches a root hash to a hash key. This is nowhere used, anymore. * Fix copyright	2024-02-14 19:11:59 +00:00
Jordan Hrycaj	2c35390bdf	Core db and aristo maintenance update (#2014 ) * Aristo: Update error return code why: The `Aristo` function `delete()` might fail because there is no such data item on the db. This must return a single error code as is done with `fetch()`. * Ledger: Better error handling why: The `expect()` clauses have been replaced by raising asserts indicating the error from the database backend. Also, `delete()` failures are legitimate if the item to delete does not exist. * Aristo: Delete function must always leave a label on DB for `hashify()` why: The `hashify()` uses the labels left by `merge()` and `delete()` to compile (and optimise) a scheduler for subsequent hashing. Originally, the labels were not used for deleted entries and `delete()` still had some edge case where the deletion label was not properly handled. * Aristo: Update `hashify()` scheduler, remove buggy optimisation why: Was left over from version without virtual state roots which did not know about account payload leaf vertices referring to storage roots. * Aristo: Label storage trie account in `delete()` similar to `merge()` details: The `delete()` function applied to a non-static state root (assumed to be a storage root) will check the payload of an accounts leaf and mark its Merkle keys to be re-checked when running `hashify()` * Aristo: Clean up and re-org recycled vertex IDs in `hashify()` why: Re-organising the recycled vertex IDs list intends to reduce the size of the list. This list is organised as a LIFO (or stack.) By reorganising it in a way so that the least vertex ID numbers are on top, the list will be kept smaller as observed on some examples (less than 30%.) * CoreDb: Accept storage trie deletion requests in non-initialised state why: Due to lazy initialisation, the root vertex ID might not yet exist. So the `Aristo` database handlers would reject this call with an error and this condition needs to be handled by the API (which realises the lazy feature.) 
* Cosmetics & code massage, prettify logging * fix missing import	2024-02-08 16:32:16 +00:00
Jordan Hrycaj	3b306a9689	Aristo: Update unit test suite (#2002 ) * Aristo: Update unit test suite * Aristo/Kvt: Fix iterators why: Generic iterators were not properly updated after backend change * Aristo: Add sub-trie deletion functionality why: For storage tries linked to an account payload vertex ID, the whole storage trie needs to be deleted with the account. * Aristo: Reserve vertex ID numbers for static custom state roots why: Static custom state roots may be controlled by an application, e.g. for a receipt or a transaction root. The `Aristo` functions are agnostic of what the static state roots are when different from the internal tree vertex ID 1. details: The `merge()` function applied to a non-static state root (assumed to be a storage root) will check the payload of an accounts leaf and mark its Merkle keys to be re-checked. * Aristo: Correct error code symbol * Aristo: Update error code symbols * Aristo: Code cosmetics/comments * Aristo: Fix hashify schedule calculator why: Had a tendency to stop early leaving an incomplete job	2024-02-01 21:27:48 +00:00
Jordan Hrycaj	a1161b537b	Core db update storage root management for sub tries (#1964 ) * Aristo: Re-phrase `LayerDelta` and `LayerFinal` as object references why: Avoids copying in some cases * Fix copyright header * Aristo: Verify `leafTie.root` function argument for `merge()` proc why: Zero root will lead to inconsistent DB entry * Aristo: Update failure condition for hash labels compiler `hashify()` why: Node need not be rejected as long as links are on the schedule. In that case, `redo[]` is to become `wff.base[]` at a later stage. This amends an earlier fix, part of #1952 by also testing against the target nodes of the `wff.base[]` sets. * Aristo: Add storage root glue record to `hashify()` schedule why: An account leaf node might refer to a non-resolvable storage root ID. Storage root node chains will end up at the storage root. So the link `storage-root->account-leaf` needs an extra item in the schedule. * Aristo: fix error code returned by `fetchPayload()` details: Final error code is implied by the error code from the `hikeUp()` function. * CoreDb: Discard `createOk` argument in API `getRoot()` function why: Not needed for the legacy DB. For the `Aristo` DB, a lazy approach is implemented where a storage root node is created on-the-fly. * CoreDb: Prevent `$$` logging in some cases why: Logging the function `$$` is not useful when it is used for internal use, i.e. retrieving an error text for logging. * CoreDb: Add `tryHashFn()` to API for pretty printing why: Pretty printing must not change the hashification status for the `Aristo` DB. So there is an independent API wrapper for getting the node hash which never updates the hashes. * CoreDb: Discard `update` argument in API `hash()` function why: When calling the API function `hash()`, the latest state is always wanted. For a version that uses the current state as-is without checking, the function `tryHash()` was added to the backend. 
* CoreDb: Update opaque vertex ID objects for the `Aristo` backend why: For `Aristo`, vID objects encapsulate a numeric `VertexID` referencing a vertex (rather than a node hash as used on the legacy backend.) For storage sub-tries, there might be no initial vertex known when the descriptor is created. So opaque vertex ID objects are supported without a valid `VertexID` which will be initialised on-the-fly when the first item is merged. * CoreDb: Add pretty printer for opaque vertex ID objects * Cosmetics, printing profiling data * CoreDb: Fix segfault in `Aristo` backend when creating MPT descriptor why: Missing initialisation error * CoreDb: Allow MPT to inherit shared context on `Aristo` backend why: Creates descriptors with different storage roots for the same shared `Aristo` DB descriptor. * Cosmetics, update diagnostic message items for `Aristo` backend * Fix Copyright year	2024-01-11 19:11:38 +00:00
Jordan Hrycaj	ffa8ad2246	Core db use differential tx layers for aristo and kvt (#1949 ) * Fix kvt headers * Provide differential layers for KVT transaction stack why: Significant performance improvement * Provide abstraction layer for database top cache layer why: This will eventually be implemented as differential database layers or transaction layers. The latter is needed to improve performance. behavioural changes: Zero vertex and keys (i.e. delete requests) are not optimised out until the last layer is written to the database. * Provide differential layers for Aristo transaction stack why: Significant performance improvement	2023-12-19 12:39:23 +00:00
Jordan Hrycaj	13f51939f6	Core db aristo hasher profiling and timing improvement (#1938 ) * Explicitly use shared `Kvt` table on `Ledger` and `Clique` lookup. why: Speeds up lookup time with `Aristo` backend. For writing `Clique` data, the `Companion` model allows to write `Clique` data past the database locked by evm transactions. * Implement `CoreDb` profiling with API tracking why: Chasing time spent per API procs ... * Implement `Ledger` profiling with API tracking why: Chasing time spent per API procs ... * Always hashify when committing or storing why: A dirty cache makes no sense when committing * Make sure that a zero key is created when adding/updating vertices why: This is an error fix mainly for edge cases. A typical error was that the root key got deleted when there were only a few vertices left on the DB. * Need all created and changed vertices zero-keyed on the cache why: A zero key (i.e. empty Merkle hash) indicates that a vertex key needs to be updated. This would not be needed immediately after a merge as there is an actual leaf path on the cache layer. But after subsequent merge and delete operations this information might get blurred. * Re-org hashing algorithm why: Apart from errors, the previous implementation was too slow for two reasons: + some control hashes were calculated for debugging (now all verification is done in `aristo_check` module) + the leaf paths stored on the cache are used to build the labelling (aka hashing) schedule; these paths were accumulated over successive hash sessions although it is clear that all keys were generated, already	2023-12-12 17:47:41 +00:00
Jordan Hrycaj	657379f484	Aristo db update merkle hasher (#1925 ) * Register paths for added leafs because of trie re-balancing why: While the payload would not change, the prefix in the leaf vertex would. So it needs to be flagged for hash recompilation for the `hashify()` module. also: Make sure that `Hike` paths which might have vertex links into the backend filter are replaced by vertex copies before manipulating. Otherwise the vertices on the immutable filter might be involuntarily changed. * Also check for paths where the leaf vertex is on the backend, already why: A path can have some vertices on the top layer cache with the `Leaf` vertex on the backend. * Re-define a void `HashLabel` type. why: A `HashLabel` type is a pair `(root-vertex-ID, Keccak-hash)`. Previously, a valid `HashLabel` consisted of a non-empty hash and a non-zero vertex ID. This definition leads to a non-unique representation of a void `HashLabel` with either root-ID or hash void. This has been changed to the unique void `HashLabel` exactly if the hash entry is void. * Update consistency checkers * Re-org `hashify()` procedure why: Syncing against block chain showed serious deficiencies which produced wrong hashes or simply bailed out with error. So all fringe cases (mainly due to deleted entries) could be integrated into the labelling schedule rather than handling separate fringe cases.	2023-12-04 20:39:26 +00:00
Jordan Hrycaj	5462c05dc6	Core db update api tracking (#1907 ) * Fix copyright year * Show elapsed times with enabled `CoreDb` API tracking * Show elapsed times with enabled `LedgerRef` API tracking * Reorg `CoreDb` auto destructors for `Aristo` DB why: While `Aristo` supports some parallelism for concurrent database access, this comes with a price of management overhead. With a naive approach, the auto-destructor will slow down execution because the ledger and evm treat the database in a shared mode where a DB descriptor is just created and thrown away shortly after. This is reflected in the `CoreDb` abstraction layer above `Aristo`/`Kvt` where a few `Shared` type descriptors are cached and a shared reference is returned rather than a disposable new object. * For `CoreDb` support transaction level tracking details: This is mainly an extra for the legacy DB as `Aristo` and `Kvt` support this already. Also return an error on the legacy DB backend when `persistent()` is called while there are transactions pending (the `persistent()` call does nothing otherwise on the legacy backend.) * Clear compiler warnings (remove unused variables etc.)	2023-11-24 22:16:21 +00:00
Jordan Hrycaj	c47f021596	Core db and aristo updates for destructor and tx logic (#1894 ) * Disable `TransactionID` related functions from `state_db.nim` why: Functions `getCommittedStorage()` and `updateOriginalRoot()` from the `state_db` module are nowhere used. The emulation of a legacy `TransactionID` type functionality is administratively expensive to provide by `Aristo` (the legacy DB version is only partially implemented, anyway). As there is no other place where `TransactionID`s are used, they will not be provided by the `Aristo` variant of the `CoreDb`. For the legacy DB API, nothing will change. * Fix copyright headers in source code * Get rid of compiler warning * Update Aristo code, remove unused `merge()` variant, export `hashify()` why: Adapt to upcoming `CoreDb` wrapper * Remove synced tx feature from `Aristo` why: + This feature allowed to synchronise transaction methods like begin, commit, and rollback for a group of descriptors. + The feature is over engineered and not needed for `CoreDb`, neither is it complete (some convergence features missing.) * Add debugging helpers to `Kvt` also: Update database iterator, add count variable yield argument similar to `Aristo`. * Provide optional destructors for `CoreDb` API why: For the upcoming Aristo wrapper, this allows to control when certain smart destruction and update can take place. The auto destructor works fine in general when the storage/cache strategy is known and acceptable when creating descriptors. * Add update option for `CoreDb` API function `hash()` why: The hash function is typically used to get the state root of the MPT. Due to lazy hashing, this might be not available on the `Aristo` DB. So the `update` function asks for re-hashing the current state changes if needed. * Update API tracking log mode: `info` => `debug` * Use shared `Kvt` descriptor in new Ledger API why: No need to create a new descriptor all the time	2023-11-16 19:35:03 +00:00
Jordan Hrycaj	6e0397e276	Aristo and ledger small updates (#1888 ) * Fix debug noise in `hashify()` for perfectly normal situation why: Was previously considered a fixable error * Fix test sample file names why: The larger test file `goerli68161.txt.gz` is already in the local archive. So there is no need to use the smaller one from the external repo. * Activate `accounts_cache` module from `db/ledger` why: A copy of the original `accounts_cache.nim` source to be integrated into the `Ledger` module wrapper which allows to switch between different `accounts_cache` implementations under the same API. details: At a later stage, the `db/accounts_cache.nim` wrapper will be removed so that there is only one access to that module via `db/ledger/accounts_cache.nim`. * Fix copyright headers in source code	2023-11-08 16:52:25 +00:00
Jordan Hrycaj	4feaa2cfab	Aristo db update for short nodes key edge cases (#1887 ) * Aristo: Provide key-value list signature calculator detail: Simple wrappers around `Aristo` core functionality * Update new API for `CoreDb` details: + Renamed new API functions `contains()` => `hasKey()` or `hasPath()` which disables the `in` operator on non-boolean `contains()` functions + The functions `get()` and `fetch()` always return a not-found error if there is no item available. The new functions `getOrEmpty()` and `mergeOrEmpty()` return an empty `Blob` if there is no such key found. * Rewrite `core_apps.nim` using new API from `CoreDb` * Use `Aristo` functionality for calculating Merkle signatures details: For debugging, the `VerifyAristoForMerkleRootCalc` can be set so that `Aristo` results will be verified against the legacy versions. * Provide general interface for Merkle signing key-value tables details: Export `Aristo` wrappers * Activate `CoreDb` tests why: Now, API seems to be stable enough for general tests. * Update `toHex()` usage why: Byteutils' `toHex()` is superior to `toSeq.mapIt(it.toHex(2)).join` * Split `aristo_transcode` => `aristo_serialise` + `aristo_blobify` why: + Different modules for different purposes + `aristo_serialise`: RLP encoding/decoding + `aristo_blobify`: Aristo database encoding/decoding * Compacted representation of small nodes' links instead of Keccak hashes why: Ethereum MPTs use Keccak hashes as node links if the size of an RLP encoded node is at least 32 bytes. Otherwise, the RLP encoded node value is used as a pseudo node link (rather than a hash.) Such a node is not stored on the key-value database. Rather, the RLP encoded node value is stored in a parent node instead of a node link. Only for the root hash, the top level node is always referred to by the hash. This feature needed an abstraction of the `HashKey` object which is now either a hash or a blob of length at most 31 bytes. 
This leaves two ways of representing an empty/void `HashKey` type, either as an empty blob of zero length, or the hash of an empty blob. * Update `CoreDb` interface (mainly reducing logger noise) * Fix copyright years (to make `Lint` happy)	2023-11-08 12:18:32 +00:00
Jordan Hrycaj	3fe0a49a5e	Aristo db allow shorter than 64 nibbles path keys (#1864 ) * Aristo: Single `FetchPathNotFound` error in `fetchXxx()` and `hasPath()` why: Missing path hike returns too many detailed reasons why it failed which becomes cumbersome to handle. also: Renamed `contains()` => `hasPath()` which disables the `in` operator on non-boolean `contains()` functions * Kvt: Renamed `contains()` => `hasKey()` why: which disables the `in` operator on non-boolean `contains()` functions * Aristo: Generalising `HashID` by variable length `PathID` why: There are cases when the `Aristo` database is to be used with shorter than 64 nibbles keys when handling transaction indexes with sequence IDs. caveat: This patch only works reliably for full length `PathID` values. Tests for shorter `PathID` values are currently missing.	2023-10-27 22:36:51 +01:00
Jordan Hrycaj	395580ff9d	Aristo and core db updates (#1800 ) * Aristo: remove obsolete functions * Aristo: Fix error code for non-available hash keys why: Must not return `not-found` when the key is not available (i.e. the current changes were not hashified, yet.) * CoreDB: Provide TDD and test framework	2023-10-03 12:56:13 +01:00
Jordan Hrycaj	6bc55d4e6f	Core db aristo and kvt updates preparing for integration (#1760 ) * Kvt: Implemented multi-descriptor access on the same backend why: This behaviour mirrors the one of Aristo and can be used for simultaneous transactions on Aristo + Kvt * Kvt: Update database iterators why: Forgot to run on the top layer first * Kvt: Misc fixes * Aristo, use `openArray[byte]` rather than `Blob` in prototype * Aristo, by default hashify right after cloning descriptor why: Typically, a completed descriptor is expected after cloning. Hashing can be suppressed by argument flag. * Aristo provides `replicate()` iterator, similar to legacy `replicate()` * Aristo API fixes and updates * CoreDB: Rename `legacy_persistent` => `legacy_rocksdb` why: More systematic, will be in line with Aristo DB which might have more than one persistent backend * CoreDB: Prettify API sources why: Better to read and maintain details: Annotating with custom pragmas which cleans up the prototypes * CoreDB: Update MPT/put() prototype allowing `CatchableError` why: Will be needed for Aristo API (legacy is OK with `RlpError`)	2023-09-18 21:20:28 +01:00
Jordan Hrycaj	cd1d370543	Aristo db api extensions for use as core db backend (#1754 ) * Update docu * Update Aristo/Kvt constructor prototype why: Previous version used an `enum` value to indicate what backend is to be used. This was replaced by using the backend object type. * Rewrite `hikeUp()` return code into `Result[Hike,(Hike,AristoError)]` why: Better code maintenance. Previously, the `Hike` object was returned. It had an internal error field so partial success was also available on a failure. This error field has been removed. * Use `openArray[byte]` rather than `Blob` in function prototypes * Provide synchronised multi instance transactions why: The `CoreDB` object was geared towards the legacy DB which used a single transaction for the key-value backend DB. Different state roots are provided by the backend database, so all instances work directly on the same backend. Aristo db instances have different in-memory mappings (aka different state roots) and the transactions are on top of these mappings. So each instance might run different transactions. Multi instance transactions are a compromise to converge towards the legacy behaviour. The synchronised transactions span over all instances available at the time when base transaction was opened. Instances created later are unaffected. * Provide key-value pair database iterator why: Needed in `CoreDB` for `replicate()` emulation also: Some update of internal code * Extend API (i.e. prototype variants) why: Needed for `CoreDB` geared towards the legacy backend which has a more basic API than Aristo.	2023-09-15 16:23:53 +01:00
Jordan Hrycaj	8e00143313	Aristo db code massage n cosmetics (#1745 ) * Rewrite remaining `AristoError` return code into `Result[void,AristoError]` why: Better code maintenance * Update import sections * Update Aristo DB paths why: More systematic so directory can be shared with other DB types * More cosmetics * Update unit tests runners why: Proper handling of persistent and mem-only DB. The latter can be consistently triggered by an empty DB path.	2023-09-12 19:45:12 +01:00


65 Commits