When lazily verifying state roots, we may end up with an entire state
whose roots have not yet been computed, meaning the computation then
has to run over the whole database - in the current design, that would
result in hashes for the entire trie being held in memory.
Since the hash depends only on the data in the vertex, we can store it
directly at the top-most layer holding the vertices it was derived from
- be that memory or database - which makes the memory usage broadly
linear with respect to the already-existing in-memory change set stored
in the layers.
It also ensures that if we have multiple forks in memory, hashes get
cached in the correct layer maximising reuse between forks.
The same layer numbering scheme as elsewhere is reused, where -2 is the
backend, -1 is the balancer, and 0 and up are the stack layers up to the
top of the stack.
A downside of this approach is that we create many small batches - a
future improvement could be to collect all such writes in a single
batch, though the memory profile of this approach should be examined
first (where is the batch kept, exactly?).
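For illustration, a sketch of how the cached hashes sit next to the
per-layer change set (simplified types; field names only loosely modelled
on the actual layer objects):
```
import std/[options, tables]

type
  VertexID = uint64
  HashKey = array[32, byte]

  Layer = object
    sTab: Table[VertexID, seq[byte]]  # vertex change set of this layer (simplified)
    kMap: Table[VertexID, HashKey]    # hashes cached for vertices in this layer

proc cachedKey(layers: openArray[Layer], vid: VertexID): Option[HashKey] =
  ## Search from the top of the stack downwards (assuming the most recent
  ## layer is last); the first cached hash wins, and the balancer/backend
  ## is only consulted when no layer holds one.
  for i in countdown(layers.high, 0):
    if vid in layers[i].kMap:
      return some(layers[i].kMap[vid])
  none(HashKey)
```
Because each layer carries its own hash cache, dropping or replacing a
fork layer also drops exactly the hashes that were computed from it.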
* Imported/rebase from `no-ext`, PR #2485
Store extension nodes together with the branch
Extension nodes must be followed by a branch - as such, it makes sense
to store the two together both in the database and in memory:
* fewer reads, writes and updates to traverse the tree
* simpler logic for maintaining the node structure
* less space used, both memory and storage, because there are fewer
nodes overall
There is also a downside: hashes can no longer be cached for an
extension - instead, only the extension+branch hash can be cached - this
seems like a fine tradeoff since computing it should be fast.
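For illustration, the combined layout this implies could look like the
following (field names are hypothetical, not the actual declarations):
```
type
  BranchVertex = object
    ePfx: seq[byte]           # extension path prefix; empty if there is none
    bVid: array[16, uint64]   # the 16 child vertex ids of the branch
```
A single read or write then covers what used to be two nodes, and the
cached hash belongs to the pair as a whole.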
TODO: fix commented code
* Fix merge functions and `toNode()`
* Update `merkleSignCommit()` prototype
why:
Result is always a 32 byte hash
* Update short Merkle hash key generation
details:
Ethereum reference MPTs use Keccak hashes as node links if the size of
an RLP encoded node is at least 32 bytes. Otherwise, the RLP encoded
node value is used as a pseudo node link (rather than a hash.) This is
specified in the yellow paper, appendix D.
Unlike the `Aristo` implementation, the reference MPT does not store
such a node in the key-value database. Instead, the RLP encoded node
value is stored in place of a node link in the parent node.
Only the top level node is always referred to by a hash, namely the
root hash.
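For illustration, the rule boils down to something like this (a sketch
assuming `nimcrypto` for Keccak; not the actual Aristo code):
```
import nimcrypto

proc nodeLink(rlpNode: openArray[byte]): seq[byte] =
  ## Yellow paper, appendix D: short nodes are embedded verbatim in the
  ## parent, longer ones are referenced by their Keccak hash.
  if rlpNode.len < 32:
    @rlpNode                             # pseudo node link: the RLP itself
  else:
    @(keccak256.digest(rlpNode).data)    # proper node link: 32 byte hash
```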
* Fix/update `Extension` sections
why:
Were commented out after removal of the dedicated `Extension` type,
which left the system dysfunctional.
* Clean up unused error codes
* Update unit tests
* Update docu
---------
Co-authored-by: Jacek Sieka <jacek@status.im>
The Vertex type unifies branches, extensions and leaves into a single
memory area where the largest member is the branch (128 bytes + overhead) -
the payloads we have are all smaller than 128 bytes, thus wrapping them
in an extra layer of `ref` is wasteful from a memory usage perspective.
Further, the ref:s must be visited during the M&S phase of garbage
collection - since we keep millions of these, many of them
short-lived, this takes up significant CPU time.
```
Function CPU Time: Total CPU Time: Self Module Function (Full) Source File Start Address
system::markStackAndRegisters 10.0% 4.922s nimbus system::markStackAndRegisters(var<system::GcHeap>).constprop.0 gc.nim 0x701230
```
Hike allocations (and the garbage collection maintenance that follows)
are responsible for some 10% of CPU time (not wall time!) at this point
- this PR avoids them by stepping through the layers one step at a time,
simplifying the code along the way.
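For illustration, the shape of the unified type (a self-contained
sketch, not the actual Aristo declarations):
```
type
  VertexType = enum
    Leaf, Branch
  VertexObj = object
    case vType: VertexType
    of Leaf:
      lData: array[100, byte]   # payload stored inline; always under 128 bytes
      lLen: uint8
    of Branch:
      bVid: array[16, uint64]   # largest member: 128 bytes of child vertex ids
```
With the payload inline, there is no per-payload `ref` left for the
garbage collector to trace.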
The state and account MPT:s currently share key space in the database,
with vertex id:s assigned essentially randomly, which means that when
two adjacent slot values from the same contract are accessed, they might
reside at a large distance from each other.
Here, we prefix each vertex id with its root, causing them to be sorted
together and thus bringing all data belonging to a particular contract
closer together - the same effect also applies to the main state MPT,
whose nodes now end up clustered more tightly.
In the future, the prefix given to the storage keys can also be used to
perform range operations such as reading all the storage at once and/or
deleting an account with a batch operation.
Notably, parts of the API already supported this rooting concept while
parts didn't - this PR makes the API consistent by always working with a
root+vid.
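For illustration, a sketch of the prefixed key (the exact encoding here
is an assumption, not the actual Aristo key format):
```
proc dbKey(root, vid: uint64): array[16, byte] =
  ## Root id first, vertex id second, both big-endian, so all vertices
  ## belonging to one root sort next to each other on disk.
  for i in 0 .. 7:
    result[i]     = byte((root shr (8 * (7 - i))) and 0xff'u64)
    result[i + 8] = byte((vid  shr (8 * (7 - i))) and 0xff'u64)
```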
The storage id is frequently accessed when executing contract code, and
finding the path via the database requires several hops, making the
process slow - here, we add a cache to keep the most recently used
account storage id:s in memory.
A possible future improvement would be to cache all account accesses so
that for example updating the balance doesn't cause several hikes.
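For illustration, a much simplified sketch of such a cache (names are
hypothetical and the eviction here is crude; the real cache keeps the
most recently used entries):
```
import std/tables

const cacheLimit = 1000                    # illustrative bound

type StoIdCache = object
  ids: Table[array[32, byte], uint64]      # hashed account path -> storage root id

proc lookup(c: StoIdCache, accPath: array[32, byte]): uint64 =
  c.ids.getOrDefault(accPath, 0'u64)       # 0 = not cached, fall back to the hike

proc remember(c: var StoIdCache, accPath: array[32, byte], stoId: uint64) =
  if c.ids.len >= cacheLimit:
    c.ids.clear()                          # crude eviction, for the sketch only
  c.ids[accPath] = stoId
```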
* avoid costly hike memory allocations for operations that don't need to
re-traverse it
* avoid unnecessary state checks (which might trigger unwanted state
root computations)
* disable optimize-for-hits due to the MPT no longer being complete at
all times
* Normalised storage tree addressing in function prototypes
detail:
Argument list is always `<db> <account-path> <slot-path> ..` with
both path arguments as `openArray[]`
* Remove cruft
* CoreDb: internally use full account paths rather than addresses
* Update API logging
* Use hashed account address only in prototypes
why:
This avoids unnecessary repeated hashing of the same account address.
The burden of doing that is upon the application. In the case here,
the ledger caches all kinds of stuff anyway so it is common sense to
exploit that for account address hashes.
caveat:
Using `openArray[byte]` argument types for hashed accounts is inherently
fragile. In non-release mode, a length verification `doAssert` is
enabled by default.
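For illustration, the kind of guard meant here (a hypothetical helper,
not the actual code):
```
proc checkAccPath(accPath: openArray[byte]) =
  ## Catch callers passing an un-hashed 20 byte address by mistake; the
  ## check is compiled out in release builds.
  when not defined(release):
    doAssert accPath.len == 32, "expected a 32 byte hashed account address"
  else:
    discard
```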
* No accPath in data record (use `AristoAccount` as `CoreDbAccount`)
* Remove now unused `eAddr` field from ledger `AccountRef` type
why:
Is duplicate of lookup key
* Avoid merging the account record/statement in the ledger twice.
* Tighten `CoreDb` API for accounts
why:
Apart from cruft, the way to fetch the accounts state root via a
`CoreDbColRef` record was unnecessarily complicated.
* Extend `CoreDb` API for accounts to cover storage tries
why:
In future, this will make the notion of column objects obsolete. Storage
trees will then be indexed by the account address rather than the vertex
ID equivalent like a `CoreDbColRef`.
* Apply new/extended accounts API to ledger and tests
details:
This makes the `distinct_ledger` module obsolete
* Remove column object constructors
why:
They were needed as an abstraction of MPT sub-trees including storage
trees. Now, storage trees are handled by the account (e.g. via address)
they belong to and all other trees can be identified by a constant well
known vertex ID. So there is no need for column objects anymore.
There are still some left-over column object methods which will be
removed next.
* Remove `serialise()` and `PayloadRef` from default Aristo API
why:
Not needed. `PayloadRef` was used for unstructured/unknown payload
formats (account or blob) and `serialise()` was used for decoding
`PayloadRef`. Now it is known in advance what the payload looks
like.
* Added query function `hasStorageData()` whether a storage area exists
why:
Useful for supporting `slotStateEmpty()` of the `CoreDb` API
* In the `Ledger` replace `storage.stateEmpty()` by `slotStateEmpty()`
* On Aristo, hide the storage root/vertex ID in the `PayloadRef`
why:
The storage vertex ID is fully controlled by Aristo while the
`AristoAccount` object is controlled by the application. With the
storage root part of the `AristoAccount` object, there was a useless
administrative burden to keep that storage root field up to date.
* Remove cruft, update comments etc.
* Update changed MPT access paradigms
why:
Fixes verified proxy tests
* Fluffy cosmetics
This buffer eliminates a large part of allocations during MPT traversal,
reducing overall memory usage and GC pressure.
Ideally, we would use it throughout the API instead of
`openArray[byte]`, since the built-in length limit appropriately exposes
the natural 64-nibble depth constraint that `openArray` fails to
capture.
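For illustration, a sketch of such a fixed-size buffer (field names are
assumptions, not the actual type):
```
type
  NibblesBuf = object
    bytes: array[32, byte]   # room for the full 64 nibble / 32 byte MPT path
    iend: uint8              # number of nibbles currently in use

func nibble(n: NibblesBuf, i: int): byte =
  ## Return the i:th nibble of the path.
  if (i and 1) == 0: n.bytes[i shr 1] shr 4
  else: n.bytes[i shr 1] and 0x0f'u8
```
Because the whole path fits on the stack, traversal does not need to
allocate for it.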
* Remove unused `merge*()` functions (for production)
details:
Some functionality moved to test suite
* Make sure that only the `AccountData` leaf type is used on VertexID(1)
* clean up payload type
* Provide dedicated functions for merging accounts and storage trees
why:
Storage trees are always linked to an account, so there is no need
for an application to fiddle about (e.g. creating, re-cycling) with
storage tree vertex IDs.
* CoreDb: Disable tracer functionality
why:
Must be updated to accommodate new/changed `Aristo` functions.
* CoreDb: Use new `mergeXXX()` functions
why:
Makes explicit vertex ID management obsolete for creating new
storage trees.
* Remove `mergePayload()` and other cruft from API, `aristo_merge`, etc.
* clean up merge functions
details:
The merge implementation `mergePayloadImpl()` does not need to be super
generic anymore as all the edge cases are covered by the specialised
functions `mergeAccountPayload()`, `mergeGenericData()`, and
`mergeStorageData()`.
* No tracer available at the moment, so disable offending tests
* Code cosmetics
* Re-org `aristo_merge`, internally split into sub-modules
why:
Became a burden for maintenance because it hosts two different
functionalities under the same merge paradigm: account/data merge
and snap proof merge, where the latter produces a partial trie.
* Fix CoreDb tracer
* Ledger: fix potential account vs. storage tree sync problems
* Remove bound on the size of removable whole storage trees
* Activate `test_tracer_json`