nimbus-eth1

Commit Graph

Author	SHA1	Message	Date
Jordan Hrycaj	6bae929439	Added comments (#2546 )	2024-08-06 12:43:39 +00:00
Jordan Hrycaj	5b502a06c4	Added portal proof nodes generation functionality (#2539 ) * Extracted `test_tx.testTxMergeProofAndKvpList()` => separate file * Fix serialiser why: Typo lead to duplicate rlp-encoded nodes in chain * Remove cruft * Implemnt portal proof nodes generators `partXxxTwig()` * Add unit test for portal proof nodes generator `partAccountTwig()` * Cosmetics * Simplify serialiser return code format * Fix proof generator for extension nodes why: Code was simply bonkers, not detected before the unit tests were adapted to check for just this. * Implemented portal proof nodes verifier `partUntwig()` * Cosmetics * Fix `testutp` cli poblem	2024-08-06 11:29:26 +00:00
Jordan Hrycaj	01b5c08763	Revive json tracer unit tests (#2538 ) * Some `Aristo` clean-ups/updates * Re-implemented core-db tracer functionality * Rename nimbus tracer `no-tracer.nim` => `tracer.nim` why: Restore original name for easy diff tracking with upcoming update * Update nimbus tracer using new core-db tracer functionality * Updating json tracer unit tests * Enable json tracer unit tests	2024-08-01 10:41:20 +00:00
Jordan Hrycaj	72c3ab8ced	Provide partial tree support for preloading tests (#2536 ) * Implement partial trees why: This is currently needed for unit tests to pre-load the database with test data similar to `proof` node pre-load. The basic features for `snap-sync` boundary proofs are available as well for future use. What is missing is the final proof verification and a complete storage data load/merge function (stub is available.) * Cosmetics, clean up	2024-07-29 20:15:17 +00:00
Jacek Sieka	bdc86b3fd4	small cleanups (#2526 ) * remove some redundant EH * avoid pessimising move (introduces a copy in this case!) * shift less data around when reading era files (reduces stack usage)	2024-07-26 12:32:01 +07:00
andri lim	254bda365f	Remove txpool sender locality (#2525 ) * Remove txpool sender locality We no longer distinct local or remote sender * Fix copyright year	2024-07-25 22:36:08 +07:00
Jordan Hrycaj	1452e7b1c0	Misc updates (#2513 ) * Update config for Ledger and CoreDb why: Prepare for tracer which depends on the API jump table (as well as the profiler.) The API jump table is now enabled in unit/integration test mode piggybacking on the `unittest2DisableParamFiltering` compiler flag or on an extra compiler flag `dbjapi_enabled`. * No deed for error field in `NodeRef` why: Was opnly needed by proof nodes pre-loader which will be re-implemented * Cosmetics	2024-07-22 18:10:04 +00:00
Jordan Hrycaj	5ac362fe6f	Aristo and kvt balancer management update (#2504 ) * Aristo: Merge `delta_siblings` module into `deltaPersistent()` * Aristo: Add `isEmpty()` for canonical checking whether a layer is empty * Aristo: Merge `LayerDeltaRef` into `LayerObj` why: No need to maintain nested object refs anymore. Previously the `LayerDeltaRef` object had a companion `LayerFinalRef` which held non-delta layer information. * Kvt: Merge `LayerDeltaRef` into `LayerRef` why: No need to maintain nested object refs (as with `Aristo`) * Kvt: Re-write balancer logic similar to `Aristo` why: Although `Kvt` was a cheap copy of `Aristo` it sort of got out of sync and the balancer code was wrong. * Update iterator over forked peers why: Yield additional field `isLast` indicating that the last iteration cycle was approached. * Optimise balancer calculation. why: One can often avoid providing a new object containing the merge of two layers for the balancer. This avoids copying tables. In some cases this is replaced by `hasKey()` look ups though. One uses one of the two to combine and merges the other into the first. Of course, this needs some checks for making sure that none of the components to merge is eventually shared with something else. * Fix copyright year	2024-07-18 21:32:32 +00:00
Jacek Sieka	df4a21c910	Store cached hash at the layer corresponding to the source data (#2492 ) When lazily verifying state roots, we may end up with an entire state without roots that gets computed for the whole database - in the current design, that would result in hashes for the entire trie being held in memory. Since the hash depends only on the data in the vertex, we can store it directly at the top-most level derived from the verticies it depends on - be that memory or database - this makes the memory usage broadly linear with respect to the already-existing in-memory change set stored in the layers. It also ensures that if we have multiple forks in memory, hashes get cached in the correct layer maximising reuse between forks. The same layer numbering scheme as elsewhere is reused, where -2 is the backend, -1 is the balancer, then 0+ is the top of the stack and stack. A downside of this approach is that we create many small batches - a future improvement could be to collect all such writes in a single batch, though the memory profile of this approach should be examined first (where is the batch kept, exactly?).	2024-07-18 09:13:56 +02:00
Jordan Hrycaj	6677f57ea9	Aristo balancer clean up (#2501 ) * Remove `chunkedMpt` from `persistent()`/`stow()` function why: Proof-mode code was removed with PR #2445 and needs to be re-designed. * Remove unused `beStateRoot` argument from `deltaMerge()` * Update/drastically simplify `txStow()` why: Got rid of many boundary conditions details: Many pre-conditions have changed. In particular, previous versions used the account state (hash) which was conveniently available and checked it against the backend in order to find out whether there was something to do, at all. Currently, only an empty set of all tables in the delta layer has the balancer update ignored. Notable changes are: * no check against account state (see above) * balancer filters have no hash signature (some legacy stuff left over from journals) * no (shap sync) proof data which made the generation of the a top layer more complex * Cosmetics, cruft removal * Update unit test file & function name why: Was legacy module	2024-07-17 19:27:33 +00:00
Jordan Hrycaj	17391b58d0	Hash keys and hash256 revisited (#2497 ) * Remove cruft left-over from PR #2494 * TODO * Update comments on `HashKey` type values * Remove obsolete hash key conversion flag `forceRoot` why: Is treated implicitly by having vertex keys as `HashKey` type and root vertex states converted to `Hash256`	2024-07-17 20:48:21 +07:00
andri lim	a59cc84fca	Not using deprecated functions in config anymore (#2495 )	2024-07-17 02:57:19 +00:00
Jordan Hrycaj	a84a2131cd	No ext update (#2494 ) * Imported/rebase from `no-ext`, PR #2485 Store extension nodes together with the branch Extension nodes must be followed by a branch - as such, it makes sense to store the two together both in the database and in memory: * fewer reads, writes and updates to traverse the tree * simpler logic for maintaining the node structure * less space used, both memory and storage, because there are fewer nodes overall There is also a downside: hashes can no longer be cached for an extension - instead, only the extension+branch hash can be cached - this seems like a fine tradeoff since computing it should be fast. TODO: fix commented code * Fix merge functions and `toNode()` * Update `merkleSignCommit()` prototype why: Result is always a 32bit hash * Update short Merkle hash key generation details: Ethereum reference MPTs use Keccak hashes as node links if the size of an RLP encoded node is at least 32 bytes. Otherwise, the RLP encoded node value is used as a pseudo node link (rather than a hash.) This is specified in the yellow paper, appendix D. Different to the `Aristo` implementation, the reference MPT would not store such a node on the key-value database. Rather the RLP encoded node value is stored instead of a node link in a parent node is stored as a node link on the parent database. Only for the root hash, the top level node is always referred to by the hash. * Fix/update `Extension` sections why: Were commented out after removal of a dedicated `Extension` type which left the system disfunctional. * Clean up unused error codes * Update unit tests * Update docu --------- Co-authored-by: Jacek Sieka <jacek@status.im>	2024-07-16 19:47:59 +00:00
Jacek Sieka	0e36a17e5b	avoid re-writing code (#2490 ) Avoids pointless rocksdb writes that cause write compaction / amplification, specially in the case where code is shared between multiple accounts	2024-07-15 15:02:23 +02:00
Jacek Sieka	9d91191154	storage hike cache (#2484 ) This PR adds a storage hike cache similar to the account hike cache already present - this cache is less efficient because account storage is already partically cached in the account ledger but nonetheless helps keep hiking down. Notably, there's an opportunity to optimise this cache and the others so that they cooperate better insteado of overlapping, which is left for a future PR. This PR also fixes an O(N) memory usage for storage slots where the delete would keep the full storage in a work list which on mainnet can grow very large - the work list is replaced with a more conventional recursive `O(log N)` approach.	2024-07-14 19:12:10 +02:00
Jacek Sieka	f3a56002ca	Turn payload into value type (#2483 ) The Vertex type unifies branches, extensions and leaves into a single memory area where the larges member is the branch (128 bytes + overhead) - the payloads we have are all smaller than 128 thus wrapping them in an extra layer of `ref` is wasteful from a memory usage perspective. Further, the ref:s must be visited during the M&S phase of garbage collection - since we keep millions of these, many of them short-lived, this takes up significant CPU time. ``` Function CPU Time: Total CPU Time: Self Module Function (Full) Source File Start Address system::markStackAndRegisters 10.0% 4.922s nimbus system::markStackAndRegisters(var<system::GcHeap>).constprop.0 gc.nim 0x701230` ```	2024-07-14 12:02:05 +02:00
Jacek Sieka	72947b3647	odds and ends (#2481 ) small cleanups to reduce memory allocations	2024-07-13 20:42:49 +02:00
Jordan Hrycaj	f08178c592	Separate constructor helpers for core db and ledger (#2480 ) * Extract `CoreDb` constructor helpers from `base.nim` into separate module why: This makes it easier to avoid circular imports. * Extract `Ledger` constructor helpers from `base.nim` into separate module why: Move `accounts_ledger.nim` file to sub-folder `backend`. That way the layout resembles that of the `core_db`.	2024-07-12 19:32:31 +00:00
Jordan Hrycaj	b924fdcaa7	Separate config for core db and ledger (#2479 ) * Updates and corrections * Extract `CoreDb` configuration from `base.nim` into separate module why: This makes it easier to avoid circular imports, in particular when the capture journal (aka tracer) is revived. * Extract `Ledger` configuration from `base.nim` into separate module why: This makes it easier to avoid circular imports (if any.) also: Move `accounts_ledger.nim` file to sub-folder `backend`. That way the layout resembles that of the `core_db`.	2024-07-12 13:12:25 +00:00
Jacek Sieka	01ab209497	cache account payload (#2478 ) Instead of caching just the storage id, we can cache the full payload which further reduces expensive hikes	2024-07-12 15:08:26 +02:00
Jacek Sieka	d07540766f	coredb: tracking fixes (#2476 )	2024-07-12 13:40:13 +02:00
Jacek Sieka	a6764670f0	merge: avoid hike allocations (#2472 ) hike allocations (and the garbage collection maintenance that follows) are responsible for some 10% of cpu time (not wall time!) at this point - this PR avoids them by stepping through the layers one step at a time, simplifying the code at the same time.	2024-07-11 13:26:46 +02:00
Jordan Hrycaj	800fd77333	Core db remove legacy phrases (#2468 ) * Rename `newKvt()` -> `ctx.getKvt()` why: Clean up legacy shortcut. Also, the `KVT` returned is not instantiated but refers to the shared `KVT` that resides in a context which is a generalisation of an in-memory database fork. The function `ctx` retrieves the default context. * Rename `newTransaction()` -> `ctx.newTransaction()` why: Clean up legacy shortcut. The transaction is applied to a context as a generalisation of an in-memory database fork. The function `ctx` retrieves the default context. * Rename `getColumn(CtGeneric)` -> `getGeneric()` why: No more a list of well known sub-tries needed, a single one is enough. In fact, `getColumn()` did only support a single sub-tree by now. * Reduce TODO list	2024-07-10 12:19:35 +00:00
Jacek Sieka	3382c2427b	increase rdb cache sizes (#2466 ) This trivial bump should improve performance a bit without costing too much memory - as the trie grows, so does the number of levels in it and creating hikes becomes ever more expensive - hopefully this cache increase should give a nice little boost even if it's not a lot.	2024-07-09 17:35:27 +02:00
Jacek Sieka	ab23148aab	don't rewrite hash->slot map (#2463 ) Avoid writing the same slot/hash values to the hash->slot mapping to avoid spamming the rocksdb WAL and cause unnecessary compaction In the same vein, avoid writing trivially detectable A-B-A storage changes which happen with surprising frequency.	2024-07-09 17:25:43 +02:00
andri lim	c775c906a2	Fix LedgerRef storage iterator and add test (#2458 )	2024-07-05 10:15:48 +00:00
Jacek Sieka	7d78fd97d5	avoid allocations for slot storage (#2455 ) Introduce a new `StoData` payload type similar to `AccountData` * slightly more efficient storage format * typed api * fewer seqs * fix encoding docs - it wasn't rlp after all :)	2024-07-04 23:48:45 +00:00
Jacek Sieka	81e75622cf	storage: store root id together with vid, for better locality of refe… (#2449 ) The state and account MPT:s currenty share key space in the database based on that vertex id:s are assigned essentially randomly, which means that when two adjacent slot values from the same contract are accessed, they might reside at large distance from each other. Here, we prefix each vertex id by its root causing them to be sorted together thus bringing all data belonging to a particular contract closer together - the same effect also happens for the main state MPT whose nodes now end up clustered together more tightly. In the future, the prefix given to the storage keys can also be used to perform range operations such as reading all the storage at once and/or deleting an account with a batch operation. Notably, parts of the API already supported this rooting concept while parts didn't - this PR makes the API consistent by always working with a root+vid.	2024-07-04 15:46:52 +02:00
Jacek Sieka	b23795ab39	remove pPrf, fRpp (#2445 ) No longer used now that hashify is gone	2024-07-03 22:21:57 +02:00
Jacek Sieka	443c6d1f8e	Cache account path storage id (#2443 ) The storage id is frequently accessed when executing contract code and finding the path via the database requires several hops making the process slow - here, we add a cache to keep the most recently used account storage id:s in memory. A possible future improvement would be to cache all account accesses so that for example updating the balance doesn't cause several hikes.	2024-07-03 17:58:25 +02:00
Jordan Hrycaj	ea7c756a9d	Core db reorg (#2444 ) * CoreDb: Merged all sub-descriptors into `base_desc` module * Dissolve `aristo_db/common_desc.nim` * No need to export `Aristo` methods in `CoreDb` * Resolve/tighten methods in `aristo_db` sub-moduled why: So they can be straihgt implemented into the `base` module * Moved/re-implemented `KVT` methods into `base` module * Moved/re-implemented `MPT` methods into `base` module * Moved/re-implemented account methods into `base` module * Moved/re-implemented `CTX` methods into `base` module * Moved/re-implemented `handler_{aristo,kvt}` into `aristo_db` module * Moved/re-implemented `TX` methods into `base` module * Moved/re-implemented base methods into `base` module * Replaced `toAristoSavedStateBlockNumber()` by proper base method why: Was the last for keeping reason for keeping low level backend access methods * Remove dedicated low level access to `Aristo` backend why: Not needed anymore, for debugging the descriptors can be accessed directly also: some clean up stuff * Re-factor `CoreDb` descriptor layout and adjust base methods * Moved/re-implemented iterators into `base_iterator` modules Update docu	2024-07-03 15:50:27 +00:00
Jacek Sieka	1f60e8e453	Use `Hash256` directly for account path (#2439 ) Account paths are always a hash - passing it around as such helps avoid confusion as to how long it is	2024-07-03 10:14:26 +02:00
Jacek Sieka	c364426422	Smaller in-database representations (#2436 ) These representations use ~15-20% less data compared to the status quo, mainly by removing redundant zeroes in the integer encodings - a significant effect of this change is that the various rocksdb caches see better efficiency since more items fit in the same amount of space. * use RLP encoding for `VertexID` and `UInt256` wherever it appears * pack `VertexRef`/`PayloadRef` more tightly	2024-07-02 20:25:06 +02:00
web3-developer	e163b69261	Bump RocksDb version and enable autoClose on opt types to prevent memory leaks (#2427 ) * Bump RocksDb version and enable autoClose on opt types to prevent memory leaks.	2024-07-02 13:44:09 +08:00
Jacek Sieka	3d3831dde8	Small cleanups (#2435 ) * avoid costly hike memory allocations for operations that don't need to re-traverse it * avoid unnecessary state checks (which might trigger unwanted state root computations) * disable optimize-for-hits due to the MPT no longer being complete at all times	2024-07-01 14:07:39 +02:00
Jordan Hrycaj	2c87fd1636	Aristo code cosmetics and tests update (#2434 ) * Update some docu * Resolve obsolete compile time option why: Not optional anymore * Update checks why: The notion of what constitutes a valid `Aristo` db has changed due to (even more) lazy calculating Merkle hash keys. * Disable redundant unit test for production	2024-07-01 10:59:18 +00:00
andri lim	c24affadee	Use simpler schema when writing transactions, receipts, and withdrawals (#2420 ) * Use simpler schema when writing transactions, receipts, and withdrawals Using MPT not only slow but also take up more spaces than needed. Aristo will remove older tries and only keep the last block tries. Using simpler schema will avoid those problems. * Rename getTransaction to getTransactionByIndex	2024-06-29 12:43:17 +07:00
Jordan Hrycaj	8dd038144b	Some cleanups (#2428 ) * Remove `dirty` set from structural objects why: Not used anymore, the tree is dirty by default. * Rename `aristo_hashify` -> `aristo_compute` * Remove cruft, update comments, cosmetics, etc. * Simplify `SavedState` object why: The key chaining have become obsolete after extra lazy hashing. There is some available space for a state hash to be maintained in future. details: Accept the legacy `SavedState` object serialisation format for a while (which will be overwritten by new format.)	2024-06-28 18:43:04 +00:00
Jordan Hrycaj	14c3772545	On demand mpt revisited (#2426 ) * rebased from `github/on-demand-mpt` ackn: wip: on-demand mpt construction Given that actual data is stored in the `Vertex` structure, it's useful to think of the MPT as a cache for computing roots rather than being a functional requirement on its own. This PR engenders this line of thinking by incrementally computing the MPT only when it's needed, ie when a state (or similar) root is needed. This has the effect of siginficantly reducing memory usage as well as improving performance: * no need for dirty-mpt-node book-keeping * no need to build complex forest of upcoming hashing work * only hashes that are functionally needed are ever computed - intermediate nodes whose MTP root is not observed are never computed / processed * Unit test hot fixes * Unit test hot fixes cont. (somehow lost that part) --------- Co-authored-by: Jacek Sieka <jacek@status.im>	2024-06-28 15:03:12 +00:00
Jordan Hrycaj	6dc2773957	Only use pre hashed addresses as account keys (#2424 ) * Normalised storage tree addressing in function prototypes detail: Argument list is always `<db> <account-path> <slot-path> ..` with both path arguments as `openArray[]` * Remove cruft * CoreDb internally Use full account paths rather than addresses * Update API logging * Use hashed account address only in prototypes why: This avoids unnecessary repeated hashing of the same account address. The burden of doing that is upon the application. In the case here, the ledger caches all kinds of stuff anyway so it is common sense to exploit that for account address hashes. caveat: Using `openArray[byte]` argument types for hashed accounts is inherently fragile. In non-release mode, a length verification `doAssert` is enabled by default. * No accPath in data record (use `AristoAccount` as `CoreDbAccount`) * Remove now unused `eAddr` field from ledger `AccountRef` type why: Is duplicate of lookup key * Avoid merging the account record/statement in the ledger twice.	2024-06-27 19:21:01 +00:00
Jordan Hrycaj	61bbf40014	Update storage tree admin (#2419 ) * Tighten `CoreDb` API for accounts why: Apart from cruft, the way to fetch the accounts state root via a `CoreDbColRef` record was unnecessarily complicated. * Extend `CoreDb` API for accounts to cover storage tries why: In future, this will make the notion of column objects obsolete. Storage trees will then be indexed by the account address rather than the vertex ID equivalent like a `CoreDbColRef`. * Apply new/extended accounts API to ledger and tests details: This makes the `distinct_ledger` module obsolete * Remove column object constructors why: They were needed as an abstraction of MPT sub-trees including storage trees. Now, storage trees are handled by the account (e.g. via address) they belong to and all other trees can be identified by a constant well known vertex ID. So there is no need for column objects anymore. Still there are some left-over column object methods wnich will be removed next. * Remove `serialise()` and `PayloadRef` from default Aristo API why: Not needed. `PayloadRef` was used for unstructured/unknown payload formats (account or blob) and `serialise()` was used for decodng `PayloadRef`. Now it is known in advance what the payload looks like. * Added query function `hasStorageData()` whether a storage area exists why: Useful for supporting `slotStateEmpty()` of the `CoreDb` API * In the `Ledger` replace `storage.stateEmpty()` by `slotStateEmpty()` * On Aristo, hide the storage root/vertex ID in the `PayloadRef` why: The storage vertex ID is fully controlled by Aristo while the `AristoAccount` object is controlled by the application. With the storage root part of the `AristoAccount` object, there was a useless administrative burden to keep that storage root field up to date. * Remove cruft, update comments etc. * Update changed MPT access paradigms why: Fixes verified proxy tests * Fluffy cosmetics	2024-06-27 09:01:26 +00:00
web3-developer	ea94e8a351	Use RocksDb column family handles instead of name strings. (#2418 ) * Bump RocksDb to latest and update Nimbus database to pass column family handles to RocksDb API. * Bump RocksDb version.	2024-06-27 16:51:43 +08:00
Jacek Sieka	c8cdffa775	Small cleanups (#2414 ) * remove unnecessary / expensive error checking * avoid some trivial memory allocs * work around table move bug	2024-06-26 09:25:09 +02:00
Jacek Sieka	3e001e322c	Fix memory usage spikes during sync, give memory to rocksdb (#2413 ) * creating a seq from a table that holds lots of changes means copying all data into the table - this can be several GB of data while syncing blocks * nim fails to optimize the moving of the `WidthFirstForest` - the real solution is to not construct a `wff` to begin with, but this PR provides relief while that is being worked on This spike fix allows us to bump the rocksdb cache by another 2 GB and still have a significantly lower peak memory usage during sync.	2024-06-25 13:39:53 +02:00
Jacek Sieka	f294d1e086	Clear account cache after each block (#2411 ) When processing long ranges of blocks, the account cache grows unbounded which cause huge memory spikes. Here, we move the cache to a second-level cache after each block - the second-level cache is cleared on the next block after that which creates a simple LRU effect. There's a small performance cost of course, though overall the freed-up memory can now be reassigned to the rocksdb row cache which not only makes up for the loss but overall leads to a performance increase. The bump to 2gb of rocksdb row cache here needs more testing but is slightly less and loosely basedy on the savings from this PR and the circular ref fix in #2408 - another way to phrase this is that it's better to give rocksdb more breathing room than let the memory sit unused until circular ref collection happens ;)	2024-06-25 07:30:32 +02:00
Jacek Sieka	9521582005	avoid closure environment for mpt methods (#2408 ) An instance of `CoreDbMptRef` is created for and stored in every account - when we are processing blocks and have many accounts in memory, this closure environment takes up hundreds of mb of memory (around block 5M, it is the 4:th largest memory consumer!) - incidentally, this also removes a circular reference in the setup that causes the `AristoCodeDbMptRef` to linger in memory much longer than it has to which is the core reason why it takes so much. The real solution here is to remove the methods indirection entirely, but this PR provides relief until that has been done. Similar treatment is given to some of the other core api functions to avoid circulars there too.	2024-06-24 07:56:41 +02:00
Jacek Sieka	6b68ff92d3	Allocation-free nibbles buffer (#2406 ) This buffer eleminates a large part of allocations during MPT traversal, reducing overall memory usage and GC pressure. Ideally, we would use it throughout in the API instead of `openArray[byte]` since the built-in length limit appropriately exposes the natural 64-nibble depth constraint that `openArray` fails to capture.	2024-06-22 22:33:37 +02:00
Jacek Sieka	768307d91d	Cache code and invalid jump destination tables (fixes #2268 ) (#2404 ) It is common for many accounts to share the same code - at the database level, code is stored by hash meaning only one copy exists per unique program but when loaded in memory, a copy is made for each account. Further, every time we execute the code, it must be scanned for invalid jump destinations which slows down EVM exeuction. Finally, the extcodesize call causes code to be loaded even if only the size is needed. This PR improves on all these points by introducing a shared CodeBytesRef type whose code section is immutable and that can be shared between accounts. Further, a dedicated `len` API call is added so that the EXTCODESIZE opcode can operate without polluting the GC and code cache, for cases where only the size is requested - rocksdb will in this case cache the code itself in the row cache meaning that lookup of the code itself remains fast when length is asked for first. With 16k code entries, there's a 90% hit rate which goes up to 99% during the 2.3M attack - the cache significantly lowers memory consumption and execution time not only during this event but across the board.	2024-06-21 09:44:10 +02:00
Jordan Hrycaj	081cb15493	Coredb maintenance (#2398 ) * CoreDb: remove PHK tries why: There is no general use anymore for an MPT with a pre-hashed key. It was used to resemble the `SecureHexaryTrie` logic from the legacy DB. The only pace where this is needed is the `Leger` which uses a a distinct MPT version anyway (see `distinct_ledgers.nim`.) * Rename `CoreDx` -> `CoreDb` why: The naming `CoreDx` was used to differentiate the new CoreDb API from the legacy API which had descriptors named `CoreDb`.	2024-06-19 14:13:12 +00:00
Jordan Hrycaj	e7be0d185c	Aristo uses pre classified tree types cont2 (#2397 ) * Provide dedicated functions for fetching accounts and storage trees why: Different prototypes for each class `account`, `generic` and `storage`. * Remove `fetchPayload()` and other cruft from API, `aristo_fetch`, etc. * Fix typos, debugging left overs, comments	2024-06-19 12:40:00 +00:00
andri lim	0e5fd3ffc9	LedgerRef: stateOrVoid become stateEmptyOrVoid (#2394 )	2024-06-19 14:14:36 +02:00
andri lim	5a39fc0d69	Remove unused dbkey (#2396 )	2024-06-19 14:11:14 +02:00
Jacek Sieka	41cf81f80b	Fix dboptions init (#2391 ) For the block cache to be shared between column families, the options instance must be shared between the various column families being created. This also ensures that there is only one source of truth for configuration options instead of having two different sets depending on how the tables were initialized. This PR also removes the re-opening mechanism which can double startup time - every time the database is opened, the log is replayed - a large log file will take a long time to open. Finally, several options got correclty implemented as column family options, including an one that puts a hash index in the SST files.	2024-06-19 10:55:57 +02:00
Miran	ea0d18424a	use Nim 2.0.6 (#2384 ) * use Nim 2.0.6 * Fixes for nim 2.0.6 * Workaround nim 2.0 array indexing issue * Remove excess gcsafe pragma * Oops, fix recursive template * Fix imports * Fluffy nph linting --------- Co-authored-by: jangko <jangko128@gmail.com> Co-authored-by: tersec <tersec@users.noreply.github.com>	2024-06-19 01:27:54 +00:00
Jordan Hrycaj	8727307ef4	Aristo uses pre classified tree types cont1 (#2389 ) * Provide dedicated functions for deleteing accounts and storage trees why: Storage trees are always linked to an account, so there is no need for an application to fiddle about (e.g. re-cycling, unlinking) storage tree vertex IDs. * Remove `delete()` and other cruft from API, `aristo_delete`, etc. * clean up delete functions details: The delete implementations `deleteImpl()` and `delTreeImpl()` do not need to be super generic anymore as all the edge cases are covered by the specialised `deleteAccountPayload()`, `deleteGenericData()`, etc. * Avoid unnecessary re-calculations of account keys why: The function `registerAccountForUpdate()` did extract the storage ID (if any) and automatically marked the Merkle keys along the account path for re-hashing. This would also apply if there was later detected that the account or the storage tree did not need to be updated. So the `registerAccountForUpdate()` function was split into a part which retrieved the storage ID, and another one which marked the Merkle keys for re-calculation to be applied only when needed.	2024-06-18 19:30:01 +00:00
Jordan Hrycaj	51f02090b8	Aristo uses pre classified tree types (#2385 ) * Remove unused `merge()` functions (for production) details: Some functionality moved to test suite Make sure that only `AccountData` leaf type is exactly used on VertexID(1) * clean up payload type * Provide dedicated functions for merging accounts and storage trees why: Storage trees are always linked to an account, so there is no need for an application to fiddle about (e.e. creating, re-cycling) with storage tree vertex IDs. * CoreDb: Disable tracer functionality why: Must be updated to accommodate new/changed `Aristo` functions. * CoreDb: Use new `mergeXXX()` functions why: Makes explicit vertex ID management obsolete for creating new storage trees. * Remove `mergePayload()` and other cruft from API, `aristo_merge`, etc. * clean up merge functions details: The merge implementation `mergePayloadImpl()` does not need to be super generic anymore as all the edge cases are covered by the specialised functions `mergeAccountPayload()`, `mergeGenericData()`, and `mergeStorageData()`. * No tracer available at the moment, so disable offending tests	2024-06-18 11:14:02 +00:00
Jacek Sieka	8926da02b6	Fix lowest-hanging fruit in VM (#2382 ) * replace set with bitseq for code validity test * remove unusued code from CodeStream * avoid unnecessary byte-by-byte copies	2024-06-18 07:55:35 +07:00
Jacek Sieka	9c6fd46a51	avoid computing state root just to know if storage is empty (#2380 ) The state root computation here is one of the major hotspots in block processing - in the cases the code only needs to know if it's empty or not, it can be done a lot faster. Adding a separate function for this looks fragile and should probably be revisited.	2024-06-17 15:29:07 +02:00
Jacek Sieka	1fb658ff03	Remove hashify calls when forking (#2377 ) This appears to no longer be needed and we want to delay hashing as much as possible.	2024-06-17 14:18:50 +02:00
andri lim	69044dda60	Remove AccountStateDB (#2368 ) * Remove AccountStateDB AccountStateDB should no longer be used. It's usage have been reduce to read only operations. Replace it with LedgerRef to reduce maintenance burden. * remove extra spaces Co-authored-by: tersec <tersec@users.noreply.github.com> --------- Co-authored-by: tersec <tersec@users.noreply.github.com>	2024-06-16 10:21:02 +07:00
Jacek Sieka	af34f90fe4	fix `max_total_wal_size` which should be set on the DB (#2363 )	2024-06-16 02:11:30 +00:00
Jacek Sieka	242bbf03fc	Light verification and storage mode for import (#2367 ) When performing block import, we can batch state root verifications and header checks, doing them only once per chunk of blocks, assuming that the other blocks in the batch are valid by extension. When we're not generating receipts, we can also skip per-transaction state root computation pre-byzantium, which is what provides a ~20% speedup in this PR, at least on those early blocks :) We also stop storing transactions, receipts and uncles redundantly when importing from era1 - there is no need to waste database storage on this when we can load it from the era1 file (eventually).	2024-06-15 11:22:37 +02:00
Jacek Sieka	68f462e3e4	avoid state root lookup when computing linear history (#2362 ) State lookups potentially trigger expensive re-hashings - this is the first of several steps to remove the unnecessary ones from the general flow of block processing * avoid re-reading parent block header from database when it's already in memory	2024-06-14 15:56:56 +02:00
Jordan Hrycaj	debba5a620	Coeredb related clean up and maint fixes (#2360 ) * Fix initialiser why: Possible crash (app profiling, tracer etc.) * Update column family options processing why: Same for kvt as for aristo * Move `AristoDbDualRocks` backend type to the test suite why: So it is not available for production * Fix typos in API jump table why: Used for tracing and app profiling only. Needed some update * Purged CoreDb legacy API why: Not needed anymore, was transitionary and disabled. * Rename `flush` argument to `eradicate` in a DB close context why: The word `eradicate` leaves no doubt what is meant * Rename `stoFlush()` -> `stoDelete()` * Rename `core_apps_newapi` -> `core_apps` (not so new anymore)	2024-06-14 11:19:48 +00:00
andri lim	5a18537450	Bump nim-eth, nim-web3, nimbus-eth2 (#2344 ) * Bump nim-eth, nim-web3, nimbus-eth2 - Replace std.Option with results.Opt - Fields name changes * More fixes * Fix Portal stream async raises and portal testnet Opt usage * Bump eth + nimbus-eth2 + more fixes related to eth_types changes * Fix in utp test app and nimbus-eth2 bump * Fix test_blockchain_json rebase conflict * Fix EVMC block_timestamp conversion plus commentary --------- Co-authored-by: kdeme <kim.demey@gmail.com>	2024-06-14 14:31:08 +07:00
Jacek Sieka	189a20bbae	Avoid recomputing hashes when persisting data (#2350 )	2024-06-14 07:10:00 +02:00
Jordan Hrycaj	5a5cc6295e	Triggered write event for kvt (#2351 ) * bump rockdb * Rename `KVT` objects related to filters according to `Aristo` naming details: filter* => delta* roFilter => balancer * Compulsory error handling if `persistent()` fails * Add return code to `reCentre()` why: Might eventually fail if re-centring is blocked. Some logic will be added in subsequent patch sets. * Add column families from earlier session to rocksdb in opening procedure why: All previously used CFs must be declared when re-opening an existing database. * Update `init()` and add rocksdb `reinit()` methods for changing parameters why: Opening a set column families (with different open options) must span at least the ones that are already on disk. * Provide write-trigger-event interface into `Aristo` backend why: This allows to save data from a guest application (think `KVT`) to get synced with the write cycle so the guest and `Aristo` save all atomically. * Use `KVT` with new column family interface from `Aristo` * Remove obsolete guest interface * Implement `KVT` piggyback on `Aristo` backend * CoreDb: Add separate `KVT`/`Aristo` backend mode for debugging * Remove `rocks_db` import from `persist()` function why: Some systems (i.p `fluffy` and friends) use the `Aristo` memory backend emulation and do not link against rocksdb when building the application. So this should fix that problem.	2024-06-13 18:15:11 +00:00
Jacek Sieka	5f44be1bcd	Remove rehashing in storage ledger init (#2340 ) This looks like a debug leftover, but it causes a state root recomputation which is slow	2024-06-12 21:13:53 +02:00
Jacek Sieka	54f793f946	Apply some basic rocksdb options (#2339 ) These options, inspired by Nethermind and general internet wisdom, bring the database size down to 2/3 without affecting throughput. In theory, they should also bring down memory usage and/or make more efficient use of whatever memory is already assigned to rocksdb but this needs verification in a longer test at synced-mainnet sizes. In the meantime, they make testing easier by removing some noise that the profiler says are bad, such as excessive SkipList access (countered by bloom filters).	2024-06-12 14:52:27 +02:00
Jacek Sieka	eb041abba7	avoid unnecessary memory allocations and lookups (#2334 ) * use `withValue` instead of `hasKey` + `[]` * avoid `@` et al * parse database data inside `onData` instead of making seq then parsing	2024-06-11 11:38:58 +02:00
Jordan Hrycaj	a347291413	Aristo use rocksdb cf instead of key pfx (#2332 ) * Use RocksDb column families instead of a prefixed single column why: Better performance * Use structural objects `VertexRef` and `HashKey` in LRU cache for RocksDb why: Avoids repeated de/serialisation	2024-06-10 12:04:22 +00:00
Jacek Sieka	f6be4bd0ec	avoid initTable (#2328 ) `initTable` is obsolete since nim 0.19 and can introduce significant memory overhead while providing no benefit (since the table will be grown to the default initial size on first use anyway). In particular, aristo layers will not necessarily use all tables they initialize, for exampe when many empty accounts are being created.	2024-06-10 11:05:30 +02:00
Jacek Sieka	359f7ada65	eth: avoid sink (#2331 ) * eth: avoid sink * bump * fix extra transactions from #2330	2024-06-10 14:16:22 +07:00
Jacek Sieka	0b32078c4b	Consolidate block type for block processing (#2325 ) This PR consolidates the split header-body sequences into a single EthBlock sequence and cleans up the fallout from that which significantly reduces block processing overhead during import thanks to less garbage collection and fewer copies of things all around. Notably, since the number of headers must always match the number of bodies, we also get rid of a pointless degree of freedom that in the future could introduce unnecessary bugs. * only read header and body from era file * avoid several unnecessary copies along the block processing way * simplify signatures, cleaning up unused arguemnts and returns * use `stew/assign2` in a few strategic places where the generated nim assignent is slow and add a few `move` to work around poor analysis in nim 1.6 (will need to be revisited for 2.0) ``` stats-20240607_2223-a814aa0b.csv vs stats-20240608_0714-21c1d0a9.csv bps_x bps_y tps_x tps_y bpsd tpsd timed block_number (498305, 713245] 1,540.52 1,809.73 2,361.58 2775.340189 17.63% 17.63% -14.92% (713245, 928185] 730.36 865.26 1,715.90 2028.973852 18.01% 18.01% -15.21% (928185, 1143126] 663.03 789.10 2,529.26 3032.490771 19.79% 19.79% -16.28% (1143126, 1358066] 393.46 508.05 2,152.50 2777.578119 29.13% 29.13% -22.50% (1358066, 1573007] 370.88 440.72 2,351.31 2791.896052 18.81% 18.81% -15.80% (1573007, 1787947] 283.65 335.11 2,068.93 2441.373402 17.60% 17.60% -14.91% (1787947, 2002888] 287.29 342.11 2,078.39 2474.179448 18.99% 18.99% -15.91% (2002888, 2217828] 293.38 343.16 2,208.83 2584.77457 17.16% 17.16% -14.61% (2217828, 2432769] 140.09 167.86 1,081.87 1296.336926 18.82% 18.82% -15.80% blocks: 1934464, baseline: 3h13m1s, contender: 2h43m47s bpsd (mean): 19.55% tpsd (mean): 19.55% Time (total): -29m13s, -15.14% ```	2024-06-09 16:32:20 +02:00
web3-developer	db8c5b90bd	Cleanup stateless and block witness code. (#2295 ) * Cleanup unneeded stateless and block witness code. Keeping MultiKeys which is used in the eth_getProofsByBlockNumber RPC endpoint which is needed for the Fluffy state network bridge. * Rename generateWitness flag to collectWitnessData to better describe what the flag does. We only collect the keys of the touched accounts and storage slots but no block witness generation is supported for now. * Move remaining stateless code into nimbus directory. * Add vmstate parameter to ChainRef to fix test. * Exclude *.in from check copyright year --------- Co-authored-by: jangko <jangko128@gmail.com>	2024-06-08 15:05:00 +07:00
tersec	5008b89185	EIP-2537 BLS12-381 G1 add/mul/exp and G2 add/mul support with tests (#2315 )	2024-06-08 07:39:53 +07:00
tersec	fd03038cab	Replace some usage of std/options with results Opt (#2323 ) * Replace some usage of std/options with results Opt * more updates	2024-06-07 23:39:58 +02:00
Jordan Hrycaj	392088e5e9	Coredb fix storage tree issues (#2317 ) * Code cosmetics * Re-org `aristo_merge`, internally split into sub-modules why: Became a burden for maintenance because it hosts two different functionalities under the same merge paradigm: account/data merge and snap proof merge where the latter produces a partial trie. * Fix CoreDb tracer * Ledger: fix potential account vs. storage tree sync problems * Remove bound on the size of removable whole storage trees * Activate `test_tracer_json`	2024-06-07 10:56:31 +00:00
Jacek Sieka	c5b3081828	eth: bump (#2308 ) * eth: bump Speed up basic operations like hashing and creating RLP:s - up to 25% improvement in certain block ranges! ``` 876729c.csv /data/nimbus_stats/stats-20240605_2204-ed4f6221.csv stats-20240605_2000-c876729c.csv vs stats-20240605_2204-ed4f6221.csv bps_x bps_y tps_x tps_y bpsd tpsd timed block_number (500001, 888889] 1,017.72 996.07 1,784.96 1742.438676 -2.72% -2.72% 3.31% (888889, 1277778] 528.00 536.30 2,159.79 2198.781046 1.69% 1.69% -1.44% (1277778, 1666667] 324.29 317.78 2,064.48 2008.106377 -2.82% -2.82% 3.33% (1666667, 2055556] 253.87 258.74 1,840.94 1872.935273 1.67% 1.67% -1.39% (2055556, 2444445] 175.79 178.66 1,340.61 1363.248939 0.93% 0.93% -0.74% (2444445, 2833334] 137.27 159.74 958.75 1113.323757 14.24% 14.24% -10.69% (2833334, 3222223] 170.48 228.63 1,272.70 1704.047195 34.41% 34.41% -25.17% (3222223, 3611112] 127.49 125.48 1,572.39 1548.835791 -1.19% -1.19% 1.47% (3611112, 4000001] 37.25 40.42 1,100.65 1184.740493 9.58% 9.58% -7.04% blocks: 3501696, baseline: 11h59m40s, contender: 11h21m38s bpsd (mean): 6.18% tpsd (mean): 6.18% Time (sum): -38m1s, -4.26% bpsd = blocks per sec diff (+), tpsd = txs per sec diff, timed = time to process diff (-) + = more is better, - = less is better ``` * ignore gitignore	2024-06-06 23:39:09 +00:00
Jordan Hrycaj	e9eae4df70	Core db disable legacy api n remove distinct tries (#2299 ) * CoreDb: Remove crufty second/off-site KVT why: Was used to allow late `Clique` to store directly to disk * CoreDb: Remove prune flag related functionality why: Is completely legacy stuff * CoreDb: Remove dependence on legacy API (tests unsupported yet) why: Does not fully support Aristo * Re-factoring `state_db` using new API details: Only minimum changes needed to compile `nimbus` * Update tests and aux modules * Turn off legacy API and remove `distinct_tries` comment: The legacy API has now cruft status, will be removed soon * Fix copyright years * Update rpc for verified proxy --------- Co-authored-by: Jacek Sieka <jacek@status.im>	2024-06-05 20:52:04 +00:00
Jordan Hrycaj	8985535ab2	Core db+aristo updates n fixes (#2298 ) * Fix `blobify()` for `SavedState` object why: Have to treat varying sizes for `HashKey`, i.p. for an empty key which has zero size. * Store correct block number in `SavedState` record why: Stored `block-number - 1` for some obscure reason. * Cosmetcs, docu	2024-06-05 18:17:50 +00:00
Jacek Sieka	c876729c4d	Add some basic rocksdb options to command line (#2286 ) These options are there mainly to drive experiments, and are therefore hidden. One thing that this PR brings in is an initial set of caches and buffers for rocksdb - the set that I've been using during various performance tests to get to a viable baseline performance level.	2024-06-05 17:08:29 +02:00
Jacek Sieka	95a4adc1e8	use statically linked rocksdb on linux/mac, dll on windows (#2291 ) The `rocksdb` version shipped with distributions is typically old and therefore often lacks features we use - it also doesn't match the one assumed by nim-rocksdb leading to ABI mismatch risks. Instead of depending on the system rocksdb, we'll now use the rocksdb version assumed by nim-rocksdb and locked in its vendor folder by always building it together with nimbus. This avoids the problem of unknown rocksdb versions at a (small) cost to build time. CI caching and full windows support for building from source [remains TODO](https://github.com/status-im/nim-rocksdb/issues/44).	2024-06-04 18:15:33 +02:00
Jordan Hrycaj	69a158864c	Remove vid recycling feature (#2294 )	2024-06-04 15:05:13 +00:00
Jordan Hrycaj	cc909c99f2	Fix crash in de-serialiser (#2289 ) why: Late change from `Hash256` to `HashKey` without fully updating the serialiser.	2024-06-04 10:38:11 +00:00
Jordan Hrycaj	f926222fec	Aristo cull journal related stuff (#2288 ) * Remove all journal related stuff * Refactor function names journal() => delta(), filter() => delta() * remove `trg` fileld from `FilterRef` why: Same as `kMap[$1]` * Re-type FilterRef.src as `HashKey` why: So it is directly comparable to `kMap[$1]` * Moved `vGen[]` field from `LayerFinalRef` to `LayerDeltaRef` why: Then a separate `FilterRef` type is not needed, anymore * Rename `roFilter` field in `AristoDbRef` => `balancer` why: New name more appropriate. * Replace `FilterRef` by `LayerDeltaRef` type why: This allows to avoid copying into the `balancer` (see next patch set) most of the time. Typically, only one instance is running on the backend and the `balancer` is only used as a stage before saving data. * Refactor way how to store data persistently why: Avoid useless copy when staging `top` layer for persistently saving to backend. * Fix copyright header?	2024-06-03 20:10:35 +00:00
Jacek Sieka	7f76586214	Speed up account ledger a little (#2279 ) `persist` is a hotspot when processing blocks because it is run at least once per transaction and loops over the entire account cache every time. Here, we introduce an extra `dirty` map that keeps track of all accounts that need checking during `persist` which fixes the immediate inefficiency, though probably this could benefit from a more thorough review - we also get rid of the unused clearCache flag - we start with a fresh cache on every fresh vmState. * avoid unnecessary code hash comparisons * avoid unnecessary copies when iterating * use EMPTY_CODE_HASH throughout for code hash comparison	2024-06-02 21:21:29 +02:00
Jacek Sieka	ef864ba167	disable vid reuse compaction (#2276 ) The current implementation cannot be used practically since it causes several full reallocations of the whole free list per deletion - it needs to be reimplemented, or the chain cannot practically progress beyond ~2.5M blocks where a lot of removals happen. Co-authored-by: tersec <tersec@users.noreply.github.com>	2024-06-01 17:14:54 +02:00
Jacek Sieka	9f879406f3	append instead of reallocate in blobify (#2277 ) ...otherwise, we get lots and lots of temporary allocations of seq's	2024-06-01 17:13:24 +02:00
andri lim	f9765e617b	Early exit from some of CoreDbRef functions if nothing to do (#2259 ) * Early exit from some of CoreDbRef functions if nothing to do * More exits * persistReceipts early exit if nothing to do	2024-06-01 16:54:02 +02:00
tersec	cfbbcda4f7	rm unused Nim modules (#2270 )	2024-06-01 17:49:46 +07:00
Jordan Hrycaj	bda760f41d	Run coredb without journal (#2266 ) * Add persistent last state stamp feature why: This allows to run `CoreDb` without journal * Start `CoreDb` without journal * Remove journal related functions from `CoredDb`	2024-05-31 17:32:22 +00:00
Jordan Hrycaj	0f430c70fd	Aristo avoid storage trie update race conditions (#2251 ) * Update TDD suite logger output format choices why: New format is not practical for TDD as it just dumps data across a wide range (considerably larder than 80 columns.) So the new format can be turned on by function argument. * Update unit tests samples configuration why: Slightly changed the way to find the `era1` directory * Remove compiler warnings (fix deprecated expressions and phrases) * Update `Aristo` debugging tools * Always update the `storageID` field of account leaf vertices why: Storage tries are weekly linked to an account leaf object in that the `storageID` field is updated by the application. Previously, `Aristo` verified that leaf objects make sense when passed to the database. As a consequence * the database was inconsistent for a short while * the burden for correctness was all on the application which led to delayed error handling which is hard to debug. So `Aristo` will internally update the account leaf objects so that there are no race conditions due to the storage trie handling * Aristo: Let `stow()`/`persist()` bail out unless there is a `VertexID(1)` why: The journal and filter logic depends on the hash of the `VertexID(1)` which is commonly known as the state root. This implies that all changes to the database are somehow related to that. * Make sure that a `Ledger` account does not overwrite the storage trie reference why: Due to the abstraction of a sub-trie (now referred to as column with a hash describing its state) there was a weakness in the `Aristo` handler where an account leaf could be overwritten though changing the validity of the database. This has been changed and the database will now reject such changes. This patch fixes the behaviour on the application layer. In particular, the column handle returned by the `CoreDb` needs to be updated by the `Aristo` database state. This mitigates the problem that a storage trie might have vanished or re-apperaed with a different vertex ID. * Fix sub-trie deletion test why: Was originally hinged on `VertexID(1)` which cannot be wholesale deleted anymore after the last Aristo update. Also, running with `VertexID(2)` needs an artificial `VertexID(1)` for making `stow()` or `persist()` work. * Cosmetics * Activate `test_generalstate_json` * Temporarily `deactivate test_tracer_json` * Fix copyright header --------- Co-authored-by: jordan <jordan@dry.pudding> Co-authored-by: Jacek Sieka <jacek@status.im>	2024-05-30 17:48:38 +00:00
Jacek Sieka	919242c98e	results: use canonical import (#2248 )	2024-05-30 14:54:03 +02:00
andri lim	0a07425112	Cleanup unused raises in evm/state and other obsolete informations (#2243 )	2024-05-30 09:03:54 +00:00
tersec	674394b924	fix import path; force refc memory management even with Nim 2.0+ (#2241 )	2024-05-29 20:47:06 +02:00
tersec	c466edfd8d	change an Aristo function name to avoid Nim stdlib ambiguity (#2240 )	2024-05-29 15:08:00 +00:00
andri lim	eaf3d9897e	Simplify AccountsLedgerRef complexity (#2239 )	2024-05-29 13:06:49 +02:00
Jordan Hrycaj	3a62250d04	Make test op memory work again (#2236 ) * Remove crufty `pruneTrie` arguments * Replaced legacy `distinct_trie` logic by new `ledger` functionality why: The module `distinct_trie` is supported by `Aristo` in trivial cases. * Activate `test_op_memory`	2024-05-28 14:24:10 +00:00
Jacek Sieka	08e98eb385	restore a few tests, cleanup (#2234 ) * remove `compensateLegacySetup`, `localDbOnly` * enable trivially fixable tests	2024-05-28 14:49:35 +02:00

1 2 3 4 5 ...

448 Commits