Commit Graph

128 Commits

Author SHA1 Message Date
Jordan Hrycaj 5b6ccddaa0
Db folder sources and related remove compiler warnings (#2673)
* Aristo: Rename `Hash256` -> `Hash32`

* CoreDb: Rename `Hash256` -> `Hash32`

* Ledger: Rename `Hash256` -> `Hash32`

* StorageTypes: Rename `Hash256` -> `Hash32`

* Aristo: Rename `Blob` -> `seq[byte]`, `keccakHash` -> `keccak256`

* Kvt: Rename `Blob` -> `seq[byte]`

* CoreDb: Rename `Blob` -> `seq[byte]`, `keccakHash` -> `keccak256`

* Ledger: Rename `Blob` -> `seq[byte]`, `keccakHash` -> `keccak256`

* CoreDb: Rename `BlockHeader` -> `Header`, `BlockNonce` -> `Bytes8`

* Misc: Rename `StorageKey` -> `Bytes32`

* Tracer: `Hash256` -> `Hash32`, `BlockHeader` -> `Header`, etc.

* Fix copyright header
2024-10-01 21:03:10 +00:00
Jacek Sieka c210885b73
eth: bump to new types (#2660)
This is a minimal set of changes to make things work with the new types
in nim-eth - this is the minimal PR that merely resolves
incompatibilities while the full change set would include more cleanup
and migration.
2024-09-29 14:37:09 +02:00
Jacek Sieka f3e3c6bbe0
init style for Hash256 (#2661)
* init style for Hash256

https://github.com/status-im/nim-eth/pull/733 updates `Hash256` to
become an array instead of an object - unfortunately, nim does not allow
constructing arrays with `name()`, so this PR changes it to `default`
which works with both.

* lint
2024-09-26 13:24:36 +02:00
Jacek Sieka 5c1e2e7d3b
Migrate `keyed_queue` to `minilru` (#2608)
Compared to `keyed_queue`, `minilru` uses significantly less memory, in
particular for the 32-byte hash keys where `kq` stores several copies of
the key redundantly.
2024-09-13 15:47:50 +02:00
Jordan Hrycaj 75808bc03b
Add portal proof functionality for non-existing keys/paths (#2610) 2024-09-11 09:39:45 +00:00
Jacek Sieka d39c589ec3
lru cache updates (#2590)
* replace rocksdb row cache with larger rdb lru caches - these serve the
same purpose but are more efficient because they skips serialization,
locking and rocksdb layering
* don't append fresh items to cache - this has the effect of evicting
the existing items and replacing them with low-value entries that might
never be read - during write-heavy periods of processing, the
newly-added entries were evicted during the store loop
* allow tuning rdb lru size at runtime
* add (hidden) option to print lru stats at exit (replacing the
compile-time flag)

pre:
```
INF 2024-09-03 15:07:01.136+02:00 Imported blocks
blockNumber=20012001 blocks=12000 importedSlot=9216851 txs=1837042
mgas=181911.265 bps=11.675 tps=1870.397 mgps=176.819 avgBps=10.288
avgTps=1574.889 avgMGps=155.952 elapsed=19m26s458ms
```

post:
```
INF 2024-09-03 13:54:26.730+02:00 Imported blocks
blockNumber=20012001 blocks=12000 importedSlot=9216851 txs=1837042
mgas=181911.265 bps=11.637 tps=1864.384 mgps=176.250 avgBps=11.202
avgTps=1714.920 avgMGps=169.818 elapsed=17m51s211ms
```

9%:ish import perf improvement on similar mem usage :)
2024-09-05 11:18:32 +02:00
Jordan Hrycaj 3c6400673d
Coredb fix kvt save only fringe condition (#2592)
* Cosmetics, spelling, etc.

* Aristo: make sure that a save cycle always commits even when empty

why:
  If `Kvt` is tied to the `Aristo` DB save cycle, then this save cycle
  must also be committed if there is no data to save for `Aristo`.

  Otherwise this will lead to excessive core memory use with some fringe
  condition where Eth headers (or blocks) are downloaded while syncing
  and not really stored on disk.

* CoreDb: Correct persistent save mode

why:
  Saving `Kvt` first is seen as a harbinger (or canary) for `Aristo` as
  both run in sync. If `Kvt` succeeds saving first, so must be `Aristo`
  next. Other than this is a defect.
2024-09-04 13:48:38 +00:00
andri lim 4d9e288340
Wiring ForkedChainRef to other components (#2423)
* Wiring ForkedChainRef to other components

- Disable majority of hive simulators
- Only enable pyspec_sim for the moment
- The pyspec_sim is using a smaller RPC service wired to ForkedChainRef
- The RPC service will gradually grow

* Addressing PR review

* Fix test_beacon/setup_env

* Enable consensus_sim (#2441)

* Enable consensus_sim

* Remove isFile check

* Enable Engine API jwt auth tests and exchange cap tests

* Enable engine api in build_sim.sh

* Wire ForkedChainRef to Engine API newPayload

* Wire Engine API getBodies to ForkedChainRef

* Wire Engine API api_forkchoice to ForkedChainRef

* Wire more RPC methods to ForkedChainRef

* Implement eth_syncing

* Implement eth_call and eth_getlogs

* TxPool: simplify smartHead

* Fix smartHead usage

* Fix txpool headDiff

* Remove hasBlockHeader and use headerExists

* Addressing review
2024-09-04 09:54:54 +00:00
Jacek Sieka 84a72c8658
Use zstd compression in bottommost layer (#2582)
Tested up to block ~14m, zstd uses ~12% less space which seems to result
in a small:ish (2-4%) performance improvement on block import speed -
this seems like a better baseline for more extensive testing in the
future.

Pre: 57383308 kb
Post: 50831236 kb
2024-08-30 17:32:13 +02:00
Jordan Hrycaj 42a08cfba9
Coredb and sync maintenance update (#2583)
* bump metrics

* Remove cruft

* Cosmetics, update some logging, noise control

* Renamed `CoreDb` function `hasKey` => `hasKeyRc` and provided `hasKey`

why:
  Currently, `hasKey` returns a `Result[]` rather than a `bool` which
  is what one would expect from a function prototype of this name.

  This was a bit of an annoyance and cost unnecessary attention.
2024-08-30 11:18:36 +00:00
Jacek Sieka 19451cadff
rebalance rocksdb cache sizes (#2557)
Based on some simple testing done with a few combinations of cache
sizes, it seems that the block cache has grown in importance compared to
the where we were before changing on-disk format and adding a lot of
other point caches.

With these settings, there's roughly a 15% performance increase when
processing blocks in the 18M range over the status quo while memory
usage decreases by more than 1gb!

Only a few values were tested so there's certainly more to do here but
this change sets up a better baseline for any future optimizations.

In particular, since the initial defaults were chosen root vertex id:s
were introduced as key prefixes meaning that storage for each account
will be grouped together and thus it becomes more likely that a block
loaded from disk will be hit multiple times - this seems to give the
block cache an edge over the row cache, specially when traversing the
storage trie.
2024-08-12 05:52:09 +00:00
Jordan Hrycaj 488bdbc267
Provide portal proof functionality with coredb (#2550)
* Provide portal proof functions in `aristo_api`

why:
  So it can be fully supported by `CoreDb`

* Fix prototype in `kvt_api`

* Fix node constructor for account leafs with storage trees

* Provide simple path check based on portal proof functionality

* Provide portal proof functionality in `CoreDb`

* Update TODO list
2024-08-07 11:30:55 +00:00
Jordan Hrycaj 5b502a06c4
Added portal proof nodes generation functionality (#2539)
* Extracted `test_tx.testTxMergeProofAndKvpList()` => separate file

* Fix serialiser

why:
  Typo lead to duplicate rlp-encoded nodes in chain

* Remove cruft

* Implemnt portal proof nodes generators `partXxxTwig()`

* Add unit test for portal proof nodes generator `partAccountTwig()`

* Cosmetics

* Simplify serialiser return code format

* Fix proof generator for extension nodes

why:
  Code was simply bonkers, not detected before the unit tests were
  adapted to check for just this.

* Implemented portal proof nodes verifier `partUntwig()`

* Cosmetics

* Fix `testutp` cli poblem
2024-08-06 11:29:26 +00:00
Jordan Hrycaj 01b5c08763
Revive json tracer unit tests (#2538)
* Some `Aristo` clean-ups/updates

* Re-implemented core-db tracer functionality

* Rename nimbus tracer `no-tracer.nim` => `tracer.nim`

why:
  Restore original name for easy diff tracking with upcoming update

* Update nimbus tracer using new core-db tracer functionality

* Updating json tracer unit tests

* Enable json tracer unit tests
2024-08-01 10:41:20 +00:00
Jordan Hrycaj 72c3ab8ced
Provide partial tree support for preloading tests (#2536)
* Implement partial trees

why:
  This is currently needed for unit tests to pre-load the database
  with test data similar to `proof` node pre-load.

  The basic features for `snap-sync` boundary proofs are available
  as well for future use. What is missing is the final proof verification
  and a complete storage data load/merge function (stub is available.)

* Cosmetics, clean up
2024-07-29 20:15:17 +00:00
Jacek Sieka bdc86b3fd4
small cleanups (#2526)
* remove some redundant EH
* avoid pessimising move (introduces a copy in this case!)
* shift less data around when reading era files (reduces stack usage)
2024-07-26 12:32:01 +07:00
andri lim 254bda365f
Remove txpool sender locality (#2525)
* Remove txpool sender locality

We no longer distinct local or remote sender

* Fix copyright year
2024-07-25 22:36:08 +07:00
Jordan Hrycaj 1452e7b1c0
Misc updates (#2513)
* Update config for Ledger and CoreDb

why:
  Prepare for tracer which depends on the API jump table (as well as
  the profiler.) The API jump table is now enabled in unit/integration
  test mode piggybacking on the `unittest2DisableParamFiltering`
  compiler flag or on an extra compiler flag `dbjapi_enabled`.

* No deed for error field in `NodeRef`

why:
  Was opnly needed by proof nodes pre-loader which will be re-implemented

* Cosmetics
2024-07-22 18:10:04 +00:00
Jordan Hrycaj a84a2131cd
No ext update (#2494)
* Imported/rebase from `no-ext`, PR #2485

  Store extension nodes together with the branch

  Extension nodes must be followed by a branch - as such, it makes sense
  to store the two together both in the database and in memory:

  * fewer reads, writes and updates to traverse the tree
  * simpler logic for maintaining the node structure
  * less space used, both memory and storage, because there are fewer
    nodes overall

  There is also a downside: hashes can no longer be cached for an
  extension - instead, only the extension+branch hash can be cached - this
  seems like a fine tradeoff since computing it should be fast.

  TODO: fix commented code

* Fix merge functions and `toNode()`

* Update `merkleSignCommit()` prototype

why:
  Result is always a 32bit hash

* Update short Merkle hash key generation

details:
  Ethereum reference MPTs use Keccak hashes as node links if the size of
  an RLP encoded node is at least 32 bytes. Otherwise, the RLP encoded
  node value is used as a pseudo node link (rather than a hash.) This is
  specified in the yellow paper, appendix D.

  Different to the `Aristo` implementation, the reference MPT would not
  store such a node on the key-value database. Rather the RLP encoded node value is stored instead of a node link in a parent node
  is stored as a node link on the parent database.

  Only for the root hash, the top level node is always referred to by the
  hash.

* Fix/update `Extension` sections

why:
  Were commented out after removal of a dedicated `Extension` type which
  left the system disfunctional.

* Clean up unused error codes

* Update unit tests

* Update docu

---------

Co-authored-by: Jacek Sieka <jacek@status.im>
2024-07-16 19:47:59 +00:00
Jacek Sieka f3a56002ca
Turn payload into value type (#2483)
The Vertex type unifies branches, extensions and leaves into a single
memory area where the larges member is the branch (128 bytes + overhead) -
the payloads we have are all smaller than 128 thus wrapping them in an
extra layer of `ref` is wasteful from a memory usage perspective.

Further, the ref:s must be visited during the M&S phase of garbage
collection - since we keep millions of these, many of them
short-lived, this takes up significant CPU time.

```
Function	CPU Time: Total	CPU Time: Self	Module	Function (Full)	Source File	Start Address
system::markStackAndRegisters	10.0%	4.922s	nimbus	system::markStackAndRegisters(var<system::GcHeap>).constprop.0	gc.nim	0x701230`
```
2024-07-14 12:02:05 +02:00
Jacek Sieka 72947b3647
odds and ends (#2481)
small cleanups to reduce memory allocations
2024-07-13 20:42:49 +02:00
Jordan Hrycaj f08178c592
Separate constructor helpers for core db and ledger (#2480)
* Extract `CoreDb` constructor helpers from `base.nim` into separate module

why:
  This makes it easier to avoid circular imports.

* Extract `Ledger` constructor helpers from `base.nim` into separate module

why:
  Move `accounts_ledger.nim` file to sub-folder `backend`. That way the
  layout resembles that of the `core_db`.
2024-07-12 19:32:31 +00:00
Jordan Hrycaj b924fdcaa7
Separate config for core db and ledger (#2479)
* Updates and corrections

* Extract `CoreDb` configuration from `base.nim` into separate module

why:
  This makes it easier to avoid circular imports, in particular
  when the capture journal (aka tracer) is revived.

* Extract `Ledger` configuration from `base.nim` into separate module

why:
  This makes it easier to avoid circular imports (if any.)

also:
  Move `accounts_ledger.nim` file to sub-folder `backend`. That way the
  layout resembles that of the `core_db`.
2024-07-12 13:12:25 +00:00
Jacek Sieka d07540766f
coredb: tracking fixes (#2476) 2024-07-12 13:40:13 +02:00
Jacek Sieka a6764670f0
merge: avoid hike allocations (#2472)
hike allocations (and the garbage collection maintenance that follows)
are responsible for some 10% of cpu time (not wall time!) at this point
- this PR avoids them by stepping through the layers one step at a time,
simplifying the code at the same time.
2024-07-11 13:26:46 +02:00
Jordan Hrycaj 800fd77333
Core db remove legacy phrases (#2468)
* Rename `newKvt()` -> `ctx.getKvt()`

why:
  Clean up legacy shortcut. Also, the `KVT` returned is not instantiated
  but refers to the shared `KVT` that resides in a context which is a
  generalisation of an in-memory database fork. The function `ctx`
  retrieves the default context.

* Rename `newTransaction()` -> `ctx.newTransaction()`

why:
  Clean up legacy shortcut. The transaction is applied to a context as a
  generalisation of an in-memory database fork. The function `ctx`
  retrieves the default context.

* Rename `getColumn(CtGeneric)` -> `getGeneric()`

why:
  No more a list of well known sub-tries needed, a single one is enough.
  In fact, `getColumn()` did only support a single sub-tree by now.

* Reduce TODO list
2024-07-10 12:19:35 +00:00
andri lim c775c906a2
Fix LedgerRef storage iterator and add test (#2458) 2024-07-05 10:15:48 +00:00
Jacek Sieka 7d78fd97d5
avoid allocations for slot storage (#2455)
Introduce a new `StoData` payload type similar to `AccountData`

* slightly more efficient storage format
* typed api
* fewer seqs
* fix encoding docs - it wasn't rlp after all :)
2024-07-04 23:48:45 +00:00
Jacek Sieka 81e75622cf
storage: store root id together with vid, for better locality of refe… (#2449)
The state and account MPT:s currenty share key space in the database
based on that vertex id:s are assigned essentially randomly, which means
that when two adjacent slot values from the same contract are accessed,
they might reside at large distance from each other.

Here, we prefix each vertex id by its root causing them to be sorted
together thus bringing all data belonging to a particular contract
closer together - the same effect also happens for the main state MPT
whose nodes now end up clustered together more tightly.

In the future, the prefix given to the storage keys can also be used to
perform range operations such as reading all the storage at once and/or
deleting an account with a batch operation.

Notably, parts of the API already supported this rooting concept while
parts didn't - this PR makes the API consistent by always working with a
root+vid.
2024-07-04 15:46:52 +02:00
Jacek Sieka b23795ab39
remove pPrf, fRpp (#2445)
No longer used now that hashify is gone
2024-07-03 22:21:57 +02:00
Jordan Hrycaj ea7c756a9d
Core db reorg (#2444)
* CoreDb: Merged all sub-descriptors into `base_desc` module

* Dissolve `aristo_db/common_desc.nim`

* No need to export `Aristo` methods in `CoreDb`

* Resolve/tighten methods in `aristo_db` sub-moduled

why:
  So they can be straihgt implemented into the `base` module

* Moved/re-implemented `KVT` methods into `base` module

* Moved/re-implemented `MPT` methods into `base` module

* Moved/re-implemented account methods into `base` module

* Moved/re-implemented `CTX` methods into `base` module

* Moved/re-implemented `handler_{aristo,kvt}` into `aristo_db` module

* Moved/re-implemented `TX` methods into `base` module

* Moved/re-implemented base methods into `base` module

* Replaced `toAristoSavedStateBlockNumber()` by proper base method

why:
  Was the last for keeping reason for keeping low level backend access
  methods

* Remove dedicated low level access to `Aristo` backend

why:
  Not needed anymore, for debugging the descriptors can be accessed
  directly

also:
  some clean up stuff

* Re-factor `CoreDb` descriptor layout and adjust base methods

* Moved/re-implemented iterators into `base_iterator*` modules

* Update docu
2024-07-03 15:50:27 +00:00
Jacek Sieka 1f60e8e453
Use `Hash256` directly for account path (#2439)
Account paths are always a hash - passing it around as such helps avoid
confusion as to how long it is
2024-07-03 10:14:26 +02:00
Jacek Sieka c364426422
Smaller in-database representations (#2436)
These representations use ~15-20% less data compared to the status quo,
mainly by removing redundant zeroes in the integer encodings - a
significant effect of this change is that the various rocksdb caches see
better efficiency since more items fit in the same amount of space.

* use RLP encoding for `VertexID` and `UInt256` wherever it appears
* pack `VertexRef`/`PayloadRef` more tightly
2024-07-02 20:25:06 +02:00
web3-developer e163b69261
Bump RocksDb version and enable autoClose on opt types to prevent memory leaks (#2427)
* Bump RocksDb version and enable autoClose on opt types to prevent memory leaks.
2024-07-02 13:44:09 +08:00
Jacek Sieka 3d3831dde8
Small cleanups (#2435)
* avoid costly hike memory allocations for operations that don't need to
re-traverse it
* avoid unnecessary state checks (which might trigger unwanted state
root computations)
* disable optimize-for-hits due to the MPT no longer being complete at
all times
2024-07-01 14:07:39 +02:00
Jordan Hrycaj 2c87fd1636
Aristo code cosmetics and tests update (#2434)
* Update some docu

* Resolve obsolete compile time option

why:
  Not optional anymore

* Update checks

why:
  The notion of what constitutes a valid `Aristo` db has changed due to
  (even more) lazy calculating Merkle hash keys.

* Disable redundant unit test for production
2024-07-01 10:59:18 +00:00
andri lim c24affadee
Use simpler schema when writing transactions, receipts, and withdrawals (#2420)
* Use simpler schema when writing transactions, receipts, and withdrawals

Using MPT not only slow but also take up more spaces than needed.
Aristo will remove older tries and only keep the last block tries.
Using simpler schema will avoid those problems.

* Rename getTransaction to getTransactionByIndex
2024-06-29 12:43:17 +07:00
Jordan Hrycaj 8dd038144b
Some cleanups (#2428)
* Remove `dirty` set from structural objects

why:
  Not used anymore, the tree is dirty by default.

* Rename `aristo_hashify` -> `aristo_compute`

* Remove cruft, update comments, cosmetics, etc.

* Simplify `SavedState` object

why:
  The key chaining have become obsolete after extra lazy hashing. There
  is some available space for a state hash to be maintained in future.

details:
  Accept the legacy `SavedState` object serialisation format for a
  while (which will be overwritten by new format.)
2024-06-28 18:43:04 +00:00
Jordan Hrycaj 14c3772545
On demand mpt revisited (#2426)
* rebased from `github/on-demand-mpt`

ackn:
  wip: on-demand mpt construction

  Given that actual data is stored in the `Vertex` structure, it's useful
  to think of the MPT as a cache for computing roots rather than being a
  functional requirement on its own.

  This PR engenders this line of thinking by incrementally computing the
  MPT only when it's needed, ie when a state (or similar) root is needed.

  This has the effect of siginficantly reducing memory usage as well as
  improving performance:

  * no need for dirty-mpt-node book-keeping
  * no need to build complex forest of upcoming hashing work
  * only hashes that are functionally needed are ever computed -
  intermediate nodes whose MTP root is not observed are never computed /
  processed

* Unit test hot fixes

* Unit test hot fixes cont.

(somehow lost that part)

---------

Co-authored-by: Jacek Sieka <jacek@status.im>
2024-06-28 15:03:12 +00:00
Jordan Hrycaj 6dc2773957
Only use pre hashed addresses as account keys (#2424)
* Normalised storage tree addressing in function prototypes

detail:
  Argument list is always `<db> <account-path> <slot-path> ..` with
  both path arguments as `openArray[]`

* Remove cruft

* CoreDb internally Use full account paths rather than addresses

* Update API logging

* Use hashed account address only in prototypes

why:
  This avoids unnecessary repeated hashing of the same account address.
  The burden of doing that is upon the application. In the case here,
  the ledger caches all kinds of stuff anyway so it is common sense to
  exploit that for account address hashes.

caveat:
  Using `openArray[byte]` argument types for hashed accounts is inherently
  fragile. In non-release mode, a length verification `doAssert` is
  enabled by default.

* No accPath in data record (use `AristoAccount` as `CoreDbAccount`)

* Remove now unused `eAddr` field from ledger `AccountRef` type

why:
  Is duplicate of lookup key

* Avoid merging the account record/statement in the ledger twice.
2024-06-27 19:21:01 +00:00
Jordan Hrycaj 61bbf40014
Update storage tree admin (#2419)
* Tighten `CoreDb` API for accounts

why:
  Apart from cruft, the way to fetch the accounts state root via a
  `CoreDbColRef` record was unnecessarily complicated.

* Extend `CoreDb` API for accounts to cover storage tries

why:
  In future, this will make the notion of column objects obsolete. Storage
  trees will then be indexed by the account address rather than the vertex
  ID equivalent like a `CoreDbColRef`.

* Apply new/extended accounts API to ledger and tests

details:
  This makes the `distinct_ledger` module obsolete

* Remove column object constructors

why:
  They were needed as an abstraction of MPT sub-trees including storage
  trees. Now, storage trees are handled by the account (e.g. via address)
  they belong to and all other trees can be identified by a constant well
  known vertex ID. So there is no need for column objects anymore.

  Still there are some left-over column object methods wnich will be
  removed next.

* Remove `serialise()` and `PayloadRef` from default Aristo API

why:
  Not needed. `PayloadRef` was used for unstructured/unknown payload
  formats (account or blob) and `serialise()` was used for decodng
  `PayloadRef`. Now it is known in advance what the payload looks
  like.

* Added query function `hasStorageData()` whether a storage area exists

why:
  Useful for supporting `slotStateEmpty()` of the `CoreDb` API

* In the `Ledger` replace `storage.stateEmpty()` by 	`slotStateEmpty()`

* On Aristo, hide the storage root/vertex ID in the `PayloadRef`

why:
  The storage vertex ID is fully controlled by Aristo while the
  `AristoAccount` object is controlled by the application. With the
  storage root part of the `AristoAccount` object, there was a useless
  administrative burden to keep that storage root field up to date.

* Remove cruft, update comments etc.

* Update changed MPT access paradigms

why:
  Fixes verified proxy tests

* Fluffy cosmetics
2024-06-27 09:01:26 +00:00
Jacek Sieka 9521582005
avoid closure environment for mpt methods (#2408)
An instance of `CoreDbMptRef` is created for and stored in every account
- when we are processing blocks and have many accounts in memory, this
closure environment takes up hundreds of mb of memory (around block 5M,
it is the 4:th largest memory consumer!) - incidentally, this also
removes a circular reference in the setup that causes the
`AristoCodeDbMptRef` to linger in memory much longer than it
has to which is the core reason why it takes so much.

The real solution here is to remove the methods indirection entirely,
but this PR provides relief until that has been done.

Similar treatment is given to some of the other core api functions to
avoid circulars there too.
2024-06-24 07:56:41 +02:00
Jacek Sieka 6b68ff92d3
Allocation-free nibbles buffer (#2406)
This buffer eleminates a large part of allocations during MPT traversal,
reducing overall memory usage and GC pressure.

Ideally, we would use it throughout in the API instead of
`openArray[byte]` since the built-in length limit appropriately exposes
the natural 64-nibble depth constraint that `openArray` fails to
capture.
2024-06-22 22:33:37 +02:00
Jacek Sieka 768307d91d
Cache code and invalid jump destination tables (fixes #2268) (#2404)
It is common for many accounts to share the same code - at the database
level, code is stored by hash meaning only one copy exists per unique
program but when loaded in memory, a copy is made for each account.

Further, every time we execute the code, it must be scanned for invalid
jump destinations which slows down EVM exeuction.

Finally, the extcodesize call causes code to be loaded even if only the
size is needed.

This PR improves on all these points by introducing a shared
CodeBytesRef type whose code section is immutable and that can be shared
between accounts. Further, a dedicated `len` API call is added so that
the EXTCODESIZE opcode can operate without polluting the GC and code
cache, for cases where only the size is requested - rocksdb will in this
case cache the code itself in the row cache meaning that lookup of the
code itself remains fast when length is asked for first.

With 16k code entries, there's a 90% hit rate which goes up to 99%
during the 2.3M attack - the cache significantly lowers memory
consumption and execution time not only during this event but across the
board.
2024-06-21 09:44:10 +02:00
Jordan Hrycaj 081cb15493
Coredb maintenance (#2398)
* CoreDb: remove PHK tries

why:
  There is no general use anymore for an MPT with a pre-hashed key. It
  was used to resemble the `SecureHexaryTrie` logic from the legacy DB.

  The only pace where this is needed is the `Leger` which uses a
  a distinct MPT version anyway (see `distinct_ledgers.nim`.)

* Rename `CoreDx*` -> `CoreDb*`

why:
  The naming `CoreDx*` was used to differentiate the new CoreDb API from
  the legacy API which had descriptors named `CoreDb*`.
2024-06-19 14:13:12 +00:00
Jordan Hrycaj e7be0d185c
Aristo uses pre classified tree types cont2 (#2397)
* Provide dedicated functions for fetching accounts and storage trees

why:
  Different prototypes for each class `account`, `generic` and
  `storage`.

* Remove `fetchPayload()` and other cruft from API, `aristo_fetch`, etc.

* Fix typos, debugging left overs, comments
2024-06-19 12:40:00 +00:00
andri lim 0e5fd3ffc9
LedgerRef: stateOrVoid become stateEmptyOrVoid (#2394) 2024-06-19 14:14:36 +02:00
Jacek Sieka 41cf81f80b
Fix dboptions init (#2391)
For the block cache to be shared between column families, the options
instance must be shared between the various column families being
created. This also ensures that there is only one source of truth for
configuration options instead of having two different sets depending on
how the tables were initialized.

This PR also removes the re-opening mechanism which can double startup
time - every time the database is opened, the log is replayed - a large
log file will take a long time to open.

Finally, several options got correclty implemented as column family
options, including an one that puts a hash index in the SST files.
2024-06-19 10:55:57 +02:00
Jordan Hrycaj 8727307ef4
Aristo uses pre classified tree types cont1 (#2389)
* Provide dedicated functions for deleteing accounts and storage trees

why:
  Storage trees are always linked to an account, so there is no need
  for an application to fiddle about (e.g. re-cycling, unlinking)
  storage tree vertex IDs.

* Remove `delete()` and other cruft from API, `aristo_delete`, etc.

* clean up delete functions

details:
  The delete implementations `deleteImpl()` and `delTreeImpl()` do not
  need to be super generic anymore as all the edge cases are covered by
  the specialised `deleteAccountPayload()`, `deleteGenericData()`, etc.

* Avoid unnecessary re-calculations of account keys

why:
  The function `registerAccountForUpdate()` did extract the storage ID
  (if any) and automatically marked the Merkle keys along the account
  path for re-hashing.

  This would also apply if there was later detected that the account
  or the storage tree did not need to be updated.

  So the `registerAccountForUpdate()` function was split into a part
  which retrieved the storage ID, and another one which marked the
  Merkle keys for re-calculation to be applied only when needed.
2024-06-18 19:30:01 +00:00
Jordan Hrycaj 51f02090b8
Aristo uses pre classified tree types (#2385)
* Remove unused `merge*()` functions (for production)

details:
  Some functionality moved to test suite

* Make sure that only `AccountData` leaf type is exactly used on VertexID(1)

* clean up payload type

* Provide dedicated functions for merging accounts and storage trees

why:
  Storage trees are always linked to an account, so there is no need
  for an application to fiddle about (e.e. creating, re-cycling) with
  storage tree vertex IDs.

* CoreDb: Disable tracer functionality

why:
  Must be updated to accommodate new/changed `Aristo` functions.

* CoreDb: Use new `mergeXXX()` functions

why:
  Makes explicit vertex ID management obsolete for creating new
  storage trees.

* Remove `mergePayload()` and other cruft from API, `aristo_merge`, etc.

* clean up merge functions

details:
  The merge implementation `mergePayloadImpl()` does not need to be super
  generic anymore as all the edge cases are covered by the specialised
  functions `mergeAccountPayload()`, `mergeGenericData()`, and
  `mergeStorageData()`.

* No tracer available at the moment, so disable offending tests
2024-06-18 11:14:02 +00:00