Commit Graph

28 Commits

Author SHA1 Message Date
Jordan Hrycaj 1452e7b1c0
Misc updates (#2513)
* Update config for Ledger and CoreDb

why:
  Prepare for tracer which depends on the API jump table (as well as
  the profiler.) The API jump table is now enabled in unit/integration
  test mode piggybacking on the `unittest2DisableParamFiltering`
  compiler flag or on an extra compiler flag `dbjapi_enabled`.

* No deed for error field in `NodeRef`

why:
  Was opnly needed by proof nodes pre-loader which will be re-implemented

* Cosmetics
2024-07-22 18:10:04 +00:00
Jordan Hrycaj f08178c592
Separate constructor helpers for core db and ledger (#2480)
* Extract `CoreDb` constructor helpers from `base.nim` into separate module

why:
  This makes it easier to avoid circular imports.

* Extract `Ledger` constructor helpers from `base.nim` into separate module

why:
  Move `accounts_ledger.nim` file to sub-folder `backend`. That way the
  layout resembles that of the `core_db`.
2024-07-12 19:32:31 +00:00
Jordan Hrycaj b924fdcaa7
Separate config for core db and ledger (#2479)
* Updates and corrections

* Extract `CoreDb` configuration from `base.nim` into separate module

why:
  This makes it easier to avoid circular imports, in particular
  when the capture journal (aka tracer) is revived.

* Extract `Ledger` configuration from `base.nim` into separate module

why:
  This makes it easier to avoid circular imports (if any.)

also:
  Move `accounts_ledger.nim` file to sub-folder `backend`. That way the
  layout resembles that of the `core_db`.
2024-07-12 13:12:25 +00:00
Jacek Sieka 81e75622cf
storage: store root id together with vid, for better locality of refe… (#2449)
The state and account MPT:s currenty share key space in the database
based on that vertex id:s are assigned essentially randomly, which means
that when two adjacent slot values from the same contract are accessed,
they might reside at large distance from each other.

Here, we prefix each vertex id by its root causing them to be sorted
together thus bringing all data belonging to a particular contract
closer together - the same effect also happens for the main state MPT
whose nodes now end up clustered together more tightly.

In the future, the prefix given to the storage keys can also be used to
perform range operations such as reading all the storage at once and/or
deleting an account with a batch operation.

Notably, parts of the API already supported this rooting concept while
parts didn't - this PR makes the API consistent by always working with a
root+vid.
2024-07-04 15:46:52 +02:00
Jordan Hrycaj 6dc2773957
Only use pre hashed addresses as account keys (#2424)
* Normalised storage tree addressing in function prototypes

detail:
  Argument list is always `<db> <account-path> <slot-path> ..` with
  both path arguments as `openArray[]`

* Remove cruft

* CoreDb internally Use full account paths rather than addresses

* Update API logging

* Use hashed account address only in prototypes

why:
  This avoids unnecessary repeated hashing of the same account address.
  The burden of doing that is upon the application. In the case here,
  the ledger caches all kinds of stuff anyway so it is common sense to
  exploit that for account address hashes.

caveat:
  Using `openArray[byte]` argument types for hashed accounts is inherently
  fragile. In non-release mode, a length verification `doAssert` is
  enabled by default.

* No accPath in data record (use `AristoAccount` as `CoreDbAccount`)

* Remove now unused `eAddr` field from ledger `AccountRef` type

why:
  Is duplicate of lookup key

* Avoid merging the account record/statement in the ledger twice.
2024-06-27 19:21:01 +00:00
Jordan Hrycaj 61bbf40014
Update storage tree admin (#2419)
* Tighten `CoreDb` API for accounts

why:
  Apart from cruft, the way to fetch the accounts state root via a
  `CoreDbColRef` record was unnecessarily complicated.

* Extend `CoreDb` API for accounts to cover storage tries

why:
  In future, this will make the notion of column objects obsolete. Storage
  trees will then be indexed by the account address rather than the vertex
  ID equivalent like a `CoreDbColRef`.

* Apply new/extended accounts API to ledger and tests

details:
  This makes the `distinct_ledger` module obsolete

* Remove column object constructors

why:
  They were needed as an abstraction of MPT sub-trees including storage
  trees. Now, storage trees are handled by the account (e.g. via address)
  they belong to and all other trees can be identified by a constant well
  known vertex ID. So there is no need for column objects anymore.

  Still there are some left-over column object methods wnich will be
  removed next.

* Remove `serialise()` and `PayloadRef` from default Aristo API

why:
  Not needed. `PayloadRef` was used for unstructured/unknown payload
  formats (account or blob) and `serialise()` was used for decodng
  `PayloadRef`. Now it is known in advance what the payload looks
  like.

* Added query function `hasStorageData()` whether a storage area exists

why:
  Useful for supporting `slotStateEmpty()` of the `CoreDb` API

* In the `Ledger` replace `storage.stateEmpty()` by 	`slotStateEmpty()`

* On Aristo, hide the storage root/vertex ID in the `PayloadRef`

why:
  The storage vertex ID is fully controlled by Aristo while the
  `AristoAccount` object is controlled by the application. With the
  storage root part of the `AristoAccount` object, there was a useless
  administrative burden to keep that storage root field up to date.

* Remove cruft, update comments etc.

* Update changed MPT access paradigms

why:
  Fixes verified proxy tests

* Fluffy cosmetics
2024-06-27 09:01:26 +00:00
Jacek Sieka f294d1e086
Clear account cache after each block (#2411)
When processing long ranges of blocks, the account cache grows unbounded
which cause huge memory spikes.

Here, we move the cache to a second-level cache after each block - the
second-level cache is cleared on the next block after that which creates
a simple LRU effect.

There's a small performance cost of course, though overall the freed-up
memory can now be reassigned to the rocksdb row cache which not only
makes up for the loss but overall leads to a performance increase.

The bump to 2gb of rocksdb row cache here needs more testing but is
slightly less and loosely basedy on the savings from this PR and the
circular ref fix in #2408 - another way to phrase this is that it's
better to give rocksdb more breathing room than let the memory sit
unused until circular ref collection happens ;)
2024-06-25 07:30:32 +02:00
Jacek Sieka 768307d91d
Cache code and invalid jump destination tables (fixes #2268) (#2404)
It is common for many accounts to share the same code - at the database
level, code is stored by hash meaning only one copy exists per unique
program but when loaded in memory, a copy is made for each account.

Further, every time we execute the code, it must be scanned for invalid
jump destinations which slows down EVM exeuction.

Finally, the extcodesize call causes code to be loaded even if only the
size is needed.

This PR improves on all these points by introducing a shared
CodeBytesRef type whose code section is immutable and that can be shared
between accounts. Further, a dedicated `len` API call is added so that
the EXTCODESIZE opcode can operate without polluting the GC and code
cache, for cases where only the size is requested - rocksdb will in this
case cache the code itself in the row cache meaning that lookup of the
code itself remains fast when length is asked for first.

With 16k code entries, there's a 90% hit rate which goes up to 99%
during the 2.3M attack - the cache significantly lowers memory
consumption and execution time not only during this event but across the
board.
2024-06-21 09:44:10 +02:00
Jordan Hrycaj 081cb15493
Coredb maintenance (#2398)
* CoreDb: remove PHK tries

why:
  There is no general use anymore for an MPT with a pre-hashed key. It
  was used to resemble the `SecureHexaryTrie` logic from the legacy DB.

  The only pace where this is needed is the `Leger` which uses a
  a distinct MPT version anyway (see `distinct_ledgers.nim`.)

* Rename `CoreDx*` -> `CoreDb*`

why:
  The naming `CoreDx*` was used to differentiate the new CoreDb API from
  the legacy API which had descriptors named `CoreDb*`.
2024-06-19 14:13:12 +00:00
andri lim 69044dda60
Remove AccountStateDB (#2368)
* Remove AccountStateDB

AccountStateDB should no longer be used.
It's usage have been reduce to read only operations.
Replace it with LedgerRef to reduce maintenance burden.

* remove extra spaces

Co-authored-by: tersec <tersec@users.noreply.github.com>

---------

Co-authored-by: tersec <tersec@users.noreply.github.com>
2024-06-16 10:21:02 +07:00
web3-developer db8c5b90bd
Cleanup stateless and block witness code. (#2295)
* Cleanup unneeded stateless and block witness code. Keeping MultiKeys which is used in the eth_getProofsByBlockNumber RPC endpoint which is needed for the Fluffy state network bridge.

* Rename generateWitness flag to collectWitnessData to better describe what the flag does. We only collect the keys of the touched accounts and storage slots but no block witness generation is supported for now.

* Move remaining stateless code into nimbus directory.

* Add vmstate parameter to ChainRef to fix test.

* Exclude *.in from check copyright year

---------

Co-authored-by: jangko <jangko128@gmail.com>
2024-06-08 15:05:00 +07:00
Jordan Hrycaj e9eae4df70
Core db disable legacy api n remove distinct tries (#2299)
* CoreDb: Remove crufty second/off-site KVT

why:
  Was used to allow late `Clique` to store directly to disk

* CoreDb: Remove prune flag related functionality

why:
  Is completely legacy stuff

* CoreDb: Remove dependence on legacy API (tests unsupported yet)

why:
  Does not fully support Aristo

* Re-factoring `state_db` using new API

details:
  Only minimum changes needed to compile `nimbus`

* Update tests and aux modules

* Turn off legacy API and remove `distinct_tries`

comment:
  The legacy API has now cruft status, will be removed soon

* Fix copyright years

* Update rpc for verified proxy

---------

Co-authored-by: Jacek Sieka <jacek@status.im>
2024-06-05 20:52:04 +00:00
Jacek Sieka 7f76586214
Speed up account ledger a little (#2279)
`persist` is a hotspot when processing blocks because it is run at least
once per transaction and loops over the entire account cache every time.

Here, we introduce an extra `dirty` map that keeps track of all accounts
that need checking during `persist` which fixes the immediate
inefficiency, though probably this could benefit from a more thorough
review - we also get rid of the unused clearCache flag - we start with
a fresh cache on every fresh vmState.

* avoid unnecessary code hash comparisons
* avoid unnecessary copies when iterating
* use EMPTY_CODE_HASH throughout for code hash comparison
2024-06-02 21:21:29 +02:00
andri lim eaf3d9897e
Simplify AccountsLedgerRef complexity (#2239) 2024-05-29 13:06:49 +02:00
Jordan Hrycaj 2c322054e0
Attempt to roll back stateless mode implementation in a single PR (#2209)
* Attempt to roll back stateless mode implementation in a single PR

why:
+ Stateless mode is not fully working and in the way
+ Single PR should make it feasible to investigate for a possible
  re-implementation

* Fix copyright year

* Fix annotation for exception (evmc mode)
2024-05-22 21:01:19 +00:00
Jordan Hrycaj 7d9e1d8607
Misc updates for full sync (#2140)
* Code cosmetics

* Aristo+Kvt: Fix api wrappers

why:
  Api setup killed the backend descriptor when backend mapping was
  disabled.

* Aristo: Implement masked profiling entries

why:
  Database backend should be listed but not counted in tally

* CoreDb: Simplify backend() methods

why:
  DBMS backend access Was provided very early and over engineered. Now
  there are only two backend machines, one for `Kvt` and the other one
  for an `Mpt` available only via new API.

* CoreDb: Code cleanup regarding descriptor types

* CoreDb: Refactor/redefine `persistent()` methods

why:
  There were `persistent()` methods for any type of caching storage
  facilities `Kvt`, `Mpt`, `Phk`, and `Acc`. Now there is only a single
  `persistent()` method storing all facilities in tandem (similar to
  how transactions work.)

  For non shared `Kvt` tables, there is now an extra storage method
  `saveOffSite()`.

* CoreDb lingo update: `trie` becomes `column`

why:
  Notion of a `trie` is pretty much hidden by the new `CoreDb` api.
  Revealed are sort of database columns for accounts an storage data,
  any of which have an internal state represented by a Keccack hash.
  So a `trie` or `MPT` becomes a `column` and a `rootHash` becomes a
  column state.

* Aristo: rename backend filed `filters` => `journal`

* Update full sync logging

details:
  + Disable eth handler noise while syncing
  + Log journal depth (if available)

* Fix copyright year

* Fix cruft and unwanted imports
2024-04-19 18:37:27 +00:00
jangko 7a941c8ca0
Rename hasCodeOrNonce to contractCollision 2024-04-16 09:31:10 +07:00
Jordan Hrycaj 14a5f46d13
Core db implement ctx layer for mpt state admin (#2082)
* CoreDb+Ledger: Update logging

why:
  Use symbol `api` rather than `ctx` because the latter will be used
  as name for particular objects

* CoreDb: Remove cruft

* CoreDb: Remove `TxID` support

why:
  It is nowhere used and ugly implemented. The upcoming context layer
  will be a cleaner alternative to use, instead should this particular
  functionality be needed.

* CoreDb: Rearrange base methods in source code for better reading

* CoreDb+Aristo: Update API closures for better reading & maintenance

* CoreDb: Implement context layer for MPT

why:
  On `Aristo` the context layer allows to manage different views on
  the same backend database. This is an abstraction of the legacy
  hexary trie which can be localised on a particular root nose.

details:
  The `ctx` context provides the state (equiv. to state root) of the
  database for MPT and account descriptors.

* Fix Copyright headers
2024-03-18 19:40:23 +00:00
Jordan Hrycaj 587ca3abbe
Coredb use stackable api for aristo backend (#2060)
* Aristo/Kvt: Provide function hooks APIs

why:
  These APIs can be used for installing tracers, profiling functoinality,
  and other niceties on the databases.

* Aristo: Provide optional API profiling

details:
  It basically is a re-implementation of the `CoreDb` profiling
  implementation

* Kvt: Provide optional API profiling similar to `Aristo`

* CoreDb: Re-implementing profiling using `aristo_profile`

* Ledger: Re-implementing profiling using `aristo_profile`

* CoreDb: Update unit tests for maintainability

* update copyright dates
2024-02-29 21:10:24 +00:00
andri lim 7722a907dc
Implement getAccessList of db/ledger (#2055) 2024-02-29 18:16:47 +07:00
andri lim 6ff2edc416
Fix styles (#2046)
* Fix styles

* Fix copyright year
2024-02-21 23:04:59 +07:00
Jordan Hrycaj 13f51939f6
Core db aristo hasher profiling and timing improvement (#1938)
* Explicitly use shared `Kvt` table on `Ledger` and `Clique` lookup.

why:
  Speeds up lookup time with `Aristo` backend. For writing `Clique` data,
  the `Companion` model allows to write `Clique` data past the database
  locked by evm transactions.

* Implement `CoreDb` profiling with API tracking

why:
  Chasing time spent per APT procs ...

* Implement `Ledger` profiling with API tracking

why:
  Chasing time spent per APT procs ...

* Always hashify when commiting or storing

why:
  A dirty cache makes no sense when committing

* Make sure that a zero key is created when adding/updating vertices

why:
  This is an error fix mainly for edge cases. A typical error was
  that the root key got deleted when there were only a few vertices
  left on the DB.

* Need all created and changed vertices zero-keyed on the cache

why:
  A zero key (i.e. empty Merkle hash) indicates that a vertex key
  needs to be updated. This would not be needed immediately after
  a merge as there is an actual leaf path on the cache layer. But
  after subsequent merge and delete operations this information
  might get blurred.

* Re-org hashing algorithm

why:
  Apart from errors, the previous implementation was too slow for
  two reasons:
  + some control hashes were calculated for debugging (now all
    verification is done in `aristo_check` module)
  + the leaf paths stored on the cache are used to build the
    labelling (aka hashing) schedule; there paths were accumulated
    over successive hash sessions although it is clear that all
    keys were generated, already
2023-12-12 17:47:41 +00:00
Jordan Hrycaj 5462c05dc6
Core db update api tracking (#1907)
* Fix copyright year

* Show elapsed times with enabled `CoreDb` API tracking

* Show elapsed times with enabled `LedgerRef` API tracking

* Reorg `CoreDb` auto destructors for `Aristo` DB

why:
  While `Aristo` supports some parallelism for concurrent database access,
  this comes with a price of management overhead. With a naive approach,
  the auto-destructor will slow down execution because the ledger and
  evm treat the database in a shared mode where a DB descriptor is just
  created and thrown away shortly after.

  This is reflected in the `Coredb` abstraction layer above `Aristo`/`Kvt`
  where a few `Shared` type descriptors are cached and a shared reference
  is returned rather than a disposable new object.

* For `CoreDb` support transaction level tracking

details:
  This is mainly an extra for the legacy DB as `Aristo` and `Kvt` support
  this already.

  Also return an error on the legacy DB backend when `persistent()` is
  called while there are transactions pending (the `persistent()` call
  does nothing otherwise on the legacy backend.)

* Clear compiler warnings (remove unused variables etc.)
2023-11-24 22:16:21 +00:00
Jordan Hrycaj 610e2d338d
Core db fix legacy db root vertex fetcher (#1899)
* Using different `tmp` directories for `Kvt` and `Aristo`

why:
  Closing one database would leave the other set of directories
  incomplete.

* Code cosmetics, silence compiler

* Fix typo `EMPTY_ROOT_HASH` vs. `EMPTY_CODE_HASH`

* Fix copyright years
2023-11-20 20:22:27 +00:00
Jordan Hrycaj 3e88589eb1
Optional accounts cache module for creating genesis (#1897)
* Split off `ReadOnlyStateDB` from `AccountStateDB` from `state_db.nim`

why:
  Apart from testing, applications use `ReadOnlyStateDB` as an easy
  way to access the accounts ledger. This is well supported by the
  `Aristo` db, but writable mode is only parially supported.

  The writable AccountStateDB` object for modifying accounts is not
  used by production code.

  So, for lecgacy and testing apps, the full support of the previous
  `AccountStateDB` is now enabled by `import db/state_db/read_write`
  and the `import db/state_db` provides read-only mode.

* Encapsulate `AccountStateDB` as `GenesisLedgerRef` or genesis creation

why:
  `AccountStateDB` has poor support for `Aristo` and is not widely used
   in favour of `AccountsLedger` (which will be abstracted as `ledger`.)

   Currently, using other than the `AccountStateDB` ledgers within the
   `GenesisLedgerRef` wrapper is experimental and test only. Eventually,
    the wrapper should disappear so that the `Ledger` object (which
    encapsulates `AccountsCache` and `AccountsLedger`) will prevail.

* For the `Ledger`, provide access to raw accounts `MPT`

why:
  This gives to the `CoreDbMptRef` descriptor from the `CoreDb` (which is
  the legacy version of 	CoreDxMptRef`.) For the new `ledger` API, the
  accounts are based on the `CoreDxMAccRef` descriptor which uses a
  particular sub-system for accounts while legacy applications use the
  `CoreDbPhkRef` equivalent of the `SecureHexaryTrie`.

  The only place where this feature will currently be used is the
 `genesis.nim` source file.

* Fix `Aristo` bugs, missing boundary checks, typos, etc.

* Verify root vertex in `MPT` and account constructors

why:
  Was missing so far, in particular the accounts constructor must
  verify `VertexID(1)

* Fix include file
2023-11-20 11:51:43 +00:00
Jordan Hrycaj c47f021596
Core db and aristo updates for destructor and tx logic (#1894)
* Disable `TransactionID` related functions from `state_db.nim`

why:
  Functions `getCommittedStorage()` and `updateOriginalRoot()` from
  the `state_db` module are nowhere used. The emulation of a legacy
  `TransactionID` type functionality is administratively expensive to
  provide by `Aristo` (the legacy DB version is only partially
  implemented, anyway).

  As there is no other place where `TransactionID`s are used, they will
  not be provided by the `Aristo` variant of the `CoreDb`. For the
  legacy DB API, nothing will change.

* Fix copyright headers in source code

* Get rid of compiler warning

* Update Aristo code, remove unused `merge()` variant, export `hashify()`

why:
  Adapt to upcoming `CoreDb` wrapper

* Remove synced tx feature from `Aristo`

why:
+ This feature allowed to synchronise transaction methods like begin,
  commit, and rollback for a group of descriptors.
+ The feature is over engineered and not needed for `CoreDb`, neither
  is it complete (some convergence features missing.)

* Add debugging helpers to `Kvt`

also:
  Update database iterator, add count variable yield argument similar
  to `Aristo`.

* Provide optional destructors for `CoreDb` API

why;
  For the upcoming Aristo wrapper, this allows to control when certain
  smart destruction and update can take place. The auto destructor works
  fine in general when the storage/cache strategy is known and acceptable
  when creating descriptors.

* Add update option for `CoreDb` API function `hash()`

why;
  The hash function is typically used to get the state root of the MPT.
  Due to lazy hashing, this might be not available on the `Aristo` DB.
  So the `update` function asks for re-hashing the gurrent state changes
  if needed.

* Update API tracking log mode: `info` => `debug

* Use shared `Kvt` descriptor in new Ledger API

why:
  No need to create a new descriptor all the time
2023-11-16 19:35:03 +00:00
Jordan Hrycaj 3198ad1bbd
Fix default pruning for ledger and update core db and ledger logging (#1861)
* Make sure that storage tries are not pruned (by default) on the new Ledger API

why:
  Pruning might kill some unwanted entries from storage tries ending up with an unstable database
  leading to crashes.

* Implement `CoreDb` and `LedgerRef` API tracing

details:
+ Locally enabled at compile time via constants `ProvideCoreDbLegacyAPI`
  and `EnableApiTracking` in either `base.nim` source
+ If enabled it can be selectively turned on/off via public switches in
  the `CoreDb` descriptor.

* Allow suppressing opportunistic `ifNecessaryGetXxx()` functions

why:
  Better troubleshooting when the system crashes (assertions will then
  most probably happen outside an `async` function.)
2023-10-25 15:03:09 +01:00
Jordan Hrycaj e8ad950e0a
Ledger abstraction for accounts cache (#1824)
* Provide TDD/debug facility for inspecting `persistBlocks()` working

detail:
+ Make sure that the last block of a test sample is the first batch
  item in `persistBlocks()`.
+ Additionally, allow `AccountsCache` API tracing by setting the flag
  `extraTraceMessages = true` in the file `accounts_cache.nim`

* Overload AccountsCache by abstraction wrapper

details:
  Can facilitate CoreDb API switch, details in `ledger/README.md`.
2023-10-18 20:27:22 +01:00