Commit Graph

35 Commits

Author SHA1 Message Date
Jacek Sieka c8cdffa775
Small cleanups (#2414)
* remove unnecessary / expensive error checking
* avoid some trivial memory allocs
* work around table move bug
2024-06-26 09:25:09 +02:00
Jacek Sieka f294d1e086
Clear account cache after each block (#2411)
When processing long ranges of blocks, the account cache grows unbounded
which cause huge memory spikes.

Here, we move the cache to a second-level cache after each block - the
second-level cache is cleared on the next block after that which creates
a simple LRU effect.

There's a small performance cost of course, though overall the freed-up
memory can now be reassigned to the rocksdb row cache which not only
makes up for the loss but overall leads to a performance increase.

The bump to 2gb of rocksdb row cache here needs more testing but is
slightly less and loosely basedy on the savings from this PR and the
circular ref fix in #2408 - another way to phrase this is that it's
better to give rocksdb more breathing room than let the memory sit
unused until circular ref collection happens ;)
2024-06-25 07:30:32 +02:00
Jacek Sieka 768307d91d
Cache code and invalid jump destination tables (fixes #2268) (#2404)
It is common for many accounts to share the same code - at the database
level, code is stored by hash meaning only one copy exists per unique
program but when loaded in memory, a copy is made for each account.

Further, every time we execute the code, it must be scanned for invalid
jump destinations which slows down EVM exeuction.

Finally, the extcodesize call causes code to be loaded even if only the
size is needed.

This PR improves on all these points by introducing a shared
CodeBytesRef type whose code section is immutable and that can be shared
between accounts. Further, a dedicated `len` API call is added so that
the EXTCODESIZE opcode can operate without polluting the GC and code
cache, for cases where only the size is requested - rocksdb will in this
case cache the code itself in the row cache meaning that lookup of the
code itself remains fast when length is asked for first.

With 16k code entries, there's a 90% hit rate which goes up to 99%
during the 2.3M attack - the cache significantly lowers memory
consumption and execution time not only during this event but across the
board.
2024-06-21 09:44:10 +02:00
Jordan Hrycaj 081cb15493
Coredb maintenance (#2398)
* CoreDb: remove PHK tries

why:
  There is no general use anymore for an MPT with a pre-hashed key. It
  was used to resemble the `SecureHexaryTrie` logic from the legacy DB.

  The only pace where this is needed is the `Leger` which uses a
  a distinct MPT version anyway (see `distinct_ledgers.nim`.)

* Rename `CoreDx*` -> `CoreDb*`

why:
  The naming `CoreDx*` was used to differentiate the new CoreDb API from
  the legacy API which had descriptors named `CoreDb*`.
2024-06-19 14:13:12 +00:00
andri lim 0e5fd3ffc9
LedgerRef: stateOrVoid become stateEmptyOrVoid (#2394) 2024-06-19 14:14:36 +02:00
Jacek Sieka 8926da02b6
Fix lowest-hanging fruit in VM (#2382)
* replace set with bitseq for code validity test
* remove unusued code from CodeStream
* avoid unnecessary byte-by-byte copies
2024-06-18 07:55:35 +07:00
Jacek Sieka 9c6fd46a51
avoid computing state root just to know if storage is empty (#2380)
The state root computation here is one of the major hotspots in block
processing - in the cases the code only needs to know if it's empty or
not, it can be done a lot faster.

Adding a separate function for this looks fragile and should probably be
revisited.
2024-06-17 15:29:07 +02:00
andri lim 69044dda60
Remove AccountStateDB (#2368)
* Remove AccountStateDB

AccountStateDB should no longer be used.
It's usage have been reduce to read only operations.
Replace it with LedgerRef to reduce maintenance burden.

* remove extra spaces

Co-authored-by: tersec <tersec@users.noreply.github.com>

---------

Co-authored-by: tersec <tersec@users.noreply.github.com>
2024-06-16 10:21:02 +07:00
Jacek Sieka f6be4bd0ec
avoid initTable (#2328)
`initTable` is obsolete since nim 0.19 and can introduce significant
memory overhead while providing no benefit (since the table will be
grown to the default initial size on first use anyway).

In particular, aristo layers will not necessarily use all tables they
initialize, for exampe when many empty accounts are being created.
2024-06-10 11:05:30 +02:00
web3-developer db8c5b90bd
Cleanup stateless and block witness code. (#2295)
* Cleanup unneeded stateless and block witness code. Keeping MultiKeys which is used in the eth_getProofsByBlockNumber RPC endpoint which is needed for the Fluffy state network bridge.

* Rename generateWitness flag to collectWitnessData to better describe what the flag does. We only collect the keys of the touched accounts and storage slots but no block witness generation is supported for now.

* Move remaining stateless code into nimbus directory.

* Add vmstate parameter to ChainRef to fix test.

* Exclude *.in from check copyright year

---------

Co-authored-by: jangko <jangko128@gmail.com>
2024-06-08 15:05:00 +07:00
Jordan Hrycaj 392088e5e9
Coredb fix storage tree issues (#2317)
* Code cosmetics

* Re-org `aristo_merge`, internally split into sub-modules

why:
  Became a burden for maintenance because it hosts two different
  functionalities under the same merge paradigm: account/data merge
  and snap proof merge where the latter produces a partial trie.

* Fix CoreDb tracer

* Ledger: fix potential account vs. storage tree sync problems

* Remove bound on the size of removable whole storage trees

* Activate `test_tracer_json`
2024-06-07 10:56:31 +00:00
Jacek Sieka 7f76586214
Speed up account ledger a little (#2279)
`persist` is a hotspot when processing blocks because it is run at least
once per transaction and loops over the entire account cache every time.

Here, we introduce an extra `dirty` map that keeps track of all accounts
that need checking during `persist` which fixes the immediate
inefficiency, though probably this could benefit from a more thorough
review - we also get rid of the unused clearCache flag - we start with
a fresh cache on every fresh vmState.

* avoid unnecessary code hash comparisons
* avoid unnecessary copies when iterating
* use EMPTY_CODE_HASH throughout for code hash comparison
2024-06-02 21:21:29 +02:00
Jordan Hrycaj 0f430c70fd
Aristo avoid storage trie update race conditions (#2251)
* Update TDD suite logger output format choices

why:
  New format is not practical for TDD as it just dumps data across a wide
  range (considerably larder than 80 columns.)

  So the new format can be turned on by function argument.

* Update unit tests samples configuration

why:
  Slightly changed the way to find the `era1` directory

* Remove compiler warnings (fix deprecated expressions and phrases)

* Update `Aristo` debugging tools

* Always update the `storageID` field of account leaf vertices

why:
  Storage tries are weekly linked to an account leaf object in that
  the `storageID` field is updated by the application.

  Previously, `Aristo` verified that leaf objects make sense when passed
  to the database. As a consequence
  * the database was inconsistent for a short while
  * the burden for correctness was all on the application which led
    to delayed error handling which is hard to debug.

  So `Aristo` will internally update the account leaf objects so that
  there are no race conditions due to the storage trie handling

* Aristo: Let `stow()`/`persist()` bail out unless there is a `VertexID(1)`

why:
  The journal and filter logic depends on the hash of the `VertexID(1)`
  which is commonly known as the state root. This implies that all
  changes to the database are somehow related to that.

* Make sure that a `Ledger` account does not overwrite the storage trie reference

why:
  Due to the abstraction of a sub-trie (now referred to as column with a
  hash describing its state) there was a weakness in the `Aristo` handler
  where an account leaf could be overwritten though changing the validity
  of the database. This has been changed and the database will now reject
  such changes.

  This patch fixes the behaviour on the application layer. In particular,
  the column handle returned by the `CoreDb` needs to be updated by
  the `Aristo` database state. This mitigates the problem that a storage
  trie might have vanished or re-apperaed with a different vertex ID.

* Fix sub-trie deletion test

why:
  Was originally hinged on `VertexID(1)` which cannot be wholesale
  deleted anymore after the last Aristo update. Also, running with
  `VertexID(2)` needs an artificial `VertexID(1)` for making `stow()`
  or `persist()` work.

* Cosmetics

* Activate `test_generalstate_json`

* Temporarily `deactivate test_tracer_json`

* Fix copyright header

---------

Co-authored-by: jordan <jordan@dry.pudding>
Co-authored-by: Jacek Sieka <jacek@status.im>
2024-05-30 17:48:38 +00:00
Jordan Hrycaj 3a62250d04
Make test op memory work again (#2236)
* Remove crufty `pruneTrie` arguments

* Replaced legacy `distinct_trie` logic by new `ledger` functionality

why:
  The module `distinct_trie` is supported by `Aristo` in trivial cases.

* Activate `test_op_memory`
2024-05-28 14:24:10 +00:00
Jacek Sieka 08e98eb385
restore a few tests, cleanup (#2234)
* remove `compensateLegacySetup`, `localDbOnly`
* enable trivially fixable tests
2024-05-28 14:49:35 +02:00
andri lim eb50dcc2f9
Noop in AccountsLedger.getCode if the codehash is EMPTY_CODE_HASH (#2229) 2024-05-27 10:26:26 +02:00
Jordan Hrycaj 9cc6e5a3aa
Aristo resume off line syncing on pre loaded database (#2203)
* Update some docu & messages

* Remove cruft from the ledger modules

* Must not overwrite genesis data on an initialised database

why:
  This will overwrite the global state of the Aristo single state DB.
  Otherwise resuming at the last synced state becomes impossible.

* Provide latest block number from journal

why:
  This relates the global state of the DB directly to the corresponding
  block number.

* Implemented unit test providing DB pre-load and resume
2024-05-22 13:41:14 +00:00
Jordan Hrycaj ee9aea171d
Culling legacy DB and accounts cache (#2197)
details:
+ Compiles nimbus all_tests
+ Failing tests have been commented out
2024-05-20 10:17:51 +00:00
Jacek Sieka 293ce28e4d
remove geth_db (#2189)
doesn't look like it's used
2024-05-16 22:00:20 +07:00
jangko d261484dd7
evm: Reject contract creation if the storage is non-empty(EIP-7610) 2024-05-07 09:19:59 +07:00
Jordan Hrycaj 7d9e1d8607
Misc updates for full sync (#2140)
* Code cosmetics

* Aristo+Kvt: Fix api wrappers

why:
  Api setup killed the backend descriptor when backend mapping was
  disabled.

* Aristo: Implement masked profiling entries

why:
  Database backend should be listed but not counted in tally

* CoreDb: Simplify backend() methods

why:
  DBMS backend access Was provided very early and over engineered. Now
  there are only two backend machines, one for `Kvt` and the other one
  for an `Mpt` available only via new API.

* CoreDb: Code cleanup regarding descriptor types

* CoreDb: Refactor/redefine `persistent()` methods

why:
  There were `persistent()` methods for any type of caching storage
  facilities `Kvt`, `Mpt`, `Phk`, and `Acc`. Now there is only a single
  `persistent()` method storing all facilities in tandem (similar to
  how transactions work.)

  For non shared `Kvt` tables, there is now an extra storage method
  `saveOffSite()`.

* CoreDb lingo update: `trie` becomes `column`

why:
  Notion of a `trie` is pretty much hidden by the new `CoreDb` api.
  Revealed are sort of database columns for accounts an storage data,
  any of which have an internal state represented by a Keccack hash.
  So a `trie` or `MPT` becomes a `column` and a `rootHash` becomes a
  column state.

* Aristo: rename backend filed `filters` => `journal`

* Update full sync logging

details:
  + Disable eth handler noise while syncing
  + Log journal depth (if available)

* Fix copyright year

* Fix cruft and unwanted imports
2024-04-19 18:37:27 +00:00
jangko 7a941c8ca0
Rename hasCodeOrNonce to contractCollision 2024-04-16 09:31:10 +07:00
Jordan Hrycaj 0d73637f14
Core db simplify new api storage modes (#2075)
* Aristo+Kvt: Fix backend `dup()` function in api setup

why:
  Backend object is subject to an inheritance cascade which was not
  taken care of, before. Only the base object was duplicated.

* Kvt: Simplify DB clone/peers management

* Aristo: Simplify DB clone/peers management

* Aristo: Adjust unit test for working with memory DB only

why:
  This currently causes some memory corruption persumably in the
  `libc` background layer.

* CoredDb+Kvt: Simplify API for KVT

why:
  Simplified storage models (was over engineered) for better performance
  and code maintenance.

* CoredDb+Aristo: Simplify API for `Aristo`

why:
  Only single database state needed here. Accessing a similar state will
  be implemented from outside this module using a context layer. This
  gives better performance and improves code maintenance.

* Fix Copyright headers

* CoreDb: Turn off API tracking

why:
  CI would ot go through. Was accidentally turned on.
2024-03-14 22:17:43 +00:00
andri lim 7722a907dc
Implement getAccessList of db/ledger (#2055) 2024-02-29 18:16:47 +07:00
andri lim 6ff2edc416
Fix styles (#2046)
* Fix styles

* Fix copyright year
2024-02-21 23:04:59 +07:00
Jordan Hrycaj d5a54f66ee
Core db+friends fix sync bail outs (#2026)
* Aristo: Update schedule runner for `hashify()`

why:
  Width-first schedule walker overlooked resolvable vertex and ended in
  a deadlock situation

* CoreDb+Aristo: Update error code for fringe condition

* Ledger: Remove redundant extra MPT hashing statements

why:
  These statement were used for troubleshooting

* CoreDb: Couch medicine for legacy DB backend

why:
  MPT will occasionally enter inconsistent states after deletion, some
  dirty fix is randomly added to mitgate that.
2024-02-14 10:02:09 +00:00
web3-developer 93fb4c8f59
More witness fixes (#2009)
* Update experimental rpc test to do further validation on proof responses.

* Enable zero value storage slots in the witness cache data so that proofs will be returned when a storage slot is updated to zero. Refactor and simplify implementation of getProofs endpoint.

* Improve test validation.

* Minor fixes and added test to track the changes introduced in every block using a local state.

* Refactor and cleanup test.

* Comments added to test and account cache fixes applied to account ledger.

* Return updated storage slots even when storage is empty and add test to build.

* Fix copyright and remove incorrect depth check during witness building in writeShortRlp assertion.
2024-02-09 12:09:02 +08:00
Jordan Hrycaj f1e9ca8526
Core db+ledger aristo backend update (#2006)
* CoreDb: Improve API and API tracking

why:
  Now logs state roots where appropriate

* CoreDb: re-implement `CoreDbVidRef` => `CoreDbTrieRef`

why:
  Instead of a root vertex ID wrapper, the purpose of this object type
  has been upgrades to a sub-trie prototype.

caveat:
  `Aristo` backend not fully functional, yet.

* CoreDb: Update `Aristo` backend

why:
  Supports virtual sub-tries

* CoreDb: Account address tracking for `StorageTrie` virtual tries

details:
  Supported with API tracking/logging

* CoreDb: Keep account address in payload object

why:
  No need to provide extra address argument for `merge()`, also
  provides tracking possibility for account debugging.

* Ledger: Update new API for `Aristo` specific storage trie handling

* CoreDb+Ledger: Update unit tests

* Fix copyright headers
2024-02-02 20:23:04 +00:00
web3-developer 54644fa3cb
Witness Generation Bug Fixes (#1981)
* Return slots when verifying witness regardless of code length.

* Prevent AccountCache WitnessData codeTouched from being reset.

* Default to using {wfNoFlag} for witness flags in tests to allow running with data from before EIP170.

* Add additional json files to experimental JSON RPC test.

* Use HTTP RPC server in tests.

* Add test to check that block witness contains bytecode.
2024-01-25 01:18:45 +08:00
Jordan Hrycaj a1161b537b
Core db update storage root management for sub tries (#1964)
* Aristo: Re-phrase `LayerDelta` and `LayerFinal` as object references

why:
  Avoids copying in some cases

* Fix copyright header

* Aristo: Verify `leafTie.root` function argument for `merge()` proc

why:
  Zero root will lead to inconsistent DB entry

* Aristo: Update failure condition for hash labels compiler `hashify()`

why:
  Node need not be rejected as long as links are on the schedule. In
  that case, `redo[]` is to become `wff.base[]` at a later stage.

  This amends an earlier fix, part of #1952 by also testing against
  the target nodes of the `wff.base[]` sets.

* Aristo: Add storage root glue record to `hashify()` schedule

why:
  An account leaf node might refer to a non-resolvable storage root ID.
  Storage root node chains will end up at the storage root. So the link
  `storage-root->account-leaf` needs an extra item in the schedule.

* Aristo: fix error code returned by `fetchPayload()`

details:
  Final error code is implied by the error code form the `hikeUp()`
  function.

* CoreDb: Discard `createOk` argument in API `getRoot()` function

why:
  Not needed for the legacy DB. For the `Arsto` DB, a lazy approach is
  implemented where a stprage root node is created on-the-fly.

* CoreDb: Prevent `$$` logging in some cases

why:
  Logging the function `$$` is not useful when it is used for internal
  use, i.e. retrieving an an error text for logging.

* CoreDb: Add `tryHashFn()` to API for pretty printing

why:
  Pretty printing must not change the hashification status for the
  `Aristo` DB. So there is an independent API wrapper for getting the
  node hash which never updated the hashes.

* CoreDb: Discard `update` argument in API `hash()` function

why:
  When calling the API function `hash()`, the latest state is always
  wanted. For a version that uses the current state as-is without checking,
  the function `tryHash()` was added to the backend.

* CoreDb: Update opaque vertex ID objects for the `Aristo` backend

why:
  For `Aristo`, vID objects encapsulate a numeric `VertexID`
  referencing a vertex (rather than a node hash as used on the
  legacy backend.) For storage sub-tries, there might be no initial
  vertex known when the descriptor is created. So opaque vertex ID
  objects are supported without a valid `VertexID` which will be
  initalised on-the-fly when the first item is merged.

* CoreDb: Add pretty printer for opaque vertex ID objects

* Cosmetics, printing profiling data

* CoreDb: Fix segfault in `Aristo` backend when creating MPT descriptor

why:
  Missing initialisation  error

* CoreDb: Allow MPT to inherit shared context on `Aristo` backend

why:
  Creates descriptors with different storage roots for the same
  shared `Aristo` DB descriptor.

* Cosmetics, update diagnostic message items for `Aristo` backend

* Fix Copyright year
2024-01-11 19:11:38 +00:00
Jordan Hrycaj 13f51939f6
Core db aristo hasher profiling and timing improvement (#1938)
* Explicitly use shared `Kvt` table on `Ledger` and `Clique` lookup.

why:
  Speeds up lookup time with `Aristo` backend. For writing `Clique` data,
  the `Companion` model allows to write `Clique` data past the database
  locked by evm transactions.

* Implement `CoreDb` profiling with API tracking

why:
  Chasing time spent per APT procs ...

* Implement `Ledger` profiling with API tracking

why:
  Chasing time spent per APT procs ...

* Always hashify when commiting or storing

why:
  A dirty cache makes no sense when committing

* Make sure that a zero key is created when adding/updating vertices

why:
  This is an error fix mainly for edge cases. A typical error was
  that the root key got deleted when there were only a few vertices
  left on the DB.

* Need all created and changed vertices zero-keyed on the cache

why:
  A zero key (i.e. empty Merkle hash) indicates that a vertex key
  needs to be updated. This would not be needed immediately after
  a merge as there is an actual leaf path on the cache layer. But
  after subsequent merge and delete operations this information
  might get blurred.

* Re-org hashing algorithm

why:
  Apart from errors, the previous implementation was too slow for
  two reasons:
  + some control hashes were calculated for debugging (now all
    verification is done in `aristo_check` module)
  + the leaf paths stored on the cache are used to build the
    labelling (aka hashing) schedule; there paths were accumulated
    over successive hash sessions although it is clear that all
    keys were generated, already
2023-12-12 17:47:41 +00:00
Jordan Hrycaj c47f021596
Core db and aristo updates for destructor and tx logic (#1894)
* Disable `TransactionID` related functions from `state_db.nim`

why:
  Functions `getCommittedStorage()` and `updateOriginalRoot()` from
  the `state_db` module are nowhere used. The emulation of a legacy
  `TransactionID` type functionality is administratively expensive to
  provide by `Aristo` (the legacy DB version is only partially
  implemented, anyway).

  As there is no other place where `TransactionID`s are used, they will
  not be provided by the `Aristo` variant of the `CoreDb`. For the
  legacy DB API, nothing will change.

* Fix copyright headers in source code

* Get rid of compiler warning

* Update Aristo code, remove unused `merge()` variant, export `hashify()`

why:
  Adapt to upcoming `CoreDb` wrapper

* Remove synced tx feature from `Aristo`

why:
+ This feature allowed to synchronise transaction methods like begin,
  commit, and rollback for a group of descriptors.
+ The feature is over engineered and not needed for `CoreDb`, neither
  is it complete (some convergence features missing.)

* Add debugging helpers to `Kvt`

also:
  Update database iterator, add count variable yield argument similar
  to `Aristo`.

* Provide optional destructors for `CoreDb` API

why;
  For the upcoming Aristo wrapper, this allows to control when certain
  smart destruction and update can take place. The auto destructor works
  fine in general when the storage/cache strategy is known and acceptable
  when creating descriptors.

* Add update option for `CoreDb` API function `hash()`

why;
  The hash function is typically used to get the state root of the MPT.
  Due to lazy hashing, this might be not available on the `Aristo` DB.
  So the `update` function asks for re-hashing the gurrent state changes
  if needed.

* Update API tracking log mode: `info` => `debug

* Use shared `Kvt` descriptor in new Ledger API

why:
  No need to create a new descriptor all the time
2023-11-16 19:35:03 +00:00
Jordan Hrycaj 3198ad1bbd
Fix default pruning for ledger and update core db and ledger logging (#1861)
* Make sure that storage tries are not pruned (by default) on the new Ledger API

why:
  Pruning might kill some unwanted entries from storage tries ending up with an unstable database
  leading to crashes.

* Implement `CoreDb` and `LedgerRef` API tracing

details:
+ Locally enabled at compile time via constants `ProvideCoreDbLegacyAPI`
  and `EnableApiTracking` in either `base.nim` source
+ If enabled it can be selectively turned on/off via public switches in
  the `CoreDb` descriptor.

* Allow suppressing opportunistic `ifNecessaryGetXxx()` functions

why:
  Better troubleshooting when the system crashes (assertions will then
  most probably happen outside an `async` function.)
2023-10-25 15:03:09 +01:00
jangko c005281391
processBeaconBlockRoot in TxPool(EIP-4788) 2023-10-19 07:50:07 +07:00
Jordan Hrycaj e8ad950e0a
Ledger abstraction for accounts cache (#1824)
* Provide TDD/debug facility for inspecting `persistBlocks()` working

detail:
+ Make sure that the last block of a test sample is the first batch
  item in `persistBlocks()`.
+ Additionally, allow `AccountsCache` API tracing by setting the flag
  `extraTraceMessages = true` in the file `accounts_cache.nim`

* Overload AccountsCache by abstraction wrapper

details:
  Can facilitate CoreDb API switch, details in `ledger/README.md`.
2023-10-18 20:27:22 +01:00