Commit Graph

1915 Commits

Author SHA1 Message Date
Jordan Hrycaj 7becf4e389
Remove vertex ID recycle function (#2558)
why:
  It is not safe in general to recycle vertex IDs while the `RocksDb`
  cache has `VertexID` rather than `RootedVertexID` where the former
  type seems preferable.

  In some fringe cases one might remove a vertex with key `(root1,vid)`
  and insert another vertex with key `(root2,vid)` while re-using the
  vertex ID `vid`. Without knowledge of `root1` and `root2`, the LRU
  cache will return the same vertex for both `(root1,vid)` and
  `(root2,vid)`.
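A minimal Python sketch of this hazard, assuming a plain LRU cache. The `LruCache` class and the root/vertex ID values below are hypothetical and do not reflect the actual `RocksDb` cache code:

```python
from collections import OrderedDict


class LruCache:
    """Tiny LRU cache (hypothetical, for illustration only)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, value) -> None:
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used

    def delete(self, key) -> None:
        self.entries.pop(key, None)


ROOT1, ROOT2, vid = 1, 2, 7  # hypothetical IDs

vid_cache = LruCache(16)     # keyed by VertexID only
rooted_cache = LruCache(16)  # keyed by (root, VertexID)

# Store a vertex under root1, then delete it.
vid_cache.put(vid, "leaf in root1")
rooted_cache.put((ROOT1, vid), "leaf in root1")
vid_cache.delete(vid)
rooted_cache.delete((ROOT1, vid))

# Recycle `vid` for a vertex under root2.
vid_cache.put(vid, "leaf in root2")
rooted_cache.put((ROOT2, vid), "leaf in root2")

# A lookup meant for (root1, vid) can only pass `vid` to the first cache.
print(vid_cache.get(vid))              # leaf in root2 (wrong vertex)
print(rooted_cache.get((ROOT1, vid)))  # None (correctly absent)
```

Keying by `(root, vid)` avoids the collision at the cost of larger cache keys.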
2024-08-12 20:56:15 +00:00
Jacek Sieka 19451cadff
rebalance rocksdb cache sizes (#2557)
Based on some simple testing done with a few combinations of cache
sizes, it seems that the block cache has grown in importance compared to
where we were before changing the on-disk format and adding a lot of
other point caches.

With these settings, there's roughly a 15% performance increase when
processing blocks in the 18M range over the status quo while memory
usage decreases by more than 1 GB!

Only a few values were tested so there's certainly more to do here but
this change sets up a better baseline for any future optimizations.

In particular, since the initial defaults were chosen, root vertex IDs
were introduced as key prefixes, meaning that storage for each account
will be grouped together and thus it becomes more likely that a block
loaded from disk will be hit multiple times - this seems to give the
block cache an edge over the row cache, especially when traversing the
storage trie.
2024-08-12 05:52:09 +00:00
andri lim b8e128203f
Rewire blockValue from Txpool to EngineAPI (#2554) 2024-08-09 06:05:18 +07:00
Jacek Sieka 3dc30195ad
log http/jwt information on startup (#2553) 2024-08-08 10:03:30 +00:00
Jacek Sieka 094486d0ce
Hash bump 2024-08-08 07:46:35 +02:00
Jacek Sieka 3cefd7ed38
move db init to init (#2552)
When using the common interface, the database always (potentially) needs
init - take the opportunity to log some basic database info on startup.
2024-08-08 07:45:30 +02:00
andri lim d5786758b5
TxPool: Merge tx_chain and tx_packer to reduce complexity (#2549)
* TxPool: Merge tx_chain and tx_packer to reduce complexity

* Fix copyright year
2024-08-07 22:35:17 +07:00
Jordan Hrycaj 38572bd8ea
Cache a storage root ID forever in the leaf payload of an account (#2551)
details:
  Stale root IDs are marked disabled while the ID is kept in the leaf
  payload.

why:
  This might lead to further caching advantages.
2024-08-07 13:28:01 +00:00
Jordan Hrycaj 488bdbc267
Provide portal proof functionality with coredb (#2550)
* Provide portal proof functions in `aristo_api`

why:
  So it can be fully supported by `CoreDb`

* Fix prototype in `kvt_api`

* Fix node constructor for account leafs with storage trees

* Provide simple path check based on portal proof functionality

* Provide portal proof functionality in `CoreDb`

* Update TODO list
2024-08-07 11:30:55 +00:00
andri lim 3cef119b78
Return empty list instead of error in getPooledTxs handler (#2547) 2024-08-06 22:06:48 +07:00
Jordan Hrycaj 6bae929439
Added comments (#2546) 2024-08-06 12:43:39 +00:00
Jordan Hrycaj 5b502a06c4
Added portal proof nodes generation functionality (#2539)
* Extracted `test_tx.testTxMergeProofAndKvpList()` => separate file

* Fix serialiser

why:
  A typo led to duplicate rlp-encoded nodes in the chain

* Remove cruft

* Implement portal proof nodes generators `partXxxTwig()`

* Add unit test for portal proof nodes generator `partAccountTwig()`

* Cosmetics

* Simplify serialiser return code format

* Fix proof generator for extension nodes

why:
  Code was simply bonkers; this was not detected before the unit tests
  were adapted to check for just this.

* Implemented portal proof nodes verifier `partUntwig()`

* Cosmetics

* Fix `testutp` cli problem
2024-08-06 11:29:26 +00:00
andri lim ec118a438a
Refactor txpool: reduce complexity (#2542) 2024-08-06 16:12:56 +07:00
andri lim 9dacfed943
Disable txpool in eth wire protocol handler (#2540) 2024-08-06 11:26:55 +07:00
Jordan Hrycaj 01b5c08763
Revive json tracer unit tests (#2538)
* Some `Aristo` clean-ups/updates

* Re-implemented core-db tracer functionality

* Rename nimbus tracer `no-tracer.nim` => `tracer.nim`

why:
  Restore original name for easy diff tracking with upcoming update

* Update nimbus tracer using new core-db tracer functionality

* Updating json tracer unit tests

* Enable json tracer unit tests
2024-08-01 10:41:20 +00:00
andri lim e331c9e9b7
TxPool: Replace GasPrice and GasPriceEx with GasInt (#2537)
* TxPool: Replace GasPrice and GasPriceEx with GasInt
2024-07-31 14:33:30 +07:00
Jordan Hrycaj 72c3ab8ced
Provide partial tree support for preloading tests (#2536)
* Implement partial trees

why:
  This is currently needed for unit tests to pre-load the database
  with test data similar to `proof` node pre-load.

  The basic features for `snap-sync` boundary proofs are available
  as well for future use. What is missing is the final proof verification
  and a complete storage data load/merge function (stub is available.)

* Cosmetics, clean up
2024-07-29 20:15:17 +00:00
Jacek Sieka bdc86b3fd4
small cleanups (#2526)
* remove some redundant EH
* avoid pessimising move (introduces a copy in this case!)
* shift less data around when reading era files (reduces stack usage)
2024-07-26 12:32:01 +07:00
andri lim 254bda365f
Remove txpool sender locality (#2525)
* Remove txpool sender locality

We no longer distinguish between local and remote senders

* Fix copyright year
2024-07-25 22:36:08 +07:00
andri lim 0cc730dd05
Fix CodeBytes: invalidPositions out of bound crash (#2523) 2024-07-25 19:23:53 +07:00
andri lim 01ba18da74
Fix sepolia chain config: mergeForkBlock -> 1450409 (#2518)
* Fix sepolia chain config: mergeForkBlock -> 1450407

* Fix test_forkid
2024-07-24 03:07:55 +00:00
Advaita Saha 08bbb0079f
faster slot finding in nimbus import (#2491)
* faster slot finding in nimbus import

* feat: blocknumber based slot finding

* fix: formatting

* added comments

* fix: added is_execution_block

* added comment
2024-07-22 21:17:07 +00:00
Jordan Hrycaj 1452e7b1c0
Misc updates (#2513)
* Update config for Ledger and CoreDb

why:
  Prepare for tracer which depends on the API jump table (as well as
  the profiler.) The API jump table is now enabled in unit/integration
  test mode piggybacking on the `unittest2DisableParamFiltering`
  compiler flag or on an extra compiler flag `dbjapi_enabled`.

* No need for error field in `NodeRef`

why:
  Was only needed by the proof nodes pre-loader which will be re-implemented

* Cosmetics
2024-07-22 18:10:04 +00:00
andri lim 6d03acec30
TxPool refactoring: Simplify TxChainRef and remove gauges (#2506)
This is one part of the txPool refactoring series to make it ready
for integration with the new ForkedChainRef
2024-07-19 16:24:36 +07:00
andri lim fb196849ee
EVM cosmetic changes, one less indirect access of VmCpt (#2503) 2024-07-19 08:44:01 +07:00
Jordan Hrycaj 5ac362fe6f
Aristo and kvt balancer management update (#2504)
* Aristo: Merge `delta_siblings` module into `deltaPersistent()`

* Aristo: Add `isEmpty()` for canonical checking whether a layer is empty

* Aristo: Merge `LayerDeltaRef` into `LayerObj`

why:
  No need to maintain nested object refs anymore. Previously the
  `LayerDeltaRef` object had a companion `LayerFinalRef` which held
  non-delta layer information.

* Kvt: Merge `LayerDeltaRef` into `LayerRef`

why:
  No need to maintain nested object refs (as with `Aristo`)

* Kvt: Re-write balancer logic similar to `Aristo`

why:
  Although `Kvt` was a cheap copy of `Aristo` it sort of got out of
  sync and the balancer code was wrong.

* Update iterator over forked peers

why:
  Yield additional field `isLast` indicating that the last iteration
  cycle was reached.

* Optimise balancer calculation.

why:
  One can often avoid providing a new object containing the merge of two
  layers for the balancer, which avoids copying tables. Instead, one of
  the two layers is reused as the merge target and the other is merged
  into it, at the cost of some extra `hasKey()` lookups in certain cases.

  Of course, this needs some checks for making sure that none of the
  components to merge is eventually shared with something else.

* Fix copyright year
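The in-place merge idea from the balancer optimisation above can be sketched in Python as follows. This is an illustration under stated assumptions, not the `Aristo`/`Kvt` code: plain dicts stand in for layer tables, deletions are ignored, and the hypothetical `lower_shared`/`upper_shared` flags stand in for the sharing checks the commit mentions.

```python
def merge_layers(lower: dict, upper: dict,
                 lower_shared: bool, upper_shared: bool) -> dict:
    """Combine two layers; values in `upper` (the newer layer) win.

    A layer that is not shared with anything else is reused as the merge
    target so that no new table has to be allocated and filled.
    """
    if not lower_shared:
        lower.update(upper)            # newer values overwrite older ones
        return lower
    if not upper_shared:
        for key, value in lower.items():
            if key not in upper:       # hasKey()-style check keeps newer values
                upper[key] = value
        return upper
    merged = dict(lower)               # both shared: a copy is unavoidable
    merged.update(upper)
    return merged
```

Merging into the upper layer needs the membership check so that older values never overwrite newer ones, which is where the extra lookups come from.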
2024-07-18 21:32:32 +00:00
andri lim ee323d5ff8
Optimize EVM stack usage (#2502)
* EVM: Optimize CALL family stack usage

* EVM: Optimize CREATE family stack usage

* EVM: Optimize arith stack usage

* EVM: Optimize stack usage in the rest of opcodes

* Fix test_op_env and clean up unused imports

* EVM: Optimize arithmetic binary ops
2024-07-18 18:59:53 +07:00
Jacek Sieka df4a21c910
Store cached hash at the layer corresponding to the source data (#2492)
When lazily verifying state roots, we may end up with an entire state
without roots that gets computed for the whole database - in the current
design, that would result in hashes for the entire trie being held in
memory.

Since the hash depends only on the data in the vertex, we can store it
directly at the top-most level derived from the vertices it depends on
- be that memory or database - this makes the memory usage broadly
linear with respect to the already-existing in-memory change set stored
in the layers.

It also ensures that if we have multiple forks in memory, hashes get
cached in the correct layer maximising reuse between forks.

The same layer numbering scheme as elsewhere is reused, where -2 is the
backend, -1 is the balancer, and 0+ are the layers of the stack, up to the top.

A downside of this approach is that we create many small batches - a
future improvement could be to collect all such writes in a single
batch, though the memory profile of this approach should be examined
first (where is the batch kept, exactly?).
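A minimal Python sketch of the level rule described above, assuming a simple linear stack of layers. The `LayeredHashCache` class and its methods are hypothetical, and real layers hold far more than hashes:

```python
BACKEND, BALANCER = -2, -1  # level numbering as described above; 0+ are stack layers


class LayeredHashCache:
    """Caches each computed hash at the top-most layer it depends on."""

    def __init__(self, stack_depth: int) -> None:
        self.levels = {lvl: {} for lvl in range(BACKEND, stack_depth)}

    def store(self, vid: int, key: bytes, dependency_levels) -> int:
        # The hash only depends on vertex data, so it stays valid for the
        # highest layer holding any of that data and every layer above it.
        level = max(dependency_levels)
        self.levels[level][vid] = key
        return level

    def lookup(self, vid: int, from_level: int):
        # Search from the given layer downwards to the backend.
        for lvl in range(from_level, BACKEND - 1, -1):
            if vid in self.levels[lvl]:
                return self.levels[lvl][vid]
        return None


cache = LayeredHashCache(stack_depth=2)
print(cache.store(10, b"h10", [BACKEND, BALANCER]))  # -1: cached in the balancer
print(cache.store(11, b"h11", [BACKEND, 1]))         # 1: cached in stack layer 1
print(cache.lookup(10, from_level=1))                # b'h10'
print(cache.lookup(11, from_level=0))                # None: layer 0 does not see layer 1
```

A hash whose inputs all live below the stack ends up in the balancer or backend, where every fork built on top of it can reuse it.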
2024-07-18 09:13:56 +02:00
Jordan Hrycaj 6677f57ea9
Aristo balancer clean up (#2501)
* Remove `chunkedMpt` from `persistent()`/`stow()` function

why:
  Proof-mode code was removed with PR #2445 and needs to be re-designed.

* Remove unused `beStateRoot` argument from `deltaMerge()`

* Update/drastically simplify `txStow()`

why:
  Got rid of many boundary conditions

details:
  Many pre-conditions have changed. In particular, previous versions
  used the account state (hash) which was conveniently available and
  checked it against the backend in order to find out whether there
  was something to do, at all. Currently, the balancer update is skipped
  only when all tables in the delta layer are empty.

  Notable changes are:
  * no check against account state (see above)
  * balancer filters have no hash signature (some legacy stuff left over
    from journals)
  * no (snap sync) proof data which made the generation of the top layer
    more complex

* Cosmetics, cruft removal

* Update unit test file & function name

why:
  Was legacy module
2024-07-17 19:27:33 +00:00
andri lim cfe14f1825
EVM: use assign2 whenever possible (#2499)
Before: GST finish in 59 secs.
After: GST finish in 52 secs!
2024-07-17 20:48:50 +07:00
andri lim 8d1e21bbae
Simplify txPool gasLimit calculator (#2498)
Our need is only a baseline tx pool gasLimit calculator.
If needed, we can expand it in the future.
But for now, a simple but understandable tx pool is more important.
2024-07-17 20:48:35 +07:00
Jordan Hrycaj 17391b58d0
Hash keys and hash256 revisited (#2497)
* Remove cruft left-over from PR #2494

* TODO

* Update comments on `HashKey` type values

* Remove obsolete hash key conversion flag `forceRoot`

why:
  Is treated implicitly by having vertex keys as `HashKey` type and
  root vertex states converted to `Hash256`
2024-07-17 20:48:21 +07:00
andri lim 916f88a373
Use block number or timestamp to determine fork rules (#2496)
* Use block number or timestamp to determine fork rules

Avoid confusion raised by `forkGTE` usage where block information is available.

* Get rid of forkGTE
2024-07-17 17:05:53 +07:00
andri lim a59cc84fca
Not using deprecated functions in config anymore (#2495) 2024-07-17 02:57:19 +00:00
Jordan Hrycaj a84a2131cd
No ext update (#2494)
* Imported/rebase from `no-ext`, PR #2485

  Store extension nodes together with the branch

  Extension nodes must be followed by a branch - as such, it makes sense
  to store the two together both in the database and in memory:

  * fewer reads, writes and updates to traverse the tree
  * simpler logic for maintaining the node structure
  * less space used, both memory and storage, because there are fewer
    nodes overall

  There is also a downside: hashes can no longer be cached for an
  extension - instead, only the extension+branch hash can be cached - this
  seems like a fine tradeoff since computing it should be fast.

  TODO: fix commented code

* Fix merge functions and `toNode()`

* Update `merkleSignCommit()` prototype

why:
  Result is always a 32 byte hash

* Update short Merkle hash key generation

details:
  Ethereum reference MPTs use Keccak hashes as node links if the size of
  an RLP encoded node is at least 32 bytes. Otherwise, the RLP encoded
  node value is used as a pseudo node link (rather than a hash.) This is
  specified in the yellow paper, appendix D.

  Unlike the `Aristo` implementation, the reference MPT does not store
  such a node in the key-value database. Rather, the RLP encoded node
  value is embedded in the parent node in place of a node link.

  The root is the only exception: the top level node is always referred
  to by its hash.

* Fix/update `Extension` sections

why:
  Were commented out after removal of a dedicated `Extension` type which
  left the system dysfunctional.

* Clean up unused error codes

* Update unit tests

* Update docu
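The node-link rule from the `Update short Merkle hash key generation` item above can be sketched in Python. The hash function is passed in because Python's standard library has no Keccak-256: `hashlib.sha3_256` uses different padding and serves only as a hypothetical stand-in so the sketch runs, so it will not reproduce real Ethereum hashes. `node_link` is an illustrative name, not an `Aristo` function.

```python
import hashlib
from typing import Callable


def node_link(rlp_node: bytes, is_root: bool,
              keccak256: Callable[[bytes], bytes]) -> bytes:
    """How a parent refers to a child node (yellow paper, appendix D).

    RLP encodings shorter than 32 bytes are embedded verbatim; longer
    encodings, and the root node in all cases, are referenced by hash.
    """
    if is_root or len(rlp_node) >= 32:
        return keccak256(rlp_node)
    return rlp_node


def stand_in_hash(data: bytes) -> bytes:
    # NOT Keccak-256: SHA3-256 differs in padding. Used only so this runs.
    return hashlib.sha3_256(data).digest()


short_node = bytes(10)
long_node = bytes(40)
print(node_link(short_node, False, stand_in_hash) == short_node)  # True: embedded
print(len(node_link(short_node, True, stand_in_hash)))            # 32: root is hashed
print(len(node_link(long_node, False, stand_in_hash)))            # 32: hashed link
```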

---------

Co-authored-by: Jacek Sieka <jacek@status.im>
2024-07-16 19:47:59 +00:00
Jacek Sieka 0e36a17e5b
avoid re-writing code (#2490)
Avoids pointless rocksdb writes that cause write compaction /
amplification, especially in the case where code is shared between
multiple accounts
2024-07-15 15:02:23 +02:00
Jacek Sieka 9d91191154
storage hike cache (#2484)
This PR adds a storage hike cache similar to the account hike cache
already present - this cache is less efficient because account storage
is already partially cached in the account ledger, but it nonetheless
helps keep the number of hikes down.

Notably, there's an opportunity to optimise this cache and the others so
that they cooperate better instead of overlapping, which is left for a
future PR.

This PR also fixes an O(N) memory usage for storage slots where the
delete would keep the full storage in a work list which on mainnet can
grow very large - the work list is replaced with a more conventional
recursive `O(log N)` approach.
2024-07-14 19:12:10 +02:00
Jacek Sieka f3a56002ca
Turn payload into value type (#2483)
The Vertex type unifies branches, extensions and leaves into a single
memory area where the largest member is the branch (128 bytes + overhead) -
the payloads we have are all smaller than 128 thus wrapping them in an
extra layer of `ref` is wasteful from a memory usage perspective.

Further, the refs must be visited during the mark-and-sweep phase of garbage
collection - since we keep millions of these, many of them
short-lived, this takes up significant CPU time.

```
Function	CPU Time: Total	CPU Time: Self	Module	Function (Full)	Source File	Start Address
system::markStackAndRegisters	10.0%	4.922s	nimbus	system::markStackAndRegisters(var<system::GcHeap>).constprop.0	gc.nim	0x701230
```
2024-07-14 12:02:05 +02:00
Jacek Sieka 72947b3647
odds and ends (#2481)
small cleanups to reduce memory allocations
2024-07-13 20:42:49 +02:00
Jordan Hrycaj f08178c592
Separate constructor helpers for core db and ledger (#2480)
* Extract `CoreDb` constructor helpers from `base.nim` into separate module

why:
  This makes it easier to avoid circular imports.

* Extract `Ledger` constructor helpers from `base.nim` into separate module

why:
  Move `accounts_ledger.nim` file to sub-folder `backend`. That way the
  layout resembles that of the `core_db`.
2024-07-12 19:32:31 +00:00
Jordan Hrycaj b924fdcaa7
Separate config for core db and ledger (#2479)
* Updates and corrections

* Extract `CoreDb` configuration from `base.nim` into separate module

why:
  This makes it easier to avoid circular imports, in particular
  when the capture journal (aka tracer) is revived.

* Extract `Ledger` configuration from `base.nim` into separate module

why:
  This makes it easier to avoid circular imports (if any.)

also:
  Move `accounts_ledger.nim` file to sub-folder `backend`. That way the
  layout resembles that of the `core_db`.
2024-07-12 13:12:25 +00:00
Jacek Sieka 01ab209497
cache account payload (#2478)
Instead of caching just the storage id, we can cache the full payload
which further reduces expensive hikes
2024-07-12 15:08:26 +02:00
Jacek Sieka d07540766f
coredb: tracking fixes (#2476) 2024-07-12 13:40:13 +02:00
Advaita Saha 25af347dfd
Shift era helpers to a different file (#2475)
* shift helpers to a different file

* fix: few logic fixed for transition from era1 to era
2024-07-12 03:15:14 +00:00
Jacek Sieka a6764670f0
merge: avoid hike allocations (#2472)
hike allocations (and the garbage collection maintenance that follows)
are responsible for some 10% of cpu time (not wall time!) at this point
- this PR avoids them by stepping through the layers one step at a time,
simplifying the code at the same time.
2024-07-11 13:26:46 +02:00
Jordan Hrycaj 800fd77333
Core db remove legacy phrases (#2468)
* Rename `newKvt()` -> `ctx.getKvt()`

why:
  Clean up legacy shortcut. Also, the `KVT` returned is not instantiated
  but refers to the shared `KVT` that resides in a context which is a
  generalisation of an in-memory database fork. The function `ctx`
  retrieves the default context.

* Rename `newTransaction()` -> `ctx.newTransaction()`

why:
  Clean up legacy shortcut. The transaction is applied to a context as a
  generalisation of an in-memory database fork. The function `ctx`
  retrieves the default context.

* Rename `getColumn(CtGeneric)` -> `getGeneric()`

why:
  A list of well known sub-tries is no longer needed, a single one is enough.
  In fact, `getColumn()` only supported a single sub-tree by now.

* Reduce TODO list
2024-07-10 12:19:35 +00:00
Jacek Sieka 3382c2427b
increase rdb cache sizes (#2466)
This trivial bump should improve performance a bit without costing too
much memory - as the trie grows, so does the number of levels in it and
creating hikes becomes ever more expensive - hopefully this cache
increase should give a nice little boost even if it's not a lot.
2024-07-09 17:35:27 +02:00
Jacek Sieka ab23148aab
don't rewrite hash->slot map (#2463)
Avoid writing the same slot/hash values to the hash->slot mapping
so as not to spam the rocksdb WAL and cause unnecessary compaction.

In the same vein, avoid writing trivially detectable A-B-A storage
changes which happen with surprising frequency.
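A minimal Python sketch of the write-skipping idea, assuming writes are buffered per block and compared against the values loaded before the block. The `WriteBuffer` class is hypothetical and not the Nimbus ledger code:

```python
class WriteBuffer:
    """Buffers slot writes and drops those that end up unchanged."""

    def __init__(self, original: dict) -> None:
        self.original = original  # slot values as they were before the block
        self.pending = {}

    def write(self, slot, value) -> None:
        self.pending[slot] = value  # later writes replace earlier ones

    def flush(self) -> dict:
        # Only slots whose final value differs from the original reach the
        # database; A-B-A sequences and rewrites of identical values vanish.
        return {slot: value for slot, value in self.pending.items()
                if self.original.get(slot) != value}


buf = WriteBuffer({"x": 1, "y": 5})
buf.write("x", 2)
buf.write("x", 1)   # A-B-A: x ends where it started
buf.write("y", 6)
print(buf.flush())  # {'y': 6}
```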
2024-07-09 17:25:43 +02:00
Advaita Saha 9a499eb45f
Era support for nimbus import (#2429)
* add the era-dir option

* feat: support for era files in nimbus import

* fix: metric logs

* fix: eraDir check

* fix: redundant code and sepolia support

* fix: remove dependency from csv + formatting

* fix: typo

* fix: RVO

* fix: parseBiggestInt

* fix: opt impl

* fix: network agnostic loading

* fix: shift to int64
2024-07-09 15:28:01 +02:00
andri lim 4fa3756860
Convert GasInt to uint64, bump nim-eth and nimbus-eth2 (#2461)
* Convert GasInt to uint64, bump nim-eth and nimbus-eth2

* Bump nimbus-eth2

* int64.high.GasInt instead of 0x7fffffffffffffff.GasInt
2024-07-07 06:52:11 +00:00
andri lim e8683692fd
EVM gasSstore refund reduction using positive integer (#2460)
This is hopefully the last part of preparations
before converting GasInt to uint64
2024-07-06 08:39:38 +07:00
andri lim 4eaae5cbfa
EVM gasCall values always stay on positive side (#2459)
* EVM gasCall values always stay on positive side

This is also another part of preparations before
converting GasInt to uint64

* Fix test_evm_support
2024-07-06 08:39:22 +07:00
andri lim c775c906a2
Fix LedgerRef storage iterator and add test (#2458) 2024-07-05 10:15:48 +00:00
andri lim 6fe7411ac0
Saner EVM gasCosts (#2457)
This is also a part of preparations before converting GasInt to uint64
2024-07-05 11:55:13 +07:00
andri lim 23c00ce88c
Separate evmc gasCosts and nim-evm gasCosts (#2454)
This is part of preparations before converting GasInt to uint64
2024-07-05 07:00:03 +07:00
Jacek Sieka 7d78fd97d5
avoid allocations for slot storage (#2455)
Introduce a new `StoData` payload type similar to `AccountData`

* slightly more efficient storage format
* typed api
* fewer seqs
* fix encoding docs - it wasn't rlp after all :)
2024-07-04 23:48:45 +00:00
Jacek Sieka 79788c01d4
Add debug mode for disabling per-chunk state root validation (#2453)
This significantly speeds up block import at the cost of less protection
against invalid data, potentially resulting in an invalid database
getting stored.

The risk is small given that import is used only for validated data -
evaluating the right level of validation vs performance is left for a
future PR.

A side effect of this approach is that there is no cached state root in
the database - computing it currently requires a lot of memory since the
intermediate roots get cached in memory in full while the computation is
ongoing - a future PR will need to address this deficiency, for example
by streaming the already-computed hashes directly to the database.
2024-07-04 16:51:50 +02:00
andri lim f04f30c72b
Reduce EVM complexity by removing forkOverride (#2448)
* Reduce EVM complexity by removing forkOverride

* Fixes
2024-07-04 15:48:36 +02:00
Jacek Sieka 81e75622cf
storage: store root id together with vid, for better locality of refe… (#2449)
The state and account MPTs currently share key space in the database,
where vertex IDs are assigned essentially randomly. This means that when
two adjacent slot values from the same contract are accessed, they might
reside at a large distance from each other.

Here, we prefix each vertex id by its root causing them to be sorted
together thus bringing all data belonging to a particular contract
closer together - the same effect also happens for the main state MPT
whose nodes now end up clustered together more tightly.

In the future, the prefix given to the storage keys can also be used to
perform range operations such as reading all the storage at once and/or
deleting an account with a batch operation.

Notably, parts of the API already supported this rooting concept while
parts didn't - this PR makes the API consistent by always working with a
root+vid.
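A minimal Python sketch of why a root prefix clusters keys, assuming a hypothetical fixed-width big-endian key layout (the actual on-disk encoding may differ). Byte-wise sorting stands in for RocksDb's default key order:

```python
import struct


def vid_key(vid: int) -> bytes:
    return struct.pack(">Q", vid)  # vertex ID only


def rooted_key(root: int, vid: int) -> bytes:
    # Big-endian, fixed width: byte order matches numeric (root, vid) order.
    return struct.pack(">QQ", root, vid)


# Hypothetical (root, vid) pairs; vertex IDs handed out "randomly".
vertices = [(5, 100), (1, 7), (5, 3), (1, 250)]

by_vid = sorted(vertices, key=lambda rv: vid_key(rv[1]))
by_root = sorted(vertices, key=lambda rv: rooted_key(*rv))

print([root for root, _ in by_vid])   # [5, 1, 5, 1]: roots interleaved
print([root for root, _ in by_root])  # [1, 1, 5, 5]: grouped per root
```

With the prefix, all keys of one root form a contiguous range, which is also what makes the range operations mentioned above possible.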
2024-07-04 15:46:52 +02:00
andri lim b82dcdcc76
Remove unused StructLog (#2447) 2024-07-04 19:23:53 +07:00
andri lim d9e502bbc5
Bump web3/kzg4844/nimbus-eth2 and related fixes (#2446) 2024-07-04 05:41:32 +00:00
Jacek Sieka b23795ab39
remove pPrf, fRpp (#2445)
No longer used now that hashify is gone
2024-07-03 22:21:57 +02:00
Jacek Sieka 443c6d1f8e
Cache account path storage id (#2443)
The storage id is frequently accessed when executing contract code and
finding the path via the database requires several hops making the
process slow - here, we add a cache to keep the most recently used
account storage id:s in memory.

A possible future improvement would be to cache all account accesses so
that for example updating the balance doesn't cause several hikes.
2024-07-03 17:58:25 +02:00
Jordan Hrycaj ea7c756a9d
Core db reorg (#2444)
* CoreDb: Merged all sub-descriptors into `base_desc` module

* Dissolve `aristo_db/common_desc.nim`

* No need to export `Aristo` methods in `CoreDb`

* Resolve/tighten methods in `aristo_db` sub-moduled

why:
  So they can be directly implemented into the `base` module

* Moved/re-implemented `KVT` methods into `base` module

* Moved/re-implemented `MPT` methods into `base` module

* Moved/re-implemented account methods into `base` module

* Moved/re-implemented `CTX` methods into `base` module

* Moved/re-implemented `handler_{aristo,kvt}` into `aristo_db` module

* Moved/re-implemented `TX` methods into `base` module

* Moved/re-implemented base methods into `base` module

* Replaced `toAristoSavedStateBlockNumber()` by proper base method

why:
  Was the last reason for keeping low level backend access
  methods

* Remove dedicated low level access to `Aristo` backend

why:
  Not needed anymore, for debugging the descriptors can be accessed
  directly

also:
  some clean up stuff

* Re-factor `CoreDb` descriptor layout and adjust base methods

* Moved/re-implemented iterators into `base_iterator*` modules

* Update docu
2024-07-03 15:50:27 +00:00
Jacek Sieka 1f60e8e453
Use `Hash256` directly for account path (#2439)
Account paths are always a hash - passing it around as such helps avoid
confusion as to how long it is
2024-07-03 10:14:26 +02:00
Jacek Sieka c364426422
Smaller in-database representations (#2436)
These representations use ~15-20% less data compared to the status quo,
mainly by removing redundant zeroes in the integer encodings - a
significant effect of this change is that the various rocksdb caches see
better efficiency since more items fit in the same amount of space.

* use RLP encoding for `VertexID` and `UInt256` wherever it appears
* pack `VertexRef`/`PayloadRef` more tightly
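To illustrate where the savings come from, here is a Python sketch of the standard RLP encoding of an unsigned integer (big-endian bytes without leading zeros, plus a length prefix when needed), compared with a fixed 8-byte encoding. It follows the RLP specification rather than the Nimbus code:

```python
def rlp_encode_uint(n: int) -> bytes:
    """RLP encoding of a non-negative integer up to 256 bits."""
    if n < 0:
        raise ValueError("RLP only encodes non-negative integers")
    if n == 0:
        return b"\x80"  # zero is encoded as the empty byte string
    payload = n.to_bytes((n.bit_length() + 7) // 8, "big")  # no leading zeros
    if len(payload) == 1 and payload[0] < 0x80:
        return payload  # a single byte below 0x80 encodes itself
    if len(payload) > 55:
        raise ValueError("long-string form not needed for 256-bit integers")
    return bytes([0x80 + len(payload)]) + payload  # short-string prefix


for value in (0, 15, 1024, 2**64 - 1):
    fixed = value.to_bytes(8, "big")
    print(value, len(fixed), len(rlp_encode_uint(value)))
# 0 8 1
# 15 8 1
# 1024 8 3
# 18446744073709551615 8 9
```

Small values shrink considerably, while values that fill all 8 bytes grow by one prefix byte.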
2024-07-02 20:25:06 +02:00
web3-developer e163b69261
Bump RocksDb version and enable autoClose on opt types to prevent memory leaks (#2427)
* Bump RocksDb version and enable autoClose on opt types to prevent memory leaks.
2024-07-02 13:44:09 +08:00
Jacek Sieka 3d3831dde8
Small cleanups (#2435)
* avoid costly hike memory allocations for operations that don't need to
re-traverse it
* avoid unnecessary state checks (which might trigger unwanted state
root computations)
* disable optimize-for-hits due to the MPT no longer being complete at
all times
2024-07-01 14:07:39 +02:00
Jordan Hrycaj 2c87fd1636
Aristo code cosmetics and tests update (#2434)
* Update some docu

* Resolve obsolete compile time option

why:
  Not optional anymore

* Update checks

why:
  The notion of what constitutes a valid `Aristo` db has changed due to
  (even more) lazy calculating Merkle hash keys.

* Disable redundant unit test for production
2024-07-01 10:59:18 +00:00
andri lim 401537ad38
Add ForkedChainRef tests (#2430)
ForkedChainRef has become quite complex.
test_blockchain_json does not sufficiently cover edge cases
or synthetic cases.
2024-06-30 14:40:14 +07:00
andri lim c24affadee
Use simpler schema when writing transactions, receipts, and withdrawals (#2420)
* Use simpler schema when writing transactions, receipts, and withdrawals

Using an MPT is not only slow but also takes up more space than needed.
Aristo will remove older tries and only keep the last block tries.
Using simpler schema will avoid those problems.

* Rename getTransaction to getTransactionByIndex
2024-06-29 12:43:17 +07:00
Jordan Hrycaj 8dd038144b
Some cleanups (#2428)
* Remove `dirty` set from structural objects

why:
  Not used anymore, the tree is dirty by default.

* Rename `aristo_hashify` -> `aristo_compute`

* Remove cruft, update comments, cosmetics, etc.

* Simplify `SavedState` object

why:
  The key chaining has become obsolete after extra lazy hashing. There
  is some available space for a state hash to be maintained in future.

details:
  Accept the legacy `SavedState` object serialisation format for a
  while (which will be overwritten by new format.)
2024-06-28 18:43:04 +00:00
Jordan Hrycaj 14c3772545
On demand mpt revisited (#2426)
* rebased from `github/on-demand-mpt`

ackn:
  wip: on-demand mpt construction

  Given that actual data is stored in the `Vertex` structure, it's useful
  to think of the MPT as a cache for computing roots rather than being a
  functional requirement on its own.

  This PR engenders this line of thinking by incrementally computing the
  MPT only when it's needed, ie when a state (or similar) root is needed.

  This has the effect of significantly reducing memory usage as well as
  improving performance:

  * no need for dirty-mpt-node book-keeping
  * no need to build complex forest of upcoming hashing work
  * only hashes that are functionally needed are ever computed -
  intermediate nodes whose MPT root is not observed are never computed /
  processed

* Unit test hot fixes

* Unit test hot fixes cont.

(somehow lost that part)
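A minimal Python sketch of computing hashes only on demand, assuming a toy tree. `Node` and `computed` are hypothetical, `hashlib.sha256` over concatenated child hashes stands in for real MPT node hashing, and cache invalidation on updates is omitted:

```python
import hashlib

computed = []  # records the order in which node hashes get computed


class Node:
    def __init__(self, name: str, value: bytes, children=()) -> None:
        self.name = name
        self.value = value
        self.children = list(children)
        self._key = None  # no hash until someone asks for it

    def key(self) -> bytes:
        if self._key is None:
            h = hashlib.sha256(self.value)
            for child in self.children:
                h.update(child.key())  # children are hashed on demand too
            self._key = h.digest()
            computed.append(self.name)
        return self._key


a, b = Node("a", b"leaf-a"), Node("b", b"leaf-b")
root = Node("root", b"branch", [a, b])
unused = Node("c", b"never-observed")

print(computed)  # []: building the tree computes nothing
root.key()
root.key()       # second call is served from the cached key
print(computed)  # ['a', 'b', 'root']; 'c' is never hashed
```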

---------

Co-authored-by: Jacek Sieka <jacek@status.im>
2024-06-28 15:03:12 +00:00
Jordan Hrycaj 6dc2773957
Only use pre hashed addresses as account keys (#2424)
* Normalised storage tree addressing in function prototypes

detail:
  Argument list is always `<db> <account-path> <slot-path> ..` with
  both path arguments as `openArray[]`

* Remove cruft

* CoreDb internally uses full account paths rather than addresses

* Update API logging

* Use hashed account address only in prototypes

why:
  This avoids unnecessary repeated hashing of the same account address.
  The burden of doing that is upon the application. In the case here,
  the ledger caches all kinds of stuff anyway so it is common sense to
  exploit that for account address hashes.

caveat:
  Using `openArray[byte]` argument types for hashed accounts is inherently
  fragile. In non-release mode, a length verification `doAssert` is
  enabled by default.

* No accPath in data record (use `AristoAccount` as `CoreDbAccount`)

* Remove now unused `eAddr` field from ledger `AccountRef` type

why:
  Is duplicate of lookup key

* Avoid merging the account record/statement in the ledger twice.
2024-06-27 19:21:01 +00:00
Jordan Hrycaj 61bbf40014
Update storage tree admin (#2419)
* Tighten `CoreDb` API for accounts

why:
  Apart from cruft, the way to fetch the accounts state root via a
  `CoreDbColRef` record was unnecessarily complicated.

* Extend `CoreDb` API for accounts to cover storage tries

why:
  In future, this will make the notion of column objects obsolete. Storage
  trees will then be indexed by the account address rather than the vertex
  ID equivalent like a `CoreDbColRef`.

* Apply new/extended accounts API to ledger and tests

details:
  This makes the `distinct_ledger` module obsolete

* Remove column object constructors

why:
  They were needed as an abstraction of MPT sub-trees including storage
  trees. Now, storage trees are handled by the account (e.g. via address)
  they belong to and all other trees can be identified by a constant well
  known vertex ID. So there is no need for column objects anymore.

  Still there are some left-over column object methods which will be
  removed next.

* Remove `serialise()` and `PayloadRef` from default Aristo API

why:
  Not needed. `PayloadRef` was used for unstructured/unknown payload
  formats (account or blob) and `serialise()` was used for decoding
  `PayloadRef`. Now it is known in advance what the payload looks
  like.

* Added query function `hasStorageData()` whether a storage area exists

why:
  Useful for supporting `slotStateEmpty()` of the `CoreDb` API

* In the `Ledger` replace `storage.stateEmpty()` by `slotStateEmpty()`

* On Aristo, hide the storage root/vertex ID in the `PayloadRef`

why:
  The storage vertex ID is fully controlled by Aristo while the
  `AristoAccount` object is controlled by the application. With the
  storage root part of the `AristoAccount` object, there was a useless
  administrative burden to keep that storage root field up to date.

* Remove cruft, update comments etc.

* Update changed MPT access paradigms

why:
  Fixes verified proxy tests

* Fluffy cosmetics
2024-06-27 09:01:26 +00:00
web3-developer ea94e8a351
Use RocksDb column family handles instead of name strings. (#2418)
* Bump RocksDb to latest and update Nimbus database to pass column family handles to RocksDb API.

* Bump RocksDb version.
2024-06-27 16:51:43 +08:00
andri lim b80521a84d
ForkedChain become ForkedChainRef (#2417)
* ForkedChain become ForkedChainRef

It will be shared between engine API, RPC, and txPool

* Fix ForkedChainRef constructor
2024-06-27 12:54:52 +07:00
andri lim 27339e9520
Simplify txpool baseFeeGet (#2416)
* Simplify txpool baseFeeGet

- Avoid using toEVMFork because we are not in EVM
- Rename `isLondon` to `isLondonOrLater`

* Remove timestamp from isLondonOrLater
2024-06-27 12:54:36 +07:00
Jacek Sieka c8cdffa775
Small cleanups (#2414)
* remove unnecessary / expensive error checking
* avoid some trivial memory allocs
* work around table move bug
2024-06-26 09:25:09 +02:00
andri lim cd21c4fbec
ForkedChain implementation (#2405)
* ForkedChain implementation

- revamp test_blockchain_json using ForkedChain
- re-enable previously failing test cases.

* Remove excess error handling

* Avoid reloading parent header

* Do not force base update

* Write baggage to database

* Add findActiveChain to finalizedSegment

* Create new stagingTx in addBlock

* Check last stateRoot existence in test_blockchain_json

* Resolve rebase conflict

* More precise nomenclature for block import cursor

* Ensure bad blocks are not imported and good blocks are not rejected

* finalizeSegment becomes forkChoice and aligns with the engine API forkChoice spec

* Display reason when good block rejected

* Fix comments

* Put BaseDistance into CalculateNewBase equation

* Separate finalizedHash from baseHash

* Add more doAssert constraints

* Add push raises: []
2024-06-26 07:27:48 +07:00
Jacek Sieka 3e001e322c
Fix memory usage spikes during sync, give memory to rocksdb (#2413)
* creating a seq from a table that holds lots of changes means copying
all data into the seq - this can be several GB of data while syncing
blocks
* nim fails to optimize the moving of the `WidthFirstForest` - the real
solution is to not construct a `wff` to begin with, but this PR provides
relief while that is being worked on

This spike fix allows us to bump the rocksdb cache by another 2 GB and
still have a significantly lower peak memory usage during sync.
2024-06-25 13:39:53 +02:00
Jacek Sieka f294d1e086
Clear account cache after each block (#2411)
When processing long ranges of blocks, the account cache grows unbounded,
which causes huge memory spikes.

Here, we move the cache to a second-level cache after each block - the
second-level cache is cleared on the next block after that which creates
a simple LRU effect.
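
A minimal Python sketch of the two-level scheme described above, assuming a
plain dict per generation. `TwoGenerationCache` and its methods are
hypothetical names, not the Nimbus ledger API, and promoting an entry back to
the first level on a second-level hit is an assumption about how the LRU
effect arises.

```python
class TwoGenerationCache:
    """Hypothetical per-block cache with a second-level (previous) generation.

    Entries touched during the current block live in `current`. At each block
    boundary `current` becomes `previous` and the old `previous` is dropped,
    so an entry that is not touched for a full block is evicted.
    """

    def __init__(self):
        self.current = {}
        self.previous = {}

    def get(self, key):
        if key in self.current:
            return self.current[key]
        if key in self.previous:
            # Assumed promotion: an entry used again survives another block.
            value = self.previous.pop(key)
            self.current[key] = value
            return value
        return None  # caller would fall back to the database here

    def put(self, key, value):
        self.current[key] = value

    def end_of_block(self):
        # Drop the older generation and demote the current one.
        self.previous = self.current
        self.current = {}
```

The memory bound comes from `end_of_block()`: at most two blocks' worth of
accounts are ever held, whatever the length of the imported range.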

There's a small performance cost of course, though overall the freed-up
memory can now be reassigned to the rocksdb row cache which not only
makes up for the loss but overall leads to a performance increase.

The bump to 2 GB of rocksdb row cache here needs more testing but is
slightly less than, and loosely based on, the savings from this PR and the
circular ref fix in #2408 - another way to phrase this is that it's
better to give rocksdb more breathing room than let the memory sit
unused until circular ref collection happens ;)
2024-06-25 07:30:32 +02:00
andri lim c79b0b8a47
Avoid loading parent header from db in gaslimit validation (#2410) 2024-06-24 08:40:22 +02:00
andri lim 6a10dfd0fe
Remove pre and post opcode handlers from EVM (#2409) 2024-06-24 07:58:15 +02:00
Jacek Sieka 9521582005
avoid closure environment for mpt methods (#2408)
An instance of `CoreDbMptRef` is created for and stored in every account
- when we are processing blocks and have many accounts in memory, this
closure environment takes up hundreds of mb of memory (around block 5M,
it is the 4th largest memory consumer!) - incidentally, this also
removes a circular reference in the setup that causes the
`AristoCodeDbMptRef` to linger in memory much longer than it
has to which is the core reason why it takes so much.

The real solution here is to remove the methods indirection entirely,
but this PR provides relief until that has been done.

Similar treatment is given to some of the other core api functions to
avoid circulars there too.
2024-06-24 07:56:41 +02:00
andri lim 99ff8dc876
Fix t8n: blobGasUsed exceeds allowance issue (#2407)
* Fix t8n: blobGasUsed exceeds allowance issue

* Put blobGasUsed validation into the transaction processing pipeline
2024-06-24 07:56:24 +02:00
Jacek Sieka 6b68ff92d3
Allocation-free nibbles buffer (#2406)
This buffer eliminates a large part of allocations during MPT traversal,
reducing overall memory usage and GC pressure.

Ideally, we would use it throughout the API instead of
`openArray[byte]` since the built-in length limit appropriately exposes
the natural 64-nibble depth constraint that `openArray` fails to
capture.
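
To illustrate the fixed-capacity idea, here is a Python sketch of a buffer
that packs up to 64 nibbles, two per byte, into 32 bytes allocated once.
`NibblesBuf` and its methods are hypothetical. Python cannot express a true
stack-allocated value, so this only shows the layout and the built-in
64-nibble limit, not the allocation behaviour of the Nim type.

```python
class NibblesBuf:
    """Hypothetical fixed-capacity nibble buffer (sketch, not the Nimbus type)."""

    MAX_NIBBLES = 64  # a 32-byte key path is at most 64 nibbles deep

    __slots__ = ("_bytes", "_len")

    def __init__(self):
        # Storage is sized once and never grows.
        self._bytes = bytearray(self.MAX_NIBBLES // 2)
        self._len = 0

    @classmethod
    def from_key(cls, key: bytes) -> "NibblesBuf":
        if len(key) * 2 > cls.MAX_NIBBLES:
            raise ValueError("key longer than 32 bytes")
        buf = cls()
        buf._bytes[: len(key)] = key  # same-length slice assignment, no resize
        buf._len = len(key) * 2
        return buf

    def __len__(self) -> int:
        return self._len

    def __getitem__(self, i: int) -> int:
        if not 0 <= i < self._len:
            raise IndexError(i)
        byte = self._bytes[i // 2]
        # Even index -> high nibble, odd index -> low nibble.
        return byte >> 4 if i % 2 == 0 else byte & 0x0F

    def append(self, nibble: int) -> None:
        if not 0 <= nibble <= 0x0F:
            raise ValueError("nibble out of range")
        if self._len >= self.MAX_NIBBLES:
            raise OverflowError("more than 64 nibbles")
        idx = self._len // 2
        if self._len % 2 == 0:
            self._bytes[idx] = nibble << 4  # start a fresh byte
        else:
            self._bytes[idx] |= nibble  # fill in the low nibble
        self._len += 1
```

Unlike an `openArray[byte]`, the type itself rules out paths deeper than 64
nibbles, which is the constraint mentioned above.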
2024-06-22 22:33:37 +02:00
Jacek Sieka 768307d91d
Cache code and invalid jump destination tables (fixes #2268) (#2404)
It is common for many accounts to share the same code - at the database
level, code is stored by hash meaning only one copy exists per unique
program but when loaded in memory, a copy is made for each account.

Further, every time we execute the code, it must be scanned for invalid
jump destinations, which slows down EVM execution.

Finally, the extcodesize call causes code to be loaded even if only the
size is needed.

This PR improves on all these points by introducing a shared
CodeBytesRef type whose code section is immutable and that can be shared
between accounts. Further, a dedicated `len` API call is added so that
the EXTCODESIZE opcode can operate without polluting the GC and code
cache, for cases where only the size is requested - rocksdb will in this
case cache the code itself in the row cache meaning that lookup of the
code itself remains fast when length is asked for first.
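
The sharing and memoisation described above can be sketched in Python as
follows. The class names, the `load` callback and the LRU eviction policy are
assumptions; the opcode values (JUMPDEST = 0x5b, PUSH1..PUSH32 = 0x60..0x7f)
come from the EVM specification, and the capacity of 16 * 1024 is only a
stand-in for the "16k" figure below. The dedicated length lookup that
bypasses the cache is not modelled.

```python
from collections import OrderedDict

JUMPDEST = 0x5B
PUSH1, PUSH32 = 0x60, 0x7F


def analyze_jumpdests(code: bytes) -> frozenset:
    """Return offsets of JUMPDEST opcodes that are not inside PUSH data."""
    valid = set()
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == JUMPDEST:
            valid.add(pc)
        elif PUSH1 <= op <= PUSH32:
            pc += op - PUSH1 + 1  # skip the 1..32 immediate data bytes
        pc += 1
    return frozenset(valid)


class CodeBytes:
    """Immutable code plus a lazily computed, memoised jump table."""

    __slots__ = ("code", "_jumpdests")

    def __init__(self, code: bytes):
        self.code = bytes(code)
        self._jumpdests = None

    def __len__(self) -> int:
        return len(self.code)

    def is_valid_jump(self, pc: int) -> bool:
        if self._jumpdests is None:  # scan once, reuse on every execution
            self._jumpdests = analyze_jumpdests(self.code)
        return pc in self._jumpdests


class CodeCache:
    """LRU cache mapping code hash -> shared CodeBytes instance."""

    def __init__(self, capacity: int = 16 * 1024):
        self.capacity = capacity
        self._entries = OrderedDict()

    def get(self, code_hash: bytes, load) -> CodeBytes:
        entry = self._entries.get(code_hash)
        if entry is not None:
            self._entries.move_to_end(code_hash)  # mark as recently used
            return entry
        entry = CodeBytes(load(code_hash))
        self._entries[code_hash] = entry
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return entry
```

Because every account with the same code hash receives the same `CodeBytes`
object, both the bytes and the jump table exist once per unique program.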

With 16k code entries, there's a 90% hit rate which goes up to 99%
during the 2.3M attack - the cache significantly lowers memory
consumption and execution time not only during this event but across the
board.
2024-06-21 09:44:10 +02:00
Jacek Sieka 83b3eeeb18
metrics: enable during import (#2401)
This allows monitoring the import process using prometheus/grafana/etc
2024-06-20 19:06:58 +02:00
Jordan Hrycaj 081cb15493
Coredb maintenance (#2398)
* CoreDb: remove PHK tries

why:
  There is no general use anymore for an MPT with a pre-hashed key. It
  was used to resemble the `SecureHexaryTrie` logic from the legacy DB.

  The only place where this is needed is the `Ledger`, which uses a
  distinct MPT version anyway (see `distinct_ledgers.nim`).

* Rename `CoreDx*` -> `CoreDb*`

why:
  The naming `CoreDx*` was used to differentiate the new CoreDb API from
  the legacy API which had descriptors named `CoreDb*`.
2024-06-19 14:13:12 +00:00
Jordan Hrycaj e7be0d185c
Aristo uses pre classified tree types cont2 (#2397)
* Provide dedicated functions for fetching accounts and storage trees

why:
  Different prototypes for each class `account`, `generic` and
  `storage`.

* Remove `fetchPayload()` and other cruft from API, `aristo_fetch`, etc.

* Fix typos, debugging leftovers, comments
2024-06-19 12:40:00 +00:00
andri lim 035ef696a6
EVMC refundGas not breaching host/evm separation anymore (#2395) 2024-06-19 14:15:23 +02:00
andri lim 0e5fd3ffc9
LedgerRef: stateOrVoid become stateEmptyOrVoid (#2394) 2024-06-19 14:14:36 +02:00
andri lim 5a39fc0d69
Remove unused dbkey (#2396) 2024-06-19 14:11:14 +02:00
Jacek Sieka 41cf81f80b
Fix dboptions init (#2391)
For the block cache to be shared between column families, the options
instance must be shared between the various column families being
created. This also ensures that there is only one source of truth for
configuration options instead of having two different sets depending on
how the tables were initialized.

This PR also removes the re-opening mechanism which can double startup
time - every time the database is opened, the log is replayed - a large
log file will take a long time to open.

Finally, several options are now correctly implemented as column family
options, including one that puts a hash index in the SST files.
2024-06-19 10:55:57 +02:00
andri lim 83f6f89869
Add t8n debugging tool and fix EVM regression (#2386)
- fix blockNumber overflow in blockHash op code
- reenable 3 test cases of test_blockchain_json
- fix t8n crash when creating invalid tracer stream
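
For context, BLOCKHASH only returns a hash for the 256 blocks immediately
preceding the current one. The Python sketch below shows how an unsigned
lower bound wraps for early blocks, and one overflow-safe way to write the
range check. The function names are hypothetical, and this is not
necessarily how the fix in this commit was written.

```python
U64 = 2**64  # emulate unsigned 64-bit arithmetic in Python


def naive_lower_bound(current: int) -> int:
    # Computing `current - 256` in uint64 wraps around when current < 256,
    # yielding a huge bound instead of 0.
    return (current - 256) % U64


def block_hash_in_range(current: int, requested: int) -> bool:
    # Only the 256 blocks strictly before `current` are accessible.
    # Checking `requested < current` first guarantees the subtraction
    # below cannot go negative, so no wrap-around is possible.
    return requested < current and current - requested <= 256
```
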
2024-06-19 08:58:08 +07:00
Kim De Mey 4fd2ecddec
Bump nim-eth/web3/kzg4844/nimbus-eth2 and related fixes (#2392)
Bump nim-eth, which requires nimbus-eth2 bump, which requires
bumps of web3 and kzg4844 + related fixes to all those bumps.
2024-06-19 08:57:45 +07:00
Jacek Sieka 1a96b4a97c
evm: generate more specialized functions (#2390)
Nicer name in profiler and avoids a few range checks
2024-06-19 08:57:29 +07:00
Miran ea0d18424a
use Nim 2.0.6 (#2384)
* use Nim 2.0.6

* Fixes for nim 2.0.6

* Workaround nim 2.0 array indexing issue

* Remove excess gcsafe pragma

* Oops, fix recursive template

* Fix imports

* Fluffy nph linting

---------

Co-authored-by: jangko <jangko128@gmail.com>
Co-authored-by: tersec <tersec@users.noreply.github.com>
2024-06-19 01:27:54 +00:00
Jordan Hrycaj 8727307ef4
Aristo uses pre classified tree types cont1 (#2389)
* Provide dedicated functions for deleting accounts and storage trees

why:
  Storage trees are always linked to an account, so there is no need
  for an application to fiddle about (e.g. re-cycling, unlinking)
  storage tree vertex IDs.

* Remove `delete()` and other cruft from API, `aristo_delete`, etc.

* clean up delete functions

details:
  The delete implementations `deleteImpl()` and `delTreeImpl()` do not
  need to be super generic anymore as all the edge cases are covered by
  the specialised `deleteAccountPayload()`, `deleteGenericData()`, etc.

* Avoid unnecessary re-calculations of account keys

why:
  The function `registerAccountForUpdate()` extracted the storage ID
  (if any) and automatically marked the Merkle keys along the account
  path for re-hashing.

  This also applied if it was later detected that the account
  or the storage tree did not need to be updated.

  So the `registerAccountForUpdate()` function was split into a part
  which retrieved the storage ID, and another one which marked the
  Merkle keys for re-calculation to be applied only when needed.
2024-06-18 19:30:01 +00:00