Commit Graph

1011 Commits

Author SHA1 Message Date
Jacek Sieka 43d93bcdab
Don't write slot hashes on import (#2564)
The reverse slot hash mechanism causes quite a bit of database traffic
but is broadly not useful except for iterating the storage of an
account, something that a validator never does (it's used by the
tracers).

This flag adds one more thing that is not stored in the database, to be
explored more comprehensively when designing full, validator and archive
modes with different pruning options in the future.

`ldb` says this is 60gb of data (!):
```
ldb --db=. --ignore_unknown_options --column_family=KvtGen approxsize
--hex --from=0x05
--to=0x05ffffffffffffffffffffffffffffffffffffffffffffff
66488353954
```
2024-08-16 08:22:51 +02:00
Jordan Hrycaj 4dbc1653ea
Cleanup (#2565)
* Move snap un-dumpers to aristo unit test folder

why:
  The only place where it is used, now to test the database against
  legacy snap sync dump samples.

   While the details of the dumped data have mostly outlived their purpuse,
   its use as **entropy** data thrown against `Aristo` has still been
   useful to find/debug tricky DB problems.

* Remove cruft

* `nimbus-eth1-blobs` not used anymore as test data source
2024-08-15 12:31:07 +00:00
Jordan Hrycaj ce713d95fc
Aristo lazily delete larger subtrees (#2560)
* Extract sub-tree deletion functions into separate sub-modules

* Move/rename `aristo_desc.accLruSize` => `aristo_constants.ACC_LRU_SIZE`

* Lazily delete sub-trees

why:
  This gives some control of the memory used to keep the deleted vertices
  in the cached layers. For larger sub-trees, keys and vertices might be
  on the persistent backend to a large extend. This would pull an amount
  of extra information from the backend into the cached layer.

  For lazy deleting it is enough to remember sub-trees by a small set of
  (at most 16) sub-roots to be processed when storing persistent data.
  Marking the tree root deleted immediately allows to let most of the code
  base work as before.

* Comments and cosmetics

* No need to import all for `Aristo` here

* Kludge to make `chronicle` usage in sub-modules work with `fluffy`

why:
  That `fluffy` would not run with any logging in `core_deb` is a problem
  I have known for a while. Up to now, logging was only used for debugging.

  With the current `Aristo` PR, there are cases where logging might be
  wanted but this works only if `chronicles` runs without the
  `json[dynamic]` sinks.

  So this should be re-visited.

* More of a kludge
2024-08-14 08:54:44 +00:00
andri lim b8e128203f
Rewire blockValue from Txpool to EngineAPI (#2554) 2024-08-09 06:05:18 +07:00
Jacek Sieka 094486d0ce
Hash bump 2024-08-08 07:46:35 +02:00
Jacek Sieka 3cefd7ed38
move db init to init (#2552)
When using the common interface, the database always (potentially) needs
init - take the opportunity to log some basic database info on startup.
2024-08-08 07:45:30 +02:00
andri lim d5786758b5
TxPool: Merge tx_chain and tx_packer to reduce complexity (#2549)
* TxPool: Merge tx_chain and tx_packer to reduce complexity

* Fix copyright year
2024-08-07 22:35:17 +07:00
Jordan Hrycaj 38572bd8ea
Cache a storage root ID forever in the leaf payload of an account (#2551)
details:
  Stale root IDs are marked disabled while the ID is kept in the leaf
  payload.

why:
  This might lead to further caching advantages.
2024-08-07 13:28:01 +00:00
Jordan Hrycaj 488bdbc267
Provide portal proof functionality with coredb (#2550)
* Provide portal proof functions in `aristo_api`

why:
  So it can be fully supported by `CoreDb`

* Fix prototype in `kvt_api`

* Fix node constructor for account leafs with storage trees

* Provide simple path check based on portal proof functionality

* Provide portal proof functionality in `CoreDb`

* Update TODO list
2024-08-07 11:30:55 +00:00
Jordan Hrycaj 5b502a06c4
Added portal proof nodes generation functionality (#2539)
* Extracted `test_tx.testTxMergeProofAndKvpList()` => separate file

* Fix serialiser

why:
  Typo lead to duplicate rlp-encoded nodes in chain

* Remove cruft

* Implemnt portal proof nodes generators `partXxxTwig()`

* Add unit test for portal proof nodes generator `partAccountTwig()`

* Cosmetics

* Simplify serialiser return code format

* Fix proof generator for extension nodes

why:
  Code was simply bonkers, not detected before the unit tests were
  adapted to check for just this.

* Implemented portal proof nodes verifier `partUntwig()`

* Cosmetics

* Fix `testutp` cli poblem
2024-08-06 11:29:26 +00:00
andri lim ec118a438a
Refactor txpool: reduce complexity (#2542) 2024-08-06 16:12:56 +07:00
Jordan Hrycaj 01b5c08763
Revive json tracer unit tests (#2538)
* Some `Aristo` clean-ups/updates

* Re-implemented core-db tracer functionality

* Rename nimbus tracer `no-tracer.nim` => `tracer.nim`

why:
  Restore original name for easy diff tracking with upcoming update

* Update nimbus tracer using new core-db tracer functionality

* Updating json tracer unit tests

* Enable json tracer unit tests
2024-08-01 10:41:20 +00:00
andri lim e331c9e9b7
TxPool: Replace GasPrice and GasPriceEx with GasInt (#2537)
* TxPool: Replace GasPrice and GasPriceEx with GasInt
2024-07-31 14:33:30 +07:00
Jordan Hrycaj 72c3ab8ced
Provide partial tree support for preloading tests (#2536)
* Implement partial trees

why:
  This is currently needed for unit tests to pre-load the database
  with test data similar to `proof` node pre-load.

  The basic features for `snap-sync` boundary proofs are available
  as well for future use. What is missing is the final proof verification
  and a complete storage data load/merge function (stub is available.)

* Cosmetics, clean up
2024-07-29 20:15:17 +00:00
andri lim 254bda365f
Remove txpool sender locality (#2525)
* Remove txpool sender locality

We no longer distinct local or remote sender

* Fix copyright year
2024-07-25 22:36:08 +07:00
andri lim 0cc730dd05
Fix CodeBytes: invalidPositions out of bound crash (#2523) 2024-07-25 19:23:53 +07:00
andri lim 01ba18da74
Fix sepolia chain config: mergeForkBlock -> 1450409 (#2518)
* Fix sepolia chain config: mergeForkBlock -> 1450407

* Fix test_forkid
2024-07-24 03:07:55 +00:00
andri lim 6d03acec30
TxPool refactoring: Simplify TxChainRef and remove gauges (#2506)
This is one of the txPool refactoring series to make it ready
for integration with the new ForkedChainRef
2024-07-19 16:24:36 +07:00
Jordan Hrycaj 5ac362fe6f
Aristo and kvt balancer management update (#2504)
* Aristo: Merge `delta_siblings` module into `deltaPersistent()`

* Aristo: Add `isEmpty()` for canonical checking whether a layer is empty

* Aristo: Merge `LayerDeltaRef` into `LayerObj`

why:
  No need to maintain nested object refs anymore. Previously the
 `LayerDeltaRef` object had a companion `LayerFinalRef` which held
  non-delta layer information.

* Kvt: Merge `LayerDeltaRef` into `LayerRef`

why:
  No need to maintain nested object refs (as with `Aristo`)

* Kvt: Re-write balancer logic similar to `Aristo`

why:
  Although `Kvt` was a cheap copy of `Aristo` it sort of got out of
  sync and the balancer code was wrong.

* Update iterator over forked peers

why:
  Yield additional field `isLast` indicating that the last iteration
  cycle was approached.

* Optimise balancer calculation.

why:
  One can often avoid providing a new object containing the merge of two
  layers for the balancer. This avoids copying tables. In some cases this
  is replaced by `hasKey()` look ups though. One uses one of the two
  to combine and merges the other into the first.

  Of course, this needs some checks for making sure that none of the
  components to merge is eventually shared with something else.

* Fix copyright year
2024-07-18 21:32:32 +00:00
andri lim ee323d5ff8
Optimize EVM stack usage (#2502)
* EVM: Optimize CALL family stack usage

* EVM: Optimize CREATE family stack usage

* EVM: Optimize arith stack usage

* EVM: Optimize stack usage in the rest of opcodes

* Fix test_op_env and clean up unused imports

* EVM: Optimize arithmetic binary ops
2024-07-18 18:59:53 +07:00
Jordan Hrycaj 6677f57ea9
Aristo balancer clean up (#2501)
* Remove `chunkedMpt` from `persistent()`/`stow()` function

why:
  Proof-mode code was removed with PR #2445 and needs to be re-designed.

* Remove unused `beStateRoot` argument from `deltaMerge()`

* Update/drastically simplify `txStow()`

why:
  Got rid of many boundary conditions

details:
  Many pre-conditions have changed. In particular, previous versions
  used the account state (hash) which was conveniently available and
  checked it against the backend in order to find out whether there
  was something to do, at all. Currently, only an empty set of all
  tables in the delta layer has the balancer update ignored.

  Notable changes are:
  * no check against account state (see above)
  * balancer filters have no hash signature (some legacy stuff left over
    from journals)
  * no (shap sync) proof data which made the generation of the a top layer
    more complex

* Cosmetics, cruft removal

* Update unit test file & function name

why:
  Was legacy module
2024-07-17 19:27:33 +00:00
andri lim 8d1e21bbae
Simplify txPool gasLimit calculator (#2498)
Our need is only a baseline tx pool gasLimit calculator.
If need we can expand it in the future.
But for now, a simple but understandable tx pool is more important.
2024-07-17 20:48:35 +07:00
Jordan Hrycaj a84a2131cd
No ext update (#2494)
* Imported/rebase from `no-ext`, PR #2485

  Store extension nodes together with the branch

  Extension nodes must be followed by a branch - as such, it makes sense
  to store the two together both in the database and in memory:

  * fewer reads, writes and updates to traverse the tree
  * simpler logic for maintaining the node structure
  * less space used, both memory and storage, because there are fewer
    nodes overall

  There is also a downside: hashes can no longer be cached for an
  extension - instead, only the extension+branch hash can be cached - this
  seems like a fine tradeoff since computing it should be fast.

  TODO: fix commented code

* Fix merge functions and `toNode()`

* Update `merkleSignCommit()` prototype

why:
  Result is always a 32bit hash

* Update short Merkle hash key generation

details:
  Ethereum reference MPTs use Keccak hashes as node links if the size of
  an RLP encoded node is at least 32 bytes. Otherwise, the RLP encoded
  node value is used as a pseudo node link (rather than a hash.) This is
  specified in the yellow paper, appendix D.

  Different to the `Aristo` implementation, the reference MPT would not
  store such a node on the key-value database. Rather the RLP encoded node value is stored instead of a node link in a parent node
  is stored as a node link on the parent database.

  Only for the root hash, the top level node is always referred to by the
  hash.

* Fix/update `Extension` sections

why:
  Were commented out after removal of a dedicated `Extension` type which
  left the system disfunctional.

* Clean up unused error codes

* Update unit tests

* Update docu

---------

Co-authored-by: Jacek Sieka <jacek@status.im>
2024-07-16 19:47:59 +00:00
Jacek Sieka f3a56002ca
Turn payload into value type (#2483)
The Vertex type unifies branches, extensions and leaves into a single
memory area where the larges member is the branch (128 bytes + overhead) -
the payloads we have are all smaller than 128 thus wrapping them in an
extra layer of `ref` is wasteful from a memory usage perspective.

Further, the ref:s must be visited during the M&S phase of garbage
collection - since we keep millions of these, many of them
short-lived, this takes up significant CPU time.

```
Function	CPU Time: Total	CPU Time: Self	Module	Function (Full)	Source File	Start Address
system::markStackAndRegisters	10.0%	4.922s	nimbus	system::markStackAndRegisters(var<system::GcHeap>).constprop.0	gc.nim	0x701230`
```
2024-07-14 12:02:05 +02:00
Jacek Sieka 72947b3647
odds and ends (#2481)
small cleanups to reduce memory allocations
2024-07-13 20:42:49 +02:00
Jordan Hrycaj f08178c592
Separate constructor helpers for core db and ledger (#2480)
* Extract `CoreDb` constructor helpers from `base.nim` into separate module

why:
  This makes it easier to avoid circular imports.

* Extract `Ledger` constructor helpers from `base.nim` into separate module

why:
  Move `accounts_ledger.nim` file to sub-folder `backend`. That way the
  layout resembles that of the `core_db`.
2024-07-12 19:32:31 +00:00
Jordan Hrycaj b924fdcaa7
Separate config for core db and ledger (#2479)
* Updates and corrections

* Extract `CoreDb` configuration from `base.nim` into separate module

why:
  This makes it easier to avoid circular imports, in particular
  when the capture journal (aka tracer) is revived.

* Extract `Ledger` configuration from `base.nim` into separate module

why:
  This makes it easier to avoid circular imports (if any.)

also:
  Move `accounts_ledger.nim` file to sub-folder `backend`. That way the
  layout resembles that of the `core_db`.
2024-07-12 13:12:25 +00:00
Jordan Hrycaj 800fd77333
Core db remove legacy phrases (#2468)
* Rename `newKvt()` -> `ctx.getKvt()`

why:
  Clean up legacy shortcut. Also, the `KVT` returned is not instantiated
  but refers to the shared `KVT` that resides in a context which is a
  generalisation of an in-memory database fork. The function `ctx`
  retrieves the default context.

* Rename `newTransaction()` -> `ctx.newTransaction()`

why:
  Clean up legacy shortcut. The transaction is applied to a context as a
  generalisation of an in-memory database fork. The function `ctx`
  retrieves the default context.

* Rename `getColumn(CtGeneric)` -> `getGeneric()`

why:
  No more a list of well known sub-tries needed, a single one is enough.
  In fact, `getColumn()` did only support a single sub-tree by now.

* Reduce TODO list
2024-07-10 12:19:35 +00:00
andri lim 4fa3756860
Convert GasInt to uint64, bump nim-eth and nimbus-eth2 (#2461)
* Convert GasInt to uint64, bump nim-eth and nimbus-eth2

* Bump nimbus-eth2

* int64.high.GasInt instead of 0x7fffffffffffffff.GasInt
2024-07-07 06:52:11 +00:00
andri lim 4eaae5cbfa
EVM gasCall values always stay on positive side (#2459)
* EVM gasCall values always stay on positive side

This is also another part of preparations before
converting GasInt to uint64

* Fix test_evm_support
2024-07-06 08:39:22 +07:00
andri lim c775c906a2
Fix LedgerRef storage iterator and add test (#2458) 2024-07-05 10:15:48 +00:00
Jacek Sieka 7d78fd97d5
avoid allocations for slot storage (#2455)
Introduce a new `StoData` payload type similar to `AccountData`

* slightly more efficient storage format
* typed api
* fewer seqs
* fix encoding docs - it wasn't rlp after all :)
2024-07-04 23:48:45 +00:00
tersec 1f40b710ee
fix UnusedImport warnings; bump nim-bearssl, nim-stint, and nim-stew (#2456) 2024-07-05 06:46:59 +07:00
andri lim f04f30c72b
Reduce EVM complexity by removing forkOverride (#2448)
* Reduce EVM complexity by removing forkOverride

* Fixes
2024-07-04 15:48:36 +02:00
Jacek Sieka 81e75622cf
storage: store root id together with vid, for better locality of refe… (#2449)
The state and account MPT:s currenty share key space in the database
based on that vertex id:s are assigned essentially randomly, which means
that when two adjacent slot values from the same contract are accessed,
they might reside at large distance from each other.

Here, we prefix each vertex id by its root causing them to be sorted
together thus bringing all data belonging to a particular contract
closer together - the same effect also happens for the main state MPT
whose nodes now end up clustered together more tightly.

In the future, the prefix given to the storage keys can also be used to
perform range operations such as reading all the storage at once and/or
deleting an account with a batch operation.

Notably, parts of the API already supported this rooting concept while
parts didn't - this PR makes the API consistent by always working with a
root+vid.
2024-07-04 15:46:52 +02:00
Jacek Sieka b23795ab39
remove pPrf, fRpp (#2445)
No longer used now that hashify is gone
2024-07-03 22:21:57 +02:00
Jordan Hrycaj ea7c756a9d
Core db reorg (#2444)
* CoreDb: Merged all sub-descriptors into `base_desc` module

* Dissolve `aristo_db/common_desc.nim`

* No need to export `Aristo` methods in `CoreDb`

* Resolve/tighten methods in `aristo_db` sub-moduled

why:
  So they can be straihgt implemented into the `base` module

* Moved/re-implemented `KVT` methods into `base` module

* Moved/re-implemented `MPT` methods into `base` module

* Moved/re-implemented account methods into `base` module

* Moved/re-implemented `CTX` methods into `base` module

* Moved/re-implemented `handler_{aristo,kvt}` into `aristo_db` module

* Moved/re-implemented `TX` methods into `base` module

* Moved/re-implemented base methods into `base` module

* Replaced `toAristoSavedStateBlockNumber()` by proper base method

why:
  Was the last for keeping reason for keeping low level backend access
  methods

* Remove dedicated low level access to `Aristo` backend

why:
  Not needed anymore, for debugging the descriptors can be accessed
  directly

also:
  some clean up stuff

* Re-factor `CoreDb` descriptor layout and adjust base methods

* Moved/re-implemented iterators into `base_iterator*` modules

* Update docu
2024-07-03 15:50:27 +00:00
Jacek Sieka 1f60e8e453
Use `Hash256` directly for account path (#2439)
Account paths are always a hash - passing it around as such helps avoid
confusion as to how long it is
2024-07-03 10:14:26 +02:00
Jordan Hrycaj 2c87fd1636
Aristo code cosmetics and tests update (#2434)
* Update some docu

* Resolve obsolete compile time option

why:
  Not optional anymore

* Update checks

why:
  The notion of what constitutes a valid `Aristo` db has changed due to
  (even more) lazy calculating Merkle hash keys.

* Disable redundant unit test for production
2024-07-01 10:59:18 +00:00
andri lim 740882d8ce
Import forked_chain_test in all_tests (#2433) 2024-07-01 09:57:42 +07:00
andri lim 401537ad38
Add ForkedChainRef tests (#2430)
ForkedChainRef have become quite complex.
test_blockchain_json is not sufficient cover for edge cases
or synthetic cases.
2024-06-30 14:40:14 +07:00
andri lim c24affadee
Use simpler schema when writing transactions, receipts, and withdrawals (#2420)
* Use simpler schema when writing transactions, receipts, and withdrawals

Using MPT not only slow but also take up more spaces than needed.
Aristo will remove older tries and only keep the last block tries.
Using simpler schema will avoid those problems.

* Rename getTransaction to getTransactionByIndex
2024-06-29 12:43:17 +07:00
andri lim b751d3adee
Combine smaller tests into bigger one (#2425)
1. test_state_db and test_ledger -> test_ledger.
   They are the same thing now.
2. stack, memory, code_stream, gas_meter, misc,
   overflow -> test_evm_support.
   They are small tests and fall into the same area.
2024-06-29 08:57:30 +07:00
Jordan Hrycaj 8dd038144b
Some cleanups (#2428)
* Remove `dirty` set from structural objects

why:
  Not used anymore, the tree is dirty by default.

* Rename `aristo_hashify` -> `aristo_compute`

* Remove cruft, update comments, cosmetics, etc.

* Simplify `SavedState` object

why:
  The key chaining have become obsolete after extra lazy hashing. There
  is some available space for a state hash to be maintained in future.

details:
  Accept the legacy `SavedState` object serialisation format for a
  while (which will be overwritten by new format.)
2024-06-28 18:43:04 +00:00
Jordan Hrycaj 14c3772545
On demand mpt revisited (#2426)
* rebased from `github/on-demand-mpt`

ackn:
  wip: on-demand mpt construction

  Given that actual data is stored in the `Vertex` structure, it's useful
  to think of the MPT as a cache for computing roots rather than being a
  functional requirement on its own.

  This PR engenders this line of thinking by incrementally computing the
  MPT only when it's needed, ie when a state (or similar) root is needed.

  This has the effect of siginficantly reducing memory usage as well as
  improving performance:

  * no need for dirty-mpt-node book-keeping
  * no need to build complex forest of upcoming hashing work
  * only hashes that are functionally needed are ever computed -
  intermediate nodes whose MTP root is not observed are never computed /
  processed

* Unit test hot fixes

* Unit test hot fixes cont.

(somehow lost that part)

---------

Co-authored-by: Jacek Sieka <jacek@status.im>
2024-06-28 15:03:12 +00:00
andri lim 44deff9b28
Enable test_txpool by disabling failing cases (#2421)
* Enable test_txpool by disabling failing cases

Because we cannot use goerli replay to feed the txpool anymore,
we use only a list of transactions.

But some test cases still failing because it requires block state
replay.

* Fix tx info
2024-06-28 11:53:25 +07:00
Jordan Hrycaj 6dc2773957
Only use pre hashed addresses as account keys (#2424)
* Normalised storage tree addressing in function prototypes

detail:
  Argument list is always `<db> <account-path> <slot-path> ..` with
  both path arguments as `openArray[]`

* Remove cruft

* CoreDb internally Use full account paths rather than addresses

* Update API logging

* Use hashed account address only in prototypes

why:
  This avoids unnecessary repeated hashing of the same account address.
  The burden of doing that is upon the application. In the case here,
  the ledger caches all kinds of stuff anyway so it is common sense to
  exploit that for account address hashes.

caveat:
  Using `openArray[byte]` argument types for hashed accounts is inherently
  fragile. In non-release mode, a length verification `doAssert` is
  enabled by default.

* No accPath in data record (use `AristoAccount` as `CoreDbAccount`)

* Remove now unused `eAddr` field from ledger `AccountRef` type

why:
  Is duplicate of lookup key

* Avoid merging the account record/statement in the ledger twice.
2024-06-27 19:21:01 +00:00
Jordan Hrycaj 61bbf40014
Update storage tree admin (#2419)
* Tighten `CoreDb` API for accounts

why:
  Apart from cruft, the way to fetch the accounts state root via a
  `CoreDbColRef` record was unnecessarily complicated.

* Extend `CoreDb` API for accounts to cover storage tries

why:
  In future, this will make the notion of column objects obsolete. Storage
  trees will then be indexed by the account address rather than the vertex
  ID equivalent like a `CoreDbColRef`.

* Apply new/extended accounts API to ledger and tests

details:
  This makes the `distinct_ledger` module obsolete

* Remove column object constructors

why:
  They were needed as an abstraction of MPT sub-trees including storage
  trees. Now, storage trees are handled by the account (e.g. via address)
  they belong to and all other trees can be identified by a constant well
  known vertex ID. So there is no need for column objects anymore.

  Still there are some left-over column object methods wnich will be
  removed next.

* Remove `serialise()` and `PayloadRef` from default Aristo API

why:
  Not needed. `PayloadRef` was used for unstructured/unknown payload
  formats (account or blob) and `serialise()` was used for decodng
  `PayloadRef`. Now it is known in advance what the payload looks
  like.

* Added query function `hasStorageData()` whether a storage area exists

why:
  Useful for supporting `slotStateEmpty()` of the `CoreDb` API

* In the `Ledger` replace `storage.stateEmpty()` by 	`slotStateEmpty()`

* On Aristo, hide the storage root/vertex ID in the `PayloadRef`

why:
  The storage vertex ID is fully controlled by Aristo while the
  `AristoAccount` object is controlled by the application. With the
  storage root part of the `AristoAccount` object, there was a useless
  administrative burden to keep that storage root field up to date.

* Remove cruft, update comments etc.

* Update changed MPT access paradigms

why:
  Fixes verified proxy tests

* Fluffy cosmetics
2024-06-27 09:01:26 +00:00
andri lim b80521a84d
ForkedChain become ForkedChainRef (#2417)
* ForkedChain become ForkedChainRef

It will be shared between engine API, RPC, and txPool

* Fix ForkedChainRef constructor
2024-06-27 12:54:52 +07:00
andri lim cd21c4fbec
ForkedChain implementation (#2405)
* ForkedChain implementation

- revamp test_blockchain_json using ForkedChain
- re-enable previously failing test cases.

* Remove excess error handling

* Avoid reloading parent header

* Do not force base update

* Write baggage to database

* Add findActiveChain to finalizedSegment

* Create new stagingTx in addBlock

* Check last stateRoot existence in test_blockchain_json

* Resolve rebase conflict

* More precise nomenclature for block import cursor

* Ensure bad block nor imported and good block not rejected

* finalizeSegment become forkChoice and align with engine API forkChoice spec

* Display reason when good block rejected

* Fix comments

* Put BaseDistance into CalculateNewBase equation

* Separate finalizedHash from baseHash

* Add more doAssert constraint

* Add push raises: []
2024-06-26 07:27:48 +07:00