* Extract `CoreDb` constructor helpers from `base.nim` into separate module
why:
This makes it easier to avoid circular imports.
* Extract `Ledger` constructor helpers from `base.nim` into separate module
why:
Move `accounts_ledger.nim` file to sub-folder `backend`. That way the
layout resembles that of the `core_db`.
* Updates and corrections
* Extract `CoreDb` configuration from `base.nim` into separate module
why:
This makes it easier to avoid circular imports, in particular
when the capture journal (aka tracer) is revived.
* Extract `Ledger` configuration from `base.nim` into separate module
why:
This makes it easier to avoid circular imports (if any.)
also:
Move `accounts_ledger.nim` file to sub-folder `backend`. That way the
layout resembles that of the `core_db`.
hike allocations (and the garbage collection maintenance that follows)
are responsible for some 10% of cpu time (not wall time!) at this point
- this PR avoids them by stepping through the layers one step at a time,
simplifying the code at the same time.
* Rename `newKvt()` -> `ctx.getKvt()`
why:
Clean up legacy shortcut. Also, the `KVT` returned is not instantiated
but refers to the shared `KVT` that resides in a context which is a
generalisation of an in-memory database fork. The function `ctx`
retrieves the default context.
* Rename `newTransaction()` -> `ctx.newTransaction()`
why:
Clean up legacy shortcut. The transaction is applied to a context as a
generalisation of an in-memory database fork. The function `ctx`
retrieves the default context.
* Rename `getColumn(CtGeneric)` -> `getGeneric()`
why:
No more a list of well known sub-tries needed, a single one is enough.
In fact, `getColumn()` did only support a single sub-tree by now.
* Reduce TODO list
This trivial bump should improve performance a bit without costing too
much memory - as the trie grows, so does the number of levels in it and
creating hikes becomes ever more expensive - hopefully this cache
increase should give a nice little boost even if it's not a lot.
Avoid writing the same slot/hash values to the hash->slot mapping
to avoid spamming the rocksdb WAL and cause unnecessary compaction
In the same vein, avoid writing trivially detectable A-B-A storage
changes which happen with surprising frequency.
Introduce a new `StoData` payload type similar to `AccountData`
* slightly more efficient storage format
* typed api
* fewer seqs
* fix encoding docs - it wasn't rlp after all :)
This significantly speeds up block import at the cost of less protection
against invalid data, potentially resulting in an invalid database
getting stored.
The risk is small given that import is used only for validated data -
evaluating the right level of of validation vs performance is left for a
future PR.
A side effect of this approach is that there is no cached stated root in
the database - computing it currently requires a lot of memory since the
intermediate roots get cached in memory in full while the computation is
ongoing - a future PR will need to address this deficiency, for example
by streaming the already-computed hashes directly to the database.
The state and account MPT:s currenty share key space in the database
based on that vertex id:s are assigned essentially randomly, which means
that when two adjacent slot values from the same contract are accessed,
they might reside at large distance from each other.
Here, we prefix each vertex id by its root causing them to be sorted
together thus bringing all data belonging to a particular contract
closer together - the same effect also happens for the main state MPT
whose nodes now end up clustered together more tightly.
In the future, the prefix given to the storage keys can also be used to
perform range operations such as reading all the storage at once and/or
deleting an account with a batch operation.
Notably, parts of the API already supported this rooting concept while
parts didn't - this PR makes the API consistent by always working with a
root+vid.
The storage id is frequently accessed when executing contract code and
finding the path via the database requires several hops making the
process slow - here, we add a cache to keep the most recently used
account storage id:s in memory.
A possible future improvement would be to cache all account accesses so
that for example updating the balance doesn't cause several hikes.
* CoreDb: Merged all sub-descriptors into `base_desc` module
* Dissolve `aristo_db/common_desc.nim`
* No need to export `Aristo` methods in `CoreDb`
* Resolve/tighten methods in `aristo_db` sub-moduled
why:
So they can be straihgt implemented into the `base` module
* Moved/re-implemented `KVT` methods into `base` module
* Moved/re-implemented `MPT` methods into `base` module
* Moved/re-implemented account methods into `base` module
* Moved/re-implemented `CTX` methods into `base` module
* Moved/re-implemented `handler_{aristo,kvt}` into `aristo_db` module
* Moved/re-implemented `TX` methods into `base` module
* Moved/re-implemented base methods into `base` module
* Replaced `toAristoSavedStateBlockNumber()` by proper base method
why:
Was the last for keeping reason for keeping low level backend access
methods
* Remove dedicated low level access to `Aristo` backend
why:
Not needed anymore, for debugging the descriptors can be accessed
directly
also:
some clean up stuff
* Re-factor `CoreDb` descriptor layout and adjust base methods
* Moved/re-implemented iterators into `base_iterator*` modules
* Update docu
These representations use ~15-20% less data compared to the status quo,
mainly by removing redundant zeroes in the integer encodings - a
significant effect of this change is that the various rocksdb caches see
better efficiency since more items fit in the same amount of space.
* use RLP encoding for `VertexID` and `UInt256` wherever it appears
* pack `VertexRef`/`PayloadRef` more tightly
* avoid costly hike memory allocations for operations that don't need to
re-traverse it
* avoid unnecessary state checks (which might trigger unwanted state
root computations)
* disable optimize-for-hits due to the MPT no longer being complete at
all times
* Update some docu
* Resolve obsolete compile time option
why:
Not optional anymore
* Update checks
why:
The notion of what constitutes a valid `Aristo` db has changed due to
(even more) lazy calculating Merkle hash keys.
* Disable redundant unit test for production
* Use simpler schema when writing transactions, receipts, and withdrawals
Using MPT not only slow but also take up more spaces than needed.
Aristo will remove older tries and only keep the last block tries.
Using simpler schema will avoid those problems.
* Rename getTransaction to getTransactionByIndex
* Remove `dirty` set from structural objects
why:
Not used anymore, the tree is dirty by default.
* Rename `aristo_hashify` -> `aristo_compute`
* Remove cruft, update comments, cosmetics, etc.
* Simplify `SavedState` object
why:
The key chaining have become obsolete after extra lazy hashing. There
is some available space for a state hash to be maintained in future.
details:
Accept the legacy `SavedState` object serialisation format for a
while (which will be overwritten by new format.)
* rebased from `github/on-demand-mpt`
ackn:
wip: on-demand mpt construction
Given that actual data is stored in the `Vertex` structure, it's useful
to think of the MPT as a cache for computing roots rather than being a
functional requirement on its own.
This PR engenders this line of thinking by incrementally computing the
MPT only when it's needed, ie when a state (or similar) root is needed.
This has the effect of siginficantly reducing memory usage as well as
improving performance:
* no need for dirty-mpt-node book-keeping
* no need to build complex forest of upcoming hashing work
* only hashes that are functionally needed are ever computed -
intermediate nodes whose MTP root is not observed are never computed /
processed
* Unit test hot fixes
* Unit test hot fixes cont.
(somehow lost that part)
---------
Co-authored-by: Jacek Sieka <jacek@status.im>
* Normalised storage tree addressing in function prototypes
detail:
Argument list is always `<db> <account-path> <slot-path> ..` with
both path arguments as `openArray[]`
* Remove cruft
* CoreDb internally Use full account paths rather than addresses
* Update API logging
* Use hashed account address only in prototypes
why:
This avoids unnecessary repeated hashing of the same account address.
The burden of doing that is upon the application. In the case here,
the ledger caches all kinds of stuff anyway so it is common sense to
exploit that for account address hashes.
caveat:
Using `openArray[byte]` argument types for hashed accounts is inherently
fragile. In non-release mode, a length verification `doAssert` is
enabled by default.
* No accPath in data record (use `AristoAccount` as `CoreDbAccount`)
* Remove now unused `eAddr` field from ledger `AccountRef` type
why:
Is duplicate of lookup key
* Avoid merging the account record/statement in the ledger twice.
* Tighten `CoreDb` API for accounts
why:
Apart from cruft, the way to fetch the accounts state root via a
`CoreDbColRef` record was unnecessarily complicated.
* Extend `CoreDb` API for accounts to cover storage tries
why:
In future, this will make the notion of column objects obsolete. Storage
trees will then be indexed by the account address rather than the vertex
ID equivalent like a `CoreDbColRef`.
* Apply new/extended accounts API to ledger and tests
details:
This makes the `distinct_ledger` module obsolete
* Remove column object constructors
why:
They were needed as an abstraction of MPT sub-trees including storage
trees. Now, storage trees are handled by the account (e.g. via address)
they belong to and all other trees can be identified by a constant well
known vertex ID. So there is no need for column objects anymore.
Still there are some left-over column object methods wnich will be
removed next.
* Remove `serialise()` and `PayloadRef` from default Aristo API
why:
Not needed. `PayloadRef` was used for unstructured/unknown payload
formats (account or blob) and `serialise()` was used for decodng
`PayloadRef`. Now it is known in advance what the payload looks
like.
* Added query function `hasStorageData()` whether a storage area exists
why:
Useful for supporting `slotStateEmpty()` of the `CoreDb` API
* In the `Ledger` replace `storage.stateEmpty()` by `slotStateEmpty()`
* On Aristo, hide the storage root/vertex ID in the `PayloadRef`
why:
The storage vertex ID is fully controlled by Aristo while the
`AristoAccount` object is controlled by the application. With the
storage root part of the `AristoAccount` object, there was a useless
administrative burden to keep that storage root field up to date.
* Remove cruft, update comments etc.
* Update changed MPT access paradigms
why:
Fixes verified proxy tests
* Fluffy cosmetics
* Simplify txpool baseFeeGet
- Avoid using toEVMFork because we are not in EVM
- Rename `isLondon` to `isLondonOrLater`
* Remove timestamp from isLondonOrLater
* ForkedChain implementation
- revamp test_blockchain_json using ForkedChain
- re-enable previously failing test cases.
* Remove excess error handling
* Avoid reloading parent header
* Do not force base update
* Write baggage to database
* Add findActiveChain to finalizedSegment
* Create new stagingTx in addBlock
* Check last stateRoot existence in test_blockchain_json
* Resolve rebase conflict
* More precise nomenclature for block import cursor
* Ensure bad block nor imported and good block not rejected
* finalizeSegment become forkChoice and align with engine API forkChoice spec
* Display reason when good block rejected
* Fix comments
* Put BaseDistance into CalculateNewBase equation
* Separate finalizedHash from baseHash
* Add more doAssert constraint
* Add push raises: []
* creating a seq from a table that holds lots of changes means copying
all data into the table - this can be several GB of data while syncing
blocks
* nim fails to optimize the moving of the `WidthFirstForest` - the real
solution is to not construct a `wff` to begin with, but this PR provides
relief while that is being worked on
This spike fix allows us to bump the rocksdb cache by another 2 GB and
still have a significantly lower peak memory usage during sync.
When processing long ranges of blocks, the account cache grows unbounded
which cause huge memory spikes.
Here, we move the cache to a second-level cache after each block - the
second-level cache is cleared on the next block after that which creates
a simple LRU effect.
There's a small performance cost of course, though overall the freed-up
memory can now be reassigned to the rocksdb row cache which not only
makes up for the loss but overall leads to a performance increase.
The bump to 2gb of rocksdb row cache here needs more testing but is
slightly less and loosely basedy on the savings from this PR and the
circular ref fix in #2408 - another way to phrase this is that it's
better to give rocksdb more breathing room than let the memory sit
unused until circular ref collection happens ;)
An instance of `CoreDbMptRef` is created for and stored in every account
- when we are processing blocks and have many accounts in memory, this
closure environment takes up hundreds of mb of memory (around block 5M,
it is the 4:th largest memory consumer!) - incidentally, this also
removes a circular reference in the setup that causes the
`AristoCodeDbMptRef` to linger in memory much longer than it
has to which is the core reason why it takes so much.
The real solution here is to remove the methods indirection entirely,
but this PR provides relief until that has been done.
Similar treatment is given to some of the other core api functions to
avoid circulars there too.
This buffer eleminates a large part of allocations during MPT traversal,
reducing overall memory usage and GC pressure.
Ideally, we would use it throughout in the API instead of
`openArray[byte]` since the built-in length limit appropriately exposes
the natural 64-nibble depth constraint that `openArray` fails to
capture.
It is common for many accounts to share the same code - at the database
level, code is stored by hash meaning only one copy exists per unique
program but when loaded in memory, a copy is made for each account.
Further, every time we execute the code, it must be scanned for invalid
jump destinations which slows down EVM exeuction.
Finally, the extcodesize call causes code to be loaded even if only the
size is needed.
This PR improves on all these points by introducing a shared
CodeBytesRef type whose code section is immutable and that can be shared
between accounts. Further, a dedicated `len` API call is added so that
the EXTCODESIZE opcode can operate without polluting the GC and code
cache, for cases where only the size is requested - rocksdb will in this
case cache the code itself in the row cache meaning that lookup of the
code itself remains fast when length is asked for first.
With 16k code entries, there's a 90% hit rate which goes up to 99%
during the 2.3M attack - the cache significantly lowers memory
consumption and execution time not only during this event but across the
board.
* CoreDb: remove PHK tries
why:
There is no general use anymore for an MPT with a pre-hashed key. It
was used to resemble the `SecureHexaryTrie` logic from the legacy DB.
The only pace where this is needed is the `Leger` which uses a
a distinct MPT version anyway (see `distinct_ledgers.nim`.)
* Rename `CoreDx*` -> `CoreDb*`
why:
The naming `CoreDx*` was used to differentiate the new CoreDb API from
the legacy API which had descriptors named `CoreDb*`.
* Provide dedicated functions for fetching accounts and storage trees
why:
Different prototypes for each class `account`, `generic` and
`storage`.
* Remove `fetchPayload()` and other cruft from API, `aristo_fetch`, etc.
* Fix typos, debugging left overs, comments
For the block cache to be shared between column families, the options
instance must be shared between the various column families being
created. This also ensures that there is only one source of truth for
configuration options instead of having two different sets depending on
how the tables were initialized.
This PR also removes the re-opening mechanism which can double startup
time - every time the database is opened, the log is replayed - a large
log file will take a long time to open.
Finally, several options got correclty implemented as column family
options, including an one that puts a hash index in the SST files.
* Provide dedicated functions for deleteing accounts and storage trees
why:
Storage trees are always linked to an account, so there is no need
for an application to fiddle about (e.g. re-cycling, unlinking)
storage tree vertex IDs.
* Remove `delete()` and other cruft from API, `aristo_delete`, etc.
* clean up delete functions
details:
The delete implementations `deleteImpl()` and `delTreeImpl()` do not
need to be super generic anymore as all the edge cases are covered by
the specialised `deleteAccountPayload()`, `deleteGenericData()`, etc.
* Avoid unnecessary re-calculations of account keys
why:
The function `registerAccountForUpdate()` did extract the storage ID
(if any) and automatically marked the Merkle keys along the account
path for re-hashing.
This would also apply if there was later detected that the account
or the storage tree did not need to be updated.
So the `registerAccountForUpdate()` function was split into a part
which retrieved the storage ID, and another one which marked the
Merkle keys for re-calculation to be applied only when needed.
* Remove unused `merge*()` functions (for production)
details:
Some functionality moved to test suite
* Make sure that only `AccountData` leaf type is exactly used on VertexID(1)
* clean up payload type
* Provide dedicated functions for merging accounts and storage trees
why:
Storage trees are always linked to an account, so there is no need
for an application to fiddle about (e.e. creating, re-cycling) with
storage tree vertex IDs.
* CoreDb: Disable tracer functionality
why:
Must be updated to accommodate new/changed `Aristo` functions.
* CoreDb: Use new `mergeXXX()` functions
why:
Makes explicit vertex ID management obsolete for creating new
storage trees.
* Remove `mergePayload()` and other cruft from API, `aristo_merge`, etc.
* clean up merge functions
details:
The merge implementation `mergePayloadImpl()` does not need to be super
generic anymore as all the edge cases are covered by the specialised
functions `mergeAccountPayload()`, `mergeGenericData()`, and
`mergeStorageData()`.
* No tracer available at the moment, so disable offending tests
The state root computation here is one of the major hotspots in block
processing - in the cases the code only needs to know if it's empty or
not, it can be done a lot faster.
Adding a separate function for this looks fragile and should probably be
revisited.
Broadly, when importing blocks we don't need a transaction / frame per
block because we can simply abort the whole update and try again with a
smaller range if we find a faulty block.
Of course, this applies mainly to semi-trusted blocks where we're not
expected to fail in applying them - this could be blocks either from
files or header-verified blocks as given by consensus.
* Remove AccountStateDB
AccountStateDB should no longer be used.
It's usage have been reduce to read only operations.
Replace it with LedgerRef to reduce maintenance burden.
* remove extra spaces
Co-authored-by: tersec <tersec@users.noreply.github.com>
---------
Co-authored-by: tersec <tersec@users.noreply.github.com>
When performing block import, we can batch state root verifications and
header checks, doing them only once per chunk of blocks, assuming that
the other blocks in the batch are valid by extension.
When we're not generating receipts, we can also skip per-transaction
state root computation pre-byzantium, which is what provides a ~20%
speedup in this PR, at least on those early blocks :)
We also stop storing transactions, receipts and uncles redundantly when
importing from era1 - there is no need to waste database storage on this
when we can load it from the era1 file (eventually).
State lookups potentially trigger expensive re-hashings - this is the
first of several steps to remove the unnecessary ones from the general
flow of block processing
* avoid re-reading parent block header from database when it's already
in memory
* Fix initialiser
why:
Possible crash (app profiling, tracer etc.)
* Update column family options processing
why:
Same for kvt as for aristo
* Move `AristoDbDualRocks` backend type to the test suite
why:
So it is not available for production
* Fix typos in API jump table
why:
Used for tracing and app profiling only. Needed some update
* Purged CoreDb legacy API
why:
Not needed anymore, was transitionary and disabled.
* Rename `flush` argument to `eradicate` in a DB close context
why:
The word `eradicate` leaves no doubt what is meant
* Rename `stoFlush()` -> `stoDelete()`
* Rename `core_apps_newapi` -> `core_apps` (not so new anymore)
* Bump nim-eth, nim-web3, nimbus-eth2
- Replace std.Option with results.Opt
- Fields name changes
* More fixes
* Fix Portal stream async raises and portal testnet Opt usage
* Bump eth + nimbus-eth2 + more fixes related to eth_types changes
* Fix in utp test app and nimbus-eth2 bump
* Fix test_blockchain_json rebase conflict
* Fix EVMC block_timestamp conversion plus commentary
---------
Co-authored-by: kdeme <kim.demey@gmail.com>
* bump rockdb
* Rename `KVT` objects related to filters according to `Aristo` naming
details:
filter* => delta*
roFilter => balancer
* Compulsory error handling if `persistent()` fails
* Add return code to `reCentre()`
why:
Might eventually fail if re-centring is blocked. Some logic will be
added in subsequent patch sets.
* Add column families from earlier session to rocksdb in opening procedure
why:
All previously used CFs must be declared when re-opening an existing
database.
* Update `init()` and add rocksdb `reinit()` methods for changing parameters
why:
Opening a set column families (with different open options) must span
at least the ones that are already on disk.
* Provide write-trigger-event interface into `Aristo` backend
why:
This allows to save data from a guest application (think `KVT`) to
get synced with the write cycle so the guest and `Aristo` save all
atomically.
* Use `KVT` with new column family interface from `Aristo`
* Remove obsolete guest interface
* Implement `KVT` piggyback on `Aristo` backend
* CoreDb: Add separate `KVT`/`Aristo` backend mode for debugging
* Remove `rocks_db` import from `persist()` function
why:
Some systems (i.p `fluffy` and friends) use the `Aristo` memory
backend emulation and do not link against rocksdb when building the
application. So this should fix that problem.
These options, inspired by Nethermind and general internet wisdom, bring
the database size down to 2/3 without affecting throughput. In theory,
they should also bring down memory usage and/or make more efficient use
of whatever memory is already assigned to rocksdb but this needs
verification in a longer test at synced-mainnet sizes.
In the meantime, they make testing easier by removing some noise that
the profiler says are bad, such as excessive SkipList access (countered
by bloom filters).
* Use RocksDb column families instead of a prefixed single column
why:
Better performance
* Use structural objects `VertexRef` and `HashKey` in LRU cache for RocksDb
why:
Avoids repeated de/serialisation
`initTable` is obsolete since nim 0.19 and can introduce significant
memory overhead while providing no benefit (since the table will be
grown to the default initial size on first use anyway).
In particular, aristo layers will not necessarily use all tables they
initialize, for exampe when many empty accounts are being created.
This PR consolidates the split header-body sequences into a single EthBlock
sequence and cleans up the fallout from that which significantly reduces
block processing overhead during import thanks to less garbage collection
and fewer copies of things all around.
Notably, since the number of headers must always match the number of bodies,
we also get rid of a pointless degree of freedom that in the future could
introduce unnecessary bugs.
* only read header and body from era file
* avoid several unnecessary copies along the block processing way
* simplify signatures, cleaning up unused arguemnts and returns
* use `stew/assign2` in a few strategic places where the generated
nim assignent is slow and add a few `move` to work around poor
analysis in nim 1.6 (will need to be revisited for 2.0)
```
stats-20240607_2223-a814aa0b.csv vs stats-20240608_0714-21c1d0a9.csv
bps_x bps_y tps_x tps_y bpsd tpsd timed
block_number
(498305, 713245] 1,540.52 1,809.73 2,361.58 2775.340189 17.63% 17.63% -14.92%
(713245, 928185] 730.36 865.26 1,715.90 2028.973852 18.01% 18.01% -15.21%
(928185, 1143126] 663.03 789.10 2,529.26 3032.490771 19.79% 19.79% -16.28%
(1143126, 1358066] 393.46 508.05 2,152.50 2777.578119 29.13% 29.13% -22.50%
(1358066, 1573007] 370.88 440.72 2,351.31 2791.896052 18.81% 18.81% -15.80%
(1573007, 1787947] 283.65 335.11 2,068.93 2441.373402 17.60% 17.60% -14.91%
(1787947, 2002888] 287.29 342.11 2,078.39 2474.179448 18.99% 18.99% -15.91%
(2002888, 2217828] 293.38 343.16 2,208.83 2584.77457 17.16% 17.16% -14.61%
(2217828, 2432769] 140.09 167.86 1,081.87 1296.336926 18.82% 18.82% -15.80%
blocks: 1934464, baseline: 3h13m1s, contender: 2h43m47s
bpsd (mean): 19.55%
tpsd (mean): 19.55%
Time (total): -29m13s, -15.14%
```
* Cleanup unneeded stateless and block witness code. Keeping MultiKeys which is used in the eth_getProofsByBlockNumber RPC endpoint which is needed for the Fluffy state network bridge.
* Rename generateWitness flag to collectWitnessData to better describe what the flag does. We only collect the keys of the touched accounts and storage slots but no block witness generation is supported for now.
* Move remaining stateless code into nimbus directory.
* Add vmstate parameter to ChainRef to fix test.
* Exclude *.in from check copyright year
---------
Co-authored-by: jangko <jangko128@gmail.com>
* Code cosmetics
* Re-org `aristo_merge`, internally split into sub-modules
why:
Became a burden for maintenance because it hosts two different
functionalities under the same merge paradigm: account/data merge
and snap proof merge where the latter produces a partial trie.
* Fix CoreDb tracer
* Ledger: fix potential account vs. storage tree sync problems
* Remove bound on the size of removable whole storage trees
* Activate `test_tracer_json`
* Remove exception from evm memory
* Remove exception from gas meter
* Remove exception from stack
* Remove exception from precompiles
* Remove exception from gas_costs
* Remove exception from op handlers
* Remove exception from op dispatcher
* Remove exception from call_evm
* Remove exception from EVM
* Fix tools and tests
* Remove exception from EVMC
* fix evmc
* Fix evmc
* Remove remnants of async evm stuff
* Remove superflous error handling
* Proc to func
* Fix errors detected by CI
* Fix EVM op call stack usage
* REmove exception handling from getVmState
* Better error message instead of just doAssert
* Remove unused validation
* Remove superflous catchRaise
* Use results.expect instead of unsafeValue
This new option saves a CSV to disk while performing `import` such that
the performance of one import can be compared with the other.
This early version is likely to change in the future
It's unused and causing trouble because of unhandled exception effects -
if we were to use it, it would need re-implementation such that it
doesn't reallocate the whole queue on writing.
* CoreDb: Remove crufty second/off-site KVT
why:
Was used to allow late `Clique` to store directly to disk
* CoreDb: Remove prune flag related functionality
why:
Is completely legacy stuff
* CoreDb: Remove dependence on legacy API (tests unsupported yet)
why:
Does not fully support Aristo
* Re-factoring `state_db` using new API
details:
Only minimum changes needed to compile `nimbus`
* Update tests and aux modules
* Turn off legacy API and remove `distinct_tries`
comment:
The legacy API has now cruft status, will be removed soon
* Fix copyright years
* Update rpc for verified proxy
---------
Co-authored-by: Jacek Sieka <jacek@status.im>
* Fix `blobify()` for `SavedState` object
why:
Have to treat varying sizes for `HashKey`, i.p. for an empty key which
has zero size.
* Store correct block number in `SavedState` record
why:
Stored `block-number - 1` for some obscure reason.
* Cosmetcs, docu
These options are there mainly to drive experiments, and are therefore
hidden.
One thing that this PR brings in is an initial set of caches and buffers for rocksdb - the set that I've been using during various performance tests to get to a viable baseline performance level.
The `rocksdb` version shipped with distributions is typically old and
therefore often lacks features we use - it also doesn't match the one
assumed by nim-rocksdb leading to ABI mismatch risks.
Instead of depending on the system rocksdb, we'll now use the rocksdb
version assumed by nim-rocksdb and locked in its vendor folder by always
building it together with nimbus.
This avoids the problem of unknown rocksdb versions at a (small) cost to
build time.
CI caching and full windows support for building from source [remains
TODO](https://github.com/status-im/nim-rocksdb/issues/44).
* Remove all journal related stuff
* Refactor function names journal*() => delta*(), filter*() => delta*()
* remove `trg` fileld from `FilterRef`
why:
Same as `kMap[$1]`
* Re-type FilterRef.src as `HashKey`
why:
So it is directly comparable to `kMap[$1]`
* Moved `vGen[]` field from `LayerFinalRef` to `LayerDeltaRef`
why:
Then a separate `FilterRef` type is not needed, anymore
* Rename `roFilter` field in `AristoDbRef` => `balancer`
why:
New name more appropriate.
* Replace `FilterRef` by `LayerDeltaRef` type
why:
This allows to avoid copying into the `balancer` (see next patch set)
most of the time. Typically, only one instance is running on the backend
and the `balancer` is only used as a stage before saving data.
* Refactor way how to store data persistently
why:
Avoid useless copy when staging `top` layer for persistently saving to
backend.
* Fix copyright header?
`persist` is a hotspot when processing blocks because it is run at least
once per transaction and loops over the entire account cache every time.
Here, we introduce an extra `dirty` map that keeps track of all accounts
that need checking during `persist` which fixes the immediate
inefficiency, though probably this could benefit from a more thorough
review - we also get rid of the unused clearCache flag - we start with
a fresh cache on every fresh vmState.
* avoid unnecessary code hash comparisons
* avoid unnecessary copies when iterating
* use EMPTY_CODE_HASH throughout for code hash comparison
The current implementation cannot be used practically since it causes
several full reallocations of the whole free list per deletion - it
needs to be reimplemented, or the chain cannot practically progress
beyond ~2.5M blocks where a lot of removals happen.
Co-authored-by: tersec <tersec@users.noreply.github.com>
* fCU Ignore update to old head
* Only enable terminal PoW block conditions for fCUV1
* Typo fix: TDD -> TTD
* Add link to Shanghai fCUV2 specification
* Add link to Paris fCUV1 specification
* Add persistent last state stamp feature
why:
This allows to run `CoreDb` without journal
* Start `CoreDb` without journal
* Remove journal related functions from `CoredDb`
why:
Typo. That worked the wrong way in the unit tests because the
tests always store the last item separately with extended
logging.
Co-authored-by: jordan <jordan@dry.pudding>
This PR extends the `nimbus import` command to also allow reading from
era files - this command allows creating or topping up an existing
database with data coming from era files instead of network sync.
* add `--era1-dir` and `--max-blocks` options to command line
* make `persistBlocks` report basic stats like transactions and gas
* improve error reporting in several API
* allow importing multiple RLP files in one go
* clean up logging options to match nimbus-eth2
* make sure database is closed properly on shutdown
* Update TDD suite logger output format choices
why:
New format is not practical for TDD as it just dumps data across a wide
range (considerably larder than 80 columns.)
So the new format can be turned on by function argument.
* Update unit tests samples configuration
why:
Slightly changed the way to find the `era1` directory
* Remove compiler warnings (fix deprecated expressions and phrases)
* Update `Aristo` debugging tools
* Always update the `storageID` field of account leaf vertices
why:
Storage tries are weekly linked to an account leaf object in that
the `storageID` field is updated by the application.
Previously, `Aristo` verified that leaf objects make sense when passed
to the database. As a consequence
* the database was inconsistent for a short while
* the burden for correctness was all on the application which led
to delayed error handling which is hard to debug.
So `Aristo` will internally update the account leaf objects so that
there are no race conditions due to the storage trie handling
* Aristo: Let `stow()`/`persist()` bail out unless there is a `VertexID(1)`
why:
The journal and filter logic depends on the hash of the `VertexID(1)`
which is commonly known as the state root. This implies that all
changes to the database are somehow related to that.
* Make sure that a `Ledger` account does not overwrite the storage trie reference
why:
Due to the abstraction of a sub-trie (now referred to as column with a
hash describing its state) there was a weakness in the `Aristo` handler
where an account leaf could be overwritten though changing the validity
of the database. This has been changed and the database will now reject
such changes.
This patch fixes the behaviour on the application layer. In particular,
the column handle returned by the `CoreDb` needs to be updated by
the `Aristo` database state. This mitigates the problem that a storage
trie might have vanished or re-apperaed with a different vertex ID.
* Fix sub-trie deletion test
why:
Was originally hinged on `VertexID(1)` which cannot be wholesale
deleted anymore after the last Aristo update. Also, running with
`VertexID(2)` needs an artificial `VertexID(1)` for making `stow()`
or `persist()` work.
* Cosmetics
* Activate `test_generalstate_json`
* Temporarily `deactivate test_tracer_json`
* Fix copyright header
---------
Co-authored-by: jordan <jordan@dry.pudding>
Co-authored-by: Jacek Sieka <jacek@status.im>
* Implement engine_getClientVersionV1
* full git revision string
* Limit GitRevisionString to 8 chars
* Fixes
* Debug windows CI
* debug windows ci
* produce git revision using -C
* try not to delete .git folder in windows ci
* Harden GitRevision procuration
* Add double quotes to git -C param
* Escape sourcePath
* Remove double quotes from git -C param
* Remove crufty `pruneTrie` arguments
* Replaced legacy `distinct_trie` logic by new `ledger` functionality
why:
The module `distinct_trie` is supported by `Aristo` in trivial cases.
* Activate `test_op_memory`
* Workaround for disallowed transaction superseding
The transaction spammer from Kurtosis keeps spamming transactions with
the same nonce because report 'pending' account nocne based on 'latest'
rather than actually considering the mempool. To avoid errors from
rejecting those, disable the required gas price bump when replacing
a transaction with a new nonce.
* Lint
* Skip superseding negative test
* Workaround for 0 gas price estimation
The transaction spammer from Kurtosis estimates the gas price of its
transactions using 'eth_gasPrice'. Our implementation returns 0 when
no transactions have been executed yet, not taking into account the
EIP-1559 block base fee. Force a hardcoded minimum for now to unstuck.
* Adjust tests to cover new minimum gas fee
* Skip gas price test
This PR exploits structural properties of era files to simplify the
implementation and in particular remove the need to load all era file
indicies at startup which may be slow (due to archival storage residing
on slow drives)
* Attempt to roll back stateless mode implementation in a single PR
why:
+ Stateless mode is not fully working and in the way
+ Single PR should make it feasible to investigate for a possible
re-implementation
* Fix copyright year
* Fix annotation for exception (evmc mode)
* Update some docu & messages
* Remove cruft from the ledger modules
* Must not overwrite genesis data on an initialised database
why:
This will overwrite the global state of the Aristo single state DB.
Otherwise resuming at the last synced state becomes impossible.
* Provide latest block number from journal
why:
This relates the global state of the DB directly to the corresponding
block number.
* Implemented unit test providing DB pre-load and resume
why:
When deleting accounts while restoring the previous state, storage tries
must be deleted first. Otherwise a `DelDanglingStoTrie` error will occur
when trying to delete an account which refers to an active storage trie.
* Introduce wrapper type for EIP-4844 transactions
EIP-4844 blob sidecars are a concept that only exists in the mempool.
After inclusion of a transaction into an execution block, only the
versioned hash within the transaction remains. To improve type safety,
replace the `Transaction.networkPayload` member with a wrapper type
`PooledTransaction` that is used in contexts where blob sidecars exist.
* Bump nimbus-eth2 to 87605d08a7f9cfc3b223bd32143e93a6cdf351ac
* IPv6 'listen-address' in `nimbus_verified_proxy`
* Bump nim-libp2p to 21cbe3a91a70811522554e89e6a791172cebfef2
* Fix beacon_lc_bridge payload conversion and conf.listenAddress type
* Change nimbus_verified_proxy.asExecutionData param to SomeExecutionPayload
* Rerun nph to fix asExecutionData style format
* nimbus_verified_proxy listenAddress
* Use PooledTransaction in nimbus-eth1 tests
---------
Co-authored-by: jangko <jangko128@gmail.com>
The transaction spammer from Kurtosis requests the nonce value of its
account based on 'pending' tag. As we lack that implementation, go with
the next best answer from the 'latest' tag. This does not include info
about transactions from the mempool that have not yet been executed.
* Aristo: Generalise alien/guest interface for piggiback on database
* Aristo: Code cosmetics
* CoreDb+Kvt: Update transaction API
why:
Use single addressable function `forkTx(backLevel: int)` as used
in `Aristo`. So `Kvt` can be synced simultaneously to `Aristo`.
also:
Refactored `kvt_tx.nim` in a similar fashion to `Aristo`.
* Kvt: Replace `LayerDelta` object by reference
why:
Will be needed when introducing filters
* Kvt: Remodel backend filter facility similar to `Aristo`
why:
This allows to operate on several KVT instances simultaneously.
* CoreDb+Kvt: Fix on-disk storage
why:
Overlooked name change: `stow()` => `persist()` for permanent storage
* Fix copyright headers
* Aristo: Rename journal related sources and functions
why:
Previously, the naming was hinged on the phrases `fifo`, `filter` etc.
which reflect the inner workings of cascaded filters. This was
unfortunate for reading/understanding the source code for actions where
the focus is the journal as a whole.
* Aristo: Fix buffer overflow (path length truncating error)
* Aristo: Tighten `hikeUp()` stop check, update error code
why:
Detect dangling vertex links. These are legit with `snap` sync
processing but not with regular processing.
* Aristo: Raise assert in regular mode `merge()` at a dangling link/edge
why:
With `snap` sync processing, partial trees are ok and can be amended.
Not so in regular mode.
Previously there was only a debug message when a non-legit dangling edge
was encountered.
* Aristo: Make sure that vertices are copied before modification
why:
Otherwise vertices from lower layers might also be modified
* Aristo: Fix relaxed mode for validity checker `check()`
* Remove cruft
* Aristo: Update API for transaction handling
details:
+ Split `aristo_tx.nim` into sub-modules
+ Split `forkWith()` into `findTx()` + `forkTx()`
+ Removed `forkTop()`, `forkBase()` (now superseded by new `forkTx()`)
* CoreDb+Aristo: Fix initialiser (missing methods)
* Aristo: Allow to define/set `FilterID` for journal filter records
why:
After some changes, the `FilterID` is isomorphic to the `BlockNumber`
scalar (well, the first 2^64 entries of a `BlockNumber`.)
The needed change for `FilterID` is that the `FilterID(0)` value is
valid part of the `FilterID` scalar. A non-valid `FilterID` entry is
represented by `none(FilterID)`.
* Aristo: Split off function `persist()` as persistent version of `stow()`
why:
In production, `stow(persistent=false,..)` is currently unused. So,
using `persist()` rather than `stow(persistent=true,..)` improves
readability and is better to maintain.
* CoreDb+Aristo: Store block numbers in journal records
why:
This makes journal records searchable by block numbers
* Aristo: Rename some journal related functions
why:
The name *journal* is more appropriate to api functions than something
with *fifo* or *filter*.
* CoreDb+Aristo: Update last/oldest journal state retrieval
* CoreDb+Aristo: Register block number with state root in journal
why:
No need anymore for extra lookup table `stRootToBlockNum` which maps
a storage root -> block number.
* Aristo: Remove unused function `getFilUbe()` from api
* CoreDb: Remove now unused virtual table `stRootToBlockNum`
why:
Was used to map a state root to a block number. This functionality
is now embedded into the recovery journal backend.
* Turn of API tracking (will fail on `fluffy`)
* Aristo: Code cosmetics, e.g. update some CamelCase names
* CoreDb+Aristo: Provide oldest known state root implied
details:
The Aristo journal allows to recover earlier but not all state roots.
* Aristo: Fix journal backward index operator, e.g. `[^1]`
* Aristo: Fix journal updater
why:
The `fifosStore()` store function slightly misinterpreted the update
instructions when translation is to database `put()` functions. The
effect was that the journal was ever growing due to stale entries which
were never deleted.
* CoreDb+Aristo: Provide utils for purging stale data from the KVT
details:
See earlier patch, not all state roots are available. This patch
provides a mapping from some state root to a block number and allows to
remove all KVT data related to a particular block number
* Aristo+Kvt: Implement a clean up schedule for expired data in KVT
why:
For a single state ledger like `Aristo`, there is only a limited
backlog of states. So KVT data (i.e. headers etc.) are cleaned up
regularly
* Fix copyright year
* Aristo+Kvt: Better RocksDB profiling
why:
Providing more detailed information, mainly for `Aristo`
* Aristo: Renamed journal `stats()` to `capacity()`
why:
`Stats()` was a misnomer
* Aristo: Provide backend read caches for key and vertex IDs
why:
Dedicated LRU caching for particular types gives a throughput advantage.
The sizes of the LRU queues used for caching are currently constant
but might be adjusted at a later time.
* Fix copyright year
* Code cosmetics
* Aristo+Kvt: Fix api wrappers
why:
Api setup killed the backend descriptor when backend mapping was
disabled.
* Aristo: Implement masked profiling entries
why:
Database backend should be listed but not counted in tally
* CoreDb: Simplify backend() methods
why:
DBMS backend access Was provided very early and over engineered. Now
there are only two backend machines, one for `Kvt` and the other one
for an `Mpt` available only via new API.
* CoreDb: Code cleanup regarding descriptor types
* CoreDb: Refactor/redefine `persistent()` methods
why:
There were `persistent()` methods for any type of caching storage
facilities `Kvt`, `Mpt`, `Phk`, and `Acc`. Now there is only a single
`persistent()` method storing all facilities in tandem (similar to
how transactions work.)
For non shared `Kvt` tables, there is now an extra storage method
`saveOffSite()`.
* CoreDb lingo update: `trie` becomes `column`
why:
Notion of a `trie` is pretty much hidden by the new `CoreDb` api.
Revealed are sort of database columns for accounts an storage data,
any of which have an internal state represented by a Keccack hash.
So a `trie` or `MPT` becomes a `column` and a `rootHash` becomes a
column state.
* Aristo: rename backend filed `filters` => `journal`
* Update full sync logging
details:
+ Disable eth handler noise while syncing
+ Log journal depth (if available)
* Fix copyright year
* Fix cruft and unwanted imports
* Update README
* Nimbus-main: replaced `PruneMode` options by `ChainDbMode` options
details:
For the legacy database, this changes the phrase
- `conf.pruneMode == PruneMode.Full` to the expression
+ `conf.chainDbMode == ChainDbMode.Prune`.
* Fix issues moaned about by NIM compiler
* Fix copyright year
* Aristo+RocksDB: Update backend drivers
why:
RocksDB update allows use some of the newly provided methods which
were previously implemented by using the very C backend (for the lack
of NIM methods.)
* Aristo+RocksDB: Simplify drivers wrapper
* Kvt: Update backend drivers and wrappers similar to `Aristo`
* Aristo+Kvm: Use column families for RocksDB
* Aristo+MemoryDB: Code cosmetics
* Aristo: Provide guest column family for export
why:
So `Kvt` can piggyback on `Aristo` so there avoiding to run a second
DBMS system in parallel.
* Kvt: Provide import mechanism for RoksDB guest column family
why:
So `Kvt` can piggyback on `Aristo` so there avoiding to run a second
DBMS system in parallel.
* CoreDb+Aristo: Run persistent `Kvt` DB piggybacked on `Aristo`
why:
Avoiding to run two DBMS systems in parallel.
* Fix copyright year
* Ditto
* Suspend `snap` sync tests
why:
Snap needs to be updated. While this is not done, tests are obsolete
and eat up memory resources only
* Suspend rocks DB timing tests
why:
Currently of no use for general test suite.
* Mothballed unused evm code for opportunistic DB handling
why:
Needs to be refactored anyway. Uses hard coded protocol dependencies.
* Update `eth` protocol configuration
why:
+ new upcoming version `eth68`
+ prepare for eth-multi protocol stack support
* Strip the `legacy_` prefix from eth66 compiler flag variable
why:
Being it is an oddity having `eth67_enabled`, `eth68_enabled`, and
`legacy_eth_66_enabled`.
* Providing eth68 stubs
* Fix copyright year
* Fix eth68 stub method name
* Kvt: Update API hooks
* Aristo: Generalised merging snap proofs, now for multiple state roots
why:
This accommodates pre-loading partial tries for unit tests
* Aristo: Update some unit tests
* CoreDb+Aristo: Re-factor tracer
why:
Was bonkers anyway. The main change is that the trace journal is now
kept in a way similar to a transaction layer so that it can predictably
interact with DB transactions.
* Ledger: Debugging helper
* Update tracer unit test applicable for `Aristo`
* Fix copyright year
* Disable `dump()` function as compile time default
why:
This needs to pull in the `rocks_db` library at compile time.
* Remove cruft
* Docu/code cosmetics
* Aristo: Update `forkBase()`
why:
Was not up to the job
* Update/correct tracer for running against `Aristo`
details:
This patch makes sure that before creating a new `BaseVMState` the
`CoreDb` context is adjusted to accommodate for the state root that
is passed to the `BaseVMState` constructor.
* CpreDb+legacy: Always return current context with `ctxFromTx()`
why:
There was an experimental setting trying to find the node with the
proper setting in the KVT (not the hexary tie layer) which currently
does not work reliable, probably due to `Ledger` caching effects.
* Aristo: Provide descriptor fork based on search in transaction stack
details:
Try to find the tx that has a particular pair `(vertex-id,hash-key)`,
and by extension try filter and backend if the former fails.
* Cleanup & docu
* CoreDb+Aristo: Implement context re-position to earlier in-memory state
why:
It is a easy way to explore how there can be concurrent access to the
same backend storage DB with different view states. This one can access
an earlier state from the transaction stack.
* CoreDb+Aristo: Populate tracer stubs with real functionality
* Update `tracer.nim` to new API
why:
Legacy API does not sufficiently support `Aristo`
* Fix logging problems in tracer
details:
Debug logging turned off by default
* Fix function prototypes
* Add Copyright header
* Add tables import
why:
For older compiler versions on CI
* CoreDb+Aristo: Fix handler code
* Aristo+Kvt: Remove cruft
* Aristo+Kvt: The function `forkTop()` always provides a single transaction
why:
Previously it provided a single squashed tx only if there were any. Now
it will provide a blind one if there were none.
* Fix Copyright header