detail:
For practical reasons, ifsuch an account is asked for a slot, an empty
proof list is returned. It is up to the user to provide an account
proof that shows that there is no storage tree.
* Reverse order in staged blob lists
why:
having the largest block number with the least header list index `0`
makes it easier to grow the list with parent headers, i.e. decreasing
block numbers.
* Set a header response threshold when to ditch peer
* Refactor extension of staged header chains record
why:
Was cobbled together as a proof of concept after several approaches of
how to run the download.
* TODO update
* Make debugging code independent of `release` flag
* Update import from jacek
* Block header download starting at Beacon down to Era1
details:
The header download implementation is intended to be completed to a
full sync facility.
Downloaded block headers are stored in a `CoreDb` table. Later on they
should be fetched, complemented by a block body, executed/imported,
and deleted from the table.
The Era1 repository may be partial or missing. Era1 headers are neither
downloaded nor stored on the `CoreDb` table.
Headers are downloaded top down (largest block number first) using the
hash of the block header by one peer. Other peers fetch headers
opportunistically using block numbers
Observed download times for 14m `MainNet` headers varies between 30min
and 1h (Era1 size truncated to 66m blocks.), full download 52min
(anectdotal.) The number of peers downloading concurrently is crucial
here.
* Activate `flare` by command line option
* Fix copyright year
Saving both memory and processing, we can move entries from one
savepoint to another, specially when the target is empty as it often is
during transaction processing
* replace rocksdb row cache with larger rdb lru caches - these serve the
same purpose but are more efficient because they skips serialization,
locking and rocksdb layering
* don't append fresh items to cache - this has the effect of evicting
the existing items and replacing them with low-value entries that might
never be read - during write-heavy periods of processing, the
newly-added entries were evicted during the store loop
* allow tuning rdb lru size at runtime
* add (hidden) option to print lru stats at exit (replacing the
compile-time flag)
pre:
```
INF 2024-09-03 15:07:01.136+02:00 Imported blocks
blockNumber=20012001 blocks=12000 importedSlot=9216851 txs=1837042
mgas=181911.265 bps=11.675 tps=1870.397 mgps=176.819 avgBps=10.288
avgTps=1574.889 avgMGps=155.952 elapsed=19m26s458ms
```
post:
```
INF 2024-09-03 13:54:26.730+02:00 Imported blocks
blockNumber=20012001 blocks=12000 importedSlot=9216851 txs=1837042
mgas=181911.265 bps=11.637 tps=1864.384 mgps=176.250 avgBps=11.202
avgTps=1714.920 avgMGps=169.818 elapsed=17m51s211ms
```
9%:ish import perf improvement on similar mem usage :)
* Cosmetics, spelling, etc.
* Aristo: make sure that a save cycle always commits even when empty
why:
If `Kvt` is tied to the `Aristo` DB save cycle, then this save cycle
must also be committed if there is no data to save for `Aristo`.
Otherwise this will lead to excessive core memory use with some fringe
condition where Eth headers (or blocks) are downloaded while syncing
and not really stored on disk.
* CoreDb: Correct persistent save mode
why:
Saving `Kvt` first is seen as a harbinger (or canary) for `Aristo` as
both run in sync. If `Kvt` succeeds saving first, so must be `Aristo`
next. Other than this is a defect.
* Wiring ForkedChainRef to other components
- Disable majority of hive simulators
- Only enable pyspec_sim for the moment
- The pyspec_sim is using a smaller RPC service wired to ForkedChainRef
- The RPC service will gradually grow
* Addressing PR review
* Fix test_beacon/setup_env
* Enable consensus_sim (#2441)
* Enable consensus_sim
* Remove isFile check
* Enable Engine API jwt auth tests and exchange cap tests
* Enable engine api in build_sim.sh
* Wire ForkedChainRef to Engine API newPayload
* Wire Engine API getBodies to ForkedChainRef
* Wire Engine API api_forkchoice to ForkedChainRef
* Wire more RPC methods to ForkedChainRef
* Implement eth_syncing
* Implement eth_call and eth_getlogs
* TxPool: simplify smartHead
* Fix smartHead usage
* Fix txpool headDiff
* Remove hasBlockHeader and use headerExists
* Addressing review
This is a first step towards measuring the efficiency of the LRU caches
over time - metrics can be collected during import or when running
regulary.
Since `nim-metrics` carries some overhead for its default way of
reporting metrics, this PR implements a custom collector over atomic
counters, given that this is one of the hottest spots in the block
processing pipeline.
Using a compile-time flag, the same metrics can be printed on exit which
is useful when comparing different strategies for caching - here's a
recent run over blocks 16000001-1616384 - this is a good candidate to
expose in a better way in the future, maybe:
```
state vtype miss hit total hitrate
Account Leaf 4909417 4466215 9375632 47.64%
Account Branch 20742574 72015123 92757697 77.64%
World Leaf 940483 1140946 2081429 54.82%
World Branch 8224151 131496580 139720731 94.11%
all all 34816625 209118864 243935489 85.73%
```
* pre-allocate `blobify` data and remove redundant error handling
(cannot fail on correct data)
* use threadvar for temporary storage when decoding rdb, avoiding
closure env
* speed up database walkers by avoiding many temporaries
~5% perf improvement on block import, 100x on database iteration (useful
for building analysis tooling)
Tested up to block ~14m, zstd uses ~12% less space which seems to result
in a small:ish (2-4%) performance improvement on block import speed -
this seems like a better baseline for more extensive testing in the
future.
Pre: 57383308 kb
Post: 50831236 kb
* bump metrics
* Remove cruft
* Cosmetics, update some logging, noise control
* Renamed `CoreDb` function `hasKey` => `hasKeyRc` and provided `hasKey`
why:
Currently, `hasKey` returns a `Result[]` rather than a `bool` which
is what one would expect from a function prototype of this name.
This was a bit of an annoyance and cost unnecessary attention.
In the current VM opcode dispatcher, a two-level case statement is
generated that first matches the opcode and then uses another nested
case statement to select the actual implementation based on which fork
it is, causing the dispatcher to grow by `O(opcodes) * O(forks)`.
The fork does not change between instructions causing significant
inefficiency for this approach - not only because it repeats the fork
lookup but also because of code size bloat and missed optimizations.
A second source of inefficiency in dispatching is the tracer code which
in the vast majority of cases is disabled but nevertheless sees multiple
conditionals being evaluated for each instruction only to remain
disabled throughout exeuction.
This PR rewrites the opcode dispatcher macro to generate a separate
dispatcher for each fork and tracer setting and goes on to pick the
right one at the start of the computation.
This has many advantages:
* much smaller dispatcher
* easier to compile
* better inlining
* fewer pointlessly repeated instruction
* simplified macro (!)
* slow "low-compiler-memory" dispatcher code can be removed
Net block import improvement at about 4-6% depending on the contract -
synthetic EVM benchmnarks would show an even better result most likely.
Because EthBlock is quite large, the stack usage that results from the
multiple copies (temporary and not) present in the import command is
larger than it should be - this PR moves some of that data to a closure
environment allocated once per EthBlock - a larger restructuring of the
code is due but in the meantime, this simple change speeds up garbage
collection a little bit.
* Remove redundant `eth/68` message and clean up docu
details:
There is only eth/68 available at the moment
* Allow to turn on chronicles line number logging in `Makefile`
* Accept (and forget) tx hashes announcements
why:
Does no harm to just ignore it at the moment
* Bump nim-eth (rlp fix)
When the stack has an empty layer on top, there's no need to copy the
contents of `top` to it since it would be the same.
~13% processing saved (!)
pre
```
INF 2024-08-17 19:11:31.748+02:00 Imported blocks
blockNumber=18667648 blocks=12000 importedSlot=7860043 txs=1797812
mgas=181135.177 bps=8.763 tps=1375.062 mgps=132.125 avgBps=6.798
avgTps=1018.501 avgMGps=102.617 elapsed=29m25s154ms
```
post
```
INF 2024-08-17 18:22:52.513+02:00 Imported blocks
blockNumber=18667648 blocks=12000 importedSlot=7860043 txs=1797812
mgas=181135.177 bps=9.648 tps=1513.961 mgps=145.472 avgBps=7.876
avgTps=1179.998 avgMGps=118.888 elapsed=25m23s572ms
```
The reverse slot hash mechanism causes quite a bit of database traffic
but is broadly not useful except for iterating the storage of an
account, something that a validator never does (it's used by the
tracers).
This flag adds one more thing that is not stored in the database, to be
explored more comprehensively when designing full, validator and archive
modes with different pruning options in the future.
`ldb` says this is 60gb of data (!):
```
ldb --db=. --ignore_unknown_options --column_family=KvtGen approxsize
--hex --from=0x05
--to=0x05ffffffffffffffffffffffffffffffffffffffffffffff
66488353954
```
* Move snap un-dumpers to aristo unit test folder
why:
The only place where it is used, now to test the database against
legacy snap sync dump samples.
While the details of the dumped data have mostly outlived their purpuse,
its use as **entropy** data thrown against `Aristo` has still been
useful to find/debug tricky DB problems.
* Remove cruft
* `nimbus-eth1-blobs` not used anymore as test data source
* Cleaning up, removing cruft and debugging statements
* Make `aristo_delta` fluffy compatible
why:
A sub-module that uses `chronicles` must import all possible
modules used by a parent module that imports the sub-module.
* update TODO
* Extract sub-tree deletion functions into separate sub-modules
* Move/rename `aristo_desc.accLruSize` => `aristo_constants.ACC_LRU_SIZE`
* Lazily delete sub-trees
why:
This gives some control of the memory used to keep the deleted vertices
in the cached layers. For larger sub-trees, keys and vertices might be
on the persistent backend to a large extend. This would pull an amount
of extra information from the backend into the cached layer.
For lazy deleting it is enough to remember sub-trees by a small set of
(at most 16) sub-roots to be processed when storing persistent data.
Marking the tree root deleted immediately allows to let most of the code
base work as before.
* Comments and cosmetics
* No need to import all for `Aristo` here
* Kludge to make `chronicle` usage in sub-modules work with `fluffy`
why:
That `fluffy` would not run with any logging in `core_deb` is a problem
I have known for a while. Up to now, logging was only used for debugging.
With the current `Aristo` PR, there are cases where logging might be
wanted but this works only if `chronicles` runs without the
`json[dynamic]` sinks.
So this should be re-visited.
* More of a kludge
why:
It is not safe in general to recycle vertex IDs while the `RocksDb`
cache has `VertexID` rather than `RootedVertexID` where the former
type seems preferable.
In some fringe cases one might remove a vertex with key `(root1,vid)`
and insert another vertex with key `(root2,vid)` while re-using the
vertex ID `vid`. Without knowledge of `root1` and `root2`, the LRU
cache will return the same vertex for `(root2,vid)` also for
`(root1,vid)`.
Based on some simple testing done with a few combinations of cache
sizes, it seems that the block cache has grown in importance compared to
the where we were before changing on-disk format and adding a lot of
other point caches.
With these settings, there's roughly a 15% performance increase when
processing blocks in the 18M range over the status quo while memory
usage decreases by more than 1gb!
Only a few values were tested so there's certainly more to do here but
this change sets up a better baseline for any future optimizations.
In particular, since the initial defaults were chosen root vertex id:s
were introduced as key prefixes meaning that storage for each account
will be grouped together and thus it becomes more likely that a block
loaded from disk will be hit multiple times - this seems to give the
block cache an edge over the row cache, specially when traversing the
storage trie.
* Provide portal proof functions in `aristo_api`
why:
So it can be fully supported by `CoreDb`
* Fix prototype in `kvt_api`
* Fix node constructor for account leafs with storage trees
* Provide simple path check based on portal proof functionality
* Provide portal proof functionality in `CoreDb`
* Update TODO list
* Extracted `test_tx.testTxMergeProofAndKvpList()` => separate file
* Fix serialiser
why:
Typo lead to duplicate rlp-encoded nodes in chain
* Remove cruft
* Implemnt portal proof nodes generators `partXxxTwig()`
* Add unit test for portal proof nodes generator `partAccountTwig()`
* Cosmetics
* Simplify serialiser return code format
* Fix proof generator for extension nodes
why:
Code was simply bonkers, not detected before the unit tests were
adapted to check for just this.
* Implemented portal proof nodes verifier `partUntwig()`
* Cosmetics
* Fix `testutp` cli poblem
* Implement partial trees
why:
This is currently needed for unit tests to pre-load the database
with test data similar to `proof` node pre-load.
The basic features for `snap-sync` boundary proofs are available
as well for future use. What is missing is the final proof verification
and a complete storage data load/merge function (stub is available.)
* Cosmetics, clean up
* remove some redundant EH
* avoid pessimising move (introduces a copy in this case!)
* shift less data around when reading era files (reduces stack usage)
* Update config for Ledger and CoreDb
why:
Prepare for tracer which depends on the API jump table (as well as
the profiler.) The API jump table is now enabled in unit/integration
test mode piggybacking on the `unittest2DisableParamFiltering`
compiler flag or on an extra compiler flag `dbjapi_enabled`.
* No deed for error field in `NodeRef`
why:
Was opnly needed by proof nodes pre-loader which will be re-implemented
* Cosmetics
* Aristo: Merge `delta_siblings` module into `deltaPersistent()`
* Aristo: Add `isEmpty()` for canonical checking whether a layer is empty
* Aristo: Merge `LayerDeltaRef` into `LayerObj`
why:
No need to maintain nested object refs anymore. Previously the
`LayerDeltaRef` object had a companion `LayerFinalRef` which held
non-delta layer information.
* Kvt: Merge `LayerDeltaRef` into `LayerRef`
why:
No need to maintain nested object refs (as with `Aristo`)
* Kvt: Re-write balancer logic similar to `Aristo`
why:
Although `Kvt` was a cheap copy of `Aristo` it sort of got out of
sync and the balancer code was wrong.
* Update iterator over forked peers
why:
Yield additional field `isLast` indicating that the last iteration
cycle was approached.
* Optimise balancer calculation.
why:
One can often avoid providing a new object containing the merge of two
layers for the balancer. This avoids copying tables. In some cases this
is replaced by `hasKey()` look ups though. One uses one of the two
to combine and merges the other into the first.
Of course, this needs some checks for making sure that none of the
components to merge is eventually shared with something else.
* Fix copyright year
When lazily verifying state roots, we may end up with an entire state
without roots that gets computed for the whole database - in the current
design, that would result in hashes for the entire trie being held in
memory.
Since the hash depends only on the data in the vertex, we can store it
directly at the top-most level derived from the verticies it depends on
- be that memory or database - this makes the memory usage broadly
linear with respect to the already-existing in-memory change set stored
in the layers.
It also ensures that if we have multiple forks in memory, hashes get
cached in the correct layer maximising reuse between forks.
The same layer numbering scheme as elsewhere is reused, where -2 is the
backend, -1 is the balancer, then 0+ is the top of the stack and stack.
A downside of this approach is that we create many small batches - a
future improvement could be to collect all such writes in a single
batch, though the memory profile of this approach should be examined
first (where is the batch kept, exactly?).
* Remove `chunkedMpt` from `persistent()`/`stow()` function
why:
Proof-mode code was removed with PR #2445 and needs to be re-designed.
* Remove unused `beStateRoot` argument from `deltaMerge()`
* Update/drastically simplify `txStow()`
why:
Got rid of many boundary conditions
details:
Many pre-conditions have changed. In particular, previous versions
used the account state (hash) which was conveniently available and
checked it against the backend in order to find out whether there
was something to do, at all. Currently, only an empty set of all
tables in the delta layer has the balancer update ignored.
Notable changes are:
* no check against account state (see above)
* balancer filters have no hash signature (some legacy stuff left over
from journals)
* no (shap sync) proof data which made the generation of the a top layer
more complex
* Cosmetics, cruft removal
* Update unit test file & function name
why:
Was legacy module
Our need is only a baseline tx pool gasLimit calculator.
If need we can expand it in the future.
But for now, a simple but understandable tx pool is more important.
* Remove cruft left-over from PR #2494
* TODO
* Update comments on `HashKey` type values
* Remove obsolete hash key conversion flag `forceRoot`
why:
Is treated implicitly by having vertex keys as `HashKey` type and
root vertex states converted to `Hash256`
* Use block number or timestamp to determine fork rules
Avoid confusion raised by `forkGTE` usage where block informations are present.
* Get rid of forkGTE
* Imported/rebase from `no-ext`, PR #2485
Store extension nodes together with the branch
Extension nodes must be followed by a branch - as such, it makes sense
to store the two together both in the database and in memory:
* fewer reads, writes and updates to traverse the tree
* simpler logic for maintaining the node structure
* less space used, both memory and storage, because there are fewer
nodes overall
There is also a downside: hashes can no longer be cached for an
extension - instead, only the extension+branch hash can be cached - this
seems like a fine tradeoff since computing it should be fast.
TODO: fix commented code
* Fix merge functions and `toNode()`
* Update `merkleSignCommit()` prototype
why:
Result is always a 32bit hash
* Update short Merkle hash key generation
details:
Ethereum reference MPTs use Keccak hashes as node links if the size of
an RLP encoded node is at least 32 bytes. Otherwise, the RLP encoded
node value is used as a pseudo node link (rather than a hash.) This is
specified in the yellow paper, appendix D.
Different to the `Aristo` implementation, the reference MPT would not
store such a node on the key-value database. Rather the RLP encoded node value is stored instead of a node link in a parent node
is stored as a node link on the parent database.
Only for the root hash, the top level node is always referred to by the
hash.
* Fix/update `Extension` sections
why:
Were commented out after removal of a dedicated `Extension` type which
left the system disfunctional.
* Clean up unused error codes
* Update unit tests
* Update docu
---------
Co-authored-by: Jacek Sieka <jacek@status.im>
This PR adds a storage hike cache similar to the account hike cache
already present - this cache is less efficient because account storage
is already partically cached in the account ledger but nonetheless helps
keep hiking down.
Notably, there's an opportunity to optimise this cache and the others so
that they cooperate better insteado of overlapping, which is left for a
future PR.
This PR also fixes an O(N) memory usage for storage slots where the
delete would keep the full storage in a work list which on mainnet can
grow very large - the work list is replaced with a more conventional
recursive `O(log N)` approach.
The Vertex type unifies branches, extensions and leaves into a single
memory area where the larges member is the branch (128 bytes + overhead) -
the payloads we have are all smaller than 128 thus wrapping them in an
extra layer of `ref` is wasteful from a memory usage perspective.
Further, the ref:s must be visited during the M&S phase of garbage
collection - since we keep millions of these, many of them
short-lived, this takes up significant CPU time.
```
Function CPU Time: Total CPU Time: Self Module Function (Full) Source File Start Address
system::markStackAndRegisters 10.0% 4.922s nimbus system::markStackAndRegisters(var<system::GcHeap>).constprop.0 gc.nim 0x701230`
```
* Extract `CoreDb` constructor helpers from `base.nim` into separate module
why:
This makes it easier to avoid circular imports.
* Extract `Ledger` constructor helpers from `base.nim` into separate module
why:
Move `accounts_ledger.nim` file to sub-folder `backend`. That way the
layout resembles that of the `core_db`.
* Updates and corrections
* Extract `CoreDb` configuration from `base.nim` into separate module
why:
This makes it easier to avoid circular imports, in particular
when the capture journal (aka tracer) is revived.
* Extract `Ledger` configuration from `base.nim` into separate module
why:
This makes it easier to avoid circular imports (if any.)
also:
Move `accounts_ledger.nim` file to sub-folder `backend`. That way the
layout resembles that of the `core_db`.
hike allocations (and the garbage collection maintenance that follows)
are responsible for some 10% of cpu time (not wall time!) at this point
- this PR avoids them by stepping through the layers one step at a time,
simplifying the code at the same time.
* Rename `newKvt()` -> `ctx.getKvt()`
why:
Clean up legacy shortcut. Also, the `KVT` returned is not instantiated
but refers to the shared `KVT` that resides in a context which is a
generalisation of an in-memory database fork. The function `ctx`
retrieves the default context.
* Rename `newTransaction()` -> `ctx.newTransaction()`
why:
Clean up legacy shortcut. The transaction is applied to a context as a
generalisation of an in-memory database fork. The function `ctx`
retrieves the default context.
* Rename `getColumn(CtGeneric)` -> `getGeneric()`
why:
No more a list of well known sub-tries needed, a single one is enough.
In fact, `getColumn()` did only support a single sub-tree by now.
* Reduce TODO list
This trivial bump should improve performance a bit without costing too
much memory - as the trie grows, so does the number of levels in it and
creating hikes becomes ever more expensive - hopefully this cache
increase should give a nice little boost even if it's not a lot.
Avoid writing the same slot/hash values to the hash->slot mapping
to avoid spamming the rocksdb WAL and cause unnecessary compaction
In the same vein, avoid writing trivially detectable A-B-A storage
changes which happen with surprising frequency.
Introduce a new `StoData` payload type similar to `AccountData`
* slightly more efficient storage format
* typed api
* fewer seqs
* fix encoding docs - it wasn't rlp after all :)
This significantly speeds up block import at the cost of less protection
against invalid data, potentially resulting in an invalid database
getting stored.
The risk is small given that import is used only for validated data -
evaluating the right level of of validation vs performance is left for a
future PR.
A side effect of this approach is that there is no cached stated root in
the database - computing it currently requires a lot of memory since the
intermediate roots get cached in memory in full while the computation is
ongoing - a future PR will need to address this deficiency, for example
by streaming the already-computed hashes directly to the database.
The state and account MPT:s currenty share key space in the database
based on that vertex id:s are assigned essentially randomly, which means
that when two adjacent slot values from the same contract are accessed,
they might reside at large distance from each other.
Here, we prefix each vertex id by its root causing them to be sorted
together thus bringing all data belonging to a particular contract
closer together - the same effect also happens for the main state MPT
whose nodes now end up clustered together more tightly.
In the future, the prefix given to the storage keys can also be used to
perform range operations such as reading all the storage at once and/or
deleting an account with a batch operation.
Notably, parts of the API already supported this rooting concept while
parts didn't - this PR makes the API consistent by always working with a
root+vid.
The storage id is frequently accessed when executing contract code and
finding the path via the database requires several hops making the
process slow - here, we add a cache to keep the most recently used
account storage id:s in memory.
A possible future improvement would be to cache all account accesses so
that for example updating the balance doesn't cause several hikes.
* CoreDb: Merged all sub-descriptors into `base_desc` module
* Dissolve `aristo_db/common_desc.nim`
* No need to export `Aristo` methods in `CoreDb`
* Resolve/tighten methods in `aristo_db` sub-moduled
why:
So they can be straihgt implemented into the `base` module
* Moved/re-implemented `KVT` methods into `base` module
* Moved/re-implemented `MPT` methods into `base` module
* Moved/re-implemented account methods into `base` module
* Moved/re-implemented `CTX` methods into `base` module
* Moved/re-implemented `handler_{aristo,kvt}` into `aristo_db` module
* Moved/re-implemented `TX` methods into `base` module
* Moved/re-implemented base methods into `base` module
* Replaced `toAristoSavedStateBlockNumber()` by proper base method
why:
Was the last for keeping reason for keeping low level backend access
methods
* Remove dedicated low level access to `Aristo` backend
why:
Not needed anymore, for debugging the descriptors can be accessed
directly
also:
some clean up stuff
* Re-factor `CoreDb` descriptor layout and adjust base methods
* Moved/re-implemented iterators into `base_iterator*` modules
* Update docu
These representations use ~15-20% less data compared to the status quo,
mainly by removing redundant zeroes in the integer encodings - a
significant effect of this change is that the various rocksdb caches see
better efficiency since more items fit in the same amount of space.
* use RLP encoding for `VertexID` and `UInt256` wherever it appears
* pack `VertexRef`/`PayloadRef` more tightly
* avoid costly hike memory allocations for operations that don't need to
re-traverse it
* avoid unnecessary state checks (which might trigger unwanted state
root computations)
* disable optimize-for-hits due to the MPT no longer being complete at
all times
* Update some docu
* Resolve obsolete compile time option
why:
Not optional anymore
* Update checks
why:
The notion of what constitutes a valid `Aristo` db has changed due to
(even more) lazy calculating Merkle hash keys.
* Disable redundant unit test for production
* Use simpler schema when writing transactions, receipts, and withdrawals
Using MPT not only slow but also take up more spaces than needed.
Aristo will remove older tries and only keep the last block tries.
Using simpler schema will avoid those problems.
* Rename getTransaction to getTransactionByIndex
* Remove `dirty` set from structural objects
why:
Not used anymore, the tree is dirty by default.
* Rename `aristo_hashify` -> `aristo_compute`
* Remove cruft, update comments, cosmetics, etc.
* Simplify `SavedState` object
why:
The key chaining have become obsolete after extra lazy hashing. There
is some available space for a state hash to be maintained in future.
details:
Accept the legacy `SavedState` object serialisation format for a
while (which will be overwritten by new format.)
* rebased from `github/on-demand-mpt`
ackn:
wip: on-demand mpt construction
Given that actual data is stored in the `Vertex` structure, it's useful
to think of the MPT as a cache for computing roots rather than being a
functional requirement on its own.
This PR engenders this line of thinking by incrementally computing the
MPT only when it's needed, ie when a state (or similar) root is needed.
This has the effect of siginficantly reducing memory usage as well as
improving performance:
* no need for dirty-mpt-node book-keeping
* no need to build complex forest of upcoming hashing work
* only hashes that are functionally needed are ever computed -
intermediate nodes whose MTP root is not observed are never computed /
processed
* Unit test hot fixes
* Unit test hot fixes cont.
(somehow lost that part)
---------
Co-authored-by: Jacek Sieka <jacek@status.im>
* Normalised storage tree addressing in function prototypes
detail:
Argument list is always `<db> <account-path> <slot-path> ..` with
both path arguments as `openArray[]`
* Remove cruft
* CoreDb internally Use full account paths rather than addresses
* Update API logging
* Use hashed account address only in prototypes
why:
This avoids unnecessary repeated hashing of the same account address.
The burden of doing that is upon the application. In the case here,
the ledger caches all kinds of stuff anyway so it is common sense to
exploit that for account address hashes.
caveat:
Using `openArray[byte]` argument types for hashed accounts is inherently
fragile. In non-release mode, a length verification `doAssert` is
enabled by default.
* No accPath in data record (use `AristoAccount` as `CoreDbAccount`)
* Remove now unused `eAddr` field from ledger `AccountRef` type
why:
Is duplicate of lookup key
* Avoid merging the account record/statement in the ledger twice.
* Tighten `CoreDb` API for accounts
why:
Apart from cruft, the way to fetch the accounts state root via a
`CoreDbColRef` record was unnecessarily complicated.
* Extend `CoreDb` API for accounts to cover storage tries
why:
In future, this will make the notion of column objects obsolete. Storage
trees will then be indexed by the account address rather than the vertex
ID equivalent like a `CoreDbColRef`.
* Apply new/extended accounts API to ledger and tests
details:
This makes the `distinct_ledger` module obsolete
* Remove column object constructors
why:
They were needed as an abstraction of MPT sub-trees including storage
trees. Now, storage trees are handled by the account (e.g. via address)
they belong to and all other trees can be identified by a constant well
known vertex ID. So there is no need for column objects anymore.
Still there are some left-over column object methods wnich will be
removed next.
* Remove `serialise()` and `PayloadRef` from default Aristo API
why:
Not needed. `PayloadRef` was used for unstructured/unknown payload
formats (account or blob) and `serialise()` was used for decodng
`PayloadRef`. Now it is known in advance what the payload looks
like.
* Added query function `hasStorageData()` whether a storage area exists
why:
Useful for supporting `slotStateEmpty()` of the `CoreDb` API
* In the `Ledger` replace `storage.stateEmpty()` by `slotStateEmpty()`
* On Aristo, hide the storage root/vertex ID in the `PayloadRef`
why:
The storage vertex ID is fully controlled by Aristo while the
`AristoAccount` object is controlled by the application. With the
storage root part of the `AristoAccount` object, there was a useless
administrative burden to keep that storage root field up to date.
* Remove cruft, update comments etc.
* Update changed MPT access paradigms
why:
Fixes verified proxy tests
* Fluffy cosmetics
* Simplify txpool baseFeeGet
- Avoid using toEVMFork because we are not in EVM
- Rename `isLondon` to `isLondonOrLater`
* Remove timestamp from isLondonOrLater
* ForkedChain implementation
- revamp test_blockchain_json using ForkedChain
- re-enable previously failing test cases.
* Remove excess error handling
* Avoid reloading parent header
* Do not force base update
* Write baggage to database
* Add findActiveChain to finalizedSegment
* Create new stagingTx in addBlock
* Check last stateRoot existence in test_blockchain_json
* Resolve rebase conflict
* More precise nomenclature for block import cursor
* Ensure bad block nor imported and good block not rejected
* finalizeSegment become forkChoice and align with engine API forkChoice spec
* Display reason when good block rejected
* Fix comments
* Put BaseDistance into CalculateNewBase equation
* Separate finalizedHash from baseHash
* Add more doAssert constraint
* Add push raises: []
* creating a seq from a table that holds lots of changes means copying
all data into the table - this can be several GB of data while syncing
blocks
* nim fails to optimize the moving of the `WidthFirstForest` - the real
solution is to not construct a `wff` to begin with, but this PR provides
relief while that is being worked on
This spike fix allows us to bump the rocksdb cache by another 2 GB and
still have a significantly lower peak memory usage during sync.
When processing long ranges of blocks, the account cache grows unbounded
which cause huge memory spikes.
Here, we move the cache to a second-level cache after each block - the
second-level cache is cleared on the next block after that which creates
a simple LRU effect.
There's a small performance cost of course, though overall the freed-up
memory can now be reassigned to the rocksdb row cache which not only
makes up for the loss but overall leads to a performance increase.
The bump to 2gb of rocksdb row cache here needs more testing but is
slightly less and loosely basedy on the savings from this PR and the
circular ref fix in #2408 - another way to phrase this is that it's
better to give rocksdb more breathing room than let the memory sit
unused until circular ref collection happens ;)
An instance of `CoreDbMptRef` is created for and stored in every account
- when we are processing blocks and have many accounts in memory, this
closure environment takes up hundreds of mb of memory (around block 5M,
it is the 4:th largest memory consumer!) - incidentally, this also
removes a circular reference in the setup that causes the
`AristoCodeDbMptRef` to linger in memory much longer than it
has to which is the core reason why it takes so much.
The real solution here is to remove the methods indirection entirely,
but this PR provides relief until that has been done.
Similar treatment is given to some of the other core api functions to
avoid circulars there too.
This buffer eleminates a large part of allocations during MPT traversal,
reducing overall memory usage and GC pressure.
Ideally, we would use it throughout in the API instead of
`openArray[byte]` since the built-in length limit appropriately exposes
the natural 64-nibble depth constraint that `openArray` fails to
capture.
It is common for many accounts to share the same code - at the database
level, code is stored by hash meaning only one copy exists per unique
program but when loaded in memory, a copy is made for each account.
Further, every time we execute the code, it must be scanned for invalid
jump destinations which slows down EVM exeuction.
Finally, the extcodesize call causes code to be loaded even if only the
size is needed.
This PR improves on all these points by introducing a shared
CodeBytesRef type whose code section is immutable and that can be shared
between accounts. Further, a dedicated `len` API call is added so that
the EXTCODESIZE opcode can operate without polluting the GC and code
cache, for cases where only the size is requested - rocksdb will in this
case cache the code itself in the row cache meaning that lookup of the
code itself remains fast when length is asked for first.
With 16k code entries, there's a 90% hit rate which goes up to 99%
during the 2.3M attack - the cache significantly lowers memory
consumption and execution time not only during this event but across the
board.
* CoreDb: remove PHK tries
why:
There is no general use anymore for an MPT with a pre-hashed key. It
was used to resemble the `SecureHexaryTrie` logic from the legacy DB.
The only pace where this is needed is the `Leger` which uses a
a distinct MPT version anyway (see `distinct_ledgers.nim`.)
* Rename `CoreDx*` -> `CoreDb*`
why:
The naming `CoreDx*` was used to differentiate the new CoreDb API from
the legacy API which had descriptors named `CoreDb*`.
* Provide dedicated functions for fetching accounts and storage trees
why:
Different prototypes for each class `account`, `generic` and
`storage`.
* Remove `fetchPayload()` and other cruft from API, `aristo_fetch`, etc.
* Fix typos, debugging left overs, comments
For the block cache to be shared between column families, the options
instance must be shared between the various column families being
created. This also ensures that there is only one source of truth for
configuration options instead of having two different sets depending on
how the tables were initialized.
This PR also removes the re-opening mechanism which can double startup
time - every time the database is opened, the log is replayed - a large
log file will take a long time to open.
Finally, several options got correclty implemented as column family
options, including an one that puts a hash index in the SST files.
* Provide dedicated functions for deleteing accounts and storage trees
why:
Storage trees are always linked to an account, so there is no need
for an application to fiddle about (e.g. re-cycling, unlinking)
storage tree vertex IDs.
* Remove `delete()` and other cruft from API, `aristo_delete`, etc.
* clean up delete functions
details:
The delete implementations `deleteImpl()` and `delTreeImpl()` do not
need to be super generic anymore as all the edge cases are covered by
the specialised `deleteAccountPayload()`, `deleteGenericData()`, etc.
* Avoid unnecessary re-calculations of account keys
why:
The function `registerAccountForUpdate()` did extract the storage ID
(if any) and automatically marked the Merkle keys along the account
path for re-hashing.
This would also apply if there was later detected that the account
or the storage tree did not need to be updated.
So the `registerAccountForUpdate()` function was split into a part
which retrieved the storage ID, and another one which marked the
Merkle keys for re-calculation to be applied only when needed.
* Remove unused `merge*()` functions (for production)
details:
Some functionality moved to test suite
* Make sure that only `AccountData` leaf type is exactly used on VertexID(1)
* clean up payload type
* Provide dedicated functions for merging accounts and storage trees
why:
Storage trees are always linked to an account, so there is no need
for an application to fiddle about (e.e. creating, re-cycling) with
storage tree vertex IDs.
* CoreDb: Disable tracer functionality
why:
Must be updated to accommodate new/changed `Aristo` functions.
* CoreDb: Use new `mergeXXX()` functions
why:
Makes explicit vertex ID management obsolete for creating new
storage trees.
* Remove `mergePayload()` and other cruft from API, `aristo_merge`, etc.
* clean up merge functions
details:
The merge implementation `mergePayloadImpl()` does not need to be super
generic anymore as all the edge cases are covered by the specialised
functions `mergeAccountPayload()`, `mergeGenericData()`, and
`mergeStorageData()`.
* No tracer available at the moment, so disable offending tests
The state root computation here is one of the major hotspots in block
processing - in the cases the code only needs to know if it's empty or
not, it can be done a lot faster.
Adding a separate function for this looks fragile and should probably be
revisited.
Broadly, when importing blocks we don't need a transaction / frame per
block because we can simply abort the whole update and try again with a
smaller range if we find a faulty block.
Of course, this applies mainly to semi-trusted blocks where we're not
expected to fail in applying them - this could be blocks either from
files or header-verified blocks as given by consensus.
* Remove AccountStateDB
AccountStateDB should no longer be used.
It's usage have been reduce to read only operations.
Replace it with LedgerRef to reduce maintenance burden.
* remove extra spaces
Co-authored-by: tersec <tersec@users.noreply.github.com>
---------
Co-authored-by: tersec <tersec@users.noreply.github.com>
When performing block import, we can batch state root verifications and
header checks, doing them only once per chunk of blocks, assuming that
the other blocks in the batch are valid by extension.
When we're not generating receipts, we can also skip per-transaction
state root computation pre-byzantium, which is what provides a ~20%
speedup in this PR, at least on those early blocks :)
We also stop storing transactions, receipts and uncles redundantly when
importing from era1 - there is no need to waste database storage on this
when we can load it from the era1 file (eventually).
State lookups potentially trigger expensive re-hashings - this is the
first of several steps to remove the unnecessary ones from the general
flow of block processing
* avoid re-reading parent block header from database when it's already
in memory
* Fix initialiser
why:
Possible crash (app profiling, tracer etc.)
* Update column family options processing
why:
Same for kvt as for aristo
* Move `AristoDbDualRocks` backend type to the test suite
why:
So it is not available for production
* Fix typos in API jump table
why:
Used for tracing and app profiling only. Needed some update
* Purged CoreDb legacy API
why:
Not needed anymore, was transitionary and disabled.
* Rename `flush` argument to `eradicate` in a DB close context
why:
The word `eradicate` leaves no doubt what is meant
* Rename `stoFlush()` -> `stoDelete()`
* Rename `core_apps_newapi` -> `core_apps` (not so new anymore)
* Bump nim-eth, nim-web3, nimbus-eth2
- Replace std.Option with results.Opt
- Fields name changes
* More fixes
* Fix Portal stream async raises and portal testnet Opt usage
* Bump eth + nimbus-eth2 + more fixes related to eth_types changes
* Fix in utp test app and nimbus-eth2 bump
* Fix test_blockchain_json rebase conflict
* Fix EVMC block_timestamp conversion plus commentary
---------
Co-authored-by: kdeme <kim.demey@gmail.com>
* bump rockdb
* Rename `KVT` objects related to filters according to `Aristo` naming
details:
filter* => delta*
roFilter => balancer
* Compulsory error handling if `persistent()` fails
* Add return code to `reCentre()`
why:
Might eventually fail if re-centring is blocked. Some logic will be
added in subsequent patch sets.
* Add column families from earlier session to rocksdb in opening procedure
why:
All previously used CFs must be declared when re-opening an existing
database.
* Update `init()` and add rocksdb `reinit()` methods for changing parameters
why:
Opening a set column families (with different open options) must span
at least the ones that are already on disk.
* Provide write-trigger-event interface into `Aristo` backend
why:
This allows to save data from a guest application (think `KVT`) to
get synced with the write cycle so the guest and `Aristo` save all
atomically.
* Use `KVT` with new column family interface from `Aristo`
* Remove obsolete guest interface
* Implement `KVT` piggyback on `Aristo` backend
* CoreDb: Add separate `KVT`/`Aristo` backend mode for debugging
* Remove `rocks_db` import from `persist()` function
why:
Some systems (i.p `fluffy` and friends) use the `Aristo` memory
backend emulation and do not link against rocksdb when building the
application. So this should fix that problem.
These options, inspired by Nethermind and general internet wisdom, bring
the database size down to 2/3 without affecting throughput. In theory,
they should also bring down memory usage and/or make more efficient use
of whatever memory is already assigned to rocksdb but this needs
verification in a longer test at synced-mainnet sizes.
In the meantime, they make testing easier by removing some noise that
the profiler says are bad, such as excessive SkipList access (countered
by bloom filters).
* Use RocksDb column families instead of a prefixed single column
why:
Better performance
* Use structural objects `VertexRef` and `HashKey` in LRU cache for RocksDb
why:
Avoids repeated de/serialisation
`initTable` is obsolete since nim 0.19 and can introduce significant
memory overhead while providing no benefit (since the table will be
grown to the default initial size on first use anyway).
In particular, aristo layers will not necessarily use all tables they
initialize, for exampe when many empty accounts are being created.
This PR consolidates the split header-body sequences into a single EthBlock
sequence and cleans up the fallout from that which significantly reduces
block processing overhead during import thanks to less garbage collection
and fewer copies of things all around.
Notably, since the number of headers must always match the number of bodies,
we also get rid of a pointless degree of freedom that in the future could
introduce unnecessary bugs.
* only read header and body from era file
* avoid several unnecessary copies along the block processing way
* simplify signatures, cleaning up unused arguemnts and returns
* use `stew/assign2` in a few strategic places where the generated
nim assignent is slow and add a few `move` to work around poor
analysis in nim 1.6 (will need to be revisited for 2.0)
```
stats-20240607_2223-a814aa0b.csv vs stats-20240608_0714-21c1d0a9.csv
bps_x bps_y tps_x tps_y bpsd tpsd timed
block_number
(498305, 713245] 1,540.52 1,809.73 2,361.58 2775.340189 17.63% 17.63% -14.92%
(713245, 928185] 730.36 865.26 1,715.90 2028.973852 18.01% 18.01% -15.21%
(928185, 1143126] 663.03 789.10 2,529.26 3032.490771 19.79% 19.79% -16.28%
(1143126, 1358066] 393.46 508.05 2,152.50 2777.578119 29.13% 29.13% -22.50%
(1358066, 1573007] 370.88 440.72 2,351.31 2791.896052 18.81% 18.81% -15.80%
(1573007, 1787947] 283.65 335.11 2,068.93 2441.373402 17.60% 17.60% -14.91%
(1787947, 2002888] 287.29 342.11 2,078.39 2474.179448 18.99% 18.99% -15.91%
(2002888, 2217828] 293.38 343.16 2,208.83 2584.77457 17.16% 17.16% -14.61%
(2217828, 2432769] 140.09 167.86 1,081.87 1296.336926 18.82% 18.82% -15.80%
blocks: 1934464, baseline: 3h13m1s, contender: 2h43m47s
bpsd (mean): 19.55%
tpsd (mean): 19.55%
Time (total): -29m13s, -15.14%
```
* Cleanup unneeded stateless and block witness code. Keeping MultiKeys which is used in the eth_getProofsByBlockNumber RPC endpoint which is needed for the Fluffy state network bridge.
* Rename generateWitness flag to collectWitnessData to better describe what the flag does. We only collect the keys of the touched accounts and storage slots but no block witness generation is supported for now.
* Move remaining stateless code into nimbus directory.
* Add vmstate parameter to ChainRef to fix test.
* Exclude *.in from check copyright year
---------
Co-authored-by: jangko <jangko128@gmail.com>
* Code cosmetics
* Re-org `aristo_merge`, internally split into sub-modules
why:
Became a burden for maintenance because it hosts two different
functionalities under the same merge paradigm: account/data merge
and snap proof merge where the latter produces a partial trie.
* Fix CoreDb tracer
* Ledger: fix potential account vs. storage tree sync problems
* Remove bound on the size of removable whole storage trees
* Activate `test_tracer_json`
* Remove exception from evm memory
* Remove exception from gas meter
* Remove exception from stack
* Remove exception from precompiles
* Remove exception from gas_costs
* Remove exception from op handlers
* Remove exception from op dispatcher
* Remove exception from call_evm
* Remove exception from EVM
* Fix tools and tests
* Remove exception from EVMC
* fix evmc
* Fix evmc
* Remove remnants of async evm stuff
* Remove superflous error handling
* Proc to func
* Fix errors detected by CI
* Fix EVM op call stack usage
* REmove exception handling from getVmState
* Better error message instead of just doAssert
* Remove unused validation
* Remove superflous catchRaise
* Use results.expect instead of unsafeValue
This new option saves a CSV to disk while performing `import` such that
the performance of one import can be compared with the other.
This early version is likely to change in the future
It's unused and causing trouble because of unhandled exception effects -
if we were to use it, it would need re-implementation such that it
doesn't reallocate the whole queue on writing.
* CoreDb: Remove crufty second/off-site KVT
why:
Was used to allow late `Clique` to store directly to disk
* CoreDb: Remove prune flag related functionality
why:
Is completely legacy stuff
* CoreDb: Remove dependence on legacy API (tests unsupported yet)
why:
Does not fully support Aristo
* Re-factoring `state_db` using new API
details:
Only minimum changes needed to compile `nimbus`
* Update tests and aux modules
* Turn off legacy API and remove `distinct_tries`
comment:
The legacy API has now cruft status, will be removed soon
* Fix copyright years
* Update rpc for verified proxy
---------
Co-authored-by: Jacek Sieka <jacek@status.im>
* Fix `blobify()` for `SavedState` object
why:
Have to treat varying sizes for `HashKey`, i.p. for an empty key which
has zero size.
* Store correct block number in `SavedState` record
why:
Stored `block-number - 1` for some obscure reason.
* Cosmetcs, docu
These options are there mainly to drive experiments, and are therefore
hidden.
One thing that this PR brings in is an initial set of caches and buffers for rocksdb - the set that I've been using during various performance tests to get to a viable baseline performance level.
The `rocksdb` version shipped with distributions is typically old and
therefore often lacks features we use - it also doesn't match the one
assumed by nim-rocksdb leading to ABI mismatch risks.
Instead of depending on the system rocksdb, we'll now use the rocksdb
version assumed by nim-rocksdb and locked in its vendor folder by always
building it together with nimbus.
This avoids the problem of unknown rocksdb versions at a (small) cost to
build time.
CI caching and full windows support for building from source [remains
TODO](https://github.com/status-im/nim-rocksdb/issues/44).
* Remove all journal related stuff
* Refactor function names journal*() => delta*(), filter*() => delta*()
* remove `trg` fileld from `FilterRef`
why:
Same as `kMap[$1]`
* Re-type FilterRef.src as `HashKey`
why:
So it is directly comparable to `kMap[$1]`
* Moved `vGen[]` field from `LayerFinalRef` to `LayerDeltaRef`
why:
Then a separate `FilterRef` type is not needed, anymore
* Rename `roFilter` field in `AristoDbRef` => `balancer`
why:
New name more appropriate.
* Replace `FilterRef` by `LayerDeltaRef` type
why:
This allows to avoid copying into the `balancer` (see next patch set)
most of the time. Typically, only one instance is running on the backend
and the `balancer` is only used as a stage before saving data.
* Refactor way how to store data persistently
why:
Avoid useless copy when staging `top` layer for persistently saving to
backend.
* Fix copyright header?
`persist` is a hotspot when processing blocks because it is run at least
once per transaction and loops over the entire account cache every time.
Here, we introduce an extra `dirty` map that keeps track of all accounts
that need checking during `persist` which fixes the immediate
inefficiency, though probably this could benefit from a more thorough
review - we also get rid of the unused clearCache flag - we start with
a fresh cache on every fresh vmState.
* avoid unnecessary code hash comparisons
* avoid unnecessary copies when iterating
* use EMPTY_CODE_HASH throughout for code hash comparison
The current implementation cannot be used practically since it causes
several full reallocations of the whole free list per deletion - it
needs to be reimplemented, or the chain cannot practically progress
beyond ~2.5M blocks where a lot of removals happen.
Co-authored-by: tersec <tersec@users.noreply.github.com>
* fCU Ignore update to old head
* Only enable terminal PoW block conditions for fCUV1
* Typo fix: TDD -> TTD
* Add link to Shanghai fCUV2 specification
* Add link to Paris fCUV1 specification
* Add persistent last state stamp feature
why:
This allows to run `CoreDb` without journal
* Start `CoreDb` without journal
* Remove journal related functions from `CoredDb`
why:
Typo. That worked the wrong way in the unit tests because the
tests always store the last item separately with extended
logging.
Co-authored-by: jordan <jordan@dry.pudding>
This PR extends the `nimbus import` command to also allow reading from
era files - this command allows creating or topping up an existing
database with data coming from era files instead of network sync.
* add `--era1-dir` and `--max-blocks` options to command line
* make `persistBlocks` report basic stats like transactions and gas
* improve error reporting in several API
* allow importing multiple RLP files in one go
* clean up logging options to match nimbus-eth2
* make sure database is closed properly on shutdown
* Update TDD suite logger output format choices
why:
New format is not practical for TDD as it just dumps data across a wide
range (considerably larder than 80 columns.)
So the new format can be turned on by function argument.
* Update unit tests samples configuration
why:
Slightly changed the way to find the `era1` directory
* Remove compiler warnings (fix deprecated expressions and phrases)
* Update `Aristo` debugging tools
* Always update the `storageID` field of account leaf vertices
why:
Storage tries are weekly linked to an account leaf object in that
the `storageID` field is updated by the application.
Previously, `Aristo` verified that leaf objects make sense when passed
to the database. As a consequence
* the database was inconsistent for a short while
* the burden for correctness was all on the application which led
to delayed error handling which is hard to debug.
So `Aristo` will internally update the account leaf objects so that
there are no race conditions due to the storage trie handling
* Aristo: Let `stow()`/`persist()` bail out unless there is a `VertexID(1)`
why:
The journal and filter logic depends on the hash of the `VertexID(1)`
which is commonly known as the state root. This implies that all
changes to the database are somehow related to that.
* Make sure that a `Ledger` account does not overwrite the storage trie reference
why:
Due to the abstraction of a sub-trie (now referred to as column with a
hash describing its state) there was a weakness in the `Aristo` handler
where an account leaf could be overwritten though changing the validity
of the database. This has been changed and the database will now reject
such changes.
This patch fixes the behaviour on the application layer. In particular,
the column handle returned by the `CoreDb` needs to be updated by
the `Aristo` database state. This mitigates the problem that a storage
trie might have vanished or re-apperaed with a different vertex ID.
* Fix sub-trie deletion test
why:
Was originally hinged on `VertexID(1)` which cannot be wholesale
deleted anymore after the last Aristo update. Also, running with
`VertexID(2)` needs an artificial `VertexID(1)` for making `stow()`
or `persist()` work.
* Cosmetics
* Activate `test_generalstate_json`
* Temporarily `deactivate test_tracer_json`
* Fix copyright header
---------
Co-authored-by: jordan <jordan@dry.pudding>
Co-authored-by: Jacek Sieka <jacek@status.im>