When running the import, currently blocks are loaded in batches into a
`seq` then passed to the importer as such.
In reality, blocks are still processed one by one, so the batching does
not offer any performance advantage. It does however require that the
client wastes memory, up to several GB, on the block sequence while
they're waiting to be processed.
This PR introduces a persister that accepts these potentially large
blocks one by one and at the same time removes a number of redundant /
unnecessary copies, assignments and resets that were slowing down the
import process in general.
* prefer the spec-derived name where possible
* don't pass stateRoot to LedgerRef and friends (it doesn't do anything)
* add deprecation warning in graphql - it needs updating to use
forkedchain instead
* partial commit
* fixes
* remove converters too
* revert changes on nimbus_verified_proxy
* revert changes in converter
* revert changes(re-xport) in rpc_types
* update copyright year
* replace types in other binaries
* chain config bug
* fix rebase conflict imcomplete buffer
* fix more rebase buffers
* remove ditto types and converters
* fix the tests
* update copyright year
This is a minimal set of changes to make things work with the new types
in nim-eth - this is the minimal PR that merely resolves
incompatibilities while the full change set would include more cleanup
and migration.
Saving both memory and processing, we can move entries from one
savepoint to another, specially when the target is empty as it often is
during transaction processing
* Extract sub-tree deletion functions into separate sub-modules
* Move/rename `aristo_desc.accLruSize` => `aristo_constants.ACC_LRU_SIZE`
* Lazily delete sub-trees
why:
This gives some control of the memory used to keep the deleted vertices
in the cached layers. For larger sub-trees, keys and vertices might be
on the persistent backend to a large extend. This would pull an amount
of extra information from the backend into the cached layer.
For lazy deleting it is enough to remember sub-trees by a small set of
(at most 16) sub-roots to be processed when storing persistent data.
Marking the tree root deleted immediately allows to let most of the code
base work as before.
* Comments and cosmetics
* No need to import all for `Aristo` here
* Kludge to make `chronicle` usage in sub-modules work with `fluffy`
why:
That `fluffy` would not run with any logging in `core_deb` is a problem
I have known for a while. Up to now, logging was only used for debugging.
With the current `Aristo` PR, there are cases where logging might be
wanted but this works only if `chronicles` runs without the
`json[dynamic]` sinks.
So this should be re-visited.
* More of a kludge
* remove some redundant EH
* avoid pessimising move (introduces a copy in this case!)
* shift less data around when reading era files (reduces stack usage)
* Imported/rebase from `no-ext`, PR #2485
Store extension nodes together with the branch
Extension nodes must be followed by a branch - as such, it makes sense
to store the two together both in the database and in memory:
* fewer reads, writes and updates to traverse the tree
* simpler logic for maintaining the node structure
* less space used, both memory and storage, because there are fewer
nodes overall
There is also a downside: hashes can no longer be cached for an
extension - instead, only the extension+branch hash can be cached - this
seems like a fine tradeoff since computing it should be fast.
TODO: fix commented code
* Fix merge functions and `toNode()`
* Update `merkleSignCommit()` prototype
why:
Result is always a 32bit hash
* Update short Merkle hash key generation
details:
Ethereum reference MPTs use Keccak hashes as node links if the size of
an RLP encoded node is at least 32 bytes. Otherwise, the RLP encoded
node value is used as a pseudo node link (rather than a hash.) This is
specified in the yellow paper, appendix D.
Different to the `Aristo` implementation, the reference MPT would not
store such a node on the key-value database. Rather the RLP encoded node value is stored instead of a node link in a parent node
is stored as a node link on the parent database.
Only for the root hash, the top level node is always referred to by the
hash.
* Fix/update `Extension` sections
why:
Were commented out after removal of a dedicated `Extension` type which
left the system disfunctional.
* Clean up unused error codes
* Update unit tests
* Update docu
---------
Co-authored-by: Jacek Sieka <jacek@status.im>
It is common for many accounts to share the same code - at the database
level, code is stored by hash meaning only one copy exists per unique
program but when loaded in memory, a copy is made for each account.
Further, every time we execute the code, it must be scanned for invalid
jump destinations which slows down EVM exeuction.
Finally, the extcodesize call causes code to be loaded even if only the
size is needed.
This PR improves on all these points by introducing a shared
CodeBytesRef type whose code section is immutable and that can be shared
between accounts. Further, a dedicated `len` API call is added so that
the EXTCODESIZE opcode can operate without polluting the GC and code
cache, for cases where only the size is requested - rocksdb will in this
case cache the code itself in the row cache meaning that lookup of the
code itself remains fast when length is asked for first.
With 16k code entries, there's a 90% hit rate which goes up to 99%
during the 2.3M attack - the cache significantly lowers memory
consumption and execution time not only during this event but across the
board.
* Bump nim-eth, nim-web3, nimbus-eth2
- Replace std.Option with results.Opt
- Fields name changes
* More fixes
* Fix Portal stream async raises and portal testnet Opt usage
* Bump eth + nimbus-eth2 + more fixes related to eth_types changes
* Fix in utp test app and nimbus-eth2 bump
* Fix test_blockchain_json rebase conflict
* Fix EVMC block_timestamp conversion plus commentary
---------
Co-authored-by: kdeme <kim.demey@gmail.com>
It's unused and causing trouble because of unhandled exception effects -
if we were to use it, it would need re-implementation such that it
doesn't reallocate the whole queue on writing.
* Update TDD suite logger output format choices
why:
New format is not practical for TDD as it just dumps data across a wide
range (considerably larder than 80 columns.)
So the new format can be turned on by function argument.
* Update unit tests samples configuration
why:
Slightly changed the way to find the `era1` directory
* Remove compiler warnings (fix deprecated expressions and phrases)
* Update `Aristo` debugging tools
* Always update the `storageID` field of account leaf vertices
why:
Storage tries are weekly linked to an account leaf object in that
the `storageID` field is updated by the application.
Previously, `Aristo` verified that leaf objects make sense when passed
to the database. As a consequence
* the database was inconsistent for a short while
* the burden for correctness was all on the application which led
to delayed error handling which is hard to debug.
So `Aristo` will internally update the account leaf objects so that
there are no race conditions due to the storage trie handling
* Aristo: Let `stow()`/`persist()` bail out unless there is a `VertexID(1)`
why:
The journal and filter logic depends on the hash of the `VertexID(1)`
which is commonly known as the state root. This implies that all
changes to the database are somehow related to that.
* Make sure that a `Ledger` account does not overwrite the storage trie reference
why:
Due to the abstraction of a sub-trie (now referred to as column with a
hash describing its state) there was a weakness in the `Aristo` handler
where an account leaf could be overwritten though changing the validity
of the database. This has been changed and the database will now reject
such changes.
This patch fixes the behaviour on the application layer. In particular,
the column handle returned by the `CoreDb` needs to be updated by
the `Aristo` database state. This mitigates the problem that a storage
trie might have vanished or re-apperaed with a different vertex ID.
* Fix sub-trie deletion test
why:
Was originally hinged on `VertexID(1)` which cannot be wholesale
deleted anymore after the last Aristo update. Also, running with
`VertexID(2)` needs an artificial `VertexID(1)` for making `stow()`
or `persist()` work.
* Cosmetics
* Activate `test_generalstate_json`
* Temporarily `deactivate test_tracer_json`
* Fix copyright header
---------
Co-authored-by: jordan <jordan@dry.pudding>
Co-authored-by: Jacek Sieka <jacek@status.im>
* Introduce wrapper type for EIP-4844 transactions
EIP-4844 blob sidecars are a concept that only exists in the mempool.
After inclusion of a transaction into an execution block, only the
versioned hash within the transaction remains. To improve type safety,
replace the `Transaction.networkPayload` member with a wrapper type
`PooledTransaction` that is used in contexts where blob sidecars exist.
* Bump nimbus-eth2 to 87605d08a7f9cfc3b223bd32143e93a6cdf351ac
* IPv6 'listen-address' in `nimbus_verified_proxy`
* Bump nim-libp2p to 21cbe3a91a70811522554e89e6a791172cebfef2
* Fix beacon_lc_bridge payload conversion and conf.listenAddress type
* Change nimbus_verified_proxy.asExecutionData param to SomeExecutionPayload
* Rerun nph to fix asExecutionData style format
* nimbus_verified_proxy listenAddress
* Use PooledTransaction in nimbus-eth1 tests
---------
Co-authored-by: jangko <jangko128@gmail.com>
* Activate `LedgerRef` wrapper for `AccountsCache`
details:
`accounts_cache.nim` methods are indirectly processed by the wrapper
methods from `ledger.nim`.
This works for all sources except `test_state_db.nim` where the source
`accounts_cache.nim` is included (rather than imported) in order to
access objects privy to the very source.
* Provide facility to switch to a preselected `LedgerRef` type
details:
Can be set as suggestion when initialising `CommonRef`
* Update `CoreDb` test suite for better time tracking
details:
+ Allow time logging by pre-defined block intervals
+ Print `CoreDb`/`Ledger`profiling results (if enabled)
* Aristo: Provide key-value list signature calculator
detail:
Simple wrappers around `Aristo` core functionality
* Update new API for `CoreDb`
details:
+ Renamed new API functions `contains()` => `hasKey()` or `hasPath()`
which disables the `in` operator on non-boolean `contains()` functions
+ The functions `get()` and `fetch()` always return a not-found error if
there is no item, available. The new functions `getOrEmpty()` and
`mergeOrEmpty()` return an an empty `Blob` if there is no such key
found.
* Rewrite `core_apps.nim` using new API from `CoreDb`
* Use `Aristo` functionality for calculating Merkle signatures
details:
For debugging, the `VerifyAristoForMerkleRootCalc` can be set so
that `Aristo` results will be verified against the legacy versions.
* Provide general interface for Merkle signing key-value tables
details:
Export `Aristo` wrappers
* Activate `CoreDb` tests
why:
Now, API seems to be stable enough for general tests.
* Update `toHex()` usage
why:
Byteutils' `toHex()` is superior to `toSeq.mapIt(it.toHex(2)).join`
* Split `aristo_transcode` => `aristo_serialise` + `aristo_blobify`
why:
+ Different modules for different purposes
+ `aristo_serialise`: RLP encoding/decoding
+ `aristo_blobify`: Aristo database encoding/decoding
* Compacted representation of small nodes' links instead of Keccak hashes
why:
Ethereum MPTs use Keccak hashes as node links if the size of an RLP
encoded node is at least 32 bytes. Otherwise, the RLP encoded node
value is used as a pseudo node link (rather than a hash.) Such a node
is nor stored on key-value database. Rather the RLP encoded node value
is stored instead of a lode link in a parent node instead. Only for
the root hash, the top level node is always referred to by the hash.
This feature needed an abstraction of the `HashKey` object which is now
either a hash or a blob of length at most 31 bytes. This leaves two
ways of representing an empty/void `HashKey` type, either as an empty
blob of zero length, or the hash of an empty blob.
* Update `CoreDb` interface (mainly reducing logger noise)
* Fix copyright years (to make `Lint` happy)
* Kvt: Implemented multi-descriptor access on the same backend
why:
This behaviour mirrors the one of Aristo and can be used for
simultaneous transactions on Aristo + Kvt
* Kvt: Update database iterators
why:
Forgot to run on the top layer first
* Kvt: Misc fixes
* Aristo, use `openArray[byte]` rather than `Blob` in prototype
* Aristo, by default hashify right after cloning descriptor
why:
Typically, a completed descriptor is expected after cloning. Hashing
can be suppressed by argument flag.
* Aristo provides `replicate()` iterator, similar to legacy `replicate()`
* Aristo API fixes and updates
* CoreDB: Rename `legacy_persistent` => `legacy_rocksdb`
why:
More systematic, will be in line with Aristo DB which might have
more than one persistent backends
* CoreDB: Prettify API sources
why:
Better to read and maintain
details:
Annotating with custom pragmas which cleans up the prototypes
* CoreDB: Update MPT/put() prototype allowing `CatchableError`
why:
Will be needed for Aristo API (legacy is OK with `RlpError`)