Introduce a new `StoData` payload type similar to `AccountData`
* slightly more efficient storage format
* typed api
* fewer seqs
* fix encoding docs - it wasn't rlp after all :)
The state and account MPT:s currenty share key space in the database
based on that vertex id:s are assigned essentially randomly, which means
that when two adjacent slot values from the same contract are accessed,
they might reside at large distance from each other.
Here, we prefix each vertex id by its root causing them to be sorted
together thus bringing all data belonging to a particular contract
closer together - the same effect also happens for the main state MPT
whose nodes now end up clustered together more tightly.
In the future, the prefix given to the storage keys can also be used to
perform range operations such as reading all the storage at once and/or
deleting an account with a batch operation.
Notably, parts of the API already supported this rooting concept while
parts didn't - this PR makes the API consistent by always working with a
root+vid.
* rebased from `github/on-demand-mpt`
ackn:
wip: on-demand mpt construction
Given that actual data is stored in the `Vertex` structure, it's useful
to think of the MPT as a cache for computing roots rather than being a
functional requirement on its own.
This PR engenders this line of thinking by incrementally computing the
MPT only when it's needed, ie when a state (or similar) root is needed.
This has the effect of siginficantly reducing memory usage as well as
improving performance:
* no need for dirty-mpt-node book-keeping
* no need to build complex forest of upcoming hashing work
* only hashes that are functionally needed are ever computed -
intermediate nodes whose MTP root is not observed are never computed /
processed
* Unit test hot fixes
* Unit test hot fixes cont.
(somehow lost that part)
---------
Co-authored-by: Jacek Sieka <jacek@status.im>
* Tighten `CoreDb` API for accounts
why:
Apart from cruft, the way to fetch the accounts state root via a
`CoreDbColRef` record was unnecessarily complicated.
* Extend `CoreDb` API for accounts to cover storage tries
why:
In future, this will make the notion of column objects obsolete. Storage
trees will then be indexed by the account address rather than the vertex
ID equivalent like a `CoreDbColRef`.
* Apply new/extended accounts API to ledger and tests
details:
This makes the `distinct_ledger` module obsolete
* Remove column object constructors
why:
They were needed as an abstraction of MPT sub-trees including storage
trees. Now, storage trees are handled by the account (e.g. via address)
they belong to and all other trees can be identified by a constant well
known vertex ID. So there is no need for column objects anymore.
Still there are some left-over column object methods wnich will be
removed next.
* Remove `serialise()` and `PayloadRef` from default Aristo API
why:
Not needed. `PayloadRef` was used for unstructured/unknown payload
formats (account or blob) and `serialise()` was used for decodng
`PayloadRef`. Now it is known in advance what the payload looks
like.
* Added query function `hasStorageData()` whether a storage area exists
why:
Useful for supporting `slotStateEmpty()` of the `CoreDb` API
* In the `Ledger` replace `storage.stateEmpty()` by `slotStateEmpty()`
* On Aristo, hide the storage root/vertex ID in the `PayloadRef`
why:
The storage vertex ID is fully controlled by Aristo while the
`AristoAccount` object is controlled by the application. With the
storage root part of the `AristoAccount` object, there was a useless
administrative burden to keep that storage root field up to date.
* Remove cruft, update comments etc.
* Update changed MPT access paradigms
why:
Fixes verified proxy tests
* Fluffy cosmetics
This buffer eleminates a large part of allocations during MPT traversal,
reducing overall memory usage and GC pressure.
Ideally, we would use it throughout in the API instead of
`openArray[byte]` since the built-in length limit appropriately exposes
the natural 64-nibble depth constraint that `openArray` fails to
capture.
* Remove unused `merge*()` functions (for production)
details:
Some functionality moved to test suite
* Make sure that only `AccountData` leaf type is exactly used on VertexID(1)
* clean up payload type
* Provide dedicated functions for merging accounts and storage trees
why:
Storage trees are always linked to an account, so there is no need
for an application to fiddle about (e.e. creating, re-cycling) with
storage tree vertex IDs.
* CoreDb: Disable tracer functionality
why:
Must be updated to accommodate new/changed `Aristo` functions.
* CoreDb: Use new `mergeXXX()` functions
why:
Makes explicit vertex ID management obsolete for creating new
storage trees.
* Remove `mergePayload()` and other cruft from API, `aristo_merge`, etc.
* clean up merge functions
details:
The merge implementation `mergePayloadImpl()` does not need to be super
generic anymore as all the edge cases are covered by the specialised
functions `mergeAccountPayload()`, `mergeGenericData()`, and
`mergeStorageData()`.
* No tracer available at the moment, so disable offending tests
* Fix debug noise in `hashify()` for perfectly normal situation
why:
Was previously considered a fixable error
* Fix test sample file names
why:
The larger test file `goerli68161.txt.gz` is already in the local
archive. So there is no need to use the smaller one from the external
repo.
* Activate `accounts_cache` module from `db/ledger`
why:
A copy of the original `accounts_cache.nim` source to be integrated
into the `Ledger` module wrapper which allows to switch between
different `accounts_cache` implementations unser tha same API.
details:
At a later state, the `db/accounts_cache.nim` wrapper will be
removed so that there is only one access to that module via
`db/ledger/accounts_cache.nim`.
* Fix copyright headers in source code
* Aristo: Provide key-value list signature calculator
detail:
Simple wrappers around `Aristo` core functionality
* Update new API for `CoreDb`
details:
+ Renamed new API functions `contains()` => `hasKey()` or `hasPath()`
which disables the `in` operator on non-boolean `contains()` functions
+ The functions `get()` and `fetch()` always return a not-found error if
there is no item, available. The new functions `getOrEmpty()` and
`mergeOrEmpty()` return an an empty `Blob` if there is no such key
found.
* Rewrite `core_apps.nim` using new API from `CoreDb`
* Use `Aristo` functionality for calculating Merkle signatures
details:
For debugging, the `VerifyAristoForMerkleRootCalc` can be set so
that `Aristo` results will be verified against the legacy versions.
* Provide general interface for Merkle signing key-value tables
details:
Export `Aristo` wrappers
* Activate `CoreDb` tests
why:
Now, API seems to be stable enough for general tests.
* Update `toHex()` usage
why:
Byteutils' `toHex()` is superior to `toSeq.mapIt(it.toHex(2)).join`
* Split `aristo_transcode` => `aristo_serialise` + `aristo_blobify`
why:
+ Different modules for different purposes
+ `aristo_serialise`: RLP encoding/decoding
+ `aristo_blobify`: Aristo database encoding/decoding
* Compacted representation of small nodes' links instead of Keccak hashes
why:
Ethereum MPTs use Keccak hashes as node links if the size of an RLP
encoded node is at least 32 bytes. Otherwise, the RLP encoded node
value is used as a pseudo node link (rather than a hash.) Such a node
is nor stored on key-value database. Rather the RLP encoded node value
is stored instead of a lode link in a parent node instead. Only for
the root hash, the top level node is always referred to by the hash.
This feature needed an abstraction of the `HashKey` object which is now
either a hash or a blob of length at most 31 bytes. This leaves two
ways of representing an empty/void `HashKey` type, either as an empty
blob of zero length, or the hash of an empty blob.
* Update `CoreDb` interface (mainly reducing logger noise)
* Fix copyright years (to make `Lint` happy)