* Update TDD suite logger output format choices
why:
New format is not practical for TDD as it just dumps data across a wide
range (considerably larder than 80 columns.)
So the new format can be turned on by function argument.
* Update unit tests samples configuration
why:
Slightly changed the way to find the `era1` directory
* Remove compiler warnings (fix deprecated expressions and phrases)
* Update `Aristo` debugging tools
* Always update the `storageID` field of account leaf vertices
why:
Storage tries are weekly linked to an account leaf object in that
the `storageID` field is updated by the application.
Previously, `Aristo` verified that leaf objects make sense when passed
to the database. As a consequence
* the database was inconsistent for a short while
* the burden for correctness was all on the application which led
to delayed error handling which is hard to debug.
So `Aristo` will internally update the account leaf objects so that
there are no race conditions due to the storage trie handling
* Aristo: Let `stow()`/`persist()` bail out unless there is a `VertexID(1)`
why:
The journal and filter logic depends on the hash of the `VertexID(1)`
which is commonly known as the state root. This implies that all
changes to the database are somehow related to that.
* Make sure that a `Ledger` account does not overwrite the storage trie reference
why:
Due to the abstraction of a sub-trie (now referred to as column with a
hash describing its state) there was a weakness in the `Aristo` handler
where an account leaf could be overwritten though changing the validity
of the database. This has been changed and the database will now reject
such changes.
This patch fixes the behaviour on the application layer. In particular,
the column handle returned by the `CoreDb` needs to be updated by
the `Aristo` database state. This mitigates the problem that a storage
trie might have vanished or re-apperaed with a different vertex ID.
* Fix sub-trie deletion test
why:
Was originally hinged on `VertexID(1)` which cannot be wholesale
deleted anymore after the last Aristo update. Also, running with
`VertexID(2)` needs an artificial `VertexID(1)` for making `stow()`
or `persist()` work.
* Cosmetics
* Activate `test_generalstate_json`
* Temporarily `deactivate test_tracer_json`
* Fix copyright header
---------
Co-authored-by: jordan <jordan@dry.pudding>
Co-authored-by: Jacek Sieka <jacek@status.im>
* Aristo: Rename journal related sources and functions
why:
Previously, the naming was hinged on the phrases `fifo`, `filter` etc.
which reflect the inner workings of cascaded filters. This was
unfortunate for reading/understanding the source code for actions where
the focus is the journal as a whole.
* Aristo: Fix buffer overflow (path length truncating error)
* Aristo: Tighten `hikeUp()` stop check, update error code
why:
Detect dangling vertex links. These are legit with `snap` sync
processing but not with regular processing.
* Aristo: Raise assert in regular mode `merge()` at a dangling link/edge
why:
With `snap` sync processing, partial trees are ok and can be amended.
Not so in regular mode.
Previously there was only a debug message when a non-legit dangling edge
was encountered.
* Aristo: Make sure that vertices are copied before modification
why:
Otherwise vertices from lower layers might also be modified
* Aristo: Fix relaxed mode for validity checker `check()`
* Remove cruft
* Aristo: Update API for transaction handling
details:
+ Split `aristo_tx.nim` into sub-modules
+ Split `forkWith()` into `findTx()` + `forkTx()`
+ Removed `forkTop()`, `forkBase()` (now superseded by new `forkTx()`)
* CoreDb+Aristo: Fix initialiser (missing methods)
* Aristo: Allow to define/set `FilterID` for journal filter records
why:
After some changes, the `FilterID` is isomorphic to the `BlockNumber`
scalar (well, the first 2^64 entries of a `BlockNumber`.)
The needed change for `FilterID` is that the `FilterID(0)` value is
valid part of the `FilterID` scalar. A non-valid `FilterID` entry is
represented by `none(FilterID)`.
* Aristo: Split off function `persist()` as persistent version of `stow()`
why:
In production, `stow(persistent=false,..)` is currently unused. So,
using `persist()` rather than `stow(persistent=true,..)` improves
readability and is better to maintain.
* CoreDb+Aristo: Store block numbers in journal records
why:
This makes journal records searchable by block numbers
* Aristo: Rename some journal related functions
why:
The name *journal* is more appropriate to api functions than something
with *fifo* or *filter*.
* CoreDb+Aristo: Update last/oldest journal state retrieval
* CoreDb+Aristo: Register block number with state root in journal
why:
No need anymore for extra lookup table `stRootToBlockNum` which maps
a storage root -> block number.
* Aristo: Remove unused function `getFilUbe()` from api
* CoreDb: Remove now unused virtual table `stRootToBlockNum`
why:
Was used to map a state root to a block number. This functionality
is now embedded into the recovery journal backend.
* Turn of API tracking (will fail on `fluffy`)
* Aristo: Code cosmetics, e.g. update some CamelCase names
* CoreDb+Aristo: Provide oldest known state root implied
details:
The Aristo journal allows to recover earlier but not all state roots.
* Aristo: Fix journal backward index operator, e.g. `[^1]`
* Aristo: Fix journal updater
why:
The `fifosStore()` store function slightly misinterpreted the update
instructions when translation is to database `put()` functions. The
effect was that the journal was ever growing due to stale entries which
were never deleted.
* CoreDb+Aristo: Provide utils for purging stale data from the KVT
details:
See earlier patch, not all state roots are available. This patch
provides a mapping from some state root to a block number and allows to
remove all KVT data related to a particular block number
* Aristo+Kvt: Implement a clean up schedule for expired data in KVT
why:
For a single state ledger like `Aristo`, there is only a limited
backlog of states. So KVT data (i.e. headers etc.) are cleaned up
regularly
* Fix copyright year
* Aristo+Kvt: Better RocksDB profiling
why:
Providing more detailed information, mainly for `Aristo`
* Aristo: Renamed journal `stats()` to `capacity()`
why:
`Stats()` was a misnomer
* Aristo: Provide backend read caches for key and vertex IDs
why:
Dedicated LRU caching for particular types gives a throughput advantage.
The sizes of the LRU queues used for caching are currently constant
but might be adjusted at a later time.
* Fix copyright year
* Code cosmetics
* Aristo+Kvt: Fix api wrappers
why:
Api setup killed the backend descriptor when backend mapping was
disabled.
* Aristo: Implement masked profiling entries
why:
Database backend should be listed but not counted in tally
* CoreDb: Simplify backend() methods
why:
DBMS backend access Was provided very early and over engineered. Now
there are only two backend machines, one for `Kvt` and the other one
for an `Mpt` available only via new API.
* CoreDb: Code cleanup regarding descriptor types
* CoreDb: Refactor/redefine `persistent()` methods
why:
There were `persistent()` methods for any type of caching storage
facilities `Kvt`, `Mpt`, `Phk`, and `Acc`. Now there is only a single
`persistent()` method storing all facilities in tandem (similar to
how transactions work.)
For non shared `Kvt` tables, there is now an extra storage method
`saveOffSite()`.
* CoreDb lingo update: `trie` becomes `column`
why:
Notion of a `trie` is pretty much hidden by the new `CoreDb` api.
Revealed are sort of database columns for accounts an storage data,
any of which have an internal state represented by a Keccack hash.
So a `trie` or `MPT` becomes a `column` and a `rootHash` becomes a
column state.
* Aristo: rename backend filed `filters` => `journal`
* Update full sync logging
details:
+ Disable eth handler noise while syncing
+ Log journal depth (if available)
* Fix copyright year
* Fix cruft and unwanted imports
* Kvt: Update API hooks
* Aristo: Generalised merging snap proofs, now for multiple state roots
why:
This accommodates pre-loading partial tries for unit tests
* Aristo: Update some unit tests
* CoreDb+Aristo: Re-factor tracer
why:
Was bonkers anyway. The main change is that the trace journal is now
kept in a way similar to a transaction layer so that it can predictably
interact with DB transactions.
* Ledger: Debugging helper
* Update tracer unit test applicable for `Aristo`
* Fix copyright year
* Disable `dump()` function as compile time default
why:
This needs to pull in the `rocks_db` library at compile time.
* CoreDb+Aristo: Fix handler code
* Aristo+Kvt: Remove cruft
* Aristo+Kvt: The function `forkTop()` always provides a single transaction
why:
Previously it provided a single squashed tx only if there were any. Now
it will provide a blind one if there were none.
* Fix Copyright header
why:
Ignoring `nil` objects was handy for a while but eventually led to
lazy programming which in turn led to double destructor calls for
the rocks-db.
* Aristo+Kvt: Fix backend `dup()` function in api setup
why:
Backend object is subject to an inheritance cascade which was not
taken care of, before. Only the base object was duplicated.
* Kvt: Simplify DB clone/peers management
* Aristo: Simplify DB clone/peers management
* Aristo: Adjust unit test for working with memory DB only
why:
This currently causes some memory corruption persumably in the
`libc` background layer.
* CoredDb+Kvt: Simplify API for KVT
why:
Simplified storage models (was over engineered) for better performance
and code maintenance.
* CoredDb+Aristo: Simplify API for `Aristo`
why:
Only single database state needed here. Accessing a similar state will
be implemented from outside this module using a context layer. This
gives better performance and improves code maintenance.
* Fix Copyright headers
* CoreDb: Turn off API tracking
why:
CI would ot go through. Was accidentally turned on.
* Aristo/Kvt: Provide function hooks APIs
why:
These APIs can be used for installing tracers, profiling functoinality,
and other niceties on the databases.
* Aristo: Provide optional API profiling
details:
It basically is a re-implementation of the `CoreDb` profiling
implementation
* Kvt: Provide optional API profiling similar to `Aristo`
* CoreDb: Re-implementing profiling using `aristo_profile`
* Ledger: Re-implementing profiling using `aristo_profile`
* CoreDb: Update unit tests for maintainability
* update copyright dates
* Aristo: Reorg `hashify()` using different schedule algorithm
why:
Directly calculating the search tree top down from the roots turns
out to be faster than using the cached structures left over by `merge()`
and `delete()`.
Time gains is short of 20%
* Aristo: Remove `lTab[]` leaf entry object type
why:
Not used anymore. It was previously needed to build the schedule for
`hashify()`.
* Aristo: Avoid unnecessary re-org of the vertex ID recycling list
why:
This list can become quite large so a heuristic is employed whether
it makes sense to re-org.
Also, re-org check is only done by `delete()` functions.
* Aristo: Remove key/reverse lookup table from tx layers
why:
It is ignored except for handling proof nodes and costs unnecessary
run time resources.
This feature was originally needed to accommodate the mental transition
from the legacy MPT to the `Aristo` trie :).
* Fix copyright year
* CoreDb: update test suite
* Aristo: Simplify reverse key map
why:
The reverse key map `pAmk: (root,key) -> {vid,..}` as been simplified to
`pAmk: key -> {vid,..}` as the state `root` domain argument is not used,
anymore
* Aristo: Remove `HashLabel` object type and replace it by `HashKey`
why:
The `HashLabel` object attaches a root hash to a hash key. This is
nowhere used, anymore.
* Fix copyright
* CoreDb: Test module with additional sample selector cmd line options
* Aristo: Do not automatically remove a storage trie with the account
why:
This is an unnecessary side effect. Rather than using an automatism, a
a storage root must be deleted manually.
* Aristo: Can handle stale storage root vertex IDs as empty IDs.
why:
This is currently needed for the ledger API supporting both, a legacy
and the `Aristo` database backend.
This feature can be disabled at compile time by re-setting the
`LOOSE_STORAGE_TRIE_COUPLING` flag in the `aristo_constants` module.
* CoreDb+Aristo: Flush/delete storage trie when deleting account
why:
On either backend, a deleted account leave a dangling storage trie on
the database.
For consistency nn the legacy backend, storage tries must not be
deleted as they might be shared by several accounts whereas on `Aristo`
they are always unique.
* Aristo: Update error return code
why:
Failing of `Aristo` function `delete()` might fail because there is
no such data item on the db. This must return a single error code
as is done with `fetch()`.
* Ledger: Better error handling
why:
The `expect()` clauses have been replaced by raising asserts indicating
the error from the database backend.
Also, `delete()` failures are legitimate if the item to delete does not
exist.
* Aristo: Delete function must always leave a label on DB for `hashify()`
why:
The `hashify()` uses the labels left bu `merge()` and `delete()` to
compile (and optimise) a scheduler for subsequent hashing.
Originally, the labels were not used for deleted entries and `delete()`
still had some edge case where the deletion label was not properly
handled.
* Aristo: Update `hashify()` scheduler, remove buggy optimisation
why:
Was left over from version without virtual state roots which did not
know about account payload leaf vertices referring to storage roots.
* Aristo: Label storage trie account in `delete()` similar to `merge()`
details;
The `delete()` function applied to a non-static state root (assumed
to be a storage root) will check the payload of an accounts leaf
and mark its Merkle keys to be re-checked when runninh `hashify()`
* Aristo: Clean up and re-org recycled vertex IDs in `hashify()`
why:
Re-organising the recycled vertex IDs list intends to reduce the size of the
list.
This list is organised as a LIFO (or stack.) By reorganising it in a way
so that the least vertex ID numbers are on top, the list will be kept
smaller as observed on some examples (less than 30%.)
* CoreDb: Accept storage trie deletion requests in non-initialised state
why:
Due to lazy initialisation, the root vertex ID might not yet exist. So
the `Aristo` database handlers would reject this call with an error and
this condition needs to be handled by the API (which realises the lazy
feature.)
* Cosmetics & code massage, prettify logging
* fix missing import
* Aristo: Update unit test suite
* Aristo/Kvt: Fix iterators
why:
Generic iterators were not properly updated after backend change
* Aristo: Add sub-trie deletion functionality
why:
For storage tries linked to an account payload vertex ID, a the
whole storage trie needs to be deleted with the account.
* Aristo: Reserve vertex ID numbers for static custom state roots
why:
Static custom state roots may be controlled by an application,
e.g. for a receipt or a transaction root. The `Aristo` functions
are agnostic of what the static state roots are when different
from the internal tree vertex ID 1.
details;
The `merge()` function applied to a non-static state root (assumed
to be a storage root) will check the payload of an accounts leaf
and mark its Merkle keys to be re-checked.
* Aristo: Correct error code symbol
* Aristo: Update error code symbols
* Aristo: Code cosmetics/comments
* Aristo: Fix hashify schedule calculator
why:
Had a tendency to stop early leaving an incomplete job
* Update KVT layers abstraction
details:
modelled after Aristo layers
* Simplified KVT database iterators (removed item counters)
why:
Not needed for production functions
* Simplify KVT merge function `layersCc()`
* Simplified Aristo database iterators (removed item counters)
why:
Not needed for production functions
* Update failure condition for hash labels compiler `hashify()`
why:
Node need not be rejected as long as links are on the schedule. In
that case, `redo[]` is to become `wff.base[]` at a later stage.
* Update merging layers and label update functions
why:
+ Merging a stack of layers with `layersCc()` could be simplified
+ Merging layers will optimise the reverse `kMap[]` table maps
`pAmk: label->{vid, ..}` by deleting empty mappings `label->{}` where
they are redundant.
+ Updated `layersPutLabel()` for optimising `pAmk[]` tables
* Fix kvt headers
* Provide differential layers for KVT transaction stack
why:
Significant performance improvement
* Provide abstraction layer for database top cache layer
why:
This will eventually implemented as a differential database layers
or transaction layers. The latter is needed to improve performance.
behavioural changes:
Zero vertex and keys (i.e. delete requests) are not optimised out
until the last layer is written to the database.
* Provide differential layers for Aristo transaction stack
why:
Significant performance improvement
* Explicitly use shared `Kvt` table on `Ledger` and `Clique` lookup.
why:
Speeds up lookup time with `Aristo` backend. For writing `Clique` data,
the `Companion` model allows to write `Clique` data past the database
locked by evm transactions.
* Implement `CoreDb` profiling with API tracking
why:
Chasing time spent per APT procs ...
* Implement `Ledger` profiling with API tracking
why:
Chasing time spent per APT procs ...
* Always hashify when commiting or storing
why:
A dirty cache makes no sense when committing
* Make sure that a zero key is created when adding/updating vertices
why:
This is an error fix mainly for edge cases. A typical error was
that the root key got deleted when there were only a few vertices
left on the DB.
* Need all created and changed vertices zero-keyed on the cache
why:
A zero key (i.e. empty Merkle hash) indicates that a vertex key
needs to be updated. This would not be needed immediately after
a merge as there is an actual leaf path on the cache layer. But
after subsequent merge and delete operations this information
might get blurred.
* Re-org hashing algorithm
why:
Apart from errors, the previous implementation was too slow for
two reasons:
+ some control hashes were calculated for debugging (now all
verification is done in `aristo_check` module)
+ the leaf paths stored on the cache are used to build the
labelling (aka hashing) schedule; there paths were accumulated
over successive hash sessions although it is clear that all
keys were generated, already
* Register paths for added leafs because of trie re-balancing
why:
While the payload would not change, the prefix in the leaf vertex
would. So it needs to be flagged for hash recompilation for the
`hashify()` module.
also:
Make sure that `Hike` paths which might have vertex links into the
backend filter are replaced by vertex copies before manipulating.
Otherwise the vertices on the immutable filter might be involuntarily
changed.
* Also check for paths where the leaf vertex is on the backend, already
why:
A a path can have dome vertices on the top layer cache with the
`Leaf` vertex on the backend.
* Re-define a void `HashLabel` type.
why:
A `HashLabel` type is a pair `(root-vertex-ID, Keccak-hash)`. Previously,
a valid `HashLabel` consisted of a non-empty hash and a non-zero vertex
ID. This definition leads to a non-unique representation of a void
`HashLabel` with either root-ID or has void. This has been changed to
the unique void `HashLabel` exactly if the hash entry is void.
* Update consistency checkers
* Re-org `hashify()` procedure
why:
Syncing against block chain showed serious deficiencies which produced
wrong hashes or simply bailed out with error.
So all fringe cases (mainly due to deleted entries) could be integrated
into the labelling schedule rather than handling separate fringe cases.
* Disable `TransactionID` related functions from `state_db.nim`
why:
Functions `getCommittedStorage()` and `updateOriginalRoot()` from
the `state_db` module are nowhere used. The emulation of a legacy
`TransactionID` type functionality is administratively expensive to
provide by `Aristo` (the legacy DB version is only partially
implemented, anyway).
As there is no other place where `TransactionID`s are used, they will
not be provided by the `Aristo` variant of the `CoreDb`. For the
legacy DB API, nothing will change.
* Fix copyright headers in source code
* Get rid of compiler warning
* Update Aristo code, remove unused `merge()` variant, export `hashify()`
why:
Adapt to upcoming `CoreDb` wrapper
* Remove synced tx feature from `Aristo`
why:
+ This feature allowed to synchronise transaction methods like begin,
commit, and rollback for a group of descriptors.
+ The feature is over engineered and not needed for `CoreDb`, neither
is it complete (some convergence features missing.)
* Add debugging helpers to `Kvt`
also:
Update database iterator, add count variable yield argument similar
to `Aristo`.
* Provide optional destructors for `CoreDb` API
why;
For the upcoming Aristo wrapper, this allows to control when certain
smart destruction and update can take place. The auto destructor works
fine in general when the storage/cache strategy is known and acceptable
when creating descriptors.
* Add update option for `CoreDb` API function `hash()`
why;
The hash function is typically used to get the state root of the MPT.
Due to lazy hashing, this might be not available on the `Aristo` DB.
So the `update` function asks for re-hashing the gurrent state changes
if needed.
* Update API tracking log mode: `info` => `debug
* Use shared `Kvt` descriptor in new Ledger API
why:
No need to create a new descriptor all the time
* Aristo: Provide key-value list signature calculator
detail:
Simple wrappers around `Aristo` core functionality
* Update new API for `CoreDb`
details:
+ Renamed new API functions `contains()` => `hasKey()` or `hasPath()`
which disables the `in` operator on non-boolean `contains()` functions
+ The functions `get()` and `fetch()` always return a not-found error if
there is no item, available. The new functions `getOrEmpty()` and
`mergeOrEmpty()` return an an empty `Blob` if there is no such key
found.
* Rewrite `core_apps.nim` using new API from `CoreDb`
* Use `Aristo` functionality for calculating Merkle signatures
details:
For debugging, the `VerifyAristoForMerkleRootCalc` can be set so
that `Aristo` results will be verified against the legacy versions.
* Provide general interface for Merkle signing key-value tables
details:
Export `Aristo` wrappers
* Activate `CoreDb` tests
why:
Now, API seems to be stable enough for general tests.
* Update `toHex()` usage
why:
Byteutils' `toHex()` is superior to `toSeq.mapIt(it.toHex(2)).join`
* Split `aristo_transcode` => `aristo_serialise` + `aristo_blobify`
why:
+ Different modules for different purposes
+ `aristo_serialise`: RLP encoding/decoding
+ `aristo_blobify`: Aristo database encoding/decoding
* Compacted representation of small nodes' links instead of Keccak hashes
why:
Ethereum MPTs use Keccak hashes as node links if the size of an RLP
encoded node is at least 32 bytes. Otherwise, the RLP encoded node
value is used as a pseudo node link (rather than a hash.) Such a node
is nor stored on key-value database. Rather the RLP encoded node value
is stored instead of a lode link in a parent node instead. Only for
the root hash, the top level node is always referred to by the hash.
This feature needed an abstraction of the `HashKey` object which is now
either a hash or a blob of length at most 31 bytes. This leaves two
ways of representing an empty/void `HashKey` type, either as an empty
blob of zero length, or the hash of an empty blob.
* Update `CoreDb` interface (mainly reducing logger noise)
* Fix copyright years (to make `Lint` happy)
* Aristo: Single `FetchPathNotFound` error in `fetchXxx()` and `hasPath()`
why:
Missing path hike returns too many detailed reasons why it failed
which becomes cumbersome to handle.
also:
Renamed `contains()` => `hasPath()` which disables the `in` operator on
non-boolean `contains()` functions
* Kvt: Renamed `contains()` => `hasKey()`
why:
which disables the `in` operator on non-boolean `contains()` functions
* Aristo: Generalising `HashID` by variable length `PathID`
why:
There are cases when the `Aristo` database is to be used with
shorter than 64 nibbles keys when handling transactions indexes
with sequence IDs.
caveat:
This patch only works reliable for full length `PathID` values. Tests
for shorter `PathID` values are currently missing.
* CoreDB: Re-org API
details:
Legacy API internally uses vertex ID for root node abstraction
* Cosmetics: Move some unit test helpers to common sub-directory
* Extract constant from `accouns_cache.nim` => `constants.nim`
* Fix tracer methods
why:
Logger dump data were wrongly dumped from the production database. This
caused an assert exception when iterating over the persistent database
(instead of the memory logger.) This event in turn was enabled after
fixing another inconsistency which just set up an empty iterator. Unit
tests failed to detect that.
* Aristo: remove obsolete functions
* Aristo: Fix error code for non-available hash keys
why:
Must not return `not-found` when the key is not available (i.e. the
current changes were not hashified, yet.)
* CoreDB: Provide TDD and test framework
* Update docu
* Update Aristo/Kvt constructor prototype
why:
Previous version used an `enum` value to indicate what backend is to
be used. This was replaced by using the backend object type.
* Rewrite `hikeUp()` return code into `Result[Hike,(Hike,AristoError)]`
why:
Better code maintenance. Previously, the `Hike` object was returned. It
had an internal error field so partial success was also available on
a failure. This error field has been removed.
* Use `openArray[byte]` rather than `Blob` in functions prototypes
* Provide synchronised multi instance transactions
why:
The `CoreDB` object was geared towards the legacy DB which used a single
transaction for the key-value backend DB. Different state roots are
provided by the backend database, so all instances work directly on the
same backend.
Aristo db instances have different in-memory mappings (aka different
state roots) and the transactions are on top of there mappings. So each
instance might run different transactions.
Multi instance transactions are a compromise to converge towards the
legacy behaviour. The synchronised transactions span over all instances
available at the time when base transaction was opened. Instances
created later are unaffected.
* Provide key-value pair database iterator
why:
Needed in `CoreDB` for `replicate()` emulation
also:
Some update of internal code
* Extend API (i.e. prototype variants)
why:
Needed for `CoreDB` geared towards the legacy backend which has a more
basic API than Aristo.
* Rewrite remaining `AristoError` return code into `Result[void,AristoError]`
why:
Better code maintenance
* Update import sections
* Update Aristo DB paths
why:
More systematic so directory can be shared with other DB types
* More cosmetcs
* Update unit tests runners
why:
Proper handling of persistent and mem-only DB. The latter can be
consistently triggered by an empty DB path.
* Reorg of distributed backend access
details:
Now handled via API provided in `aristo_desc`.
* Rename `checkCache()` => `checkTop()`
why:
Better naming for top layer cache checker
also:
Provide cascaded fifos checker
* Provide `eq` directive for finding filter by exact filter ID (think block number)
* Some code beautification (for better code reading)
* State root reposition and reorg
details:
Repositioning is supported by forking a new descriptor. Reorg is then
accomplished by writing this forked state on the backend database.
details:
* Tested features
+ Successively store filters with increasing filter ID (think block number)
+ Cascading through fifos, deeper fifos merge groups of filters
+ Fetch squash merged N top fifos
+ Delete N top fifos, push back merged fifo, continue storing
+ Fifo chain is verified by hashes and filter ID
* Not tested yet
+ Real live scenario (using data dumps)
+ Real filter data (only shallow filters used so far)
* Set scheduler state as part of the backend descriptor
details:
Moved type definitions `QidLayoutRef` and `QidSchedRef` to
`desc_structural.nim` so that it shares the same folder as
`desc_backend.nim`
* Automatic filter queue table initialisation in backend
details:
Scheduler can be tweaked or completely disabled
* Updated backend unit tests
details:
+ some code clean up/beautification, reads better now
+ disabled persistent filters so that there is no automated filter
management which will be implemented next
* Prettify/update unit tests source code
details:
Mostly replacing the `check()` paradigm by `xCheck()`
* Somewhat simplified backend type management
why:
Backend objects are labelled with a `BackendType` symbol where the
`BackendVoid` label is implicitly assumed for a `nil` backend object
reference.
To make it easier, a `kind()` function is used now applicable to
`nil` references as well.
* Fix DB storage layout for filter objects
why:
Need to store the filter ID with the object
* Implement reverse [] index on fifo
why:
An integer index argument on `[]` retrieves the QueueID (label) of the
fifo item while a QueueID argument on `[]` retrieves the index (so
it is inverse to the former variant).
* Provide iterator over filters as fifo
why:
This iterator goes along the cascased fifo structure (i.e. in
historical order)
* Add backwards index `[]` operator into fifo
also:
Need another maintenance instruction: The last overflow queue must
irrevocably delete some item in order to make space for a new one.
* Add re-org scheduler
details:
Generates instructions how to extract and merge some leading entries
* Add filter ID selector
details:
This allows to find the next filter now newer that a given filter ID
* Message update
* Rename FilterID => QueueID
why:
The current usage does not identify a particular filter but uses it as
storage tag to manage it on the database (to be organised in a set of
FIFOs or queues.)
* Split `aristo_filter` source into sub-files
why:
Make space for filter management API
* Store filter queue IDs in pairs on the backend
why:
Any pair will will describe a FIFO accessed by bottom/top IDs
* Reorg some source file names
why:
The "aristo_" prefix for make local/private files is tedious to
use, so removed.
* Implement filter slot scheduler
details:
Filters will be stored on the database on cascaded FIFOs. When a FIFO
queue is full, some filter items are bundled together and stored on the
next FIFO.
* Remove unused unit test sources
* Redefine and document serialised data records for Aristo backend
why:
Unique record types determined by marker byte, i.e. the last byte of a
serialisation record. This just needed some tweaking after adding new
record types.
* Removed dedicated transcoder tests
why:
will implicitely be provided by other tests:
+ encode/write -> hashify -> test_tx
+ decode/read -> merge raw nodes -> test_tx
+ de/blobfiy -> backend operations, taext_tx, test_backend, test_filter
* Clarify how the vertex ID generator state is accessed from the backend
why:
This state is a list of unused vertex IDs. It was just stored somewhere
on the backend which details were exposed when iterating over some
sub-table(s).
As there will be more such single information records, an admin
sub-tables has been defined (formerly ID generator table) with dedicated
access keys and type. Also, the iterator over the single ID generator
state item has been removed. It must be accessed via the `get()`
interface.
* Remove trailing space from file name
why:
fixes windows bail out
* Remove concept of empty/blind filters
why:
Not needed. A non-existent filter is is coded as a nil reference.
* Slightly generalised backend iterators
why:
* VertexID as key for the ID generator state makes no sense
* there will be more tables addressed by non-VertexID keys
* Store serialised/blobified vertices on memory backend
why:
This is more in line with the RocksDB backend so more appropriate
for testing when comparing behaviour. For a speedy memory database,
a backend-less variant should be used.
* Drop the `Aristo` prefix from names `AristoLayerRef`, etc.
* Suppress compiler warning
why:
duplicate imports
* Add filter serialisation transcoder
why:
Will be used as storage format
* Fix hashing algorithm
why:
Particular case where a sub-tree is on the backend, linked by an
Extension vertex to the top level.
* Update backend verification to report `dirty` top layer
* Implement distributed merge of backend filters
* Implement distributed backend access management
details:
Implemented and tested as described in chapter 5 of the `README.md`
file.
* Renamed type `NoneBackendRef` => `VoidBackendRef`
* Clarify names: `BE=filter+backend` and `UBE=backend (unfiltered)`
why:
Most functions used full names as `getVtxUnfilteredBackend()` or
`getKeyBackend()`. After defining abbreviations (and its meaning) it
seems easier to use `getVtxUBE()` and `getKeyBE()`.
* Integrate `hashify()` process into transaction logic
why:
Is now transparent unless explicitly controlled.
details:
Cache changes imply setting a `dirty` flag which in turn triggers
`hashify()` processing in transaction and `pack()` directives.
* Removed `aristo_tx.exec()` directive
why:
Inconsistent implementation, functionality will be provided with a
different paradigm.
* Provide deep copy for each transaction layer
why:
Localising changes. Selective deep copy was just overlooked.
* Generalise vertex ID generator state reorg function `vidReorg()`
why:
makes it somewhat easier to handle when saving layers.
* Provide dummy back end descriptor `NoneBackendRef`
* Optional read-only filter between backend and transaction cache
why:
Some staging area for accumulating changes to the backend DB. This
will eventually be an access layer for emulating a backend with
multiple/historic state roots.
* Re-factor `persistent()` with filter between backend/tx-cache => `stow()`
why:
The filter provides an abstraction from the physically stored data on
disk. So, there can be several MPT instances using the same disk data
with different state roots. Of course, all the MPT instances should
not differ too much for practical reasons :).
TODO:
Filter administration tools need to be provided.
* Better error handling
why:
Bail out on some error as early as possible before any changes.
* Implement `fetch()` as opposite of `merge()`
rationale:
In the `Aristo` realm, the action named `fetch()` and `merge()` indicate
leaf value related actions on the MPT, while actions `get()` and `put()`
handle vertex or hash key related operations that constitute the MPT.
* Re-factor `merge()` prototypes
why:
The most used variant of `merge()` should have the simplest prototype.
* Persistent DB constructor needs to import `aristo/aristo_init/persistent`
why:
Most applications use memory DB anyway. This avoids linking `-lrocksdb`
or any other back end libraries by default.
* Re-factor transaction module
why:
Got the paradigm wrong. The transaction descriptor did replace the
database one but should be handled separately.
* Misc fixes
detail:
* Fix de-serialisation for account leafs
* Update node recovery from unit tests
* Remove `LegacyAccount` from `PayloadRef` object
why:
Legacy accounts use a hash key as storage root which is detrimental
to the working of the Aristo database which uses a vertex ID.
* Dissolve `hashify_helper` into `aristo_utils` and `aristo_transcode`
why:
Functions are of general interest so they should live in first level
code files.
* Added left/right iterators over leaf nodes
* Some helper/wrapper functions that might be useful
why:
For the main tree with root vertex ID 1, the leaf nodes hold the
account data. These accounts may link to sub trees the storage root
node ID of which must be registered here. There is no reverse key
lookup on the backend.
note:
These definitions are experimental. Also, there are some tests missing
for validating Payload data conversions.
* Provide transaction based interface for standard operations
* Provide unit tests for new Aristo interface using transactions
details:
These new tests combine and replace several single-purpose tests.
The now unused test sources will be kept for a while to be eventually
removed.
* Slightly tighten some self-check conditions
* Redefined the database descriptor object as reference (to the object)
why:
The upcoming transaction wrapper will work with a database reference
rather than the object itself
* Append state before `save()` to the Aristo descriptor
why:
This stae was previously returned by the function. Appending it to
a field of the Aristo descriptor seems easier to handle.
* Fix missing branch checks in transcoder
why:
Symmetry problem. `Blobify()` allowed for encoding degenerate branch
vertices while `Deblobify()` rejected decoding wrongly encoded data.
* Update memory backend so that it rejects storing bogus vertices.
why:
Error behaviour made similar to the rocks DB backend.
* Make sure that leaf vertex IDs are not repurposed
why:
This makes it easier to record leaf node changes
* Update error return code for next()/right() traversal
why:
Returning offending vertex ID (besides error code) helps debugging
* Update Merkle hasher for deleted nodes
why:
Not implemented, yet
also:
Provide cache & backend consistency check functions. This was
partly re-implemented from `hashifyCheck()`
* Simplify some unit tests
* Fix delete function
why:
Was conceptually wrong
* Added missing deferred cleanup directive to sub-test functions
why:
Rocksdb keeps the files locked for a short while leading to errors. This
was previously solved my using different db sub-directories
* Provide vertex deep-copy function globally.
why:
is just handy
* Avoid unnecessary vertex caching when merging proof nodes
also:
Run all merge tests on the rocksdb backend
Previously, proof node tests were run without backend
* Fix vertex ID generator state handling for rocksdb backend
why:
* Key error in walk iterator
* Needs to be loaded when opening the database
* Use non-zero sub-table prefixes for rocksdb
why:
Handy for debugging
* Fix error code for missing key on rocksdb backend
why:
Previously returned `VOID_HASH_KEY` rather than `GetKeyNotFound`
* Explicitly copy vertex data between internal table and function/result argument
why:
Function argument or return reference may still refer to the same data
object.
* Updated error symbols
why:
Error symbol names for the hike module now start with the prefix `Hike`.
* Write back modified branch node into local top layer cache
why:
With the backend available, the source of the branch node references
might not be the top layer cache. So any change must be explicitely
recorded.
* Generalised Aristo DB constructor for any type of backend
details:
* Records to be deleted are represented as key-void (rather than
key-value) pairs by the put-function arguments
* Allow direct driver access, iterators as example implementation and
for testing.
* Provide backend storage interface
details:
Stores the top layer onto backend tables
* Implemented Rocks DB backend
details:
Transaction based `put()` functionality
Iterators (based on direct RocksDB access)
* Fix include
why:
Eth67 not default yet so that got missed
* Rename `LeafKey` => `LeafTie`
why:
Name is a pen picture of what this object is for. Also, it avoids the
ubiquitous term `key`.
* Provided `getOrVoid()` wrapper for `getOrDefault()`
also:
Provide `isValid()` syntactic sugar for `.isNil.not`, `!= 0` etc.
Reorg descriptor source, split into sub-sources
* Bundled `NodeKey` objects with root ID and called it `HashLabel`
why:
`NodeKey` (aka repurposed Hash265) objects are unique only within a
particular sub-trie (e.g. storage slots) which are kept separated
(i.e non-interleaved) by design. This is not applied to the backend
as the map VertexID->NodeKey labelling the nodes needs not be injective.
For the in-memory database (transaction) layers, the injective map
VertexID->(VertexID,NodeKey) is used where the first field of the image
tuple is the root ID of the sub-trie the `NodeKey` object is valid. So
identical storage tries for different accounts can be represented.
* Exclude some storage tests
why:
These test running on external dumps slipped through. The particular
dumps were reported earlier as somehow dodgy.
This was changed in `#1457` but having a second look, the change on
hexary_interpolate.nim(350) might be incorrect.
* Redesign `Aristo DB` descriptor for transaction based layers
why:
Previous descriptor layout made it cumbersome to push/pop
database delta layers.
The new architecture keeps each layer with the full delta set
relative to the database backend.
* Keep root ID as part of the `Patricia Trie` leaf path
why;
That way, forests are supported
* Fix missing Merkle key removal in `merge()`
* Accept optional root hash argument in `hashify()`
why:
For importing a full database, there will be no proof data except the
root key. So this can be used to check and set the root key in the
database descriptor.
also:
Associate vertex ID to `hashify()` error return code
* Added Aristo Trie traversal function
why:
* step along leaf vertices in sorted order
* tree/trie consistency checks when debugging
* Enabled storage slots test data for Aristo DB
* Keep vertex ID generator state with each db-layer
why:
The vertex ID generator state is part of the difference to the below
layer
* Move otherwise unused source to test directory
* Add Merkle hash generator
also:
* Verification facility for debugging
* Empty Merkle key hashes encoded as `EMPTY_ROOT_HASH`
details:
1. Merging a leaf vertex merges a `Patricia Trie` path (while
adding/modiying vertices) and adds a leaf node with payload
2. Merging a Merkel node merges a single vertex to the `Patricia Trie`
and registers merkel hashes
3. Action 2 can be used before action 1 in order to construct a
Merkel proof as required for handling `snap/1` data.
4. Unit tests show that action 3 is benign for now :)
* Cosmetics, renamed fields (eVtx, bVtx) -> (eVid, bVid)
* Multilayered delta architecture for Aristo DB
details:
Any VertexID or data retrieval needs to go down the rabbit hole and
fetch/get/manipulate the bottom layer -- even without explicit
backend.
* Direct reference to backend from top-level layer
why:
Some services as the vid management needs to be synchronised among all
layers. So access is optimised.
* Experimental MP-trie
why:
Deleting records is a infeasible with the current structure
* Added vertex ID recycling management
Todo:
Provide some unit tests
* DB layout update
why:
Main news is the separation of `Merkel` hashes into an extra table.
details:
The code fragments cover conversion between compact MPT records and
Aristo DB records as well as some rudimentary cache handling for
the `Merkel` hashes (i.e. the extra table entries.)
todo:
Add some simple unit test for the descriptor record (currently used
for vertex ID management, only.)
* Updated vertex ID recycling management
details:
added simple unit tests (mainly testing ABI)
* docu update