Each branch node may have up to 16 subitems - currently, each of these
is assigned a VertexID when it is first needed, leading to a
mostly-random order of vertex ids across the subitems of a branch.
Here, we pre-allocate all 16 vertex ids such that when a branch subitem
is filled, it already has a vertexid waiting for it. This brings several
important benefits:
* subitems are sorted and "close" in their id sequencing - this means
that when rocksdb stores them, they are likely to end up in the same
data block thus improving read efficiency
* because the ids are consecutive, we can store just the starting id and
a bitmap representing which subitems are in use - this reduces disk
space usage for branches, allowing more of them to fit into a single
disk read and further improving disk read and caching performance -
disk usage at block 18M is down from 84 to 78 GB!
* the in-memory footprint of VertexRef is reduced, allowing more
instances to fit into caches and less memory to be used overall.
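The start-id-plus-bitmap idea can be sketched as follows. This is a
minimal illustration in Python, not the code from this PR: the `Branch`
class, its method names and the 8-byte id / 2-byte bitmap encoding are
all assumptions made for the example and do not reflect the actual
on-disk format.

```python
from dataclasses import dataclass
from typing import Iterator, Optional, Tuple


@dataclass
class Branch:
    """Hypothetical branch record: 16 consecutive ids, one bit per subitem."""

    start_vid: int  # first of the 16 pre-allocated, consecutive vertex ids
    used: int = 0   # 16-bit bitmap, bit n set => subitem n is in use

    def set_child(self, nibble: int) -> int:
        """Mark subitem `nibble` as used and return its pre-allocated id."""
        if not 0 <= nibble < 16:
            raise ValueError("nibble must be in 0..15")
        self.used |= 1 << nibble
        return self.start_vid + nibble

    def child_vid(self, nibble: int) -> Optional[int]:
        """Return the vertex id of subitem `nibble`, or None if unused."""
        if (self.used >> nibble) & 1:
            return self.start_vid + nibble
        return None

    def children(self) -> Iterator[Tuple[int, int]]:
        """Yield (nibble, vertex id) for used subitems, in ascending id order."""
        for nibble in range(16):
            if (self.used >> nibble) & 1:
                yield nibble, self.start_vid + nibble

    def encode(self) -> bytes:
        # Illustrative fixed-width encoding (assumption): 8-byte start id
        # followed by the 2-byte bitmap, instead of up to 16 separate ids.
        return self.start_vid.to_bytes(8, "big") + self.used.to_bytes(2, "big")

    @classmethod
    def decode(cls, data: bytes) -> "Branch":
        return cls(int.from_bytes(data[:8], "big"), int.from_bytes(data[8:10], "big"))


branch = Branch(start_vid=100)
branch.set_child(3)
branch.set_child(10)
restored = Branch.decode(branch.encode())
```

In this sketch a branch always serializes to 10 bytes regardless of how
many subitems are in use, and the ids of its subitems are adjacent, so a
key-ordered store would tend to place them next to each other.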
Because of the increased locality of reference, it turns out that we no
longer need to iterate over the entire database to efficiently generate
the hash key database, because the normal computation is now faster.
This also significantly benefits "live" chain processing, where each
dirtied key must be accompanied by a read of all branch subitems next to
it. Most of the performance benefit in this branch comes from this
locality-of-reference improvement.
On a sample resync, there's already a ~20% improvement, with later
blocks seeing increasing benefit (because the trie is deeper in later
blocks, leading to more benefit from branch read perf improvements):
```
blocks: 18729664, baseline: 190h43m49s, contender: 153h59m0s
Time (total): -36h44m48s, -19.27%
```
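The figures above can be cross-checked with a few lines of Python. This
is only a sanity check on the reported numbers; the helper names are
made up for the example. Recomputing from the whole-second durations
gives a difference one second away from the logged value, presumably
because the benchmark tool rounds from sub-second timings (an
assumption, the log does not say).

```python
import re


def to_seconds(duration: str) -> int:
    """Parse a duration such as '190h43m49s' into whole seconds."""
    hours, minutes, seconds = map(
        int, re.fullmatch(r"(\d+)h(\d+)m(\d+)s", duration).groups()
    )
    return hours * 3600 + minutes * 60 + seconds


def fmt(total: int) -> str:
    """Format whole seconds back into the 'XhYmZs' style used in the log."""
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}h{minutes}m{seconds}s"


baseline = to_seconds("190h43m49s")
contender = to_seconds("153h59m0s")
saved = baseline - contender
pct = 100 * saved / baseline

print(fmt(saved), f"{pct:.2f}%")  # prints: 36h44m49s 19.27%
```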
Note: clients need to be resynced as the PR changes the on-disk format.
R.I.P. little bloom filter - your life in the repo was short but
valuable
This kind of data is not used except in tests where it is used only to
create databases that don't match actual usage of aristo.
Removing it simplifies future optimizations, which can then focus on
processing specific leaf types more efficiently.
A casualty of this removal is some test code as well as some unused
proof generation code. On the surface, it looks like it should be
possible to port both of these to the more specific data types, which
would ensure that a database written by one part of the codebase can
interact with the other. As it stands, there is confusion on this point
since using the proof generation code will result in a database of a
shape that is incompatible with the rest of eth1.
* move pfx out of variant which avoids pointless field type panic checks
and copies on access
* make `VertexRef` a non-inheritable object which reduces its memory
footprint and simplifies its use - it's also unclear from a semantic
point of view why inheritance makes sense for storing keys
detail:
For practical reasons, if such an account is asked for a slot, an empty
proof list is returned. It is up to the user to provide an account
proof that shows that there is no storage tree.
* Provide portal proof functions in `aristo_api`
why:
So it can be fully supported by `CoreDb`
* Fix prototype in `kvt_api`
* Fix node constructor for account leafs with storage trees
* Provide simple path check based on portal proof functionality
* Provide portal proof functionality in `CoreDb`
* Update TODO list
* Extracted `test_tx.testTxMergeProofAndKvpList()` => separate file
* Fix serialiser
why:
Typo led to duplicate rlp-encoded nodes in chain
* Remove cruft
* Implement portal proof nodes generators `partXxxTwig()`
* Add unit test for portal proof nodes generator `partAccountTwig()`
* Cosmetics
* Simplify serialiser return code format
* Fix proof generator for extension nodes
why:
Code was simply bonkers, not detected before the unit tests were
adapted to check for just this.
* Implemented portal proof nodes verifier `partUntwig()`
* Cosmetics
* Fix `testutp` cli problem
* Implement partial trees
why:
This is currently needed for unit tests to pre-load the database
with test data similar to `proof` node pre-load.
The basic features for `snap-sync` boundary proofs are available
as well for future use. What is missing is the final proof verification
and a complete storage data load/merge function (stub is available.)
* Cosmetics, clean up