Root encoding is on the hot path for block verification both in the
consensus (when syncing) and execution clients and oddly consititutes a
significant part of resource usage even though it is not that much work.
While the trie code is capable of producing a transaction root and
similar feats, it turns out that it is quite inefficient - even for
small work loads.
This PR brings in a helper for the specific use case of building tries
of lists of values whose key is the RLP-encoded index of the item.
As it happens, such keys follow a particular structure where items end
up "almost" sorted, with the exception for the item at index 0 which
gets encoded as `[0x80]`, ie the empty list, thus moving it to a new
location.
Armed with this knowledge and the understanding that inserting ordered
items into a trie easily can be done with a simple recursion, this PR
brings a ~100x improvement in CPU usage (360ms vs 33s) and a ~50x
reduction in memory usage (70mb vs >3gb!) for the simple test of
encoding 1000000 keys.
In part, the memory usage reduction is due to a trick where the hash of
the item is computed as the item is being added instead of storing it in
the value.
There are further reductions possible such as maintaining a hasher per
level instead of storing hash values as well as using a direct-to-hash
rlp encoder.
Since these types were written, we've gained an executable spec:
https://github.com/ethereum/execution-specs
This PR aligns some of the types we use with this spec to simplify
comparisons and cross-referencing.
Using a `distinct` type is a tradeoff between nim ergonomics, type
safety and the ability to work around nim quirks and stdlib weaknesses.
In particular, it allows us to overload common functions such as `hash`
with correct and performant versions as well as maintain control over
string conversions etc at the cost of a little bit of ceremony when
instantiating them.
Apart from distinct byte types, `Hash32`, is introduced in lieu of the
existing `Hash256`, again aligning this commonly used type with the spec
which picks bytes rather than bits in the name.
Other implementations of MPT delete entries when attempting to put empty
value, because empty value cannot exist in RLP. We should match the
behaviour.
- https://github.com/ethereum/py-trie/pull/109
Also cross-checked with Geth and Ethereumjs implementations.
* Revert "In the incomplete-db node-existence check, don't use contains. (#603)"
This reverts commit 4b818e830710f857e5cc8001af4d084570d2204a.
* Revert "Some changes to make hexary.nim better able to handle incomplete DBs (#602)"
This reverts commit f5dd26eac05a00134ebf1af537e16235702c343b.
* Revert "Added maybeGet, for working with incomplete DBs. (#595)"
This reverts commit 4754543605770469410d4546f142c2d8db97f943.
* In the incomplete-db node-existence check, don't use contains.
(Using contains led to a problem with CaptureDB.)
* In the incomplete-db check, just checking len > 0 isn't right.
* Oh, I needed the AssertionDefect thing too.
* Need this when compiling under older versions of Nim.
* Sometimes we want missing nodes to be errors, sometimes not.
`eth_types` is being imported from many projects and ends up causing
long build times due to its extensive import lists - this PR starts
cleaning some of that up by moving the chain DB and RLP to their own
modules.
this PR also moves `keccakHash` to its own module and uses it in many
places.
Currently only setting `--styleCheck:hint` as there are some
dependency fixes required and the compiler seems to trip over the
findnode MessageKind, findnode Message field and the findNode
proc. Also over protocol.Protocol usage.
* Fix raw Exceptions in hexary caused by forward declarations
* Fix raw Exceptions in trie/db caused by forward declarations
* And now we can remove those db Proc CatchableError raises
* Add build_dcli target and add it to CI
* Fix local imports for dcli
* And use local imports for all other files too
* Use local imports in tests and rlpx protocols
* port kvstore from nim-beacon-chain
* remove old database backends
* use kvstore in trie database
* add sqlite dep
* avoid template param double evaluation
* clean up heterogenous lookup todo