Root encoding is on the hot path for block verification in both the
consensus client (when syncing) and the execution client, and oddly
constitutes a significant part of resource usage even though it is not
that much work.
While the trie code is capable of producing a transaction root and
similar feats, it turns out to be quite inefficient, even for small
workloads.
This PR brings in a helper for the specific use case of building tries
of lists of values whose key is the RLP-encoded index of the item.
As it happens, such keys follow a particular structure where items end
up "almost" sorted, with the exception of the item at index 0, which
gets encoded as `[0x80]`, i.e. the empty byte string, and therefore
sorts between indices 127 (`[0x7f]`) and 128 (`[0x81, 0x80]`) instead
of first.
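
To make the ordering concrete, here is a small illustrative sketch (Python, not the code in this PR; `rlp_index` and `key_order` are hypothetical names) that RLP-encodes list indices and sorts them by the resulting key:

```python
def rlp_index(i):
    """RLP-encode a non-negative integer index, as used for the trie key."""
    if i == 0:
        return b"\x80"  # zero is encoded as the empty byte string
    if i < 0x80:
        return bytes([i])  # a single byte below 0x80 encodes as itself
    payload = i.to_bytes((i.bit_length() + 7) // 8, "big")
    # Short-string form: length prefix followed by the big-endian payload.
    return bytes([0x80 + len(payload)]) + payload


def key_order(n):
    """Return the indices 0..n-1 sorted by their RLP-encoded key."""
    return sorted(range(n), key=rlp_index)


order = key_order(300)
print(order[:3], order[126:130])  # [1, 2, 3] [127, 0, 128, 129]
```

Everything except index 0 stays in index order, so a builder only needs to special-case where item 0 is fed in.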
Armed with this knowledge and the understanding that inserting ordered
items into a trie can easily be done with a simple recursion, this PR
brings a ~100x improvement in CPU usage (360 ms vs 33 s) and a ~50x
reduction in memory usage (70 MB vs >3 GB!) for the simple test of
encoding 1,000,000 keys.
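
As a rough illustration of why sorted input makes the recursion simple, the following Python sketch builds only the *shape* of the trie (leaf, extension and branch nodes) from keys that are already sorted. It is not the implementation in this PR: hashing and node encoding are left out, and `make_items`, `nibbles` and `build` are hypothetical names. It relies on the keys being distinct and prefix-free, which holds for RLP-encoded indices.

```python
def rlp_index(i):
    # RLP encoding of a non-negative integer, used as the trie key.
    if i == 0:
        return b"\x80"
    if i < 0x80:
        return bytes([i])
    payload = i.to_bytes((i.bit_length() + 7) // 8, "big")
    return bytes([0x80 + len(payload)]) + payload


def nibbles(key):
    # Split each byte into its high and low 4-bit halves.
    return [n for b in key for n in (b >> 4, b & 0x0F)]


def make_items(n):
    # (nibble path, index) pairs sorted by path; paths are unique.
    return sorted((nibbles(rlp_index(i)), i) for i in range(n))


def build(items, lo, hi, depth):
    """Build the node for items[lo:hi], which all share their first `depth` nibbles."""
    first, last = items[lo][0], items[hi - 1][0]
    if hi - lo == 1:
        return ("leaf", first[depth:], items[lo][1])
    # In a sorted run, whatever the first and last key share is shared by all.
    # Assumes prefix-free keys, so the scan stops before either key ends.
    d = depth
    while first[d] == last[d]:
        d += 1
    if d > depth:
        return ("extension", first[depth:d], build(items, lo, hi, d))
    # Keys diverge here: each child is a contiguous run with the same nibble.
    children = [None] * 16
    start = lo
    while start < hi:
        nib = items[start][0][depth]
        end = start
        while end < hi and items[end][0][depth] == nib:
            end += 1
        children[nib] = build(items, start, end, depth + 1)
        start = end
    return ("branch", children)


items = make_items(3)
root = build(items, 0, len(items), 0)
print(root[0], root[1][8])  # branch ('leaf', [0], 0)
```

Because every child covers a contiguous slice of the sorted input, each key is visited once per trie level and no lookups or node rewrites are needed.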
In part, the memory usage reduction is due to a trick where the hash of
the item is computed as the item is being added instead of storing it in
the value.
Further reductions are possible, such as maintaining a hasher per level
instead of storing hash values, as well as using a direct-to-hash RLP
encoder.