A bit unexpectedly, nibble handling shows up in the profiler mainly
because the current impl is tuned towards slicing while the most common
operation is prefix comparison - since the code is simple, might has
well get rid of some of the excess fat by always aliging the nibbles to
the byte buffer.
When walking AriVtx, parsing integers and nibbles actually becomes a
hotspot - these trivial changes reduces CPU usage during initial key
cache computation by ~15%.