Thanks to @holiman of goevmlab for his fuzzer.
Similar with Blake2b precompile regression #2919.
When error, the precompile should not return any output.
When running the import, currently blocks are loaded in batches into a
`seq` then passed to the importer as such.
In reality, blocks are still processed one by one, so the batching does
not offer any performance advantage. It does however require that the
client wastes memory, up to several GB, on the block sequence while
they're waiting to be processed.
This PR introduces a persister that accepts these potentially large
blocks one by one and at the same time removes a number of redundant /
unnecessary copies, assignments and resets that were slowing down the
import process in general.
* `shouldPrepareTracer` always true
* simple `pop` should not copy value (reading the memory shows up in a
profiler)
* continuation code simplified
* remove some unnecessary EH
The EVM stack is a hot spot in EVM execution and we end up paying a nim
seq tax in several ways, adding up to ~5% of execution time:
* on initial allocation, all bytes get zeroed - this means we have to
choose between allocating a full stack or just a partial one and then
growing it
* pushing and popping introduce additional zeroing
* reallocations on growth copy + zero - expensive again!
* redundant range checking on every operation reducing inlining etc
Here a custom stack using C memory is instroduced:
* no zeroing on allocation
* full stack allocated on EVM startup -> no reallocation during
execution
* fast push/pop - no zeroing again
* 32-byte alignment - this makes it easier for the compiler to use
vector instructions
* no stack allocated for precompiles (these never use it anyway)
Of course, this change also means we have to manage memory manually -
for the EVM, this turns out to be not too bad because we already manage
database transactions the same way (they have to be freed "manually") so
we can simply latch on to this mechanism.
While we're at it, this PR also skips database lookup for known
precompiles by resolving such addresses earlier.
* Inline gas cost/instruction fetching
These make up 5:ish % of EVM execution time - even though they're
trivial they end up not being inlined - this little change gives a
practically free perf boost ;)
Also unify the style of creating the output to `setLen`..
* avoid a few more unnecessary seq allocations
* Fixes related to Prague execution requests
Turn out the specs are changed:
- WITHDRAWAL_REQUEST_ADDRESS -> WITHDRAWAL_QUEUE_ADDRESS
- CONSOLIDATION_REQUEST_ADDRESS -> CONSOLIDATION_QUEUE_ADDRESS
- DEPOSIT_CONTRACT_ADDRESS -> only mainnet
- depositContractAddress can be configurable
Also fix bugs related to t8n tool
* Fix for evmc
* prefer the spec-derived name where possible
* don't pass stateRoot to LedgerRef and friends (it doesn't do anything)
* add deprecation warning in graphql - it needs updating to use
forkedchain instead
* partial commit
* fixes
* remove converters too
* revert changes on nimbus_verified_proxy
* revert changes in converter
* revert changes(re-xport) in rpc_types
* update copyright year
* replace types in other binaries
* chain config bug
* fix rebase conflict imcomplete buffer
* fix more rebase buffers
* remove ditto types and converters
* fix the tests
* update copyright year
This is a minimal set of changes to make things work with the new types
in nim-eth - this is the minimal PR that merely resolves
incompatibilities while the full change set would include more cleanup
and migration.
* init style for Hash256
https://github.com/status-im/nim-eth/pull/733 updates `Hash256` to
become an array instead of an object - unfortunately, nim does not allow
constructing arrays with `name()`, so this PR changes it to `default`
which works with both.
* lint
In the current VM opcode dispatcher, a two-level case statement is
generated that first matches the opcode and then uses another nested
case statement to select the actual implementation based on which fork
it is, causing the dispatcher to grow by `O(opcodes) * O(forks)`.
The fork does not change between instructions causing significant
inefficiency for this approach - not only because it repeats the fork
lookup but also because of code size bloat and missed optimizations.
A second source of inefficiency in dispatching is the tracer code which
in the vast majority of cases is disabled but nevertheless sees multiple
conditionals being evaluated for each instruction only to remain
disabled throughout exeuction.
This PR rewrites the opcode dispatcher macro to generate a separate
dispatcher for each fork and tracer setting and goes on to pick the
right one at the start of the computation.
This has many advantages:
* much smaller dispatcher
* easier to compile
* better inlining
* fewer pointlessly repeated instruction
* simplified macro (!)
* slow "low-compiler-memory" dispatcher code can be removed
Net block import improvement at about 4-6% depending on the contract -
synthetic EVM benchmnarks would show an even better result most likely.
The reverse slot hash mechanism causes quite a bit of database traffic
but is broadly not useful except for iterating the storage of an
account, something that a validator never does (it's used by the
tracers).
This flag adds one more thing that is not stored in the database, to be
explored more comprehensively when designing full, validator and archive
modes with different pruning options in the future.
`ldb` says this is 60gb of data (!):
```
ldb --db=. --ignore_unknown_options --column_family=KvtGen approxsize
--hex --from=0x05
--to=0x05ffffffffffffffffffffffffffffffffffffffffffffff
66488353954
```
* Use block number or timestamp to determine fork rules
Avoid confusion raised by `forkGTE` usage where block informations are present.
* Get rid of forkGTE
Introduce a new `StoData` payload type similar to `AccountData`
* slightly more efficient storage format
* typed api
* fewer seqs
* fix encoding docs - it wasn't rlp after all :)
This significantly speeds up block import at the cost of less protection
against invalid data, potentially resulting in an invalid database
getting stored.
The risk is small given that import is used only for validated data -
evaluating the right level of of validation vs performance is left for a
future PR.
A side effect of this approach is that there is no cached stated root in
the database - computing it currently requires a lot of memory since the
intermediate roots get cached in memory in full while the computation is
ongoing - a future PR will need to address this deficiency, for example
by streaming the already-computed hashes directly to the database.