The EVM stack is a hot spot in EVM execution and we end up paying a nim
seq tax in several ways, adding up to ~5% of execution time:
* on initial allocation, all bytes get zeroed - this means we have to
choose between allocating a full stack or just a partial one and then
growing it
* pushing and popping introduce additional zeroing
* reallocations on growth copy + zero - expensive again!
* redundant range checking on every operation reducing inlining etc
Here a custom stack using C memory is instroduced:
* no zeroing on allocation
* full stack allocated on EVM startup -> no reallocation during
execution
* fast push/pop - no zeroing again
* 32-byte alignment - this makes it easier for the compiler to use
vector instructions
* no stack allocated for precompiles (these never use it anyway)
Of course, this change also means we have to manage memory manually -
for the EVM, this turns out to be not too bad because we already manage
database transactions the same way (they have to be freed "manually") so
we can simply latch on to this mechanism.
While we're at it, this PR also skips database lookup for known
precompiles by resolving such addresses earlier.
* Inline gas cost/instruction fetching
These make up 5:ish % of EVM execution time - even though they're
trivial they end up not being inlined - this little change gives a
practically free perf boost ;)
Also unify the style of creating the output to `setLen`..
* avoid a few more unnecessary seq allocations
* Fixes related to Prague execution requests
Turn out the specs are changed:
- WITHDRAWAL_REQUEST_ADDRESS -> WITHDRAWAL_QUEUE_ADDRESS
- CONSOLIDATION_REQUEST_ADDRESS -> CONSOLIDATION_QUEUE_ADDRESS
- DEPOSIT_CONTRACT_ADDRESS -> only mainnet
- depositContractAddress can be configurable
Also fix bugs related to t8n tool
* Fix for evmc
* prefer the spec-derived name where possible
* don't pass stateRoot to LedgerRef and friends (it doesn't do anything)
* add deprecation warning in graphql - it needs updating to use
forkedchain instead
* partial commit
* fixes
* remove converters too
* revert changes on nimbus_verified_proxy
* revert changes in converter
* revert changes(re-xport) in rpc_types
* update copyright year
* replace types in other binaries
* chain config bug
* fix rebase conflict imcomplete buffer
* fix more rebase buffers
* remove ditto types and converters
* fix the tests
* update copyright year
This is a minimal set of changes to make things work with the new types
in nim-eth - this is the minimal PR that merely resolves
incompatibilities while the full change set would include more cleanup
and migration.
* init style for Hash256
https://github.com/status-im/nim-eth/pull/733 updates `Hash256` to
become an array instead of an object - unfortunately, nim does not allow
constructing arrays with `name()`, so this PR changes it to `default`
which works with both.
* lint
In the current VM opcode dispatcher, a two-level case statement is
generated that first matches the opcode and then uses another nested
case statement to select the actual implementation based on which fork
it is, causing the dispatcher to grow by `O(opcodes) * O(forks)`.
The fork does not change between instructions causing significant
inefficiency for this approach - not only because it repeats the fork
lookup but also because of code size bloat and missed optimizations.
A second source of inefficiency in dispatching is the tracer code which
in the vast majority of cases is disabled but nevertheless sees multiple
conditionals being evaluated for each instruction only to remain
disabled throughout exeuction.
This PR rewrites the opcode dispatcher macro to generate a separate
dispatcher for each fork and tracer setting and goes on to pick the
right one at the start of the computation.
This has many advantages:
* much smaller dispatcher
* easier to compile
* better inlining
* fewer pointlessly repeated instruction
* simplified macro (!)
* slow "low-compiler-memory" dispatcher code can be removed
Net block import improvement at about 4-6% depending on the contract -
synthetic EVM benchmnarks would show an even better result most likely.
The reverse slot hash mechanism causes quite a bit of database traffic
but is broadly not useful except for iterating the storage of an
account, something that a validator never does (it's used by the
tracers).
This flag adds one more thing that is not stored in the database, to be
explored more comprehensively when designing full, validator and archive
modes with different pruning options in the future.
`ldb` says this is 60gb of data (!):
```
ldb --db=. --ignore_unknown_options --column_family=KvtGen approxsize
--hex --from=0x05
--to=0x05ffffffffffffffffffffffffffffffffffffffffffffff
66488353954
```
* Use block number or timestamp to determine fork rules
Avoid confusion raised by `forkGTE` usage where block informations are present.
* Get rid of forkGTE
Introduce a new `StoData` payload type similar to `AccountData`
* slightly more efficient storage format
* typed api
* fewer seqs
* fix encoding docs - it wasn't rlp after all :)
This significantly speeds up block import at the cost of less protection
against invalid data, potentially resulting in an invalid database
getting stored.
The risk is small given that import is used only for validated data -
evaluating the right level of of validation vs performance is left for a
future PR.
A side effect of this approach is that there is no cached stated root in
the database - computing it currently requires a lot of memory since the
intermediate roots get cached in memory in full while the computation is
ongoing - a future PR will need to address this deficiency, for example
by streaming the already-computed hashes directly to the database.
* Normalised storage tree addressing in function prototypes
detail:
Argument list is always `<db> <account-path> <slot-path> ..` with
both path arguments as `openArray[]`
* Remove cruft
* CoreDb internally Use full account paths rather than addresses
* Update API logging
* Use hashed account address only in prototypes
why:
This avoids unnecessary repeated hashing of the same account address.
The burden of doing that is upon the application. In the case here,
the ledger caches all kinds of stuff anyway so it is common sense to
exploit that for account address hashes.
caveat:
Using `openArray[byte]` argument types for hashed accounts is inherently
fragile. In non-release mode, a length verification `doAssert` is
enabled by default.
* No accPath in data record (use `AristoAccount` as `CoreDbAccount`)
* Remove now unused `eAddr` field from ledger `AccountRef` type
why:
Is duplicate of lookup key
* Avoid merging the account record/statement in the ledger twice.
It is common for many accounts to share the same code - at the database
level, code is stored by hash meaning only one copy exists per unique
program but when loaded in memory, a copy is made for each account.
Further, every time we execute the code, it must be scanned for invalid
jump destinations which slows down EVM exeuction.
Finally, the extcodesize call causes code to be loaded even if only the
size is needed.
This PR improves on all these points by introducing a shared
CodeBytesRef type whose code section is immutable and that can be shared
between accounts. Further, a dedicated `len` API call is added so that
the EXTCODESIZE opcode can operate without polluting the GC and code
cache, for cases where only the size is requested - rocksdb will in this
case cache the code itself in the row cache meaning that lookup of the
code itself remains fast when length is asked for first.
With 16k code entries, there's a 90% hit rate which goes up to 99%
during the 2.3M attack - the cache significantly lowers memory
consumption and execution time not only during this event but across the
board.