461 Commits

Author SHA1 Message Date
Jordan Hrycaj
1d70ba5ff0
Fix log warnings (== should have been !=) (#2907) 2024-12-04 14:36:15 +00:00
Jacek Sieka
f034af422a
Pre-allocate vids for branches (#2882)
Each branch node may have up to 16 sub-items - currently, these are
given VertexID based when they are first needed leading to a
mostly-random order of vertexid for each subitem.

Here, we pre-allocate all 16 vertex ids such that when a branch subitem
is filled, it already has a vertexid waiting for it. This brings several
important benefits:

* subitems are sorted and "close" in their id sequencing - this means
that when rocksdb stores them, they are likely to end up in the same
data block thus improving read efficiency
* because the ids are consequtive, we can store just the starting id and
a bitmap representing which subitems are in use - this reduces disk
space usage for branches allowing more of them fit into a single disk
read, further improving disk read and caching performance - disk usage
at block 18M is down from 84 to 78gb!
* the in-memory footprint of VertexRef reduced allowing more instances
to fit into caches and less memory to be used overall.

Because of the increased locality of reference, it turns out that we no
longer need to iterate over the entire database to efficiently generate
the hash key database because the normal computation is now faster -
this significantly benefits "live" chain processing as well where each
dirtied key must be accompanied by a read of all branch subitems next to
it - most of the performance benefit in this branch comes from this
locality-of-reference improvement.

On a sample resync, there's already ~20% improvement with later blocks
seeing increasing benefit (because the trie is deeper in later blocks
leading to more benefit from branch read perf improvements)

```
blocks: 18729664, baseline: 190h43m49s, contender: 153h59m0s
Time (total): -36h44m48s, -19.27%
```

Note: clients need to be resynced as the PR changes the on-disk format

R.I.P. little bloom filter - your life in the repo was short but
valuable
2024-12-04 11:42:04 +01:00
Jacek Sieka
b3cb51e89e
Speed up evm stack (#2881)
The EVM stack is a hot spot in EVM execution and we end up paying a nim
seq tax in several ways, adding up to ~5% of execution time:

* on initial allocation, all bytes get zeroed - this means we have to
choose between allocating a full stack or just a partial one and then
growing it
* pushing and popping introduce additional zeroing
* reallocations on growth copy + zero - expensive again!
* redundant range checking on every operation reducing inlining etc

Here a custom stack using C memory is instroduced:

* no zeroing on allocation
* full stack allocated on EVM startup -> no reallocation during
execution
* fast push/pop - no zeroing again
* 32-byte alignment - this makes it easier for the compiler to use
vector instructions
* no stack allocated for precompiles (these never use it anyway)

Of course, this change also means we have to manage memory manually -
for the EVM, this turns out to be not too bad because we already manage
database transactions the same way (they have to be freed "manually") so
we can simply latch on to this mechanism.

While we're at it, this PR also skips database lookup for known
precompiles by resolving such addresses earlier.
2024-11-30 10:07:10 +01:00
andri lim
e55583bf7a
Fix incomplete PR #2877 (#2880) 2024-11-27 17:45:37 +07:00
andri lim
fbfc1611d7
Implement EIP-7702: Set EOA account code (#2631)
* Implement EIP-7702 part 1: Behavior

* Implement EIP-7702 part 2: Tx validation

* Implement EIP-7702 part 3: Delegation Designation and Gas Costs
2024-11-25 11:28:03 +01:00
Jacek Sieka
652539e628
Simplify state root api (#2864)
`updateOk` is obsolete and always set to true - callers should not have
to care about this detail

also take the opportunity to clean up storage root naming
2024-11-22 14:15:35 +01:00
andri lim
7b2b59a976
Add missing fields to RPC object conversion (#2863)
* Add missing fields to RPC object conversion

* Fix populateBlockObject call

* Remove server_api_helpers.nim

* Add metric defined conditional compilation

* link with rocksdb
2024-11-22 17:07:53 +07:00
andri lim
a57a887269
Fix t8n regression: Legacy Tx should not validate chainId (#2858) 2024-11-21 21:57:22 +07:00
Jacek Sieka
6086c2903c
Small deserialization speedup (#2852)
When walking AriVtx, parsing integers and nibbles actually becomes a
hotspot - these trivial changes reduces CPU usage during initial key
cache computation by ~15%.
2024-11-20 16:04:32 +01:00
Jacek Sieka
01ca415721
Store keys together with node data (#2849)
Currently, computed hash keys are stored in a separate column family
with respect to the MPT data they're generated from - this has several
disadvantages:

* A lot of space is wasted because the lookup key (`RootedVertexID`) is
repeated in both tables - this is 30% of the `AriKey` content!
* rocksdb must maintain in-memory bloom filters and LRU caches for said
keys, doubling its "minimal efficient cache size"
* An extra disk traversal must be made to check for existence of cached
hash key
* Doubles the amount of files on disk due to each column family being
its own set of files

Here, the two CFs are joined such that both key and data is stored in
`AriVtx`. This means:

* we save ~30% disk space on repeated lookup keys
* we save ~2gb of memory overhead that can be used to cache data instead
of indices
* we can skip storing hash keys for MPT leaf nodes - these are trivial
to compute and waste a lot of space - previously they had to present in
the `AriKey` CF to avoid having to look in two tables on the happy path.
* There is a small increase in write amplification because when a hash
value is updated for a branch node, we must write both key and branch
data - previously we would write only the key
* There's a small shift in CPU usage - instead of performing lookups in
the database, hashes for leaf nodes are (re)-computed on the fly
* We can return to slightly smaller on-disk SST files since there's
fewer of them, which should reduce disk traffic a bit

Internally, there are also other advantages:

* when clearing keys, we no longer have to store a zero hash in memory -
instead, we deduce staleness of the cached key from the presence of an
updated VertexRef - this saves ~1gb of mem overhead during import
* hash key cache becomes dedicated to branch keys since leaf keys are no
longer stored in memory, reducing churn
* key computation is a lot faster thanks to the skipped second disk
traversal - a key computation for mainnet can be completed in 11 hours
instead of ~2 days (!) thanks to better cache usage and less read
amplification - with additional improvements to the on-disk format, we
can probably get rid of the initial full traversal method of seeding the
key cache on first start after import

All in all, this PR reduces the size of a mainnet database from 160gb to
110gb and the peak memory footprint during import by ~1-2gb.
2024-11-20 09:56:27 +01:00
andri lim
666f8d2cf1
Fixes related to Prague execution requests (#2847)
* Fixes related to Prague execution requests

Turn out the specs are changed:
- WITHDRAWAL_REQUEST_ADDRESS -> WITHDRAWAL_QUEUE_ADDRESS
- CONSOLIDATION_REQUEST_ADDRESS -> CONSOLIDATION_QUEUE_ADDRESS
- DEPOSIT_CONTRACT_ADDRESS -> only mainnet
- depositContractAddress can be configurable

Also fix bugs related to t8n tool

* Fix for evmc
2024-11-08 10:47:07 +07:00
andri lim
70a1f768f7
Engine API: Route more wiring from CoreDb to ForkedChain (#2844) 2024-11-07 03:43:25 +00:00
andri lim
6b86acfb8d
Cleanup db/core_apps error handling (#2838)
* Cleanup db/core_apps error handling

* Fix persistHeader

* Fix getUncles
2024-11-07 08:24:21 +07:00
andri lim
f201eb611e
Simplify LedgerRef: remove unnecessary abstraction (#2826) 2024-11-06 09:01:56 +07:00
andri lim
6c3bbbf22c
Feature: Prevent loading an existing data directory for the wrong network (#2825)
* Prevent loading an existing data directory for the wrong network

* Fix and add more info
2024-11-06 09:01:42 +07:00
andri lim
89fac051cd
Reduce declared but not used warnings (#2822) 2024-11-03 00:11:24 +00:00
Jacek Sieka
58cde36656
Remove RawData from possible leaf payload types (#2794)
This kind of data is not used except in tests where it is used only to
create databases that don't match actual usage of aristo.

Removing simplifies future optimizations that can focus on processing
specific leaf types more efficiently.

A casualty of this removal is some test code as well as some proof
generation code that is unused - on the surface, it looks like it should
be possible to port both of these to the more specific data types -
doing so would ensure that a database written by one part of the
codebase can interact with the other - as it stands, there is confusion
on this point since using the proof generation code will result in a
database of a shape that is incompatible with the rest of eth1.
2024-11-02 10:29:16 +01:00
tersec
73661fd8a4
switch to Nim v2.0.12 (#2817)
* switch to Nim v2.0.12

* fix LruCache capitalization for styleCheck

* KzgProof/KzgCommitment for styleCheck

* TxEip4844 for styleCheck

* styleCheck issues in nimbus/beacon/payload_conv.nim

* ENode for styleCheck

* isOk for styleCheck

* some more styleCheck fixes

* more styleCheck fixes

---------

Co-authored-by: jangko <jangko128@gmail.com>
2024-11-01 19:06:26 +00:00
Jacek Sieka
1406feab5f
fix computeKey account hash (#2795)
Oops. Discovered as part of making the code use the actual production
database types in the key computation test ;)
2024-10-28 19:14:28 +01:00
Jacek Sieka
43e08d08c7
drop support for generic data in coredb (#2792)
All actual access to CoreDB is typed (account or storage) - it's
unlikely ethereum will grow another trie structure in the near future.
2024-10-28 17:56:43 +01:00
Jacek Sieka
d828dead2d
Use stateRoot/storageRoot more consistently (#2791)
* prefer the spec-derived name where possible
* don't pass stateRoot to LedgerRef and friends (it doesn't do anything)
* add deprecation warning in graphql - it needs updating to use
forkedchain instead
2024-10-27 19:56:28 +01:00
Jacek Sieka
188d689d9d
Speed up initial MPT root computation after import (#2788)
When `nimbus import` runs, we end up with a database without MPT roots
leading to long startup times the first time one is needed.

Computing the state root is slow because the on-disk order based on
VertexID sorting does not match the trie traversal order and therefore
makes lookups inefficent.

Here we introduce a helper that speeds up this computation by traversing
the trie in on-disk order and computing the trie hashes bottom up
instead - even though this leads to some redundant reads of nodes that
we cannot yet compute, it's still a net win as leaves and "bottom"
branches make up the majority of the database.

This PR also addresses a few other sources of inefficiency largely due
to the separation of AriKey and AriVtx into their own column families.

Each column family is its own LSM tree that produces hundreds of SST
filtes - with a limit of 512 open files, rocksdb must keep closing and
opening files which leads to expensive metadata reads during random
access.

When rocksdb makes a lookup, it has to read several layers of files for
each lookup. Ribbon filters to skip over files that don't have the
requested data but when these filters are not in memory, reading them is
slow - this happens in two cases: when opening a file and when the
filter has been evicted from the LRU cache. Addressing the open file
limit solves one source of inefficiency, but we must also increase the
block cache size to deal with this problem.

* rocksdb.max_open_files increased to 2048
* per-file size limits increased so that fewer files are created
* WAL size increased to avoid partial flushes which lead to small files
* rocksdb block cache increased

All these increases of course lead to increased memory usage, but at
least performance is acceptable - in the future, we'll need to explore
options such as joining AriVtx and AriKey and/or reducing the row count
(by grouping branch layers under a single vertexid).

With this PR, the mainnet state root can be computed in ~8 hours (down
from 2-3 days) - not great, but still better.

Further, we write all keys to the database, also those that are less
than 32 bytes - because the mpt path is part of the input, it is very
rare that we actually hit a key like this (about 200k such entries on
mainnet), so the code complexity is not worth the benefit really, in the
current database layout / design.
2024-10-27 11:08:37 +00:00
tersec
d53989cc2c
fix some XDeclaredButNotUsed hints (#2784) 2024-10-26 05:10:06 +00:00
andri lim
67088540cf
Fix leftover eth types changes warnings (#2766) 2024-10-22 13:42:16 +07:00
Jacek Sieka
503dcd40c4
aristo: remove replicate (#2758)
Not used, not tested, mostly obsolete due to how key table has become a
cache
2024-10-20 17:25:12 +02:00
andri lim
1126c7700d
Bump nim-eth and nimbus-eth2 (#2741)
* Bump nim-eth and nimbus-eth2

* Fix ambiguous identifier
2024-10-16 13:51:38 +07:00
Chirag Parmar
2838191c4f
replace deprecated types (#2704)
* partial commit

* fixes

* remove converters too

* revert changes on nimbus_verified_proxy

* revert changes in converter

* revert changes(re-xport) in rpc_types

* update copyright year

* replace types in other binaries

* chain config bug

* fix rebase conflict imcomplete buffer

* fix more rebase buffers

* remove ditto types and converters

* fix the tests

* update copyright year
2024-10-16 08:34:12 +07:00
Jacek Sieka
11646ad3c4
Ordered trie (#2712)
Speed up trie computations and remove redundant ways of performing this
operation.

Co-authored-by: jangko <jangko128@gmail.com>
2024-10-09 09:44:15 +02:00
Jordan Hrycaj
d6eb8c36f5
Beacon sync align internal names and docu update (#2690)
* Rename `base` -> `coupler`, `B` -> `C`

why:
  Glossary: The jargon `base` is used for the `base state` block number
  which can be smaller than what is now the `coupler`.

* Rename `global state` -> `base`, `T` -> `B`

why:
  See glossary

* Rename `final` -> `end`, `F` -> `E`

why:
  See glossary. Previously, `final` denoted some finalised block but not
  `the finalised` block from the glossary (which is maximal.)

* Properly name finalised block as such, rename `Z` -> `F`

why:
  See glossary

* Rename `least` -> `dangling`, `L` -> `D`

* Metrics update (variables not covered yet)

* Docu update and corrections

* Logger updates

* Remove obsolete `skeleton*Key` kvt columns from `storage_types` module
2024-10-03 20:19:11 +00:00
Jordan Hrycaj
05483d89bd
Rename flare as beacon (#2680)
* Remove `--sync-mode` option from nimbus config

why:
  Currently there is only one sync mode available.

* Rename `flare` -> `beacon`, but not base module folder and nim source

why:
  The name `flare` was used do designate an alternative `beacon` mode that.

  Leaving the base folder and source as-is for a moment, makes it easier
  to read change diffs.

* Rename `flare` base module folder and nim source: `flare` -> `beacon`
2024-10-02 11:31:33 +00:00
Jordan Hrycaj
5b6ccddaa0
Db folder sources and related remove compiler warnings (#2673)
* Aristo: Rename `Hash256` -> `Hash32`

* CoreDb: Rename `Hash256` -> `Hash32`

* Ledger: Rename `Hash256` -> `Hash32`

* StorageTypes: Rename `Hash256` -> `Hash32`

* Aristo: Rename `Blob` -> `seq[byte]`, `keccakHash` -> `keccak256`

* Kvt: Rename `Blob` -> `seq[byte]`

* CoreDb: Rename `Blob` -> `seq[byte]`, `keccakHash` -> `keccak256`

* Ledger: Rename `Blob` -> `seq[byte]`, `keccakHash` -> `keccak256`

* CoreDb: Rename `BlockHeader` -> `Header`, `BlockNonce` -> `Bytes8`

* Misc: Rename `StorageKey` -> `Bytes32`

* Tracer: `Hash256` -> `Hash32`, `BlockHeader` -> `Header`, etc.

* Fix copyright header
2024-10-01 21:03:10 +00:00
Jacek Sieka
c210885b73
eth: bump to new types (#2660)
This is a minimal set of changes to make things work with the new types
in nim-eth - this is the minimal PR that merely resolves
incompatibilities while the full change set would include more cleanup
and migration.
2024-09-29 14:37:09 +02:00
Jacek Sieka
f3e3c6bbe0
init style for Hash256 (#2661)
* init style for Hash256

https://github.com/status-im/nim-eth/pull/733 updates `Hash256` to
become an array instead of an object - unfortunately, nim does not allow
constructing arrays with `name()`, so this PR changes it to `default`
which works with both.

* lint
2024-09-26 13:24:36 +02:00
Jacek Sieka
513f11f911
bumps (#2652)
eth/stew/unittest2 in preparation for eth refactoring
2024-09-24 13:19:09 +02:00
Jacek Sieka
7a15aa2a3a
clean up vertex delete (#2644)
avoid allocating and updating the trie twice when the branch is fully
removed
2024-09-20 10:31:29 +02:00
Jacek Sieka
b4b4d16729
speed up key computation (#2642)
* batch database key writes during `computeKey` calls
* log progress when there are many keys to update
* avoid evicting the vertex cache when traversing the trie for key
computation purposes
* avoid storing trivial leaf hashes that directly can be loaded from the
vertex
2024-09-20 07:43:53 +02:00
Jacek Sieka
2fe8cc4551
leaf cache fixes (#2637)
* Add missing leaf cache update when a leaf turns to a branch with two
leaves (on merge) and vice versa (on delete) - this could lead to stale
leaves being returned from the cache causing validation failures - it
didn't happen because the leaf caches were not being used efficiently :)
* Replace `seq` with `ArrayBuf` in `Hike` allowing it to become
allocation-free - this PR also works around an inefficiency in nim in
returning large types via a `var` parameter
* Use the leaf cache instead of `getVtxRc` to fetch recent leaves - this
makes the vertex cache more efficient at caching branches because fewer
leaf requests pass through it.
2024-09-19 10:39:06 +02:00
Jacek Sieka
5cd0297462
fix missed cache opportunity (#2628)
The storage leaf cache was being circumvented when actually fetching
leaves and was instead only being filled with items :/

Also avoids an expensive copy when fetching account data (broadly,
variant objects are comparatively expensive to copy and fetching
accounts is a hotspot)
2024-09-14 09:47:32 +02:00
Jacek Sieka
adb8d64377
simplify VertexRef (#2626)
* move pfx out of variant which avoids pointless field type panic checks
and copies on access
* make `VertexRef` a non-inheritable object which reduces its memory
footprint and simplifies its use - it's also unclear from a semantic
point of view why inheritance makes sense for storing keys
2024-09-13 18:55:17 +02:00
Jacek Sieka
5c1e2e7d3b
Migrate keyed_queue to minilru (#2608)
Compared to `keyed_queue`, `minilru` uses significantly less memory, in
particular for the 32-byte hash keys where `kq` stores several copies of
the key redundantly.
2024-09-13 15:47:50 +02:00
web3-developer
e8a9cfe555
Re-enable eth_getProof implementation (#2599)
* Re-enable eth_getProof implementation.

* Update to use latest Aristo proof changes.

* Refactor and cleanup.
2024-09-12 09:06:31 +08:00
Jordan Hrycaj
c6674311eb
Fringe case portal proof for existing account without storage tree (#2613)
detail:
  For practical reasons, ifsuch an account is asked for a slot, an empty
  proof list is returned. It is up to the user to provide an account
  proof that shows that there is no storage tree.
2024-09-11 20:27:42 +00:00
Jordan Hrycaj
75808bc03b
Add portal proof functionality for non-existing keys/paths (#2610) 2024-09-11 09:39:45 +00:00
tersec
1b173d420d
small cleanups (#2598)
* small cleanups

* stop hiding ConvFromXtoItselfNotNeeded hints

* lowmem optimization flag is no-op
2024-09-10 05:24:45 +00:00
Jordan Hrycaj
71e466d173
Block header download beacon to era1 (#2601)
* Block header download starting at Beacon down to Era1

details:
  The header download implementation is intended to be completed to a
  full sync facility.

  Downloaded block headers are stored in a `CoreDb` table. Later on they
  should be fetched, complemented by a block body, executed/imported,
  and deleted from the table.

  The Era1 repository may be partial or missing. Era1 headers are neither
  downloaded nor stored on the `CoreDb` table.

  Headers are downloaded top down (largest block number first) using the
  hash of the block header by one peer. Other peers fetch headers
  opportunistically using block numbers

  Observed download times for 14m `MainNet` headers varies between 30min
  and 1h (Era1 size truncated to 66m blocks.), full download 52min
  (anectdotal.) The number of peers downloading concurrently is crucial
  here.

* Activate `flare` by command line option

* Fix copyright year
2024-09-09 09:12:56 +00:00
Jacek Sieka
0a8986bc77
avoid copying data when merging save points (#2584)
Saving both memory and processing, we can move entries from one
savepoint to another, specially when the target is empty as it often is
during transaction processing
2024-09-06 22:45:29 +02:00
Jacek Sieka
d39c589ec3
lru cache updates (#2590)
* replace rocksdb row cache with larger rdb lru caches - these serve the
same purpose but are more efficient because they skips serialization,
locking and rocksdb layering
* don't append fresh items to cache - this has the effect of evicting
the existing items and replacing them with low-value entries that might
never be read - during write-heavy periods of processing, the
newly-added entries were evicted during the store loop
* allow tuning rdb lru size at runtime
* add (hidden) option to print lru stats at exit (replacing the
compile-time flag)

pre:
```
INF 2024-09-03 15:07:01.136+02:00 Imported blocks
blockNumber=20012001 blocks=12000 importedSlot=9216851 txs=1837042
mgas=181911.265 bps=11.675 tps=1870.397 mgps=176.819 avgBps=10.288
avgTps=1574.889 avgMGps=155.952 elapsed=19m26s458ms
```

post:
```
INF 2024-09-03 13:54:26.730+02:00 Imported blocks
blockNumber=20012001 blocks=12000 importedSlot=9216851 txs=1837042
mgas=181911.265 bps=11.637 tps=1864.384 mgps=176.250 avgBps=11.202
avgTps=1714.920 avgMGps=169.818 elapsed=17m51s211ms
```

9%:ish import perf improvement on similar mem usage :)
2024-09-05 11:18:32 +02:00
Jordan Hrycaj
3c6400673d
Coredb fix kvt save only fringe condition (#2592)
* Cosmetics, spelling, etc.

* Aristo: make sure that a save cycle always commits even when empty

why:
  If `Kvt` is tied to the `Aristo` DB save cycle, then this save cycle
  must also be committed if there is no data to save for `Aristo`.

  Otherwise this will lead to excessive core memory use with some fringe
  condition where Eth headers (or blocks) are downloaded while syncing
  and not really stored on disk.

* CoreDb: Correct persistent save mode

why:
  Saving `Kvt` first is seen as a harbinger (or canary) for `Aristo` as
  both run in sync. If `Kvt` succeeds saving first, so must be `Aristo`
  next. Other than this is a defect.
2024-09-04 13:48:38 +00:00
andri lim
4d9e288340
Wiring ForkedChainRef to other components (#2423)
* Wiring ForkedChainRef to other components

- Disable majority of hive simulators
- Only enable pyspec_sim for the moment
- The pyspec_sim is using a smaller RPC service wired to ForkedChainRef
- The RPC service will gradually grow

* Addressing PR review

* Fix test_beacon/setup_env

* Enable consensus_sim (#2441)

* Enable consensus_sim

* Remove isFile check

* Enable Engine API jwt auth tests and exchange cap tests

* Enable engine api in build_sim.sh

* Wire ForkedChainRef to Engine API newPayload

* Wire Engine API getBodies to ForkedChainRef

* Wire Engine API api_forkchoice to ForkedChainRef

* Wire more RPC methods to ForkedChainRef

* Implement eth_syncing

* Implement eth_call and eth_getlogs

* TxPool: simplify smartHead

* Fix smartHead usage

* Fix txpool headDiff

* Remove hasBlockHeader and use headerExists

* Addressing review
2024-09-04 09:54:54 +00:00
Jacek Sieka
35cc78c86d
add metrics for rdb lru cache (#2586)
This is a first step towards measuring the efficiency of the LRU caches
over time - metrics can be collected during import or when running
regulary.

Since `nim-metrics` carries some overhead for its default way of
reporting metrics, this PR implements a custom collector over atomic
counters, given that this is one of the hottest spots in the block
processing pipeline.

Using a compile-time flag, the same metrics can be printed on exit which
is useful when comparing different strategies for caching - here's a
recent run over blocks 16000001-1616384 - this is a good candidate to
expose in a better way in the future, maybe:

```
   state    vtype       miss        hit      total hitrate
 Account     Leaf    4909417    4466215    9375632  47.64%
 Account   Branch   20742574   72015123   92757697  77.64%
   World     Leaf     940483    1140946    2081429  54.82%
   World   Branch    8224151  131496580  139720731  94.11%
     all      all   34816625  209118864  243935489  85.73%
```
2024-09-02 17:34:10 +02:00