mirror of https://github.com/status-im/nimbus-eth1.git synced 2025-01-28 04:55:33 +00:00

Each branch node may have up to 16 sub-items - currently, these are
assigned a VertexID when they are first needed, leading to a
mostly-random order of vertex ids across the subitems.

Here, we pre-allocate all 16 vertex ids such that when a branch subitem
is filled, it already has a vertexid waiting for it. This brings several
important benefits:

* subitems are sorted and "close" in their id sequencing - this means
that when rocksdb stores them, they are likely to end up in the same
data block thus improving read efficiency
* because the ids are consecutive, we can store just the starting id and
a bitmap representing which subitems are in use - this reduces disk
space usage for branches, allowing more of them to fit into a single disk
read, further improving disk read and caching performance - disk usage
at block 18M is down from 84 to 78gb!
* the in-memory footprint of VertexRef is reduced, allowing more instances
to fit into caches and less memory to be used overall.
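
As an illustration of the "starting id plus bitmap" idea in the list above, here is a minimal Python sketch. It is not the Aristo implementation: `VidAllocator`, `Branch`, and the 8-byte/2-byte encoding are hypothetical names and layout choices made only for this example.

```python
from dataclasses import dataclass
from typing import Optional

BRANCH_WIDTH = 16  # a branch node has up to 16 sub-items


class VidAllocator:
    """Hypothetical allocator that hands out consecutive vertex ids."""

    def __init__(self, first_free: int = 1) -> None:
        self.next_vid = first_free

    def reserve(self, count: int) -> int:
        """Reserve `count` consecutive ids and return the first one."""
        start = self.next_vid
        self.next_vid += count
        return start


@dataclass
class Branch:
    start_vid: int  # id of sub-item 0; sub-item n lives at start_vid + n
    used: int = 0   # 16-bit bitmap, bit n set means sub-item n is in use

    def set_child(self, nibble: int) -> int:
        """Mark a sub-item as used and return its pre-allocated id."""
        if not 0 <= nibble < BRANCH_WIDTH:
            raise ValueError("nibble out of range")
        self.used |= 1 << nibble
        return self.start_vid + nibble

    def child_vid(self, nibble: int) -> Optional[int]:
        """Return the id of a used sub-item, or None if the slot is empty."""
        if self.used & (1 << nibble):
            return self.start_vid + nibble
        return None

    def encode(self) -> bytes:
        # Assumed layout for this sketch: 8-byte start id + 2-byte bitmap.
        return self.start_vid.to_bytes(8, "big") + self.used.to_bytes(2, "big")

    @classmethod
    def decode(cls, data: bytes) -> "Branch":
        return cls(int.from_bytes(data[:8], "big"), int.from_bytes(data[8:10], "big"))


alloc = VidAllocator(first_free=100)
branch = Branch(start_vid=alloc.reserve(BRANCH_WIDTH))
branch.set_child(3)
branch.set_child(10)
```

In this sketch a branch serializes to 10 bytes regardless of how many subitems are in use, whereas storing up to 16 independent ids would take far more; the real on-disk encoding may differ.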

Because of the increased locality of reference, it turns out that we no
longer need to iterate over the entire database to efficiently generate
the hash key database because the normal computation is now faster -
this significantly benefits "live" chain processing as well where each
dirtied key must be accompanied by a read of all branch subitems next to
it - most of the performance benefit in this branch comes from this
locality-of-reference improvement.

On a sample resync, there's already ~20% improvement with later blocks
seeing increasing benefit (because the trie is deeper in later blocks
leading to more benefit from branch read perf improvements)

```
blocks: 18729664, baseline: 190h43m49s, contender: 153h59m0s
Time (total): -36h44m48s, -19.27%
```
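
The percentage in the log above can be re-derived from the two durations. The following Python sketch does so; `parse_duration` and `relative_change` are helper names invented for this example, and only whole seconds are considered.

```python
import re


def parse_duration(text: str) -> int:
    """Convert a string such as '190h43m49s' into whole seconds."""
    match = re.fullmatch(r"(\d+)h(\d+)m(\d+)s", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    hours, minutes, seconds = (int(part) for part in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def relative_change(baseline: str, contender: str) -> float:
    """Signed change of contender versus baseline, in percent."""
    base = parse_duration(baseline)
    cont = parse_duration(contender)
    return 100.0 * (cont - base) / base


change = relative_change("190h43m49s", "153h59m0s")
print(f"{change:.2f}%")  # prints "-19.27%"
```

The whole-second difference comes out one second larger than the logged 36h44m48s; that would be consistent with the log being computed from sub-second timestamps, which is an assumption here.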

Note: clients need to be resynced as the PR changes the on-disk format

R.I.P. little bloom filter - your life in the repo was short but
valuable

2024-12-04 11:42:04 +01:00

| Name | Last commit | Date |
|---|---|---|
| aristo | Pre-allocate vids for branches (#2882) | 2024-12-04 11:42:04 +01:00 |
| core_db | Simplify state root api (#2864) | 2024-11-22 14:15:35 +01:00 |
| era1_db | Consolidate block type for block processing (#2325) | 2024-06-09 16:32:20 +02:00 |
| kvt | replace deprecated types (#2704) | 2024-10-16 08:34:12 +07:00 |
| .gitignore | Database architecture diagram & module overview (#2065) | 2024-03-08 18:42:46 +00:00 |
| access_list.nim | Bump nim-eth and nimbus-eth2 (#2741) | 2024-10-16 13:51:38 +07:00 |
| aristo.nim | Remove RawData from possible leaf payload types (#2794) | 2024-11-02 10:29:16 +01:00 |
| core_db.nim | Cleanup unused raises in evm/state and other obsolete informations (#2243) | 2024-05-30 09:03:54 +00:00 |
| era1_db.nim | era: simplify, instant startup (#2218) | 2024-05-26 08:24:13 +02:00 |
| kvstore_rocksdb.nim | Bump RocksDb version and enable autoClose on opt types to prevent memory leaks (#2427) | 2024-07-02 13:44:09 +08:00 |
| kvt.nim | Core db reorg (#2444) | 2024-07-03 15:50:27 +00:00 |
| ledger.nim | Speed up evm stack (#2881) | 2024-11-30 10:07:10 +01:00 |
| opts.nim | Store keys together with node data (#2849) | 2024-11-20 09:56:27 +01:00 |
| README.md | Aristo resume off line syncing on pre loaded database (#2203) | 2024-05-22 13:41:14 +00:00 |
| storage_types.nim | Feature: Prevent loading an existing data directory for the wrong network (#2825) | 2024-11-06 09:01:42 +07:00 |
| transient_storage.nim | Bump nim-eth and nimbus-eth2 (#2741) | 2024-10-16 13:51:38 +07:00 |

README.md

Nimbus-eth1 -- Ethereum execution layer database architecture

Last update: 2024-03-08

The following diagram gives a simplified view of how components relate with regard to data storage management.

An arrow between components a and b (as in a->b) is meant to be read as "a relies directly on b", or "a is served by b". To classify the functional type of a component in the diagram below, the abstraction type is enclosed in brackets after the name of the component.

(application)
This is a group of software modules at the top level of the hierarchy. In the diagram below, the EVM is used as an example. Another application might be the RPC service.
(API)
The API classification is used for a thin software layer hiding a set of different drivers where only one driver is active for the same API instance. It serves as a sort of logical switch.
(concentrator)
The concentrator merges several sub-module instances and provides their collected services as a single unified instance. There is not much additional logic implemented besides what the sub-modules provide.
(driver)
The driver instances are the lower-layer workhorses. They implement the logic for solving a particular problem, typically providing a well-defined service.

(engine)
This is a bottom level driver in the below diagram.
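
To make the classifications above more concrete, here is a small Python sketch of how an (API), a (driver), and a (concentrator) could relate. All class names are hypothetical and chosen for illustration; they do not correspond to types in the Nimbus sources.

```python
from typing import Dict, Optional, Protocol


class Driver(Protocol):
    """What the API layer expects from any driver it switches to."""

    def get(self, key: bytes) -> Optional[bytes]: ...
    def put(self, key: bytes, value: bytes) -> None: ...


class MemoryDriver:
    """Hypothetical driver: implements the actual logic, here a dict."""

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}

    def get(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value


class Api:
    """Thin layer: one active driver per instance, calls are forwarded."""

    def __init__(self, driver: Driver) -> None:
        self._driver = driver

    def get(self, key: bytes) -> Optional[bytes]:
        return self._driver.get(key)

    def put(self, key: bytes, value: bytes) -> None:
        self._driver.put(key, value)


class Concentrator:
    """Bundles several sub-module instances behind one object."""

    def __init__(self, **parts: Api) -> None:
        self._parts = parts

    def part(self, name: str) -> Api:
        return self._parts[name]


db = Concentrator(kvt=Api(MemoryDriver()), mpt=Api(MemoryDriver()))
db.part("kvt").put(b"key", b"value")
```

The `Api` class adds no behaviour of its own, matching the "logical switch" description, and the `Concentrator` only groups its parts without adding logic.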

                       +-------------------+
                       | EVM (application) |
                       +-------------------+
                               |     |
                               v     |
   +-----------------------------+   |
   |   State DB (concentrator)   |   |
   +-----------------------------+   |
       |                       |     |
       v                       |     |
   +------------------------+  |     |
   |      Ledger (API)      |  |     |
   +------------------------+  |     |
       |              |        |     |
       v              |        |     |
   +--------------+   |        |     |
   | ledger cache |   |        |     |
   |   (driver)   |   |        |     |
   +--------------+   |        |     |
       |              v        |     |
       |   +----------------+  |     |
       |   |   Common       |  |     |
       |   | (concentrator) |  |     |
       |   +----------------+  |     |
       |             |         |     |
       v             v         v     v
   +---------------------------------------+
   |               Core DB (API)           |
   +---------------------------------------+
                     |
                     v
   +---------------------------------------+
   |    Aristo DB (driver,concentrator)    |
   +---------------------------------------+
             |             |
             v             v
   +--------------+  +---------------------+
   | Kvt (driver) |  | Aristo MPT (driver) |
   +--------------+  +---------------------+
             |             |
             v             v
   +---------------------------------------+
   |         Rocks DB (engine)             |
   +---------------------------------------+
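
The arrows in the diagram can also be written down as a dependency map. The Python sketch below transcribes them by hand (so any transcription error is this example's, not the diagram's) and checks that every component ultimately relies on the Rocks DB engine.

```python
from typing import Dict, List, Set

# "a -> b" in the diagram means a relies directly on b.
DEPENDS_ON: Dict[str, List[str]] = {
    "EVM": ["State DB", "Core DB"],
    "State DB": ["Ledger", "Core DB"],
    "Ledger": ["ledger cache", "Common"],
    "ledger cache": ["Core DB"],
    "Common": ["Core DB"],
    "Core DB": ["Aristo DB"],
    "Aristo DB": ["Kvt", "Aristo MPT"],
    "Kvt": ["Rocks DB"],
    "Aristo MPT": ["Rocks DB"],
    "Rocks DB": [],
}


def reachable(start: str) -> Set[str]:
    """Return every component that `start` relies on, directly or not."""
    seen: Set[str] = set()
    stack = list(DEPENDS_ON[start])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(DEPENDS_ON[node])
    return seen


# Every component above the engine should end up at Rocks DB.
all_reach_engine = all(
    "Rocks DB" in reachable(name) for name in DEPENDS_ON if name != "Rocks DB"
)
print(all_reach_engine)
```

In this transcription, everything above Core DB reaches the storage engine only through Core DB, which matches the role of Core DB (API) as the database abstraction layer.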

Here is a list of path references for the components with some explanation. The source paths listed are not always complete but indicate the main locations where to start looking.

Aristo DB (driver)
- Sources:
  ./nimbus/db/core_db/backend/aristo_*
- Synopsis:
  Combines both the Kvt and the Aristo driver sub-modules, providing an interface similar to the legacy DB (concentrator) module.
Aristo MPT (driver)
- Sources:
  ./nimbus/db/aristo*
- Synopsis:
  Revamped implementation of a hexary Merkle Patricia Tree.
Common (concentrator)
- Sources:
  ./nimbus/common*
- Synopsis:
  Collected information for running block chain execution layer applications.
Core DB (API)
- Sources:
  ./nimbus/db/core_db*
- Synopsis:
  Database abstraction layer. Except for legacy applications, there should be no need to reach out to the layers below.
EVM (application)
- Sources:
  ./nimbus/core/executor/* ./nimbus/evm/*
- Synopsis:
  An implementation of the Ethereum Virtual Machine.
Hexary DB (driver)
- Sources:
  ./vendor/nim-eth/eth/trie/hexary.nim
- Synopsis:
  Implementation of an MPT, see compact Merkle Patricia Tree.
Key-value table (driver)
- Sources:
  ./vendor/nim-eth/eth/trie/db.nim
- Synopsis:
  Key value table interface to be used directly for key-value storage or by the Hexary DB (driver) module for storage. Some magic is applied in order to treat hexary data accordingly (based on key length.)
Kvt (driver)
- Sources:
  ./nimbus/db/kvt*
- Synopsis:
  Key value table interface for the Aristo DB (driver) module. Contrary to the Key-value table (driver), it is not used for MPT data.
Ledger (API)
- Sources:
  ./nimbus/db/ledger*
- Synopsis:
  Abstraction layer for either the legacy cache (driver) accounts cache (which works with the legacy DB (driver) backend only) or the ledger cache (driver) re-write which is supposed to work with all Core DB (API) backends.
ledger cache (driver)
- Sources:
  ./nimbus/db/ledger/accounts_ledger.nim
  ./nimbus/db/ledger/backend/accounts_ledger*
  ./nimbus/db/ledger/distinct_ledgers.nim
- Synopsis:
  Management of accounts and storage data. This is a re-write of the legacy DB (driver) which is supposed to work with all Core DB (API) backends.
legacy DB (concentrator)
- Sources:
  ./nimbus/db/core_db/backend/legacy_*
- Synopsis:
  Legacy database abstraction. It mostly forwards requests directly to the Key-value table (driver) and/or the Hexary DB (driver).
Rocks DB (engine)
- Sources:
  ./vendor/nim-rocksdb/*
- Synopsis:
  Persistent storage engine.
State DB (concentrator)
- Sources:
  ./nimbus/evm/state.nim
  ./nimbus/evm/types.nim
- Synopsis:
  Integrated collection of modules and methods relevant for the EVM.