13 Commits

Author SHA1 Message Date
Jacek Sieka
2961905a95
aristo: fork support via layers/txframes (#2960)
* aristo: fork support via layers/txframes

This change reorganises how the database is accessed: instead of holding a
"current frame" in the database object, a dag of frames is created based
on the "base frame" held in `AristoDbRef` and all database access
happens through this frame, which can be thought of as a consistent
point-in-time snapshot of the database based on a particular fork of the
chain.

In the code, "frame", "transaction" and "layer" are used to denote more
or less the same thing: a dag of stacked changes backed by the on-disk
database.

Although this is not a requirement, in practice each frame holds the
change set of a single block - as such, the frame and its ancestors
leading up to the on-disk state represent the state of the database
after that block has been applied.

"committing" means merging the changes to its parent frame so that the
difference between them is lost and only the cumulative changes remain -
this facility enables frames to be combined arbitrarily wherever they
are in the dag.

In particular, it becomes possible to consolidate a set of changes near
the base of the dag and commit those to disk without having to re-do the
in-memory frames built on top of them - this is useful for "flattening"
a set of changes during a base update and sending those to storage
without having to perform a block replay on top.
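
As a rough illustration of the commit and flatten behaviour described above, here is a minimal Python sketch. It is not the Nim implementation: `Frame`, `get`, `commit` and `persist` are hypothetical names, deletions are not modelled, and it assumes any sibling frames forked from the same parent have been abandoned before a commit.

```python
class Frame:
    """Hypothetical stand-in for a frame: a change set stacked on a parent."""

    def __init__(self, parent=None, changes=None):
        self.parent = parent
        self.changes = dict(changes or {})

    def get(self, key, disk):
        # Walk from this frame towards the base; fall back to the on-disk state.
        frame = self
        while frame is not None:
            if key in frame.changes:
                return frame.changes[key]
            frame = frame.parent
        return disk.get(key)

    def commit(self):
        # Merge this frame's changes into its parent. Only the cumulative
        # changes remain; the frame itself stays in place, now empty, so
        # frames built on top of it keep working without a replay.
        if self.parent is None:
            raise ValueError("the base frame has no parent to commit into")
        self.parent.changes.update(self.changes)
        self.changes.clear()


def persist(base, disk):
    # Flush the base frame's accumulated changes to the (simulated) disk.
    disk.update(base.changes)
    base.changes.clear()
```

In this sketch, committing a frame near the base and then calling `persist` leaves every frame stacked above it readable as before, which is the property the base update relies on.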

Looking at abstractions, a side effect of this change is that the KVT
and Aristo are brought closer together by considering them to be part of
the "same" atomic transaction set - the way the code gets organised,
applying a block and saving it to the kvt happens in the same "logical"
frame - therefore, discarding the frame discards both the aristo and kvt
changes at the same time - likewise, they are persisted to disk together
- this makes reasoning about the database somewhat easier but has the
downside of increased memory usage, something that perhaps will need
addressing in the future.

Because the code reasons more strictly about frames and the state of the
persisted database, it also makes it more visible where ForkedChain
should be used and where it is still missing - in particular, frames
represent a single branch of history while forkedchain manages multiple
parallel forks - user-facing services such as the RPC should use the
latter, ie until it has been finalized, a getBlock request should
consider all forks and not just the blocks in the canonical head branch.

Another advantage of this approach is that `AristoDbRef` conceptually
becomes simpler - removing its tracking of the "current" transaction
stack simplifies reasoning about what can go wrong since this state now
has to be passed around in the form of `AristoTxRef` - as such, the
"stack inconsistency" conditions that many of the tests and facilities
in the code were dealing with are now structurally prevented from
happening. The test suite will need significant refactoring after this
change.

Once this change has been merged, there are several follow-ups to do:

* there's no mechanism for keeping frames up to date as they get
committed or rolled back - TODO
* naming is confused - many names for the same thing for legacy reasons
* forkedchain support is still missing in lots of code
* clean up redundant logic based on previous designs - in particular the
debug and introspection code no longer makes sense
* the way change sets are stored will probably need revisiting - because
it's a stack of changes where each frame must be interrogated to find an
on-disk value, with a base distance of 128 we'll at minimum have to
perform 128 frame lookups for *every* database interaction - regardless,
the "dag-like" nature will stay
* dispose and commit are poorly defined and perhaps redundant - in
theory, one could simply let the GC collect abandoned frames etc, though
it's likely an explicit mechanism will remain useful, so they stay for
now
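
To make the lookup cost in the change-set follow-up concrete, the following Python sketch counts how many frames are probed before a value is found. All names here (`Frame`, `lookup`, `build_chain`) are hypothetical and the storage layout is an assumption for illustration only, not the actual Aristo data structure.

```python
class Frame:
    """Hypothetical frame holding one block's changes on top of a parent."""

    def __init__(self, parent=None):
        self.parent = parent
        self.changes = {}


def lookup(head, key, disk):
    """Return (value, frames_probed) for a read issued against `head`."""
    probed = 0
    frame = head
    while frame is not None:
        probed += 1
        if key in frame.changes:
            return frame.changes[key], probed
        frame = frame.parent
    # No frame had the key: every frame was probed before reaching disk.
    return disk.get(key), probed


def build_chain(depth):
    """Stack `depth` frames, each recording one distinct key."""
    head = None
    for n in range(depth):
        head = Frame(parent=head)
        head.changes[f"block-{n}"] = n
    return head
```

With `build_chain(128)`, a key that only exists on disk is found after probing all 128 frames, while a key written in the newest frame is found after a single probe.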

More about the changes:

* `AristoDbRef` gains a `txRef` field (todo: rename) that "more or less"
corresponds to the old `balancer` field
* `AristoDbRef.stack` is gone - instead, there's a chain of
`AristoTxRef` objects that hold their respective "layer" which has the
actual changes
* No more reasoning about "top" and "stack" - instead, each
`AristoTxRef` can be a "head" that "more or less" corresponds to the old
single-history `top` notion and its stack
* `level` still represents "distance to base" - it's computed from the
parent chain instead of being stored
* one has to be careful not to use frames where forkedchain was intended
- layers are only for a single branch of history!
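
The "head" and `level` notions above can be sketched in Python as follows. `TxFrame` and `branch` are hypothetical names used only for illustration; the sketch assumes the base frame has level 0 and that each frame keeps only a reference to its parent.

```python
class TxFrame:
    """Hypothetical frame that only knows its parent."""

    def __init__(self, parent=None):
        self.parent = parent

    @property
    def level(self):
        # Distance to base, computed by walking the parent chain
        # instead of being stored on the frame.
        n = 0
        frame = self.parent
        while frame is not None:
            n += 1
            frame = frame.parent
        return n


def branch(head):
    """Frames visible from `head`, newest first: a single branch of history."""
    frames = []
    frame = head
    while frame is not None:
        frames.append(frame)
        frame = frame.parent
    return frames
```

Two heads created from the same parent share their ancestors but never see each other, which is why anything that must consider several parallel forks needs forkedchain rather than a single frame.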

* fix layer vtop after rollback

* engine fix

* Fix test_txpool

* Fix test_rpc

* Fix copyright year

* fix simulator

* Fix copyright year

* Fix copyright year

* Fix tracer

* Fix infinite recursion bug

* Remove aristo and kvt empty files

* Fix copyright year

* Fix fc chain_kvt

* ForkedChain refactoring

* Fix merge master conflict

* Fix copyright year

* Reparent txFrame

* Fix test

* Fix txFrame reparent again

* Cleanup and fix test

* UpdateBase bugfix and fix test

* Fix newPayload bug discovered by hive

* Fix engine api fcu

* Clean up call template, chain_kvt, and txguid

* Fix copyright year

* work around base block loading issue

* Add test

* Fix updateHead bug

* Fix updateBase bug

* Change func commitBase to proc commitBase

* Touch up and fix debug mode crash

---------

Co-authored-by: jangko <jangko128@gmail.com>
2025-02-06 14:04:50 +07:00
Jacek Sieka
d45d03ce0c
reduce tx naming overload (#2952)
* if it's a db function, use `txFrame...`
* if it's not a db function, don't use `txFrame...`
2024-12-18 23:03:51 +07:00
Jacek Sieka
3d58393b4c
Offload signature checking to taskpools (#2927)
In block processing, depending on the complexity of a transaction and
hotness of caches etc, signature checking can actually make up the
majority of time needed to process a transaction (60% observed in some
randomly sampled block ranges).

Fortunately, this is a task that trivially can be offloaded to a task
pool similar to how nimbus-eth2 does it.

This PR introduces taskpools in the most simple way possible, by
performing signature checking concurrently with other TX processing,
assigning a taskpool task per TX effectively.
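
The general shape of this approach can be sketched in Python with a thread pool. This is not the nim-taskpools API: `recover_sender` is a hypothetical stand-in that hashes the transaction bytes instead of doing real signature recovery, and `process_block` is an illustrative name.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def recover_sender(tx: bytes) -> str:
    # Stand-in for the expensive signature check; a real client would
    # recover the sender's public key from the transaction signature here.
    return hashlib.sha256(tx).hexdigest()[:40]


def process_block(txs, pool):
    # Submit one task per transaction up front so the checks run
    # concurrently with the sequential processing below.
    pending = [pool.submit(recover_sender, tx) for tx in txs]
    senders = []
    for tx, task in zip(txs, pending):
        # Blocks only if this transaction's check has not finished yet.
        sender = task.result()
        # Executing `tx` on behalf of `sender` would happen here.
        senders.append(sender)
    return senders
```

Transactions are still applied in order; only the signature work moves off the main thread. The pool has to be shut down explicitly (for example with a `with ThreadPoolExecutor(...) as pool:` block), which mirrors the manual cleanup concern mentioned for the tests.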

With this little trick, we're in gigagas land 🎉 on my laptop!

```
INF 2024-12-10 21:05:35.170+01:00 Imported blocks
blockNumber=3874817 b... mgps=1222.707 ...
```

Tests don't use the taskpool for now because it needs manual cleanup and
we don't have a good mechanism in place. Future PRs should address this
by creating a common shutdown sequence that also closes and cleans up
other resources like the DB.

Co-authored-by: andri lim <jangko128@gmail.com>
2024-12-13 11:53:41 +07:00
Jordan Hrycaj
90dd86be9a
Fc module can update base also when on parent arc (#2911)
* Re-org internal descriptor `CanonicalDesc` as `PivotArc`

why:
  Despite its name, `CanonicalDesc` contained a cursor arc (or leg) from
  the base tree with a designated block (or Header) on its arc members
  (aka blocks.) The type is used more generally than only for a block on
  the canonical cursor.

  Also, the `PivotArc` provides some more fields for caching intermediate
  data. This simplifies managing extra arguments for some functions.

* Remove cruft

details:
  No need to find cursor arc if it is given as function argument.

* Rename prototype variables `head: PivotArc` to `pvarc`

why:
  Better reading

* Function and code massage, adjust names

details:
  Avoid the syllable `canonical` in function names that do not strictly
  apply to the canonical chain. So renaming
  * findCanonicalHead() => findCursorArc()
  * canonicalChain() => findHeader()
  * trimCanonicalChain() => trimCursorArc()

* Combine `updateBase()` function-args into single `PivotArgs` object

why:
  Will generalise action for more complex scenarios in future.

* update `calculateNewBase()` return code type => `PivotArc`

why:
  So it can directly be used as argument into `updateBase()`

* Update `calculateNewBase()` for target on parent arc

* Update unit tests
2024-12-05 13:01:57 +07:00
Jordan Hrycaj
9da3f29dff
Add desc validator to fc unit tests (#2899)
* Kludge: fix `eip4844` import in `validate`

why:
  Importing `validate` needs `blscurve` here or with the importing module.

* Separate out `FC` descriptor into separate file

why:
  Needed for external descriptor access (e.g. for debugging)

* Debugging toolkit for `FC`

* Verify chain descriptor after changing state
2024-12-02 17:49:53 +00:00
Advaita Saha
ac2f3a4358
serve state in rpc (#2824)
* simpler state replay logic

* add tests
2024-11-22 16:45:52 +05:30
andri lim
6b86acfb8d
Cleanup db/core_apps error handling (#2838)
* Cleanup db/core_apps error handling

* Fix persistHeader

* Fix getUncles
2024-11-07 08:24:21 +07:00
Jacek Sieka
d828dead2d
Use stateRoot/storageRoot more consistently (#2791)
* prefer the spec-derived name where possible
* don't pass stateRoot to LedgerRef and friends (it doesn't do anything)
* add deprecation warning in graphql - it needs updating to use
forkedchain instead
2024-10-27 19:56:28 +01:00
Chirag Parmar
2838191c4f
replace deprecated types (#2704)
* partial commit

* fixes

* remove converters too

* revert changes on nimbus_verified_proxy

* revert changes in converter

* revert changes (re-export) in rpc_types

* update copyright year

* replace types in other binaries

* chain config bug

* fix rebase conflict incomplete buffer

* fix more rebase buffers

* remove ditto types and converters

* fix the tests

* update copyright year
2024-10-16 08:34:12 +07:00
Jacek Sieka
c210885b73
eth: bump to new types (#2660)
This is a minimal set of changes to make things work with the new types
in nim-eth - it merely resolves incompatibilities, while the full change
set would include more cleanup and migration.
2024-09-29 14:37:09 +02:00
Jacek Sieka
3cefd7ed38
move db init to init (#2552)
When using the common interface, the database always (potentially) needs
init - take the opportunity to log some basic database info on startup.
2024-08-08 07:45:30 +02:00
Jordan Hrycaj
800fd77333
Core db remove legacy phrases (#2468)
* Rename `newKvt()` -> `ctx.getKvt()`

why:
  Clean up legacy shortcut. Also, the `KVT` returned is not instantiated
  but refers to the shared `KVT` that resides in a context which is a
  generalisation of an in-memory database fork. The function `ctx`
  retrieves the default context.

* Rename `newTransaction()` -> `ctx.newTransaction()`

why:
  Clean up legacy shortcut. The transaction is applied to a context as a
  generalisation of an in-memory database fork. The function `ctx`
  retrieves the default context.

* Rename `getColumn(CtGeneric)` -> `getGeneric()`

why:
  A list of well-known sub-tries is no longer needed, a single one is
  enough. In fact, `getColumn()` only supported a single sub-tree so far.

* Reduce TODO list
2024-07-10 12:19:35 +00:00
andri lim
401537ad38
Add ForkedChainRef tests (#2430)
ForkedChainRef has become quite complex.
test_blockchain_json does not sufficiently cover edge cases
or synthetic cases.
2024-06-30 14:40:14 +07:00