The `BlockHeader` structure in `nim-eth` was updated with support for
EIP-4844 (danksharding). To enable the `nim-eth` bump, the ingress of
`BlockHeader` structures has been hardened to reject headers that have
the new `excessDataGas` field until proper EIP4844 support exists.
https://github.com/status-im/nim-eth/pull/570
* Rename and update dismantle => hexaryEnvelopeDecompose()
why:
+ As for naming, a positive connotation is prefered
+ The unit tests were really insufficient
+ The function result was wrong on a few boundry conditions
detail:
+ Extracted the function from `hexary_paths.nim` and re-implemented
it together with other envelope functions => `hexary_envelope.nim`
+ Re-wrote docu for `hexaryEnvelopeDecompose()`
* Relaxed right condition for `hexaryEnvelopeDecompose()` range argument
why;
Previously, the right point of the argument interval had to be a path
to an allocated leaf node. While this is typically a given for accounts,
it is easier to require an arbitrary range of paths (or keys) with
the requirement of a `boundary proof` for left and right (i.e. enough
nodes in the database to find the end points.)
also:
Bug fixes for related functions (typos, missing conditions etc.)
* Add missing unit tests include file
* Add quick hexary trie inspector, called `dismantle()`
why:
+ Full hexary trie perusal is slow if running down leaf nodes
+ For known range of leaf nodes, work out the UInt126-complement of
partial sub-trie paths (for existing nodes). The result should cover
no (or only a few) sub-tries with leaf nodes.
* Extract common healing methods => `sub_tries_helper.nim`
details:
Also apply quick hexary trie inspection tool `dismantle()`
Replace `inspectAccountsTrie()` wrapper by `hexaryInspectTrie()`
* Re-arrange task dispatching in main peer worker
* Refactor accounts and storage slots downloaders
* Rename `HexaryDbError` => `HexaryError`
The `BlockHeader` structure in `nim-eth` was updated with support for
EIP-4895 (withdrawals). To enable the `nim-eth` bump, the ingress of
`BlockHeader` structures has been hardened to reject headers that have
the new `withdrawalsRoot` field until proper withdrawals support exists.
https://github.com/status-im/nim-eth/pull/562
* Stop negotiating pivot if peer repeatedly replies w/usesless answers
why:
There is some fringe condition where a peer replies with legit but
useless empty headers repetely. This goes on until somebody stops.
We stop now.
* Rename `missingNodes` => `sickSubTries`
why:
These (probably missing) nodes represent in reality fully or partially
missing sub-tries. The top nodes may even exist, e.g. as a shallow
sub-trie.
also:
Keep track of account healing on/of by bool variable `accountsHealing`
controlled in `pivot_helper.execSnapSyncAction()`
* Add `nimbus` option argument `snapCtx` for starting snap recovery (if any)
also:
+ Trigger the recovery (or similar) process from inside the global peer
worker initialisation `worker.setup()` and not by the `snap.start()`
function.
+ Have `runPool()` returned a `bool` code to indicate early stop to
scheduler.
* Can import partial snap sync checkpoint at start
details:
+ Modified what is stored with the checkpoint in `snapdb_pivot.nim`
+ Will be loaded within `runDaemon()` if activated
* Forgot to import total coverage range
why:
Only the top (or latest) pivot needs coverage but the total coverage
is the list of all ranges for all pivots -- simply forgotten.
* Piecemeal trie inspection
details:
Trie inspection will stop after maximum number of nodes visited.
The inspection can be resumed using the returned state from the
last session.
why:
This feature allows for task switch between `piecemeal` sessions.
* Extract pivot helper code from `worker.nim` => `pivot_helper.nim`
* Accounts import will now return dangling paths from `proof` nodes
why:
With proper bookkeeping, this can be used to start healing without
analysing the the probably full trie.
* Update `unprocessed` account range handling
why:
More generally, the API of a pairs of unprocessed intervals favours
the first set and not before that is exhausted the second set comes
into play.
This was unfortunately implemented which caused the ranges to be
unnecessarily fractioned. Now the number of range interval typically
remains in the lower single digit numbers.
* Save sync state after end of downloading some accounts
details:
restore/resume to be implemented later
* Update log ticker, using time interval rather than ticker count
why:
Counting and logging ticker occurrences is inherently imprecise. So
time intervals are used.
* Use separate storage tables for snap sync data
* Left boundary proof update
why:
Was not properly implemented, yet.
* Capture pivot in peer worker (aka buddy) tasks
why:
The pivot environment is linked to the `buddy` descriptor. While
there is a task switch, the pivot may change. So it is passed on as
function argument `env` rather than retrieved from the buddy at
the start of a sub-function.
* Split queues `fetchStorage` into `fetchStorageFull` and `fetchStoragePart`
* Remove obsolete account range returned from `GetAccountRange` message
why:
Handler returned the wrong right value of the range. This range was
for convenience, only.
* Prioritise storage slots if the queue becomes large
why:
Currently, accounts processing is prioritised up until all accounts
are downloaded. The new prioritisation has two thresholds for
+ start processing storage slots with a new worker
+ stop account processing and switch to storage processing
also:
Provide api for `SnapTodoRanges` pair of range sets in `worker_desc.nim`
* Generalise left boundary proof for accounts or storage slots.
why:
Detailed explanation how this works is documented with
`snapdb_accounts.importAccounts()`.
Instead of enforcing a left boundary proof (which is still the default),
the importer functions return a list of `holes` (aka node paths) found in
the argument ranges of leaf nodes. This in turn is used by the book
keeping software for data download.
* Forgot to pass on variable in function wrapper
also:
+ Start healing not before 99% accounts covered (previously 95%)
+ Logging updated/prettified
* Added basic async capabilities for vm2.
This is a whole new Git branch, not the same one as last time
(https://github.com/status-im/nimbus-eth1/pull/1250) - there wasn't
much worth salvaging. Main differences:
I didn't do the "each opcode has to specify an async handler" junk
that I put in last time. Instead, in oph_memory.nim you can see
sloadOp calling asyncChainTo and passing in an async operation.
That async operation is then run by the execCallOrCreate (or
asyncExecCallOrCreate) code in interpreter_dispatch.nim.
In the test code, the (previously existing) macro called "assembler"
now allows you to add a section called "initialStorage", specifying
fake data to be used by the EVM computation run by that test. (In
the long run we'll obviously want to write tests that for-real use
the JSON-RPC API to asynchronously fetch data; for now, this was
just an expedient way to write a basic unit test that exercises the
async-EVM code pathway.)
There's also a new macro called "concurrentAssemblers" that allows
you to write a test that runs multiple assemblers concurrently (and
then waits for them all to finish). There's one example test using
this, in test_op_memory_lazy.nim, though you can't actually see it
doing so unless you uncomment some echo statements in
async_operations.nim (in which case you can see the two concurrently
running EVM computations each printing out what they're doing, and
you'll see that they interleave).
A question: is it possible to make EVMC work asynchronously? (For
now, this code compiles and "make test" passes even if ENABLE_EVMC
is turned on, but it doesn't actually work asynchronously, it just
falls back on doing the usual synchronous EVMC thing. See
FIXME-asyncAndEvmc.)
* Moved the AsyncOperationFactory to the BaseVMState object.
* Made the AsyncOperationFactory into a table of fn pointers.
Also ditched the plain-data Vm2AsyncOperation type; it wasn't
really serving much purpose. Instead, the pendingAsyncOperation
field directly contains the Future.
* Removed the hasStorage idea.
It's not the right solution to the "how do we know whether we
still need to fetch the storage value or not?" problem. I
haven't implemented the right solution yet, but at least
we're better off not putting in a wrong one.
* Added/modified/removed some comments.
(Based on feedback on the PR.)
* Removed the waitFor from execCallOrCreate.
There was some back-and-forth in the PR regarding whether nested
waitFor calls are acceptable:
https://github.com/status-im/nimbus-eth1/pull/1260#discussion_r998587449
The eventual decision was to just change the waitFor to a doAssert
(since we probably won't want this extra functionality when running
synchronously anyway) to make sure that the Future is already
finished.
* Re-arrange fetching storage slots in batch module
why;
Previously, fetching partial slot ranges first has a chance of
terminating the worker peer 9due to network error) while there were
many inheritable storage slots on the queue.
Now, inheritance is checked first, then full slot ranges and finally
partial ranges.
* Update logging
* Bundled node information for healing into single object `NodeSpecs`
why:
Previously, partial paths and node keys were kept in separate variables.
This approach was error prone due to copying/reassembling function
argument objects.
As all partial paths, keys, and node data types are more or less handled
as `Blob`s over the network (using Eth/6x, or Snap/1) it makes sense to
hold these `Blob`s as named field in a single object (even if not all
fields are active for the current purpose.)
* For good housekeeping, using `NodeKey` type only for account keys
why:
previously, a mixture of `NodeKey` and `Hash256` was used. Now, only
state or storage root keys use the `Hash256` type.
* Always accept latest pivot (and not a slightly older one)
why;
For testing it was tried to use a slightly older pivot state root than
available. Some anecdotal tests seemed to suggest an advantage so that
more peers are willing to serve on that older pivot. But this could not
be confirmed in subsequent tests (still anecdotal, though.)
As a side note, the distance of the latest pivot to its predecessor is
at least 128 (or whatever the constant `minPivotBlockDistance` is
assigned to.)
* Reshuffle name components for some file and function names
why:
Clarifies purpose:
"storages" becomes: "storage slots"
"store" becomes: "range fetch"
* Stash away currently unused modules in sub-folder named "notused"
* Re-model persistent database access
why:
Storage slots healing just run on the wrong sub-trie (i.e. the wrong
key mapping). So get/put and bulk functions now use the definitions
in `snapdb_desc` (earlier there were some shortcuts for `get()`.)
* Fixes: missing return code, typo, redundant imports etc.
* Remove obsolete debugging directives from `worker_desc` module
* Correct failing unit tests for storage slots trie inspection
why:
Some pathological cases for the extended tests do not produce any
hexary trie data. This is rightly detected by the trie inspection
and the result checks needed to adjusted.
* For snap sync, publish `EthWireRef` in sync descriptor
why:
currently used for noise control
* Detect and reuse existing storage slots
* Provide healing module for storage slots
* Update statistic ticker (adding range factor for unprocessed storage)
* Complete mere function for work item ranges
why:
Merging interval into existing partial item was missing
* Show av storage queue lengths in ticker
detail;
Previous attempt shows average completeness which did not tell much
* Correct the meaning of the storage counter (per pivot)
detail:
Is the # accounts that have a storage saved
* Rename `LeafRange` => `NodeTagRange`
* Replacing storage slot partition point by interval
why:
The partition point only allows to describe slots `[point,high(Uint256)]`
for fetching interval slot ranges. This has been generalised for any
interval.
* Replacing `SnapAccountRanges` by `SnapTrieRangeBatch`
why:
Generalised healing status for accounts, and later for storage slots.
* Improve accounts healing loop
* Split `snap_db` into accounts and storage modules
why:
It is cleaner to have separate session descriptors for accounts and
storage slots (based on a common base descriptor.)
Also, persistent storage handling might be changed in future which
requires the storage slot implementation disentangled from the accounts
handling.
* Re-model worker queues for storage slots
why:
There is a dynamic list of storage sub-tries, each one has to be
treated similar to the accounts database. This applied to slot
interval downloads as well as to healing
* Compress some return value report lists for snapdb methods
why:
No need to report all handling details for work items that are filteres
out and discarded, anyway.
* Remove inner loop frame from healing function
why:
The healing function runs as a loop body already.
* Split fetch accounts into sub-modules
details:
There will be separated modules for accounts snapshot, storage snapshot,
and healing for either.
* Allow to rebase pivot before negotiated header
why:
Peers seem to have not too many snapshots available. By setting back the
pivot block header slightly, the chances might be higher to find more
peers to serve this pivot. Experiment on mainnet showed that setting back
too much (tested with 1024), the chances to find matching snapshot peers
seem to decrease.
* Add accounts healing
* Update variable/field naming in `worker_desc` for readability
* Handle leaf nodes in accounts healing
why:
There is no need to fetch accounts when they had been added by the
healing process. On the flip side, these accounts must be checked for
storage data and the batch queue updated, accordingly.
* Reorganising accounts hash ranges batch queue
why:
The aim is to formally cover as many accounts as possible for different
pivot state root environments. Formerly, this was tried by starting the
accounts batch queue at a random value for each pivot (and wrapping
around.)
Now, each pivot environment starts with an interval set mutually
disjunct from any interval set retrieved with other pivot state roots.
also:
Stop fishing for more pivots in `worker` if 100% download is reached
* Reorganise/update accounts healing
why:
Error handling was wrong and the (math. complexity of) whole process
could be better managed.
details:
Much of the algorithm is now documented at the top of the file
`heal_accounts.nim`
* Added inspect module
why:
Find dangling references for trie healing support.
details:
+ This patch set provides only the inspect module and some unit tests.
+ There are also extensive unit tests which need bulk data from the
`nimbus-eth1-blob` module.
* Alternative pivot finder
why:
Attempt to be faster on start up. Also tying to decouple pivot finder
somehow by providing different mechanisms (this one runs in `single`
mode.)
* Use inspect module for healing
details:
+ After some progress with account and storage data, the inspect facility
is used to find dangling links in the database to be filled nose-wise.
+ This is a crude attempt to cobble together functional elements. The
set up needs to be honed.
* fix scheduler to avoid starting dead peers
why:
Some peers drop out while in `sleepAsync()`. So extra `if` clauses
make sure that this event is detected early.
* Bug fixes causing crashes
details:
+ prettify.toPC():
int/intToStr() numeric range over/underflow
+ hexary_inspect.hexaryInspectPath():
take care of half initialised step with branch but missing index into
branch array
* improve handling of dropped peers in alternaive pivot finder
why:
Strange things may happen while querying data from the network.
Additional checks make sure that the state of other peers is updated
immediately.
* Update trace messages
* reorganise snap fetch & store schedule
* Re-implemented `hexaryFollow()` in a more general fashion
details:
+ New name for re-implemented `hexaryFollow()` is `hexaryPath()`
+ Renamed `rTreeFollow()` as `hexaryPath()`
why:
Returning similarly organised structures, the results of the
`hexaryPath()` functions become comparable when running over
the persistent and the in-memory databases.
* Added traversal functionality for persistent ChainDB
* Using `Account` values as re-packed Blob
* Repack samples as compressed data files
* Produce test data
details:
+ Can force pivot state root switch after minimal coverage.
+ For emulating certain network behaviour, downloading accounts stops for
a particular pivot state root if 30% (some static number) coverage is
reached. Following accounts are downloaded for a later pivot state root.
* Bump nim-stew
why:
Need fixed interval set
* Keep track of accumulated account ranges over all state roots
* Added comments and explanations to unit tests
* typo
* Extracted functionality into sub-modules for maintainability
* Setting SST bulk load as default in `accounts_db`
details:
+ currently, the same data are stored via rocksdb if available, and
the same via embedded `storage_type` with (non-standard) prefix 200
for time comparisons
+ fallback to normal `put()` unless rocksdb is accessible
* Provided common scheduler API, applied to `full` sync
* Use hexary trie as storage for proofs_db records
also:
+ Store metadata with account for keeping track of account state
+ add iterator over accounts
* Common scheduler API applied to `snap` sync
* Prepare for accounts bulk import
details:
+ Added some ad-hoc checks for proving accounts data received from the
snap/1 (will be replaced by proper database version when ready)
+ Added code that dumps some of the received snap/1 data into a file
(turned of by default, see `worker_desc.nim`)
* Added sepolia specs
* temporarily avoid latest `master` branch from `nim-eth1`
why:
Currently does not cleanly compile after the `bearssl` split api update.
* Relocated `IntervalSets` to nim-stew repo
* Accumulate accounts on temporary kv-DB
why:
Explore the data as returned from snap/1. Will be converted to a
`eth/db` next.
details:
Verify and accumulate per/state-root accounts downloaded via snap.
also:
Some unit tests
* Replace `Table` by `TrieDatabaseRef` for accounts accumulator
* update ticker statistics
details:
mean/variance based counter update
* allow persistent db for proved accounts
* rebase, and globally activate unit test
* fix statistics
* Using `IntervalSet` type data for `LeafRange`
* Updated log ticker
* Update to `eth67`
details:
Disabled by default, use `ENABLE_LEGACY_ETH66=0` to enable
No support for `Get/NodeData` dialogue via eth, anymore
* Dissolved fetch/common.nim
details;
the log/ticker part becomes ticker.nim
the interval range management is merged into fetch.nim
* Updated account scheduler
why:
The previous scheduler fetched each account once (for different state
roots.) The updated scheduler re-calibrates after a change of the state
root and potentially (until told otherwise) fetches all possible
accounts.
* Fix `high(P)` fringe cases in `IntervalSet` handling
why:
The `high(P)` value for a point type `P` cannot be represented with
half open intervals `[a,b)` for a,b points of `P`. So this single value
needs extra treatment which was slightly wrong.
* Updated docu/comments
also:
rebased
* Update scheduler
details:
Change the `pivot` management when creating new accounts lists. It is
strictly increasing (and wrapping around) depending on last updated
accounts list.
* Use type name eth and snap (rather than snap1)
* Prettified snap/eth handler trace messages
* Regrouped sync sources
details:
Snap storage related sources are moved to common directory.
Option --new-sync renamed to --snap-sync
also:
Normalised logging for secondary/non-protocol handlers.
* Merge protocol wrapper files => protocol.nim
details:
Merge wrapper sync/protocol_ethxx.nim and sync/protocol_snapxx.nim
into single file snap/protocol.nim
* Comments cosmetics
* Similar start logic for blockchain_sync.nim and sync/snap.nim
* Renamed p2p/blockchain_sync.nim -> sync/fast.nim
* Update exception tracinig
* Use lazy JSON parser
why:
Uint246 type is not directly supported by the JSON serializer
* Json parser update for stringified UInt256 integers
why:
Now available
* update sub-module branch reference
* Prepare unit tests for running without tx-pool job queue
why:
Most of the job queue logic can be emulated. This adapts to a few
pathological test cases.
* Replace tx-pool job queue logic with in-place actions
why:
This additional execution layer is not needed, anymore which has been
learned from working with the integration/hive tests.
details:
Execution of add or deletion jobs are executed in-place. Some actions
-- as in smartHead() -- have been combined for facilitating the correct
order actions
* Update production functions, remove txpool legacy stuff
* Enable JWT authentication for websockets
details:
Currently, this is optional and only enabled when the jwtsecret option
is set.
There is a default mechanism to generate a JWT secret if it is not
explicitly stated. This mechanism is currently unused.
* Make JWT authentication compulsory for websockets
* Fix unit test entry point + cosmetics
* Update JSON-RPC link
* Improvements as suggested by Mamy
why:
Causes havoc in most bit fringe cases.
details:
When setting the head forward, the delta was wrongly registered from
the static "left" end (which limits the loop) rather than the moving
"right" end.
* Fix database sort order for local txs
why:
For convenience, packed txs were stored in the block sorted by
rank->nonce. Using local accounts, the greedy grabber uses the sort
order (local,non-local)->rank->nonce which leads to a wrong calculation
of the txRoot.
* Housekeeping
details:
Replaced a couple of local eip1559TxNormalization() functions by a
single public
* Support for local accounts
why:
Accounts tagged local will be packed with priority over untagged
accounts
* Added functions for queuing txs and simultaneously setting account locality
why:
Might be a popular task, in particular for unconditionally adding txs to
a local (aka prioritised account) via "xp.addLocal(tx,true)"
caveat:
Untested yet
* fix typo
* backup
* No baseFee for pre-London tx in verifier
why:
The packer would wrongly discard valid legacy txs.
* De-noisify some Clique logging
why:
Too annoying when syncing against Goerly
* Replay devnet# and kiln sessions
why:
Compiled as local program, the unit test was used for TDD.
* dist: precompiled binaries and Docker images
The builds are reproducible, the binaries are portable and statically link librocksdb.
This took some patching. Upstream PR: https://github.com/facebook/rocksdb/pull/9752
32-bit ARM is missing as a target because two different GCC versions
fail with an ICE when trying to cross-compile RocksDB. Using Clang
instead is too much trouble for a platform that nobody should be using
anyway.
(Clang doesn't come with its own target headers and libraries, can't be
easily convinced to use the ones from GCC, so it needs an fs image from
a 32-bit ARM distro - at which point I stopped caring).
* CI: disable reproducibility test
* Activate wire protocol eth/66
and:
Disentangle protocol_eth66.nim from import sections
why:
Importing the protocol_eth66 module is not necessary. There is
no need to know too many details of the underlying wire protocol. All
that is needed will be exported by blockchain_sync.nim.
* fixes, and rebase
* Update nimbus/p2p/blockchain_sync.nim
Co-authored-by: Kim De Mey <kim.demey@gmail.com>
* Fixes and rebase
Co-authored-by: Kim De Mey <kim.demey@gmail.com>
why:
TDD data and test script that are not needed for CI are externally held.
This saves space.
also:
Added support for test-custom_networks.nim to run import Devnet5 dump.
* Rearrange/rename test_kintsugu => test_custom_network
why:
Debug, fix and test more general problems related to running
nimbus on a custom network.
* Update UInt265/Json parser for --custom-network command line option
why:
As found out with the Kintsugi configuration, block number and balance
have the same Nim type which led to misunderstandings. This patch makes
sure that UInt265 encoded string values "0x11" decodes to 17, and "b"
and "11" to 11.
* Refactored genesis.toBlock() => genesis.toBlockHeader()
why:
The function toBlock(g,db) may return different results depending on
whether the db descriptor argument is nil, or initialised. This is due
to the db.config data sub-descriptor which may give various outcomes
for the baseFee field of the genesis header.
Also, the version where db is non-nil initialised is used internally
only. So the public rewrite toBlockHeader() that replaces the toBlock()
function expects a full set of NetworkParams.
* update comments
* Rename toBlockHeader() => toGenesisHeader()
why:
Polymorphic prototype used for BaseChainDB or NetworkParams argument.
With a BaseChainDB descriptor argument, the name shall imply that the
header is generated from the config fields rather than fetched from
the database.
* Added command line option --static-peers-file
why:
Handy feature to keep peer nodes in a file, similar to the
--bootstrap-file option.
* Kludge needed for setting up custom network
why:
Some non-features in the persistent hexary trie DB produce an assert
error when initiating the Kinsugi network.
details:
This fix should be temporary, only.
* Fix OS detection
why:
directive detectOs() bails out on Windows if checking for Ubuntu
* test environment for studying crash of hexary trie
why:
the persistent test case will crash unless in genesis.toBlock():
+ pruneTrie is set false, or
+ the directive "tdb.put(emptyRlpHash.data,emptyRlp)" is added right
before the "for k, v in account.storage:" loop
* different tests for OS variants
Includes a simple test harness for the merge interop M1 milestone
This aims to enable connecting nimbus-eth2 to nimbus-eth1 within
the testing protocol described here:
https://github.com/status-im/nimbus-eth2/blob/amphora-merge-interop/docs/interop_merge.md
To execute the work-in-progress test, please run:
In terminal 1:
tests/amphora/launch-nimbus.sh
In terminal 2:
tests/amphora/check-merge-test-vectors.sh
details:
For documentation, see comments in the file tx_pool.nim.
For prettified manual pages run 'make docs' in the nimbus directory and
point your web browser to the newly created 'docs' directory.
why:
Previous version was based on lru_cache which is ugly. This module is
based on the stew/keyed_queue library module.
other:
There are still some other modules rely on lru_cache which should be
removed.
* Redesign of BaseVMState descriptor
why:
BaseVMState provides an environment for executing transactions. The
current descriptor also provides data that cannot generally be known
within the execution environment, e.g. the total gasUsed which is
available not before after all transactions have finished.
Also, the BaseVMState constructor has been replaced by a constructor
that does not need pre-initialised input of the account database.
also:
Previous constructor and some fields are provided with a deprecated
annotation (producing a lot of noise.)
* Replace legacy directives in production sources
* Replace legacy directives in unit test sources
* fix CI (missing premix update)
* Remove legacy directives
* chase CI problem
* rebased
* Re-introduce 'AccountsCache' constructor optimisation for 'BaseVmState' re-initialisation
why:
Constructing a new 'AccountsCache' descriptor can be avoided sometimes
when the current state root is properly positioned already. Such a
feature existed already as the update function 'initStateDB()' for the
'BaseChanDB' where the accounts cache was linked into this desctiptor.
The function 'initStateDB()' was removed and re-implemented into the
'BaseVmState' constructor without optimisation. The old version was of
restricted use as a wrong accounts cache state would unconditionally
throw an exception rather than conceptually ask for a remedy.
The optimised 'BaseVmState' re-initialisation has been implemented for
the 'persistBlocks()' function.
also:
moved some test helpers to 'test/replay' folder
* Remove unused & undocumented fields from Chain descriptor
why:
Reduces attack surface in general & improves reading the code.
details:
1. The check for cumulativeGasUsed + tx.gasLimit <= header.gasLimit
makes neither sense nor is it part of the Eip1559 specs. Nevertheless
a check tx.gasLimit <= header.gasLimit is added to satisfy some
unit test (see comments in validateTransaction() body.)
2. As a replacement check for the one removed in 1, a check for
cumulativeGasUsed + gasBurned <= header.gasLimit has been added
(see comments in processTransactionImpl() body.)
3. Prototypes for processTransaction() variants have been cleaned up and
commented.
why:
Detail 1. in particular produces an error for tightly packed blocks when
the last tx in the list has a generous gasLimit.
* crash test scenario
details:
Example code for inspecting nested block chain and accounts cache
database transaction framework. There seems to be a pathological
case where the system crashes after a rollback (as appeared in the
tx-pool packer code.)
* simplified crash scenario
* Workable solution (as suggested by Andri)
details:
Avoiding db.rollback() (db.commit() is OK) while vmState.stateDB is
alive.
* Rename text_txcrash => test_accounts_cache
why:
Unit tests covers part of accounts_cache handling
* comment update
Add the new [Arrow Glacier fork](https://eips.ethereum.org/EIPS/eip-4345).
Only the difficulty calculation is changed, but as a new fork it still affects
a number of places in the code.
To the best of my knowledge the change is only scheduled on Mainnet.
In addition:
- The fork date comments in `chain_config.nim` have been checked against the
real networks, set consistently in UTC instead of random timezones, and made
neater. Maybe we'll keep these when transferring config to a file someday.
- It's added to forkid hash tests (EIP-2124/EIP-2364), of course.
Signed-off-by: Jamie Lokier <jamie@shareable.org>
* PoW wrapper for verification & mining
why:
It eases data management of per-Epoch lookup tables. Also some unit
tests show limits of usefulness on non-specialised machines for
mining besides developing tests.
details:
For PoW verification, this patch provides a pretty wrapper hiding the
details of the ethash/Hashimoto lookup cache management.
For mining on my development system without special hardware, the
underlying ethash functions are prohibitively slow. It takes
* ~20 minutes to prepare the full ethash/Hashimoto lookup dataset
* a second to run ~25k nonce tests (in the mining loop)
The mining part might be of some use for generating test data for
the tx-pool, though.
* Using PowRef as replacement for EpochHashCache + hashimotoLight()
* Fix typo (CI failed)
why:
was below log level when testing locally
* fix canonical naming
previously, every time the VMState was created, it will also create
new stateDB, and this action will nullify the advantages of cached accounts.
the new changes will conserve the accounts cache if the executed blocks
are contiguous. if not the stateDB need to be reinited.
this changes also allow rpcCallEvm and rpcEstimateGas executed properly
using current stateDB instead of creating new one each time they are called.
EIP-2718:
- chainID: Long! of Query
- chainID: Long of Transaction
EIP-1559:
- baseFeePerGas: BigInt of Block
- effectiveGasPrice: BigInt of Transaction
- maxFeePerGas: BigInt of Transaction
- maxPriorityFeePerGas: BigInt of Transaction
this is a preparation for migration to confutils based config
although there is still some getConfiguration usage in tests code
it will be removed after new config arrived
although they are technically different, but in reality,
many networks are using the same id for ChainId dan NetworkId.
in this commit, we set networkid from config file's chainId.
- allow clique period and epoch to be configured via config file
- this also activate poaEngine mode
- remove clique period configuration from cli to reduce confusion
- fix#786
Add these to the list of specially disabled tests during `make test`,
as they are exceptionally slow.
Signed-off-by: Jamie Lokier <jamie@shareable.org>
* Provide PoA voting header generator
why:
Handy for hive/smoke test
details:
Header generator is a re-implementation of the generator previously
used for the canonical reference tests.
* try fixing ci out-of-mem condition
why:
for some reason, the ci began behaving like a real win7/i386 machine
where gcc is limited to 64k optimiser space
* fix comments, typos ..
* Provide API
details:
API is bundled via clique.nim.
* Set extraValidation as default for PoA chains
why:
This triggers consensus verification and an update of the list
of authorised signers. These signers are integral part of the
PoA block chain.
todo:
Option argument to control validation for the nimbus binary.
* Fix snapshot state block number
why:
Using sub-sequence here, so the len() function was wrong.
* Optional start where block verification begins
why:
Can speed up time building loading initial parts of block chain. For
PoA, this allows to prove & test that authorised signers can be
(correctly) calculated starting at any point on the block chain.
todo:
On Goerli around blocks #193537..#197568, processing time increases
disproportionally -- needs to be understand
* For Clique test, get old grouping back (7 transactions per log entry)
why:
Forgot to change back after troubleshooting
* Fix field/function/module-name misunderstanding
why:
Make compilation work
* Use eth_types.blockHash() rather than utils.hash() in Clique modules
why:
Prefer lib module
* Dissolve snapshot_misc.nim
details:
.. into clique_verify.nim (the other source file clique_unused.nim
is inactive)
* Hide unused AsyncLock in Clique descriptor
details:
Unused here but was part of the Go reference implementation
* Remove fakeDiff flag from Clique descriptor
details:
This flag was a kludge in the Go reference implementation used for the
canonical tests. The tests have been adapted so there is no need for
the fakeDiff flag and its implementation.
* Not observing minimum distance from epoch sync point
why:
For compiling PoA state, the go implementation will walk back to the
epoch header with at least 90000 blocks apart from the current header
in the absence of other synchronisation points.
Here just the nearest epoch header is used. The assumption is that all
the checkpoints before have been vetted already regardless of the
current branch.
details:
The behaviour of using the nearest vs the minimum distance epoch is
controlled by a flag and can be changed at run time.
* Analysing processing time (patch adds some debugging/visualisation support)
why:
At the first half million blocks of the Goerli replay, blocks on the
interval #194854..#196224 take exceptionally long to process, but not
due to PoA processing.
details:
It turns out that much time is spent in p2p/excecutor.processBlock()
where the elapsed transaction execution time is significantly greater
for many of these blocks.
Between the 1371 blocks #194854..#196224 there are 223 blocks with more
than 1/2 seconds execution time whereas there are only 4 such blocks
before and 13 such after this range up to #504192.
* fix debugging symbol in clique_desc (causes CI failing)
* Fixing canonical reference tests
why:
Two errors were introduced earlier but ovelooked:
1. "Remove fakeDiff flag .." patch was incomplete
2. "Not observing minimum distance .." introduced problem w/tests 23/24
details:
Fixing 2. needed to revert the behaviour by setting the
applySnapsMinBacklog flag for the Clique descriptor. Also a new
test was added to lock the new behaviour.
* Remove cruft
why:
Clique/PoA processing was intended to take place somewhere in
executor/process_block.processBlock() but was decided later to run
from chain/persist_block.persistBlock() instead.
* Update API comment
* ditto
Use a hard-coded number instead of the same expression as the client, so that
bugs introduced via that expression are detected. Using the same expression as
the client can hide issues when the value is wrong in both places.
When the expected value genuinely changes, it'll be obvious.
Just change this number.
Signed-off-by: Jamie Lokier <jamie@shareable.org>
Use a hard-coded number instead of the same expression as the client, so that
bugs introduced via that expression are detected. Using the same expression as
the client can hide issues when the value is wrong in both places.
When the expected value genuinely changes, it'll be obvious.
Just change this number.
Signed-off-by: Jamie Lokier <jamie@shareable.org>
* Renamed source file clique_utils => clique_helpers
why:
New name is more in line with other modules where local libraries
are named similarly.
* re-implemented PoA verification module as clique_verify.nim
details:
The verification code was ported from the go sources and provisionally
stored in the clique_misc.nim source file.
todo:
Bring it to life.
* re-design Snapshot descriptor as: ref object
why:
Avoids some copying descriptor objects
details:
The snapshot management in clique_snapshot.nim has been cleaned up.
todo:
There is a lot of unnecessary copying & sub-list manipulation of
seq[BlockHeader] lists which needs to be simplified by managing
index intervals.
* optimised sequence handling for Clique/PoA
why:
To much ado about nothing
details:
* Working with shallow sequences inside PoA processing avoids
unnecessary copying.
* Using degenerate lists in the cliqueVerify() batch where only the
parent (and no other ancestor) is needed.
todo:
Expose only functions that are needed, shallow sequences should be
handles with care.
* fix var-parameter function argument
* Activate PoA engine -- currently proof of concept
details:
PoA engine is activated with newChain(extraValidation = true) applied
to a PoA network.
status and todo:
The extraValidation flag on the Chain object can be set at a later
state which allows to pre-load parts of the block chain without
verification. Setting it later will only go back the block chain to
the latest epoch checkpoint. This is inherent to the Clique protocol,
needs testing though.
PoA engine works in fine weather mode on Goerli replay. With the
canonical eip-225 tests, there are quite a few fringe conditions
that fail. These can easily fudged over to make things work but need
some more work to understand and correct properly.
* Make the last offending verification header available
why:
Makes some fringe case tests work.
details:
Within a failed transaction comprising several blocks, this
feature help to identify the offending block if there was a
PoA verification error.
* Make PoA header verifier store the final snapshot
why:
The last snapshot needed by the verifier is the one of the parent but
the list of authorised signer is derived from the current snapshot. So
updating to the latest snapshot provides the latest signers list.
details:
Also, PoA processing has been implemented as transaction in
persistBlocks() with Clique state rollback.
Clique tests succeed now.
* Avoiding double yields in iterator => replaced by template
why:
Tanks to Andri who observed it (see #762)
* Calibrate logging interval and fix logging event detection
why:
Logging interval as copied from Go implementation was too large and
needed re-calibration. Elapsed time calculation was bonkers, negative
the wrong way round.
* re-shuffled Clique functions
why:
Due to the port from the go-sources, the interface logic is not optimal
for nimbus. The main visible function is currently snapshot() and most
of the _procurement_ of this function result has been moved to a
sub-directory.
* run eip-225 Clique test against p2p/chain.persistBlocks()
why:
Previously, loading the test block chains was fugdged with the purpose
only to fill the database. As it is now clear how nimbus works on
Goerli, the same can be achieved with a more realistic scenario.
details:
Eventually these tests will be pre-cursor to the reply tests for the
Goerli chain supporting TDD approach with more simple cases.
* fix exception annotations for executor module
why:
needed for exception tracking
details:
main annoyance are vmState methods (in state.nim) which potentially
throw a base level Exception (a proc would only throws CatchableError)
* split p2p/chain into sub-modules and fix exception annotations
why:
make space for implementing PoA stuff
* provide over-loadable Clique PRNG
why:
There is a PRNG provided for generating reproducible number sequences.
The functions which employ the PRNG to generate time slots were ported
ported from the go-implementation. They are currently unused.
* implement trusted signer assembly in p2p/chain.persistBlocks()
details:
* PoA processing moved there at the end of a transaction. Currently,
there is no action (eg. transaction rollback) if this fails.
* The unit tests with staged blocks work ok. In particular, there should
be tests with to-be-rejected blocks.
* TODO: 1.Optimise throughput/cache handling; 2.Verify headers
* fix statement cast in pool.nim
* added table features to LRU cache
why:
Clique uses the LRU cache using a mixture of volatile online items
from the LRU cache and database checkpoints for hard synchronisation.
For performance, Clique needs more table like features.
details:
First, last, and query key added, as well as efficient random delete
added. Also key-item pair iterator added for debugging.
* re-factored LRU snapshot caching
why:
Caching was sub-optimal (aka. bonkers) in that it skipped over memory
caches in many cases and so mostly rebuild the snapshot from the
last on-disk checkpoint.
details;
The LRU snapshot toValue() handler has been moved into the module
clique_snapshot. This is for the fact that toValue() is not supposed
to see the whole LRU cache database. So there must be a higher layer
working with the the whole LRU cache and the on-disk checkpoint
database.
also:
some clean up
todo:
The code still assumes that the block headers are valid in itself. This
is particular important when an epoch header (aka re-sync header) is
processed as it must contain the PoA result of all previous headers.
So blocks need to be verified when they come in before used for PoA
processing.
* fix some snapshot cache fringe cases
why:
Must not index empty sequences in clique_snapshot module
* extract unused clique/mining support into separate file
why:
mining is currently unsupported by nimbus
* Replay first 51840 transactions from Goerli block chain
why:
Currently Goerli is loaded but the block headers are not verified.
Replaying allows real data PoA development.
details:
Simple stupid gzipped dump/undump layer for debugging based on
the zlib module (no nim-faststream support.)
This is a replay running against p2p/chain.persistBlocks() where
the data were captured from.
* prepare stubs for PoA engine
* split executor source into sup-modules
why:
make room for updates, clique integration should go into
executor/update_poastate.nim
* Simplify p2p/executor.processBlock() function prototype
why:
vmState argument always wraps basicChainDB
* split processBlock() into sub-functions
why:
isolate the part where it will support clique/poa
* provided additional processTransaction() function prototype without _fork_ argument
why:
with the exception of some tests, the _fork_ argument is always derived
from the other prototype argument _vmState_
details:
similar situation with makeReceipt()
* provide new processBlock() version explicitly supporting PoA
details:
The new processBlock() version supporting PoA is the general one also
supporting non-PoA networks, it needs an additional _Clique_ descriptor
function argument for PoA state (if any.)
The old processBlock() function without the _Clique_ descriptor argument
retorns an error on PoA networgs (e.g. Goerli.)
* re-implemented Clique descriptor as _ref object_
why:
gives more flexibility when moving around the descriptor object
details:
also cleaned up a bit the clique sources
* comments for clarifying handling of Clique/PoA state descriptor
* continue importing rlp blocks
why:
a chain of blocks to be imported might have legit blocks
after rejected blocks
details:
import loop only stops if the import list is exhausted or if there
was a decoding error. this adds another four to the count of successful
no-hive tests.
* verify DAO marked extra data field in block header
why:
was ignored, scores another two no-hive tests
* verify minimum required difficulty in header validator
why:
two more nohive tests to succeed
details:
* subsumed extended header tests under validateKinship() and renamed it
more appropriately validateHeaderAndKinship()
* enhanced readability of p2p/chain.nim
* cleaned up test_blockchain_json.nim
* verify positive gasUsed unless no transactions
why:
solves another to nohive tests
details:
straightened test_blockchain_json chech so there is no unconditional
rejection anymore (based on the input test scenario)
why:
some handy features were intended to support the unit test from
the clique/clique_test.go source (the other one is from
clique/snapshot_test.go.)
as this test cannot realistically be implemented without the full
api (includes mining support), it is left as that