Commit Graph

69 Commits

Author SHA1 Message Date
Eugene Kabanov 18409a69e1
Light forward sync mechanism (#6515)
* Initial commit.

* Add hybrid syncing.

* Compilation fixes.

* Cast custom event for our purposes.

* Instantiate AsyncEventQueue properly.

* Fix mistype.

* Further research on optimistic updates.

* Fixing circular deps.

* Add backfilling.

* Add block download feature.

* Add block store.

* Update backfill information before storing block.

* Use custom block verifier for backfilling sync.

* Skip signature verification in backfilling.

* Add one more generic reload to storeBackfillBlock().

* Add block verification debugging statements.

* Add more debugging

* Do not use database for backfilling, part 1.

* Fix for stash.

* Stash fixes part 2.

* Prepare for testing.

* Fix assertion.

* Fix post-restart syncing process.

* Update backfill loading log statement.
Use proper backfill slot callback for sync manager.

* Add handling of Duplicates.

* Fix store duration and block backfilled log statements.

* Add proper syncing state log statement.

* Add snappy compression to beaconchain_file.
Format syncing speed properly.

* Add blobs verification.

* Add `slot` number to file structure for easy navigation over stream of compressed objects.

* Change database filename.

* Fix structure size.

* Add more consistency properties.

* Fix checkRepair() issues.

* Preparation to state rebuild process.

* Add plain & compressed size.

* Debugging snappy encode process.

* Add one more debugging line.

* Dump blocks.

* One more filedump.

* Fix chunk corruption code.

* Fix detection issue.

* Some fixes in state rebuilding process.

* Add more clearance steps.

* Move updateHead() back to block_processor.

* Fix compilation issues.

* Make code more async friendly.

* Fix async issues.
Add more information when proposer verification failed.

* Fix 8192 slots issue.

* Fix Future double completion issue.

* Pass updateFlags to some of the core procedures.

* Fix tests.

* Improve initial sync handling mechanism.

* Fix checkStateTransition() performance improvements.

* Add some performance tuning and meters.

* Light client performance tuning.

* Remove debugging statement.

* Use single file descriptor for blockchain file.

* Attempt to fix LC.

* Fix timeleft calculation when untrusted sync backfilling started right after LC block received.

* Workaround for `chronicles` + `results` `error` issue.
Remove some compilation warnings.
Fix `CatchableError` leaks on Windows.

* Address review comments.

* Address review comments part 2.

* Address review comments part 1.

* Rebase and fix the issues.

* Address review comments part 3.

* Add tests and fix some issues in auto-repair mechanism.

* Add tests to all_tests.

* Rename binary test file to pass restrictions.

* Add `bin` extension to excluded list.
Recover binary test data.

* Rename fixture file to .bin again.

* Update AllTests.

* Address review comments part 4.

* Address review comments part 5 and fix tests.

* Address review comments part 6.

* Eliminate foldl and combine from blobs processing.
Add some tests to ensure that checkResponse() also checks for correct order.

* Fix forgotten place.

* Post rebase fixes.

* Add unique slots tests.

* Optimize updateHead() code.

* Add forgotten changes.

* Address review comments on state as argument.
2024-10-30 05:38:53 +00:00
tersec b56a671122
fix most ConvFromXtoItselfNotNeeded hints and unhide remaining ones (#6307) 2024-05-22 13:56:37 +02:00
Etan Kissling 4fc1550d0f
add `{.push raises: [].}` to recently modified files (#5908)
Status Nim style mandates `{.push raises: []}.` at start of modules.
Ensure that's the case so that exceptions are properly tracked.

- https://status-im.github.io/nim-style-guide/errors.exceptions.html
- https://github.com/status-im/nim-eth/pull/614#discussion_r1220906149
2024-02-18 01:16:49 +00:00
Jacek Sieka afdfe302f3
state loading optimizations (#5881)
* compute post-merge randao mix without loading state
* avoid copying state on shuffling computation and compute epochref
* speed up state copy for block production
2024-02-12 15:58:55 +01:00
tersec a4680cb7fa
refactor addHeadBlock() to research/ and tests/ helper (#5874)
* refactor addHeadBlock() to research/ and tests/ helper

* rm now-dead code
2024-02-09 23:46:51 +00:00
Etan Kissling 7c53841cd8
Revert "Revert "fix checkpoint block potentially not getting backfilled into DB (#5863)" (#5871)" (#5875)
This reverts commit 1575478b72.
2024-02-09 20:44:54 +01:00
tersec 1575478b72
Revert "fix checkpoint block potentially not getting backfilled into DB (#5863)" (#5871)
This reverts commit 65e6f892de.
2024-02-09 12:49:07 +00:00
Etan Kissling 65e6f892de
fix checkpoint block potentially not getting backfilled into DB (#5863)
When using checkpoint sync, only checkpoint state is available, block is
not downloaded and backfilled later.

`dag.backfill` tracks latest filled `slot`, and latest `parent_root` for
which no block has been synced yet.

In checkpoint sync, this assumption is broken, because there, the start
`dag.backfill.slot` is set based on checkpoint state slot, and the block
is also not available.

However, sync manager in backward mode also requests `dag.backfill.slot`
and `block_clearance` then backfills the checkpoint block once it is
synced. But, there is no guarantee that a peer ever sends us that block.
They could send us all parent blocks and solely omit the checkpoint
block itself. In that situation, we would accept the parent blocks and
advance `dag.backfill`, and subsequently never request the checkpoint
block again, resulting in gap inside blocks DB that is never filled.

To mitigate that, the assumption is restored that `dag.backfill.slot`
is the latest filled `slot`, and `dag.backfill.parent_root` is the next
block that needs to be synced. By setting `slot` to `tail.slot + 1` and
`parent_root` to `tail.root`, we put a fake summary into `dag.backfill`
so that `block_clearance` only proceeds once checkpoint block exists.
2024-02-09 11:20:36 +01:00
Etan Kissling 4266e16835
allow `getBlockIdAtSlot` to answer queries from available states (#5869)
After checkpoint sync, historical block IDs cannot yet be queried.
However, they are needed to compute dependent roots of `ShufflingRef`.
To allow lookup, enable `getBlockIdAtSlot` to answer from compatible
states in memory; as long as they descend from the finalized checkpoint
and the requested slot is sufficiently recent, `block_roots` contains
everything to recover `BlockSlotId` up to `SLOTS_PER_HISTORICAL_ROOT`.
This is similar to how `attester_dependent_root` etc. are computed.

This accelerates the first couple minutes of checkpoint sync on Mainnet,
especially the time until finality advances past the synced checkpoint.
2024-02-09 11:13:00 +01:00
tersec 7fd8beb418
rm unused code in {ncli,research,tests}/ (#5809) 2024-01-21 07:55:03 +01:00
Jacek Sieka 62cbdeefc5
verify `genesis_time` more strictly (fixes #1667) (#5694)
Bogus values lead to crashes down the line when timers overflow
2024-01-06 15:26:56 +01:00
Etan Kissling 8cea8af620
fix startup after BN exited between head and finalized blocks updates (#5617)
When the BN exits after writing new `head` to database, but before
completing the `updateFinalizedBlocks` call, the database is slightly
inconsistent due to the partial write. We currently fail to start up
after that. Fix that by catching up on partial `updateFinalizedBlocks`
tasks on start up, and add a test for this edge case.
2023-11-23 00:44:20 +01:00
tersec 54bdda13b4
rm unused code (#5596) 2023-11-11 11:49:34 +03:00
Etan Kissling eb35039704
allow higher `MIN_EPOCHS_FOR_BLOCK_REQUESTS` than safe minimum (#5590)
Gnosis uses `MIN_EPOCHS_FOR_BLOCK_REQUESTS` = 33024, but the computed
safe minimum (that Nimbus was using) is 2304. Relax the compatibility
check to allow `MIN_EPOCHS_FOR_BLOCK_REQUESTS` above the safe minimum
and honor `config.yaml` preferences for `MIN_EPOCHS_FOR_BLOCK_REQUESTS`.
2023-11-10 15:04:55 +00:00
tersec 447786518f
ShufflingRef approach to next-epoch validator duty calculation/prediction (#5414)
* ShufflingRef approach to next-epoch validator duty calculation/prediction

* refactor action_tracker.updateActions to take ShufflingRef + beacon_proposers; refactor maybeUpdateActionTrackerNextEpoch to be separate and reused function; add actual fallback logic

* document one possible set of conditions

* check epoch participation flags and inactivity scores to ensure no penalties and MAX_EFFECTIVE_BALANCE to ensure rewards don't matter

* correctly (un)shuffle each proposer index

* remove debugging assertion
2023-10-10 00:02:07 +00:00
Etan Kissling e7bc41e005
`blck` --> `forkyBlck` when using `withBlck` / `withStateAndBlck` (#5451)
For symmetry with `forkyState` when using `withState`, and to avoid
problems with shadowing of `blck` when using `withBlck` in `template`,
also rename the injected `blck` to `forkyBlck`.

- https://github.com/nim-lang/Nim/issues/22698
2023-09-21 12:49:14 +02:00
Etan Kissling 176ea09c2b
cleanup `OnBlockAdded` usage (#5426)
Reduce repetitiveness when using forked `OnBlockAdded` callbacks by
introducing a template to obtain appropriate cb from `ConsensusFork`.
2023-09-13 17:57:54 +00:00
tersec ff87ee9181
rm i386 test_blockchain_dag workaround (#5356) 2023-08-25 15:24:56 +00:00
Jacek Sieka b8a32419b8
async batch verification (+40% sig verification throughput) (#5176)
* async batch verification

When batch verification is done, the main thread is blocked reducing
concurrency.

With this PR, the new thread signalling primitive in chronos is used to
offload the full batch verification process to a separate thread
allowing the main threads to continue async operations while the other
threads verify signatures.

Similar to previous behavior, the number of ongoing batch verifications
is capped to prevent runaway resource usage.

In addition to the asynchronous processing, 3 addition changes help
drive throughput:

* A loop is used for batch accumulation: this prevents a stampede of
small batches in eager mode where both the eager and the scheduled batch
runner would pick batches off the queue, prematurely picking "fresh"
batches off the queue
* An additional small wait is introduced for small batches - this helps
create slightly larger batches which make better used of the increased
concurrency
* Up to 2 batches are scheduled to the threadpool during high pressure,
reducing startup latency for the threads

Together, these changes increase attestation verification throughput
under load up to 30%.

* fixup

* Update submodules

* fix blst build issues (and a PIC warning)

* bump

---------

Co-authored-by: Zahary Karadjov <zahary@gmail.com>
2023-08-03 11:36:45 +03:00
tersec 846e7c585b
Revert "Revert "generalize `ShufflingRef` acceleration logic (#5197)" (#5223)" (#5225)
This reverts commit 2ab4592a31.
2023-07-31 13:11:45 +00:00
tersec 2ab4592a31
Revert "generalize `ShufflingRef` acceleration logic (#5197)" (#5223)
This reverts commit eb3a30655b.
2023-07-31 08:05:32 +02:00
Etan Kissling eb3a30655b
generalize `ShufflingRef` acceleration logic (#5197)
Split up the `ShufflingRef` acceleration logic into generically usable
parts and attester shuffling specific parts. The generic parts could be
used to accelerate other purposes, e.g., REST `/states/xxx/randao` API.
2023-07-20 10:25:39 +02:00
Etan Kissling f98c33ad03
generalize `commonAncestor` function to `BlockId` (#5192)
To enable additional use cases, e.g., `/states/###/randao` beacon API,
`ShufflingRef` acceleration logic needs to be able to operate on parts
of the DAG that do not have `BlockRef`. Changing `commonAncestor` to
act on `BlockId` instead of `BlockRef` is a step toward that and also
simplifies the logic some more.
2023-07-18 17:37:53 +02:00
Etan Kissling 2efc44a8ab
accelerate RANDAO computation for post-merge blocks (#5190)
Post-merge blocks contain all information to directly obtain RANDAO
without having to load any additional info. Take advantage of that to
further accelerate `ShufflingRef` computation. Note that it is still
necessary to verify that `blck` / `state` share a sufficiently recent
ancestor for the purpose of computing attester shufflings.

- new: 243.71s, 239.67s, 237.32s, 238.36s, 239.57s
- old: 251.33s, 234.29s, 249.28s, 237.03s, 236.78s
2023-07-15 22:16:56 +02:00
Etan Kissling 74bb4b1411
simplify RANDAO recovery in `ShufflingRef` acceleration (#5183)
Current RANDAO recovery logic is quite complex as it optimizes for the
minimum amount of database reads. Loading blocks isn't the bottleneck
though, so rather make the implementation more concise by avoiding the
complex strategy planning step. Note that this also prepares for an even
faster implementation for post-merge blocks in the future that extracts
RANDAO from `ExecutionPayload` directly if available, so even in cases
where efficiency is slightly lower, only historical data is affected.

`time nim c -r tests/test_blockchain_dag` (cached binary):

- new: 145.45s, 133.59s, 144.65s, 127.69s, 136.14s
- old: 149.15s, 150.84s, 135.77s, 137.49s, 133.89s
2023-07-12 17:27:05 +02:00
Etan Kissling 5115aaedb7
early exit `commonAncestor` when comparing with `finalizedHead` (#5174)
* early exit `commonAncestor` when comparing with `finalizedHead`

As all `BlockRef` lead to `finalizedHead` (`parent == nil`),
can shortcut in that situation and immediately return `finalizedHead`
if passed as one of the arguments.

* typo in comment

* add test from #5152

Co-authored-by: tersec <tersec@users.noreply.github.com>

* add note about test complexity

* regenerate test summary

---------

Co-authored-by: tersec <tersec@users.noreply.github.com>
2023-07-10 20:36:25 +00:00
Etan Kissling 2722778ce5
reduce `nim-eth` dependencies just for RNG (#5099)
We have several modules that import `nim-eth` for the sole purpose of
its `keys.newRng` function. This function is meanwhile a simple wrapper
around `nim-bearssl`'s `HmacDrbgContext.new()`, so the import doesn't
really serve a use anymore. Replace `keys.newRng` with the direct call
to reduce `nim-eth` imports.
2023-06-19 22:43:50 +00:00
Etan Kissling dbba003a38
Revert "Revert "accelerate `getShufflingRef` (#4911)" (#4958)"
This reverts commit 748be8b67b.
2023-05-15 17:41:40 +02:00
Etan Kissling 748be8b67b
Revert "accelerate `getShufflingRef` (#4911)" (#4958)
This reverts commit ea97e93e74.
2023-05-15 15:25:51 +00:00
henridf 573228ffa0
Rename eth1/ -> el/ and eth1_monitor.nim -> el_monitor.nim (#4944) 2023-05-15 05:05:12 +00:00
Etan Kissling ea97e93e74
accelerate `getShufflingRef` (#4911)
When an uncached `ShufflingRef` is requested, we currently replay state
which can take several seconds. Acceleration is possible by:

1. Start from any state with locked-in `get_active_validator_indices`.
   Any blocks / slots applied to such a state can only affect that
   result for future epochs, so are viable for querying target epoch.
   `compute_activation_exit_epoch(state.slot.epoch) > target.epoch`

2. Determine highest common ancestor among `state` and `target.blck`.
   At the ancestor slot, same rules re `get_active_validator_indices`.
   `compute_activation_exit_epoch(ancestorSlot.epoch) > target.epoch`

3. We now have a `state` that shares history with `target.blck` up
   through a common ancestor slot. Any blocks / slots that the `state`
   contains, which are not part of the `target.blck` history, affect
   `get_active_validator_indices` at epochs _after_ `target.epoch`.

4. Select `state.randao_mixes[N]` that is closest to common ancestor.
   Either direction is fine (above / below ancestor).

5. From that RANDAO mix, mix in / out all RANDAO reveals from blocks
   in-between. This is just an XOR operation, so fully reversible.
   `mix = mix xor SHA256(blck.message.body.randao_reveal)`

6. Compute the attester dependent slot from `target.epoch`.
   `if epoch >= 2: (target.epoch - 1).start_slot - 1 else: GENESIS_SLOT`

7. Trace back from `target.blck` to the attester dependent slot.
   We now have the destination for which we want to obtain RANDAO.

8. Mix in all RANDAO reveals from blocks up through the `dependentBlck`.
   Same method, no special handling necessary for epoch transitions.

9. Combine `get_active_validator_indices` from `state` at `target.epoch`
   with the recovered RANDAO value at `dependentBlck` to obtain the
   requested shuffling, and construct the `ShufflingRef` without replay.

* more tests and simplify logic

* test with different number of deposits per branch

* Update beacon_chain/consensus_object_pools/blockchain_dag.nim

Co-authored-by: Jacek Sieka <jacek@status.im>

* `commonAncestor` tests

* lint

---------

Co-authored-by: Jacek Sieka <jacek@status.im>
2023-05-12 19:36:59 +02:00
tersec d058aa09c8
more withdrowls (#4674) 2023-03-02 17:13:35 +01:00
tersec 0fb726c420
`BeaconStateFork/BeaconBlockFork` -> `ConsensusFork` (#4560)
* `BeaconStateFork/BeaconBlockFork` -> `ConsensusFork`

* revert unrelated change

* revert unrelated changes

* update test summaries
2023-01-28 19:53:41 +00:00
Jacek Sieka 0ba9fc4ede
History pruning (fixes #4419) (#4445)
Introduce (optional) pruning of historical data - a pruned node will
continue to answer queries for historical data up to
`MIN_EPOCHS_FOR_BLOCK_REQUESTS` epochs, or roughly 5 months, capping
typical database usage at around 60-70gb.

To enable pruning, add `--history=prune` to the command line - on the
first start, old data will be cleared (which may take a while) - after
that, data is pruned continuously.

When pruning an existing database, the database will not shrink -
instead, the freed space is recycled as the node continues to run - to
free up space, perform a trusted node sync with a fresh database.

When switching on archive mode in a pruned node, history is retained
from that point onwards.

History pruning is scheduled to be enabled by default in a future
release.

In this PR, `minimal` mode from #4419 is not implemented meaning
retention periods for states and blocks are always the same - depending
on user demand, a future PR may implement `minimal` as well.
2023-01-07 10:02:15 +00:00
Jacek Sieka bd8f08204e
Implement skip_randao_verification for blinded blocks (#4435)
* Implement skip_randao_verification for blinded blocks

* fix redundant randao verification on block replay (5% faster)
* check randao in REST instead of internally
* avoid redundant copies when making blocks
* cleanup leftover randao skipping code

* fix test summary
2022-12-19 15:11:12 +02:00
tersec 4e71e77da7
structure for supporting capella block production (#4383) 2022-12-02 08:39:01 +01:00
Jacek Sieka cd160b5650
more strict read-only database mode (#4362)
* avoid creating pre-altair backwards compatibility tables
* allow running ncli_db era export without above tables present
* drop unused pre-altair backwards compatibility tables
* run benchmark on read-ronly database
* fix running benchmark from genesis
2022-11-28 23:21:58 +00:00
Etan Kissling 48994f67d3
rename `BlockError` -> `VerifierError` (#4310)
We currently use `BlockError` for both beacon blocks and LC objects.
In light of EIP4844, we will likely also use it for blob sidecars.
To avoid confusion, renaming it to a more generic `VerifierError`,
and update its documentation to be more generic.

To avoid long lines as a followup, also renaming the `block_processor`'s
`BlockProcessingCompleted.completed`->`ProcessingStatus.completed` and
`BlockProcessingCompleted.notCompleted`->`ProcessingStatus.notCompleted`
2022-11-10 17:40:27 +00:00
Jacek Sieka d839b9d07e
State-only checkpoint state startup (#4251)
Currently, we require genesis and a checkpoint block and state to start
from an arbitrary slot - this PR relaxes this requirement so that we can
start with a state alone.

The current trusted-node-sync algorithm works by first downloading
blocks until we find an epoch aligned non-empty slot, then downloads the
state via slot.

However, current
[proposals](https://github.com/ethereum/beacon-APIs/pull/226) for
checkpointing prefer finalized state as
the main reference - this allows more simple access control and caching
on the server side - in particular, this should help checkpoint-syncing
from sources that have a fast `finalized` state download (like infura
and teku) but are slow when accessing state via slot.

Earlier versions of Nimbus will not be able to read databases created
without a checkpoint block and genesis. In most cases, backfilling makes
the database compatible except where genesis is also missing (custom
networks).

* backfill checkpoint block from libp2p instead of checkpoint source,
when doing trusted node sync
* allow starting the client without genesis / checkpoint block
* perform epoch start slot lookahead when loading tail state, so as to
deal with the case where the epoch start slot does not have a block
* replace `--blockId` with `--state-id` in TNS command line
* when replaying, also look at the parent of the last-known-block (even
if we don't have the parent block data, we can still replay from a
"parent" state) - in particular, this clears the way for implementing
state pruning
* deprecate `--finalized-checkpoint-block` option (no longer needed)
2022-11-02 10:02:38 +00:00
Jacek Sieka 819442acc3
Allow chain dag without genesis / block (#4230)
* Allow chain dag without genesis / block

This PR enables the initialization of the dag without access to blocks
or genesis state - it is a prerequisite for implementing a number of
interesting features:

* checkpoint sync without any block download
* pruning of blocks and states

* backfill checkpoint block
2022-10-14 22:40:10 +03:00
Jacek Sieka 40bed02f60
Build block in parallel with attestation packing (#4185)
* fix block proposal in first slot after checkpoint
2022-10-04 11:24:16 +00:00
Etan Kissling 5968ed586b
use LRU strategy for shuffling/epoch caches (#4196)
When EL `newPayload` is slow (e.g., Raspberry Pi with Besu), the epoch
and shuffling caches tend to fill up with multiple copies per epoch when
processing gossip and performing validator duties close to wall slot.
The old strategy of evicting oldest epoch led to the same item being
evicted over and over, leading to blocking of over 5 minutes in extreme
cases where alternate epochs/shuffling got loaded repeatedly.
Changing the cache eviction strategy to least-recently-used seems to
improve the situation drastically. A simple implementation was selected
based on single linked-list without a hashtable.
2022-09-29 14:55:58 +00:00
tersec 1819d79e07
avoid potential database inconsistency after fcU `INVALID`+crash (#4192)
* avoid database race-condition inconsistency after fcU `INVALID` then crash

* ensure head doesn't fall behind finalized; add more tests for head movement/reloading DAG
2022-09-28 21:07:31 +00:00
Jacek Sieka b1bc830a92
Harden EpochRef loading against bogus block root at tail (#4178)
* add more error information when things go wrong with database
* lower log level when reloading attestations from no-block epoch start
slot
2022-09-27 18:56:08 +02:00
tersec 0f6d19b4b3
implement v1.2.0 optimistic sync tests (#4174)
* implement v1.2.0 optimistic sync tests

* Update beacon_chain/consensus_object_pools/blockchain_dag.nim

Co-authored-by: Etan Kissling <etan@status.im>

* `lvh` -> `latestValidHash` and only invalidate one specific block"

* `getEarliestInvalidRoot` -> `getEarliestInvalidBlockRoot`; `defaultEarliestInvalidRoot` -> `defaultEarliestInvalidBlockRoot`

Co-authored-by: Etan Kissling <etan@status.im>
2022-09-27 15:11:47 +03:00
Jacek Sieka 7f9af78ddb
test randao skippping (complements #3837) (#4179) 2022-09-27 09:22:24 +02:00
Jacek Sieka 0d9fd54857
cache shuffling separately from other EpochRef data (fixes #2677) (#3990)
In order to avoid full replays when validating attestations hailing from
untaken forks, it's better to keep shufflings separate from `EpochRef`
and perform a lookahead on the shuffling when processing the block that
determines them.

This also helps performance in the case where REST clients are trying to
perform lookahead on attestation duties and decreases memory usage by
sharing shufflings between EpochRef instances of the same dependent
root.
2022-08-18 21:07:01 +03:00
tersec 8eb5d5de09
use ZERO_HASH for default(Eth2Digest)/Eth2Digest() in func calls (#3770) 2022-06-18 04:57:37 +00:00
tersec a07d14cd99
remove unused imports in tests/ (#3713) 2022-06-07 17:05:06 +00:00
Jacek Sieka e009728858
work around Nim assignment bug that breaks state pruning (#3545)
See https://github.com/nim-lang/Nim/issues/19613
2022-03-24 14:37:37 +00:00