nimbus-eth1/nimbus/sync/beacon/worker/db.nim

214 lines
7.2 KiB
Nim
Raw Normal View History

# Nimbus
# Copyright (c) 2023-2024 Status Research & Development GmbH
# Licensed and distributed under either of
# * MIT license (license terms in the root directory or at
# https://opensource.org/licenses/MIT).
# * Apache v2 license (license terms in the root directory or at
# https://www.apache.org/licenses/LICENSE-2.0).
# at your option. This file may not be copied, modified, or distributed
# except according to those terms.
{.push raises:[].}
import
Flare sync (#2627) * Cosmetics, small fixes, add stashed headers verifier * Remove direct `Era1` support why: Era1 is indirectly supported by using the import tool before syncing. * Clarify database persistent save function. why: Function relied on the last saved state block number which was wrong. It now relies on the tx-level. If it is 0, then data are saved directly. Otherwise the task that owns the tx will do it. * Extracted configuration constants into separate file * Enable single peer mode for debugging * Fix peer losing issue in multi-mode details: Running concurrent download peers was previously programmed as running a batch downloading and storing ~8k headers and then leaving the `async` function to be restarted by a scheduler. This was unfortunate because of occasionally occurring long waiting times for restart. While the time gap until restarting were typically observed a few millisecs, there were always a few outliers which well exceed several seconds. This seemed to let remote peers run into timeouts. * Prefix function names `unprocXxx()` and `stagedYyy()` by `headers` why: There will be other `unproc` and `staged` modules. * Remove cruft, update logging * Fix accounting issue details: When staging after fetching headers from the network, there was an off by 1 error occurring when the result was by one smaller than requested. Also, a whole range was mis-accounted when a peer was terminating connection immediately after responding. * Fix slow/error header accounting when fetching why: Originally set for detecting slow headers in a row, the counter was wrongly extended to general errors. * Ban peers for a while that respond with too few headers continuously why: Some peers only returned one header at a time. If these peers sit on a farm, they might collectively slow down the download process. * Update RPC beacon header updater why: Old function hook has slightly changed its meaning since it was used for snap sync. Also, the old hook is used by other functions already. * Limit number of peers or set to single peer mode details: Merge several concepts, single peer mode being one of it. * Some code clean up, fixings for removing of compiler warnings * De-noise header fetch related sources why: Header download looks relatively stable, so general debugging is not needed, anymore. This is the equivalent of removing the scaffold from the part of the building where work has completed. * More clean up and code prettification for headers stuff * Implement body fetch and block import details: Available headers are used stage blocks by combining existing headers with newly fetched blocks. Then these blocks are imported/executed via `persistBlocks()`. * Logger cosmetics and cleanup * Remove staged block queue debugging details: Feature still available, just not executed anymore * Docu, logging update * Update/simplify `runDaemon()` * Re-calibrate block body requests and soft config for import blocks batch why: * For fetching, larger fetch requests are mostly truncated anyway on MainNet. * For executing, smaller batch sizes reduce the memory needed for the price of longer execution times. * Update metrics counters * Docu update * Some fixes, formatting updates, etc. * Update `borrowed` type: uint -. uint64 also: Always convert to `uint64` rather than `uint` where appropriate
2024-09-27 15:07:42 +00:00
pkg/[chronicles, chronos],
pkg/eth/[common, rlp],
Beacon sync updates tbc (#2818) * Clear rejected sync target so that it would not be processed again * Use in-memory table to stash headers after FCU import has started why: After block imported has started, there is no way to save/stash block headers persistently. The FCU handlers always maintain a positive transaction level and in some instances the current transaction is flushed and re-opened. This patch fixes an exception thrown when a block header has gone missing. * When resuming sync, delete stale headers and state why: Deleting headers saves some persistent space that would get lost otherwise. Deleting the state after resuming prevents from race conditions. * On clean start hibernate sync `deamon` entity before first update from CL details: Only reduces services are running * accept FCU from CL * fetch finalised header after accepting FCY (provides hash only) * Improve text/meaning of some log messages * Revisit error handling for useless peers why: A peer is abandoned from if the error score is too high. This was not properly handled for some fringe case when the error was detected at staging time but fetching via eth/xx was ok. * Clarify `break` meaning by using labelled `break` statements * Fix action how to commit when sync target has been reached why: The sync target block number might precede than latest FCU block number. This happens when the engine API squeezes in some request to execute and import subsequent blocks. This patch fixes and assert thrown when after reaching target the latest FCU block number is higher than the expected target block number. * Update TODO list
2024-11-01 19:18:41 +00:00
pkg/stew/[interval_set, sorted_set],
pkg/results,
"../../.."/[common, core/chain, db/storage_types],
../worker_desc,
Beacon sync update multi exe heads aware (#2861) * Log/trace cancellation events in scheduler * Provide `clear()` functions for explicitly flushing data objects * Renaming header cache functions why: More systematic, all functions start with prefix `dbHeader` * Remove `danglingParent` from layout why: Already provided by header cache * Remove `couplerHash` and `headHash` from layout why: No need to cache, `headHash` is unused and `couplerHash` used typically once, only. * Remove `lastLayout` from sync descriptor why: No need to compare changes, saving is always triggered after actively changing the sync layout state * Early reject unsuitable head + finalised header from CL why: The finalised header is only passed by its hash so the header must be fetched somewhere, e.g. from a peer via eth/xx. Also, finalised headers earlier than the `base` from `FC` cannot be handled due to the `Aristo` single state database architecture. Luckily, on a full node, the complete block history is available so unsuitable finalised headers are stored there already which is exploited here to avoid unnecessary network traffic. * Code cosmetics, remove cruft, prettify logging, remove `final` metrics detail: The `final` layout parameter will be deprecated and later removed * Update/re-calibrate syncer logic documentation why: The current implementation sucks if the `FC` module changes the canonical branch in the middle of completing a header chain (due to concurrent updates by the `newPayload()` logic.) * Implement according to re-calibrated syncer docu details: The implementation employs the notion of named layout states (see `SyncLayoutState` in `worker_desc.nim`) which are derived from the state parameter triple `(C,D,H)` as described in `README.md`.
2024-11-21 16:32:47 +00:00
./headers_unproc
const
LhcStateKey = 1.beaconStateKey
Flare sync (#2627) * Cosmetics, small fixes, add stashed headers verifier * Remove direct `Era1` support why: Era1 is indirectly supported by using the import tool before syncing. * Clarify database persistent save function. why: Function relied on the last saved state block number which was wrong. It now relies on the tx-level. If it is 0, then data are saved directly. Otherwise the task that owns the tx will do it. * Extracted configuration constants into separate file * Enable single peer mode for debugging * Fix peer losing issue in multi-mode details: Running concurrent download peers was previously programmed as running a batch downloading and storing ~8k headers and then leaving the `async` function to be restarted by a scheduler. This was unfortunate because of occasionally occurring long waiting times for restart. While the time gap until restarting were typically observed a few millisecs, there were always a few outliers which well exceed several seconds. This seemed to let remote peers run into timeouts. * Prefix function names `unprocXxx()` and `stagedYyy()` by `headers` why: There will be other `unproc` and `staged` modules. * Remove cruft, update logging * Fix accounting issue details: When staging after fetching headers from the network, there was an off by 1 error occurring when the result was by one smaller than requested. Also, a whole range was mis-accounted when a peer was terminating connection immediately after responding. * Fix slow/error header accounting when fetching why: Originally set for detecting slow headers in a row, the counter was wrongly extended to general errors. * Ban peers for a while that respond with too few headers continuously why: Some peers only returned one header at a time. If these peers sit on a farm, they might collectively slow down the download process. * Update RPC beacon header updater why: Old function hook has slightly changed its meaning since it was used for snap sync. Also, the old hook is used by other functions already. * Limit number of peers or set to single peer mode details: Merge several concepts, single peer mode being one of it. * Some code clean up, fixings for removing of compiler warnings * De-noise header fetch related sources why: Header download looks relatively stable, so general debugging is not needed, anymore. This is the equivalent of removing the scaffold from the part of the building where work has completed. * More clean up and code prettification for headers stuff * Implement body fetch and block import details: Available headers are used stage blocks by combining existing headers with newly fetched blocks. Then these blocks are imported/executed via `persistBlocks()`. * Logger cosmetics and cleanup * Remove staged block queue debugging details: Feature still available, just not executed anymore * Docu, logging update * Update/simplify `runDaemon()` * Re-calibrate block body requests and soft config for import blocks batch why: * For fetching, larger fetch requests are mostly truncated anyway on MainNet. * For executing, smaller batch sizes reduce the memory needed for the price of longer execution times. * Update metrics counters * Docu update * Some fixes, formatting updates, etc. * Update `borrowed` type: uint -. uint64 also: Always convert to `uint64` rather than `uint` where appropriate
2024-09-27 15:07:42 +00:00
# ------------------------------------------------------------------------------
# Private helpers
# ------------------------------------------------------------------------------
proc fetchSyncStateLayout(ctx: BeaconCtxRef): Opt[SyncStateLayout] =
let data = ctx.db.ctx.getKvt().get(LhcStateKey.toOpenArray).valueOr:
return err()
try:
return ok(rlp.decode(data, SyncStateLayout))
except RlpError:
discard
Flare sync (#2627) * Cosmetics, small fixes, add stashed headers verifier * Remove direct `Era1` support why: Era1 is indirectly supported by using the import tool before syncing. * Clarify database persistent save function. why: Function relied on the last saved state block number which was wrong. It now relies on the tx-level. If it is 0, then data are saved directly. Otherwise the task that owns the tx will do it. * Extracted configuration constants into separate file * Enable single peer mode for debugging * Fix peer losing issue in multi-mode details: Running concurrent download peers was previously programmed as running a batch downloading and storing ~8k headers and then leaving the `async` function to be restarted by a scheduler. This was unfortunate because of occasionally occurring long waiting times for restart. While the time gap until restarting were typically observed a few millisecs, there were always a few outliers which well exceed several seconds. This seemed to let remote peers run into timeouts. * Prefix function names `unprocXxx()` and `stagedYyy()` by `headers` why: There will be other `unproc` and `staged` modules. * Remove cruft, update logging * Fix accounting issue details: When staging after fetching headers from the network, there was an off by 1 error occurring when the result was by one smaller than requested. Also, a whole range was mis-accounted when a peer was terminating connection immediately after responding. * Fix slow/error header accounting when fetching why: Originally set for detecting slow headers in a row, the counter was wrongly extended to general errors. * Ban peers for a while that respond with too few headers continuously why: Some peers only returned one header at a time. If these peers sit on a farm, they might collectively slow down the download process. * Update RPC beacon header updater why: Old function hook has slightly changed its meaning since it was used for snap sync. Also, the old hook is used by other functions already. * Limit number of peers or set to single peer mode details: Merge several concepts, single peer mode being one of it. * Some code clean up, fixings for removing of compiler warnings * De-noise header fetch related sources why: Header download looks relatively stable, so general debugging is not needed, anymore. This is the equivalent of removing the scaffold from the part of the building where work has completed. * More clean up and code prettification for headers stuff * Implement body fetch and block import details: Available headers are used stage blocks by combining existing headers with newly fetched blocks. Then these blocks are imported/executed via `persistBlocks()`. * Logger cosmetics and cleanup * Remove staged block queue debugging details: Feature still available, just not executed anymore * Docu, logging update * Update/simplify `runDaemon()` * Re-calibrate block body requests and soft config for import blocks batch why: * For fetching, larger fetch requests are mostly truncated anyway on MainNet. * For executing, smaller batch sizes reduce the memory needed for the price of longer execution times. * Update metrics counters * Docu update * Some fixes, formatting updates, etc. * Update `borrowed` type: uint -. uint64 also: Always convert to `uint64` rather than `uint` where appropriate
2024-09-27 15:07:42 +00:00
err()
Beacon sync updates tbc (#2818) * Clear rejected sync target so that it would not be processed again * Use in-memory table to stash headers after FCU import has started why: After block imported has started, there is no way to save/stash block headers persistently. The FCU handlers always maintain a positive transaction level and in some instances the current transaction is flushed and re-opened. This patch fixes an exception thrown when a block header has gone missing. * When resuming sync, delete stale headers and state why: Deleting headers saves some persistent space that would get lost otherwise. Deleting the state after resuming prevents from race conditions. * On clean start hibernate sync `deamon` entity before first update from CL details: Only reduces services are running * accept FCU from CL * fetch finalised header after accepting FCY (provides hash only) * Improve text/meaning of some log messages * Revisit error handling for useless peers why: A peer is abandoned from if the error score is too high. This was not properly handled for some fringe case when the error was detected at staging time but fetching via eth/xx was ok. * Clarify `break` meaning by using labelled `break` statements * Fix action how to commit when sync target has been reached why: The sync target block number might precede than latest FCU block number. This happens when the engine API squeezes in some request to execute and import subsequent blocks. This patch fixes and assert thrown when after reaching target the latest FCU block number is higher than the expected target block number. * Update TODO list
2024-11-01 19:18:41 +00:00
proc deleteStaleHeadersAndState(
ctx: BeaconCtxRef;
upTo: BlockNumber;
info: static[string];
) =
## Delete stale headers and state
let
kvt = ctx.db.ctx.getKvt()
stateNum = ctx.db.getSavedStateBlockNumber() # for persisting
var bn = upTo
while 0 < bn and kvt.hasKey(beaconHeaderKey(bn).toOpenArray):
discard kvt.del(beaconHeaderKey(bn).toOpenArray)
bn.dec
# Occasionallly persist the deleted headers. This will succeed if
# this function is called early enough after restart when there is
# no database transaction pending.
if (upTo - bn) mod 8192 == 0:
ctx.db.persistent(stateNum).isOkOr:
debug info & ": cannot persist deleted sync headers", error=($$error)
# So be it, stop here.
return
# Delete persistent state, there will be no use of it anymore
discard kvt.del(LhcStateKey.toOpenArray)
ctx.db.persistent(stateNum).isOkOr:
debug info & ": cannot persist deleted sync headers", error=($$error)
return
if bn < upTo:
debug info & ": deleted stale sync headers", iv=BnRange.new(bn+1,upTo)
# ------------------------------------------------------------------------------
# Public functions
# ------------------------------------------------------------------------------
Beacon sync activation control update (#2782) * Clarifying/commenting FCU setup condition & small fixes, comments etc. * Update some logging * Reorg metrics updater and activation * Better `async` responsiveness why: Block import does not allow `async` task activation while executing. So allow potential switch after each imported block (rather than a group of 32 blocks.) * Handle resuming after previous sync followed by import why: In this case the ledger state is more recent than the saved sync state. So this is considered a pristine sync where any previous sync state is forgotten. This fixes some assert thrown because of inconsistent internal state at some point. * Provide option for clearing saved beacon sync state before starting syncer why: It would resume with the last state otherwise which might be undesired sometimes. Without RPC available, the syncer typically stops and terminates with the canonical head larger than the base/finalised head. The latter one will be saved as database/ledger state and the canonical head as syncer target. Resuming syncing here will repeat itself. So clearing the syncer state can prevent from starting the syncer unnecessarily avoiding useless actions. * Allow workers to request syncer shutdown from within why: In one-trick-pony mode (after resuming without RPC support) the syncer can be stopped from within soavoiding unnecessary polling. In that case, the syncer can (theoretically) be restarted externally with `startSync()`. * Terminate beacon sync after a single run target is reached why: Stops doing useless polling (typically when there is no RPC available) * Remove crufty comments * Tighten state reload condition when resuming why: Some pathological case might apply if the syncer is stopped while the distance between finalised block and head is very large and the FCU base becomes larger than the locked finalised state. * Verify that finalised number from CL is at least FCU base number why: The FCU base number is determined by the database, non zero if manually imported. The finalised number is passed via RPC by the CL node and will increase over time. Unless fully synced, this number will be pretty low. On the other hand, the FCU call `forkChoice()` will eventually fail if the `finalizedHash` argument refers to something outside the internal chain starting at the FCU base block. * Remove support for completing interrupted sync without RPC support why: Simplifies start/stop logic * Rmove unused import
2024-10-28 16:22:04 +00:00
proc dbStoreSyncStateLayout*(ctx: BeaconCtxRef; info: static[string]) =
## Save chain layout to persistent db
let data = rlp.encode(ctx.layout)
ctx.db.ctx.getKvt().put(LhcStateKey.toOpenArray, data).isOkOr:
raiseAssert info & " put() failed: " & $$error
# While executing blocks there are frequent save cycles. Otherwise, an
# extra save request might help to pick up an interrupted sync session.
Beacon sync updates tbc (#2818) * Clear rejected sync target so that it would not be processed again * Use in-memory table to stash headers after FCU import has started why: After block imported has started, there is no way to save/stash block headers persistently. The FCU handlers always maintain a positive transaction level and in some instances the current transaction is flushed and re-opened. This patch fixes an exception thrown when a block header has gone missing. * When resuming sync, delete stale headers and state why: Deleting headers saves some persistent space that would get lost otherwise. Deleting the state after resuming prevents from race conditions. * On clean start hibernate sync `deamon` entity before first update from CL details: Only reduces services are running * accept FCU from CL * fetch finalised header after accepting FCY (provides hash only) * Improve text/meaning of some log messages * Revisit error handling for useless peers why: A peer is abandoned from if the error score is too high. This was not properly handled for some fringe case when the error was detected at staging time but fetching via eth/xx was ok. * Clarify `break` meaning by using labelled `break` statements * Fix action how to commit when sync target has been reached why: The sync target block number might precede than latest FCU block number. This happens when the engine API squeezes in some request to execute and import subsequent blocks. This patch fixes and assert thrown when after reaching target the latest FCU block number is higher than the expected target block number. * Update TODO list
2024-11-01 19:18:41 +00:00
if ctx.db.level() == 0 and ctx.stash.len == 0:
Flare sync (#2627) * Cosmetics, small fixes, add stashed headers verifier * Remove direct `Era1` support why: Era1 is indirectly supported by using the import tool before syncing. * Clarify database persistent save function. why: Function relied on the last saved state block number which was wrong. It now relies on the tx-level. If it is 0, then data are saved directly. Otherwise the task that owns the tx will do it. * Extracted configuration constants into separate file * Enable single peer mode for debugging * Fix peer losing issue in multi-mode details: Running concurrent download peers was previously programmed as running a batch downloading and storing ~8k headers and then leaving the `async` function to be restarted by a scheduler. This was unfortunate because of occasionally occurring long waiting times for restart. While the time gap until restarting were typically observed a few millisecs, there were always a few outliers which well exceed several seconds. This seemed to let remote peers run into timeouts. * Prefix function names `unprocXxx()` and `stagedYyy()` by `headers` why: There will be other `unproc` and `staged` modules. * Remove cruft, update logging * Fix accounting issue details: When staging after fetching headers from the network, there was an off by 1 error occurring when the result was by one smaller than requested. Also, a whole range was mis-accounted when a peer was terminating connection immediately after responding. * Fix slow/error header accounting when fetching why: Originally set for detecting slow headers in a row, the counter was wrongly extended to general errors. * Ban peers for a while that respond with too few headers continuously why: Some peers only returned one header at a time. If these peers sit on a farm, they might collectively slow down the download process. * Update RPC beacon header updater why: Old function hook has slightly changed its meaning since it was used for snap sync. Also, the old hook is used by other functions already. * Limit number of peers or set to single peer mode details: Merge several concepts, single peer mode being one of it. * Some code clean up, fixings for removing of compiler warnings * De-noise header fetch related sources why: Header download looks relatively stable, so general debugging is not needed, anymore. This is the equivalent of removing the scaffold from the part of the building where work has completed. * More clean up and code prettification for headers stuff * Implement body fetch and block import details: Available headers are used stage blocks by combining existing headers with newly fetched blocks. Then these blocks are imported/executed via `persistBlocks()`. * Logger cosmetics and cleanup * Remove staged block queue debugging details: Feature still available, just not executed anymore * Docu, logging update * Update/simplify `runDaemon()` * Re-calibrate block body requests and soft config for import blocks batch why: * For fetching, larger fetch requests are mostly truncated anyway on MainNet. * For executing, smaller batch sizes reduce the memory needed for the price of longer execution times. * Update metrics counters * Docu update * Some fixes, formatting updates, etc. * Update `borrowed` type: uint -. uint64 also: Always convert to `uint64` rather than `uint` where appropriate
2024-09-27 15:07:42 +00:00
let number = ctx.db.getSavedStateBlockNumber()
ctx.db.persistent(number).isOkOr:
Beacon sync updates tbc (#2818) * Clear rejected sync target so that it would not be processed again * Use in-memory table to stash headers after FCU import has started why: After block imported has started, there is no way to save/stash block headers persistently. The FCU handlers always maintain a positive transaction level and in some instances the current transaction is flushed and re-opened. This patch fixes an exception thrown when a block header has gone missing. * When resuming sync, delete stale headers and state why: Deleting headers saves some persistent space that would get lost otherwise. Deleting the state after resuming prevents from race conditions. * On clean start hibernate sync `deamon` entity before first update from CL details: Only reduces services are running * accept FCU from CL * fetch finalised header after accepting FCY (provides hash only) * Improve text/meaning of some log messages * Revisit error handling for useless peers why: A peer is abandoned from if the error score is too high. This was not properly handled for some fringe case when the error was detected at staging time but fetching via eth/xx was ok. * Clarify `break` meaning by using labelled `break` statements * Fix action how to commit when sync target has been reached why: The sync target block number might precede than latest FCU block number. This happens when the engine API squeezes in some request to execute and import subsequent blocks. This patch fixes and assert thrown when after reaching target the latest FCU block number is higher than the expected target block number. * Update TODO list
2024-11-01 19:18:41 +00:00
raiseAssert info & " persistent() failed: " & $$error
Beacon sync updates tbc (#2818) * Clear rejected sync target so that it would not be processed again * Use in-memory table to stash headers after FCU import has started why: After block imported has started, there is no way to save/stash block headers persistently. The FCU handlers always maintain a positive transaction level and in some instances the current transaction is flushed and re-opened. This patch fixes an exception thrown when a block header has gone missing. * When resuming sync, delete stale headers and state why: Deleting headers saves some persistent space that would get lost otherwise. Deleting the state after resuming prevents from race conditions. * On clean start hibernate sync `deamon` entity before first update from CL details: Only reduces services are running * accept FCU from CL * fetch finalised header after accepting FCY (provides hash only) * Improve text/meaning of some log messages * Revisit error handling for useless peers why: A peer is abandoned from if the error score is too high. This was not properly handled for some fringe case when the error was detected at staging time but fetching via eth/xx was ok. * Clarify `break` meaning by using labelled `break` statements * Fix action how to commit when sync target has been reached why: The sync target block number might precede than latest FCU block number. This happens when the engine API squeezes in some request to execute and import subsequent blocks. This patch fixes and assert thrown when after reaching target the latest FCU block number is higher than the expected target block number. * Update TODO list
2024-11-01 19:18:41 +00:00
proc dbLoadSyncStateLayout*(ctx: BeaconCtxRef; info: static[string]): bool =
## Restore chain layout from persistent db. It returns `true` if a previous
## state could be loaded, and `false` if a new state was created.
let
rc = ctx.fetchSyncStateLayout()
latest = ctx.chain.latestNumber()
Beacon sync updates tbc (#2818) * Clear rejected sync target so that it would not be processed again * Use in-memory table to stash headers after FCU import has started why: After block imported has started, there is no way to save/stash block headers persistently. The FCU handlers always maintain a positive transaction level and in some instances the current transaction is flushed and re-opened. This patch fixes an exception thrown when a block header has gone missing. * When resuming sync, delete stale headers and state why: Deleting headers saves some persistent space that would get lost otherwise. Deleting the state after resuming prevents from race conditions. * On clean start hibernate sync `deamon` entity before first update from CL details: Only reduces services are running * accept FCU from CL * fetch finalised header after accepting FCY (provides hash only) * Improve text/meaning of some log messages * Revisit error handling for useless peers why: A peer is abandoned from if the error score is too high. This was not properly handled for some fringe case when the error was detected at staging time but fetching via eth/xx was ok. * Clarify `break` meaning by using labelled `break` statements * Fix action how to commit when sync target has been reached why: The sync target block number might precede than latest FCU block number. This happens when the engine API squeezes in some request to execute and import subsequent blocks. This patch fixes and assert thrown when after reaching target the latest FCU block number is higher than the expected target block number. * Update TODO list
2024-11-01 19:18:41 +00:00
# If there was a manual import after a previous sync, then saved state
# might be outdated.
Beacon sync activation control update (#2782) * Clarifying/commenting FCU setup condition & small fixes, comments etc. * Update some logging * Reorg metrics updater and activation * Better `async` responsiveness why: Block import does not allow `async` task activation while executing. So allow potential switch after each imported block (rather than a group of 32 blocks.) * Handle resuming after previous sync followed by import why: In this case the ledger state is more recent than the saved sync state. So this is considered a pristine sync where any previous sync state is forgotten. This fixes some assert thrown because of inconsistent internal state at some point. * Provide option for clearing saved beacon sync state before starting syncer why: It would resume with the last state otherwise which might be undesired sometimes. Without RPC available, the syncer typically stops and terminates with the canonical head larger than the base/finalised head. The latter one will be saved as database/ledger state and the canonical head as syncer target. Resuming syncing here will repeat itself. So clearing the syncer state can prevent from starting the syncer unnecessarily avoiding useless actions. * Allow workers to request syncer shutdown from within why: In one-trick-pony mode (after resuming without RPC support) the syncer can be stopped from within soavoiding unnecessary polling. In that case, the syncer can (theoretically) be restarted externally with `startSync()`. * Terminate beacon sync after a single run target is reached why: Stops doing useless polling (typically when there is no RPC available) * Remove crufty comments * Tighten state reload condition when resuming why: Some pathological case might apply if the syncer is stopped while the distance between finalised block and head is very large and the FCU base becomes larger than the locked finalised state. * Verify that finalised number from CL is at least FCU base number why: The FCU base number is determined by the database, non zero if manually imported. The finalised number is passed via RPC by the CL node and will increase over time. Unless fully synced, this number will be pretty low. On the other hand, the FCU call `forkChoice()` will eventually fail if the `finalizedHash` argument refers to something outside the internal chain starting at the FCU base block. * Remove support for completing interrupted sync without RPC support why: Simplifies start/stop logic * Rmove unused import
2024-10-28 16:22:04 +00:00
if rc.isOk and
Beacon sync updates tbc (#2818) * Clear rejected sync target so that it would not be processed again * Use in-memory table to stash headers after FCU import has started why: After block imported has started, there is no way to save/stash block headers persistently. The FCU handlers always maintain a positive transaction level and in some instances the current transaction is flushed and re-opened. This patch fixes an exception thrown when a block header has gone missing. * When resuming sync, delete stale headers and state why: Deleting headers saves some persistent space that would get lost otherwise. Deleting the state after resuming prevents from race conditions. * On clean start hibernate sync `deamon` entity before first update from CL details: Only reduces services are running * accept FCU from CL * fetch finalised header after accepting FCY (provides hash only) * Improve text/meaning of some log messages * Revisit error handling for useless peers why: A peer is abandoned from if the error score is too high. This was not properly handled for some fringe case when the error was detected at staging time but fetching via eth/xx was ok. * Clarify `break` meaning by using labelled `break` statements * Fix action how to commit when sync target has been reached why: The sync target block number might precede than latest FCU block number. This happens when the engine API squeezes in some request to execute and import subsequent blocks. This patch fixes and assert thrown when after reaching target the latest FCU block number is higher than the expected target block number. * Update TODO list
2024-11-01 19:18:41 +00:00
# The base number is the least record of the FCU chains/tree. So the
# finalised entry must not be smaller.
Beacon sync activation control update (#2782) * Clarifying/commenting FCU setup condition & small fixes, comments etc. * Update some logging * Reorg metrics updater and activation * Better `async` responsiveness why: Block import does not allow `async` task activation while executing. So allow potential switch after each imported block (rather than a group of 32 blocks.) * Handle resuming after previous sync followed by import why: In this case the ledger state is more recent than the saved sync state. So this is considered a pristine sync where any previous sync state is forgotten. This fixes some assert thrown because of inconsistent internal state at some point. * Provide option for clearing saved beacon sync state before starting syncer why: It would resume with the last state otherwise which might be undesired sometimes. Without RPC available, the syncer typically stops and terminates with the canonical head larger than the base/finalised head. The latter one will be saved as database/ledger state and the canonical head as syncer target. Resuming syncing here will repeat itself. So clearing the syncer state can prevent from starting the syncer unnecessarily avoiding useless actions. * Allow workers to request syncer shutdown from within why: In one-trick-pony mode (after resuming without RPC support) the syncer can be stopped from within soavoiding unnecessary polling. In that case, the syncer can (theoretically) be restarted externally with `startSync()`. * Terminate beacon sync after a single run target is reached why: Stops doing useless polling (typically when there is no RPC available) * Remove crufty comments * Tighten state reload condition when resuming why: Some pathological case might apply if the syncer is stopped while the distance between finalised block and head is very large and the FCU base becomes larger than the locked finalised state. * Verify that finalised number from CL is at least FCU base number why: The FCU base number is determined by the database, non zero if manually imported. The finalised number is passed via RPC by the CL node and will increase over time. Unless fully synced, this number will be pretty low. On the other hand, the FCU call `forkChoice()` will eventually fail if the `finalizedHash` argument refers to something outside the internal chain starting at the FCU base block. * Remove support for completing interrupted sync without RPC support why: Simplifies start/stop logic * Rmove unused import
2024-10-28 16:22:04 +00:00
ctx.chain.baseNumber() <= rc.value.final and
Beacon sync update multi exe heads aware (#2861) * Log/trace cancellation events in scheduler * Provide `clear()` functions for explicitly flushing data objects * Renaming header cache functions why: More systematic, all functions start with prefix `dbHeader` * Remove `danglingParent` from layout why: Already provided by header cache * Remove `couplerHash` and `headHash` from layout why: No need to cache, `headHash` is unused and `couplerHash` used typically once, only. * Remove `lastLayout` from sync descriptor why: No need to compare changes, saving is always triggered after actively changing the sync layout state * Early reject unsuitable head + finalised header from CL why: The finalised header is only passed by its hash so the header must be fetched somewhere, e.g. from a peer via eth/xx. Also, finalised headers earlier than the `base` from `FC` cannot be handled due to the `Aristo` single state database architecture. Luckily, on a full node, the complete block history is available so unsuitable finalised headers are stored there already which is exploited here to avoid unnecessary network traffic. * Code cosmetics, remove cruft, prettify logging, remove `final` metrics detail: The `final` layout parameter will be deprecated and later removed * Update/re-calibrate syncer logic documentation why: The current implementation sucks if the `FC` module changes the canonical branch in the middle of completing a header chain (due to concurrent updates by the `newPayload()` logic.) * Implement according to re-calibrated syncer docu details: The implementation employs the notion of named layout states (see `SyncLayoutState` in `worker_desc.nim`) which are derived from the state parameter triple `(C,D,H)` as described in `README.md`.
2024-11-21 16:32:47 +00:00
Beacon sync updates tbc (#2818) * Clear rejected sync target so that it would not be processed again * Use in-memory table to stash headers after FCU import has started why: After block imported has started, there is no way to save/stash block headers persistently. The FCU handlers always maintain a positive transaction level and in some instances the current transaction is flushed and re-opened. This patch fixes an exception thrown when a block header has gone missing. * When resuming sync, delete stale headers and state why: Deleting headers saves some persistent space that would get lost otherwise. Deleting the state after resuming prevents from race conditions. * On clean start hibernate sync `deamon` entity before first update from CL details: Only reduces services are running * accept FCU from CL * fetch finalised header after accepting FCY (provides hash only) * Improve text/meaning of some log messages * Revisit error handling for useless peers why: A peer is abandoned from if the error score is too high. This was not properly handled for some fringe case when the error was detected at staging time but fetching via eth/xx was ok. * Clarify `break` meaning by using labelled `break` statements * Fix action how to commit when sync target has been reached why: The sync target block number might precede than latest FCU block number. This happens when the engine API squeezes in some request to execute and import subsequent blocks. This patch fixes and assert thrown when after reaching target the latest FCU block number is higher than the expected target block number. * Update TODO list
2024-11-01 19:18:41 +00:00
# If the latest FCU number is not larger than the head, there is nothing
# to do (might also happen after a manual import.)
Beacon sync update multi exe heads aware (#2861) * Log/trace cancellation events in scheduler * Provide `clear()` functions for explicitly flushing data objects * Renaming header cache functions why: More systematic, all functions start with prefix `dbHeader` * Remove `danglingParent` from layout why: Already provided by header cache * Remove `couplerHash` and `headHash` from layout why: No need to cache, `headHash` is unused and `couplerHash` used typically once, only. * Remove `lastLayout` from sync descriptor why: No need to compare changes, saving is always triggered after actively changing the sync layout state * Early reject unsuitable head + finalised header from CL why: The finalised header is only passed by its hash so the header must be fetched somewhere, e.g. from a peer via eth/xx. Also, finalised headers earlier than the `base` from `FC` cannot be handled due to the `Aristo` single state database architecture. Luckily, on a full node, the complete block history is available so unsuitable finalised headers are stored there already which is exploited here to avoid unnecessary network traffic. * Code cosmetics, remove cruft, prettify logging, remove `final` metrics detail: The `final` layout parameter will be deprecated and later removed * Update/re-calibrate syncer logic documentation why: The current implementation sucks if the `FC` module changes the canonical branch in the middle of completing a header chain (due to concurrent updates by the `newPayload()` logic.) * Implement according to re-calibrated syncer docu details: The implementation employs the notion of named layout states (see `SyncLayoutState` in `worker_desc.nim`) which are derived from the state parameter triple `(C,D,H)` as described in `README.md`.
2024-11-21 16:32:47 +00:00
latest < rc.value.head and
# Can only resume a header download. Blocks need to be set up from scratch.
rc.value.lastState == collectingHeaders:
Beacon sync updates tbc (#2818) * Clear rejected sync target so that it would not be processed again * Use in-memory table to stash headers after FCU import has started why: After block imported has started, there is no way to save/stash block headers persistently. The FCU handlers always maintain a positive transaction level and in some instances the current transaction is flushed and re-opened. This patch fixes an exception thrown when a block header has gone missing. * When resuming sync, delete stale headers and state why: Deleting headers saves some persistent space that would get lost otherwise. Deleting the state after resuming prevents from race conditions. * On clean start hibernate sync `deamon` entity before first update from CL details: Only reduces services are running * accept FCU from CL * fetch finalised header after accepting FCY (provides hash only) * Improve text/meaning of some log messages * Revisit error handling for useless peers why: A peer is abandoned from if the error score is too high. This was not properly handled for some fringe case when the error was detected at staging time but fetching via eth/xx was ok. * Clarify `break` meaning by using labelled `break` statements * Fix action how to commit when sync target has been reached why: The sync target block number might precede than latest FCU block number. This happens when the engine API squeezes in some request to execute and import subsequent blocks. This patch fixes and assert thrown when after reaching target the latest FCU block number is higher than the expected target block number. * Update TODO list
2024-11-01 19:18:41 +00:00
# Assign saved sync state
ctx.sst.layout = rc.value
# Add interval of unprocessed header range `(C,D)` from `README.md`
ctx.headersUnprocSet(ctx.layout.coupler+1, ctx.layout.dangling-1)
trace info & ": restored syncer state", L=latest.bnStr,
C=ctx.layout.coupler.bnStr, D=ctx.layout.dangling.bnStr,
H=ctx.layout.head.bnStr
Beacon sync updates tbc (#2818) * Clear rejected sync target so that it would not be processed again * Use in-memory table to stash headers after FCU import has started why: After block imported has started, there is no way to save/stash block headers persistently. The FCU handlers always maintain a positive transaction level and in some instances the current transaction is flushed and re-opened. This patch fixes an exception thrown when a block header has gone missing. * When resuming sync, delete stale headers and state why: Deleting headers saves some persistent space that would get lost otherwise. Deleting the state after resuming prevents from race conditions. * On clean start hibernate sync `deamon` entity before first update from CL details: Only reduces services are running * accept FCU from CL * fetch finalised header after accepting FCY (provides hash only) * Improve text/meaning of some log messages * Revisit error handling for useless peers why: A peer is abandoned from if the error score is too high. This was not properly handled for some fringe case when the error was detected at staging time but fetching via eth/xx was ok. * Clarify `break` meaning by using labelled `break` statements * Fix action how to commit when sync target has been reached why: The sync target block number might precede than latest FCU block number. This happens when the engine API squeezes in some request to execute and import subsequent blocks. This patch fixes and assert thrown when after reaching target the latest FCU block number is higher than the expected target block number. * Update TODO list
2024-11-01 19:18:41 +00:00
true
else:
Beacon sync update multi exe heads aware (#2861) * Log/trace cancellation events in scheduler * Provide `clear()` functions for explicitly flushing data objects * Renaming header cache functions why: More systematic, all functions start with prefix `dbHeader` * Remove `danglingParent` from layout why: Already provided by header cache * Remove `couplerHash` and `headHash` from layout why: No need to cache, `headHash` is unused and `couplerHash` used typically once, only. * Remove `lastLayout` from sync descriptor why: No need to compare changes, saving is always triggered after actively changing the sync layout state * Early reject unsuitable head + finalised header from CL why: The finalised header is only passed by its hash so the header must be fetched somewhere, e.g. from a peer via eth/xx. Also, finalised headers earlier than the `base` from `FC` cannot be handled due to the `Aristo` single state database architecture. Luckily, on a full node, the complete block history is available so unsuitable finalised headers are stored there already which is exploited here to avoid unnecessary network traffic. * Code cosmetics, remove cruft, prettify logging, remove `final` metrics detail: The `final` layout parameter will be deprecated and later removed * Update/re-calibrate syncer logic documentation why: The current implementation sucks if the `FC` module changes the canonical branch in the middle of completing a header chain (due to concurrent updates by the `newPayload()` logic.) * Implement according to re-calibrated syncer docu details: The implementation employs the notion of named layout states (see `SyncLayoutState` in `worker_desc.nim`) which are derived from the state parameter triple `(C,D,H)` as described in `README.md`.
2024-11-21 16:32:47 +00:00
ctx.sst.layout = SyncStateLayout() # empty layout
Beacon sync updates tbc (#2818) * Clear rejected sync target so that it would not be processed again * Use in-memory table to stash headers after FCU import has started why: After block imported has started, there is no way to save/stash block headers persistently. The FCU handlers always maintain a positive transaction level and in some instances the current transaction is flushed and re-opened. This patch fixes an exception thrown when a block header has gone missing. * When resuming sync, delete stale headers and state why: Deleting headers saves some persistent space that would get lost otherwise. Deleting the state after resuming prevents from race conditions. * On clean start hibernate sync `deamon` entity before first update from CL details: Only reduces services are running * accept FCU from CL * fetch finalised header after accepting FCY (provides hash only) * Improve text/meaning of some log messages * Revisit error handling for useless peers why: A peer is abandoned from if the error score is too high. This was not properly handled for some fringe case when the error was detected at staging time but fetching via eth/xx was ok. * Clarify `break` meaning by using labelled `break` statements * Fix action how to commit when sync target has been reached why: The sync target block number might precede than latest FCU block number. This happens when the engine API squeezes in some request to execute and import subsequent blocks. This patch fixes and assert thrown when after reaching target the latest FCU block number is higher than the expected target block number. * Update TODO list
2024-11-01 19:18:41 +00:00
if rc.isOk:
# Some stored headers might have become stale, so delete them. Even
# though it is not critical, stale headers just stay on the database
# forever occupying space without purpose. Also, delete the state record.
# After deleting headers, the state record becomes stale as well.
if rc.value.head <= latest:
# After manual import, the `latest` state might be ahead of the old
# `head` which leaves a gap `(rc.value.head,latest)` of missing headers.
# So the `deleteStaleHeadersAndState()` clean up routine needs to start
# at the `head` and work backwards.
ctx.deleteStaleHeadersAndState(rc.value.head, info)
else:
Beacon sync update multi exe heads aware (#2861) * Log/trace cancellation events in scheduler * Provide `clear()` functions for explicitly flushing data objects * Renaming header cache functions why: More systematic, all functions start with prefix `dbHeader` * Remove `danglingParent` from layout why: Already provided by header cache * Remove `couplerHash` and `headHash` from layout why: No need to cache, `headHash` is unused and `couplerHash` used typically once, only. * Remove `lastLayout` from sync descriptor why: No need to compare changes, saving is always triggered after actively changing the sync layout state * Early reject unsuitable head + finalised header from CL why: The finalised header is only passed by its hash so the header must be fetched somewhere, e.g. from a peer via eth/xx. Also, finalised headers earlier than the `base` from `FC` cannot be handled due to the `Aristo` single state database architecture. Luckily, on a full node, the complete block history is available so unsuitable finalised headers are stored there already which is exploited here to avoid unnecessary network traffic. * Code cosmetics, remove cruft, prettify logging, remove `final` metrics detail: The `final` layout parameter will be deprecated and later removed * Update/re-calibrate syncer logic documentation why: The current implementation sucks if the `FC` module changes the canonical branch in the middle of completing a header chain (due to concurrent updates by the `newPayload()` logic.) * Implement according to re-calibrated syncer docu details: The implementation employs the notion of named layout states (see `SyncLayoutState` in `worker_desc.nim`) which are derived from the state parameter triple `(C,D,H)` as described in `README.md`.
2024-11-21 16:32:47 +00:00
# Delete stale headers with block numbers starting at to `latest` while
Beacon sync updates tbc (#2818) * Clear rejected sync target so that it would not be processed again * Use in-memory table to stash headers after FCU import has started why: After block imported has started, there is no way to save/stash block headers persistently. The FCU handlers always maintain a positive transaction level and in some instances the current transaction is flushed and re-opened. This patch fixes an exception thrown when a block header has gone missing. * When resuming sync, delete stale headers and state why: Deleting headers saves some persistent space that would get lost otherwise. Deleting the state after resuming prevents from race conditions. * On clean start hibernate sync `deamon` entity before first update from CL details: Only reduces services are running * accept FCU from CL * fetch finalised header after accepting FCY (provides hash only) * Improve text/meaning of some log messages * Revisit error handling for useless peers why: A peer is abandoned from if the error score is too high. This was not properly handled for some fringe case when the error was detected at staging time but fetching via eth/xx was ok. * Clarify `break` meaning by using labelled `break` statements * Fix action how to commit when sync target has been reached why: The sync target block number might precede than latest FCU block number. This happens when the engine API squeezes in some request to execute and import subsequent blocks. This patch fixes and assert thrown when after reaching target the latest FCU block number is higher than the expected target block number. * Update TODO list
2024-11-01 19:18:41 +00:00
# working backwards.
ctx.deleteStaleHeadersAndState(latest, info)
false
# ------------------
Beacon sync update multi exe heads aware (#2861) * Log/trace cancellation events in scheduler * Provide `clear()` functions for explicitly flushing data objects * Renaming header cache functions why: More systematic, all functions start with prefix `dbHeader` * Remove `danglingParent` from layout why: Already provided by header cache * Remove `couplerHash` and `headHash` from layout why: No need to cache, `headHash` is unused and `couplerHash` used typically once, only. * Remove `lastLayout` from sync descriptor why: No need to compare changes, saving is always triggered after actively changing the sync layout state * Early reject unsuitable head + finalised header from CL why: The finalised header is only passed by its hash so the header must be fetched somewhere, e.g. from a peer via eth/xx. Also, finalised headers earlier than the `base` from `FC` cannot be handled due to the `Aristo` single state database architecture. Luckily, on a full node, the complete block history is available so unsuitable finalised headers are stored there already which is exploited here to avoid unnecessary network traffic. * Code cosmetics, remove cruft, prettify logging, remove `final` metrics detail: The `final` layout parameter will be deprecated and later removed * Update/re-calibrate syncer logic documentation why: The current implementation sucks if the `FC` module changes the canonical branch in the middle of completing a header chain (due to concurrent updates by the `newPayload()` logic.) * Implement according to re-calibrated syncer docu details: The implementation employs the notion of named layout states (see `SyncLayoutState` in `worker_desc.nim`) which are derived from the state parameter triple `(C,D,H)` as described in `README.md`.
2024-11-21 16:32:47 +00:00
proc dbHeadersClear*(ctx: BeaconCtxRef) =
## Clear stashed in-memory headers
ctx.stash.clear
proc dbHeadersStash*(
ctx: BeaconCtxRef;
first: BlockNumber;
revBlobs: openArray[seq[byte]];
Beacon sync activation control update (#2782) * Clarifying/commenting FCU setup condition & small fixes, comments etc. * Update some logging * Reorg metrics updater and activation * Better `async` responsiveness why: Block import does not allow `async` task activation while executing. So allow potential switch after each imported block (rather than a group of 32 blocks.) * Handle resuming after previous sync followed by import why: In this case the ledger state is more recent than the saved sync state. So this is considered a pristine sync where any previous sync state is forgotten. This fixes some assert thrown because of inconsistent internal state at some point. * Provide option for clearing saved beacon sync state before starting syncer why: It would resume with the last state otherwise which might be undesired sometimes. Without RPC available, the syncer typically stops and terminates with the canonical head larger than the base/finalised head. The latter one will be saved as database/ledger state and the canonical head as syncer target. Resuming syncing here will repeat itself. So clearing the syncer state can prevent from starting the syncer unnecessarily avoiding useless actions. * Allow workers to request syncer shutdown from within why: In one-trick-pony mode (after resuming without RPC support) the syncer can be stopped from within soavoiding unnecessary polling. In that case, the syncer can (theoretically) be restarted externally with `startSync()`. * Terminate beacon sync after a single run target is reached why: Stops doing useless polling (typically when there is no RPC available) * Remove crufty comments * Tighten state reload condition when resuming why: Some pathological case might apply if the syncer is stopped while the distance between finalised block and head is very large and the FCU base becomes larger than the locked finalised state. * Verify that finalised number from CL is at least FCU base number why: The FCU base number is determined by the database, non zero if manually imported. The finalised number is passed via RPC by the CL node and will increase over time. Unless fully synced, this number will be pretty low. On the other hand, the FCU call `forkChoice()` will eventually fail if the `finalizedHash` argument refers to something outside the internal chain starting at the FCU base block. * Remove support for completing interrupted sync without RPC support why: Simplifies start/stop logic * Rmove unused import
2024-10-28 16:22:04 +00:00
info: static[string];
) =
## Temporarily store header chain to persistent db (oblivious of the chain
Flare sync (#2627) * Cosmetics, small fixes, add stashed headers verifier * Remove direct `Era1` support why: Era1 is indirectly supported by using the import tool before syncing. * Clarify database persistent save function. why: Function relied on the last saved state block number which was wrong. It now relies on the tx-level. If it is 0, then data are saved directly. Otherwise the task that owns the tx will do it. * Extracted configuration constants into separate file * Enable single peer mode for debugging * Fix peer losing issue in multi-mode details: Running concurrent download peers was previously programmed as running a batch downloading and storing ~8k headers and then leaving the `async` function to be restarted by a scheduler. This was unfortunate because of occasionally occurring long waiting times for restart. While the time gap until restarting were typically observed a few millisecs, there were always a few outliers which well exceed several seconds. This seemed to let remote peers run into timeouts. * Prefix function names `unprocXxx()` and `stagedYyy()` by `headers` why: There will be other `unproc` and `staged` modules. * Remove cruft, update logging * Fix accounting issue details: When staging after fetching headers from the network, there was an off by 1 error occurring when the result was by one smaller than requested. Also, a whole range was mis-accounted when a peer was terminating connection immediately after responding. * Fix slow/error header accounting when fetching why: Originally set for detecting slow headers in a row, the counter was wrongly extended to general errors. * Ban peers for a while that respond with too few headers continuously why: Some peers only returned one header at a time. If these peers sit on a farm, they might collectively slow down the download process. * Update RPC beacon header updater why: Old function hook has slightly changed its meaning since it was used for snap sync. Also, the old hook is used by other functions already. * Limit number of peers or set to single peer mode details: Merge several concepts, single peer mode being one of it. * Some code clean up, fixings for removing of compiler warnings * De-noise header fetch related sources why: Header download looks relatively stable, so general debugging is not needed, anymore. This is the equivalent of removing the scaffold from the part of the building where work has completed. * More clean up and code prettification for headers stuff * Implement body fetch and block import details: Available headers are used stage blocks by combining existing headers with newly fetched blocks. Then these blocks are imported/executed via `persistBlocks()`. * Logger cosmetics and cleanup * Remove staged block queue debugging details: Feature still available, just not executed anymore * Docu, logging update * Update/simplify `runDaemon()` * Re-calibrate block body requests and soft config for import blocks batch why: * For fetching, larger fetch requests are mostly truncated anyway on MainNet. * For executing, smaller batch sizes reduce the memory needed for the price of longer execution times. * Update metrics counters * Docu update * Some fixes, formatting updates, etc. * Update `borrowed` type: uint -. uint64 also: Always convert to `uint64` rather than `uint` where appropriate
2024-09-27 15:07:42 +00:00
## layout.) The headers should not be stashed if they are imepreted and
## executed on the database, already.
##
## The `revBlobs[]` arguments are passed in reverse order so that block
## numbers apply as
## ::
## #first -- revBlobs[^1]
## #(first+1) -- revBlobs[^2]
## ..
##
let
Beacon sync updates tbc (#2818) * Clear rejected sync target so that it would not be processed again * Use in-memory table to stash headers after FCU import has started why: After block imported has started, there is no way to save/stash block headers persistently. The FCU handlers always maintain a positive transaction level and in some instances the current transaction is flushed and re-opened. This patch fixes an exception thrown when a block header has gone missing. * When resuming sync, delete stale headers and state why: Deleting headers saves some persistent space that would get lost otherwise. Deleting the state after resuming prevents from race conditions. * On clean start hibernate sync `deamon` entity before first update from CL details: Only reduces services are running * accept FCU from CL * fetch finalised header after accepting FCY (provides hash only) * Improve text/meaning of some log messages * Revisit error handling for useless peers why: A peer is abandoned from if the error score is too high. This was not properly handled for some fringe case when the error was detected at staging time but fetching via eth/xx was ok. * Clarify `break` meaning by using labelled `break` statements * Fix action how to commit when sync target has been reached why: The sync target block number might precede than latest FCU block number. This happens when the engine API squeezes in some request to execute and import subsequent blocks. This patch fixes and assert thrown when after reaching target the latest FCU block number is higher than the expected target block number. * Update TODO list
2024-11-01 19:18:41 +00:00
txLevel = ctx.db.level()
Flare sync (#2627) * Cosmetics, small fixes, add stashed headers verifier * Remove direct `Era1` support why: Era1 is indirectly supported by using the import tool before syncing. * Clarify database persistent save function. why: Function relied on the last saved state block number which was wrong. It now relies on the tx-level. If it is 0, then data are saved directly. Otherwise the task that owns the tx will do it. * Extracted configuration constants into separate file * Enable single peer mode for debugging * Fix peer losing issue in multi-mode details: Running concurrent download peers was previously programmed as running a batch downloading and storing ~8k headers and then leaving the `async` function to be restarted by a scheduler. This was unfortunate because of occasionally occurring long waiting times for restart. While the time gap until restarting were typically observed a few millisecs, there were always a few outliers which well exceed several seconds. This seemed to let remote peers run into timeouts. * Prefix function names `unprocXxx()` and `stagedYyy()` by `headers` why: There will be other `unproc` and `staged` modules. * Remove cruft, update logging * Fix accounting issue details: When staging after fetching headers from the network, there was an off by 1 error occurring when the result was by one smaller than requested. Also, a whole range was mis-accounted when a peer was terminating connection immediately after responding. * Fix slow/error header accounting when fetching why: Originally set for detecting slow headers in a row, the counter was wrongly extended to general errors. * Ban peers for a while that respond with too few headers continuously why: Some peers only returned one header at a time. If these peers sit on a farm, they might collectively slow down the download process. * Update RPC beacon header updater why: Old function hook has slightly changed its meaning since it was used for snap sync. Also, the old hook is used by other functions already. * Limit number of peers or set to single peer mode details: Merge several concepts, single peer mode being one of it. * Some code clean up, fixings for removing of compiler warnings * De-noise header fetch related sources why: Header download looks relatively stable, so general debugging is not needed, anymore. This is the equivalent of removing the scaffold from the part of the building where work has completed. * More clean up and code prettification for headers stuff * Implement body fetch and block import details: Available headers are used stage blocks by combining existing headers with newly fetched blocks. Then these blocks are imported/executed via `persistBlocks()`. * Logger cosmetics and cleanup * Remove staged block queue debugging details: Feature still available, just not executed anymore * Docu, logging update * Update/simplify `runDaemon()` * Re-calibrate block body requests and soft config for import blocks batch why: * For fetching, larger fetch requests are mostly truncated anyway on MainNet. * For executing, smaller batch sizes reduce the memory needed for the price of longer execution times. * Update metrics counters * Docu update * Some fixes, formatting updates, etc. * Update `borrowed` type: uint -. uint64 also: Always convert to `uint64` rather than `uint` where appropriate
2024-09-27 15:07:42 +00:00
last = first + revBlobs.len.uint64 - 1
Beacon sync updates tbc (#2818) * Clear rejected sync target so that it would not be processed again * Use in-memory table to stash headers after FCU import has started why: After block imported has started, there is no way to save/stash block headers persistently. The FCU handlers always maintain a positive transaction level and in some instances the current transaction is flushed and re-opened. This patch fixes an exception thrown when a block header has gone missing. * When resuming sync, delete stale headers and state why: Deleting headers saves some persistent space that would get lost otherwise. Deleting the state after resuming prevents from race conditions. * On clean start hibernate sync `deamon` entity before first update from CL details: Only reduces services are running * accept FCU from CL * fetch finalised header after accepting FCY (provides hash only) * Improve text/meaning of some log messages * Revisit error handling for useless peers why: A peer is abandoned from if the error score is too high. This was not properly handled for some fringe case when the error was detected at staging time but fetching via eth/xx was ok. * Clarify `break` meaning by using labelled `break` statements * Fix action how to commit when sync target has been reached why: The sync target block number might precede than latest FCU block number. This happens when the engine API squeezes in some request to execute and import subsequent blocks. This patch fixes and assert thrown when after reaching target the latest FCU block number is higher than the expected target block number. * Update TODO list
2024-11-01 19:18:41 +00:00
if 0 < txLevel:
# Need to cache it because FCU has blocked writing through to disk.
for n,data in revBlobs:
ctx.stash[last - n.uint64] = data
else:
let kvt = ctx.db.ctx.getKvt()
for n,data in revBlobs:
let key = beaconHeaderKey(last - n.uint64)
kvt.put(key.toOpenArray, data).isOkOr:
raiseAssert info & ": put() failed: " & $$error
Beacon sync update multi exe heads aware (#2861) * Log/trace cancellation events in scheduler * Provide `clear()` functions for explicitly flushing data objects * Renaming header cache functions why: More systematic, all functions start with prefix `dbHeader` * Remove `danglingParent` from layout why: Already provided by header cache * Remove `couplerHash` and `headHash` from layout why: No need to cache, `headHash` is unused and `couplerHash` used typically once, only. * Remove `lastLayout` from sync descriptor why: No need to compare changes, saving is always triggered after actively changing the sync layout state * Early reject unsuitable head + finalised header from CL why: The finalised header is only passed by its hash so the header must be fetched somewhere, e.g. from a peer via eth/xx. Also, finalised headers earlier than the `base` from `FC` cannot be handled due to the `Aristo` single state database architecture. Luckily, on a full node, the complete block history is available so unsuitable finalised headers are stored there already which is exploited here to avoid unnecessary network traffic. * Code cosmetics, remove cruft, prettify logging, remove `final` metrics detail: The `final` layout parameter will be deprecated and later removed * Update/re-calibrate syncer logic documentation why: The current implementation sucks if the `FC` module changes the canonical branch in the middle of completing a header chain (due to concurrent updates by the `newPayload()` logic.) * Implement according to re-calibrated syncer docu details: The implementation employs the notion of named layout states (see `SyncLayoutState` in `worker_desc.nim`) which are derived from the state parameter triple `(C,D,H)` as described in `README.md`.
2024-11-21 16:32:47 +00:00
proc dbHeaderPeek*(ctx: BeaconCtxRef; num: BlockNumber): Opt[Header] =
## Retrieve some stashed header.
Beacon sync updates tbc (#2818) * Clear rejected sync target so that it would not be processed again * Use in-memory table to stash headers after FCU import has started why: After block imported has started, there is no way to save/stash block headers persistently. The FCU handlers always maintain a positive transaction level and in some instances the current transaction is flushed and re-opened. This patch fixes an exception thrown when a block header has gone missing. * When resuming sync, delete stale headers and state why: Deleting headers saves some persistent space that would get lost otherwise. Deleting the state after resuming prevents from race conditions. * On clean start hibernate sync `deamon` entity before first update from CL details: Only reduces services are running * accept FCU from CL * fetch finalised header after accepting FCY (provides hash only) * Improve text/meaning of some log messages * Revisit error handling for useless peers why: A peer is abandoned from if the error score is too high. This was not properly handled for some fringe case when the error was detected at staging time but fetching via eth/xx was ok. * Clarify `break` meaning by using labelled `break` statements * Fix action how to commit when sync target has been reached why: The sync target block number might precede than latest FCU block number. This happens when the engine API squeezes in some request to execute and import subsequent blocks. This patch fixes and assert thrown when after reaching target the latest FCU block number is higher than the expected target block number. * Update TODO list
2024-11-01 19:18:41 +00:00
# Try cache first
ctx.stash.withValue(num, val):
try:
return ok(rlp.decode(val[], Header))
except RlpError:
discard
# Use persistent storage next
let
key = beaconHeaderKey(num)
rc = ctx.db.ctx.getKvt().get(key.toOpenArray)
if rc.isOk:
try:
return ok(rlp.decode(rc.value, Header))
except RlpError:
discard
err()
Beacon sync update multi exe heads aware (#2861) * Log/trace cancellation events in scheduler * Provide `clear()` functions for explicitly flushing data objects * Renaming header cache functions why: More systematic, all functions start with prefix `dbHeader` * Remove `danglingParent` from layout why: Already provided by header cache * Remove `couplerHash` and `headHash` from layout why: No need to cache, `headHash` is unused and `couplerHash` used typically once, only. * Remove `lastLayout` from sync descriptor why: No need to compare changes, saving is always triggered after actively changing the sync layout state * Early reject unsuitable head + finalised header from CL why: The finalised header is only passed by its hash so the header must be fetched somewhere, e.g. from a peer via eth/xx. Also, finalised headers earlier than the `base` from `FC` cannot be handled due to the `Aristo` single state database architecture. Luckily, on a full node, the complete block history is available so unsuitable finalised headers are stored there already which is exploited here to avoid unnecessary network traffic. * Code cosmetics, remove cruft, prettify logging, remove `final` metrics detail: The `final` layout parameter will be deprecated and later removed * Update/re-calibrate syncer logic documentation why: The current implementation sucks if the `FC` module changes the canonical branch in the middle of completing a header chain (due to concurrent updates by the `newPayload()` logic.) * Implement according to re-calibrated syncer docu details: The implementation employs the notion of named layout states (see `SyncLayoutState` in `worker_desc.nim`) which are derived from the state parameter triple `(C,D,H)` as described in `README.md`.
2024-11-21 16:32:47 +00:00
proc dbHeaderParentHash*(ctx: BeaconCtxRef; num: BlockNumber): Opt[Hash32] =
## Retrieve some stashed parent hash.
Beacon sync update multi exe heads aware (#2861) * Log/trace cancellation events in scheduler * Provide `clear()` functions for explicitly flushing data objects * Renaming header cache functions why: More systematic, all functions start with prefix `dbHeader` * Remove `danglingParent` from layout why: Already provided by header cache * Remove `couplerHash` and `headHash` from layout why: No need to cache, `headHash` is unused and `couplerHash` used typically once, only. * Remove `lastLayout` from sync descriptor why: No need to compare changes, saving is always triggered after actively changing the sync layout state * Early reject unsuitable head + finalised header from CL why: The finalised header is only passed by its hash so the header must be fetched somewhere, e.g. from a peer via eth/xx. Also, finalised headers earlier than the `base` from `FC` cannot be handled due to the `Aristo` single state database architecture. Luckily, on a full node, the complete block history is available so unsuitable finalised headers are stored there already which is exploited here to avoid unnecessary network traffic. * Code cosmetics, remove cruft, prettify logging, remove `final` metrics detail: The `final` layout parameter will be deprecated and later removed * Update/re-calibrate syncer logic documentation why: The current implementation sucks if the `FC` module changes the canonical branch in the middle of completing a header chain (due to concurrent updates by the `newPayload()` logic.) * Implement according to re-calibrated syncer docu details: The implementation employs the notion of named layout states (see `SyncLayoutState` in `worker_desc.nim`) which are derived from the state parameter triple `(C,D,H)` as described in `README.md`.
2024-11-21 16:32:47 +00:00
ok (? ctx.dbHeaderPeek num).parentHash
Beacon sync update multi exe heads aware (#2861) * Log/trace cancellation events in scheduler * Provide `clear()` functions for explicitly flushing data objects * Renaming header cache functions why: More systematic, all functions start with prefix `dbHeader` * Remove `danglingParent` from layout why: Already provided by header cache * Remove `couplerHash` and `headHash` from layout why: No need to cache, `headHash` is unused and `couplerHash` used typically once, only. * Remove `lastLayout` from sync descriptor why: No need to compare changes, saving is always triggered after actively changing the sync layout state * Early reject unsuitable head + finalised header from CL why: The finalised header is only passed by its hash so the header must be fetched somewhere, e.g. from a peer via eth/xx. Also, finalised headers earlier than the `base` from `FC` cannot be handled due to the `Aristo` single state database architecture. Luckily, on a full node, the complete block history is available so unsuitable finalised headers are stored there already which is exploited here to avoid unnecessary network traffic. * Code cosmetics, remove cruft, prettify logging, remove `final` metrics detail: The `final` layout parameter will be deprecated and later removed * Update/re-calibrate syncer logic documentation why: The current implementation sucks if the `FC` module changes the canonical branch in the middle of completing a header chain (due to concurrent updates by the `newPayload()` logic.) * Implement according to re-calibrated syncer docu details: The implementation employs the notion of named layout states (see `SyncLayoutState` in `worker_desc.nim`) which are derived from the state parameter triple `(C,D,H)` as described in `README.md`.
2024-11-21 16:32:47 +00:00
proc dbHeaderUnstash*(ctx: BeaconCtxRef; bn: BlockNumber) =
Flare sync (#2627) * Cosmetics, small fixes, add stashed headers verifier * Remove direct `Era1` support why: Era1 is indirectly supported by using the import tool before syncing. * Clarify database persistent save function. why: Function relied on the last saved state block number which was wrong. It now relies on the tx-level. If it is 0, then data are saved directly. Otherwise the task that owns the tx will do it. * Extracted configuration constants into separate file * Enable single peer mode for debugging * Fix peer losing issue in multi-mode details: Running concurrent download peers was previously programmed as running a batch downloading and storing ~8k headers and then leaving the `async` function to be restarted by a scheduler. This was unfortunate because of occasionally occurring long waiting times for restart. While the time gap until restarting were typically observed a few millisecs, there were always a few outliers which well exceed several seconds. This seemed to let remote peers run into timeouts. * Prefix function names `unprocXxx()` and `stagedYyy()` by `headers` why: There will be other `unproc` and `staged` modules. * Remove cruft, update logging * Fix accounting issue details: When staging after fetching headers from the network, there was an off by 1 error occurring when the result was by one smaller than requested. Also, a whole range was mis-accounted when a peer was terminating connection immediately after responding. * Fix slow/error header accounting when fetching why: Originally set for detecting slow headers in a row, the counter was wrongly extended to general errors. * Ban peers for a while that respond with too few headers continuously why: Some peers only returned one header at a time. If these peers sit on a farm, they might collectively slow down the download process. * Update RPC beacon header updater why: Old function hook has slightly changed its meaning since it was used for snap sync. Also, the old hook is used by other functions already. * Limit number of peers or set to single peer mode details: Merge several concepts, single peer mode being one of it. * Some code clean up, fixings for removing of compiler warnings * De-noise header fetch related sources why: Header download looks relatively stable, so general debugging is not needed, anymore. This is the equivalent of removing the scaffold from the part of the building where work has completed. * More clean up and code prettification for headers stuff * Implement body fetch and block import details: Available headers are used stage blocks by combining existing headers with newly fetched blocks. Then these blocks are imported/executed via `persistBlocks()`. * Logger cosmetics and cleanup * Remove staged block queue debugging details: Feature still available, just not executed anymore * Docu, logging update * Update/simplify `runDaemon()` * Re-calibrate block body requests and soft config for import blocks batch why: * For fetching, larger fetch requests are mostly truncated anyway on MainNet. * For executing, smaller batch sizes reduce the memory needed for the price of longer execution times. * Update metrics counters * Docu update * Some fixes, formatting updates, etc. * Update `borrowed` type: uint -. uint64 also: Always convert to `uint64` rather than `uint` where appropriate
2024-09-27 15:07:42 +00:00
## Remove header from temporary DB list
ctx.stash.withValue(bn, _):
Beacon sync updates tbc (#2818) * Clear rejected sync target so that it would not be processed again * Use in-memory table to stash headers after FCU import has started why: After block imported has started, there is no way to save/stash block headers persistently. The FCU handlers always maintain a positive transaction level and in some instances the current transaction is flushed and re-opened. This patch fixes an exception thrown when a block header has gone missing. * When resuming sync, delete stale headers and state why: Deleting headers saves some persistent space that would get lost otherwise. Deleting the state after resuming prevents from race conditions. * On clean start hibernate sync `deamon` entity before first update from CL details: Only reduces services are running * accept FCU from CL * fetch finalised header after accepting FCY (provides hash only) * Improve text/meaning of some log messages * Revisit error handling for useless peers why: A peer is abandoned from if the error score is too high. This was not properly handled for some fringe case when the error was detected at staging time but fetching via eth/xx was ok. * Clarify `break` meaning by using labelled `break` statements * Fix action how to commit when sync target has been reached why: The sync target block number might precede than latest FCU block number. This happens when the engine API squeezes in some request to execute and import subsequent blocks. This patch fixes and assert thrown when after reaching target the latest FCU block number is higher than the expected target block number. * Update TODO list
2024-11-01 19:18:41 +00:00
ctx.stash.del bn
return
discard ctx.db.ctx.getKvt().del(beaconHeaderKey(bn).toOpenArray)
Flare sync (#2627) * Cosmetics, small fixes, add stashed headers verifier * Remove direct `Era1` support why: Era1 is indirectly supported by using the import tool before syncing. * Clarify database persistent save function. why: Function relied on the last saved state block number which was wrong. It now relies on the tx-level. If it is 0, then data are saved directly. Otherwise the task that owns the tx will do it. * Extracted configuration constants into separate file * Enable single peer mode for debugging * Fix peer losing issue in multi-mode details: Running concurrent download peers was previously programmed as running a batch downloading and storing ~8k headers and then leaving the `async` function to be restarted by a scheduler. This was unfortunate because of occasionally occurring long waiting times for restart. While the time gap until restarting were typically observed a few millisecs, there were always a few outliers which well exceed several seconds. This seemed to let remote peers run into timeouts. * Prefix function names `unprocXxx()` and `stagedYyy()` by `headers` why: There will be other `unproc` and `staged` modules. * Remove cruft, update logging * Fix accounting issue details: When staging after fetching headers from the network, there was an off by 1 error occurring when the result was by one smaller than requested. Also, a whole range was mis-accounted when a peer was terminating connection immediately after responding. * Fix slow/error header accounting when fetching why: Originally set for detecting slow headers in a row, the counter was wrongly extended to general errors. * Ban peers for a while that respond with too few headers continuously why: Some peers only returned one header at a time. If these peers sit on a farm, they might collectively slow down the download process. * Update RPC beacon header updater why: Old function hook has slightly changed its meaning since it was used for snap sync. Also, the old hook is used by other functions already. * Limit number of peers or set to single peer mode details: Merge several concepts, single peer mode being one of it. * Some code clean up, fixings for removing of compiler warnings * De-noise header fetch related sources why: Header download looks relatively stable, so general debugging is not needed, anymore. This is the equivalent of removing the scaffold from the part of the building where work has completed. * More clean up and code prettification for headers stuff * Implement body fetch and block import details: Available headers are used stage blocks by combining existing headers with newly fetched blocks. Then these blocks are imported/executed via `persistBlocks()`. * Logger cosmetics and cleanup * Remove staged block queue debugging details: Feature still available, just not executed anymore * Docu, logging update * Update/simplify `runDaemon()` * Re-calibrate block body requests and soft config for import blocks batch why: * For fetching, larger fetch requests are mostly truncated anyway on MainNet. * For executing, smaller batch sizes reduce the memory needed for the price of longer execution times. * Update metrics counters * Docu update * Some fixes, formatting updates, etc. * Update `borrowed` type: uint -. uint64 also: Always convert to `uint64` rather than `uint` where appropriate
2024-09-27 15:07:42 +00:00
# ------------------------------------------------------------------------------
# End
# ------------------------------------------------------------------------------