10 Commits

Author SHA1 Message Date
Jordan Hrycaj
8bc8094413
Beacon sync fix stalling while block processing (#3037)
* Correct docu

why:
  `T` is mentioned on the metrics table but not explained

* Update state sync ticker

why:
  Print last named state for debugging unexpected states.

* Rename `nec_consensus_head` => `nec_sync_consensus_head`

why:
  This variable is syncer local, derived from what would vaguely be
  the consensus head. In fact, at some point it is the consensus head
  but often it will keep that value while the consensus head advances.

* Handle legit system state when block processing is cancelled

why:
  This state context was previously missing. It happens with problematic
  blocks (i.e. corrupt or missing.) Rather than trying to remedy the batch
  queue, all will be cancelled and the batch queue rebuilt from scratch.

* Update block queue with unexpectedly missing blocks

why:
  Concurrently serving `RPC` requests might cause a reset of the `FC`
  module data area. This in turn might produce a gap between expected `FC`
  module top and the beginning of the already downloaded blocks list.

  Previously this led to a deadlock situation because the missing blocks
  were neither downloaded by the syncer nor installed into the `FC` module
  via `RPC`.

* Fix copyright year
2025-01-29 11:30:40 +00:00
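
The gap condition addressed in #3037 can be illustrated with a minimal Python sketch. It is not the Nim implementation: `missing_range`, `fc_top` and `first_downloaded` are hypothetical names standing for the `FC` module top and the first entry of the already downloaded blocks list.

```python
from typing import Optional, Tuple


def missing_range(fc_top: int, first_downloaded: int) -> Optional[Tuple[int, int]]:
    """Return the inclusive range of block numbers that must be fetched
    again, or None if the downloaded list continues right after `fc_top`.

    Both arguments are assumptions made for this sketch only.
    """
    if first_downloaded <= fc_top + 1:
        # No gap: the downloaded blocks connect to (or overlap) the FC top.
        return None
    # Gap: these blocks are neither in FC nor in the downloaded list.
    return (fc_top + 1, first_downloaded - 1)
```

If a reset moves the `FC` top back from 100 to 90 while the downloaded list still starts at 101, the sketch reports blocks 91 to 100 as missing, so they can be queued for download rather than waited for indefinitely.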
Jordan Hrycaj
184af027dc
Beacon sync metrics management update (#3016)
* Sync scheduler provides an independent `ticker` loop process

why:
  Can be used to update `metrics` and for debug logging. While an event
  driven solution would stall if there are no events at the moment (e.g.
  when the syncer hibernates), the `ticker` will run regardless.

* Likewise use the `runTicker()` loop interface for updating the ticker

why:
  Not event driven anymore so it will not stall when the syncer
  hibernates.

* Re-implement logging ticker by running it within the `runTicker()` driver

why:
  Simplifies implementation

* Re-name metrics variable to better fit into the current naming schemes

* Fix copyright header
2025-01-22 10:12:50 +00:00
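
The motivation for the independent `ticker` loop in #3016 can be sketched with Python's `asyncio`. This is an illustration under stated assumptions, not the scheduler's code: `run_ticker` is a hypothetical stand-in and does not mirror the signature of `runTicker()`, and the intervals are arbitrary.

```python
import asyncio
from typing import Callable


async def run_ticker(interval: float, on_tick: Callable[[], None],
                     stop: asyncio.Event) -> None:
    """Call `on_tick` periodically until `stop` is set, independent of
    whatever the other tasks are (not) doing."""
    while not stop.is_set():
        on_tick()
        try:
            # Sleep for `interval`, but wake up early when asked to stop.
            await asyncio.wait_for(stop.wait(), timeout=interval)
        except asyncio.TimeoutError:
            pass


async def demo() -> int:
    ticks = []
    stop = asyncio.Event()
    no_events = asyncio.Event()  # never set: models a hibernating syncer

    ticker = asyncio.create_task(
        run_ticker(0.01, lambda: ticks.append(1), stop))
    worker = asyncio.create_task(no_events.wait())  # stalls forever

    await asyncio.sleep(0.1)
    stop.set()
    await ticker
    worker.cancel()
    await asyncio.gather(worker, return_exceptions=True)
    return len(ticks)
```

Running `asyncio.run(demo())` returns the number of ticks recorded while the worker task was stalled, which shows that a metrics or logging update driven this way does not depend on sync events arriving.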
Jordan Hrycaj
3aae33d6cc
Beacon sync maintenance update (#3012)
* Force metrics update when peers vanish

why:
  After that there might be reduced activity so that the next metrics
  update is delayed.

* Update comments (code cosmetics)

* Tidy up nano-sleep wait directives to an `update.nim`-function

* Fix copyright year
2025-01-20 10:30:04 +00:00
Jordan Hrycaj
0ce5234231
Beacon sync mitigate deadlock with bogus sticky peers (#2943)
* Metrics cosmetics

* Better naming for error threshold constants

* Treat header/body processing errors differently from response errors

why:
  Error handling does not become active until some consecutive failures
  appear. As both types of errors may interleave (e.g. no response
  errors), the counter reset for one type might affect the other.

  If this is handled wrongly, a peer might repeatedly send a bogus block,
  locking the syncer in an endless loop.
2024-12-16 16:26:38 +00:00
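
The interleaving problem described in #2943 can be reproduced with a small Python sketch. The threshold `ERROR_LIMIT`, the event tuples and both function names are hypothetical and only serve the illustration; they are not taken from the syncer source.

```python
from typing import Iterable, Tuple

ERROR_LIMIT = 3  # hypothetical number of consecutive failures tolerated

Event = Tuple[str, bool]  # (kind, ok) with kind "response" or "process"


def shared_counter_trips(events: Iterable[Event]) -> bool:
    """One counter for both error types: any success resets it."""
    count = 0
    for _kind, ok in events:
        count = 0 if ok else count + 1
        if count >= ERROR_LIMIT:
            return True
    return False


def separate_counters_trip(events: Iterable[Event]) -> bool:
    """One counter per error type: a success only resets its own type."""
    counts = {"response": 0, "process": 0}
    for kind, ok in events:
        counts[kind] = 0 if ok else counts[kind] + 1
        if counts[kind] >= ERROR_LIMIT:
            return True
    return False


# A peer that always responds, but whose block always fails processing.
bogus_peer = [("response", True), ("process", False)] * 5
```

With `bogus_peer`, the shared counter is reset by every successful response and never reaches the limit, while the separate counters flag the peer after the third failed block.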
Jordan Hrycaj
a241050c94
Beacon sync update multi exe heads aware (#2861)
* Log/trace cancellation events in scheduler

* Provide `clear()` functions for explicitly flushing data objects

* Renaming header cache functions

why:
  More systematic, all functions start with prefix `dbHeader`

* Remove `danglingParent` from layout

why:
  Already provided by header cache

* Remove `couplerHash` and `headHash` from layout

why:
  No need to cache, `headHash` is unused and `couplerHash` used typically
  once, only.

* Remove `lastLayout` from sync descriptor

why:
  No need to compare changes, saving is always triggered after actively
  changing the sync layout state

* Early reject unsuitable head + finalised header from CL

why:
  The finalised header is only passed by its hash so the header must be
  fetched somewhere, e.g. from a peer via eth/xx.

  Also, finalised headers earlier than the `base` from `FC` cannot be
  handled due to the `Aristo` single state database architecture.

  Luckily, on a full node, the complete block history is available so
  unsuitable finalised headers are stored there already which is exploited
  here to avoid unnecessary network traffic.

* Code cosmetics, remove cruft, prettify logging, remove `final` metrics

detail:
  The `final` layout parameter will be deprecated and later removed

* Update/re-calibrate syncer logic documentation

why:
  The current implementation sucks if the `FC` module changes the
  canonical branch in the middle of completing a header chain (due
  to concurrent updates by the `newPayload()` logic.)

* Implement according to re-calibrated syncer docu

details:
  The implementation employs the notion of named layout states (see
  `SyncLayoutState` in `worker_desc.nim`) which are derived from the
  state parameter triple `(C,D,H)` as described in `README.md`.
2024-11-21 16:32:47 +00:00
Jordan Hrycaj
ea268e81ff
Beacon sync activation control update (#2782)
* Clarifying/commenting FCU setup condition & small fixes, comments etc.

* Update some logging

* Reorg metrics updater and activation

* Better `async` responsiveness

why:
  Block import does not allow `async` task activation while
  executing. So allow potential switch after each imported
  block (rather than a group of 32 blocks.)

* Handle resuming after previous sync followed by import

why:
  In this case the ledger state is more recent than the saved
  sync state. So this is considered a pristine sync where any
  previous sync state is forgotten.

  This fixes some assert thrown because of inconsistent internal
  state at some point.

* Provide option for clearing saved beacon sync state before starting syncer

why:
  It would resume with the last state otherwise which might be undesired
  sometimes.

  Without RPC available, the syncer typically stops and terminates with
  the canonical head larger than the base/finalised head. The latter one
  will be saved as database/ledger state and the canonical head as syncer
  target. Resuming syncing here will repeat itself.

  So clearing the syncer state can prevent the syncer from starting
  unnecessarily, avoiding useless actions.

* Allow workers to request syncer shutdown from within

why:
  In one-trick-pony mode (after resuming without RPC support) the
  syncer can be stopped from within, so avoiding unnecessary polling.
  In that case, the syncer can (theoretically) be restarted externally
  with `startSync()`.

* Terminate beacon sync after a single run target is reached

why:
  Stops doing useless polling (typically when there is no RPC available)

* Remove crufty comments

* Tighten state reload condition when resuming

why:
  Some pathological case might apply if the syncer is stopped while the
  distance between finalised block and head is very large and the FCU
  base becomes larger than the locked finalised state.

* Verify that finalised number from CL is at least FCU base number

why:
  The FCU base number is determined by the database, non zero if
  manually imported. The finalised number is passed via RPC by the CL
  node and will increase over time. Unless fully synced, this number
  will be pretty low.

  On the other hand, the FCU call `forkChoice()` will eventually fail
  if the `finalizedHash` argument refers to something outside the
  internal chain starting at the FCU base block.

* Remove support for completing interrupted sync without RPC support

why:
  Simplifies start/stop logic

* Remove unused import
2024-10-28 16:22:04 +00:00
Jordan Hrycaj
0b93236d1b
Beacon sync block import via forked chain (#2747)
* Accept finalised hash from RPC with the canon header as well

* Reorg internal sync descriptor(s)

details:
  Update target from RPC to provide the `consensus header` as well as
  the `finalised` block number

why:
  Prepare for using `importBlock()` instead of `persistBlocks()`

* Cosmetic updates

details:
+ Collect all pretty printers in `helpers.nim`
+ Remove unused return codes from function prototype

* Use `importBlock()` + `forkChoice()` rather than `persistBlocks()`

* Update logging and metrics

* Update docu
2024-10-17 17:59:50 +00:00
Jordan Hrycaj
f937f57838
Beacon sync targets cons head rather than finalised block (#2721)
* Fix fringe condition clarifying how to handle an empty range

why:
  The `interval_set` module would treat an undefined interval construct
  `[2,1]` as `[2,2]` (the right bound being `max(2,1)`.)

* Use the `consensus head` rather than the `finalised` block as sync target

why:
  The former is ahead of the `finalised` block.

* In ctx descriptor rename `final` field to `target`

* Update docu, rename `F` -> `T`
2024-10-09 18:00:00 +00:00
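
The fringe condition from #2721 can be made concrete with a short Python sketch. Both helpers are hypothetical: `normalised` only mimics the behaviour attributed to the `interval_set` module in the commit message, and `checked` shows one possible way to treat an empty range explicitly. Neither is the module's API.

```python
from typing import Optional, Tuple


def normalised(lo: int, hi: int) -> Tuple[int, int]:
    """Mimic the described behaviour: the right bound becomes max(lo, hi),
    so an undefined interval silently turns into a single-point one."""
    return (lo, max(lo, hi))


def checked(lo: int, hi: int) -> Optional[Tuple[int, int]]:
    """Treat `hi < lo` as an explicit empty range instead."""
    if hi < lo:
        return None  # nothing to fetch
    return (lo, hi)
```

Here `normalised(2, 1)` yields `(2, 2)`, which would request block 2 although the range is empty, whereas `checked(2, 1)` yields `None` and lets the caller skip the range.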
Jordan Hrycaj
d6eb8c36f5
Beacon sync align internal names and docu update (#2690)
* Rename `base` -> `coupler`, `B` -> `C`

why:
  Glossary: The jargon `base` is used for the `base state` block number
  which can be smaller than what is now the `coupler`.

* Rename `global state` -> `base`, `T` -> `B`

why:
  See glossary

* Rename `final` -> `end`, `F` -> `E`

why:
  See glossary. Previously, `final` denoted some finalised block but not
  `the finalised` block from the glossary (which is maximal.)

* Properly name finalised block as such, rename `Z` -> `F`

why:
  See glossary

* Rename `least` -> `dangling`, `L` -> `D`

* Metrics update (variables not covered yet)

* Docu update and corrections

* Logger updates

* Remove obsolete `skeleton*Key` kvt columns from `storage_types` module
2024-10-03 20:19:11 +00:00
Jordan Hrycaj
05483d89bd
Rename flare as beacon (#2680)
* Remove `--sync-mode` option from nimbus config

why:
  Currently there is only one sync mode available.

* Rename `flare` -> `beacon`, but not base module folder and nim source

why:
  The name `flare` was used to designate an alternative `beacon` mode.

  Leaving the base folder and source as-is for a moment, makes it easier
  to read change diffs.

* Rename `flare` base module folder and nim source: `flare` -> `beacon`
2024-10-02 11:31:33 +00:00