nimbus-eth1/nimbus/sync/beacon/TODO.md

## General TODO items
* Update/resolve code fragments which are tagged FIXME
## Open issues
### 1. Weird behaviour of the RPC/engine API
See issue [#2816](https://github.com/status-im/nimbus-eth1/issues/2816)
### 2. Some assert
    Error: unhandled exception: key not found: 0x441a0f..027bc96a [AssertionDefect]

which happened on several `holesky` tests immediately after logging something like

    NTC 2024-10-31 21:37:34.728 Finalized blocks persisted file=forked_chain.nim:231 numberOfBlocks=129 last=044d22843cbe baseNumber=2646764 baseHash=21ec11c1deac

or from another machine with literally the same exception text (but the stack-trace differs)

    NTC 2024-10-31 21:58:07.616 Finalized blocks persisted file=forked_chain.nim:231 numberOfBlocks=129 last=9cbcc52953a8 baseNumber=2646857 baseHash=9db5c2ac537b
### 3. Memory overflow possible on systems with little RAM
Running the exe client, a 1.5G response message was observed. On my 8G test system this kills the program, as it already has 80% memory load. It happens while syncing holesky at around block #184160 and is reproducible on the 8G system but not yet on an 80G system.

    [..]
    DBG 2024-11-20 16:16:18.871+00:00 Processing JSON-RPC request file=router.nim:135 id=178 name=eth_getLogs
    DBG 2024-11-20 16:16:18.915+00:00 Returning JSON-RPC response file=router.nim:137 id=178 name=eth_getLogs len=201631
    TRC 2024-11-20 16:16:18.951+00:00 <<< find_node from topics="eth p2p discovery" file=discovery.nim:248 node=Node[94.16.123.192:30303]
    TRC 2024-11-20 16:16:18.951+00:00 Neighbours to topics="eth p2p discovery" file=discovery.nim:161 node=Node[94.16.123.192:30303] nodes=[..]
    TRC 2024-11-20 16:16:18.951+00:00 Neighbours to topics="eth p2p discovery" file=discovery.nim:161 node=Node[94.16.123.192:30303] nodes=[..]
    DBG 2024-11-20 16:16:19.027+00:00 Received JSON-RPC request topics="JSONRPC-HTTP-SERVER" file=httpserver.nim:52 address=127.0.0.1:49746 len=239
    DBG 2024-11-20 16:16:19.027+00:00 Processing JSON-RPC request file=router.nim:135 id=179 name=eth_getLogs
    DBG 2024-11-20 16:20:23.664+00:00 Returning JSON-RPC response file=router.nim:137 id=179 name=eth_getLogs len=1630240149
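
To spot such responses when reproducing the problem, the debug log can be scanned for `Returning JSON-RPC response` lines with a large `len=` value. The following is only a rough Python sketch and not part of the client: the function name `oversized_responses` and the 64 MiB threshold are assumptions made for illustration, and the regular expression assumes the field order `id=`, `name=`, `len=` seen in the excerpt above.

```python
import re

# Assumed line layout, taken from the log excerpt: "... id=<n> name=<method> len=<bytes>"
RESPONSE_RE = re.compile(
    r"Returning JSON-RPC response\s+.*?\bid=(?P<id>\d+)\s+name=(?P<name>\S+)\s+len=(?P<len>\d+)"
)

def oversized_responses(lines, limit_bytes):
    """Return (id, method name, length) for each response larger than limit_bytes."""
    hits = []
    for line in lines:
        m = RESPONSE_RE.search(line)
        if m and int(m.group("len")) > limit_bytes:
            hits.append((int(m.group("id")), m.group("name"), int(m.group("len"))))
    return hits

# The two response lines from the excerpt above.
sample = [
    "DBG 2024-11-20 16:16:18.915+00:00 Returning JSON-RPC response "
    "file=router.nim:137 id=178 name=eth_getLogs len=201631",
    "DBG 2024-11-20 16:20:23.664+00:00 Returning JSON-RPC response "
    "file=router.nim:137 id=179 name=eth_getLogs len=1630240149",
]

LIMIT = 64 * 1024 * 1024  # hypothetical threshold (64 MiB), chosen for illustration only

for req_id, name, size in oversized_responses(sample, LIMIT):
    print(f"id={req_id} name={name} len={size} ({size / 2**30:.2f} GiB)")
```

With the two sample lines, only request `id=179` is reported. On a real log file, the lines could be passed in by iterating over the open file instead of `sample`.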