nimbus-eth2

Commit Graph

Author	SHA1	Message	Date
Etan Kissling	df0ff5f0fb	fix initialization of sync committee cache after loading non-epoch state (#6160 ) When initializing from a state that's not aligned to an epoch boundary, an earlier state is loaded that's epoch aligned, and subsequently topped up with the missing blocks. `dag.headSyncCommittee` is initialized prior to topping up the missing blocks, though. If the sync committee changes while applying the blocks (e.g., a sync committee period boundary hits), the cached information becomes unlinked from `dag.head`, leading to valid blocks based on that chain being rejected. To fix this, move cache initialization after the top up with blocks. This has been observed on Goerli by initializing from 7919502 and attempting to top up 7920111. The block gets rejected with an invalid state root on nodes that have restarted after setting 7920111 as head, while it gets accepted by all other nodes. Error message is `block: state root verification failed`. The incorrect initialization behaviour was introduced in #4592, before which the sync committee cache was initialized after applying blocks.	2024-04-03 23:03:06 +02:00
tersec	7fa32b7f02	add Electra to ConsensusFork enum (#6169 ) * add Electra to ConsensusFork enum * fix gnosis check	2024-04-03 16:43:43 +02:00
Etan Kissling	6a318d0f1a	avoid rejecting empty era file in verification (#6163 ) `batchVerify`'s precondition is a non-empty signature list: ```nim if input.len == 0: # Spec precondition return false ``` This means that in eras without any blocks (as has happened on Goerli), calling it leads to era files being reported as invalid.	2024-04-03 10:06:21 +02:00
Etan Kissling	0000f81df0	remove unused and redundant `PayloadID` type definition (#6165 ) `PayloadID` is defined in `nim-web3` and our own Bellatrix definition can be removed.	2024-04-03 07:27:00 +02:00
Etan Kissling	2dbe24c740	move split view catchup to research branch (#6133 ) Using a dedicated branch for researching the effectiveness of split view scenario handling simplifies testing and avoids having partial work on `unstable`. If we want, we can reintroduce it under a `--debug` flag at a later time. But for now, Goerli is a rare opportunity to test this, maybe just for another week or so. - https://github.com/status-im/infra-nimbus/pull/179	2024-03-25 19:09:31 +01:00
Etan Kissling	fc9bc1da3a	add branch discovery module for supporting chain stall situation (#6125 ) In split view situation, the canonical chain may only be served by a tiny amount of peers, and branches may span long durations. Minority branches may still have a large weight from attestations and should be discovered. To assist with that, add a branch discovery module that assists in such a situation by specifically targeting peers with unknown histories and downloading from them, in addition to sync manager work which handles popular branches.	2024-03-24 08:41:47 +00:00
Etan Kissling	66a9304fea	use separate state when catching up to perform validator duties (#6131 ) There are situations where all states in the `blockchain_dag` are occupied and cannot be borrowed. - headState: Many assumptions in the code that it cannot be advanced - clearanceState: Resets every time a new block gets imported, including blocks from non-canonical branches - epochRefState: Used even more frequently than clearanceState This means that during the catch-up mechanic where the head state is slowly advanced to wall clock to catch up on validator duties in the situation where the canonical head is way behind non-canonical heads, we cannot use any of the three existing states. In that situation, Nimbus already consumes an increased amount of memory due to all the `BlockRef`, fork choice states and so on, so experience is degraded. It seems reasonable to allocate a fourth state temporarily during that mechanic, until a new proposal could be made on the canonical chain. Note that currently, on `unstable`, proposals _do_ happen every couple hours because sync manager doesn't manage to discover additional heads in a split-view scenario on Goerli. However, with the branch discovery module, new blocks are discovered all the time, and the clearanceState may no longer be borrowed as it is reset to different branch too often. The extra state could also find other uses in the future, e.g., for incremental computations as in reindexing the database, or online collection of historical light client data.	2024-03-24 07:18:33 +01:00
Etan Kissling	c4a5bca629	update block quarantine eviction order to FIFO (#6129 ) Use the same eviction policy for blocks as already the case for blobs. FIFO makes more sense, because it favors keeping ancestors of blocks which need to be applied to the DAG before their children get eligible.	2024-03-24 06:03:51 +01:00
Etan Kissling	bedc601903	increase blob quarantine capacity to match block quarantine capacity (#6128 ) Blobs are cached from gossip and other sources for all orphans, not just those specifically tagged as `blobless`. `blobless` only means that they are actively fetched from the network. The `MaxBlobs` should be aligned to match `MaxOrphans`. Note that blobs are tiny compared to blocks, so this isn't a huge memory hog.	2024-03-24 04:29:44 +01:00
Etan Kissling	33e34ee8bd	handle case of unreachable block in `is_optimistic` helper (#6124 ) * handle case of unreachable block in `is_optimistic` helper When a non-canonical block is still in the DB, it can be accessed via `BlockId`, but `BlockRef` may be unavailable if the block was not properly cleaned when it got orphaned. Report it as optimistic. * `template` -> `func`	2024-03-22 22:50:21 +00:00
Etan Kissling	12a2f8c026	when adding duplicates to quarantine, schedule deepest missing parent (#6112 ) During sync, sometimes the same block gets encountered and added to quarantine multiple times. If its parent is already known, quarantine incorrectly registers it as missing, leading to re-download. This can be fixed by registering the parent's deepest missing parent recursively. Also increase the stickiness of `missing`. We only perform 4 attempts within ~16 seconds before giving up. Very frequently, this is not enough and there is no progress until sync manager kicks in, even on Holesky.	2024-03-21 18:41:05 +00:00
Etan Kissling	6f466894ab	answer `RequestManager` queries from disk if possible (#6109 ) When restarting beacon node, orphaned blocks remain in the database but on startup, only the canonical chain as selected by fork choice loads. When a new block is discovered that builds on top of an orphaned block, the orphaned block is re-downloaded using sync/request manager, despite it already being present on disk. Such queries can be answered locally to improve discovery speed of alternate forks.	2024-03-21 18:37:31 +01:00
Etan Kissling	eb5acdb7dd	make sure `clearanceState` builds on top of `headState` in chain stall (#6108 ) The `clearanceState` points to the latest resolved block, regardless of whether that block is canonical according to fork choice. If chain is stalled and we want to prepare for resuming validator duties, we need a recent state according to fork choice to avoid lag spikes and missing slot timings.	2024-03-20 16:41:56 +01:00
Etan Kissling	035ca015e6	continue validator duties if chain does not progress for a long time (#6101 ) Nimbus currently stops performing validator duties if the blockchain does not progress for `node.config.syncHorizon` slots. This means that the chain won't recover because no new blocks are proposed. To fix that, continue performing validator duties if no progress is registered for a long time, and none of our peers is indicating any progress.	2024-03-20 03:23:53 +01:00
Etan Kissling	595d110b37	avoid blocking deep reorgs > 64 epochs (#6099 ) On Goerli there are some instances of long streaks of empty epochs due to different branches being built in parallel. They sometimes lead to `Request for pruned historical state` logs requiring a BN restart to resolve. Avoid that by trying to restore states from the entire non-finalized history, to avoid losing sync in such situations.	2024-03-19 14:21:25 +01:00
tersec	0a6d189161	automated consensus spec URL updating to v1.4.0 (#6074 )	2024-03-14 07:26:36 +01:00
Eugene Kabanov	72c844534f	Add Keymanager API graffiti endpoints. (#6054 ) * Initial commit. * Add more tests. * Fix API mistypes. * Fix mistypes in tests. * Fix one more mistype. * Fix affected tests because of error code 401. * Add GetGraffitiResponse object. * Add more tests. * Fix compilation errors. * Recover old behavior. * Recover old behavior. * Fix mistype. * Test could not know default graffiti value. * Make VC use adopted graffiti settings. * Make BN use adopted graffiti settings. * Update Alltests. * Fix test. * Revert "Fix test." This reverts commit c735f855d3cb9c4a1c8e8af29d3f4438d068e31f. * Workaround {.push raises.} requirement. * Fix comment. * Update Alltests.	2024-03-14 03:44:00 +00:00
Etan Kissling	4e2ffca44a	use fixed max depth for `BlockRef` (#6070 ) In `block_dag` there is a max depth of 100 years configured to detect internal inconsistencies, e.g., circular references. As `BlockRef` was changed long ago to only reflect the non-finalized chain segment, the theoretically supported max depth can be reduced and simplified.	2024-03-13 13:01:51 +01:00
Etan Kissling	8bd8ffe2bb	align default `syncHorizon` computation logic across networks (#6066 ) The `syncHorizon` describes the number of empty slots before the beacon node considers itself to be out of sync. There are two places where we currently set this to 50 slots, but it makes more sense to base it on wall time, e.g., the 10 minutes that the default 50 are derived from.	2024-03-12 21:51:18 +01:00
tersec	2a13c09615	add proposer reward accounting to block transitions (#6022 ) * add proposer reward accounting to block transitions * Update beacon_chain/spec/state_transition_block.nim Co-authored-by: Etan Kissling <etan@status.im> --------- Co-authored-by: Etan Kissling <etan@status.im>	2024-03-04 17:00:46 +00:00
Etan Kissling	d8b8aee7b7	avoid style check issue with `syncAggregate` (#6013 ) Style check confuses `func syncAggregate` because it accesses some other object's `sync_aggregate` member in the body. Rename the func to avoid.	2024-03-02 02:54:37 +01:00
tersec	fef831d92a	rm unused ForkedTrustedBeaconBlock; add some Electra overloads to consensus_object_pools; Electra BeaconBlock gossip support (#5965 )	2024-02-26 06:49:12 +00:00
tersec	a4f4a35845	Revert "initial Electra support skeleton" (#5955 ) * Revert "initial Electra support skeleton (#5946)" This reverts commit `d09bf3b587`. * Update test_signing_node.nim	2024-02-25 19:42:44 +00:00
Etan Kissling	f54fa083b4	fix EIP-7044 implementation when using batch verification (#5953 ) In #5120, EIP-7044 support got added to the state transition function to force `CAPELLA_FORK_VERSION` to be used when validating `VoluntaryExit` messages, irrespective of their `epoch`. In #5637, similar logic was added when batch verifying BLS signatures, which is used during gossip validation (libp2p gossipsub, and req/resp). However, that logic did not match the one introduced in #5120, and only uses `CAPELLA_FORK_VERSION` when a `VoluntaryExit`'s `epoch` was set to a value `>= CAPELLA_FORK_EPOCH`. Otherwise, `BELLATRIX_FORK_VERSION` would still be used when validating `VoluntaryExit`, e.g., with `epoch` set to `0`, as is the case in this Holesky block: - https://holesky.beaconcha.in/slot/1076985#voluntary-exits Extracting the correct logic from #5120 into a function, and reusing it when verifying BLS signatures fixes this issue, and also leverages the exhaustive EF test suite that covers the (correct) #5120 logic. This fix only affects networks that have EIP-7044 applied (post-Deneb). Without the fix, Deneb blocks with a `VoluntaryExit` with `epoch` set to `< CAPELLA_FORK_EPOCH` incorrectly fail to validate despite being valid. Incorrect blocks that contain a malicious `VoluntaryExit` with `epoch` set to `< CAPELLA_FORK_EPOCH` and signed using `BELLATRIX_FORK_VERSION` _would_ pass the BLS verification stage, but subsequently fail the state transition logic. Such blocks would still correctly be labeled invalid.	2024-02-25 15:25:26 +01:00
tersec	d09bf3b587	initial Electra support skeleton (#5946 )	2024-02-24 13:44:15 +00:00
tersec	0f155ebf95	some consensus spec v1.4.0-beta.7 spec URL updates (#5945 )	2024-02-22 02:42:57 +00:00
tersec	c73d7c6f6f	automated consensus spec URL updating to v1.4.0-beta.7 (#5942 )	2024-02-21 19:44:48 +00:00
Etan Kissling	88045a91cd	rename new timing metrics, as `_total` suffix is implicit (#5917 ) * track latest duration instead of total in new timing metrics Change `db_checkpoint_seconds` and `state_replay_seconds` metrics to record the latest duration instead of the total. `nim-metrics` already synthesizes a `_total` metric from these implicitly. * still have to use inc, metrics only synthesizes the name not the sum * prefix with `beacon_dag`	2024-02-20 20:34:41 +01:00
Jacek Sieka	8d465a7d8c	vmon: Missed block metric (#5913 ) Validator monitoring gained 2 new metrics for tracking when blocks are included or not on the head chain. Similar to attestations, if the block is produced in epoch N, reporting will use the state when switching to epoch N+2 to do the reporting (so as to reasonably stabilise the block inclusion in the face of reorgs).	2024-02-20 06:40:18 +02:00
Etan Kissling	92197ce690	add metric for database checkpoint duration (#5897 ) Database checkpointing can take seconds, e.g., while Geth is syncing. Add a debug log + metric for it, and also info log if it takes longer than 250ms, same as for the existing `State replayed` log. If the log shows up for a user while the system is not overloaded, it may point to slow disk speed or thermal issue.	2024-02-19 11:00:11 +01:00
Jacek Sieka	afdfe302f3	state loading optimizations (#5881 ) * compute post-merge randao mix without loading state * avoid copying state on shuffling computation and compute epochref * speed up state copy for block production	2024-02-12 15:58:55 +01:00
tersec	a4680cb7fa	refactor addHeadBlock() to research/ and tests/ helper (#5874 ) * refactor addHeadBlock() to research/ and tests/ helper * rm now-dead code	2024-02-09 23:46:51 +00:00
Etan Kissling	9593ef74b8	do not cache zero block hash if block unavailable (#5865 ) With checkpoint sync, the checkpoint block is typically unavailable at the start, and only backfilled later. To avoid treating it as having zero hash, execution disabled in some contexts, wrap the result of `loadExecutionBlockHash` in `Opt` and handle block hash being unknown. --------- Co-authored-by: Jacek Sieka <jacek@status.im>	2024-02-09 22:10:38 +00:00
Etan Kissling	7c53841cd8	Revert "Revert "fix checkpoint block potentially not getting backfilled into DB (#5863 )" (#5871 )" (#5875 ) This reverts commit `1575478b72`.	2024-02-09 20:44:54 +01:00
Etan Kissling	f2d92729a2	reduce verbosity of `Got request for pre-backfill slot` (#5876 ) When syncing, we log a notice each time someone asks us for a block that we haven't backfilled yet. This is quite verbose and not unexpected, because the status message does not allow indicating backfill progress.	2024-02-09 20:32:31 +01:00
tersec	1575478b72	Revert "fix checkpoint block potentially not getting backfilled into DB (#5863 )" (#5871 ) This reverts commit `65e6f892de`.	2024-02-09 12:49:07 +00:00
Etan Kissling	65e6f892de	fix checkpoint block potentially not getting backfilled into DB (#5863 ) When using checkpoint sync, only checkpoint state is available, block is not downloaded and backfilled later. `dag.backfill` tracks latest filled `slot`, and latest `parent_root` for which no block has been synced yet. In checkpoint sync, this assumption is broken, because there, the start `dag.backfill.slot` is set based on checkpoint state slot, and the block is also not available. However, sync manager in backward mode also requests `dag.backfill.slot` and `block_clearance` then backfills the checkpoint block once it is synced. But, there is no guarantee that a peer ever sends us that block. They could send us all parent blocks and solely omit the checkpoint block itself. In that situation, we would accept the parent blocks and advance `dag.backfill`, and subsequently never request the checkpoint block again, resulting in gap inside blocks DB that is never filled. To mitigate that, the assumption is restored that `dag.backfill.slot` is the latest filled `slot`, and `dag.backfill.parent_root` is the next block that needs to be synced. By setting `slot` to `tail.slot + 1` and `parent_root` to `tail.root`, we put a fake summary into `dag.backfill` so that `block_clearance` only proceeds once checkpoint block exists.	2024-02-09 11:20:36 +01:00
Etan Kissling	4266e16835	allow `getBlockIdAtSlot` to answer queries from available states (#5869 ) After checkpoint sync, historical block IDs cannot yet be queried. However, they are needed to compute dependent roots of `ShufflingRef`. To allow lookup, enable `getBlockIdAtSlot` to answer from compatible states in memory; as long as they descend from the finalized checkpoint and the requested slot is sufficiently recent, `block_roots` contains everything to recover `BlockSlotId` up to `SLOTS_PER_HISTORICAL_ROOT`. This is similar to how `attester_dependent_root` etc. are computed. This accelerates the first couple minutes of checkpoint sync on Mainnet, especially the time until finality advances past the synced checkpoint.	2024-02-09 11:13:00 +01:00
Etan Kissling	e398078abc	`...ExecutionPayloadHash` --> `...ExecutionBlockHash` (#5864 ) Finish the rename started in #4809 to have a consistent naming. `ExecutionPayloadHash` suggests hash over payload instead of block. `BlockHash` is also the canonical name in engine API.	2024-02-08 01:24:55 +01:00
Etan Kissling	94ba0a9bd1	consider block availability when initializing LC data collector (#5860 ) When using checkpoint sync, the initial block is missing in the DB. Update the LC data collector initialization to account for that, avoiding a spurious error message when it is incorrectly accessed: ``` ERR 2024-02-07 11:21:55.416+01:00 Block failed to load unexpectedly topics="chaindag_lc" bid=d30517a7:8257504 tail=8257504 ``` Also fixes a regression from #5691 that resulted in similar messages while importing the first few blocks after checkpoint sync. Thanks to @arnetheduck for reporting this.	2024-02-07 18:03:19 +00:00
Etan Kissling	b7026a683a	avoid marking blocks as unviable if `blobless` quarantine is full (#5858 ) Full caches should not be used to mark blocks as unviable. The unviable status is quite persistent and a block marked as such won't be processed again once the cache empties. Problem originally introduced in #4808.	2024-02-07 13:38:20 +00:00
Jacek Sieka	6328c77778	raises for gossip (#5808 ) * raises for gossip * fix light client	2024-01-22 17:34:54 +01:00
tersec	6c53dc1e11	automated consensus spec URL updating to v1.4.0-beta.6 (#5804 )	2024-01-20 11:19:47 +00:00
Eugene Kabanov	3648df7d4c	Fix VC not always being able to obtain feeRecipient value. (#5781 ) Use state's validator value to obtain feeRecipient value. Make feeRecipient and gasLimit calculation equal for BN and VC.	2024-01-19 14:36:04 +00:00
Etan Kissling	be73ce2e9a	import finalized head LC bootstrap on launch (#5775 ) If the initial state replays cover the finalized head, import matching `LightClientBootstrap` into database. This also addresses this error when light client requests bootstrap from the genesis slot on networks that launch with Altair enabled. ``` {"lvl":"DBG","ts":"2023-10-04 11:17:49.665+00:00","msg":"LC bootstrap unavailable: Sync committee branch not cached","topics":"chaindag_lc","slot":0} ```	2024-01-18 22:51:26 +00:00
tersec	cf1bec7670	update some deprecated stew/results to results imports (#5743 )	2024-01-16 22:37:14 +00:00
tersec	69af8f943e	implement blob_sidecar Beacon API streaming (#5728 )	2024-01-13 11:52:13 +02:00
Jacek Sieka	62cbdeefc5	verify `genesis_time` more strictly (fixes #1667 ) (#5694 ) Bogus values lead to crashes down the line when timers overflow	2024-01-06 15:26:56 +01:00
Etan Kissling	7db95f047b	track latest `LightClientUpdate` only once fork choice selects it (#5691 ) Instead of tracking the latest `LightClientUpdate` across all branches, track the latest one on the current branch as selected by fork choice.	2024-01-03 23:36:05 +01:00
Etan Kissling	030226148d	rename `exit_pool` > `validator_change_pool` (#5679 ) The `ExitPool` was renamed to `ValidatorChangePool` with Capella, but the files were still using the previous name. Rename for consistency.	2023-12-23 06:55:47 +01:00

591 Commits