6798 Commits

Author SHA1 Message Date
Etan Kissling
66a9304fea
use separate state when catching up to perform validator duties (#6131)
There are situations where all states in the `blockchain_dag` are
occupied and cannot be borrowed.

- headState: Many assumptions in the code that it cannot be advanced
- clearanceState: Resets every time a new block gets imported, including
  blocks from non-canonical branches
- epochRefState: Used even more frequently than clearanceState

This means that during the catch-up mechanic where the head state is
slowly advanced to wall clock to catch up on validator duties in the
situation where the canonical head is way behind non-canonical heads,
we cannot use any of the three existing states. In that situation,
Nimbus already consumes an increased amount of memory due to all the
`BlockRef`, fork choice states and so on, so experience is degraded.
It seems reasonable to allocate a fourth state temporarily during that
mechanic, until a new proposal could be made on the canonical chain.

Note that currently, on `unstable`, proposals _do_ happen every couple
hours because sync manager doesn't manage to discover additional heads
in a split-view scenario on Goerli. However, with the branch discovery
module, new blocks are discovered all the time, and the clearanceState
may no longer be borrowed as it is reset to a different branch too often.

The extra state could also find other uses in the future, e.g., for
incremental computations as in reindexing the database, or online
collection of historical light client data.
2024-03-24 07:18:33 +01:00
Etan Kissling
c4a5bca629
update block quarantine eviction order to FIFO (#6129)
Use the same eviction policy for blocks as is already the case for blobs.
FIFO makes more sense, because it favors keeping ancestors of blocks
which need to be applied to the DAG before their children get eligible.
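
As a rough illustration of FIFO eviction, here is a minimal Python sketch.
The class `FifoQuarantine` and its methods are hypothetical and are not
taken from the Nimbus code base. The example assumes that descendants
were inserted before their ancestors, in which case FIFO drops the
descendant first and the ancestors stay available.

```python
from collections import OrderedDict


class FifoQuarantine:
    """Bounded map that evicts the oldest inserted entry first.

    Hypothetical illustration only, not the Nimbus implementation.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()  # preserves insertion order

    def add(self, root, block):
        if root in self._items:
            return  # a duplicate keeps its original queue position
        if len(self._items) >= self.capacity:
            self._items.popitem(last=False)  # drop the oldest entry (FIFO)
        self._items[root] = block

    def roots(self):
        return list(self._items)


quarantine = FifoQuarantine(capacity=2)
quarantine.add("child", "block C")        # inserted first
quarantine.add("parent", "block B")
quarantine.add("grandparent", "block A")  # evicts "child"
print(quarantine.roots())
```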
2024-03-24 06:03:51 +01:00
Etan Kissling
991e7cafbc
descore when opening connection fails, same as when reading fails (#6130)
`eth2_network` forgets to descore peers when opening connection times
out. It only descores when opening the connection succeeds and then
there is a subsequent error. The caller cannot distinguish the cases,
so ensure that the descore is also applied if the request fails during
its initial portion.
2024-03-24 05:37:47 +01:00
Etan Kissling
3765e8ac06
ensure blobs are quarantined when block is quarantined (#6127)
When quarantining a block from block processor, we should also keep a
copy of its blobs. Otherwise, this involves more network roundtrips
to obtain information we already have. This is in line with how blobs
arrive from gossip and request manager sources. The existing flow does
not work when applying blocks from quarantine, which is addressed here.
2024-03-24 04:56:30 +01:00
Etan Kissling
bedc601903
increase blob quarantine capacity to match block quarantine capacity (#6128)
Blobs are cached from gossip and other sources for all orphans, not just
those specifically tagged as `blobless`. `blobless` only means that they
are actively fetched from the network. The `MaxBlobs` should be aligned
to match `MaxOrphans`. Note that blobs are tiny compared to blocks, so
this isn't a huge memory hog.
2024-03-24 04:29:44 +01:00
tersec
c5f0d1def3
Revert "Revert "Set default localBlockValueBoost to 10 (#6103)" (#6118)" (#6126)
This reverts commit 213076e4cd1ef72a85ebb32ec4b007c8aaee8eda.
2024-03-23 10:17:29 +01:00
Etan Kissling
33e34ee8bd
handle case of unreachable block in is_optimistic helper (#6124)
* handle case of unreachable block in `is_optimistic` helper

When a non-canonical block is still in the DB, it can be accessed via
`BlockId`, but `BlockRef` may be unavailable if the block was not
properly cleaned when it got orphaned. Report it as optimistic.

* `template` -> `func`
2024-03-22 22:50:21 +00:00
Etan Kissling
2d9586a5a8
enqueue missing parent block if stored in local DB (#6122)
When checking for `MissingParent`, it may be that the parent block was
already discovered as part of a prior run. In that case, it can be
loaded from storage and processed without having to rediscover the
entire branch from the network. This is similar to #6112 but for blocks
that are discovered via gossip / sync mgr instead of via request mgr.
2024-03-22 14:35:46 +01:00
Eugene Kabanov
a6e9e0774c
VC: Refactor some timing code around sync committee processing (#6073)
* Add some duration metering.
Refactor some log statements.
Rework sync contribution deadline waiting.
Add some cancellation reporting handlers.

* Make all validator's shortLog to become validatorLog.
Optimize some logs with logScope.

* Add `raises`.

* More log statements polishing.
2024-03-22 02:37:44 +00:00
Etan Kissling
9d5643240b
only request blobs if a sync response actually provided blocks (#6121)
During sync, we can skip the `blobSidecarsByRange` request when there
are no blocks with `kzg_commitments` in the blocks data. Avoids running
into throttling from peers during long periods of non-finality.
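
The check can be sketched as follows in Python. This is an assumption
about the shape of the logic, not the actual sync manager code: blocks
are modelled as plain dicts and `needs_blob_request` is a hypothetical
helper name.

```python
def needs_blob_request(blocks):
    """Return True if any block in the response carries KZG commitments.

    Hypothetical helper: each block is a dict whose optional
    "kzg_commitments" entry is a list.
    """
    return any(len(block.get("kzg_commitments", [])) > 0 for block in blocks)


empty_response = []
blobless_blocks = [{"slot": 1}, {"slot": 2, "kzg_commitments": []}]
blocks_with_blobs = [{"slot": 3, "kzg_commitments": ["0xabc"]}]

for response in (empty_response, blobless_blocks, blocks_with_blobs):
    # Only the last response would trigger a follow-up blob request.
    print(needs_blob_request(response))
```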
2024-03-22 03:27:02 +01:00
Etan Kissling
17ee40b39b
make blobs use less quota when other nodes sync from us (#6120)
Each individual blob currently uses as much quota from the network limit
as an entire block does, 128 items per second shared across all peers.
Blobs are 128 KB each instead of up to several MB and are simpler to
encode. There can be multiple per block (6 currently), so allow 2000
blobs per second across all peers. That decreases the cost per block
from `3125 + 3125 * blobs.len` quota (= `[3125, 21875]`) to a lower
`3125 + 200 * blobs.len` quota (= `[3125, 4325]`), accounting for the
slight increase in data transfer and encoding time.
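
The ranges above can be reproduced with a short Python calculation. The
constant and function names are made up for this sketch; only the
numbers come from the message. Note that `3125 * 128` and `200 * 2000`
both equal 400000, so the figures are consistent with one shared budget
per second, although that budget value is inferred here rather than
stated.

```python
BLOCK_COST = 3125      # quota per block (from the message above)
OLD_BLOB_COST = 3125   # previous quota per blob
NEW_BLOB_COST = 200    # new quota per blob
MAX_BLOBS = 6          # blobs per block at the time of the change


def old_cost(num_blobs):
    return BLOCK_COST + OLD_BLOB_COST * num_blobs


def new_cost(num_blobs):
    return BLOCK_COST + NEW_BLOB_COST * num_blobs


# Cost range for a block with 0..MAX_BLOBS blobs, before and after.
print([old_cost(0), old_cost(MAX_BLOBS)])
print([new_cost(0), new_cost(MAX_BLOBS)])

# 128 blocks per second and 2000 blobs per second imply the same budget.
print(BLOCK_COST * 128, NEW_BLOB_COST * 2000)
```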
2024-03-22 02:36:08 +01:00
Etan Kissling
2a45bb3c7c
add error information when sync requests fail (#6119)
During sync it may be interesting to know why requests are failing.
Extend debug logging accordingly.
2024-03-22 00:26:50 +00:00
Etan Kissling
12a2f8c026
when adding duplicates to quarantine, schedule deepest missing parent (#6112)
During sync, sometimes the same block gets encountered and added to
quarantine multiple times. If its parent is already known, quarantine
incorrectly registers it as missing, leading to re-download. This can
be fixed by registering the parent's deepest missing parent recursively.

Also increase the stickiness of `missing`. We only perform 4 attempts
within ~16 seconds before giving up. Very frequently, this is not enough
and there is no progress until sync manager kicks in even on holesky.
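
The idea of scheduling the deepest missing parent can be sketched in
Python as below. This is a simplified model, not the quarantine code:
`parent_of` is a hypothetical dict mapping each quarantined block root
to its parent root, and parent links are assumed to be acyclic.

```python
def deepest_missing_parent(root, parent_of):
    """Walk parent links through quarantined blocks.

    Returns the first root that is not itself in quarantine, which is
    the one that still has to be fetched. Assumes acyclic parent links.
    """
    current = root
    while current in parent_of:
        current = parent_of[current]
    return current


# Quarantine holds c -> b -> a, where "a" has not been seen yet.
parent_of = {"c": "b", "b": "a"}

# Re-adding "c" should schedule "a", not its already-known parent "b".
print(deepest_missing_parent("c", parent_of))
```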
2024-03-21 18:41:05 +00:00
Etan Kissling
6f466894ab
answer RequestManager queries from disk if possible (#6109)
When restarting beacon node, orphaned blocks remain in the database but
on startup, only the canonical chain as selected by fork choice loads.
When a new block is discovered that builds on top of an orphaned block,
the orphaned block is re-downloaded using sync/request manager, despite
it already being present on disk. Such queries can be answered locally
to improve discovery speed of alternate forks.
2024-03-21 18:37:31 +01:00
Etan Kissling
9256db2265
use LPProtocol.new instead of LPProtocol() (#6117)
Avoid potential issue with https://github.com/vacp2p/nim-libp2p/pull/1064#discussion_r1534021691
in a future dependency bump.
2024-03-21 17:53:59 +01:00
tersec
213076e4cd
Revert "Set default localBlockValueBoost to 10 (#6103)" (#6118)
This reverts commit d66a769135d019d939b52c842febdfaaa1073fea.
2024-03-21 15:13:58 +00:00
Etan Kissling
3d45c0575a
avoid resetting chain stall detection on lag spike (#6115)
During lag spike, e.g., from state replays, peer count can temporarily
drop significantly. Should not have to wait another 60 minutes in that
situation just to be back where one started.
2024-03-21 04:55:29 +01:00
Fredrik Svantes
d66a769135
Set default localBlockValueBoost to 10 (#6103)
* Set default localBlockValueBoost to 10

* Updated local-block-value-boost in documentation to say 10 as default
2024-03-21 02:06:03 +00:00
dependabot[bot]
02ccfd488b
Bump black from 21.12b0 to 24.3.0 in /ncli (#6111)
Bumps [black](https://github.com/psf/black) from 21.12b0 to 24.3.0.
- [Release notes](https://github.com/psf/black/releases)
- [Changelog](https://github.com/psf/black/blob/main/CHANGES.md)
- [Commits](https://github.com/psf/black/commits/24.3.0)

---
updated-dependencies:
- dependency-name: black
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-03-20 21:54:09 +00:00
Etan Kissling
de2d205f61
use PR-3431 style fork choice on all networks (#6110)
To start phasing out Capella fork choice logic, set default to PR 3431.
A subsequent release can remove the fallback option.
2024-03-20 19:12:33 +01:00
Etan Kissling
eb5acdb7dd
make sure clearanceState builds on top of headState in chain stall (#6108)
The `clearanceState` points to the latest resolved block, regardless of
whether that block is canonical according to fork choice. If chain is
stalled and we want to prepare for resuming validator duties, we need
a recent state according to fork choice to avoid lag spikes and missing
slot timings.
2024-03-20 16:41:56 +01:00
Etan Kissling
e75b209076
Revert nim-libp2p to 28609597d104a9be880ed5e1648e1ce18ca9dc38 (#6107)
* Revert "bump `nim-libp2p` to `49a92e564167c0ffdcc86838c5e45cc985665d96` (#6084)"

This reverts commit 78f3e03d538980c6702dd36fcce812cbd9f1fd31.

* Revert "bump `nim-libp2p` to `ae13a0d58301159e6b3bfc43fe23986c254c741a` (#6065)"

This reverts commit 4a6ed0323e2453ec473819a7d6ee6bda390f1dc5.
2024-03-20 13:46:12 +01:00
andri lim
1fe6efcf53
Bump nim-web3 to 285d97c2b05bbe2a13dab4b52ea878157fb1a1a1 (#6106)
Unify EthCall/EthSend into TransactionArgs (#138)
2024-03-20 14:39:12 +07:00
Jacek Sieka
032b91c631
extend seen ttl to cover 2 epochs (#6098)
this allows us to drop these useless messages earlier in the pipeline

https://github.com/ethereum/consensus-specs/pull/3627
2024-03-20 08:07:16 +01:00
Etan Kissling
4c0b9efb30
fix chain stall detection (#6105)
Used the incorrect count for `numPeers` and did not account for heads on
alternate branches in chain stall detection.
2024-03-20 04:51:55 +01:00
Etan Kissling
035ca015e6
continue validator duties if chain does not progress for a long time (#6101)
Nimbus currently stops performing validator duties if the blockchain
does not progress for `node.config.syncHorizon` slots. This means that
the chain won't recover because no new blocks are proposed. To fix that,
continue performing validator duties if no progress is registered for a
long time, and none of our peers is indicating any progress.
2024-03-20 03:23:53 +01:00
Etan Kissling
8b604b59a7
fix BlockNumber serialization (#6102)
Correct formatting of `BlockNumber` in EL manager, regression from
its conversion to `distinct` in #6088.
2024-03-19 22:14:08 +01:00
Etan Kissling
8514e4a44c
bump gnosis-chain-configs to 14d8439235fa757dd39b9fb1c10a06a99a720989 (#6100)
- Add more Chiado bootnodes for GnosisDAO
2024-03-19 16:32:48 +00:00
Etan Kissling
5d42859176
make Gwei distinct (#6090)
#6087 introduced a subtle change to `nim-web3` resulting in `Gwei` being
serialized differently than before. Using a `distinct` type for `Gwei`
improves type safety and avoids such problems in the future.
2024-03-19 14:22:07 +01:00
Etan Kissling
1dd2c939ac
bump nim-web3 to 80c7aa6de2a26c57fa1f06ad47f3ac6058e6545b (#6088)
- Add writeValue for BlockNumber
- make `BlockNumber` `distinct`
2024-03-19 14:21:47 +01:00
Etan Kissling
595d110b37
avoid blocking deep reorgs > 64 epochs (#6099)
On Goerli there are some instances of long streaks of empty epochs due
to different branches being built in parallel. They sometimes lead to
`Request for pruned historical state` logs requiring a BN restart to
resolve. Avoid that by trying to restore states from the entire
non-finalized history, to avoid losing sync in such situations.
2024-03-19 14:21:25 +01:00
Miran
e1aa9e6de5
fix commit hash when publishing book (#6097)
Before, the commit hash was taken from the `gh-pages` branch
instead of from the branch used to publish the book.
2024-03-19 09:56:47 +01:00
Jacek Sieka
ed1ef19bf4
use assign for forky state assignment (#6055)
2024-03-19 09:50:25 +01:00
Etan Kissling
d4d27164f9
bump nim-sqlite3-abi to 1453b19b1a3cac24002dead15e02bd978cb52355 (#6096)
- bump `sqlite-amalgamation` to `3.45.2`
2024-03-18 00:17:29 +01:00
Etan Kissling
d22dfaed41
bump nim-ssz-serialization to 0fc5e49093fa8d3c07476738e3257d0d8e7999a3 (#6095)
- more fixes for `distinct` integer types
- avoid double testing `--mm:refc`
2024-03-18 00:14:19 +01:00
Etan Kissling
f40083f1e5
annotate validator_db_aggregator with {.raises.} (#6094)
Show proper error message when `validator_db_aggregator` raises errors.
2024-03-17 16:17:07 +01:00
Etan Kissling
ef2411e1a0
use correct INACTIVITY_SCORE_RECOVERY_RATE if overridden from default (#6091)
When a config defines a different `INACTIVITY_SCORE_RECOVERY_RATE` than
the default, `process_inactivity_updates` uses an incorrect rate ever
since #2710 when `INACTIVITY_SCORE_RECOVERY_RATE` became configurable.
2024-03-17 13:32:30 +01:00
Etan Kissling
4aea780320
bump nim-ssz-serialization to 9bb15468c64851e9300ccab662f16a15be6d833e (#6089)
- use `toSszType` for elements of `HashList|HashArray|List|array`
2024-03-17 02:46:49 +01:00
Etan Kissling
30460aad9c
bump nim-chronos to 47cc17719f4293bf80a22ebe28e3bfc54b2a59a1 (#6083)
- print warning when calling failed
2024-03-16 15:38:17 +01:00
Etan Kissling
74a238460b
bump nim-json-rpc to ad8721e0f3c6925597b5a93b6c53e040f26b5fb3 (#6086)
- Export errors for json-rpc clients
2024-03-16 04:05:44 +00:00
Etan Kissling
448e610f8a
bump nim-presto to a9687dda1c3e20d5b066d42b33c2a63f018af93f (#6085)
- Add examples
2024-03-16 03:46:37 +00:00
Etan Kissling
78f3e03d53
bump nim-libp2p to 49a92e564167c0ffdcc86838c5e45cc985665d96 (#6084)
- default `MultiAddress` param for `newStandardSwitch` does not raise
- clean up triple lookup and avoid `KeyError` when adding muxer
- `{.async: (raises).}` for `relay/utils.nim`
- `{.async: (raises).}` annotations for `protocols/secure`
- avoid pointless exception raising in `dcutr/server`
2024-03-16 02:25:40 +00:00
Etan Kissling
82b8c96f72
bump nim-results to e2adf66b8bc2f41606e8469a5f0a850d1e545b55 (#6082)
- Formatted with nph v0.5.1-0-gde5cd48
- Update CI
- ci: Combine c/c++
- extend `optValue` support for `Result[void, E]`
- Document a few fixes
2024-03-16 02:15:03 +00:00
Etan Kissling
7a7c024534
bump nim-libbacktrace to 027570111c161d8378bca9e84b5f75500a8c38a3 (#6081)
- bump `libbacktrace` to `7ead8c1ea2f4aeafe9c5b9ef8a9461a9ba781aa8`
2024-03-16 02:22:20 +01:00
Etan Kissling
2d52016e5c
bump nim-stew to a0c085a51fe4f2d82aa96173ac49b3bfe6043858 (#6079)
- strformat: compile-time format string parser (backport Nim 2.2)
2024-03-16 02:08:54 +01:00
Etan Kissling
b3bce7ce79
bump nim-stint to 3c238df6cd4b9c1f37a9f103383e7d2bbd420c13 (#6078)
- fix noInit to noinit; use evergreen GitHub Actions image versions
2024-03-16 01:53:35 +01:00
Etan Kissling
4a6ed0323e
bump nim-libp2p to ae13a0d58301159e6b3bfc43fe23986c254c741a (#6065)
- Send priority with queue fix
2024-03-15 22:49:01 +01:00
Etan Kissling
1bd5819dad
cache head block eligibility for fork choice (#6076)
When there are long periods of non-finality, `nodeIsViableForHead` has
been observed to consume significant time as it repeatedly walks the
non-finalized check graph as part of determining what heads are eligible
for fork choice. Caching the result resolves that.

Overall, it may still be better to prune fork choice more aggressively
when finality advances, to fully avoid the case specced out using the
linear scan. The current implementation is very close to spec, though,
so such a change should not be introduced without thorough testing.

The simple cache should allow significantly better performance on Goerli
while the network is still supported (Mid April).
2024-03-15 22:48:18 +01:00
tersec
0a6d189161
automated consensus spec URL updating to v1.4.0 (#6074)
2024-03-14 07:26:36 +01:00
Eugene Kabanov
72c844534f
Add Keymanager API graffiti endpoints. (#6054)
* Initial commit.

* Add more tests.

* Fix API mistypes.

* Fix mistypes in tests.

* Fix one more mistype.

* Fix affected tests because of error code 401.

* Add GetGraffitiResponse object.

* Add more tests.

* Fix compilation errors.

* Recover old behavior.

* Recover old behavior.

* Fix mistype.

* Test could not know default graffiti value.

* Make VC use adopted graffiti settings.

* Make BN use adopted graffiti settings.

* Update Alltests.

* Fix test.

* Revert "Fix test."

This reverts commit c735f855d3cb9c4a1c8e8af29d3f4438d068e31f.

* Workaround {.push raises.} requirement.

* Fix comment.

* Update Alltests.
2024-03-14 03:44:00 +00:00