* Working Makefile targets for Capella devnet2
make capella-devnet-2
make clean-capella-devnet-2
You'll need to have https://github.com/tmuxinator/tmuxinator installed.
It's available as a regular package in most Linux distributions or through
Nix or Brew on macOS.
This commit also fixes the initial hang in the Eth1 monitor in the "find
TTD block" procedure through a fix to the network metadata files which
hasn't been upstreamed yet.
Other changes:
* Disabled Geth snap sync in the simulation
When all Geth nodes are configured to run with snap sync enabled, they all
start snap sync after the first forkchoiceUpdated which causes the BNs to
skip validator duties because the EL is syncing. The snap sync never completes
due to poor connectivity between the Geth nodes in the simulation.
libp2p issues related to operation cancellations have been addressed in
https://github.com/status-im/nim-libp2p/pull/816
This means we can once more enable `--sync-light-client` in CI, without
having to deal with spurious CI failures due to the cancellation issues.
Other changes:
* More optimal search for TTD block.
* Add timeouts to all REST requests during trusted node sync.
Fixes#4037
* Removed support for storing a deposit snapshot in the network
metadata.
Since the sync committee duties are no longer updated on every slot
and previously the sync committee aggregators selection proofs were
generated during the duties update, this now resulted in the client
using stale selection proofs (they must be generated at each slot).
The fix consists of moving the selection proof generation logic in
a different function which is properly executed on each slot.
Other changes:
* The logtrace tool has been enhanced with a framework for adding
new simpler log aggregation and analysis algorithms.
The default CI testnet simulation will now ensure that the blocks
in the network have reasonable sync committee participation.
Local testnet simulation currently waits 5 seconds when starting each
individual Geth instance. Waiting a shorter amount saves almost a minute
per minimum + mainnet CI finalization job.
Measured startup times per Geth: Linux ~100ms, macOS Intel ~300ms.
Launching multiple local testnet simulation sequentially can lead to
existing EL processes from prior failed/aborted runs not being stopped
properly, subsequently leading to hard-to-debug CI test failures.
Fixing the cleanup logic addresses this problem.
The `eth1_monitor` check to require engine API from bellatrix onward
has issues in setups where the EL and CL are started simultaneously
because the EL may not be ready to answer requests by the time that the
check is performed. This can be observed, e.g., on Raspberry Pi 4 when
using Besu as the EL client. Now that the merge transition happened, the
check is also not that useful anymore, as users have other ways to know
that their setup is not working correctly (e.g., repeated exchange logs)
When the BN-embedded LC makes sync progress, pass the corresponding
execution block hash to the EL via `engine_forkchoiceUpdatedV1`.
This allows the EL to sync to wall slot while the chain DAG is behind.
Renamed `--light-client` to `--sync-light-client` for clarity, and
`--light-client-trusted-block-root` to `--trusted-block-root` for
consistency with `nimbus_light_client`.
Note that this does not work well in practice at this time:
- Geth sticks to the optimistic sync:
"Ignoring payload while snap syncing" (when passing the LC head)
"Forkchoice requested unknown head" (when updating to LC head)
- Nethermind syncs to LC head but does not report ancestors as VALID,
so the main forward sync is still stuck in optimistic mode:
"Pre-pivot block, ignored and returned Syncing"
To aid EL client teams in fixing those issues, having this available
as a hidden option is still useful.
* Fixes a segfault during block production when the Keymanager API
is disabled. The Keymanager is now disabled on half of the local
testnet nodes to catch such problems in the future.
* Fixes multiple potential stalls from REST requests being done
without a timeout. From practice, we know that such requests
can hang forever if not cancelled with a timeout. At best,
this would be a resource leak, at worst, it may lead to a
full stall of the client and missed validator duties.
* Changes some Options usages to Opt (for easier use of valueOr)
* Keymanager API for the validator client
* Properly treat the 'description' field as optional when loading Keystores
* Spec-compliant serialization of the slashing data in Keymanager's DeleteKeys response ()
Fixes#3940Fixes#3964Closes#3884 by adding test
* packaging updates
* one package per binary (nimbus_beacon_node, nimbus_validator_client)
* use `-` in package name (`_` is separating the version)
* don't include (un)installation scripts in package
* default metrics port 8108 for vc
* fix several upgrade/install errors in scripts
* add JWT option to service files
* don't attempt to remove user on purge
* Use final `v1` version for light client protocols
* Unhide LC data collection options
* Default enable LC data serving
* rm unneeded import
* Connect to EL on startup
* Add docs for LC based EL sync
The ports for the concurrently executing REST and Minimal testnet clash,
leading to some CI failures since #3827 introduced further concurrency.
Adjusting the ports to be distinct across various tests should fix this.
Adds the `--web3-url` launch argument to `nimbus_light_client` to enable
driving the EL with the optimistic head obtained from LC sync protocol.
This will keep issuing `newPayload` / `forkChoiceUpdated` requests for
new blocks, marking them as optimistic. `ZERO_HASH` is reported as the
finalized block for now.
`a48d741022f6b0da1bb679e0ede4e38c019242cf` disabled LC in local testnet
as an undocumented side effect. Re-enabling for more thorough testing,
and added handling of LC with `--eth2-docker-image`.
* remove web3 url prompt in launcher script
The interactive prompt for web3 has outlived its utility as we now load
url:s from command line params and config files, preventing the prompt
from correctly detecting when it's needed.
Also, after the merge, a JWT secret will (likely) be needed.
* log notice when web3 url is missing
* fix docs to not mention default that doesn't exist
* fix scripts to properly quote arguments