Port 9952 sometimes seems blocked, maybe by another Status project
being tested on the same host, or by UPS process, or maybe it needs
SO_REUSEPORT as well?
Switch to 9962 for now to see if that improves things.
```
{"lvl":"DBG","ts":"2023-08-06 13:44:56.834+03:00","msg":"Initializing ELManager","topics":"beacnde","tid":231629305,"file":"el_manager.nim:1823","depositContractBlockNumber":1,"depositContractBlockHash":"00000000"}
{"lvl":"ERR","ts":"2023-08-06 13:44:56.834+03:00","msg":"Could not create HTTP server instance","topics":"beacnde","tid":231629305,"file":"server.nim:50","address":{"family":"IPv4","address_v4":[127,0,0,1],"port":9952},"serverIdent":"nim-presto/0.0.3 (arm64/macosx)","serverFlags":["NotifyDisconnect","QueryCommaSeparatedArray"],"socketFlags":["ReuseAddr"],"serverUri":{"scheme":"","username":"","password":"","hostname":"","port":"","path":"","query":"","anchor":"","opaque":false,"isIpv6":false},"maxConnections":-1,"backlogSize":100,"bufferSize":4096,"httpHeadersTimeout":"15250w1d23h47m16s854ms775us807ns","maxHeadersSize":131072,"maxRequestBodySize":16777216,"err":"(48) Address already in use"}
{"lvl":"NOT","ts":"2023-08-06 13:44:56.835+03:00","msg":"Rest server could not be started","topics":"beacnde","tid":231629305,"file":"nimbus_binary_common.nim:370","address":"127.0.0.1:9952","reason":"Could not create HTTP server instance"}
```
I couldn't install those packages with the versions previously stated.
The new versions install successfully,
and `make publish-book` works without errors.
* Add support for using custom remote signers in local sim
Other changes:
* Enable the Nimbus remote signer in the minimal simulation
* Move all log files into the `logs` folder of the simulation
* Create PID files for all processes and use them during the clean-up
phase instead of the previous more fragile methods for killing the
remaining processes.
* Refactor nimbus_signing_node to support Unix signals.
* Fix SN unable to close REST server properly.
* Fix `keys`, `deposit` and `validator_registration` endpoints issues.
Add getValidatorExitSignature() and getDepositMessageSignature() to validator_pool.
* Add /reload endpoint and implementation.
Fix signData to not cancel `timer`.
Fix validator_pool should clear attachedValidators table.
* Diva protocol enhancement implementation.
* Support for driving multiple EL nodes from a single Nimbus BN
Full list of changes:
* Eth1Monitor has been renamed to ELManager to match its current
responsibilities better.
* The ELManager is no longer optional in the code (it won't have
a nil value under any circumstances).
* The support for subscribing for headers was removed as it only
worked with WebSockets and contributed significant complexity
while bringing only a very minor advantage.
* The `--web3-url` parameter has been deprecated in favor of a
new `--el` parameter. The new parameter has a reasonable default
value and supports specifying a different JWT for each connection.
Each connection can also be configured with a different set of
responsibilities (e.g. download deposits, validate blocks and/or
produce blocks). On the command-line, these properties can be
configured through URL properties stored in the #anchor part of
the URL. In TOML files, they come with a very natural syntax
(althrough the URL scheme is also supported).
* The previously scattered EL-related state and logic is now moved
to `eth1_monitor.nim` (this module will be renamed to `el_manager.nim`
in a follow-up commit). State is assigned properly either to the
`ELManager` or the to individual `ELConnection` objects where
appropriate.
The ELManager executes all Engine API requests against all attached
EL nodes, in parallel. It compares their results and if there is a
disagreement regarding the validity of a certain payload, this is
detected and the beacon node is protected from publishing a block
with a potential execution layer consensus bug in it.
The BN provides metrics per EL node for the number of successful or
failed requests for each type Engine API requests. If an EL node
goes offline and connectivity is resoted later, we report the
problem and the remedy in edge-triggered fashion.
* More progress towards implementing Deneb block production in the VC
and comparing the value of blocks produced by the EL and the builder
API.
* Adds a Makefile target for the zhejiang testnet
* Local sim impovements
* Added support for running Capella and EIP-4844 simulations
by downloading the correct version of Geth.
* Added support for using Nimbus remote signer and Web3Signer.
Use 2 out of 3 threshold signing configuration in the mainnet
configuration and regular remote signing in the minimal one.
* The local testnet simulation can now use a payload builder.
This is currently not activated in CI due to lack of automated
procedures for installing third-party relays or builders.
You are adviced to use mergemock for now, but for most realistic
results, we can create a simple builder based on the nimbus-eth1
codebase that will be able to propose transactions from the regular
network mempool.
* Start the simulation from a merged state. This would allow us
to start removing pre-merge functionality such as the gossip
subsciption logic. The commit also removes the merge-forcing
hack installed after the TTD removal.
* Consolidate all the tools used in the local simulation into a
single `ncli_testnet` binary.
Extends fork choice state to also track slot numbers to improve accuracy
of `/eth/v1/debug/fork_choice` endpoint. Autoenable this API on devnet,
and disable some extra checks on devnet to aid focused testing efforts.
Align fork choice pruning logic with API based on checkpoints vs root.
* Working Makefile targets for Capella devnet2
make capella-devnet-2
make clean-capella-devnet-2
You'll need to have https://github.com/tmuxinator/tmuxinator installed.
It's available as a regular package in most Linux distributions or through
Nix or Brew on macOS.
This commit also fixes the initial hang in the Eth1 monitor in the "find
TTD block" procedure through a fix to the network metadata files which
hasn't been upstreamed yet.
Other changes:
* Disabled Geth snap sync in the simulation
When all Geth nodes are configured to run with snap sync enabled, they all
start snap sync after the first forkchoiceUpdated which causes the BNs to
skip validator duties because the EL is syncing. The snap sync never completes
due to poor connectivity between the Geth nodes in the simulation.
In Jenkins CI we run two instances of unit tests concurrently.
This can trigger CI failure when the same port numbers are re-used
by the different test instances. Fixed one more issue of this by
allowing user configuration of the base port number.
When the BN-embedded LC makes sync progress, pass the corresponding
execution block hash to the EL via `engine_forkchoiceUpdatedV1`.
This allows the EL to sync to wall slot while the chain DAG is behind.
Renamed `--light-client` to `--sync-light-client` for clarity, and
`--light-client-trusted-block-root` to `--trusted-block-root` for
consistency with `nimbus_light_client`.
Note that this does not work well in practice at this time:
- Geth sticks to the optimistic sync:
"Ignoring payload while snap syncing" (when passing the LC head)
"Forkchoice requested unknown head" (when updating to LC head)
- Nethermind syncs to LC head but does not report ancestors as VALID,
so the main forward sync is still stuck in optimistic mode:
"Pre-pivot block, ignored and returned Syncing"
To aid EL client teams in fixing those issues, having this available
as a hidden option is still useful.
* Use final `v1` version for light client protocols
* Unhide LC data collection options
* Default enable LC data serving
* rm unneeded import
* Connect to EL on startup
* Add docs for LC based EL sync
A fix for a bug triggered by recent `Jenkinsfile` refactoring done in:
https://github.com/status-im/nimbus-eth2/pull/3827
Which due to a big in Jenkins Throttling plugin caused jobs to start
running in parallel on the same host despite global configuration that
is supposed to block this:
https://issues.jenkins.io/browse/JENKINS-49173https://github.com/jenkinsci/throttle-concurrent-builds-plugin/pull/68
An attempt to fix this was made in this PR:
https://github.com/status-im/nimbus-eth2/pull/3913
But it was ineffective due to bugs in the Throttle plugin.
As a result semi-random testnet launches would fail with errors like this:
```
./scripts/launch_local_testnet.sh: line 1026: 58977 Killed: 9 ${BEACON_NODE_COMMAND} ...
```
The culprit was the old process cleanup in `scripts/launch_local_testnet.sh`:
```
+ make local-testnet-mainnet
Found old process listening on port 7001, with PID 58977. Killing it.
Found old process listening on port 7002, with PID 59024. Killing it.
Found old process listening on port 7003, withu PID 59027. Killing it.
Found old process listening on port 7004, with PID 59030. Killing it.
```
Which was triggered due to use of immediate assignment for `EXECUTOR_NUMBER`:
```
EXECUTOR_NUMBER := 0
```
Which cause the `EXECUTOR_NUMBER` value set by Jenkins to be ignored.
For more details see:
https://www.gnu.org/software/make/manual/html_node/Flavors.html#Flavors
Signed-off-by: Jakub Sokołowski <jakub@status.im>
The ports for the concurrently executing REST and Minimal testnet clash,
leading to some CI failures since #3827 introduced further concurrency.
Adjusting the ports to be distinct across various tests should fix this.
`mkdocs` works with markdown similar to `mdbook` but is generally more
pleasing to the eye and has several nice UX features.
This PR does the bulk of the transition - likely, a followup would be
needed to fully make use of the extra features and navigation.
Book pages have been kept url-compatible, meaning that for the most
part, old links should continue to work!
Co-authored-by: Etan Kissling <etan@status.im>