Commit Graph

152 Commits

Author SHA1 Message Date
Jakub Sokołowski 6162e735dd
launch_local_testnet: add debugging for port conflicts (#5317)
We have been seeing some port conflicts like:
```
[2023-08-15T00:31:47.625Z] Geth 0 failed to start
```
```
$ tail -n1 local-testnet-mainnet/logs/geth.?.txt
==> local-testnet-mainnet/logs/geth.0.txt <==
Fatal: Error starting protocol stack: listen tcp :6801: bind: address already in use

==> local-testnet-mainnet/logs/geth.1.txt <==
Fatal: Error starting protocol stack: listen tcp :6806: bind: address already in use

==> local-testnet-mainnet/logs/geth.2.txt <==
Fatal: Error starting protocol stack: listen tcp :6811: bind: address already in use
```
In order to debug this we'll need to add printing of some extra info
into `unstable` so feature branches include it.

Related: https://github.com/status-im/nimbus-eth2/issues/4575

Signed-off-by: Jakub Sokołowski <jakub@status.im>
2023-08-18 22:24:40 +00:00
Etan Kissling 6a2bac5cee
need even more log lines for debugging keymanager (#5260)
Keymanager test logs are still cut off, further increase log lines.
2023-08-07 18:02:08 +02:00
Etan Kissling 5b70a686e3
further adjust `test_keymanager_api` logs (#5259)
It is still unclear how `test_keymanager_api` sometimes fails in CI;
further adjust logging parameters.
2023-08-05 13:12:44 +00:00
Etan Kissling ed213538c7
more log lines when jobs fail (#5226)
Increase log lines emitted on test failure to 1000.
2023-07-31 15:03:32 +02:00
Zahary Karadjov e6ed11fc33
Restore the ability to test the Web3Signer (don't enable it in CI yet) 2023-06-03 00:41:57 +03:00
zah 8833acbe23
Add support for using custom remote signers in local sim (#4989)
* Add support for using custom remote signers in local sim

Other changes:

* Enable the Nimbus remote signer in the minimal simulation
* Move all log files into the `logs` folder of the simulation
* Create PID files for all processes and use them during the clean-up
  phase instead of the previous more fragile methods for killing the
  remaining processes.
2023-05-25 15:05:38 +00:00
henridf 5dfd814588
Load trusted setup (#4870)
* Kzg: Load trusted setup

* scripts/launch_local_testnet.sh: set FIELD_ELEMENTS_PER_BLOB

* Use right setup file for mainnet/minimal

* Force rebuild

* Add comment explaining why build with -f
2023-05-11 11:52:44 +03:00
Zahary Karadjov 0d6453aa51
Add an USE_VC env variable affecting the local simulation targets 2023-04-28 00:21:01 +03:00
Etan Kissling 450f06566b
accelerate execution layer sync using light client (#4805)
Turn on `--sync-light-client` option by default, now that it has shown
stability in local testnets.
2023-04-10 14:28:46 +00:00
tersec ed148ab69d
fix false positive getopt failure with multiple getopt matches in searched path (#4797)
* fix false positive getopt failure with multiple getopt matches in searched path

* also get launch_local_testnet

* also make_prometheus_config, called from launch_local_testnet
2023-04-08 00:18:29 +00:00
Zahary Karadjov 7f2a3b7130
[local sim] Download the latest nimbus-eth2 when --dl-nimbus-eth2 is used 2023-04-05 19:12:28 +03:00
Zahary Karadjov b4d731a1cc
Make it possible to run the local simulation on nixOS 2023-03-30 19:37:29 +03:00
zah 6539928775
Refer to binaries with their proper extensions on Windows (#4739) 2023-03-17 19:43:52 +01:00
Etan Kissling 2f006d31e7
rename `GETH_EIP_4844_BINARY` > `GETH_DENEB_BINARY` (#4729) 2023-03-12 01:47:24 +00:00
zah 8771e91d53
Support for driving multiple EL nodes from a single Nimbus BN (#4465)
* Support for driving multiple EL nodes from a single Nimbus BN

Full list of changes:

* Eth1Monitor has been renamed to ELManager to match its current
  responsibilities better.

* The ELManager is no longer optional in the code (it won't have
  a nil value under any circumstances).

* The support for subscribing for headers was removed as it only
  worked with WebSockets and contributed significant complexity
  while bringing only a very minor advantage.

* The `--web3-url` parameter has been deprecated in favor of a
  new `--el` parameter. The new parameter has a reasonable default
  value and supports specifying a different JWT for each connection.
  Each connection can also be configured with a different set of
  responsibilities (e.g. download deposits, validate blocks and/or
  produce blocks). On the command-line, these properties can be
  configured through URL properties stored in the #anchor part of
  the URL. In TOML files, they come with a very natural syntax
  (althrough the URL scheme is also supported).

* The previously scattered EL-related state and logic is now moved
  to `eth1_monitor.nim` (this module will be renamed to `el_manager.nim`
  in a follow-up commit). State is assigned properly either to the
  `ELManager` or the to individual `ELConnection` objects where
  appropriate.

  The ELManager executes all Engine API requests against all attached
  EL nodes, in parallel. It compares their results and if there is a
  disagreement regarding the validity of a certain payload, this is
  detected and the beacon node is protected from publishing a block
  with a potential execution layer consensus bug in it.

  The BN provides metrics per EL node for the number of successful or
  failed requests for each type Engine API requests. If an EL node
  goes offline and connectivity is resoted later, we report the
  problem and the remedy in edge-triggered fashion.

* More progress towards implementing Deneb block production in the VC
  and comparing the value of blocks produced by the EL and the builder
  API.

* Adds a Makefile target for the zhejiang testnet
2023-03-05 01:40:21 +00:00
Jakub Sokołowski 2f36f15b20
launch_local_testnet.sh: fix terminating Geth (#4684)
Also improve logging and list jobs after cleanup.

Signed-off-by: Jakub Sokołowski <jakub@status.im>
2023-03-01 18:27:55 +02:00
Jakub Sokołowski f46ed12f04
launch_local_testnet.sh: re-add targetting parent PID (#4680)
Otherwise we kill of other unrelated processes.

Fix for bug introduced most probably in:
https://github.com/status-im/nimbus-eth2/pull/4551

Signed-off-by: Jakub Sokołowski <jakub@status.im>
2023-02-28 10:39:39 +00:00
Jakub Sokołowski f2a09802b1
launch_local_testnet.sh: print processes first
Signed-off-by: Jakub Sokołowski <jakub@status.im>
2023-02-28 11:24:20 +01:00
Jakub Sokołowski 02437bb365
launch_local_testnet.sh: show which killed processes
First step in debugging issue most probably re-introduced by:
https://github.com/status-im/nimbus-eth2/pull/4551

Which causes the finalization tests script to kill other processes
unrelated to the given CI job.

Signed-off-by: Jakub Sokołowski <jakub@status.im>
2023-02-28 11:24:17 +01:00
Kim De Mey 14f4100e1b
Fix Nimbus EL node index in local testnet script (#4671) 2023-02-27 20:39:50 +01:00
zah 6036f2e7d7
Local sim impovements (#4551)
* Local sim impovements

* Added support for running Capella and EIP-4844 simulations
  by downloading the correct version of Geth.

* Added support for using Nimbus remote signer and Web3Signer.
  Use 2 out of 3 threshold signing configuration in the mainnet
  configuration and regular remote signing in the minimal one.

* The local testnet simulation can now use a payload builder.
  This is currently not activated in CI due to lack of automated
  procedures for installing third-party relays or builders.

  You are adviced to use mergemock for now, but for most realistic
  results, we can create a simple builder based on the nimbus-eth1
  codebase that will be able to propose transactions from the regular
  network mempool.

* Start the simulation from a merged state. This would allow us
  to start removing pre-merge functionality such as the gossip
  subsciption logic. The commit also removes the merge-forcing
  hack installed after the TTD removal.

* Consolidate all the tools used in the local simulation into a
  single `ncli_testnet` binary.
2023-02-23 02:10:07 +00:00
tersec 1c62a5eb24
capella VC support (#4586) 2023-02-03 16:12:11 +01:00
zah 0f758c5f02
Working Makefile targets for Capella devnet2 (#4494)
* Working Makefile targets for Capella devnet2

make capella-devnet-2
make clean-capella-devnet-2

You'll need to have https://github.com/tmuxinator/tmuxinator installed.
It's available as a regular package in most Linux distributions or through
Nix or Brew on macOS.

This commit also fixes the initial hang in the Eth1 monitor in the "find
TTD block" procedure through a fix to the network metadata files which
hasn't been upstreamed yet.

Other changes:

* Disabled Geth snap sync in the simulation

When all Geth nodes are configured to run with snap sync enabled, they all
start snap sync after the first forkchoiceUpdated which causes the BNs to
skip validator duties because the EL is syncing. The snap sync never completes
due to poor connectivity between the Geth nodes in the simulation.
2023-01-13 12:21:58 +02:00
tersec b1acae67c3
revert to Geth stable commits for testnet (#4476) 2023-01-09 14:16:29 +00:00
Etan Kissling d80082c784
re-enable `--sync-light-client` in CI testnets (#4469)
libp2p issues related to operation cancellations have been addressed in
https://github.com/status-im/nim-libp2p/pull/816
This means we can once more enable `--sync-light-client` in CI, without
having to deal with spurious CI failures due to the cancellation issues.
2023-01-06 23:38:38 +01:00
Etan Kissling 44aa1f5152
avoid VC keymanager port conflict in CI (#4459)
`BASE_VC_KEYMANAGER_PORT` was not configured separately between runs,
make it configurable and base on `EXECUTOR_NUMBER` like the others.
2023-01-04 18:59:35 +01:00
tersec 121c66ad3e
fix getForkedBlock to read EIP4844 and Capella blocks (#4450)
* fix getForkedBlock to read EIP4844 and Capella blocks

* update geth develop URLs
2022-12-30 21:59:21 +00:00
tersec 2b7e2499d9
use unstable Geth version in CI for capella and eip4844 support (#4415) 2022-12-13 00:55:33 +00:00
zah d30cb8baf1
Support for obtaining deposit snapshots during trustedNodeSync (#4303)
Other changes:

* More optimal search for TTD block.

* Add timeouts to all REST requests during trusted node sync.
  Fixes #4037

* Removed support for storing a deposit snapshot in the network
  metadata.
2022-12-07 12:24:51 +02:00
Zahary Karadjov fbbab3bfef
Add version metric for the VC; Enable VC metrics in the local simulation 2022-11-30 12:47:11 +02:00
zah d07113767d
Bugfix: The VC was producing invalid sync committee contributions (#4343)
Since the sync committee duties are no longer updated on every slot
and previously the sync committee aggregators selection proofs were
generated during the duties update, this now resulted in the client
using stale selection proofs (they must be generated at each slot).

The fix consists of moving the selection proof generation logic in
a different function which is properly executed on each slot.

Other changes:

* The logtrace tool has been enhanced with a framework for adding
  new simpler log aggregation and analysis algorithms.
  The default CI testnet simulation will now ensure that the blocks
  in the network have reasonable sync committee participation.
2022-11-24 09:46:35 +02:00
Etan Kissling bee3c4d68e
properly stop EL instances from prior testnets (#4328)
Launching multiple local testnet simulation sequentially can lead to
existing EL processes from prior failed/aborted runs not being stopped
properly, subsequently leading to hard-to-debug CI test failures.
Fixing the cleanup logic addresses this problem.
2022-11-16 12:12:20 +02:00
Etan Kissling 23cf8e9b22
temporarily disable LC sync in testnet (#4238)
LC sync seems to trigger libp2p disconnects in launch_local_testnet in
some cases. Disable the testing flag for now until investigated.
2022-10-14 15:12:14 +00:00
tersec 20d6481b39
increase local testnet validators from 128 to 1024 (#4214) 2022-10-04 19:44:20 +03:00
Jakub Sokołowski d697a54846
local_launch_testnet: use -SIGKILL instead of -9 (#4159)
Might help with this error:
```
pkill: illegal option -- 9
usage: pkill [-signal] [-ILfilnovx] [-F pidfile] [-G gid]
             [-P ppid] [-U uid] [-g pgrp] [-t tty] [-u euid]
             pattern ...
```

Signed-off-by: Jakub Sokołowski <jakub@status.im>

Signed-off-by: Jakub Sokołowski <jakub@status.im>
2022-09-22 11:55:50 +00:00
Etan Kissling 317e91bfc1
support jwtsecret in `launch_local_testnet` (#4097)
To allow testing more thoroughly with connected EL, pass `jwtsecret`
as part of running local testnets.
2022-09-08 15:22:31 +02:00
Etan Kissling 613f4a9a50
accelerate EL sync with LC with `--sync-light-client` (#4041)
When the BN-embedded LC makes sync progress, pass the corresponding
execution block hash to the EL via `engine_forkchoiceUpdatedV1`.
This allows the EL to sync to wall slot while the chain DAG is behind.
Renamed `--light-client` to `--sync-light-client` for clarity, and
`--light-client-trusted-block-root` to `--trusted-block-root` for
consistency with `nimbus_light_client`.

Note that this does not work well in practice at this time:
- Geth sticks to the optimistic sync:
  "Ignoring payload while snap syncing" (when passing the LC head)
  "Forkchoice requested unknown head" (when updating to LC head)
- Nethermind syncs to LC head but does not report ancestors as VALID,
  so the main forward sync is still stuck in optimistic mode:
  "Pre-pivot block, ignored and returned Syncing"

To aid EL client teams in fixing those issues, having this available
as a hidden option is still useful.
2022-08-29 12:16:35 +00:00
zah b1ac9c9fe4
Fix a potential segfault and various potential stalls (#4003)
* Fixes a segfault during block production when the Keymanager API
  is disabled. The Keymanager is now disabled on half of the local
  testnet nodes to catch such problems in the future.

* Fixes multiple potential stalls from REST requests being done
  without a timeout. From practice, we know that such requests
  can hang forever if not cancelled with a timeout. At best,
  this would be a resource leak, at worst, it may lead to a
  full stall of the client and missed validator duties.

* Changes some Options usages to Opt (for easier use of valueOr)
2022-08-19 21:51:30 +00:00
zah fca20e08d6
Keymanager API for the validator client (#3976)
* Keymanager API for the validator client
* Properly treat the 'description' field as optional when loading Keystores
* Spec-compliant serialization of the slashing data in Keymanager's DeleteKeys response ()

Fixes #3940
Fixes #3964
Closes #3884 by adding test
2022-08-19 13:30:07 +03:00
Nikolay Mitev 33546f0fa9 Trivial: Fix typo 2022-08-15 16:46:14 +03:00
Nikolay Mitev 9e6d9b955d Trivial: Make NIMBUS_EL_BINARY customizable 2022-08-14 15:51:42 +03:00
Nikolay Mitev 607de676cb
launch_local_testnet script: add options to download eth2 binary (#3958)
Some refactoring and cleanup
2022-08-12 17:41:11 +03:00
zah 8273b3d909
Keep CLI options consistent by removing the '-enable' suffix from the outliers (#3928) 2022-08-05 17:38:26 +02:00
Etan Kissling 3ec7982293
update light client protocol version (#3550)
* Use final `v1` version for light client protocols
* Unhide LC data collection options
* Default enable LC data serving
* rm unneeded import
* Connect to EL on startup
* Add docs for LC based EL sync
2022-07-29 11:45:39 +03:00
Jakub Sokołowski 3f30eb51dd
launch_local_testnet: show processes killed by pkill
I want to verify that processes being killed are not from other parallel jobs.

Signed-off-by: Jakub Sokołowski <jakub@status.im>
2022-07-26 11:54:02 +02:00
Etan Kissling fd4cf35c20
fix concurrent Jenkins stages (#3904)
The ports for the concurrently executing REST and Minimal testnet clash,
leading to some CI failures since #3827 introduced further concurrency.
Adjusting the ports to be distinct across various tests should fix this.
2022-07-23 14:28:10 +00:00
Etan Kissling a6deacd878
allow driving EL with LC (#3865)
Adds the `--web3-url` launch argument to `nimbus_light_client` to enable
driving the EL with the optimistic head obtained from LC sync protocol.
This will keep issuing `newPayload` / `forkChoiceUpdated` requests for
new blocks, marking them as optimistic. `ZERO_HASH` is reported as the
finalized block for now.
2022-07-14 04:07:40 +00:00
Etan Kissling 20491560b6
re-enable LC in local testnet (#3857)
`a48d741022f6b0da1bb679e0ede4e38c019242cf` disabled LC in local testnet
as an undocumented side effect. Re-enabling for more thorough testing,
and added handling of LC with `--eth2-docker-image`.
2022-07-12 11:38:50 +00:00
tersec 31772a2ac2
don't use JWT if one hasn't set up a JWT secret (#3822) 2022-06-29 07:36:22 +00:00
Zahary Karadjov 73227508db
Fix regressions in launch_local_simulation 2022-06-28 19:22:06 +03:00