* Working Makefile targets for Capella devnet2
make capella-devnet-2
make clean-capella-devnet-2
You'll need to have https://github.com/tmuxinator/tmuxinator installed.
It's available as a regular package in most Linux distributions or through
Nix or Brew on macOS.
This commit also fixes the initial hang in the Eth1 monitor in the "find
TTD block" procedure through a fix to the network metadata files which
hasn't been upstreamed yet.
Other changes:
* Disabled Geth snap sync in the simulation
When all Geth nodes are configured to run with snap sync enabled, they all
start snap sync after the first forkchoiceUpdated which causes the BNs to
skip validator duties because the EL is syncing. The snap sync never completes
due to poor connectivity between the Geth nodes in the simulation.
Introduce (optional) pruning of historical data - a pruned node will
continue to answer queries for historical data up to
`MIN_EPOCHS_FOR_BLOCK_REQUESTS` epochs, or roughly 5 months, capping
typical database usage at around 60-70gb.
To enable pruning, add `--history=prune` to the command line - on the
first start, old data will be cleared (which may take a while) - after
that, data is pruned continuously.
When pruning an existing database, the database will not shrink -
instead, the freed space is recycled as the node continues to run - to
free up space, perform a trusted node sync with a fresh database.
When switching on archive mode in a pruned node, history is retained
from that point onwards.
History pruning is scheduled to be enabled by default in a future
release.
In this PR, `minimal` mode from #4419 is not implemented meaning
retention periods for states and blocks are always the same - depending
on user demand, a future PR may implement `minimal` as well.
Since the sync committee duties are no longer updated on every slot
and previously the sync committee aggregators selection proofs were
generated during the duties update, this now resulted in the client
using stale selection proofs (they must be generated at each slot).
The fix consists of moving the selection proof generation logic in
a different function which is properly executed on each slot.
Other changes:
* The logtrace tool has been enhanced with a framework for adding
new simpler log aggregation and analysis algorithms.
The default CI testnet simulation will now ensure that the blocks
in the network have reasonable sync committee participation.
To further tighten Nimbus against spam, this PR introduces a global
quota for block requests (shared between peers) as well as a general
per-peer request limit that applies to all libp2p requests.
* apply request quota before decoding message
* for high-bandwidth requests (blocks), apply a shared global quota
which helps manage bandwidth for high-peer setups
* add metrics
When connecting to hosts on shared IP/Port using TLS, SNI must be sent
to allow the remote server to provide the correct TLS certificate.
Bump the `nim-json-rpc` and `nim-websock` dependencies to send SNI ext.
`news` has a few open issues that are not present in `nim-websock`:
1. There is a 1 second delay between each MB of sent data.
2. Cancelling an ongoing `send` makes the entire WebSocket unusable.
3. Control packets do not have priority over ongoing message frames.
Using `news`, there are quite a few of these messages in Geth:
```
Previously seen beacon client is offline. Please ensure it is
operational to follow the chain!
```
It may take quite some time to reconnect when this happens.
Using `nim-websock`, this message still occurs because `eth1_monitor`
reconnects the EL connection when no new blocks occurred for 5 minutes,
but reconnecting is quick and the message is rarer.
When calling `newPayload` on a >1MB payload (can happen post-merge),
`news` splits up that payload into 1MB chunks. The chunks are each sent
individually, though, with `await` in-between. This means that when we
send concurrent `forkChoiceUpdated` calls, that those may end up getting
in-between the `newPayload` chunks, leading to invalid data being sent.
The EL then returns an error message with a `null` `id` entry (as it
could not read the request `id` due to the mangling) and disconnects.
A PR has been submitted to fix this in `news`, and merged into `status`
branch early as this fix is critical for reliable post-merge operation:
https://github.com/Tormund/news/pull/22
* Re-enabled requireAllFields after a fix in nim-json-serialization
The problem was that `Option[T]` fields were not treated as optional
when requireAllFields is set to true. This is now fixed in NJS.
* Add makefile targets for recreating the Jenkins simulation runs
* Fix a discrepancy with the REST spec
Whether new blocks/attestations/etc are produced internally or received
via REST, their journey through the node is the same - to ensure that
they get the same treatment (logging, metrics, processing), this PR
moves the routing to a dedicated module and fixes several small
differences that existed before.
* `xxxValidator` -> `processMessageName` - the processor also was adding
messages to pools, so we want the name to reflect that action
* add missing "sent" metrics for some messages
* document ignore policy better - already-seen messages are not actaully
rebroadcast by libp2p
* skip redundant signature checks for internal validators consistently