Commit Graph

2787 Commits

Author SHA1 Message Date
Etan Kissling 634408ff2c
use `nim-websock` instead of `news` (#4061)
`news` has a few open issues that are not present in `nim-websock`:
1. There is a 1 second delay between each MB of sent data.
2. Cancelling an ongoing `send` makes the entire WebSocket unusable.
3. Control packets do not have priority over ongoing message frames.

Using `news`, there are quite a few of these messages in Geth:
```
Previously seen beacon client is offline. Please ensure it is
operational to follow the chain!
```
It may take quite some time to reconnect when this happens.

Using `nim-websock`, this message still occurs because `eth1_monitor`
reconnects the EL connection when no new blocks occurred for 5 minutes,
but reconnecting is quick and the message is rarer.
2022-09-06 23:41:33 +02:00
tersec 8fbb3d975b
display invalid status in extra fork choice info (#4074)
* fork choice: support marking roots/nodes invalid

* check for invalid first

* display invalid status in extra fork choice info
2022-09-06 18:05:57 +00:00
tersec 11ebf60ab8
fork choice: support marking roots/nodes invalid (#4071)
* fork choice: support marking roots/nodes invalid

* check for invalid first
2022-09-06 16:58:54 +00:00
tersec 776f09215c
only mark post-finalized blocks invalid (#4072) 2022-09-06 11:43:19 +00:00
tersec e183dccc7f
blockchain DAG and fork choice comment cleanup (#4070) 2022-09-05 23:25:28 +00:00
Tanguy 2da13c0b22
Bump libp2p (#4066) 2022-09-05 20:05:36 +02:00
Jacek Sieka d9ceb61dbd
eth: bump (#4062) 2022-09-04 19:44:43 +02:00
Etan Kissling 8936212f93
descore on empty response for range w known block (#4050)
The sync protocol does not distinguish between:
- All requested slots are empty
- Peer does not have data available about requested range

Therefore, we treat EOF for `beacon_blocks_by_range` and for
`beacon_blocks_by_range` as valid responses, as if the entire epoch
really contained no single block for any slot. Once a followup response
provides new blocks, we detect that some blocks were missing and rewind.

During backfill, we also request the known-to-exist `backfill.slot`,
so we can actually detect whether an epoch really does not have blocks
or whether a response is incomplete (`PeerScoreNoBlocks`).
2022-09-03 23:12:58 +02:00
tersec 301e5a919d
remove some Bellatrix-specific references (#4019)
* remove some Bellatrix-specific references

* remove more bellatrixData-dependencies
2022-09-03 20:56:20 +00:00
Etan Kissling b7e4d1518b
msf11 deprecated, msf13 added, adjust deployment phases (#4056)
Removes deprecated msf11 and adds msf13 to devnets,
and extends devnet check for public testnets.
2022-09-03 00:49:32 +02:00
Zahary Karadjov 3a8abd6010
Version 22.8.2 2022-09-01 13:44:49 +03:00
Zahary Karadjov 3a045690e4
Merge branch 'stable' into unstable 2022-09-01 13:37:21 +03:00
tersec 2309f11e9e
don't access potentially unitialized Opts (#4054) 2022-08-31 16:36:24 +00:00
tersec ad0d30093f
state/forkyState cleanup; spec URL updates; rm unused imports (#4052) 2022-08-31 13:29:34 +02:00
tersec 9ae796daed
Cache and resend, rather than recreate, builder API registrations (#4040) 2022-08-31 03:29:03 +03:00
Zahary Karadjov eaa01dbd64
Version 22.8.1 2022-08-30 12:49:00 +03:00
Jacek Sieka 59092e5b3b
add some log data for fishy trusted attestations (#4049) 2022-08-30 02:59:42 +00:00
Etan Kissling 574b84f96f
add REST endpoint for fork choice context (#4042)
Implements a proposed REST endpoint for analyzing fork choice behaviour.
See https://github.com/ethereum/beacon-APIs/pull/232
2022-08-29 22:02:29 +00:00
Zahary Karadjov 74ac85a75f
Add reassuring log message upon connecting to the EL 2022-08-29 23:11:09 +03:00
Etan Kissling 613f4a9a50
accelerate EL sync with LC with `--sync-light-client` (#4041)
When the BN-embedded LC makes sync progress, pass the corresponding
execution block hash to the EL via `engine_forkchoiceUpdatedV1`.
This allows the EL to sync to wall slot while the chain DAG is behind.
Renamed `--light-client` to `--sync-light-client` for clarity, and
`--light-client-trusted-block-root` to `--trusted-block-root` for
consistency with `nimbus_light_client`.

Note that this does not work well in practice at this time:
- Geth sticks to the optimistic sync:
  "Ignoring payload while snap syncing" (when passing the LC head)
  "Forkchoice requested unknown head" (when updating to LC head)
- Nethermind syncs to LC head but does not report ancestors as VALID,
  so the main forward sync is still stuck in optimistic mode:
  "Pre-pivot block, ignored and returned Syncing"

To aid EL client teams in fixing those issues, having this available
as a hidden option is still useful.
2022-08-29 12:16:35 +00:00
tersec 2545d1d053
remove incorrect block gossip validation condition (#4044)
* remove incorrect block gossip validation condition

* clarify explanation
2022-08-29 13:01:32 +03:00
tersec d7e9c334ac
document external block builder configuration (#4032)
* document external block builder configuration

* Update docs/the_nimbus_book/src/external-block-builder.md

Co-authored-by: Jacek Sieka <jacek@status.im>

* unhide external payload builder options

* clarify builder API incentive misalignment

Co-authored-by: Jacek Sieka <jacek@status.im>
2022-08-29 12:59:12 +03:00
Jacek Sieka e87b7f1572
metrics: add block failure counters (#4036) 2022-08-29 12:55:20 +03:00
Etan Kissling 994339c7ee
adjust checkpoint tracking for devnets (#4039)
Track checkpoints more defensively on devnets with low participation.
2022-08-29 09:26:01 +02:00
tersec b60456fdf3
`withState`: `state` -> `forkyState` (#4038) 2022-08-26 22:47:40 +00:00
Etan Kissling 4e90e9f52c
update network list for msf11 and msf12 (#4034)
Tracks correct deployment phase for the latest mainnet shadow forks.
2022-08-26 16:49:43 +00:00
Jacek Sieka 91a1b4e0c5
better error message on invalid URL (fixes #4023) (#4024) 2022-08-26 15:47:55 +00:00
tersec 66a5e88203
allow accessing withState forky state via `forkyState` (#4026) 2022-08-26 17:14:18 +03:00
tersec 61dc296046
update engine API spec ref URLs from alpha.9 to beta.1 (#4030)
* update engine API spec ref URLs from alpha.9 to beta.1

* require exactly 256-bit JWT keys
2022-08-26 13:44:50 +03:00
Etan Kissling 64972e3c8a
set `safe_block_hash` to fork choice justified (#4010)
Implements the fork choice safe block spec, where `safe_block_hash` in
`forkChoiceUpdated` is set to justified (used to be `ZERO_HASH`).
https://github.com/ethereum/consensus-specs/blob/v1.2.0-rc.3/fork_choice/safe-block.md#get_safe_execution_payload_hash
2022-08-25 23:34:02 +00:00
Etan Kissling d619b539f3
fix engine API crash when EL disconnected (#4027)
When issuing an engine API call while the EL is disconnected, a `nil`
pointer is dereferenced. Fixed by correctly initializing futures.

```
Traceback (most recent call last, using override)
vendor/nim-libp2p/libp2p/protocols/pubsub/pubsub.nim(890) main
beacon_chain/nimbus_beacon_node.nim(2139) main
beacon_chain/nimbus_beacon_node.nim(0) handleStartUpCmd
beacon_chain/nimbus_beacon_node.nim(0) doRunBeaconNode
beacon_chain/nimbus_beacon_node.nim(0) start
beacon_chain/nimbus_beacon_node.nim(1589) run
vendor/nimbus-build-system/vendor/Nim/lib/system/iterators_1.nim(107) poll
vendor/nim-chronos/chronos/asyncfutures2.nim(365) futureContinue
beacon_chain/consensus_object_pools/consensus_manager.nim(297) updateHeadWithExecution
vendor/nim-chronos/chronos/asyncmacro2.nim(213) runProposalForkchoiceUpdated
vendor/nim-chronos/chronos/asyncfutures2.nim(365) futureContinue
beacon_chain/consensus_object_pools/consensus_manager.nim(259) runProposalForkchoiceUpdated
beacon_chain/eth1/eth1_monitor.nim(0) forkchoiceUpdated
vendor/nim-chronos/chronos/asyncfutures2.nim(219) complete
vendor/nim-chronos/chronos/asyncfutures2.nim(149) cancelled
vendor/nimbus-build-system/vendor/Nim/lib/system/excpt.nim(610) signalHandler
SIGSEGV: Illegal storage access. (Attempt to read from nil?)
```
2022-08-25 20:07:29 +02:00
Etan Kissling 9180f09641
reduce LC optsync latency (#4002)
The optimistic sync spec was updated since the LC based optsync module
was introduced. It is no longer necessary to wait for the justified
checkpoint to have execution enabled; instead, any block is okay to be
optimistically imported to the EL client, as long as its parent block
has execution enabled. Complex syncing logic has been removed, and the
LC optsync module will now follow gossip directly, reducing the latency
when using this module. Note that because this is now based on gossip
instead of using sync manager / request manager, that individual blocks
may be missed. However, EL clients should recover from this by fetching
missing blocks themselves.
2022-08-25 03:53:59 +00:00
Etan Kissling eec6c04d32
do not descore peer when EL connection fails (#4020)
When the EL fails to respond to `newPayload`, e.g., because connection
to the EL got interrupted, or due to misconfiguration, optimistic blocks
cannot be imported according to spec. This condition is treated the same
as if the peer returned a block with missing parent which gets the block
out of our processing queue, but can have nasty side effects.

For example, if sync manager asks for validation of a block known to be
in the finalized range, if it receives a `MissingParent` verdict, the
peer is immediately removed from the peer pool.

```
DBG 2022-08-24 11:45:26.874+02:00 newPayload: inserting block into execution engine parentHash=e4ca7424 blockHash=36cdc198 stateRoot=cf3902c1 receiptsRoot=56e81f17 prevRandao=0b49a172 blockNumber=1518089 gasLimit=30000000 gasUsed=0 timestamp=1657980396 extraDataLen=0 baseFeePerGas=7 numTransactions=0
ERR 2022-08-24 11:45:26.875+02:00 newPayload failed                          msg="Transport is not initialised (missing a call to connect?)"
DBG 2022-08-24 11:45:26.875+02:00 Block pool rejected peer's response        topics="syncman" request=187232:32@1475 peer=16U*MsCJdx direction=forward blocks_map=xxxxxxxxxxxxxxxxxxxxxxxxxxxx.xxx blocks_count=31 ok=false unviable=false missing_parent=true sync_ident=main
ERR 2022-08-24 11:45:26.875+02:00 Unexpected missing parent at finalized epoch slot topics="syncman" request=187232:32@1475 peer=16U*MsCJdx direction=forward rewind_to_slot=187232 blocks_count=31 blocks_map=xxxxxxxxxxxxxxxxxxxxxxxxxxxx.xxx sync_ident=main
DBG 2022-08-24 11:45:26.875+02:00 Peer was removed from PeerPool due to low score topics="beacnde" peer=16U*MsCJdx peer_score=-1000 score_low_limit=0 score_high_limit=1000
DBG 2022-08-24 11:45:26.875+02:00 Lost connection to peer                    topics="networking" peer=16U*MsCJdx connections=0
```

By delaying issuing a verdict until the EL connection is restored and
`newPayload` successfully ran, the problem should be fixed. This also
induces back pressure to the sync manager by stopping download of new
blocks (or re-downloading the same block over and over again).
2022-08-24 16:55:41 +00:00
tersec 1d55743ebb
allow execution clients several seconds to construct blocks (#4012) 2022-08-23 19:19:52 +03:00
Jacek Sieka 9e9db216c5
Harden block proposal against expired slashings/exits (#4013)
* Harden block proposal against expired slashings/exits

When a message is signed in a phase0 domain, it can no longer be
validated under bellatrix due to the correct fork no longer being
available in the `BeaconState`.

To ensure that all slashing/exits are still valid, in this PR we re-run
the checks in the state that we're proposing for, thus hardening against
both signatures and other changes in the state that might have
invalidated the message.

* fix same message added multiple times

in case of attestation slashing of multiple validators in one go
2022-08-23 18:30:46 +03:00
tersec e70d5e6194
update spec ref URLs in state_transition_epoch (#4016) 2022-08-23 13:06:12 +00:00
Zahary Karadjov 57f9974fe5
Version 22.8.0 2022-08-23 01:11:29 +03:00
zah 4e41ed1d5a
Require properly configured Engine API connection after the merge (#4006) 2022-08-22 22:44:40 +03:00
Etan Kissling f1ddcfff0f
support connecting to peers without bellatrix (#4011)
* support connecting to peers without bellatrix

Make discovery fork ID aware of scheduled Bellatrix fork to enable
connections to peers that don't have Bellatrix scheduled yet.
Without this, has peering issues with peers on older SW version.

* expand tests with compatibility checks

* more exhaustive compatibility checks
2022-08-21 19:36:46 +02:00
Etan Kissling 74dc388ad9
do not prune LC data by default (#4008)
Aligns the default retention policy for LC data with the one for blocks.
Minimum spec requirement for both blocks and LC data is ~5 months.
Additional use cases are better supported by retaining data for longer.
2022-08-21 11:24:59 +02:00
tersec c65eaca1bf
update spec ref URLs (#4005) 2022-08-20 16:03:32 +00:00
zah 09de83af80
Reviewed the Engine API calls for missing error handling (#4004) 2022-08-20 09:09:25 +03:00
zah b1ac9c9fe4
Fix a potential segfault and various potential stalls (#4003)
* Fixes a segfault during block production when the Keymanager API
  is disabled. The Keymanager is now disabled on half of the local
  testnet nodes to catch such problems in the future.

* Fixes multiple potential stalls from REST requests being done
  without a timeout. From practice, we know that such requests
  can hang forever if not cancelled with a timeout. At best,
  this would be a resource leak, at worst, it may lead to a
  full stall of the client and missed validator duties.

* Changes some Options usages to Opt (for easier use of valueOr)
2022-08-19 21:51:30 +00:00
tersec f537f263df
don't use empty execution payload when newPayload rejects it (#3999)
* don't use empty execution payload when newPayload rejects it

* disallow optimistic import except when accepted/syncing
2022-08-20 00:20:57 +03:00
zah df5ef95111
Doppelganger detection bug fix (#3997)
When the client was started without any validators, the doppelganger
detection structures were never initialized properly. Later, when
validators were added through the Keymanager API, they interacted
with the uninitialized doppelganger detection structures and their
duties were inappropriately skipped.
2022-08-19 13:34:08 +03:00
zah fca20e08d6
Keymanager API for the validator client (#3976)
* Keymanager API for the validator client
* Properly treat the 'description' field as optional when loading Keystores
* Spec-compliant serialization of the slashing data in Keymanager's DeleteKeys response ()

Fixes #3940
Fixes #3964
Closes #3884 by adding test
2022-08-19 13:30:07 +03:00
zah a7192f5d6c
Fix the block header computation when proposing an empty execution payload (#3991)
* Fix the block header computation when proposing an empty execution payload
* Spec compliant base fee calculation when producing empty payloads
2022-08-19 13:28:42 +03:00
tersec b5b93e90c0
use v1.2.0-rc.3 test vectors (#3995) 2022-08-19 04:32:53 +00:00
Jacek Sieka c8fb447020
valmon: log autoregistration once only (#3993) 2022-08-18 23:09:49 +00:00
Jacek Sieka 0d9fd54857
cache shuffling separately from other EpochRef data (fixes #2677) (#3990)
In order to avoid full replays when validating attestations hailing from
untaken forks, it's better to keep shufflings separate from `EpochRef`
and perform a lookahead on the shuffling when processing the block that
determines them.

This also helps performance in the case where REST clients are trying to
perform lookahead on attestation duties and decreases memory usage by
sharing shufflings between EpochRef instances of the same dependent
root.
2022-08-18 21:07:01 +03:00