do not descore peer when EL connection fails (#4020)

When the EL fails to respond to `newPayload`, e.g., because connection
to the EL got interrupted, or due to misconfiguration, optimistic blocks
cannot be imported according to spec. This condition is treated the same
as if the peer returned a block with missing parent which gets the block
out of our processing queue, but can have nasty side effects.

For example, if sync manager asks for validation of a block known to be
in the finalized range, if it receives a `MissingParent` verdict, the
peer is immediately removed from the peer pool.

```
DBG 2022-08-24 11:45:26.874+02:00 newPayload: inserting block into execution engine parentHash=e4ca7424 blockHash=36cdc198 stateRoot=cf3902c1 receiptsRoot=56e81f17 prevRandao=0b49a172 blockNumber=1518089 gasLimit=30000000 gasUsed=0 timestamp=1657980396 extraDataLen=0 baseFeePerGas=7 numTransactions=0
ERR 2022-08-24 11:45:26.875+02:00 newPayload failed                          msg="Transport is not initialised (missing a call to connect?)"
DBG 2022-08-24 11:45:26.875+02:00 Block pool rejected peer's response        topics="syncman" request=187232:32@1475 peer=16U*MsCJdx direction=forward blocks_map=xxxxxxxxxxxxxxxxxxxxxxxxxxxx.xxx blocks_count=31 ok=false unviable=false missing_parent=true sync_ident=main
ERR 2022-08-24 11:45:26.875+02:00 Unexpected missing parent at finalized epoch slot topics="syncman" request=187232:32@1475 peer=16U*MsCJdx direction=forward rewind_to_slot=187232 blocks_count=31 blocks_map=xxxxxxxxxxxxxxxxxxxxxxxxxxxx.xxx sync_ident=main
DBG 2022-08-24 11:45:26.875+02:00 Peer was removed from PeerPool due to low score topics="beacnde" peer=16U*MsCJdx peer_score=-1000 score_low_limit=0 score_high_limit=1000
DBG 2022-08-24 11:45:26.875+02:00 Lost connection to peer                    topics="networking" peer=16U*MsCJdx connections=0
```

By delaying issuing a verdict until the EL connection is restored and
`newPayload` successfully ran, the problem should be fixed. This also
induces back pressure to the sync manager by stopping download of new
blocks (or re-downloading the same block over and over again).
This commit is contained in:
Etan Kissling 2022-08-24 18:55:41 +02:00 committed by GitHub
parent 492a2ccfac
commit eec6c04d32
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
1 changed files with 14 additions and 11 deletions

View File

@ -506,6 +506,17 @@ proc runQueueProcessingLoop*(self: ref BlockProcessor) {.async.} =
# TODO detect have-TTD-but-not-is_execution_block case, and where
# execution payload was non-zero when TTD detection more reliable
when true:
# When an execution engine returns an error or fails to respond to a
# payload validity request for some block, a consensus engine:
# - MUST NOT optimistically import the block.
# - MUST NOT apply the block to the fork choice store.
# - MAY queue the block for later processing.
# https://github.com/ethereum/consensus-specs/blob/v1.2.0-rc.3/sync/optimistic.md#execution-engine-errors
template reEnqueueBlock: untyped =
await sleepAsync(chronos.seconds(1))
self[].addBlock(
blck.src, blck.blck, blck.resfut, blck.validationDur)
try:
# Minimize window for Eth1 monitor to shut down connection
await self.consensusManager.eth1Monitor.ensureDataProvider()
@ -520,22 +531,14 @@ proc runQueueProcessingLoop*(self: ref BlockProcessor) {.async.} =
let executionPayloadStatus = await newExecutionPayload(
self.consensusManager.eth1Monitor, executionPayload)
if executionPayloadStatus.isNone:
# https://github.com/ethereum/consensus-specs/blob/v1.2.0-rc.3/sync/optimistic.md#execution-engine-errors
if not blck.resfut.isNil:
blck.resfut.complete(
Result[void, BlockError].err(BlockError.MissingParent))
reEnqueueBlock()
continue
executionPayloadStatus.get
except CatchableError as err:
error "runQueueProcessingLoop: newPayload failed",
err = err.msg
# https://github.com/ethereum/consensus-specs/blob/v1.2.0-rc.3/sync/optimistic.md#execution-engine-errors
if not blck.resfut.isNil:
blck.resfut.complete(
Result[void, BlockError].err(BlockError.MissingParent))
error "runQueueProcessingLoop: newPayload failed", err = err.msg
reEnqueueBlock()
continue
else:
debug "runQueueProcessingLoop: got execution payload before TTD"