* ensure sync duties for next epoch are registered in time
For attestations, VC queries duties for current and next epoch.
For sync messages, VC queries for current and next period (if soon).
This means that for sync messages we don't actually have the duties for
next epoch in all situations, leading to situation where VC may miss
sync duties in the final slot of an epoch when using. As duties remain
same within a sync committee period, simply copy them over to next epoch
to avoid this situation without adding network latency.
* Update beacon_chain/validator_client/duties_service.nim
Co-authored-by: Jacek Sieka <jacek@status.im>
---------
Co-authored-by: Jacek Sieka <jacek@status.im>
VC currently misses sync committee duties for first slot of most epochs
because the 1 slot offset is not taken into account. Duties for the next
slot must be used during any given current slot. We use the correct slot
for processing the duty, but do not use the correct slot for fetching.
* Refactor api.nim to provide more informative failure reasons.
Distinct between unexpected data and unexpected code.
Deprecate Option[T] usage.
* Fix generated reason to not include opt[t].
* Fix 400 for produceBlindedBlock().
Get proper string conversion for strategy.
* Bump copyright years.
* Fix durationToNextSlot() and durationToNextEpoch() to work not only after Genesis, but also before Genesis.
Change VC pre-genesis behavior, add runPreGenesisWaitingLoop() and runGenesisWaitingLoop().
Add checkedWaitForSlot() and checkedWaitForNextSlot() to strictly check current time and print warnings.
Fix VC main loop to use checkedWaitForNextSlot().
Fix attestation_service to run attestations processing only until the end of the duty slot.
Change attestation_service main loop to use checkedWaitForNextSlot().
Change block_service to properly cancel all the pending proposer tasks.
Use checkedWaitForSlot to wait for block proposal.
Fix block_service waitForBlockPublished() to be compatible with BN.
Fix sync_committee_service to avoid asyncSpawn.
Fix sync_committee_service to run only until the end of the duty slot.
Fix sync_committee_service to use checkedWaitForNextSlot().
* Refactor validator logging.
Fix aggregated attestation publishing missing delay.
* Fix doppelganger detection should not start at pre-genesis time.
Fix fallback service sync status spam.
Fix false `sync committee subnets subscription error`.
* Address review comments part 1.
* Address review comments.
* Fix condition issue for near genesis waiting loop.
* Address review comments.
* Address review comments 2.
* Fix issue when VC unable to detect errors properly and act accordingly.
Switch all API functions used by VC to RestPlainResponse, this allows us to print errors returned by BN servers.
* Fix issue when prepareBeaconCommitteeSubnet() do not perform actions when BN is optimistically synced only.
* Fix Defect issue.
* Fix submit/publish returning `false` when operation was successful.
* Address review comments.
* Fix some client calls unable to receive `execution_optimistic` field, mark BN as OptSynced when such request has been made.
* Adjust warning levels.
---------
Co-authored-by: Jacek Sieka <jacek@status.im>
* Initial commit.
* Address review comments and recommendations.
* Fix too often `Execution client not in sync` messages in logs.
* Add failure reason for duties requests.
* Add more reasons to every place of ValidatorApiError.
* Address race condition issue.
* Remove `vc` argument for getFailureReason().
* restore doppelganger check on connectivity loss
https://github.com/status-im/nimbus-eth2/pull/4398 introduced a
regression in functionality where doppelganger detection would not be
rerun during connectivity loss. This PR reintroduces this check and
makes some adjustments to the implementation to simplify the code flow
for both BN and VC.
* track when check was last performed for each validator (to deal with
late-added validators)
* track when we performed a doppel-detectable activity (attesting) so as
to avoid false positives
* remove nodeStart special case (this should be treated the same as
adding a validator dynamically just after startup)
* allow sync committee duties in doppelganger period
* don't trigger doppelganger when registering duties
* fix crash when expected index response is missing
* fix missing slashingSafe propagation
We do a linear scan of all pubkeys for each validator and slot - this
becomes expensive with large validator counts.
* normalise BN/VC validator startup logging
* fix crash when host cannot be resolved while adding remote validator
* silence repeated log spam for unknown validators
* print pubkey/index/activation mapping on startup/validator
identification
* fix REST liveness endpoint responding even when gossip is not enabled
* fix VC exit code on doppelganger hit
* fix activation epoch not being updated correctly on long deposit
queues
* fix activation epoch being set incorrectly when updating validator
* move most implementation logic to `validator_pool`, add tests
* ensure consistent logging between VC and BN
* add docs
Since the sync committee duties are no longer updated on every slot
and previously the sync committee aggregators selection proofs were
generated during the duties update, this now resulted in the client
using stale selection proofs (they must be generated at each slot).
The fix consists of moving the selection proof generation logic in
a different function which is properly executed on each slot.
Other changes:
* The logtrace tool has been enhanced with a framework for adding
new simpler log aggregation and analysis algorithms.
The default CI testnet simulation will now ensure that the blocks
in the network have reasonable sync committee participation.
* Fix doppelganger protection reorders validator indices in response issue.
* Add chronos metrics endpoint to nimbus REST API.
* Doppelganger protection now works on duties not on attestations.
Improve logging for doppelganger and indices.
* Improve doppelganger and indices logging.
* Add number of validators to logs.
* Move logging dumps from `debug` to `trace` level.
* Fixes a segfault during block production when the Keymanager API
is disabled. The Keymanager is now disabled on half of the local
testnet nodes to catch such problems in the future.
* Fixes multiple potential stalls from REST requests being done
without a timeout. From practice, we know that such requests
can hang forever if not cancelled with a timeout. At best,
this would be a resource leak, at worst, it may lead to a
full stall of the client and missed validator duties.
* Changes some Options usages to Opt (for easier use of valueOr)
* Keymanager API for the validator client
* Properly treat the 'description' field as optional when loading Keystores
* Spec-compliant serialization of the slashing data in Keymanager's DeleteKeys response ()
Fixes#3940Fixes#3964Closes#3884 by adding test
It's not quite clear why this condition was triggered in the local
simulation, but it seems a viable scenario after the Keymanager API
is integrated in the validator client.
The user can temporarily remove all validator keys from a running
client before adding another set of keys.
Other changes:
* logtrace can now verify sync committee messages and contributions
* Many unnecessary use of pairs() have been removed for consistency
* Map 40x BN response codes to BeaconNodeStatus.Incompatible in the VC
* Add some indicators to help fixing issue.
* Bump presto to help debugging.
* Fix compilation problems in presto.
* Fix SIGSEGV.
* Bump latest changes in chronos and presto.
Fix rare cases in validator_client.
* Use proper commits for chronos and presto.
* Fix firstSuccess() template missing timeouts.
* Fix validator race condition.
Fix logs to be compatible with beacon_node logs.
Add CatchableError handlers to avoid crashes.
Move some logs from Notice to Debug level.
Fix some [unused] warnings.
* Fix block proposal issue for slots in the past and from the future.
* Change sent to published.
* Address review comments #1.
* Initial commit.
* Exporting getConfig().
* Add beacon node checking procedures.
* Post rebase fixes.
* Use runSlotLoop() from nimbus_beacon_node.
Fallback implementation.
Fixes for ETH2 REST serialization.
* Add beacon_clock.durationToNextSlot().
Move type declarations from beacon_rest_api to json_rest_serialization.
Fix seq[ValidatorIndex] serialization.
Refactor ValidatorPool and add some utility procedures.
Create separate version of validator_client.
* Post-rebase fixes.
Remove CookedPubKey from validator_pool.nim.
* Now we should be able to produce attestations and aggregate and proofs.
But its not working yet.
* Debugging attestation sending.
* Add durationToNextAttestation.
Optimize some debug logs.
Fix aggregation_bits encoding.
Bump chronos/presto.
* Its alive.
* Fixes for launch_local_testnet script.
Bump chronos.
* Switch client API to not use `/api` prefix.
* Post-rebase adjustments.
* Fix endpoint for publishBlock().
* Add CONFIG_NAME.
Add more checks to ensure that beacon_node is compatible.
* Add beacon committee subscription support to validator_client.
* Fix stacktrace should be an array of strings.
Fix committee subscriptions should not be `data` keyed.
* Log duration to next block proposal.
* Fix beacon_node_status import.
* Use jsonMsgResponse() instead of jsonError().
* Fix graffityBytes usage.
Remove unnecessary `await`.
Adjust creation of SignedBlock instance.
Remove legacy files.
* Rework durationToNextSlot() and durationToNextEpoch() to use `fromNow`.
* Fix race condition for block proposal and attestations for same slot.
Fix local_testnet script to properly kill tasks on Windows.
Bump chronos and nim-http-tools, to allow connections to infura.io (basic auth).
* Catch services errors.
Improve performance of local_testnet.sh script on Windows.
Fix race condition when attestation producing.
* Post-rebase fixes.
* Bump chronos and presto.
* Calculate block publishing delay.
Fix pkill in one more place.
* Add error handling and timeouts to firstSuccess() template.
Add onceToAll() template.
Add checkNodes() procedure.
Refactor firstSuccess() template.
Add error checking to api.nim calls.
* Deprecated usage onceToAll() for better stability.
Address comment and send attestations asap.
* Avoid unnecessary loop when calculating minimal duration.