Commit Graph

32 Commits

Author SHA1 Message Date
Eugene Kabanov c8b50765cf
Various fixes for VC and BN REST server. (#4673)
* Fix issue when VC unable to detect errors properly and act accordingly.
Switch all API functions used by VC to RestPlainResponse, this allows us to print errors returned by BN servers.

* Fix issue when prepareBeaconCommitteeSubnet() do not perform actions when BN is optimistically synced only.

* Fix Defect issue.

* Fix submit/publish returning `false` when operation was successful.

* Address review comments.

* Fix some client calls unable to receive `execution_optimistic` field, mark BN as OptSynced when such request has been made.

* Adjust warning levels.

---------

Co-authored-by: Jacek Sieka <jacek@status.im>
2023-03-03 20:20:01 +00:00
Eugene Kabanov 08b6bb7a6b
VC: Use not-synced/opt-synced BNs. (#4635)
* Initial commit.

* Address review comments and recommendations.

* Fix too often `Execution client not in sync` messages in logs.

* Add failure reason for duties requests.

* Add more reasons to every place of ValidatorApiError.

* Address race condition issue.

* Remove `vc` argument for getFailureReason().
2023-02-23 00:11:00 +00:00
Jacek Sieka 83f9745df1
restore doppelganger check on connectivity loss (#4616)
* restore doppelganger check on connectivity loss

https://github.com/status-im/nimbus-eth2/pull/4398 introduced a
regression in functionality where doppelganger detection would not be
rerun during connectivity loss. This PR reintroduces this check and
makes some adjustments to the implementation to simplify the code flow
for both BN and VC.

* track when check was last performed for each validator (to deal with
late-added validators)
* track when we performed a doppel-detectable activity (attesting) so as
to avoid false positives
* remove nodeStart special case (this should be treated the same as
adding a validator dynamically just after startup)

* allow sync committee duties in doppelganger period

* don't trigger doppelganger when registering duties

* fix crash when expected index response is missing

* fix missing slashingSafe propagation
2023-02-20 13:28:56 +02:00
Jacek Sieka d63179ab57
vc: avoid npe for unknown validator (#4632)
follow-up for #4590
2023-02-16 13:27:10 +00:00
Jacek Sieka 856fcea8d7
fix slow checking of unknown validators (#4590)
We do a linear scan of all pubkeys for each validator and slot - this
becomes expensive with large validator counts.

* normalise BN/VC validator startup logging
* fix crash when host cannot be resolved while adding remote validator
* silence repeated log spam for unknown validators
* print pubkey/index/activation mapping on startup/validator
identification
2023-02-07 14:53:36 +00:00
Eugene Kabanov 08ed8ad43e
Adopt BN and VC header sizes and requirements, to avoid users confusion with default configuration options. (#4556)
Add comments.
2023-01-26 16:00:10 +01:00
Jacek Sieka 6e2a02466e
unify bn/vc doppelganger detection (#4398)
* fix REST liveness endpoint responding even when gossip is not enabled
* fix VC exit code on doppelganger hit
* fix activation epoch not being updated correctly on long deposit
queues
* fix activation epoch being set incorrectly when updating validator
* move most implementation logic to `validator_pool`, add tests
* ensure consistent logging between VC and BN
* add docs
2022-12-09 17:05:55 +01:00
zah d07113767d
Bugfix: The VC was producing invalid sync committee contributions (#4343)
Since the sync committee duties are no longer updated on every slot
and previously the sync committee aggregators selection proofs were
generated during the duties update, this now resulted in the client
using stale selection proofs (they must be generated at each slot).

The fix consists of moving the selection proof generation logic in
a different function which is properly executed on each slot.

Other changes:

* The logtrace tool has been enhanced with a framework for adding
  new simpler log aggregation and analysis algorithms.
  The default CI testnet simulation will now ensure that the blocks
  in the network have reasonable sync committee participation.
2022-11-24 09:46:35 +02:00
Eugene Kabanov a2fc515db0
Address #4290. (#4333) 2022-11-17 13:09:20 +02:00
Eugene Kabanov 8417b7e064
Sync committee subscription fixes. (#4281)
* Preparing code.

* Fix prepareXXX procedures to use `onceToAll` strategy only.

* Remove lighthouse like subscription code.

* Address review comments.
2022-11-03 20:23:33 +01:00
Eugene Kabanov 00f083785d
VC: Fix for #4116 (external block builders support) (#4260) 2022-10-29 11:00:51 +02:00
Eugene Kabanov 367e7052f4
VC: some fixes (#4240)
* Skip doppelganger protection for validators which activated just now or in future.

* Fix sync committee duties spam issue.

* Optimize sync committee duties logging statements.

* Fix missing lazyWait.

* Add short path.

* Address #4087.

* Add missing watch for crash.
2022-10-21 16:53:30 +02:00
Eugene Kabanov 805a12e467
VC: Fix doppelganger protection never allow attestations. (#4236)
* Fix doppelganger protection reorders validator indices in response issue.

* Add chronos metrics endpoint to nimbus REST API.

* Doppelganger protection now works on duties not on attestations.
Improve logging for doppelganger and indices.

* Improve doppelganger and indices logging.

* Add number of validators to logs.

* Move logging dumps from `debug` to `trace` level.
2022-10-14 14:19:17 +02:00
Eugene Kabanov eea13ee5ed
VC: roles & strategies. (#4113)
* Initial commit.

* Roles changes.

* Fix all the compilation issues.

* Add beacon node roles.
Add loop for firstSuccessParallel().

* Remove unused variables.
2022-09-29 09:57:14 +02:00
zah b1ac9c9fe4
Fix a potential segfault and various potential stalls (#4003)
* Fixes a segfault during block production when the Keymanager API
  is disabled. The Keymanager is now disabled on half of the local
  testnet nodes to catch such problems in the future.

* Fixes multiple potential stalls from REST requests being done
  without a timeout. From practice, we know that such requests
  can hang forever if not cancelled with a timeout. At best,
  this would be a resource leak, at worst, it may lead to a
  full stall of the client and missed validator duties.

* Changes some Options usages to Opt (for easier use of valueOr)
2022-08-19 21:51:30 +00:00
zah fca20e08d6
Keymanager API for the validator client (#3976)
* Keymanager API for the validator client
* Properly treat the 'description' field as optional when loading Keystores
* Spec-compliant serialization of the slashing data in Keymanager's DeleteKeys response ()

Fixes #3940
Fixes #3964
Closes #3884 by adding test
2022-08-19 13:30:07 +03:00
Eugene Kabanov ce9e50e275
VC: metrics (#3915)
* Initial commit.
Enable MetricsHttpServerRef and configuration.

* Add metrics.

* Add headers.
Add compilation issue fixes.
2022-07-29 11:36:20 +03:00
Eugene Kabanov c3d3397843
VC: doppelganger protection (#3877)
* Improve fallback_service.

* Improve logging in fallback_service.

* Apply signal handling for all stages.

* Fix some logging statements.

* Add doppelganger REST api endpoint.
Add some structures to VC.

* Add client API call implementation.

* Initial fix & refactor onceToAll()
Add doppelganger service.
Add doppelganger helpers.

* Add doppelganger checks.

* Move doppelganger log messages to higher levels.

* Fix firstSuccess().

* Bump chronos.

* Post rebase fixes.

* Proper chronos bump.

* Address review comments.

* Attempt to fix finalization test issue.

* Fix nimbus_signing_node.

* Mark validators which are added at GENESIS_SLOT in GENESIS_EPOCH as passed doppelganger validation.

* Do not send empty requests to server.

* Fix log statement.

* Address review comments and re-raise cancellations.

Co-authored-by: zah <zahary@gmail.com>
2022-07-21 19:54:07 +03:00
Eugene Kabanov d4bafdf5a4
VC: cancellation hot-fixes. (#3875)
* Fix cancellation issues.
* Add exitEvent which will allow gracefully shutdown validator client.
* Fix firstSuccessTimeout() template.
* Fix service names.
* Modify waitOnlineNodes to include timeout parameter.
2022-07-15 00:11:25 +03:00
zah 2bd5d03743
Fix a VC crash observed in the local_testnet simulation (#3872)
It's not quite clear why this condition was triggered in the local
simulation, but it seems a viable scenario after the Keymanager API
is integrated in the validator client.

The user can temporarily remove all validator keys from a running
client before adding another set of keys.
2022-07-14 21:48:04 +03:00
Eugene Kabanov 263a2ffa14
Validator client various fixes. (#3840)
* Improve fallback_service.
* Fix nextAction negative time issue.
* Improve logging in fallback_service.
* Improve logging in sync_committee_service.
* Prepare all services for cancellation.
* Signals handlers for validator client
* Address #3800

Co-authored-by: Zahary Karadjov <zahary@gmail.com>
2022-07-13 17:43:57 +03:00
Jacek Sieka c145916414
cleanups (#3819)
* avoid circular panda imports
* move deposit merkleization helpers to spec/
* normalize validator signature helpers to spec names / params
* remove redundant functions for remote signing
2022-06-29 18:53:59 +02:00
Eugene Kabanov beaf05b8d1
Fix assertion crash in pollForSyncCommitteeDuties() because forks schedule is not available yet. (#3791)
Fix syncCommitteeeDutiesLoop() not being restarted on crash.
2022-06-23 03:59:37 +00:00
zah a2ba34f686
Implement all sync committee duties in the validator client (#3583)
Other changes:

* logtrace can now verify sync committee messages and contributions
* Many unnecessary use of pairs() have been removed for consistency
* Map 40x BN response codes to BeaconNodeStatus.Incompatible in the VC
2022-05-10 10:03:40 +00:00
Eugene Kabanov 3a80b9951c
VC: Fix forks handling. (#3389)
* Trying to debug the finalization issue.

* Add debug logs to understand signature issue.

* Remove all the debugging helpers.

* Initial commit.

* Address review comments.

* Remove unneeded checks for empty fork schedule.

* Fix bellatrix ExecutionAddress serialization/deserialization procedures.
2022-02-16 12:31:23 +01:00
Zahary Karadjov 54d0d588b1 Implementation of the Keymanager API (BETA)
https://github.com/ethereum/keymanager-APIs
2022-01-04 18:51:45 +02:00
tersec 6ef3834f4a
fix type-conversions-to-self, unexport from nimbus_beacon_node, and rm unused vars/procs (#3211) 2021-12-20 12:21:17 +01:00
tersec 8d7df05f2e
Revert "Bump chronos and presto. (#3159)" (#3160)
This reverts commit 4c90b82d9f.
2021-12-04 15:52:28 +00:00
Eugene Kabanov 4c90b82d9f
Bump chronos and presto. (#3159)
* Add some indicators to help fixing issue.

* Bump presto to help debugging.

* Fix compilation problems in presto.

* Fix SIGSEGV.

* Bump latest changes in chronos and presto.
Fix rare cases in validator_client.

* Use proper commits for chronos and presto.
2021-12-04 14:26:16 +00:00
Eugene Kabanov e62c7c7c37
Remote signing client/server. (#3077) 2021-11-30 03:20:21 +02:00
Eugene Kabanov f0c30e31b4
VC: various fixes (#2730)
* Fix firstSuccess() template missing timeouts.

* Fix validator race condition.
Fix logs to be compatible with beacon_node logs.
Add CatchableError handlers to avoid crashes.
Move some logs from Notice to Debug level.
Fix some [unused] warnings.

* Fix block proposal issue for slots in the past and from the future.

* Change sent to published.

* Address review comments #1.
2021-07-19 14:31:02 +00:00
Eugene Kabanov 3b6f4fab4a
New validator client using REST API. (#2651)
* Initial commit.

* Exporting getConfig().

* Add beacon node checking procedures.

* Post rebase fixes.

* Use runSlotLoop() from nimbus_beacon_node.
Fallback implementation.
Fixes for ETH2 REST serialization.

* Add beacon_clock.durationToNextSlot().
Move type declarations from beacon_rest_api to json_rest_serialization.
Fix seq[ValidatorIndex] serialization.
Refactor ValidatorPool and add some utility procedures.
Create separate version of validator_client.

* Post-rebase fixes.
Remove CookedPubKey from validator_pool.nim.

* Now we should be able to produce attestations and aggregate and proofs.
But its not working yet.

* Debugging attestation sending.

* Add durationToNextAttestation.
Optimize some debug logs.
Fix aggregation_bits encoding.
Bump chronos/presto.

* Its alive.

* Fixes for launch_local_testnet script.
Bump chronos.

* Switch client API to not use `/api` prefix.

* Post-rebase adjustments.

* Fix endpoint for publishBlock().

* Add CONFIG_NAME.
Add more checks to ensure that beacon_node is compatible.

* Add beacon committee subscription support to validator_client.

* Fix stacktrace should be an array of strings.
Fix committee subscriptions should not be `data` keyed.

* Log duration to next block proposal.

* Fix beacon_node_status import.

* Use jsonMsgResponse() instead of jsonError().

* Fix graffityBytes usage.
Remove unnecessary `await`.
Adjust creation of SignedBlock instance.
Remove legacy files.

* Rework durationToNextSlot() and durationToNextEpoch() to use `fromNow`.

* Fix race condition for block proposal and attestations for same slot.
Fix local_testnet script to properly kill tasks on Windows.
Bump chronos and nim-http-tools, to allow connections to infura.io (basic auth).

* Catch services errors.
Improve performance of local_testnet.sh script on Windows.
Fix race condition when attestation producing.

* Post-rebase fixes.

* Bump chronos and presto.

* Calculate block publishing delay.
Fix pkill in one more place.

* Add error handling and timeouts to firstSuccess() template.
Add onceToAll() template.
Add checkNodes() procedure.
Refactor firstSuccess() template.
Add error checking to api.nim calls.

* Deprecated usage onceToAll() for better stability.
Address comment and send attestations asap.

* Avoid unnecessary loop when calculating minimal duration.
2021-07-13 13:15:07 +02:00