Validator monitoring based on and mostly compatible with the
implementation in Lighthouse - tracks additional logs and metrics for
specified validators so as to stay on top on performance.
The implementation works more or less the following way:
* Validator pubkeys are singled out for monitoring - these can be
running on the node or not
* For every action that the validator takes, we record steps in the
process such as messages being seen on the network or published in the
API
* When the dust settles at the end of an epoch, we report the
information from one epoch before that, which coincides with the
balances being updated - this is a tradeoff between being correct
(waiting for finalization) and providing relevant information in a
timely manner)
* batch-verify sync messages for a small perf boost
Generally reuses the same structure as attestation and aggregate
verification
* normalize `signatures` and `signature_batch` to use the same pattern
of verification
* normalize parameter names, order etc for signature stuff in general
* avoid calling `blsSign` directly - instead, go through `signatures`
consistently
This PR fixes two issues with block publishing:
* Gossip-valid blocks are published before integrating them into the
chain, giving broadcasting a head start, both for rest block and
* Outright invalid blocks from the API that could lead to the descoring
of the node are no longer broadcast
Bonus:
* remove undocumented and duplicated `post_v1_validator_block` JSON-RPC
call
Validator clients such as Vouch can be configured to work with multiple
beacon nodes simultaneously. In this configuration, the validator client
will try to broadcast the gossip messages through each of the connected
beacon nodes which may lead to a situation where some of the nodes see a
message arriving from the network before it arrives through the REST API.
This should not be considered an error and the beacon node should still
broadcast the message as the intented purpose of the Vouch strategy is
to ensure that the message will reach as many peers as possible.
Renames and cleanups split out from the validator monitoring branch, so
as to reduce conflict area vs other PR:s
* add constants for expected message timing
* name validators after the messages they validate, mostly, to make
grepping easier
* unify field naming of EpochInfo across forks to make cross-fork code
easier
Currently, we don't have a good answer to the question "are we synced
yet" - the sync manager syncs based on the peers it's connected to, but
just because some peer looks like it should be synced from doesn't mean
we're out of sync.
Instead, we use a very silly time-based heuristic - the problem with
that is that the network can go into a rut where nobody produces blocks
- better heuristics would be needed here, but in the meantime, a command
line option can get us out of a tight spot - this PR places such an
option in the client, in the unlikely event it should be needed (most
likely in a testnet).
* ncli_db: add putState, putBlock
These tools allow modifying an existing nimbus database for the purpose
of recovery or reorg, moving the head, tail and genesis to arbitrary
points.
* remove potentially expensive `putState` in `BeaconStateDB`
* introduce `latest_block_root` which computes the root of the latest
applied block from the `latest_block_header` field (instead of passing
it in separately)
* avoid some unnecessary BeaconState copies during init
* discover https://github.com/nim-lang/Nim/issues/19094
* prefer `HashedBeaconState` in a few places to avoid recomputing state
root
* fetch latest block root from state when creating blocks
* harden `get_beacon_proposer_index` against invalid slots and document
* move random spec function tests to `test_spec.nim`
* avoid unnecessary state root computation before block proposal
* `SyncCommitteeIndex` -> `SyncSubcommitteeIndex`
* `syncCommitteePeriod` -> `sync_committee_period` (spec spelling)
* tighten period comparisons
* fix assert when validating committee message with non-altair state in
REST api
* register vc duties with subnet tracker
* fix activation logging during startup
* cache slot signature to avoid duplicate signature work
* schedule aggregation duties one slot at a time to avoid CPU spike at
each epoch
* lower aggregation subnet pre-subscription time to 4 slots (lowers
bandwidth and CPU usage)
* update stability subnets in ENR on startup
* log gossip state
* perform gossip subscriptions just before the next slot starts
* document stuff
* add random include
* don't overwrite subscription state when not subscribed
* log target gossip state
* updating gossip status once is enough
* add test
* remove syncQueueLen - this one is not updated at the end of the sync
and may cause gossip to disconnect itself completely - use a simple head
distance instead
* fix gossip disconnection - if in hysteresis, node.gossipState will be
set to disabled even though we don't disable topic subscriptions
* fix extra duty registration call
So far, `withState` and `withBlck` templates could only be used to have
convenience access to fork-agnostic BeaconState and BeaconBlock fields.
This patch:
- injects an additional `stateFork` constant that allows to use
`when` expressions to also access Altair and Merge-specific fields.
- introduces a `withStateAndBlck` template to support operating on both
a `BeaconState` and `BeaconBlock` at a time.
- makes sync committee related functions Merge aware.
- changes a couple if-else trees for forks into case statements so that
forgotten future forks are promoted to compile-time errors.
There are a number of locations in the code that get attestations on a
forked beacon state. For attestation pools test, a convenience wrapper
was available to reduce clutter. This patch integrates that wrapper into
the core component so that it can also take advantage of the wrapper.
The initialization of a `SyncAggregate` to its default value is not very
intuitive. There is an `init` function in `sync_committee_msg_pool` that
provides a convenience wrapper. This patch exports that initializer so
that the rest of the code base can also take advantage of it.
The phase0 and altair overloads of `makeBeaconBlock` slightly differ in
their signatures which makes using them unnecessarily verbose.
- A placeholder `sync_aggregate` argument similar to `executionPayload`
is added to the phase0 overload to match the altair signature.
- A wrapper operating on `ForkedHashedBeaconState` is introduced.
* Placing callbacks into strategic places.
* Initial events call implementation.
* Post rebase fixes.
* Change addSyncContribution() implementation.
* Add `attestation-sent` event.
Remove gcsafe, raises from callbacks implementations.
Move `attestation-received` fire at the end of attestation processing.
* Address review comments.
Other changes:
* Add server getBlockV2(), and produceBlockV2().
* Add getBlockV2() to REST test suite.
* Add client getBlockV2(), and produceBlockV2().
* Fix URLs in comments.
* Add some primitives and fix some issues in forks.nim.
* Switch `validator_client` to V2 calls usage.
* Bump `chronos` with imports fixes.
* Bump `nim-json-serialization` for `requireAllFields`.
* cleanups
* use ForkedTrustedSignedBeaconBlock.ionit where appropriate
* move `is_aggregator` to `spec/`
* use `errReject` in a few more places
* update enr fork id when time is auspicious
* use network broadcast functions
* Return Ignore for aggregate signature validation timeouts
...consistently between aggregates and attestations.
* clean up some more reject/ignore rules
* shorten texts a bit
* errReject->checkedReject, use err helpers throughout
* get rid of quarantine in exitpool as well
* Fix getForkSchedule call.
Create cache of all configuration endpoints at node startup.
Add prepareJsonResponse() call to create cached responses.
Mark all procedures with `raises`.
* Add getForkSchedule to VC.
Fix getForkSchedule return type for API.
More `raises` annotations.
Fix VC fork_service.nim.
* Use `push raises` instead of inline `raises`.
* Improvements for REST API aggregated attestations and attestations processing.
* Rename eth2_network.sendXXX procedures to eth2_network.broadcastXXX.
Add broadcastBeaconBlock() and broadcastAggregateAndProof().
Fix links to specification in REST API declarations.
Add implementation for v2 getStateV2().
Add validator_duties.sendXXX procedures which not only broadcast data, but also validate it.
Fix JSON-RPC/REST to use new validator_duties.sendXXX procedures instead of own implementations.
* Fix validator_client online nodes count incorrect value.
Fix aggregate and proof attestation could be sent too late.
* Adding timeout for block wait in attestations processing.
Fix compilation errors.
* Attempt to debug aggregate and proofs.
* Fix Beacon AIP to use `sendAttestation`.
Add link comment to produceBlockV2.
* Add debug logs before publish operation for blocks, attestations and aggregated attestations.
Fix attestations publishing issue.
* logging fixes
`indexInCommnittee` already logged in attestation
Co-authored-by: Jacek Sieka <jacek@status.im>
* send attestations and exit messages on fork-appropriate topic
* document why use wall clock over attestation slot
* centralize some fork-topic-picking-logic in eth2_network
* pick up new test in summary
* allow specified GetTimeFn for testing purposes
* add GenesisTime and use it in eth2_network
* replace GetTimeFn and GenesisTime with GetBeaconTimeFn
* reorganize ssz dependencies
This PR continues the work in
https://github.com/status-im/nimbus-eth2/pull/2646,
https://github.com/status-im/nimbus-eth2/pull/2779 as well as past
issues with serialization and type, to disentangle SSZ from eth2 and at
the same time simplify imports and exports with a structured approach.
The principal idea here is that when a library wants to introduce SSZ
support, they do so via 3 files:
* `ssz_codecs` which imports and reexports `codecs` - this covers the
basic byte conversions and ensures no overloads get lost
* `xxx_merkleization` imports and exports `merkleization` to specialize
and get access to `hash_tree_root` and friends
* `xxx_ssz_serialization` imports and exports `ssz_serialization` to
specialize ssz for a specific library
Those that need to interact with SSZ always import the `xxx_` versions
of the modules and never `ssz` itself so as to keep imports simple and
safe.
This is similar to how the REST / JSON-RPC serializers are structured in
that someone wanting to serialize spec types to REST-JSON will import
`eth2_rest_serialization` and nothing else.
* split up ssz into a core library that is independendent of eth2 types
* rename `bytes_reader` to `codec` to highlight that it contains coding
and decoding of bytes and native ssz types
* remove tricky List init overload that causes compile issues
* get rid of top-level ssz import
* reenable merkleization tests
* move some "standard" json serializers to spec
* remove `ValidatorIndex` serialization for now
* remove test_ssz_merkleization
* add tests for over/underlong byte sequences
* fix broken seq[byte] test - seq[byte] is not an SSZ type
There are a few things this PR doesn't solve:
* like #2646 this PR is weak on how to handle root and other
dontSerialize fields that "sometimes" should be computed - the same
problem appears in REST / JSON-RPC etc
* Fix a build problem on macOS
* Another way to fix the macOS builds
Co-authored-by: Zahary Karadjov <zahary@gmail.com>
The spec imports are a mess to work with, so this branch cleans them up
a bit to ensure that we avoid generic sandwitches and that importing
stuff generally becomes easier.
* reexport crypto/digest/presets because these are part of the public
symbol set of the rest of the spec types
* don't export `merge` types from `base` - this causes circular deps
* fix circular deps in `ssz/spec_types` - this is the first step in
disentangling ssz from spec
* be explicit about phase0 vs altair - longer term, `altair` will become
the "natural" type set, then merge and so on, so no point in giving
`phase0` special preferential treatment
Simpler module name for stuff that covers forks
* check that runtime config matches database state
* also include some assorted altair cleanups
* use "standard" genesis fork in local testnet to work around missing
runtime config support