Jacek Sieka c270ec21e4
Validator monitoring (#2925)
Validator monitoring based on and mostly compatible with the
implementation in Lighthouse - tracks additional logs and metrics for
specified validators so as to stay on top on performance.

The implementation works more or less the following way:
* Validator pubkeys are singled out for monitoring - these can be
running on the node or not
* For every action that the validator takes, we record steps in the
process such as messages being seen on the network or published in the
API
* When the dust settles at the end of an epoch, we report the
information from one epoch before that, which coincides with the
balances being updated - this is a tradeoff between being correct
(waiting for finalization) and providing relevant information in a
timely manner)
2021-12-20 20:20:31 +01:00
..

Gossip Processing

This folder holds a collection of modules to:

  • validate raw gossip data before
    • rebroadcasting it (potentially aggregated)
    • sending it to one of the consensus object pools

Validation

Gossip validation is different from consensus verification in particular for blocks.

There are multiple consumers of validated consensus objects:

  • a ValidationResult.Accept output triggers rebroadcasting in libp2p
    • We jump into method validate(PubSub, Message) in libp2p/protocols/pubsub/pubsub.nim
    • which was called by rpcHandler(GossipSub, PubSubPeer, RPCMsg)
  • a blockValidator message enqueues the validated object to the processing queue in block_processor
    • blockQueue: AsyncQueue[BlockEntry] (shared with request_manager and sync_manager)
    • This queue is then regularly processed to be made available to the consensus object pools.
  • a xyzValidator message adds the validated object to a pool in eth2_processor
    • Attestations (unaggregated and aggregated) get collected into batches.
    • Once a threshold is exceeded or after a timeout, they get validated together using BatchCrypto.

Security concerns

As the first line of defense in Nimbus, modules must be able to handle bursts of data that may come:

  • from malicious nodes trying to DOS us
  • from long periods of non-finality, creating lots of forks, attestations