# Validator monitoring

The validator monitoring feature tracks the lifecycle and performance of one or more validators in detail.

Monitoring can be carried out for any validator, with slightly more detail for validators that are running in the same beacon node.

Every time a monitored validator performs a duty, the duty is recorded and the monitor keeps track of the reward-related events that follow from it. For example:

* When attesting, the attestation is added to an aggregate, then to a block, before a reward is applied to the state
* When performing sync committee duties, the message follows a similar path

Validator actions can be traced either through logging or through comprehensive metrics that allow for creating alerts in monitoring tools.

The metrics are broadly compatible with [Lighthouse](https://lighthouse-book.sigmaprime.io/validator-monitoring.html), so dashboards and alerts can be used with either client with minor adjustments.

## Enabling validator monitoring

The monitor can be enabled for all keys that are used with a particular beacon node, for a specific list of validators, or both.

```
# Enable automatic monitoring of all validators used with this beacon node
./run-mainnet-beacon-node.sh --validator-monitor-auto

# Enable monitoring of one or more specific validators
./run-mainnet-beacon-node.sh \
  --validator-monitor-pubkey=0xa1d1ad0714035353258038e964ae9675dc0252ee22cea896825c01458e1807bfad2f9969338798548d9858a571f7425c \
  --validator-monitor-pubkey=0xb2ff4716ed345b05dd1dfc6a5a9fa70856d8c75dcc9e881dd2f766d5f891326f0d10e96f3a444ce6c912b69c22c6754d

# Publish metrics as totals for all monitored validators instead of each validator separately.
# This limits the load on metrics when monitoring many validators.
./run-mainnet-beacon-node.sh --validator-monitor-totals
```

## Understanding monitoring

When a validator performs a duty, such as signing an attestation or a sync committee message, the signed message is broadcast to the network. Other nodes pick it up and package it into an aggregate and later into a block. Once the block is included in the canonical chain, a reward is given two epochs (~13 minutes) later.
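The ~13 minute figure can be checked with a little arithmetic. The sketch below assumes the mainnet timing constants of 12 seconds per slot and 32 slots per epoch; these come from the consensus specification rather than from this page, and the function name is purely illustrative.

```shell
# Assumed mainnet constants (from the consensus spec, not from this page)
SECONDS_PER_SLOT=12
SLOTS_PER_EPOCH=32

# Print the duration of the given number of epochs, in seconds
epochs_to_seconds() {
  echo $(( $1 * SLOTS_PER_EPOCH * SECONDS_PER_SLOT ))
}

epochs_to_seconds 2   # prints 768, i.e. 12.8 minutes
```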

The monitor tracks these actions and will log each step at the `INF` level. If any step is missed, a `NOT` log is shown instead.

The typical lifecycle of an attestation might look something like the following:

```
INF 2021-11-22 11:32:44.228+01:00 Attestation seen                           topics="val_mon" attestation="(aggregation_bits: 0b0000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000, data: (slot: 2656363, index: 11, beacon_block_root: \"bbe7fc25\", source: \"83010:a8a1b125\", target: \"83011:6db281cd\"), signature: \"b88ef2f2\")" src=api epoch=83011 validator=b93c290b
INF 2021-11-22 11:32:51.293+01:00 Attestation included in aggregate          topics="val_mon" aggregate="(aggregation_bits: 0b1111111101011111001101111111101100111111110100111011111110110101110111111010111111011101111011101111111111101111100001111111100111, data: (slot: 2656363, index: 11, beacon_block_root: \"bbe7fc25\", source: \"83010:a8a1b125\", target: \"83011:6db281cd\"), signature: \"8576b3fc\")" src=gossip epoch=83011 validator=b93c290b
INF 2021-11-22 11:33:07.193+01:00 Attestation included in block              attestation_data="(slot: 2656364, index: 9, beacon_block_root: \"c7761767\", source: \"83010:a8a1b125\", target: \"83011:6db281cd\")" block_slot=2656365 inclusion_lag_slots=0 epoch=83011 validator=b65b6e1b
```

The lifecycle of a particular message can be traced by following the `epoch=... validator=...` fields in the log lines.
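One way to follow a single duty is to filter the log on those two fields. This is a minimal sketch, assuming the fields appear adjacent and in this order at the end of each line, as in the sample above; `trace_duty` and the `nimbus.log` file name are hypothetical.

```shell
# Hypothetical helper: print every log line for one validator in one epoch.
# Reads the log from stdin and matches the literal "epoch=<n> validator=<id>"
# text, so it depends on the field order seen in the sample log lines.
trace_duty() {
  grep -F "epoch=$1 validator=$2"
}

# Example (nimbus.log is an assumed file name):
#   trace_duty 83011 b93c290b < nimbus.log
```

Because the match is a fixed string, lines from other epochs or other validators are dropped, which leaves only the steps recorded for that one duty.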

Failures at any point are recorded at a higher logging level, such as `NOT` (notice):

```
NOT 2021-11-17 20:53:42.108+01:00 Attestation failed to match head           topics="chaindag" epoch=81972 validator=...
```

Failures are reported with a lag of two epochs (~13 minutes). To look for potential root causes, examine the logs from the epoch given in the failure message.
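When searching the logs for an epoch, it can help to translate the epoch number into the slots it covers, since many log lines carry a `slot` field. The sketch below assumes 32 slots per epoch (the mainnet value); the function name is illustrative.

```shell
SLOTS_PER_EPOCH=32  # assumed mainnet constant

# Print the first and last slot of the given epoch
epoch_slot_range() {
  first=$(( $1 * SLOTS_PER_EPOCH ))
  last=$(( first + SLOTS_PER_EPOCH - 1 ))
  echo "${first} ${last}"
}

epoch_slot_range 81972   # prints "2623104 2623135"
```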

!!! warning
    Metrics are tracked for the current history only. If the chain reorgs, in particular in a deep reorg, no attempt is made to revisit previously reported values. When finality is delayed, the risk of stale metrics increases.

Likewise, many metrics, such as aggregation inclusion, reflect conditions on the network, and the same message may be counted more than once under certain conditions.

## Monitoring metrics

The full list of metrics supported by the validator monitoring feature can be seen in the [source code](https://github.com/status-im/nimbus-eth2/blob/unstable/beacon_chain/validators/validator_monitor.nim) or by examining the metrics output:

```
curl -s localhost:8008/metrics | grep 'HELP.*validator_'
```
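To reduce that output to a plain list of metric names, the `# HELP <name> <description>` lines of the Prometheus text format can be filtered with `awk`. This is a sketch with a hypothetical helper name. Unlike the `grep` above, it matches `validator_` in the metric name only, not in the description.

```shell
# Hypothetical helper: read Prometheus text-format output on stdin and print
# the unique names of metrics whose name contains "validator_"
validator_metric_names() {
  awk '$1 == "#" && $2 == "HELP" && $3 ~ /validator_/ { print $3 }' | sort -u
}

# Example, using the metrics endpoint shown above:
#   curl -s localhost:8008/metrics | validator_metric_names
```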