status-go/metrics
Andrea Maria Piana b4e5bf417b Handle peer event async in metrics
There might be an issue in how we handle metrics, which causes the p2p
server to hang.

updateNodeMetrics calls a method on the p2p server, which
blocks until the server is available:

e60f425b45/vendor/github.com/ethereum/go-ethereum/p2p/server.go (L301)
e60f425b45/vendor/github.com/ethereum/go-ethereum/p2p/server.go (L746)
If there's back-pressure on the peer event feed
e60f425b45/vendor/github.com/ethereum/go-ethereum/p2p/server.go (L783)

The event channel above might become full while updateNodeMetrics
is being called, which means it is never consumed: the server blocks on publishing to
it, and the two deadlock (the server waits for the channel above to be consumed, while
this code waits for the server to respond to peerCount, which is handled in the same
event loop).

Calling it in a different goroutine allows this code to keep
processing peer added events, so the server does not lock up and keeps processing requests.
2021-02-02 07:58:17 +01:00
node Handle peer event async in metrics 2021-02-02 07:58:17 +01:00
README.md add README for metrics (#1906) 2020-03-17 22:09:21 +01:00
metrics.go add a simple healtcheck for metrics endpoint 2019-11-04 16:29:14 +01:00

README.md

Description

This package configures Prometheus metrics for the node.

Technical Details

We use a trick to combine our metrics with Geth ones.

The NewMetricsServer() function in metrics.go calls our own Handler() function which in turn calls two handlers:

  • promhttp.HandlerFor() - Our own custom metrics from this package.
  • gethprom.Handler(reg) - Geth metrics defined in the go-ethereum metrics package.

By calling both we can extend existing metrics.

Metrics

We add a few extra metrics on top of the normal Geth ones in node/metrics.go:

  • p2p_peers_count - Current number of peers, split by name.
  • p2p_peers_absolute - Absolute number of connected peers.
  • p2p_peers_max - Maximum number of peers that can connect.

The p2p_peers_count metric includes 3 labels:

  • type - Set to StatusIM for mobile and Statusd for daemon.
  • version - Version of status-go, always with the v prefix.
  • platform - Host platform, like android-arm64 or darwin-arm64.

The way this data is acquired is using node names, which look like this:

StatusIM/vrelease-0.30.1-beta.2/android-arm/go1.11.5
Statusd/v0.34.0-beta.3/linux-amd64/go1.13.1
Geth/v1.9.9-stable-5aa131ca/linux-amd64/go1.13.3

This 4-segment format is standard for Ethereum, as you can see on https://ethstats.net/.

We parse the names using labelsFromNodeName() from node/metrics.go.

Links