5.0 KiB
Monitor nwaku using Prometheus and Grafana
Prerequisites
- A running nwaku instance with HTTP metrics server enabled (i.e. with
--metrics-server:true
) - Prometheus and Grafana installed
Installing Prometheus
Prometheus can be installed by downloading and extracting the latest release for your system distribution from the Prometheus download page.
For example, on a DebianOS distribution you could run
wget https://github.com/prometheus/prometheus/releases/download/v2.38.0/prometheus-2.38.0.linux-amd64.tar.gz
tar xvfz prometheus-2.38.0.linux-amd64.tar.gz
For more advanced installations,
Prometheus has a handy Getting Started page to guide you through the process.
There are also many third party guides on installing Prometheus for specific distributions,
such as this old but still relevant one from DigitalOcean.
We also suggest running Prometheus as a service,
as explained by this guide.
Bear in mind that we'll be creating our own prometheus.yml
configuration file later on when you encounter this in any of the guides.
Installing Grafana
Follow the installation instructions appropriate to your distribution to install Grafana. The stable version of the Grafana Enterprise Edition is the free, recommended edition to install.
Configure Prometheus
- Create a file called
prometheus.yml
with the following content:
global:
scrape_interval: 15s
scrape_configs:
- job_name: 'prometheus'
scrape_interval: 5s
static_configs:
- targets: ['localhost:9090']
- job_name: 'nwaku'
scrape_interval: 1s
static_configs:
- targets: ['localhost:<nwaku_port>']
Replace <nwaku_port>
with the metrics HTTP server port of your running nwaku instance.
For default configurations metrics are reported on port 8008
of the localhost
.
If you've used --ports-shift
, or explicitly set the metrics port using --metrics-server-port
, this port will be different from the default.
It's possible to extract the metrics server port from the startup logs of the nwaku node.
Look for a log with the format below and substitute nwaku_port
with the value reported after serverPort=
:
INF 2022-09-16 12:14:12.739+01:00 Metrics HTTP server started topics="wakunode.setup.metrics" tid=6243 file=wakunode2_setup_metrics.nim:29 serverIp=127.0.0.1 serverPort=8009
- Start Prometheus using the config file you created above:
./path/to/prometheus --config.file=/path/to/prometheus.yml &
- Verify that Prometheus is running correctly.
Once Prometheus is running, it exposes by default a management console on port 9090
.
If you are running Prometheus locally, for example,
you can visit http://localhost:9090/ in a browser to view basic info about the running instance.
http://localhost:9090/targets shows the state of the different metrics server endpoints that we configured in prometheus.yml
.
In our case we'd expect Prometheus to successfully scrape metrics off two endpoints,
the running nwaku instance and Prometheus itself.
Configure Grafana
- Start the Grafana server, if it's not running already after installation.
sudo systemctl start grafana-server
- Open Grafana in your browser.
Grafana exposes its interface by default on port 3000
.
For example, if you are running Grafana locally,
you can find it by navigating to http://localhost:3000/.
If you are prompted for a username and password,
the default is admin
in both cases.
- Set Prometheus as your data source.
These instructions describe how to add a new data source. The default values for setting up a Prometheus data source should be sufficient.
- Create a new dashboard or import an existing one.
You can now visualize metrics off your running nwaku instance by creating a new dashboard and adding panels for the metric(s) of your choice. To get you started, we have published a basic monitoring dashboard for a single nwaku node which you can import to your Grafana instance.
- Happy monitoring!
Some of the most important metrics to keep an eye on include:
libp2p_peers
as an indication of how many peers your node is connected to,waku_node_messages_total
to view the total amount of network traffic relayed by your node andwaku_node_errors
as a rough indication of basic operating errors logged by the node.