Waku v2 fleet: management & monitoring
Background
Status currently maintains two fleets for nim-waku v2 nodes:
the wakuv2.test fleet and the wakuv2.prod (production) fleet.
They'll be referred to as test and prod in this document.
Status fleet nodes and addresses can be viewed on the Status fleets page.
Fleet overview
At the time of writing, each fleet consists of three Waku v2 nodes,
with a websockify WebSocket-to-TCP bridge for each node.
Waku v2 peers can connect either directly to a node's TCP endpoint
or to the bridged WebSocket, depending on the transports they support.
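The choice a peer makes between the two endpoints can be sketched as follows. This is a minimal illustration, not nim-waku's dialing logic: the multiaddrs are hypothetical placeholders (RFC 5737 documentation IPs, made-up ports), and the peer's supported transports are assumed to be given as a list ordered by preference.

```python
from typing import Iterable, List, Optional

# Hypothetical addresses for one fleet node: a direct TCP listener and a
# WebSocket endpoint exposed by its websockify bridge. The IPs (RFC 5737
# documentation range) and ports are placeholders, not real fleet values.
NODE_ADDRS: List[str] = [
    "/ip4/192.0.2.10/tcp/30303",
    "/ip4/192.0.2.10/tcp/8000/ws",
]


def transport_of(multiaddr: str) -> str:
    """Classify a multiaddr as 'ws' if it has a /ws or /wss component, else 'tcp'."""
    components = multiaddr.strip("/").split("/")
    return "ws" if ("ws" in components or "wss" in components) else "tcp"


def choose_addr(addrs: Iterable[str], supported: List[str]) -> Optional[str]:
    """Return the first address matching the peer's most preferred supported
    transport, or None if the node offers nothing the peer can dial."""
    addrs = list(addrs)
    for transport in supported:  # `supported` is ordered by preference
        for addr in addrs:
            if transport_of(addr) == transport:
                return addr
    return None
```

A browser-based peer that can only open WebSockets would call `choose_addr(NODE_ADDRS, ["ws"])` and get the bridged endpoint, while a native peer calling `choose_addr(NODE_ADDRS, ["tcp", "ws"])` would dial TCP directly.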
The prod fleet also has a chat2bridge deployed,
which serves as a bridge between the Waku v2 toy-chat and Matterbridge.
The chat2bridge is currently deployed to node-01.do-ams3
and configured to bridge toy-chat messages to the #waku channel
on the Vac Discord server.
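On the Matterbridge side, relaying a message goes through Matterbridge's HTTP API, which accepts a JSON body with `text`, `username` and `gateway` on `POST /api/message`. The sketch below assumes chat2bridge forwards toy-chat messages this way; the API address and the gateway name are placeholders, since the values used by the prod deployment are not documented here. It only builds the request, so nothing is sent until it is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request

# Placeholder Matterbridge API address and gateway name (assumptions; the
# real values used by the prod chat2bridge deployment are not documented here).
MATTERBRIDGE_API = "http://127.0.0.1:4242"
GATEWAY = "gateway1"


def build_forward_request(nick: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a Matterbridge API request that relays one
    toy-chat message to every channel attached to GATEWAY."""
    payload = {"text": text, "username": nick, "gateway": GATEWAY}
    return urllib.request.Request(
        f"{MATTERBRIDGE_API}/api/message",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```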
Fleet deployment rationale
The test fleet is automatically updated after every commit to the nim-waku master branch
and is therefore the most up-to-date representation of Waku v2 development.
It is suitable for testing new features before they're rolled out to the (more) stable prod fleet.
In general, only the latest release of nim-waku
is deployed to the prod fleet.
It requires manual updating and should therefore be more stable than test.
See the section on Jenkins below for more on the deployment process.
Related repos
The infra-docs repo contains the most comprehensive overview of Status infrastructure.
This is a private repository.
Feel free to contact a member of the team to request access.
The infra-nim-waku repo contains the infrastructure definitions for Waku nodes implemented in Nim.
Monitoring and management
The rest of this document highlights some infra services of specific interest to Waku v2 fleet monitoring and management:
- Consul to view the health status of Waku nodes.
- Kibana to view and filter logs.
- Grafana to view and filter metrics.
- Jenkins to configure and deploy new builds to the fleets.
1. Consul for health checks
Consul provides a useful high-level view of the health of the nim-waku fleets.
It aggregates the results of various monitoring checks
and shows the health status of the node itself, the RPC API, the exposed WebSocket and metrics.
The datacentre can be changed in the upper left-hand corner.
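The same health data is available from Consul's HTTP API, where the `dc` query parameter plays the role of the datacentre selector. The sketch below is a minimal example, assuming a placeholder Consul address and a hypothetical service name; it reduces a `/v1/health/service/<service>` response (a list of entries, each with a `Node` and its `Checks`) to the worst check status per node.

```python
from typing import Dict, List
from urllib.parse import quote, urlencode

# Placeholder Consul HTTP API address; the real Status Consul endpoint and
# the service names registered for nim-waku are not documented here.
CONSUL_ADDR = "http://127.0.0.1:8500"

# Consul check statuses, ordered from healthy to unhealthy.
SEVERITY = {"passing": 0, "warning": 1, "critical": 2}
STATUS_BY_LEVEL = {level: name for name, level in SEVERITY.items()}


def health_url(service: str, dc: str) -> str:
    """URL of Consul's /v1/health/service endpoint for one datacentre."""
    return f"{CONSUL_ADDR}/v1/health/service/{quote(service, safe='')}?{urlencode({'dc': dc})}"


def worst_status_per_node(entries: List[dict]) -> Dict[str, str]:
    """Reduce a /v1/health/service response to the worst check status per node."""
    worst: Dict[str, int] = {}
    for entry in entries:
        node = entry["Node"]["Node"]
        worst.setdefault(node, SEVERITY["passing"])
        for check in entry.get("Checks", []):
            # Unknown statuses are treated as critical so they are not hidden.
            level = SEVERITY.get(check.get("Status"), SEVERITY["critical"])
            worst[node] = max(worst[node], level)
    return {node: STATUS_BY_LEVEL[level] for node, level in worst.items()}
```

For example, `health_url("waku-node", "do-ams3")` (hypothetical service name) targets the do-ams3 datacentre; fetching it and passing the decoded JSON to `worst_status_per_node` gives a one-line-per-node summary similar to the UI view.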
2. Kibana for logs
Kibana is a powerful visualisation tool for Elasticsearch data.
For Waku v2 fleets it can be used to retrieve, filter and view the logs for all deployed services.
For example, to view the latest logs for prod,
Kibana can be opened in "Discover" mode with an active filter for fleet: wakuv2.prod.
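When the same logs are needed outside the Kibana UI, the Discover filter above corresponds roughly to an Elasticsearch query body like the one below. This is a sketch under assumptions: the `fleet` field comes from the filter shown above, but the `@timestamp` sort field (common for Logstash/Filebeat indices) and the index pattern to send it to are not documented here.

```python
def latest_logs_query(fleet: str, size: int = 100) -> dict:
    """Elasticsearch query body roughly equivalent to opening Kibana's
    Discover view with the filter `fleet: <fleet>`, newest entries first."""
    return {
        "size": size,
        # Filter context: match documents without scoring them.
        "query": {"bool": {"filter": [{"match_phrase": {"fleet": fleet}}]}},
        # "@timestamp" is an assumed field name for the log event time.
        "sort": [{"@timestamp": {"order": "desc"}}],
    }
```

The resulting dict can be serialised with `json.dumps` and sent to the `_search` endpoint of the relevant index.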
3. Grafana for metrics
The Nim-Waku V2 Grafana dashboard displays live and historical metrics for Waku v2 nodes.
The default view includes metrics from both fleets,
though it's possible to filter by Hostname, Fleet name or Data Center.
The time range can also be configured;
by default the latest metrics are shown.
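The dashboard filters translate into Prometheus label matchers. The following sketch shows the idea with Prometheus' instant-query endpoint (`GET /api/v1/query`); the Prometheus address is a placeholder, and the label names (`fleet`, `datacenter`) and the metric name used in the usage note (`libp2p_peers`) are assumptions rather than names confirmed by the dashboard.

```python
from urllib.parse import urlencode

PROMETHEUS_ADDR = "http://127.0.0.1:9090"  # placeholder address


def selector(metric: str, **labels: str) -> str:
    """Build a PromQL selector such as metric{fleet="wakuv2.prod"}."""
    matchers = []
    for name, value in sorted(labels.items()):
        # Escape backslashes and double quotes inside the label value.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        matchers.append(name + '="' + escaped + '"')
    return metric + "{" + ",".join(matchers) + "}"


def instant_query_url(expr: str) -> str:
    """URL for Prometheus' instant-query endpoint, GET /api/v1/query."""
    return f"{PROMETHEUS_ADDR}/api/v1/query?" + urlencode({"query": expr})
```

For instance, `selector("libp2p_peers", fleet="wakuv2.prod", datacenter="do-ams3")` produces a selector restricted to one fleet and datacentre, much like combining the Fleet name and Data Center filters in the dashboard.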
The dashboard itself includes an "At a glance" summary with an overview of the latest connected peers, total messages, CPU usage, reported errors, etc. The "General" collection contains a more in-depth look at node, libp2p and performance-related metrics. This is followed by separate panel collections showing per-protocol metrics.
A copy of the Nim-Waku V2 fleets dashboard is maintained in the nim-waku repo.
From time to time, certain Prometheus queries may fail,
often when the underlying metrics are renamed.
Please report any broken panels via our Discord channels or by creating an issue in nim-waku.
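Before reporting, it can help to check which panels are affected. The sketch below walks the dashboard copy's JSON model (top-level `panels`, with collapsed rows holding nested `panels`) to list each panel's PromQL `expr`, and classifies a Prometheus `/api/v1/query` response. A renamed metric usually shows up as an empty result rather than an error. Fetching the responses is left out, and the sample panel titles and expressions in the tests are made up.

```python
from typing import Iterator, Tuple


def panel_queries(container: dict) -> Iterator[Tuple[str, str]]:
    """Yield (panel title, PromQL expression) for each target in a Grafana
    dashboard JSON model, descending into collapsed rows."""
    for panel in container.get("panels", []):
        for target in panel.get("targets", []):
            expr = target.get("expr")
            if expr:
                yield panel.get("title", "<untitled>"), expr
        # Collapsed rows keep their child panels in a nested "panels" list.
        yield from panel_queries(panel)


def classify(response: dict) -> str:
    """Classify a Prometheus /api/v1/query JSON response.

    'error' -> the query itself failed (e.g. a PromQL syntax error);
    'empty' -> the query ran but matched no series, which is what a
               renamed or removed metric typically looks like;
    'ok'    -> at least one result was returned.
    """
    if response.get("status") != "success":
        return "error"
    if not response.get("data", {}).get("result"):
        return "empty"
    return "ok"
```

Panels whose queries classify as `error` or `empty` on a healthy fleet are good candidates for an issue report.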
4. Jenkins for deployment
The nim-waku jobs on Jenkins are configured to deploy nim-waku builds to the fleets.
- deploy-wakuv2-test is triggered automatically after every commit to the nim-waku master branch.
- deploy-wakuv2-prod must be triggered manually. Usually this job is only built after a tagged release in nim-waku.
Each job can be manually triggered using the "Build with Parameters" option. Options under "Configure" include the build triggers, build target and branches to build. These should only be changed with care.
See Continuous Integration docs for more.
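"Build with Parameters" can also be reached through Jenkins' remote access API, which accepts a POST to the job's `buildWithParameters` URL authenticated with a username and API token. The sketch below only builds that request. The Jenkins URL is a placeholder, whether the jobs sit inside a folder (for example `nim-waku`) is an assumption, and the `BRANCH` parameter in the tests is hypothetical; check the job's actual parameters under "Configure" first, and take the same care with prod as with the UI.

```python
import base64
import urllib.request
from typing import Dict, List
from urllib.parse import quote, urlencode

JENKINS_URL = "https://jenkins.example.org"  # placeholder, not the real host


def build_trigger_request(job_path: List[str], params: Dict[str, str],
                          user: str, api_token: str) -> urllib.request.Request:
    """Build (but do not send) a POST to a job's buildWithParameters endpoint.

    job_path lists folder and job names, e.g. ["deploy-wakuv2-test"] or
    ["some-folder", "deploy-wakuv2-test"] if the job is nested (assumed).
    """
    path = "".join(f"/job/{quote(part, safe='')}" for part in job_path)
    url = f"{JENKINS_URL}{path}/buildWithParameters"
    creds = base64.b64encode(f"{user}:{api_token}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=urlencode(params).encode("utf-8"),  # form-encoded build parameters
        headers={
            "Authorization": f"Basic {creds}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )
```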
Quick links
- chat2bridge
- Consul for do-ams3
- Consul for ac-cn-hongkong-c
- Consul for gc-us-central1-a
- Grafana Nim-Waku V2 dashboard
- infra-docs repo
- infra-nim-waku repo
- Jenkins jobs for nim-waku
- Jenkins deploy-wakuv2-prod manual trigger
- Jenkins deploy-wakuv2-test manual trigger
- Kibana logs for prod
- Kibana logs for test
- Status fleets
- Websockify