nwaku/docs/contributors/waku-fleets.md
Jakub Sokołowski 3b7d6f4a1b
ci: use docker tag names closer to fleet names
To avoid naming confusion.

Signed-off-by: Jakub Sokołowski <jakub@status.im>
2022-09-28 17:12:33 +02:00


Waku v2 fleet: management & monitoring

Background

Status currently maintains two fleets of nim-waku v2 nodes: the wakuv2.test fleet and the wakuv2.prod (production) fleet. They are referred to as test and prod in this document. The nodes and addresses of each fleet are listed on the Status fleets page.

Fleet overview

At the time of writing, each fleet consists of three Waku v2 nodes, each paired with a websockify WebSocket-to-TCP bridge. Waku v2 peers can connect either directly to a node's TCP endpoint or to the bridged WebSocket endpoint, depending on the transports they support. The prod fleet also hosts a chat2bridge deployment, which bridges the Waku v2 toy-chat and Matterbridge. The chat2bridge currently runs on node-01 in the do-ams3 datacentre and is configured to bridge toy-chat messages to the #waku channel on the Vac Discord server.
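The choice between the two endpoints can be sketched as follows. This is a minimal illustration, not fleet code: the `FleetNode` type, the `pick_address` helper, the placeholder addresses and ports, and the preference for direct TCP when both transports are available are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FleetNode:
    """One fleet node with its direct TCP endpoint and its websockify bridge."""
    name: str
    tcp_addr: str
    ws_addr: str


def pick_address(node: FleetNode, supported: set[str]) -> str:
    """Return the address a peer should dial, given its supported transports.

    Assumption: a peer that supports both transports prefers direct TCP.
    """
    if "tcp" in supported:
        return node.tcp_addr
    if "ws" in supported:
        return node.ws_addr
    raise ValueError(f"no common transport with {node.name}")


# Placeholder addresses for illustration only; these are not real fleet endpoints.
node = FleetNode(
    name="example-node",
    tcp_addr="/dns4/node.example.org/tcp/30303",
    ws_addr="/dns4/node.example.org/tcp/443/wss",
)

browser_choice = pick_address(node, {"ws"})        # WebSocket-only peer
native_choice = pick_address(node, {"tcp", "ws"})  # peer with both transports
```

A browser-based peer would typically fall into the WebSocket-only case, while a native node can dial the TCP endpoint directly.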

Fleet deployment rationale

The test fleet is automatically updated after every commit to the nim-waku master branch and is therefore the most up-to-date representation of Waku v2 development. It is suitable for testing new features before they're rolled out to the (more) stable prod fleet.

In general, only the latest release of nim-waku is deployed to the prod fleet. It must be updated manually and should therefore be more stable than test. See the section on Jenkins below for more on the deployment process.

The infra-docs repo contains the most comprehensive overview of Status infrastructure. This is a private repository. Feel free to contact someone in the team to request access.

The infra-nim-waku repo contains the infrastructure definitions for Waku nodes implemented in Nim.

Monitoring and management

The rest of this document highlights some infra services of specific interest to Waku v2 fleet monitoring and management:

  1. Consul to view the health status of Waku nodes.
  2. Kibana to view and filter logs.
  3. Grafana to view and filter metrics.
  4. Jenkins to configure and deploy new builds to the fleets.

1. Consul for health checks

Consul provides a useful high-level view of the health of the nim-waku fleets. It aggregates the results of various monitoring checks and shows the health status of the node itself, the RPC API, the exposed WebSocket and the metrics. Each datacentre (do-ams3, ac-cn-hongkong-c and gc-us-central1-a) has its own view; the datacentre can be changed in the upper left-hand corner.
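The same information is available programmatically: Consul's HTTP API returns, for each service instance, the node it runs on and a list of checks with a `Status` of `passing`, `warning` or `critical`. The sketch below summarises such a response without making any network call. The sample payload, node names and check names are invented for illustration and only mirror the shape of a `/v1/health/service/<name>` response.

```python
from collections import Counter


def summarise_checks(entries: list[dict]) -> dict[str, Counter]:
    """Count check statuses per node from a /v1/health/service/<name> response."""
    summary: dict[str, Counter] = {}
    for entry in entries:
        node_name = entry["Node"]["Node"]
        statuses = Counter(check["Status"] for check in entry["Checks"])
        summary.setdefault(node_name, Counter()).update(statuses)
    return summary


def unhealthy_nodes(summary: dict[str, Counter]) -> list[str]:
    """Return the nodes that have at least one warning or critical check."""
    return sorted(
        name for name, counts in summary.items()
        if counts["warning"] or counts["critical"]
    )


# Hypothetical payload; real node and check names will differ.
sample_response = [
    {
        "Node": {"Node": "node-01"},
        "Checks": [
            {"Name": "node", "Status": "passing"},
            {"Name": "rpc", "Status": "passing"},
        ],
    },
    {
        "Node": {"Node": "node-02"},
        "Checks": [
            {"Name": "node", "Status": "passing"},
            {"Name": "websocket", "Status": "critical"},
        ],
    },
]

summary = summarise_checks(sample_response)
failing = unhealthy_nodes(summary)
```

To query a specific datacentre, the Consul API accepts a `dc` query parameter, which corresponds to the datacentre selector in the web UI.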

2. Kibana for logs

Kibana is a powerful visualisation tool for Elasticsearch data. For Waku v2 fleets it can be used to retrieve, filter and view the logs for all deployed services. For example, to view the latest logs for prod, Kibana can be opened in "Discover" mode with an active filter for fleet: wakuv2.prod.
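The "Discover" filter above corresponds roughly to an Elasticsearch query with a single filter clause. The following sketch builds such a query body as a plain dictionary. The `fleet` field name comes from the filter described above; the `@timestamp` sort field, the use of a `term` filter (which presumes `fleet` is indexed as a keyword) and the `latest_logs_query` helper are assumptions for illustration.

```python
import json


def latest_logs_query(fleet: str, size: int = 50) -> dict:
    """Build an Elasticsearch query body for the newest log lines of one fleet."""
    return {
        "size": size,
        # Newest entries first; assumes logs carry an "@timestamp" field.
        "sort": [{"@timestamp": {"order": "desc"}}],
        "query": {
            "bool": {
                "filter": [{"term": {"fleet": fleet}}],
            }
        },
    }


query = latest_logs_query("wakuv2.prod")
print(json.dumps(query, indent=2))
```

Swapping the argument for `"wakuv2.test"` gives the equivalent query for the test fleet.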

3. Grafana for metrics

The Nim-Waku V2 Grafana dashboard displays live and historical metrics for Waku v2 nodes. The default view includes metrics from both fleets, though it's possible to filter by Hostname, Fleet name or Data Center. The time range can also be configured; by default the latest metrics are shown.
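Dashboard filters of this kind usually translate into label matchers on the underlying Prometheus queries. The sketch below builds a PromQL selector from optional filters. The metric name `libp2p_peers` and the label names `fleet`, `datacenter` and `instance` are assumptions for illustration; the labels actually used by the dashboard may differ.

```python
def _escape(value: str) -> str:
    """Escape backslashes and double quotes for a PromQL string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def selector(metric: str, **labels: str) -> str:
    """Build a PromQL instant-vector selector, skipping empty filters."""
    parts = [
        f'{name}="{_escape(value)}"'
        for name, value in sorted(labels.items())
        if value
    ]
    if not parts:
        return metric  # no filter: metrics from all fleets
    return metric + "{" + ",".join(parts) + "}"


all_fleets = selector("libp2p_peers")
prod_only = selector("libp2p_peers", fleet="wakuv2.prod")
prod_ams = selector("libp2p_peers", fleet="wakuv2.prod", datacenter="do-ams3")
```

With no filters the selector matches every node, which mirrors the dashboard's default view of both fleets.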

The dashboard itself includes an "At a glance" summary with an overview of the latest connected peers, total messages, CPU usage, reported errors, etc. The "General" collection contains a more in-depth look at node, libp2p and performance-related metrics. This is followed by separate panel collections showing per-protocol metrics.

A copy of the Nim-Waku V2 fleets dashboard is maintained in the nim-waku repo. From time to time certain Prometheus queries may fail, often when the underlying metrics are renamed. Please report any broken panels via our Discord channels or by creating an issue in nim-waku.

4. Jenkins for deployment

The nim-waku jobs on Jenkins are configured to deploy nim-waku builds to the fleets.

  1. deploy-wakuv2-test is triggered automatically after every commit to the nim-waku master branch.
  2. deploy-wakuv2-prod must be triggered manually. Usually this job is only built after a tagged release in nim-waku.

Each job can be manually triggered using the "Build with Parameters" option. Options under "Configure" include the build triggers, build target and branches to build. These should only be changed with care.
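Jenkins also exposes "Build with Parameters" through its remote access API as a `buildWithParameters` endpoint under the job URL. The sketch below only constructs that URL and sends nothing. The base URL, the `nim-waku` folder and the `BRANCH` parameter name are hypothetical; check the job's actual location and parameters before use.

```python
from urllib.parse import quote, urlencode


def build_with_parameters_url(base_url: str, job_path: str, params: dict[str, str]) -> str:
    """Return the buildWithParameters URL for a job, including folder segments.

    A job path such as "folder/job-name" maps to /job/folder/job/job-name.
    """
    segments = "".join(
        f"/job/{quote(part, safe='')}" for part in job_path.split("/") if part
    )
    url = f"{base_url.rstrip('/')}{segments}/buildWithParameters"
    query = urlencode(params)
    return f"{url}?{query}" if query else url


# Hypothetical Jenkins host, folder and parameter name.
trigger_url = build_with_parameters_url(
    "https://jenkins.example.org/",
    "nim-waku/deploy-wakuv2-test",
    {"BRANCH": "master"},
)
```

An actual trigger would be an authenticated POST request to this URL. For deploy-wakuv2-prod, the same care applies as for triggering the job through the web UI.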

See the Continuous Integration docs for more.

Links

  1. chat2bridge
  2. Consul for do-ams3
  3. Consul for ac-cn-hongkong-c
  4. Consul for gc-us-central1-a
  5. Grafana Nim-Waku V2 dashboard
  6. infra-docs repo
  7. infra-nim-waku repo
  8. Jenkins jobs for nim-waku
  9. Jenkins deploy-wakuv2-prod manual trigger
  10. Jenkins deploy-wakuv2-test manual trigger
  11. Kibana logs for prod
  12. Kibana logs for test
  13. Status fleets
  14. Websockify