
Waku fleet: management & monitoring

Background

Status currently maintains two fleets of nwaku nodes: the waku.test fleet and the waku.sandbox fleet. They are referred to as test and sandbox in this document. Status fleet nodes and their addresses are listed on the Status fleets page.

Fleet overview

At the time of writing, each fleet consists of three Waku nodes, each paired with a websockify WebSocket-to-TCP bridge. Waku peers can connect either directly to a node's TCP endpoint or to the bridged WebSocket endpoint, depending on the transports they support. The sandbox fleet also runs a chat2bridge instance, which bridges the Waku toy-chat and Matterbridge. The chat2bridge is currently deployed to node-01.do-ams3 and configured to bridge toy-chat messages to the #waku channel on the Vac Discord server.
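
The choice between the TCP endpoint and the bridged WebSocket can be illustrated with a minimal Python sketch. It assumes each node advertises one libp2p multiaddr per transport. The host name, ports and peer ID below are placeholders, not real fleet addresses (those are on the Status fleets page), and `pick_address` is a hypothetical helper, not part of nwaku.

```python
from typing import List, Optional

# Placeholder multiaddrs for one fleet node (NOT real fleet addresses):
# a direct TCP listener and a secure WebSocket endpoint served via websockify.
NODE_ADDRS = [
    "/dns4/node-01.example.net/tcp/30303/p2p/16Uiu2HAmPlaceholderPeerId",
    "/dns4/node-01.example.net/tcp/443/wss/p2p/16Uiu2HAmPlaceholderPeerId",
]


def transport_of(multiaddr: str) -> str:
    """Classify a multiaddr as 'wss', 'ws', 'tcp' or 'unknown' by its protocol segments."""
    parts = multiaddr.strip("/").split("/")
    if "wss" in parts:
        return "wss"
    if "ws" in parts:
        return "ws"
    if "tcp" in parts:
        return "tcp"
    return "unknown"


def pick_address(addrs: List[str], supported: List[str]) -> Optional[str]:
    """Return the first address using a transport the local peer supports.

    `supported` is ordered by preference, e.g. ["tcp", "wss"] for a native
    node, or ["wss"] for a browser peer that can only open WebSockets.
    """
    for transport in supported:
        for addr in addrs:
            if transport_of(addr) == transport:
                return addr
    return None


if __name__ == "__main__":
    print(pick_address(NODE_ADDRS, ["wss"]))         # browser-style peer
    print(pick_address(NODE_ADDRS, ["tcp", "wss"]))  # native peer
```

A browser-based peer would pass only `["wss"]` and end up on the websockify bridge, while a native node that prefers TCP connects to the node directly.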

Fleet deployment rationale

The test fleet is automatically updated after every commit to the master branch of the nwaku repository and is therefore the most up-to-date representation of Waku development. This makes it suitable for testing new features before they are rolled out to the (more) stable sandbox fleet.

In general, only the latest nwaku release is deployed to the sandbox fleet. Since it is updated manually, it should be more stable than test. See the Jenkins section below for more on the deployment process.

The infra-docs repo contains the most comprehensive overview of Status infrastructure. It is a private repository; contact a team member to request access.

The infra-nim-waku repo contains the infrastructure definitions for Waku nodes implemented in Nim.

Monitoring and management

The rest of this document highlights some infra services of specific interest to Waku fleet monitoring and management:

  1. Consul to view the health status of Waku nodes.
  2. Kibana to view and filter logs.
  3. Grafana to view and filter metrics.
  4. Jenkins to configure and deploy new builds to the fleets.

1. Consul for health checks

Consul provides a useful high-level view of the health of the nwaku fleets. It aggregates the results of various monitoring checks and shows the health status of the node itself, the RPC API, the exposed WebSocket and the metrics endpoint. The datacentre can be selected in the upper left-hand corner.
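
The same health information is also available from Consul's HTTP API, which can be handy for scripting. The sketch below builds a `/v1/health/service/<service>` URL scoped to one datacentre via the `dc` query parameter, then reduces a response to the worst check status per node. The Consul address and the `nim-waku` service name used in the example are assumptions; no request is sent, and the sample response only needs the fields the code reads.

```python
from typing import Dict, List
from urllib.parse import quote, urlencode

# Placeholder Consul address; the real fleet Consul URL is not given here.
CONSUL_URL = "https://consul.example.net"

# Consul reports each check as passing, warning or critical.
SEVERITY = {"passing": 0, "warning": 1, "critical": 2}


def health_url(service: str, datacenter: str) -> str:
    """URL for Consul's /v1/health/service endpoint in one datacentre."""
    query = urlencode({"dc": datacenter})
    return f"{CONSUL_URL}/v1/health/service/{quote(service, safe='')}?{query}"


def worst_status_per_node(entries: List[dict]) -> Dict[str, str]:
    """Map each node name to the most severe status among its checks."""
    worst: Dict[str, str] = {}
    for entry in entries:
        node = entry["Node"]["Node"]
        for check in entry["Checks"]:
            status = check["Status"]
            current = worst.get(node)
            # Unknown statuses are treated as critical rather than ignored.
            if current is None or SEVERITY.get(status, 2) > SEVERITY.get(current, 2):
                worst[node] = status
    return worst
```

Fetching the URL with any HTTP client (plus whatever authentication the Consul instance requires) and passing the decoded JSON list to `worst_status_per_node` gives a compact per-node summary.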

2. Kibana for logs

Kibana is a powerful visualisation tool for Elasticsearch data. For Waku fleets it can be used to retrieve, filter and view the logs for all deployed services. For example, to view the latest logs for sandbox, Kibana can be opened in "Discover" mode with an active filter for fleet: waku.sandbox.
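
Kibana filters are translated into Elasticsearch queries, so the same "latest sandbox logs" view can also be expressed as query DSL, for example for use with Elasticsearch's search API. This is a sketch: the `fleet` field comes from the filter above, while the `@timestamp` sort field is an assumption about how the log indices are mapped.

```python
import json


def latest_logs_query(fleet: str, size: int = 100) -> dict:
    """Query DSL for the newest log entries of one fleet.

    match_phrase mirrors a Kibana filter such as `fleet: waku.sandbox`;
    sorting on @timestamp assumes the usual timestamp field name.
    """
    return {
        "size": size,
        "query": {"bool": {"filter": [{"match_phrase": {"fleet": fleet}}]}},
        "sort": [{"@timestamp": {"order": "desc"}}],
    }


if __name__ == "__main__":
    print(json.dumps(latest_logs_query("waku.sandbox"), indent=2))
```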

3. Grafana for metrics

The Nim-Waku Grafana dashboard displays live and historical metrics for Waku nodes. The default view includes metrics from both fleets, but it can be filtered by Hostname, Fleet name or Data Center. The time range is also configurable; by default, the latest metrics are shown.
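
Each dashboard filter ultimately narrows the underlying Prometheus queries with label matchers. The helper below is a hypothetical sketch of that idea: it assumes the metrics carry `fleet`, `datacenter` and `instance` labels (the real label names used by the dashboard may differ), and `libp2p_peers` is used purely as an example metric name.

```python
from typing import Optional


def promql_selector(metric: str, fleet: Optional[str] = None,
                    datacenter: Optional[str] = None,
                    hostname: Optional[str] = None) -> str:
    """Build a PromQL selector narrowed by optional fleet/datacentre/host labels.

    The label names below are assumptions, not confirmed dashboard labels.
    """
    matchers = []
    for label, value in (("fleet", fleet), ("datacenter", datacenter), ("instance", hostname)):
        if value is not None:
            # Escape backslashes and double quotes inside the label value.
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            matchers.append(f'{label}="{escaped}"')
    if not matchers:
        return metric
    return metric + "{" + ",".join(matchers) + "}"


if __name__ == "__main__":
    print(promql_selector("libp2p_peers", fleet="waku.sandbox", datacenter="do-ams3"))
```

Leaving a filter unset simply drops its matcher, which corresponds to selecting "All" for that filter in the dashboard.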

The dashboard itself includes an "At a glance" summary with an overview of the latest connected peers, total messages, CPU usage, reported errors, etc. The "General" collection contains a more in-depth look at node, libp2p and performance-related metrics. This is followed by separate panel collections showing per-protocol metrics.

A copy of the Nim-Waku fleets dashboard is maintained in the nwaku repo. From time to time, certain Prometheus queries may fail, often because the underlying metrics have been renamed. Please report any broken panels via our Discord channels or by creating an issue in the nwaku repo.
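
Panels broken by renamed metrics can often be spotted by comparing the metric names referenced in the dashboard JSON against the names a node currently exposes. The following is a rough sketch, not a PromQL parser: it walks the standard Grafana dashboard layout (`panels`, nested row `panels`, and each target's `expr`) and uses regular expressions to pull out likely metric names. The set of known metric names would have to be collected from a running node, and the metric names in any example are hypothetical.

```python
import re
from typing import Dict, Iterator, List, Set, Tuple

# Identifiers that can appear in PromQL but are not metric names.
PROMQL_WORDS = {"and", "or", "unless", "offset", "bool", "inf", "nan",
                "by", "without", "on", "ignoring", "group_left", "group_right"}


def metric_names(expr: str) -> Set[str]:
    """Heuristically extract metric names referenced by a PromQL expression."""
    expr = re.sub(r'"(?:[^"\\]|\\.)*"', "", expr)  # double-quoted strings
    expr = re.sub(r"'(?:[^'\\]|\\.)*'", "", expr)  # single-quoted strings
    expr = re.sub(r"\{[^}]*\}", "", expr)          # label matchers
    expr = re.sub(r"\[[^\]]*\]", "", expr)         # range / subquery windows
    # Grouping clauses list label names, not metrics.
    expr = re.sub(r"\b(?:by|without|on|ignoring|group_left|group_right)\s*\([^)]*\)", "", expr)
    names = set()
    for match in re.finditer(r"(?<![\w.$])([A-Za-z_:][\w:]*)(\s*\()?", expr):
        name, is_call = match.group(1), match.group(2)
        # Skip function/aggregation calls and keywords.
        if not is_call and name.lower() not in PROMQL_WORDS:
            names.add(name)
    return names


def iter_targets(panels: List[dict]) -> Iterator[Tuple[str, str]]:
    """Yield (panel title, expr) for every query target, including inside rows."""
    for panel in panels:
        # Collapsed rows keep their child panels in a nested "panels" list.
        yield from iter_targets(panel.get("panels", []))
        for target in panel.get("targets", []):
            if "expr" in target:
                yield panel.get("title", "<untitled>"), target["expr"]


def broken_panels(dashboard: dict, known: Set[str]) -> Dict[str, Set[str]]:
    """Map panel titles to referenced metric names missing from `known`."""
    broken: Dict[str, Set[str]] = {}
    for title, expr in iter_targets(dashboard.get("panels", [])):
        missing = metric_names(expr) - known
        if missing:
            broken.setdefault(title, set()).update(missing)
    return broken
```

With the dashboard JSON loaded via `json.load`, an empty result means every referenced name is still exposed. Because the extraction is heuristic, a hit is a hint to inspect that panel rather than proof that it is broken.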

4. Jenkins for deployment

The nim-waku jobs on Jenkins are configured to deploy nwaku builds to the fleets.

  1. deploy-waku-test is triggered automatically after every commit to the nwaku master branch.
  2. deploy-waku-sandbox must be triggered manually. Usually this job is only built after a tagged release in nwaku.

Each job can be manually triggered using the "Build with Parameters" option. Options under "Configure" include the build triggers, build target and branches to build. These should only be changed with care.
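
Besides the web UI, Jenkins exposes a `buildWithParameters` endpoint in its remote access API, so a parameterised job can also be triggered from a script authenticated with a user API token. The sketch below only prepares the request. The Jenkins host, the assumption that the jobs live in a `nim-waku` folder, and the `IMAGE_TAG` parameter name are all hypothetical; the job's "Build with Parameters" page shows its actual parameters.

```python
import base64
from typing import Dict, Sequence
from urllib.parse import quote, urlencode
from urllib.request import Request

# Placeholder Jenkins host; the real CI URL is not given in this document.
JENKINS_URL = "https://ci.example.org"


def job_path(segments: Sequence[str]) -> str:
    """Map folder and job names to Jenkins' nested /job/<name> URL layout."""
    return "".join(f"/job/{quote(name, safe='')}" for name in segments)


def build_request(segments: Sequence[str], params: Dict[str, str],
                  user: str, api_token: str) -> Request:
    """Prepare (but do not send) a POST to the job's buildWithParameters endpoint."""
    url = f"{JENKINS_URL}{job_path(segments)}/buildWithParameters?{urlencode(params)}"
    # HTTP basic auth with a Jenkins user API token.
    credentials = base64.b64encode(f"{user}:{api_token}".encode()).decode()
    return Request(url, method="POST", headers={"Authorization": f"Basic {credentials}"})


if __name__ == "__main__":
    # Hypothetical folder/job layout and parameter name.
    req = build_request(["nim-waku", "deploy-waku-sandbox"], {"IMAGE_TAG": "v0.0.0"},
                        "alice", "api-token")
    print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would actually queue a deployment, so, as with the "Configure" options, this should be done with care.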

See the Continuous Integration docs for more.

Links

  1. chat2bridge
  2. Consul for do-ams3
  3. Consul for ac-cn-hongkong-c
  4. Consul for gc-us-central1-a
  5. Grafana Nim-Waku dashboard
  6. infra-docs repo
  7. infra-waku repo
  8. Jenkins jobs for nim-waku
  9. Jenkins deploy-waku-sandbox manual trigger
  10. Jenkins deploy-waku-test manual trigger
  11. Kibana logs for sandbox
  12. Kibana logs for test
  13. Status fleets
  14. Status fleets - Table
  15. Websockify