mirror of https://github.com/status-im/consul.git
469 lines
20 KiB
Plaintext
469 lines
20 KiB
Plaintext
---
|
|
layout: docs
|
|
page_title: Configure Datadog metrics collection for Consul on Kubernetes
|
|
description: >-
|
|
Consul can integrate with external platforms such as Datadog to stream metrics about its operations. Learn how to enable Consul monitoring with Datadog by configuring the `metrics.datadog` Helm value override options.
|
|
---
|
|
|
|
# Configure Datadog metrics collection for Consul on Kubernetes
|
|
|
|
This page describes the processes for integrating Datadog metrics collection in your Consul on Kubernetes deployment. The Helm chart includes automated configuration options to simplify the integration process.
|
|
|
|
## Datadog Metrics Integration Methods
|
|
|
|
- [DogstatsD](#dogstatsd)
|
|
- [Datadog Checks: Official Consul Integration](#datadog-checks-official-consul-integration)
|
|
- [Openmetrics Prometheus](#openmetrics-prometheus)
|
|
|
|
Users should choose **_one_** integration method from the three described below that best suites the intent for metrics collection. **[DogStatsD](https://docs.datadoghq.com/developers/dogstatsd/?tab=kubernetes)**, **[Consul Integration](https://docs.datadoghq.com/integrations/consul/?tab=containerized)**, and **[Openmetrics Prometheus](https://docs.datadoghq.com/containers/kubernetes/prometheus/?tab=kubernetesadv2)** methods of integration are **_mutually exclusive_**.
|
|
|
|
**Reasoning:** _The consul-k8s helm chart automated configuration implements Datadog's [Consul Integration](https://docs.datadoghq.com/integrations/consul/?tab=containerized) method using the [`use_prometheus_endpoint`](https://github.com/DataDog/integrations-core/blob/07c04c5e9465ba1f3e0198830896d05923e81283/consul/datadog_checks/consul/data/conf.yaml.example#L59) configuration parameter. **DogstatsD**, **Consul Integration**, and **Openmetrics Prometheus** Metrics **by design** share the same [metric name](https://docs.datadoghq.com/integrations/consul/?tab=host#data-collected) syntax for collection, and would therefore cause a conflict. The [consul.py](https://github.com/DataDog/integrations-core/blob/07c04c5e9465ba1f3e0198830896d05923e81283/consul/datadog_checks/consul/consul.py#L55-L61) integration source code, as well as the [consul-k8s helm chart](https://github.com/hashicorp/consul-k8s/blob/4cac70496788f50354f96e9331003fcf338f419c/charts/consul/templates/_helpers.tpl#L595-L598) prohibit the enablement of more that one integration at a time._
|
|
|
|
|
|
## DogstatsD
|
|
|
|
This method of implementation leverages the [hashicorp/go-metrics DogstatsD client library](https://github.com/hashicorp/go-metrics/tree/master/datadog) to manage metrics collection.
|
|
Metrics are aggregated and sent via UDP or UDS transports to a Datadog Agent that runs on the same Kube Node as the Consul servers.
|
|
|
|
Enabling this method of metrics collection allows Consul to control the delivery of metrics traffic directly to a Datadog agent rather
|
|
than a Datadog agent attempting to reach Consul and scrape the `/v1/agent/metrics` API endpoint.
|
|
|
|
This is accomplished by updating each server agent's configuration telemetry stanza.
|
|
|
|
### Helm Chart Configuration
|
|
|
|
<Tabs>
|
|
<CodeBlockConfig heading={"DogstatsD (UDS)"}>
|
|
|
|
Consul Helm Chart Overrides
|
|
|
|
```yaml
|
|
metrics:
|
|
enabled: true
|
|
enableAgentMetrics: true
|
|
datadog:
|
|
enabled: true
|
|
namespace: "datadog"
|
|
dogstatsd:
|
|
enabled: true
|
|
socketTransportType: "UDS"
|
|
dogstatsdAddr: "/var/run/datadog/dsd.socket"
|
|
```
|
|
|
|
Resulting server agent telemetry configuration
|
|
|
|
```json
|
|
{
|
|
"telemetry": {
|
|
"dogstatsd_addr": "unix:///var/run/datadog/dsd.socket"
|
|
}
|
|
}
|
|
```
|
|
|
|
</CodeBlockConfig>
|
|
|
|
<CodeBlockConfig heading={"DogstatsD (UDP:KubeService)"}>
|
|
|
|
Consul Helm Chart Overrides
|
|
|
|
```yaml
|
|
metrics:
|
|
enabled: true
|
|
enableAgentMetrics: true
|
|
datadog:
|
|
enabled: true
|
|
namespace: "datadog"
|
|
dogstatsd:
|
|
enabled: true
|
|
socketTransportType: "UDP"
|
|
# Set `dogstatsdPort` to `0` (default) to omit port number append to address.
|
|
dogstatsdPort: 0
|
|
dogstatsdAddr: "datadog-agent.datadog.svc.cluster.local"
|
|
```
|
|
|
|
Resulting server agent telemetry configuration
|
|
|
|
```json
|
|
{
|
|
"telemetry": {
|
|
"dogstatsd_addr": "datadog-agent.datadog.svc.cluster.local"
|
|
}
|
|
}
|
|
```
|
|
|
|
</CodeBlockConfig>
|
|
|
|
<CodeBlockConfig heading={"DogstatsD (UDP: hostPort)"}>
|
|
|
|
Consul Helm Chart Overrides
|
|
|
|
```yaml
|
|
metrics:
|
|
enabled: true
|
|
enableAgentMetrics: true
|
|
datadog:
|
|
enabled: true
|
|
namespace: "datadog"
|
|
dogstatsd:
|
|
enabled: true
|
|
socketTransportType: "UDP"
|
|
dogstatsdPort: 8125
|
|
dogstatsdAddr: "172.20.180.10"
|
|
```
|
|
|
|
Resulting server agent telemetry configuration
|
|
|
|
```json
|
|
{
|
|
"telemetry": {
|
|
"dogstatsd_addr": "172.20.180.10:8125",
|
|
}
|
|
}
|
|
```
|
|
|
|
</CodeBlockConfig>
|
|
</Tabs>
|
|
|
|
### UDS/UDP Advantages and Disadvantages
|
|
|
|
This integration method accomplishes metrics collection by leveraging either [Unix Domain Sockets](https://docs.datadoghq.com/developers/dogstatsd/unix_socket/?tab=kubernetes) (**UDS**) or [User Datagram Protocol](https://docs.datadoghq.com/developers/dogstatsd/?tab=kubernetes#agent) (**UDP**) transport.
|
|
Practitioners who manage their Kubernetes infrastructure and/or service-mesh should take into account the implications outlined in the tables below.
|
|
|
|
#### UDS
|
|
|
|
**Packet Transport**: Unix Domain Socket File
|
|
|
|
| Advantages | Disadvantages |
|
|
|-------------------------------------------------------|------------------------------------------------------------------------------------------------------|
|
|
| No IP or DNS resolution requirement for Datadog Agent | Requires [hostPath](https://kubernetes.io/docs/concepts/storage/volumes/#hostpath) Volume attachment |
|
|
| Improved network performance | Datadog Agent must run on every host you send metrics from |
|
|
| Higher throughput capacity | |
|
|
| Packet error handling | |
|
|
| Automatic container ID tagging | |
|
|
|
|
#### UDP
|
|
|
|
**Packet Transport**:
|
|
* Kubernetes Service `IP:Port`
|
|
* Container Host Port
|
|
|
|
| Advantages | Disadvantages |
|
|
|------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------|
|
|
| Does **not** require [hostPath](https://kubernetes.io/docs/concepts/storage/volumes/#hostpath) Volume attachment | **No** packet error handling |
|
|
| (**_KubeDNS_**) Does **not** require Hostport exposure if accessible from cluster | (**_Hostport_**) Requires a networking provider that adheres to the CNI specification, such as Calico, Canal, or Flannel. |
|
|
| Similar `IP:Port` configuration as Virtual Machine hosts | (**_Hostport_**) Requires port to be exposed on host using `hostNetwork` |
|
|
| | (**_Hostport_**) Requires firewall access controls to permit access |
|
|
| | (**_Hostport_**) Network Namespace sharing is required |
|
|
|
|
|
|
#### Verifying DogstatsD Metric Collection
|
|
|
|
To confirm you're Datadog agent is receiving traffic, the `status` subcommand can be ran from the Datadog Agent expecting to receive DogstatsD traffic from Consul.
|
|
|
|
There should be an increase in either UDP or UDS traffic packet counts from the resultant output after the configuration has been properly established.
|
|
|
|
|
|
| Transport | Command | Pod | Container |
|
|
|:---------------|----------------------------------------------------------------------------|---------------|-----------|
|
|
| `UDP`\|\|`UDS` | `agent status` | datadog-agent | agent |
|
|
|
|
|
|
<Tabs>
|
|
<CodeBlockConfig heading={"UDP"}>
|
|
|
|
```shell
|
|
# Example: UDP Packet and Metric Packet Traffic Increase
|
|
=========
|
|
DogStatsD
|
|
=========
|
|
Event Packets: 0
|
|
Event Parse Errors: 0
|
|
Metric Packets: 5,908
|
|
Metric Parse Errors: 0
|
|
Service Check Packets: 0
|
|
Service Check Parse Errors: 0
|
|
Udp Bytes: 636,872
|
|
Udp Packet Reading Errors: 0
|
|
Udp Packets: 3,300
|
|
Uds Bytes: 0
|
|
Uds Origin Detection Errors: 0
|
|
Uds Packet Reading Errors: 0
|
|
Uds Packets: 0
|
|
Unterminated Metric Errors: 0
|
|
```
|
|
</CodeBlockConfig>
|
|
<CodeBlockConfig heading={"UDS"}>
|
|
|
|
```shell
|
|
# Example: UDS Packet and Metric Packet Traffic Increase
|
|
=========
|
|
DogStatsD
|
|
=========
|
|
Event Packets: 0
|
|
Event Parse Errors: 0
|
|
Metric Packets: 30,523
|
|
Metric Parse Errors: 0
|
|
Service Check Packets: 0
|
|
Service Check Parse Errors: 0
|
|
Udp Bytes: 124,635
|
|
Udp Packet Reading Errors: 0
|
|
Udp Packets: 731
|
|
Uds Bytes: 2,957,433
|
|
Uds Origin Detection Errors: 0
|
|
Uds Packet Reading Errors: 0
|
|
Uds Packets: 11,563
|
|
Unterminated Metric Errors: 0
|
|
```
|
|
|
|
</CodeBlockConfig>
|
|
</Tabs>
|
|
|
|
Traffic verification can also be accomplished using the `netstat` command line utility from a consul-server expected to be submitting
|
|
metrics data to Datadog.
|
|
|
|
<Note>
|
|
Using <code>netstat</code> requires privileged container permissions to install <code>open-bsd</code> networking tools on the consul-server for testing.
|
|
</Note>
|
|
|
|
| Transport | Command | Pod | Container |
|
|
|:-----------------|-----------|---------------|-----------|
|
|
| `UDP` \|\| `UDS` | `netstat` | consul-server | consul |
|
|
|
|
<Tabs>
|
|
<CodeBlockConfig heading={"UDP"}>
|
|
|
|
```shell
|
|
$ netstat -nup | grep "172.28.13.12:8125.*ESTABLISHED
|
|
udp 0 0 127.0.0.1:53874 127.0.0.1:8125 ESTABLISHED 23176/consul
|
|
```
|
|
|
|
</CodeBlockConfig>
|
|
|
|
<CodeBlockConfig heading={"UDS"}>
|
|
|
|
```shell
|
|
$ netstat -x
|
|
Active UNIX domain sockets (w/o servers)
|
|
Proto RefCnt Flags Type State I-Node Path
|
|
unix 2 [ ] DGRAM CONNECTED 15952473
|
|
unix 2 [ ] DGRAM 15652537 @9d10c
|
|
```
|
|
|
|
</CodeBlockConfig>
|
|
|
|
</Tabs>
|
|
|
|
UDS provides the additional capability for verification by sending a test metrics packet to the Unix Socket configured.
|
|
|
|
<Note>
|
|
Using <code>netcat</code> (nc) requires privileged container permissions to install <code>open-bsd</code> networking tools on the consul-server for testing.
|
|
</Note>
|
|
|
|
| Transport | Command | Pod | Container |
|
|
|:----------|---------|---------------|-----------|
|
|
| `UDS` | `nc` | consul-server | consul |
|
|
|
|
<CodeBlockConfig heading={"UDS"}>
|
|
|
|
```shell
|
|
$ echo -n "custom.metric.name:1|c" | nc -U -u -w1 /var/run/datadog/dsd.socket
|
|
Bound on /tmp/nc-IjJkoG/recv.sock
|
|
```
|
|
|
|
</CodeBlockConfig>
|
|
|
|
#### Use Case
|
|
|
|
DogstatsD integration provides full-scope metrics collection from Consul, and minimizes access control configuration requirements as traffic
|
|
flow is outbound (toward the Datadog Agent) as opposed to inbound (toward the `/v1/agent/metrics/` API endpoint).
|
|
|
|
|
|
#### Metrics Data Collected
|
|
|
|
- Full list of metrics sent via DogstatsD consists of those listed in the [Agent Telemetry](https://developer.hashicorp.com/consul/docs/agent/telemetry) documentation.
|
|
|
|
|
|
## Datadog Checks: Official Consul Integration
|
|
|
|
The Datadog Agent package includes official third-party integrations for built-in availability upon agent deployment.
|
|
|
|
The Consul Integration Datadog checks provided some additional metric verification checks that leverage Consul's built-in feature-set, and help monitor Consul
|
|
during normal operation beyond that of Consul's available metrics.
|
|
|
|
See the below [table](#additional-integration-checks-performed) for an outline of the features added by the official integration.
|
|
|
|
<Note>
|
|
Currently, the annotations configured by the Helm overrides with Consul RPC TLS enabled
|
|
assume server and ca certificate secrets are shared with the Datadog agent release namespace and mount the valid <code>tls.crt</code>, <code>tls.key</code>,
|
|
and <code>ca.crt</code> secret volumes at the <code>/etc/datadog-agent/conf.d/consul.d/certs</code> path on the Datadog Agent, agent container.
|
|
</Note>
|
|
|
|
### Helm Chart Configuration
|
|
|
|
<Tabs>
|
|
<CodeBlockConfig heading={"Datadog Consul Checks"}>
|
|
|
|
Consul Helm Chart Overrides
|
|
|
|
```yaml
|
|
global:
|
|
tls:
|
|
enabled: true
|
|
enableAutoEncrypt: true
|
|
acls:
|
|
manageSystemACLs: true
|
|
metrics:
|
|
enabled: true
|
|
enableAgentMetrics: true
|
|
datadog:
|
|
enabled: true
|
|
namespace: "datadog"
|
|
```
|
|
|
|
|
|
Consul `server-statefulset.yaml` annotations
|
|
|
|
```yaml
|
|
"ad.datadoghq.com/consul.checks": |
|
|
{
|
|
"consul": {
|
|
"init_config": {},
|
|
"instances": [
|
|
{
|
|
"url": "https://consul-server.consul.svc:8501",
|
|
"tls_cert": "/etc/datadog-agent/conf.d/consul.d/certs/tls.crt",
|
|
"tls_private_key": "/etc/datadog-agent/conf.d/consul.d/certs/tls.key",
|
|
"tls_ca_cert": "/etc/datadog-agent/conf.d/consul.d/ca/tls.crt",
|
|
"use_prometheus_endpoint": true,
|
|
"acl_token": "ENC[k8s_secret@consul/consul-datadog-agent-metrics-acl-token/token]",
|
|
"new_leader_checks": true,
|
|
"network_latency_checks": true,
|
|
"catalog_checks": true,
|
|
"auth_type": "basic"
|
|
}
|
|
]
|
|
}
|
|
}
|
|
```
|
|
|
|
</CodeBlockConfig>
|
|
</Tabs>
|
|
|
|
### Additional Integration Checks Performed
|
|
|
|
| Consul Component | Description | API Endpoint(s) |
|
|
|------------------|-----------------------------------------------------|----------------------------------------------------------------------|
|
|
| Agent | Agent Metadata (i.e., version) | `/v1/agent/self` |
|
|
| Metrics | Prometheus formatted metrics | `/v1/agent/metrics` |
|
|
| Serf | Events and Membership Flaps | `/v1/health/service/consul` `/v1/agent/self` |
|
|
| Raft | Monitors Raft peer information and leader elections | `/v1/status/leader` `/v1/status/peers` |
|
|
| Catalog Services | Service Health Status and Node Count | `/v1/catalog/services` `/v1/health/state/any` |
|
|
| Catalog Nodes | Node Service Count and Health Status | `/v1/health/state/any` `/v1/health/service/<service>` |
|
|
| Consul Latency | Consul LAN + WAN Coordinate Latency Calculations | `/v1/agent/self` `/v1/coordinate/nodes` `/v1/coordinate/datacenters` |
|
|
|
|
#### Use Case
|
|
|
|
This integration is primarily for basic Consul monitoring with focus on the service discovery.
|
|
|
|
#### Metrics Data Collected
|
|
|
|
The list of Consul's Prometheus metrics scraped and mapped by this method are listed in the latest [metrics.py](https://github.com/DataDog/integrations-core/blob/master/consul/datadog_checks/consul/metrics.py) of the integration source code.
|
|
|
|
To understand how Consul Latency metrics are calculated, review the [Consul Network Coordinates](https://developer.hashicorp.com/consul/docs/architecture/coordinates) documentation.
|
|
|
|
Review the [Datadog Documentation](https://docs.datadoghq.com/integrations/consul/?tab=containerized#data-collected) for the full description of Metrics data collected.
|
|
|
|
## Openmetrics Prometheus
|
|
|
|
For Datadog agents at or above v6.5.0, OpenMetrics and Prometheus checks are available to scrape Kubernetes application Prometheus endpoints.
|
|
|
|
This method implements the collection via Openmetrics as that is fully supported for Prometheus text format and is accomplished using pod annotations as demonstrated below.
|
|
|
|
<Note>
|
|
Enabling OpenMetrics collection via Datadog <strong><em>by design</em></strong> removes the <code>prometheus.io/path</code> and <code>prometheus.io/port</code> annotations from the consul-server statefulset deployment to allow Datadog
|
|
to scrape the agent's metrics API endpoint using either RPC TLS and Consul ACLs as necessary.
|
|
</Note>
|
|
|
|
<Note>
|
|
Currently, the annotations configured by the Helm overrides with Consul RPC TLS enabled
|
|
assume server and ca certificate secrets are shared with the Datadog agent release namespace and mount the valid <code>tls.crt</code>, <code>tls.key</code>,
|
|
and <code>ca.crt</code> secret volumes at the <code>/etc/datadog-agent/conf.d/consul.d/certs</code> path on the Datadog Agent, agent container.
|
|
</Note>
|
|
|
|
### Helm Chart Configuration
|
|
|
|
<Tabs>
|
|
<CodeBlockConfig heading={"OpenMetrics Prometheus"}>
|
|
|
|
Consul Helm Chart Overrides
|
|
|
|
```yaml
|
|
global:
|
|
tls:
|
|
enabled: true
|
|
enableAutoEncrypt: true
|
|
acls:
|
|
manageSystemACLs: true
|
|
metrics:
|
|
enabled: true
|
|
enableAgentMetrics: true
|
|
datadog:
|
|
enabled: true
|
|
namespace: "datadog"
|
|
openMetricsPrometheus:
|
|
enabled: true
|
|
```
|
|
|
|
Consul `server-statefulset.yaml` annotations
|
|
|
|
```yaml
|
|
ad.datadoghq.com/consul.checks: |
|
|
{
|
|
"openmetrics": {
|
|
"init_config": {},
|
|
"instances": [
|
|
{
|
|
"openmetrics_endpoint": "https://consul-server.consul.svc:8501/v1/agent/metrics?format=prometheus",
|
|
"tls_cert": "/etc/datadog-agent/conf.d/consul.d/certs/tls.crt",
|
|
"tls_private_key": "/etc/datadog-agent/conf.d/consul.d/certs/tls.key",
|
|
"tls_ca_cert": "/etc/datadog-agent/conf.d/consul.d/ca/tls.crt",
|
|
"headers": {
|
|
"X-Consul-Token": "ENC[k8s_secret@consul/consul-datadog-agent-metrics-acl-token/token]"
|
|
},
|
|
"namespace": "consul",
|
|
"metrics": [ ".*" ]
|
|
}
|
|
]
|
|
}
|
|
}
|
|
```
|
|
|
|
</CodeBlockConfig>
|
|
</Tabs>
|
|
|
|
#### Use Case
|
|
|
|
This method of integration is useful for Prometheus-enabled scrapes with further customization of the collected data.
|
|
|
|
By default, all metrics pulled using this method scrape Consul metrics using the `/v1/agent/metrics?format=prometheus` API query, and are considered to be custom metrics.
|
|
|
|
Use of this method maps to Datadog as described in [Mapping Prometheus Metrics to Datadog Metrics](https://docs.datadoghq.com/integrations/guide/prometheus-metrics/?tab=latestversion). The following table summarizing how these metrics map to each other.
|
|
|
|
| OpenMetrics metric type | Datadog metric type |
|
|
|:------------------------|:-----------------------------------|
|
|
| `Gauge` | `gauge` |
|
|
| `Counter` | `count` |
|
|
| Histogram: `_count ` | `count.count` |
|
|
| Histogram: `_sum` | `count.sum` |
|
|
| Histogram: `_bucket` | `count.bucket` \|\| `distribution` |
|
|
| Summary: `_count` | `count.count` |
|
|
| Summary: `_sum` | `count.sum` |
|
|
| Summary: `sample` | `gauge.quantile` |
|
|
|
|
|
|
#### Metrics Data Collected
|
|
|
|
The integration, by default, uses a wildcard (`".*"`) to collect **_all_** metrics emitted from the `/v1/agent/metrics` endpoint.
|
|
|
|
Please refer to the [Agent Telemetry](https://developer.hashicorp.com/consul/docs/agent/telemetry) documentation for a full list and desription of the metrics data collected.
|