Dhia Ayachi 3717ab0991 generate a single debug file for a long duration capture (#10279)
* debug: remove the CLI check for debug_enabled

The API allows collecting profiles even debug_enabled=false as long as
ACLs are enabled. Remove this check from the CLI so that users do not
need to set debug_enabled=true for no reason.

Also:
- fix the API client to return errors on non-200 status codes for debug
  endpoints
- improve the failure messages when pprof data can not be collected

Co-Authored-By: Dhia Ayachi <dhia@hashicorp.com>

* remove parallel test runs

parallel runs create a race condition that fail the debug tests

* snapshot the timestamp at the beginning of the capture

- timestamp used to create the capture sub folder is snapshot only at the beginning of the capture and reused for subsequent captures
- capture append to the file if it already exist

* Revert "snapshot the timestamp at the beginning of the capture"

This reverts commit c2d03346

* Refactor captureDynamic to extract capture logic for each item in a different func

* snapshot the timestamp at the beginning of the capture

- timestamp used to create the capture sub folder is snapshot only at the beginning of the capture and reused for subsequent captures
- capture append to the file if it already exist

* Revert "snapshot the timestamp at the beginning of the capture"

This reverts commit c2d03346

* Refactor captureDynamic to extract capture logic for each item in a different func

* extract wait group outside the go routine to avoid a race condition

* capture pprof in a separate go routine

* perform a single capture for pprof data for the whole duration

* add missing vendor dependency

* add a change log and fix documentation to reflect the change

* create function for timestamp dir creation and simplify error handling

* use error groups and ticker to simplify interval capture loop

* Logs, profile and traces are captured for the full duration. Metrics, Heap and Go routines are captured every interval

* refactor Logs capture routine and add log capture specific test

* improve error reporting when log test fail

* change test duration to 1s

* make time parsing in log line more robust

* refactor log time format in a const

* test on log line empty the earliest possible and return

Co-authored-by: Freddy <freddygv@users.noreply.github.com>

* rename function to captureShortLived

* more specific changelog

Co-authored-by: Paul Banks <banks@banksco.de>

* update documentation to reflect current implementation

* add test for behavior when invalid param is passed to the command

* fix argument line in test

* a more detailed description of the new behaviour

Co-authored-by: Paul Banks <banks@banksco.de>

* print success right after the capture is done

* remove an unnecessary error check

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>

* upgraded github.com/google/pprof v0.0.0-20181206194817-3ea8567a2e57 => v0.0.0-20210601050228-01bbb1931b22

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Co-authored-by: Freddy <freddygv@users.noreply.github.com>
Co-authored-by: Paul Banks <banks@banksco.de>
2021-06-07 17:12:49 +00:00

122 lines
6.9 KiB
Plaintext

---
layout: commands
page_title: 'Commands: Debug'
---
# Consul Debug
Command: `consul debug`
The `consul debug` command monitors a Consul agent for the specified period of
time, recording information about the agent, cluster, and environment to an archive
written to the current directory.
Providing support for complex issues encountered by Consul operators often
requires a large amount of debugging information to be retrieved. This command
aims to shortcut that coordination and provide a simple workflow for accessing
data about Consul agent, cluster, and environment to enable faster
isolation and debugging of issues.
This command requires an `operator:read` ACL token in order to retrieve the
data from the target agent, if ACLs are enabled.
If the command is interrupted, as it could be given a long duration but
require less time than expected, it will attempt to archive the current
captured data.
## Security and Privacy
By default, ACL tokens, private keys, and other sensitive material related
to Consul is sanitized and not available in this archive. However, other
information about the environment the target agent is running in is available
in plain text within the archive.
It is recommended to validate the contents of the archive and redact any
material classified as sensitive to the target environment, or use the `-capture`
flag to not retrieve it initially.
Additionally, we recommend securely transmitting this archive via encryption
or otherwise.
## Usage
`Usage: consul debug [options]`
By default, the debug command will capture an archive at the current path for
all targets for 2 minutes.
#### API Options
@include 'http_api_options_client.mdx'
#### Command Options
- `-duration` - Optional, the total time to capture data for from the target agent. Must
be greater than the interval and longer than 10 seconds. Defaults to 2 minutes.
- `-interval` - Optional, the interval at which to capture dynamic data, such as heap
and metrics. Must be longer than 5 seconds. Defaults to 30 seconds.
- `-capture` - Optional, can be specified multiple times for each [capture target](#capture-targets)
and will only record that information in the archive.
- `-output` - Optional, the full path of where to write the directory of data and
resulting archive. Defaults to the current directory.
- `-archive` - Optional, if the tool show archive the directory of data into a
compressed tar file. Defaults to true.
## Capture Targets
The `-capture` flag can be specified multiple times to capture specific
information when `debug` is running. By default, it captures all information.
| Target | Description |
| --------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `agent` | Version and configuration information about the agent. |
| `host` | Information about resources on the host running the target agent such as CPU, memory, and disk. |
| `cluster` | A list of all the WAN and LAN members in the cluster. |
| `metrics` | Metrics from the in-memory metrics endpoint in the target, captured at the interval. |
| `logs` | `DEBUG` level logs for the target agent, captured for the duration. |
| `pprof` | Golang heap, CPU, goroutine, and trace profiling. CPU and traces are captured for `duration` in a single file while heap and goroutine are separate snapshots for each `interval`. This information is not retrieved unless [`enable_debug`](/docs/agent/options#enable_debug) is set to `true` on the target agent or ACLs are enable and an ACL token with `operator:read` is provided. |
## Examples
This command can be run from any host with the Consul binary, but requires
network access to the target agent in order to retrieve data. Once retrieved,
the data is written to the the specified path (defaulting to the current
directory) on the host where the command runs.
By default the command will capture all available data from the default
agent address on loopback for 2 minutes at 30 second intervals.
```shell-session
$ consul debug
...
```
In this example, the archive is collected from a different agent on the
network using the standard Consul CLI flag to change the API address.
```shell-session
$ consul debug -http-addr=10.0.1.10:8500
...
```
The capture flag can be specified to only record a subset of data
about the agent and environment.
```shell-session
$ consul debug -capture agent -capture host -capture logs
...
```
The duration of the command and interval of capturing dynamic
information (such as metrics) can be specified with the `-interval`
and `-duration` flags.
```shell-session
$ consul debug -interval=15s -duration=1m
...
```