---
layout: "docs"
page_title: "Check Definition"
sidebar_current: "docs-agent-checks"
description: |-
  One of the primary roles of the agent is management of system- and application-level health checks. A health check is considered to be application-level if it is associated with a service. A check is defined in a configuration file or added at runtime over the HTTP interface.
---
# Checks
One of the primary roles of the agent is management of system-level and application-level health
checks. A health check is considered to be application-level if it is associated with a
service. If not associated with a service, the check monitors the health of the entire node.
A check is defined in a configuration file or added at runtime over the HTTP interface. Checks
created via the HTTP interface persist with that node.

There are three different kinds of checks:

* Script + Interval - These checks depend on invoking an external application
  that performs the health check, exits with an appropriate exit code, and potentially
  generates some output. A script is paired with an invocation interval (e.g.
  every 30 seconds). This is similar to the Nagios plugin system.
* HTTP + Interval - These checks make an HTTP `GET` request to the specified URL
  once per interval (e.g. every 30 seconds). The status of the service depends on the HTTP response code:
  any `2xx` code is considered passing, a `429 Too Many Requests` is a warning, and anything else is a failure.
  This type of check should be preferred over a script that uses `curl` or another external process
  to check a simple HTTP operation.
* Time to Live (TTL) - These checks retain their last known state for a given TTL.
  The state of the check must be updated periodically over the HTTP interface. If an
  external system fails to update the status within a given TTL, the check is
  set to the critical state. This mechanism, conceptually similar to a dead man's switch,
  relies on the application to directly report its health. For example, a healthy app
  can periodically `PUT` a status update to the HTTP endpoint; if the app fails, the TTL will
  expire and the health check enters a critical state.
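
The HTTP response-code rule described above can be written as a small function. This is only an illustrative sketch of the mapping as stated, not Consul's implementation; the function name and the status strings `passing`, `warning`, and `critical` are assumptions chosen for the example.

```python
def status_from_http_code(code: int) -> str:
    """Map an HTTP response code to a check status (illustrative names)."""
    if 200 <= code <= 299:
        return "passing"   # any 2xx response
    if code == 429:
        return "warning"   # 429 Too Many Requests
    return "critical"      # anything else counts as a failure
```

Note that under this rule redirects (`3xx`) and client errors other than `429` are failures, so a health URL should answer with a `2xx` code directly.
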
## Check Definition
A script check:
```javascript
{
  "check": {
    "id": "mem-util",
    "name": "Memory utilization",
    "script": "/usr/local/bin/check_mem.py",
    "interval": "10s"
  }
}
```
An HTTP check:
```javascript
{
  "check": {
    "id": "api",
    "name": "HTTP API on port 5000",
    "http": "http://localhost:5000/health",
    "interval": "10s"
  }
}
```
A TTL check:
```javascript
{
  "check": {
    "id": "web-app",
    "name": "Web App Status",
    "notes": "Web app does a curl internally every 10 seconds",
    "ttl": "30s"
  }
}
```
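
For the TTL check above, the application has to report in before the 30 second TTL runs out. The sketch below builds the `PUT` request a healthy app might send. It assumes the agent's HTTP interface is reachable at `http://127.0.0.1:8500` and that the pass endpoint is `/v1/agent/check/pass/<check id>`; confirm both against the HTTP API documentation for your version. The helper names are hypothetical.

```python
import urllib.parse
import urllib.request

# Assumption: address of the local agent's HTTP interface.
AGENT_ADDR = "http://127.0.0.1:8500"


def ttl_pass_request(check_id, agent_addr=AGENT_ADDR):
    """Build (without sending) a PUT request that marks a TTL check as passing."""
    # Assumption: endpoint path; verify against the agent HTTP API docs.
    path = "/v1/agent/check/pass/" + urllib.parse.quote(check_id, safe="")
    return urllib.request.Request(agent_addr + path, method="PUT")


def report_healthy(check_id):
    """Send the update. Returns True on HTTP 200; raises on network errors
    or non-2xx responses."""
    with urllib.request.urlopen(ttl_pass_request(check_id), timeout=2) as resp:
        return resp.status == 200
```

Calling `report_healthy("web-app")` every 10 seconds, as the `notes` field in the definition suggests, leaves some slack for a delayed update before the TTL expires.
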
Each type of definition must include a `name` and may optionally
provide `id` and `notes` fields. The `id` is set to the `name` if not
provided. All checks must have a unique ID per node: if names
might conflict, unique IDs should be provided.

The `notes` field is opaque to Consul but can be used to provide a human-readable
description of the current state of the check. With a script check, the field is
set to any output generated by the script. Similarly, an external process updating
a TTL check via the HTTP interface can set the `notes` value.

To configure a check, either provide it as a `-config-file` option to the
agent or place it inside the `-config-dir` of the agent. The file must
end in the ".json" extension to be loaded by Consul. Check definitions can
also be updated by sending a `SIGHUP` to the agent. Alternatively, the
check can be registered dynamically using the [HTTP API](/docs/agent/http.html).
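
As a rough illustration of dynamic registration, the sketch below builds a `PUT` request that registers a TTL check with the local agent. The agent address `http://127.0.0.1:8500`, the endpoint `/v1/agent/check/register`, and the capitalized payload keys (`Name`, `ID`, `Notes`, `TTL`) are assumptions to verify against the HTTP API documentation; the function name is hypothetical.

```python
import json
import urllib.request

# Assumption: address of the local agent's HTTP interface.
AGENT_ADDR = "http://127.0.0.1:8500"


def register_check_request(name, check_id=None, notes=None, ttl=None,
                           agent_addr=AGENT_ADDR):
    """Build (without sending) a request that registers a check."""
    payload = {"Name": name}          # `name` is the only required field
    if check_id:
        payload["ID"] = check_id      # defaults to the name if omitted
    if notes:
        payload["Notes"] = notes
    if ttl:
        payload["TTL"] = ttl
    return urllib.request.Request(
        agent_addr + "/v1/agent/check/register",  # assumed endpoint path
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Passing the returned object to `urllib.request.urlopen` sends it; the request is built separately here so it can be inspected without a running agent.
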
## Check Scripts
A check script is generally free to do anything to determine the status
of the check. The only limitation is that its exit code must obey
this convention:

* Exit code 0 - Check is passing
* Exit code 1 - Check is warning
* Any other code - Check is failing

This is the only convention that Consul depends on. Any output of the script
will be captured and stored in the `notes` field so that it can be viewed
by human operators.
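
A minimal sketch of a script that follows this convention is shown below. It is not the `check_mem.py` referenced earlier; the thresholds, the function names, and the choice of reading `/proc/meminfo` (Linux only, and assuming the `MemAvailable` field is present) are all assumptions made for illustration.

```python
import sys

# Hypothetical thresholds, in percent of memory used.
WARN_PCT = 80.0
CRIT_PCT = 95.0


def exit_code_for(used_pct, warn=WARN_PCT, crit=CRIT_PCT):
    """Translate a utilization figure into the exit-code convention."""
    if used_pct >= crit:
        return 2   # any code other than 0 or 1: failing
    if used_pct >= warn:
        return 1   # warning
    return 0       # passing


def memory_used_pct(meminfo_text):
    """Compute used-memory percentage from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])
    total = fields["MemTotal"]
    return 100.0 * (total - fields["MemAvailable"]) / total


def main():
    with open("/proc/meminfo") as f:
        used = memory_used_pct(f.read())
    # Anything printed here is the script's output, which is captured.
    print(f"memory used: {used:.1f}%")
    return exit_code_for(used)

# A real script would end with: sys.exit(main())
```
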
## Service-Bound Checks
Health checks may optionally be bound to a specific service. This ensures
that the status of the health check will only affect the health status of the
given service instead of the entire node. Service-bound health checks may be
provided by adding a `service_id` field to a check configuration:
```javascript
{
  "check": {
    "id": "web-app",
    "name": "Web App Status",
    "service_id": "web-app",
    "ttl": "30s"
  }
}
```
In the above configuration, if the web-app health check begins failing, it will
only affect the availability of the web-app service. All other services
provided by the node will remain unchanged.
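
The scoping rule can be modelled in a few lines. This is a simplified sketch, not Consul's actual logic: it assumes each check is a dictionary with a `status` and an optional `service_id`, and that only a `critical` status makes a service unavailable. The function name is hypothetical.

```python
def service_is_available(service_id, checks):
    """Return False if a failing check applies to the given service.

    A check applies when it is bound to this service, or when it has no
    `service_id` at all (a node-level check covers the entire node).
    """
    for check in checks:
        bound_to = check.get("service_id")
        applies = bound_to is None or bound_to == service_id
        if applies and check["status"] == "critical":
            return False
    return True
```

With a failing check bound to `web-app`, this model reports `web-app` as unavailable while any other service on the node stays available, matching the behaviour described above.
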
## Multiple Check Definitions
Multiple checks can be defined using the `checks` (plural)
key in your configuration file.
```javascript
{
  "checks": [
    {
      "id": "chk1",
      "name": "mem",
      "script": "/bin/check_mem",
      "interval": "5s"
    },
    {
      "id": "chk2",
      "name": "/health",
      "http": "http://localhost:5000/health",
      "interval": "15s"
    },
    {
      "id": "chk3",
      "name": "cpu",
      "script": "/bin/check_cpu",
      "interval": "10s"
    },
    ...
  ]
}
```
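
When a `checks` list is generated by tooling, it can help to verify the unique-ID requirement before writing the file. The sketch below is one possible approach, not part of Consul; the function names are hypothetical. It applies the rule that `id` defaults to `name` and rejects duplicates before rendering the JSON.

```python
import json


def effective_id(check):
    """Return the ID a check will get: its `id`, or its `name` if unset."""
    return check.get("id") or check["name"]  # KeyError if `name` is missing


def render_checks_config(checks):
    """Render a `checks` configuration, rejecting duplicate check IDs."""
    seen = set()
    for check in checks:
        cid = effective_id(check)
        if cid in seen:
            raise ValueError(f"duplicate check ID on this node: {cid!r}")
        seen.add(cid)
    return json.dumps({"checks": checks}, indent=2)
```

The returned text should be saved under a file name ending in ".json" so that the agent loads it.
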