---
layout: "docs"
page_title: "Check Definition"
sidebar_current: "docs-agent-checks"
description: |-
  One of the primary roles of the agent is management of system- and application-level health checks. A health check is considered to be application-level if it is associated with a service. A check is defined in a configuration file or added at runtime over the HTTP interface.
---

# Checks

One of the primary roles of the agent is management of system-level and application-level health
checks. A health check is considered to be application-level if it is associated with a
service. If not associated with a service, the check monitors the health of the entire node.

A check is defined in a configuration file or added at runtime over the HTTP interface.  Checks
created via the HTTP interface persist with that node.

There are three different kinds of checks:

* Script + Interval - These checks depend on invoking an external application
  that performs the health check, exits with an appropriate exit code, and potentially
  generates some output. A script is paired with an invocation interval (e.g.
  every 30 seconds). This is similar to the Nagios plugin system.

* HTTP + Interval - These checks make an HTTP `GET` request to the specified URL
  every `interval` (e.g. every 30 seconds). The status of the check depends on the
  HTTP response code: any `2xx` code is considered passing, a `429 Too Many Requests`
  is a warning, and anything else is a failure. This type of check should be preferred
  over a script that uses `curl` or another external process to check a simple HTTP
  operation. By default, HTTP checks use a request timeout equal to the check
  interval, capped at 10 seconds. A custom timeout can be set with the `timeout`
  field in the check definition.

* <a name="TTL"></a>Time to Live (TTL) - These checks retain their last known state for a given TTL.
  The state of the check must be updated periodically over the HTTP interface. If an
  external system fails to update the status within the TTL, the check is
  set to the critical state. This mechanism, conceptually similar to a dead man's switch,
  relies on the application to directly report its health. For example, a healthy app
  can periodically `PUT` a status update to the HTTP endpoint; if the app fails, the TTL
  expires and the health check enters a critical state. TTL checks also persist
  their last known status to disk, which allows the Consul agent to restore the
  last known status of the check across restarts. A persisted status remains
  valid until the TTL expires, measured from the time of the last update.
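
The state rules for the three kinds of check can be restated as a small Go sketch. It is illustrative only and is not Consul's implementation: `httpCheckState`, `defaultHTTPTimeout`, and `ttlCheck` are hypothetical names, and the state strings assume Consul's `passing`, `warning`, and `critical` names.

```go
package main

import (
	"net/http"
	"time"
)

// Check states as named elsewhere on this page.
const (
	statePassing  = "passing"
	stateWarning  = "warning"
	stateCritical = "critical"
)

// httpCheckState maps an HTTP response code to a check state:
// any 2xx is passing, 429 Too Many Requests is a warning,
// and anything else is critical.
func httpCheckState(code int) string {
	switch {
	case code >= 200 && code <= 299:
		return statePassing
	case code == http.StatusTooManyRequests:
		return stateWarning
	default:
		return stateCritical
	}
}

// defaultHTTPTimeout returns the request timeout used when the definition
// has no `timeout` field: the check interval, capped at 10 seconds.
func defaultHTTPTimeout(interval time.Duration) time.Duration {
	const maxTimeout = 10 * time.Second
	if interval > maxTimeout {
		return maxTimeout
	}
	return interval
}

// ttlCheck keeps the last reported state until more than ttl has passed
// without an update, after which it reports critical.
type ttlCheck struct {
	ttl        time.Duration
	lastState  string
	lastUpdate time.Time
}

// update records a status report, such as one sent by the application.
func (c *ttlCheck) update(state string, now time.Time) {
	c.lastState = state
	c.lastUpdate = now
}

// stateAt returns the check's state as of now.
func (c *ttlCheck) stateAt(now time.Time) string {
	if now.Sub(c.lastUpdate) > c.ttl {
		return stateCritical
	}
	return c.lastState
}
```

For example, with a 30-second TTL, a status reported at time *t* is returned until *t* + 30s; after that the sketch reports `critical` until the next update arrives.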

## Check Definition

A script check:

```javascript
{
  "check": {
    "id": "mem-util",
    "name": "Memory utilization",
    "script": "/usr/local/bin/check_mem.py",
    "interval": "10s"
  }
}
```

An HTTP check:

```javascript
{
  "check": {
    "id": "api",
    "name": "HTTP API on port 5000",
    "http": "http://localhost:5000/health",
    "interval": "10s",
    "timeout": "1s"
  }
}
```

A TTL check:

```javascript
{
  "check": {
    "id": "web-app",
    "name": "Web App Status",
    "notes": "Web app does a curl internally every 10 seconds",
    "ttl": "30s"
  }
}
```
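
To make the shape of these definitions concrete, the following Go sketch decodes a `{"check": {...}}` document and works out which kind of check it describes. The struct and function names are hypothetical, and the order in which fields are examined is an assumption; it does not reproduce Consul's own validation.

```go
package main

import (
	"encoding/json"
	"errors"
)

// checkDefinition holds the fields used in the example definitions above.
type checkDefinition struct {
	ID       string `json:"id"`
	Name     string `json:"name"`
	Notes    string `json:"notes"`
	Script   string `json:"script"`
	HTTP     string `json:"http"`
	Interval string `json:"interval"`
	Timeout  string `json:"timeout"`
	TTL      string `json:"ttl"`
}

// parseCheckFile decodes a configuration document of the form {"check": {...}}.
func parseCheckFile(data []byte) (checkDefinition, error) {
	var doc struct {
		Check checkDefinition `json:"check"`
	}
	if err := json.Unmarshal(data, &doc); err != nil {
		return checkDefinition{}, err
	}
	return doc.Check, nil
}

// checkKind reports which of the three kinds of check a definition describes.
func checkKind(c checkDefinition) (string, error) {
	switch {
	case c.Script != "" && c.Interval != "":
		return "script", nil
	case c.HTTP != "" && c.Interval != "":
		return "http", nil
	case c.TTL != "":
		return "ttl", nil
	default:
		return "", errors.New("need script+interval, http+interval, or ttl")
	}
}
```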

Each type of definition must include a `name` and may optionally
provide `id` and `notes` fields. If `id` is not provided, it defaults to
the `name`. Check IDs must be unique per node, so if names might conflict,
provide unique IDs.
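
The ID rules can be sketched as a single pass over a node's checks. This Go sketch is an illustration under assumptions, not Consul's code: `check` and `resolveIDs` are hypothetical names, and rejecting the whole set on the first duplicate is a choice made for this example.

```go
package main

import "fmt"

// check carries only the fields relevant to ID resolution.
type check struct {
	ID   string
	Name string
}

// resolveIDs fills in missing IDs from names and rejects duplicate IDs,
// since check IDs must be unique per node.
func resolveIDs(checks []check) ([]check, error) {
	seen := make(map[string]bool, len(checks))
	out := make([]check, 0, len(checks))
	for _, c := range checks {
		if c.Name == "" {
			return nil, fmt.Errorf("check %q is missing a name", c.ID)
		}
		if c.ID == "" {
			c.ID = c.Name // the id defaults to the name
		}
		if seen[c.ID] {
			return nil, fmt.Errorf("duplicate check ID %q", c.ID)
		}
		seen[c.ID] = true
		out = append(out, c)
	}
	return out, nil
}
```

Two checks both named `mem` without explicit IDs would collide here, which is exactly the case where giving each check its own `id` is required.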

The `notes` field is opaque to Consul but can be used to provide a human-readable
description of the current state of the check. With a script check, the field is
set to any output generated by the script. Similarly, an external process updating
a TTL check via the HTTP interface can set the `notes` value.

Checks may also contain a `token` field to provide an ACL token. This token is
used for any interaction with the catalog for the check, including
[anti-entropy syncs](/docs/internals/anti-entropy.html) and deregistration.

Both script and HTTP checks must include an `interval` field. This field is
parsed by Go's `time` package, and has the following
[formatting specification](http://golang.org/pkg/time/#ParseDuration):
> A duration string is a possibly signed sequence of decimal numbers, each with
> optional fraction and a unit suffix, such as "300ms", "-1.5h" or "2h45m".
> Valid time units are "ns", "us" (or "µs"), "ms", "s", "m", "h".
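
The following Go sketch shows how such strings behave under `time.ParseDuration`. The wrapper `parseInterval` is hypothetical, and its rejection of zero and negative values is an extra rule assumed for this example: the quoted format allows signed durations, but a negative interval makes no sense for a periodic check.

```go
package main

import (
	"fmt"
	"time"
)

// parseInterval parses an `interval` value with time.ParseDuration and,
// as an assumption of this sketch, rejects zero and negative durations.
func parseInterval(s string) (time.Duration, error) {
	d, err := time.ParseDuration(s)
	if err != nil {
		return 0, err // e.g. "10" fails because the unit is missing
	}
	if d <= 0 {
		return 0, fmt.Errorf("interval %q must be positive", s)
	}
	return d, nil
}
```

Note that a bare number such as `"10"` is not a valid duration; always include a unit, as in `"10s"`.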

To configure a check, either provide it as a `-config-file` option to the
agent or place it inside the `-config-dir` of the agent. The file must
have a `.json` extension to be loaded by Consul. Changes to check definitions
on disk are picked up by sending a `SIGHUP` to the agent, which reloads its
configuration. Alternatively, the check can be registered dynamically using
the [HTTP API](/docs/agent/http.html).
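
The `.json` rule can be sketched in Go as a filter over a directory listing. This is an illustration, not Consul's loader: `jsonFiles` and `configDirFiles` are hypothetical names, and skipping subdirectories is assumed here rather than documented.

```go
package main

import (
	"os"
	"path/filepath"
)

// jsonFiles keeps only names whose extension is exactly ".json".
func jsonFiles(names []string) []string {
	var out []string
	for _, name := range names {
		if filepath.Ext(name) == ".json" {
			out = append(out, name)
		}
	}
	return out
}

// configDirFiles lists the files in dir that would be loaded as
// configuration; subdirectories are skipped (an assumption).
func configDirFiles(dir string) ([]string, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	var names []string
	for _, e := range entries {
		if !e.IsDir() {
			names = append(names, e.Name())
		}
	}
	var paths []string
	for _, name := range jsonFiles(names) {
		paths = append(paths, filepath.Join(dir, name))
	}
	return paths, nil
}
```

A file such as `db.json.bak` is ignored because its extension is `.bak`, so renaming a definition is a simple way to disable it before the next reload.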

## Check Scripts

A check script is generally free to do anything to determine the status
of the check. The only requirement is that its exit code follows this
convention:

 * Exit code 0 - Check is passing
 * Exit code 1 - Check is warning
 * Any other code - Check is failing

This is the only convention that Consul depends on. Any output of the script
will be captured and stored in the `notes` field so that it can be viewed
by human operators.
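
The exit-code convention can be sketched in Go with `os/exec`. The function names are hypothetical, and running the script through `sh -c` and treating a script that cannot be started at all as critical are assumptions of this sketch.

```go
package main

import (
	"errors"
	"os/exec"
)

// statusFromExitCode applies the convention above:
// 0 is passing, 1 is warning, anything else is critical.
func statusFromExitCode(code int) string {
	switch code {
	case 0:
		return "passing"
	case 1:
		return "warning"
	default:
		return "critical"
	}
}

// runCheckScript runs a check script and returns its status together with
// its combined output, which would become the check's notes.
func runCheckScript(script string) (status, notes string) {
	out, err := exec.Command("sh", "-c", script).CombinedOutput()
	var exitErr *exec.ExitError
	switch {
	case err == nil:
		return statusFromExitCode(0), string(out)
	case errors.As(err, &exitErr):
		// ExitCode is -1 if the process was killed by a signal.
		return statusFromExitCode(exitErr.ExitCode()), string(out)
	default:
		// The script could not be started; treat it as failing.
		return "critical", err.Error()
	}
}
```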

## Initial Health Check Status

By default, when checks are registered against a Consul agent, the state is set
immediately to "critical". This is useful to prevent services from being
registered as "passing" and entering the service pool before they are confirmed
to be healthy. In certain cases, it may be desirable to specify the initial
state of a health check. This can be done by specifying the `status` field in a
health check definition, like so:

```javascript
{
  "check": {
    "id": "mem",
    "name": "Memory utilization",
    "script": "/bin/check_mem",
    "interval": "10s",
    "status": "passing"
  }
}
```

The above check definition causes the new "mem" check to be
registered with its initial state set to "passing".
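
The defaulting rule can be sketched as a Go function that reads the `status` field from a definition. `initialStatus` is a hypothetical name, and restricting `status` to the three known states is an assumption of this sketch rather than documented behavior.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// initialStatus returns the state a check starts in when registered:
// "critical" unless the definition sets `status`.
func initialStatus(definition []byte) (string, error) {
	var doc struct {
		Check struct {
			ID     string `json:"id"`
			Status string `json:"status"`
		} `json:"check"`
	}
	if err := json.Unmarshal(definition, &doc); err != nil {
		return "", err
	}
	switch s := doc.Check.Status; s {
	case "":
		return "critical", nil // default: not yet confirmed healthy
	case "passing", "warning", "critical":
		return s, nil
	default:
		return "", fmt.Errorf("check %q: unknown status %q", doc.Check.ID, s)
	}
}
```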

## Service-Bound Checks

Health checks may optionally be bound to a specific service. This ensures
that the status of the health check will only affect the health status of the
given service instead of the entire node. Service-bound health checks may be
provided by adding a `service_id` field to a check configuration:

```javascript
{
  "check": {
    "id": "web-app",
    "name": "Web App Status",
    "service_id": "web-app",
    "ttl": "30s"
  }
}
```

In the above configuration, if the web-app health check begins failing, it will
only affect the availability of the web-app service. All other services
provided by the node will remain unchanged.
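
The scoping rule can be sketched in Go as follows. The names are hypothetical, and two details are assumptions of this sketch: only `critical` counts as unhealthy (a `warning` does not), and a check with no `service_id` applies to every service on the node, since it monitors the node as a whole.

```go
package main

// checkStatus pairs a check's state with the service it is bound to.
// An empty ServiceID means the check is node-level.
type checkStatus struct {
	ServiceID string
	Status    string
}

// serviceHealthy reports whether serviceID is healthy given every check
// on the node: node-level checks apply to all services, while
// service-bound checks apply only to the service named in service_id.
func serviceHealthy(serviceID string, checks []checkStatus) bool {
	for _, c := range checks {
		if c.ServiceID != "" && c.ServiceID != serviceID {
			continue // bound to a different service; ignore it
		}
		if c.Status == "critical" {
			return false
		}
	}
	return true
}
```

With a failing check bound to `web-app`, `serviceHealthy("web-app", ...)` is false while any other service on the node stays healthy, matching the behavior described above.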

## Multiple Check Definitions

Multiple checks can be defined using the `checks` (plural) key in your
configuration file.

```javascript
{
  "checks": [
    {
      "id": "chk1",
      "name": "mem",
      "script": "/bin/check_mem",
      "interval": "5s"
    },
    {
      "id": "chk2",
      "name": "/health",
      "http": "http://localhost:5000/health",
      "interval": "15s"
    },
    {
      "id": "chk3",
      "name": "cpu",
      "script": "/bin/check_cpu",
      "interval": "10s"
    },
    ...
  ]
}
```
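
Reading such a file can be sketched in Go by decoding both the singular `check` key and the plural `checks` key. The names are hypothetical, and accepting both keys in one document is an assumption of this sketch. The `...` in the example above only stands for further entries; a real file must omit it, because it is not valid JSON.

```go
package main

import "encoding/json"

// checkDefinition holds a subset of the fields shown in the example.
type checkDefinition struct {
	ID       string `json:"id"`
	Name     string `json:"name"`
	Script   string `json:"script"`
	HTTP     string `json:"http"`
	Interval string `json:"interval"`
}

// loadChecks collects checks from a document that may use the singular
// "check" key, the plural "checks" key, or both.
func loadChecks(data []byte) ([]checkDefinition, error) {
	var doc struct {
		Check  *checkDefinition  `json:"check"`
		Checks []checkDefinition `json:"checks"`
	}
	if err := json.Unmarshal(data, &doc); err != nil {
		return nil, err
	}
	var all []checkDefinition
	if doc.Check != nil {
		all = append(all, *doc.Check)
	}
	return append(all, doc.Checks...), nil
}
```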