---
layout: intro
page_title: Registering Health Checks
description: >-
  We've now seen how simple it is to run Consul, add nodes and services, and
  query those nodes and services. In this step, we will continue our tour by
  adding health checks to both nodes and services. Health checks are a critical
  component of service discovery that prevents unhealthy services from being used.
---

# Health Checks

We've now seen how simple it is to run Consul, add nodes and services, and
query those nodes and services. In this section, we will continue our tour
by adding health checks to both nodes and services. Health checks are a
critical component of service discovery that prevents unhealthy services
from being used.

This step builds upon [the Consul cluster created previously](/intro/getting-started/join).
At this point, you should have a two-node cluster running.

## Defining Checks

Similar to a service, a check can be registered either by providing a
[check definition](/docs/agent/checks) or by making the
appropriate calls to the [HTTP API](/api/health).

We will use the check definition approach because, just like with
services, definitions are the most common way to set up checks.

In Consul 0.9.0 and later, the agent must be configured with
`enable_script_checks` set to `true` in order to enable script checks.

Create two definition files in the Consul configuration directory of
the second node:

```text
vagrant@n2:~$ echo '{"check": {"name": "ping",
  "args": ["ping", "-c1", "google.com"], "interval": "30s"}}' \
  >/etc/consul.d/ping.json

vagrant@n2:~$ echo '{"service": {"name": "web", "tags": ["rails"], "port": 80,
  "check": {"args": ["curl", "localhost"], "interval": "10s"}}}' \
  >/etc/consul.d/web.json
```

The first definition adds a host-level check named "ping". This check runs
every 30 seconds, invoking `ping -c1 google.com`. A `script`-based
health check runs as the same user that started the Consul process.
If the command exits with code 0, the check is passing. An exit code of 1
puts the check in the warning state. Any other exit code flags the check as
failing, and the node is considered unhealthy. This is the contract for any
[`script`-based health check](/docs/agent/checks#check-scripts).

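
The exit-code contract for script checks can be sketched in a few lines of Python. This is only an illustration of the mapping, not Consul's implementation: the function names are hypothetical, and treating a timeout or a command that cannot start as `critical` is an assumption made for this sketch.

```python
import subprocess
import sys
from typing import Sequence


def status_from_exit_code(code: int) -> str:
    """Map a script check's exit code to a check status."""
    if code == 0:
        return "passing"
    if code == 1:
        return "warning"
    return "critical"  # any other exit code


def run_check(args: Sequence[str], timeout: float = 30.0) -> str:
    """Run a command and classify the result like a script check would."""
    try:
        completed = subprocess.run(list(args), capture_output=True, timeout=timeout)
    except (subprocess.TimeoutExpired, OSError):
        # Assumption for this sketch: a command that times out or cannot
        # be started at all is reported as critical.
        return "critical"
    return status_from_exit_code(completed.returncode)


if __name__ == "__main__":
    # Same command as the "ping" check definition above.
    print(run_check(["ping", "-c1", "google.com"]))
```

Running a candidate check command through a helper like this before adding it to a definition file is a quick way to see which status it would report.
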
The second command modifies the service named `web`, adding a check that sends a
request every 10 seconds via curl to verify that the web server is accessible.
As with the host-level health check, if the script exits with any code other
than 0 or 1, the check will be flagged as failing and the service will be
considered unhealthy.

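
For readers who prefer to generate definition files rather than type JSON by hand, the two definitions can be built as plain Python dictionaries and serialized with the standard `json` module. This is a minimal sketch: the helper names are hypothetical, and the field values are copied from the `echo` commands in this section.

```python
import json


def ping_check_definition() -> dict:
    """Host-level check: run `ping -c1 google.com` every 30 seconds."""
    return {
        "check": {
            "name": "ping",
            "args": ["ping", "-c1", "google.com"],
            "interval": "30s",
        }
    }


def web_service_definition() -> dict:
    """Service `web` with a curl-based check that runs every 10 seconds."""
    return {
        "service": {
            "name": "web",
            "tags": ["rails"],
            "port": 80,
            "check": {"args": ["curl", "localhost"], "interval": "10s"},
        }
    }


def render(definition: dict) -> str:
    """Serialize a definition as pretty-printed JSON with a trailing newline."""
    return json.dumps(definition, indent=2) + "\n"


if __name__ == "__main__":
    # Print both documents; redirect the output to files under
    # /etc/consul.d/ to use them as definition files.
    print(render(ping_check_definition()))
    print(render(web_service_definition()))
```
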
Now, restart the second agent, reload it with `consul reload`, or send it a
`SIGHUP` signal. You should see the following log lines:

```text
==> Starting Consul agent...
...
[INFO] agent: Synced service 'web'
[INFO] agent: Synced check 'service:web'
[INFO] agent: Synced check 'ping'
[WARN] Check 'service:web' is now critical
```

The first few lines indicate that the agent has synced the new
definitions. The last line indicates that the check we added for
the `web` service is critical. This is because we're not actually running
a web server, so the curl test is failing!

## Checking Health Status

Now that we've added some simple checks, we can use the HTTP API to inspect
them. First, we can look for any failing checks using this command (it can be
run on either node):

```text
vagrant@n1:~$ curl http://localhost:8500/v1/health/state/critical
[{"Node":"agent-two","CheckID":"service:web","Name":"Service 'web' check","Status":"critical","Notes":"","ServiceID":"web","ServiceName":"web","ServiceTags":["rails"]}]
```

We can see that there is only a single check, our `web` service check, in the
`critical` state.

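
The response is a JSON array with one object per check, so it is straightforward to post-process. The sketch below parses the sample response shown above and reduces it to `(node, check ID, status)` tuples. The function names are hypothetical, and `fetch_critical` assumes an agent listening on `localhost:8500`, as in the `curl` command.

```python
import json
import urllib.request
from typing import List, Tuple

# Sample body copied from the curl output in this section.
SAMPLE_RESPONSE = (
    '[{"Node":"agent-two","CheckID":"service:web",'
    '"Name":"Service \'web\' check","Status":"critical","Notes":"",'
    '"ServiceID":"web","ServiceName":"web","ServiceTags":["rails"]}]'
)


def summarize_checks(body: str) -> List[Tuple[str, str, str]]:
    """Reduce a health-state response body to (node, check ID, status) tuples."""
    return [(c["Node"], c["CheckID"], c["Status"]) for c in json.loads(body)]


def fetch_critical(base_url: str = "http://localhost:8500") -> str:
    """Fetch the critical-state listing from a running agent (assumed address)."""
    with urllib.request.urlopen(f"{base_url}/v1/health/state/critical", timeout=5) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    for node, check_id, status in summarize_checks(SAMPLE_RESPONSE):
        print(f"{node}: {check_id} is {status}")
```
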
Additionally, we can attempt to query the web service using DNS. Consul
will not return any results since the service is unhealthy:

```text
dig @127.0.0.1 -p 8600 web.service.consul
...

;; QUESTION SECTION:
;web.service.consul. IN A
```

## Next Steps

In this section, you learned how to add health checks. Check definitions
can be updated by changing configuration files and sending a `SIGHUP` to the agent.
Alternatively, the HTTP API can be used to add, remove, and modify checks dynamically.
The API also allows for a "dead man's switch", a
[TTL-based check](/docs/agent/checks#TTL). TTL checks can be used to integrate an
application more tightly with Consul, enabling business logic to be evaluated as part
of assessing the state of the check.

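
As a rough illustration of the TTL approach, the sketch below builds the two HTTP requests an application might send: one to register a TTL check and one to report that the check is passing. It assumes the agent endpoints `/v1/agent/check/register` and `/v1/agent/check/pass/<check_id>`, so verify the paths and payload fields against the HTTP API documentation for your Consul version. The check ID `app-heartbeat`, the helper names, and the agent address are hypothetical.

```python
import json
import urllib.request

CONSUL_ADDR = "http://localhost:8500"  # assumed local agent address


def register_ttl_check_request(check_id: str, name: str, ttl: str = "30s",
                               base_url: str = CONSUL_ADDR) -> urllib.request.Request:
    """Build (but do not send) a request that registers a TTL check."""
    payload = {"ID": check_id, "Name": name, "TTL": ttl}
    return urllib.request.Request(
        f"{base_url}/v1/agent/check/register",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def pass_ttl_check_request(check_id: str,
                           base_url: str = CONSUL_ADDR) -> urllib.request.Request:
    """Build a request that marks the TTL check as passing (the heartbeat)."""
    return urllib.request.Request(
        f"{base_url}/v1/agent/check/pass/{check_id}", data=b"", method="PUT"
    )


def send(request: urllib.request.Request) -> int:
    """Send a prepared request to the agent and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


if __name__ == "__main__":
    # Requires a running agent: register once, then heartbeat within each TTL.
    send(register_ttl_check_request("app-heartbeat", "Application heartbeat"))
    send(pass_ttl_check_request("app-heartbeat"))
```

The application would call the heartbeat request only after its own business logic succeeds. If the heartbeats stop, the TTL expires and the check is marked as failing.
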
Next, we will explore [Consul's K/V store](/intro/getting-started/kv).