website: health checks page

Mitchell Hashimoto 2014-04-14 14:00:53 -07:00
parent 11d09948cb
commit af786b0ab5
2 changed files with 42 additions and 36 deletions

View File

@ -4,33 +4,47 @@ page_title: "Registering Health Checks"
sidebar_current: "gettingstarted-checks" sidebar_current: "gettingstarted-checks"
--- ---
# Registering Health Checks # Health Checks
We've already seen how simple registering a service is. In this section we will We've now seen how simple it is to run Consul, add nodes and services, and
continue by adding both a service level health check, as well as a host level query those nodes and services. In this section we will continue by adding
health check. health checks to both nodes and services, a critical component of service
discovery that prevents using services that are unhealthy.
This page will build upon the previous page and assumes you have a
two node cluster running.
## Defining Checks ## Defining Checks

Similarly to a service, a check can be registered either by providing a
[check definition](/docs/agent/checks.html), or by making the appropriate
calls to the [HTTP API](/docs/agent/http.html). We will use check
definitions, because just like services, definitions are the most common
way to set up checks.

Create two definition files in the Consul configuration directory of
the second node. The first file will add a host-level check, and the
second will modify the web service definition to add a service-level check:

```
$ echo '{"check": {"name": "ping", "script": "ping -c1 google.com >/dev/null", "interval": "30s"}}' >/etc/consul.d/ping.json
$ echo '{"service": {"name": "web", "tags": ["rails"], "port": 80,
  "check": {"script": "curl localhost:80 >/dev/null 2>&1", "interval": "10s"}}}' >/etc/consul.d/web.json
```

The first definition adds a host-level check named "ping". This check runs
on a 30 second interval, invoking `ping -c1 google.com`. If the command
exits with a non-zero exit code, the node will be flagged unhealthy.

The second definition modifies the `web` service, adding a check that uses
curl every 10 seconds to verify that the web server is running.
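
As an aside, the same "ping" check could instead be registered dynamically
through the [HTTP API](/docs/agent/http.html) mentioned above. The following
is only a sketch: it assumes the agent's check-register endpoint and the
capitalized payload fields used by Consul at the time, so consult the API
documentation for the exact shape your version expects.

```
# register the host-level ping check against the local agent (sketch; fields may vary by version)
$ curl -X PUT http://localhost:8500/v1/agent/check/register \
    -d '{"Name": "ping", "Script": "ping -c1 google.com >/dev/null", "Interval": "30s"}'
```
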
Restart the second agent, or send a `SIGHUP` to it. We should now see the
following log lines:

```
==> Starting Consul agent...
@ -41,43 +55,36 @@ log lines:
[WARN] Check 'service:web' is now critical
```

The first few log lines indicate that the agent has synced the new
definitions. The last line indicates that the check we added for
the `web` service is critical. This is because we're not actually running
a web server and the curl test is failing!
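
As a quick sanity check (an aside, not a step from the original page), the
checks the local agent has registered can also be listed directly from its
HTTP API:

```
# list checks registered with the local agent; returns a JSON map keyed by check ID
$ curl http://localhost:8500/v1/agent/checks
```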

## Checking Health Status

Now that we've added some simple checks, we can use the HTTP API to check
them. First, we can look for any failing checks. You can run this curl
on either node:

```
$ curl http://localhost:8500/v1/health/state/critical
[{"Node":"agent-two","CheckID":"service:web","Name":"Service 'web' check","Status":"critical","Notes":"","ServiceID":"web","ServiceName":"web"}]
```

We can see that there is only a single check in the `critical` state, which is
our `web` service check.
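
The health endpoints can also be scoped to a single service rather than a
state. As a sketch (these queries are an aside, not part of the original
walkthrough):

```
# checks associated with the "web" service
$ curl http://localhost:8500/v1/health/checks/web

# instances of "web" passing all of their checks (should be empty while the check is critical)
$ curl 'http://localhost:8500/v1/health/service/web?passing'
```
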
Additionally, we can attempt to query the web service using DNS. Consul
will not return any results, since the service is unhealthy:

```
dig @127.0.0.1 -p 8600 web.service.consul
...
;; QUESTION SECTION:
;web.service.consul.		IN	A
```

The DNS interface uses the health information and avoids routing to nodes that
are failing their health checks. This is all managed for us automatically.
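
As an illustration (not a step from the original guide), if something were
actually listening on port 80 on the second node, the `web` check would flip
back to passing within its 10 second interval, and the same `dig` query would
start returning results:

```
# any process listening on port 80 works; Python 2's built-in server is one option
$ sudo python -m SimpleHTTPServer 80 &

# after the next check interval the service should resolve again
$ dig @127.0.0.1 -p 8600 web.service.consul
```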

This section should have shown that checks can be easily added. Check definitions
can be updated by changing configuration files and sending a `SIGHUP` to the agent.
Alternatively, the HTTP API can be used to add, remove, and modify checks dynamically.
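
For example, the "ping" check could be removed again without touching any
configuration files. This is a sketch that assumes the check ID defaults to
the check's name, and newer Consul versions may require a `PUT` for this
endpoint:

```
# deregister the "ping" check from the local agent
$ curl http://localhost:8500/v1/agent/check/deregister/ping
```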

View File

@ -119,4 +119,3 @@ To leave the cluster, you can either gracefully quit an agent (using
the node to transition into the _left_ state, otherwise other nodes
will detect it as having _failed_. The difference is covered
in more detail [here](/intro/getting-started/agent.html#toc_3).