mirror of https://github.com/status-im/consul.git
website: health checks page
parent 11d09948cb
commit af786b0ab5
@ -4,33 +4,47 @@ page_title: "Registering Health Checks"
 sidebar_current: "gettingstarted-checks"
 ---

-# Registering Health Checks
+# Health Checks

-We've already seen how simple registering a service is. In this section we will
-continue by adding both a service level health check, as well as a host level
-health check.
+We've now seen how simple it is to run Consul, add nodes and services, and
+query those nodes and services. In this section we will continue by adding
+health checks to both nodes and services, a critical component of service
+discovery that prevents using services that are unhealthy.
+
+This page will build upon the previous page and assumes you have a
+two node cluster running.

 ## Defining Checks

 Similarly to a service, a check can be registered either by providing a
-[check definition](/docs/agent/checks.html), or by making the appropriate calls to the
-[HTTP API](/docs/agent/http.html). We will use a simple check definition to get started.
-On the second node, we start by adding some additional configuration:
+[check definition](/docs/agent/checks.html)
+, or by making the appropriate calls to the
+[HTTP API](/docs/agent/http.html).
+
+We will use the check definition, because just like services, definitions
+are the most common way to setup checks.
+
+Create two definition files in the Consul configuration directory of
+the second node.
+The first file will add a host-level check, and the second will modify the web
+service definition to add a service-level check.

 ```
-$ echo '{"check": {"name": "ping", "script": "ping -c1 google.com >/dev/null", "interval": "30s"}}' | sudo tee /etc/consul/ping.json
+$ echo '{"check": {"name": "ping", "script": "ping -c1 google.com >/dev/null", "interval": "30s"}}' >/etc/consul.d/ping.json

 $ echo '{"service": {"name": "web", "tags": ["rails"], "port": 80,
-"check": {"script": "curl localhost:80 >/dev/null 2>&1", "interval": "10s"}}}' | sudo tee /etc/consul/web.json
+"check": {"script": "curl localhost:80 >/dev/null 2>&1", "interval": "10s"}}}' >/etc/consul.d/web.json
 ```

-The first command adds a "ping" check. This check runs on a 30 second interval, invoking
-the "ping -c1 google.com" command. The second command is modifying our previous definition of
-the `web` service to include a check. This check uses curl every 10 seconds to verify that
-our web server is running.
+The first definition adds a host-level check named "ping". This check runs
+on a 30 second interval, invoking `ping -c1 google.com`. If the command
+exits with a non-zero exit code, then the node will be flagged unhealthy.
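
The exit-code rule described above can be sketched in a few lines of shell. This is a minimal illustration, not Consul's implementation: `check_status` and `run_check` are hypothetical helper names, and the mapping of exit code 1 to `warning` is an assumption taken from the convention in Consul's check definition documentation, not from this diff.

```shell
#!/bin/sh
# Hypothetical helper: map a check script's exit code to a health status.
# Assumed convention: 0 = passing, 1 = warning, anything else = critical.
check_status() {
  case "$1" in
    0) echo "passing" ;;
    1) echo "warning" ;;
    *) echo "critical" ;;
  esac
}

# Hypothetical helper: run a command the way a script check would
# (through a shell, output discarded) and print the resulting status.
run_check() {
  rc=0
  sh -c "$1" >/dev/null 2>&1 || rc=$?
  check_status "$rc"
}

run_check "true"      # a command that succeeds
run_check "exit 2"    # a command that fails with exit code 2
```

Run as written, the two calls print `passing` and `critical`. Passing the `ping` or `curl` command from the definitions to `run_check` is a quick way to preview the status a check would likely report before registering it.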

-We now restart the second agent, with the same parameters as before. We should now see the following
-log lines:
+The second command modifies the web service and adds a check that uses
+curl every 10 seconds to verify that the web server is running.
+
+Restart the second agent, or send a `SIGHUP` to it. We should now see the
+following log lines:

 ```
 ==> Starting Consul agent...
@ -41,43 +55,36 @@ log lines:
 [WARN] Check 'service:web' is now critical
 ```

-The first few log lines indicate that the agent has synced the new checks and service updates
-with the Consul servers. The last line indicates that the check we added for the `web` service
-is critical. This is because we are not actually running a web server and the curl test
-we've added is failing!
+The first few log lines indicate that the agent has synced the new
+definitions. The last line indicates that the check we added for
+the `web` service is critical. This is because we're not actually running
+a web server and the curl test is failing!

 ## Checking Health Status

-Now that we've added some simple checks, we can use the HTTP API to check them. First,
-we can look for any failing checks:
+Now that we've added some simple checks, we can use the HTTP API to check
+them. First, we can look for any failing checks. You can run this curl
+on either node:

 ```
 $ curl http://localhost:8500/v1/health/state/critical
 [{"Node":"agent-two","CheckID":"service:web","Name":"Service 'web' check","Status":"critical","Notes":"","ServiceID":"web","ServiceName":"web"}]
 ```

-We can see that there is only a single check in the `critical` state, which is our
-`web` service check. If we try to perform a DNS lookup for the service, we will see that
-we don't get any results:
+We can see that there is only a single check in the `critical` state, which is
+our `web` service check.
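
For scripting, the response above can be reduced to just the failing check IDs. The sketch below works on a copy of the sample response, not a live agent. `critical_ids` is a hypothetical helper, and the `grep -o` / `cut` parsing assumes the compact single-line JSON shown above; a real JSON parser would be more robust.

```shell
#!/bin/sh
# Sample response copied from the output above; against a live agent this
# would come from: curl -s http://localhost:8500/v1/health/state/critical
response=$(cat <<'EOF'
[{"Node":"agent-two","CheckID":"service:web","Name":"Service 'web' check","Status":"critical","Notes":"","ServiceID":"web","ServiceName":"web"}]
EOF
)

# Hypothetical helper: read a health response on stdin and print one
# CheckID per line. Relies on the compact formatting of the sample.
critical_ids() {
  grep -o '"CheckID":"[^"]*"' | cut -d '"' -f 4
}

ids=$(printf '%s\n' "$response" | critical_ids)
if [ -n "$ids" ]; then
  echo "critical checks: $ids"
else
  echo "no critical checks"
fi
```

With the sample response this prints `critical checks: service:web`. An empty list (`[]`) takes the other branch.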
+
+Additionally, we can attempt to query the web service using DNS. Consul
+will not return any results, since the service is unhealthy:

 ```
 dig @127.0.0.1 -p 8600 web.service.consul
-
-; <<>> DiG 9.8.1-P1 <<>> @127.0.0.1 -p 8600 web.service.consul
-; (1 server found)
-;; global options: +cmd
-;; Got answer:
-;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 35753
-;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
-;; WARNING: recursion requested but not available
+...

 ;; QUESTION SECTION:
 ;web.service.consul. IN A
 ```

-The DNS interface uses the health information and avoids routing to nodes that
-are failing their health checks. This is all managed for us automatically.
-
 This section should have shown that checks can be easily added. Check definitions
 can be updated by changing configuration files and sending a `SIGHUP` to the agent.
 Alternatively the HTTP API can be used to add, remove and modify checks dynamically.
@ -119,4 +119,3 @@ To leave the cluster, you can either gracefully quit an agent (using
 the node to transition into the _left_ state, otherwise other nodes
 will detect it as having _failed_. The difference is covered
 in more detail [here](/intro/getting-started/agent.html#toc_3).
-