mirror of https://github.com/status-im/consul.git
website: health checks page
parent
11d09948cb
commit
af786b0ab5
|
@ -4,33 +4,47 @@ page_title: "Registering Health Checks"
|
||||||
sidebar_current: "gettingstarted-checks"
|
sidebar_current: "gettingstarted-checks"
|
||||||
---
|
---
|
||||||
|
|
||||||
# Registering Health Checks
|
# Health Checks
|
||||||
|
|
||||||
We've already seen how simple registering a service is. In this section we will
|
We've now seen how simple it is to run Consul, add nodes and services, and
|
||||||
continue by adding both a service level health check, as well as a host level
|
query those nodes and services. In this section we will continue by adding
|
||||||
health check.
|
health checks to both nodes and services, a critical component of service
|
||||||
|
discovery that prevents using services that are unhealthy.
|
||||||
|
|
||||||
|
This page will build upon the previous page and assumes you have a
|
||||||
|
two node cluster running.
|
||||||
|
|
||||||
## Defining Checks
|
## Defining Checks
|
||||||
|
|
||||||
Similarly to a service, a check can be registered either by providing a
|
Similarly to a service, a check can be registered either by providing a
|
||||||
[check definition](/docs/agent/checks.html), or by making the appropriate calls to the
|
[check definition](/docs/agent/checks.html)
|
||||||
[HTTP API](/docs/agent/http.html). We will use a simple check definition to get started.
|
, or by making the appropriate calls to the
|
||||||
On the second node, we start by adding some additional configuration:
|
[HTTP API](/docs/agent/http.html).
|
||||||
|
|
||||||
|
We will use the check definition, because just like services, definitions
|
||||||
|
are the most common way to set up checks.
|
||||||
|
|
||||||
|
Create two definition files in the Consul configuration directory of
|
||||||
|
the second node.
|
||||||
|
The first file will add a host-level check, and the second will modify the web
|
||||||
|
service definition to add a service-level check.
|
||||||
|
|
||||||
```
|
```
|
||||||
$ echo '{"check": {"name": "ping", "script": "ping -c1 google.com >/dev/null", "interval": "30s"}}' | sudo tee /etc/consul/ping.json
|
$ echo '{"check": {"name": "ping", "script": "ping -c1 google.com >/dev/null", "interval": "30s"}}' >/etc/consul.d/ping.json
|
||||||
|
|
||||||
$ echo '{"service": {"name": "web", "tags": ["rails"], "port": 80,
|
$ echo '{"service": {"name": "web", "tags": ["rails"], "port": 80,
|
||||||
"check": {"script": "curl localhost:80 >/dev/null 2>&1", "interval": "10s"}}}' | sudo tee /etc/consul/web.json
|
"check": {"script": "curl localhost:80 >/dev/null 2>&1", "interval": "10s"}}}' >/etc/consul.d/web.json
|
||||||
```
|
```
|
||||||
|
|
||||||
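The two one-line `echo` commands above are easier to read when the JSON is laid out explicitly. The following is a minimal sketch that builds the same two definitions as Python dictionaries and prints them. The helper name `build_definitions` is hypothetical, and the sketch deliberately writes nothing to disk, since the configuration directory differs between the old and new versions of the page.

```python
import json


def build_definitions() -> dict:
    """Return the two definitions from the page, keyed by file name."""
    # Host-level check: runs the ping command every 30 seconds.
    ping = {
        "check": {
            "name": "ping",
            "script": "ping -c1 google.com >/dev/null",
            "interval": "30s",
        }
    }
    # Service definition for "web" with a service-level curl check every 10 seconds.
    web = {
        "service": {
            "name": "web",
            "tags": ["rails"],
            "port": 80,
            "check": {
                "script": "curl localhost:80 >/dev/null 2>&1",
                "interval": "10s",
            },
        }
    }
    return {
        "ping.json": json.dumps(ping, indent=2),
        "web.json": json.dumps(web, indent=2),
    }


if __name__ == "__main__":
    for filename, body in build_definitions().items():
        print(f"# {filename}")
        print(body)
```

Serializing through `json.dumps` also guards against the quoting mistakes that are easy to make in a hand-written shell one-liner.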
The first command adds a "ping" check. This check runs on a 30 second interval, invoking
|
The first definition adds a host-level check named "ping". This check runs
|
||||||
the "ping -c1 google.com" command. The second command is modifying our previous definition of
|
on a 30 second interval, invoking `ping -c1 google.com`. If the command
|
||||||
the `web` service to include a check. This check uses curl every 10 seconds to verify that
|
exits with a non-zero exit code, then the node will be flagged unhealthy.
|
||||||
our web server is running.
|
|
||||||
|
|
||||||
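To make the exit-code rule concrete, here is a rough sketch of how a script check's result could be turned into a status. It is not Consul's implementation. The function names are hypothetical, and the three-way mapping (0 is passing, 1 is warning, anything else is critical) is an assumption taken from the convention in Consul's check documentation; the page itself only says that a non-zero exit code marks the node unhealthy.

```python
import subprocess
import sys
from typing import Sequence


def status_from_exit_code(code: int) -> str:
    """Map a script's exit code to a check status (assumed convention)."""
    if code == 0:
        return "passing"
    if code == 1:
        return "warning"
    # Any other code, including a negative one from a signal, counts as failing.
    return "critical"


def run_script_check(argv: Sequence[str]) -> str:
    """Run a command, discard its output, and report the resulting status."""
    completed = subprocess.run(
        list(argv),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return status_from_exit_code(completed.returncode)


if __name__ == "__main__":
    # Stand-in for a failing check script: a child Python process exiting with 2.
    print(run_script_check([sys.executable, "-c", "raise SystemExit(2)"]))
```

With the `ping` definition, a failed lookup or an unreachable host makes `ping` exit non-zero, which is what flags the node.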
We now restart the second agent, with the same parameters as before. We should now see the following
|
The second command modifies the web service and adds a check that uses
|
||||||
log lines:
|
curl every 10 seconds to verify that the web server is running.
|
||||||
|
|
||||||
|
Restart the second agent, or send a `SIGHUP` to it. We should now see the
|
||||||
|
following log lines:
|
||||||
|
|
||||||
```
|
```
|
||||||
==> Starting Consul agent...
|
==> Starting Consul agent...
|
||||||
|
@ -41,43 +55,36 @@ log lines:
|
||||||
[WARN] Check 'service:web' is now critical
|
[WARN] Check 'service:web' is now critical
|
||||||
```
|
```
|
||||||
|
|
||||||
The first few log lines indicate that the agent has synced the new checks and service updates
|
The first few log lines indicate that the agent has synced the new
|
||||||
with the Consul servers. The last line indicates that the check we added for the `web` service
|
definitions. The last line indicates that the check we added for
|
||||||
is critical. This is because we are not actually running a web server and the curl test
|
the `web` service is critical. This is because we're not actually running
|
||||||
we've added is failing!
|
a web server and the curl test is failing!
|
||||||
|
|
||||||
## Checking Health Status
|
## Checking Health Status
|
||||||
|
|
||||||
Now that we've added some simple checks, we can use the HTTP API to check them. First,
|
Now that we've added some simple checks, we can use the HTTP API to check
|
||||||
we can look for any failing checks:
|
them. First, we can look for any failing checks. You can run this curl
|
||||||
|
on either node:
|
||||||
|
|
||||||
```
|
```
|
||||||
$ curl http://localhost:8500/v1/health/state/critical
|
$ curl http://localhost:8500/v1/health/state/critical
|
||||||
[{"Node":"agent-two","CheckID":"service:web","Name":"Service 'web' check","Status":"critical","Notes":"","ServiceID":"web","ServiceName":"web"}]
|
[{"Node":"agent-two","CheckID":"service:web","Name":"Service 'web' check","Status":"critical","Notes":"","ServiceID":"web","ServiceName":"web"}]
|
||||||
```
|
```
|
||||||
|
|
||||||
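The JSON returned by `/v1/health/state/critical` can be consumed programmatically as well as read by eye. The sketch below parses the sample response shown above and lists the services that have a critical check. The helper name `failing_services` is hypothetical, and treating entries with an empty `ServiceName` as node-level checks is an assumption rather than something the page states.

```python
import json

# Sample body copied from the curl output on this page.
SAMPLE_RESPONSE = """[{"Node":"agent-two","CheckID":"service:web","Name":"Service 'web' check","Status":"critical","Notes":"","ServiceID":"web","ServiceName":"web"}]"""


def failing_services(body: str) -> list:
    """Return the sorted, de-duplicated service names found in a health response."""
    checks = json.loads(body)
    # Skip entries without a service name (assumed to be node-level checks).
    names = {check["ServiceName"] for check in checks if check.get("ServiceName")}
    return sorted(names)


if __name__ == "__main__":
    print(failing_services(SAMPLE_RESPONSE))
```

In practice the body would come from an HTTP request to the agent on port 8500 rather than from a hard-coded string.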
We can see that there is only a single check in the `critical` state, which is our
|
We can see that there is only a single check in the `critical` state, which is
|
||||||
`web` service check. If we try to perform a DNS lookup for the service, we will see that
|
our `web` service check.
|
||||||
we don't get any results:
|
|
||||||
|
Additionally, we can attempt to query the web service using DNS. Consul
|
||||||
|
will not return any results, since the service is unhealthy:
|
||||||
|
|
||||||
```
|
```
|
||||||
dig @127.0.0.1 -p 8600 web.service.consul
|
dig @127.0.0.1 -p 8600 web.service.consul
|
||||||
|
...
|
||||||
; <<>> DiG 9.8.1-P1 <<>> @127.0.0.1 -p 8600 web.service.consul
|
|
||||||
; (1 server found)
|
|
||||||
;; global options: +cmd
|
|
||||||
;; Got answer:
|
|
||||||
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 35753
|
|
||||||
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
|
|
||||||
;; WARNING: recursion requested but not available
|
|
||||||
|
|
||||||
;; QUESTION SECTION:
|
;; QUESTION SECTION:
|
||||||
;web.service.consul. IN A
|
;web.service.consul. IN A
|
||||||
```
|
```
|
||||||
|
|
||||||
The DNS interface uses the health information and avoids routing to nodes that
|
|
||||||
are failing their health checks. This is all managed for us automatically.
|
|
||||||
|
|
||||||
This section should have shown that checks can be easily added. Check definitions
|
This section should have shown that checks can be easily added. Check definitions
|
||||||
can be updated by changing configuration files and sending a `SIGHUP` to the agent.
|
can be updated by changing configuration files and sending a `SIGHUP` to the agent.
|
||||||
Alternatively the HTTP API can be used to add, remove and modify checks dynamically.
|
Alternatively the HTTP API can be used to add, remove and modify checks dynamically.
|
||||||
|
|
|
@ -119,4 +119,3 @@ To leave the cluster, you can either gracefully quit an agent (using
|
||||||
the node to transition into the _left_ state, otherwise other nodes
|
the node to transition into the _left_ state, otherwise other nodes
|
||||||
will detect it as having _failed_. The difference is covered
|
will detect it as having _failed_. The difference is covered
|
||||||
in more detail [here](/intro/getting-started/agent.html#toc_3).
|
in more detail [here](/intro/getting-started/agent.html#toc_3).
|
||||||
|
|
||||||
|
|