From af786b0ab5bb6ddea504032e09213e2b46e2bc7c Mon Sep 17 00:00:00 2001
From: Mitchell Hashimoto
Date: Mon, 14 Apr 2014 14:00:53 -0700
Subject: [PATCH] website: health checks page

---
 .../getting-started/checks.html.markdown      | 77 ++++++++++---------
 .../intro/getting-started/join.html.markdown  |  1 -
 2 files changed, 42 insertions(+), 36 deletions(-)

diff --git a/website/source/intro/getting-started/checks.html.markdown b/website/source/intro/getting-started/checks.html.markdown
index fcc1eebaed..439becb119 100644
--- a/website/source/intro/getting-started/checks.html.markdown
+++ b/website/source/intro/getting-started/checks.html.markdown
@@ -4,33 +4,47 @@ page_title: "Registering Health Checks"
 sidebar_current: "gettingstarted-checks"
 ---
 
-# Registering Health Checks
+# Health Checks
 
-We've already seen how simple registering a service is. In this section we will
-continue by adding both a service level health check, as well as a host level
-health check.
+We've now seen how simple it is to run Consul, add nodes and services, and
+query those nodes and services. In this section we will continue by adding
+health checks to both nodes and services, a critical component of service
+discovery that prevents using services that are unhealthy.
+
+This page will build upon the previous page and assumes you have a
+two node cluster running.
 
 ## Defining Checks
 
 Similarly to a service, a check can be registered either by providing a
-[check definition](/docs/agent/checks.html), or by making the appropriate calls to the
-[HTTP API](/docs/agent/http.html). We will use a simple check definition to get started.
-On the second node, we start by adding some additional configuration:
+[check definition](/docs/agent/checks.html)
+, or by making the appropriate calls to the
+[HTTP API](/docs/agent/http.html).
+
+We will use the check definition, because just like services, definitions
+are the most common way to setup checks.
+
+Create two definition files in the Consul configuration directory of
+the second node.
+The first file will add a host-level check, and the second will modify the web
+service definition to add a service-level check.
 
 ```
-$ echo '{"check": {"name": "ping", "script": "ping -c1 google.com >/dev/null", "interval": "30s"}}' | sudo tee /etc/consul/ping.json
+$ echo '{"check": {"name": "ping", "script": "ping -c1 google.com >/dev/null", "interval": "30s"}}' >/etc/consul.d/ping.json
 
 $ echo '{"service": {"name": "web", "tags": ["rails"], "port": 80,
- "check": {"script": "curl localhost:80 >/dev/null 2>&1", "interval": "10s"}}}' | sudo tee /etc/consul/web.json
+ "check": {"script": "curl localhost:80 >/dev/null 2>&1", "interval": "10s"}}}' >/etc/consul.d/web.json
 ```
 
-The first command adds a "ping" check. This check runs on a 30 second interval, invoking
-the "ping -c1 google.com" command. The second command is modifying our previous definition of
-the `web` service to include a check. This check uses curl every 10 seconds to verify that
-our web server is running.
+The first definition adds a host-level check named "ping". This check runs
+on a 30 second interval, invoking `ping -c1 google.com`. If the command
+exits with a non-zero exit code, then the node will be flagged unhealthy.
 
-We now restart the second agent, with the same parameters as before. We should now see the following
-log lines:
+The second command modifies the web service and adds a check that uses
+curl every 10 seconds to verify that the web server is running.
+
+Restart the second agent, or send a `SIGHUP` to it. We should now see the
+following log lines:
 
 ```
 ==> Starting Consul agent...
@@ -41,43 +55,36 @@ log lines:
 [WARN] Check 'service:web' is now critical
 ```
 
-The first few log lines indicate that the agent has synced the new checks and service updates
-with the Consul servers. The last line indicates that the check we added for the `web` service
-is critical. This is because we are not actually running a web server and the curl test
-we've added is failing!
+The first few log lines indicate that the agent has synced the new
+definitions. The last line indicates that the check we added for
+the `web` service is critical. This is because we're not actually running
+a web server and the curl test is failing!
 
 ## Checking Health Status
 
-Now that we've added some simple checks, we can use the HTTP API to check them. First,
-we can look for any failing checks:
+Now that we've added some simple checks, we can use the HTTP API to check
+them. First, we can look for any failing checks. You can run this curl
+on either node:
 
 ```
 $ curl http://localhost:8500/v1/health/state/critical
 [{"Node":"agent-two","CheckID":"service:web","Name":"Service 'web' check","Status":"critical","Notes":"","ServiceID":"web","ServiceName":"web"}]
 ```
 
-We can see that there is only a single check in the `critical` state, which is our
-`web` service check. If we try to perform a DNS lookup for the service, we will see that
-we don't get any results:
+We can see that there is only a single check in the `critical` state, which is
+our `web` service check.
+
+Additionally, we can attempt to query the web service using DNS. Consul
+will not return any results, since the service is unhealthy:
 
 ```
 dig @127.0.0.1 -p 8600 web.service.consul
-
-; <<>> DiG 9.8.1-P1 <<>> @127.0.0.1 -p 8600 web.service.consul
-; (1 server found)
-;; global options: +cmd
-;; Got answer:
-;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 35753
-;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
-;; WARNING: recursion requested but not available
+...
 
 ;; QUESTION SECTION:
 ;web.service.consul. IN A
 ```
 
-The DNS interface uses the health information and avoids routing to nodes that
-are failing their health checks. This is all managed for us automatically.
-
 This section should have shown that checks can be easily added. Check definitions
 can be updated by changing configuration files and sending a `SIGHUP` to the agent.
 Alternatively the HTTP API can be used to add, remove and modify checks dynamically.
diff --git a/website/source/intro/getting-started/join.html.markdown b/website/source/intro/getting-started/join.html.markdown
index a27f186eda..8df742852f 100644
--- a/website/source/intro/getting-started/join.html.markdown
+++ b/website/source/intro/getting-started/join.html.markdown
@@ -119,4 +119,3 @@ To leave the cluster, you can either gracefully quit an agent (using
 the node to transition into the _left_ state, otherwise other nodes
 will detect it as having _failed_. The difference is covered
 in more detail [here](/intro/getting-started/agent.html#toc_3).
-