Updates to DNS Caching Guide (#5001)

* Updates to DNS Caching Guide

* Spelling and grammar
kaitlincarter-hc 2018-11-29 08:08:44 -08:00 committed by GitHub
parent c1eccfd1db
commit 7a6ebd419f

@ -16,47 +16,89 @@ By default, Consul serves all DNS results with a 0 TTL value. This prevents
any caching. The advantage is that each DNS lookup is always re-evaluated,
so the most timely information is served. However, this adds a latency hit
for each lookup and can potentially exhaust the query throughput of a cluster.
For this reason, Consul provides a number of tuning parameters that can
customize how DNS queries are handled.
In this guide, we will review important parameters for tuning
stale reads, negative response caching, and TTL. All of the DNS config
parameters must be set in the agent's configuration file.
<a name="stale"></a>
## Stale Reads
Stale reads can be used to reduce latency and increase the throughput
of DNS queries. The [settings](/docs/agent/options.html) used to control stale reads
are:
* [`dns_config.allow_stale`](/docs/agent/options.html#allow_stale) must be
set to true to enable stale reads.
* [`dns_config.max_stale`](/docs/agent/options.html#max_stale) limits how stale results
are allowed to be when querying DNS.
With these two settings you can allow or prevent stale reads. Below we will discuss
the advantages and disadvantages of both.
### Allow Stale Reads
Since Consul 0.7.1, `allow_stale` is enabled by default and uses a `max_stale`
value that defaults to a near-indefinite threshold (10 years).
This allows DNS queries to continue to be served in the event
of a long outage with no leader. A new telemetry counter has also been added at
`consul.dns.stale_queries` to track when agents serve DNS queries that are stale
by more than 5 seconds.
```javascript
{
  "dns_config": {
    "allow_stale": true,
    "max_stale": "87600h"
  }
}
```
~> NOTE: The above example is the default setting. You do not need to set it explicitly.
Doing a stale read allows any Consul server to
service a query, but non-leader nodes may return data that is
out-of-date. By allowing data to be slightly stale, we get horizontal
read scalability. Now any Consul server can service the request, so we
increase throughput by the number of servers in a cluster.
### Prevent Stale Reads
If you want to prevent stale reads or limit how stale they can be, you can set `allow_stale`
to false or use a lower value for `max_stale`. Doing the first will ensure that
all reads are serviced by a [single leader node](/docs/internals/consensus.html).
The reads will then be strongly consistent but will be limited by the throughput
of a single node.
```javascript
{
  "dns_config": {
    "allow_stale": false
  }
}
```
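The trade-off between the two modes can be sketched in a few lines of Python. This is a simplified model, not Consul's implementation: the function name `route_dns_query` and the `server_lag` parameter are hypothetical, and the rule that a result staler than `max_stale` is re-evaluated on the leader is an assumption based on the description of `max_stale` in the agent options. The 5 second threshold is the one quoted above for `consul.dns.stale_queries`.

```python
from datetime import timedelta

# Threshold quoted in the guide for the consul.dns.stale_queries counter.
STALE_METRIC_THRESHOLD = timedelta(seconds=5)


def route_dns_query(allow_stale, max_stale, server_lag):
    """Model who answers a DNS query and whether the stale counter would tick.

    server_lag is how far the answering server is behind the leader
    (a hypothetical input for this sketch).
    Returns a tuple (answered_by, counted_as_stale).
    """
    if not allow_stale:
        # Stale reads disabled: every query goes to the single leader.
        return "leader", False
    if server_lag > max_stale:
        # Assumption: results staler than max_stale are re-evaluated on the leader.
        return "leader", False
    # Any server may answer; the counter tracks answers stale by more than 5 seconds.
    return "any-server", server_lag > STALE_METRIC_THRESHOLD


default_max_stale = timedelta(hours=87600)  # the 10 year default
print(route_dns_query(True, default_max_stale, timedelta(seconds=2)))
print(route_dns_query(True, timedelta(seconds=10), timedelta(seconds=30)))
print(route_dns_query(False, default_max_stale, timedelta(seconds=30)))
```

With the default settings almost every query can be answered by any server, which is where the horizontal read scalability comes from. Lowering `max_stale` or disabling `allow_stale` moves more of the load back to the leader.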
## Negative Response Caching
Some DNS clients cache negative responses - that is, Consul returning a "not
found" style response because a service exists but there are no healthy
endpoints. In practice, this means that cached negative responses may
cause services to appear "down" for longer than they are actually unavailable
when using DNS for service discovery.
### Configure SOA
In Consul 1.3.0 and newer, it is now possible to tune SOA
responses and modify the negative TTL cache for some resolvers. It can
be achieved using the [`soa.min_ttl`](/docs/agent/options.html#soa_min_ttl)
configuration within the [`soa`](/docs/agent/options.html#soa) configuration.
```javascript
{
  "dns_config": {
    "soa": {
      "min_ttl": 60
    }
  }
}
```
One common example is that Windows will default to caching negative responses
for 15 minutes. DNS forwarders may also cache negative responses, with the same
effect. To avoid this problem, check the negative response cache defaults for
@ -65,11 +107,6 @@ client and Consul and set the cache values appropriately. In many cases
"appropriately" simply means turning negative response caching off to get the best
recovery time when a service becomes available again.
With versions of Consul greater than 1.3.0, it is now possible to tune SOA
responses and modify the negative TTL cache for some resolvers. It can
be achieved using the [`soa.min_ttl`](/docs/agent/options.html#soa_min_ttl)
configuration within the [`soa`](/docs/agent/options.html#soa) configuration.
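To see why the SOA minimum TTL matters, here is a rough Python sketch of how a resolver that follows RFC 2308 picks its negative cache lifetime: it uses the smaller of the SOA record's own TTL and the SOA MINIMUM field. The function names and the example numbers are illustrative assumptions, and real clients may apply their own caps or defaults on top of this.

```python
def negative_cache_seconds(soa_record_ttl, soa_minimum):
    """RFC 2308: a negative answer is cached for min(SOA TTL, SOA MINIMUM)."""
    return min(soa_record_ttl, soa_minimum)


def extra_downtime_seconds(negative_ttl, cached_for_before_recovery):
    """How long a client keeps seeing "not found" after the service is healthy again.

    cached_for_before_recovery is how long the negative answer had already
    been cached when the service recovered (hypothetical input).
    """
    return max(0, negative_ttl - cached_for_before_recovery)


# A 15 minute negative cache versus a 60 second one, with the service
# recovering 30 seconds after the negative answer was cached.
print(extra_downtime_seconds(900, 30))
print(extra_downtime_seconds(negative_cache_seconds(3600, 60), 30))
print(extra_downtime_seconds(negative_cache_seconds(3600, 0), 30))
```

A negative TTL of zero removes the extra downtime entirely, at the cost of every failed lookup reaching Consul again.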
<a name="ttl"></a>
## TTL Values
@ -78,6 +115,17 @@ TTL values reduce the number of lookups on the Consul servers and speed lookups
clients, at the cost of increasingly stale results. By default, all TTLs are zero,
preventing any caching.
```javascript
{
  "dns_config": {
    "service_ttl": {
      "*": "0s"
    },
    "node_ttl": "0s"
  }
}
```
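The effect of a TTL on lookup volume can be illustrated with a small client-side cache in Python. This is a generic sketch of TTL caching, not code from Consul or any particular resolver. The `TtlCache` class, the `resolve` callback, and the injected clock are all hypothetical names introduced for the example.

```python
import time


class TtlCache:
    """Cache lookup results for as long as the TTL returned with them."""

    def __init__(self, resolve, clock=time.monotonic):
        self._resolve = resolve  # callable: name -> (address, ttl_seconds)
        self._clock = clock      # injectable so the example can control time
        self._entries = {}       # name -> (address, expires_at)
        self.upstream_lookups = 0

    def lookup(self, name):
        now = self._clock()
        hit = self._entries.get(name)
        if hit is not None and now < hit[1]:
            return hit[0]  # served from cache, possibly stale
        address, ttl = self._resolve(name)
        self.upstream_lookups += 1
        if ttl > 0:
            # A TTL of 0 (the default) means the answer is never cached.
            self._entries[name] = (address, now + ttl)
        return address


fake_now = [0.0]
cache = TtlCache(lambda name: ("10.0.0.1", 10), clock=lambda: fake_now[0])
for t in (0.0, 5.0, 11.0):
    fake_now[0] = t
    cache.lookup("foo.node.consul")
print(cache.upstream_lookups)
```

With a 10 second TTL, the lookups at 0 and 5 seconds share one upstream query and only the lookup at 11 seconds triggers another. With a TTL of zero, every lookup goes upstream.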
### Enable Caching
To enable caching of node lookups (e.g. "foo.node.consul"), we can set the
[`dns_config.node_ttl`](/docs/agent/options.html#node_ttl) value. This can be set to
"10s" for example, and all node lookups will serve results with a 10 second TTL.
@ -108,15 +156,23 @@ a wildcard TTL and a specific TTL for a service might look like this:
``` ```
This sets all lookups to "web.service.consul" to use a 30 second TTL
while lookups to "api.service.consul" will use the 5 second TTL from the wildcard.
All lookups matching "db*" would get a 10 second TTL except "db-master",
which would have a 3 second TTL.
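The matching behavior described above can be sketched in Python. The precedence used here (exact name first, then the longest matching prefix wildcard, then the `*` default) is an assumption chosen to reproduce the results in the text, not a statement of how Consul implements it. The function name `service_ttl_for` is hypothetical, and TTLs are plain seconds for simplicity.

```python
def service_ttl_for(service, ttls):
    """Pick a TTL in seconds for a service from a service_ttl-style mapping."""
    # 1. An exact service name wins.
    if service in ttls:
        return ttls[service]
    # 2. Otherwise use the longest matching prefix wildcard such as "db*".
    best_prefix, best_ttl = None, None
    for pattern, ttl in ttls.items():
        if pattern == "*" or not pattern.endswith("*"):
            continue
        prefix = pattern[:-1]
        if service.startswith(prefix):
            if best_prefix is None or len(prefix) > len(best_prefix):
                best_prefix, best_ttl = prefix, ttl
    if best_prefix is not None:
        return best_ttl
    # 3. Fall back to the "*" wildcard, or 0 (no caching) if it is absent.
    return ttls.get("*", 0)


# Values taken from the description above.
ttls = {"*": 5, "web": 30, "db*": 10, "db-master": 3}
for name in ("web", "api", "db", "db-replica", "db-master"):
    print(name, service_ttl_for(name, ttls))
```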
### Prepared Queries
[Prepared Queries](/api/query.html) provide an additional
level of control over TTL. They allow for the TTL to be defined along with
the query, and they can be changed on the fly by updating the query definition.
If a TTL is not configured for a prepared query, then it will fall back to the
service-specific configuration defined in the Consul agent as described above,
and ultimately to 0 if no TTL is configured for the service in the Consul agent.
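The fallback order for prepared queries is short enough to write down directly. The Python sketch below only restates the rule from the paragraph above. The function name and its parameters are hypothetical, and `None` stands for "not configured".

```python
from typing import Optional


def prepared_query_ttl(query_ttl: Optional[int],
                       agent_service_ttl: Optional[int]) -> int:
    """Resolve the TTL (in seconds) served for a prepared query lookup."""
    if query_ttl is not None:
        return query_ttl          # TTL defined along with the query
    if agent_service_ttl is not None:
        return agent_service_ttl  # service-specific TTL from the agent config
    return 0                      # nothing configured: no caching


print(prepared_query_ttl(15, 30))
print(prepared_query_ttl(None, 30))
print(prepared_query_ttl(None, None))
```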
## Summary
In this guide we covered several of the parameters for tuning DNS queries. We reviewed
how to enable or disable stale reads and how to configure the amount of time when stale
reads are allowed. We also looked at the minimum TTL configuration options
for negative responses from services. Finally, we reviewed how to set up TTLs
for service lookups.