mirror of https://github.com/status-im/consul.git
Merge pull request #2796 from hashicorp/f-autopilot-guide
Add autopilot guide to the docs
commit
ff51b34943
|
@ -48,6 +48,7 @@ func (s *Server) autopilotLoop() {
|
|||
_, autopilotConf, err := state.AutopilotConfig()
|
||||
if err != nil {
|
||||
s.logger.Printf("[ERR] consul: error retrieving autopilot config: %s", err)
|
||||
break
|
||||
}
|
||||
|
||||
if err := s.autopilotPolicy.PromoteNonVoters(autopilotConf); err != nil {
|
||||
|
|
|
@ -558,6 +558,7 @@ Consul will not enable TLS for the HTTP API unless the `https` port has been ass
|
|||
|
||||
* <a name="autopilot"></a><a href="#autopilot">`autopilot`</a> Added in Consul 0.8, this object
|
||||
allows a number of sub-keys to be set which can configure operator-friendly settings for Consul servers.
|
||||
For more information about Autopilot, see the [Autopilot Guide](/docs/guides/autopilot.html).
|
||||
<br><br>
|
||||
The following sub-keys are available:
|
||||
|
||||
|
|
|
@ -0,0 +1,116 @@
|
|||
---
|
||||
layout: "docs"
|
||||
page_title: "Autopilot"
|
||||
sidebar_current: "docs-guides-autopilot"
|
||||
description: |-
|
||||
This guide covers how to configure and use Autopilot features.
|
||||
---
|
||||
|
||||
# Autopilot
|
||||
|
||||
Autopilot is a set of new features added in Consul 0.8 to allow for automatic
|
||||
operator-friendly management of Consul servers. It includes cleanup of dead
|
||||
servers, monitoring the state of the Raft cluster, and stable server introduction.
|
||||
|
||||
To enable Autopilot features (with the exception of dead server cleanup),
|
||||
the [`raft_protocol`](/docs/agent/options.html#_raft_protocol) setting in
|
||||
the Agent configuration must be set to 3 or higher on all servers. In Consul
|
||||
0.8 this setting defaults to 2; in Consul 0.9 it will default to 3. For more
|
||||
information, see the [Version Upgrade section](/docs/upgrade-specific.html#raft_protocol)
|
||||
on Raft Protocol versions.
|
||||
|
||||
## Configuration
|
||||
|
||||
The configuration of Autopilot is loaded by the leader from the agent's
|
||||
[`autopilot`](/docs/agent/options.html#autopilot) settings when initially
|
||||
bootstrapping the cluster. After bootstrapping, the configuration can
|
||||
be viewed or modified either via the
|
||||
[`operator autopilot`](/docs/commands/operator/autopilot.html) subcommand or the
|
||||
[`/v1/operator/autopilot/configuration`](/docs/agent/http/operator.html#autopilot-configuration)
|
||||
HTTP endpoint:
|
||||
|
||||
```
|
||||
$ consul operator autopilot get-config
|
||||
CleanupDeadServers = true
|
||||
LastContactThreshold = 200ms
|
||||
MaxTrailingLogs = 250
|
||||
ServerStabilizationTime = 10s
|
||||
|
||||
$ consul operator autopilot set-config -cleanup-dead-servers=false
|
||||
Configuration updated!
|
||||
|
||||
$ consul operator autopilot get-config
|
||||
CleanupDeadServers = false
|
||||
LastContactThreshold = 200ms
|
||||
MaxTrailingLogs = 250
|
||||
ServerStabilizationTime = 10s
|
||||
```
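
As a minimal sketch, the `Key = Value` lines printed by `get-config` above can be read into a map from a script. The helper name `parseAutopilotConfig` is hypothetical, and the sketch assumes the output keeps the exact line format shown in the example; it is not part of Consul.

```go
package main

import (
	"fmt"
	"strings"
)

// parseAutopilotConfig is a hypothetical helper (not part of Consul). It
// assumes each setting is printed on its own line as "Key = Value".
func parseAutopilotConfig(out string) map[string]string {
	cfg := make(map[string]string)
	for _, line := range strings.Split(out, "\n") {
		parts := strings.SplitN(line, "=", 2)
		if len(parts) != 2 {
			continue // skip blank lines and anything that is not "Key = Value"
		}
		cfg[strings.TrimSpace(parts[0])] = strings.TrimSpace(parts[1])
	}
	return cfg
}

func main() {
	// Output copied from the get-config example above.
	out := "CleanupDeadServers = true\n" +
		"LastContactThreshold = 200ms\n" +
		"MaxTrailingLogs = 250\n" +
		"ServerStabilizationTime = 10s\n"
	cfg := parseAutopilotConfig(out)
	fmt.Println(cfg["CleanupDeadServers"], cfg["ServerStabilizationTime"])
}
```

The values are kept as strings here; a caller could pass the duration fields to `time.ParseDuration` if typed values are needed.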
|
||||
|
||||
## Dead Server Cleanup
|
||||
|
||||
Dead servers will periodically be cleaned up and removed from the Raft peer
|
||||
set, to prevent them from interfering with the quorum size and leader elections.
|
||||
This cleanup will also happen whenever a new server is successfully added to the
|
||||
cluster.
|
||||
|
||||
Prior to Autopilot, it would take 72 hours for dead servers to be automatically reaped,
|
||||
or operators had to script a `consul force-leave`. If another server failure occurred,
|
||||
it could jeopardize the quorum, even if the failed Consul server had been automatically
|
||||
replaced. Autopilot helps prevent these kinds of outages by quickly removing failed
|
||||
servers as soon as a replacement Consul server comes online.
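
To make the quorum concern concrete, the following sketch works through the standard Raft majority rule (a strict majority of the peer set must agree). The function names are hypothetical and chosen for illustration only.

```go
package main

import "fmt"

// quorum returns the number of voters that must agree for Raft to elect a
// leader or commit an entry: a strict majority of the peer set.
func quorum(voters int) int {
	return voters/2 + 1
}

// failureTolerance returns how many voters can be lost before quorum is lost.
func failureTolerance(voters int) int {
	if voters <= 0 {
		return 0
	}
	return voters - quorum(voters)
}

func main() {
	// Print the quorum size and failure tolerance for a few peer set sizes.
	for _, n := range []int{2, 3, 4, 5} {
		fmt.Printf("voters=%d quorum=%d tolerance=%d\n", n, quorum(n), failureTolerance(n))
	}
}
```

For example, if one server in a 3-server cluster dies and a replacement joins while the dead server is still in the peer set, the peer set has 4 members and needs 3 for quorum. Only 3 are alive, so one more failure loses quorum. Once the dead server is removed, the peer set is back to 3, the quorum is 2, and one failure can be tolerated again.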
|
||||
|
||||
This option can be disabled by running `consul operator autopilot set-config`
|
||||
with the `-cleanup-dead-servers=false` option.
|
||||
|
||||
## Server Health Checking
|
||||
|
||||
An internal health check runs on the leader to track the stability of servers.
|
||||
<br>A server is considered healthy if:
|
||||
|
||||
- It has a SerfHealth status of 'Alive'
|
||||
- The time since its last contact with the current leader is below
|
||||
`LastContactThreshold`
|
||||
- Its latest Raft term matches the leader's term
|
||||
- The number of Raft log entries it trails the leader by does not exceed
|
||||
`MaxTrailingLogs`
|
||||
|
||||
The status of these health checks can be viewed through the
|
||||
[`/v1/operator/autopilot/health`](/docs/agent/http/operator.html#autopilot-health) HTTP endpoint, with a top-level
|
||||
`Healthy` field indicating the overall status of the cluster:
|
||||
|
||||
```
|
||||
$ curl localhost:8500/v1/operator/autopilot/health
|
||||
{
|
||||
"Healthy": true,
|
||||
"FailureTolerance": 0,
|
||||
"Servers": [
|
||||
{
|
||||
"ID": "e349749b-3303-3ddf-959c-b5885a0e1f6e",
|
||||
"Name": "node1",
|
||||
"SerfStatus": "alive",
|
||||
"LastContact": "0s",
|
||||
"LastTerm": 3,
|
||||
"LastIndex": 23,
|
||||
"Healthy": true,
|
||||
"StableSince": "2017-03-10T22:01:14Z"
|
||||
},
|
||||
{
|
||||
"ID": "099061c7-ea74-42d5-be04-a0ad74caaaf5",
|
||||
"Name": "node2",
|
||||
"SerfStatus": "alive",
|
||||
"LastContact": "53.279635ms",
|
||||
"LastTerm": 3,
|
||||
"LastIndex": 23,
|
||||
"Healthy": true,
|
||||
"StableSince": "2017-03-10T22:03:26Z"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
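
A response like the one above can be decoded with Go's standard `encoding/json` package. The sketch below is illustrative: the struct and function names are hypothetical, the field names are taken from the example response, and the embedded sample is a trimmed copy of that response rather than live output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// serverHealth mirrors the per-server fields in the example response.
type serverHealth struct {
	ID          string
	Name        string
	SerfStatus  string
	LastContact string
	LastTerm    uint64
	LastIndex   uint64
	Healthy     bool
	StableSince string
}

// clusterHealth mirrors the top-level fields in the example response.
type clusterHealth struct {
	Healthy          bool
	FailureTolerance int
	Servers          []serverHealth
}

// unhealthyServers decodes a health response and returns the names of
// servers whose Healthy field is false.
func unhealthyServers(body []byte) ([]string, error) {
	var h clusterHealth
	if err := json.Unmarshal(body, &h); err != nil {
		return nil, err
	}
	var names []string
	for _, s := range h.Servers {
		if !s.Healthy {
			names = append(names, s.Name)
		}
	}
	return names, nil
}

// sample is a trimmed copy of the response shown above.
const sample = `{
  "Healthy": true,
  "FailureTolerance": 0,
  "Servers": [
    {"Name": "node1", "SerfStatus": "alive", "LastContact": "0s", "Healthy": true},
    {"Name": "node2", "SerfStatus": "alive", "LastContact": "53.279635ms", "Healthy": true}
  ]
}`

func main() {
	names, err := unhealthyServers([]byte(sample))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d unhealthy server(s): %v\n", len(names), names)
}
```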
|
||||
|
||||
## Stable Server Introduction
|
||||
|
||||
When a new server is added to the cluster, there is a waiting period where it
|
||||
must be healthy and stable for a certain amount of time before being promoted
|
||||
to a full, voting member. This can be configured via the `ServerStabilizationTime`
|
||||
setting.
|
|
@ -33,6 +33,40 @@ and update any scripts that passed a custom `-rpc-addr` to the following command
|
|||
* `monitor`
|
||||
* `reload`
|
||||
|
||||
#### <a name="raft_protocol"></a><a href="#raft_protocol">Raft Protocol Version Compatibility</a>
|
||||
|
||||
When upgrading to Consul 0.8.0 from a version lower than 0.7.0, users will need to
|
||||
set the [`-raft-protocol`](/docs/agent/options.html#_raft_protocol) option to 1 in
|
||||
order to maintain backwards compatibility with the old servers during the upgrade.
|
||||
After the servers have been migrated to version 0.8.0, `-raft-protocol` can be moved
|
||||
up to 2 and the servers restarted to match the default.
|
||||
|
||||
The Raft protocol must be stepped up in this way; only adjacent version numbers are
|
||||
compatible (for example, version 1 cannot talk to version 3). Here is a table of the
|
||||
Raft Protocol versions supported by each Consul version:
|
||||
|
||||
<table class="table table-bordered table-striped">
|
||||
<tr>
|
||||
<th>Version</th>
|
||||
<th>Supported Raft Protocols</th>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>0.6 and earlier</td>
|
||||
<td>0</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>0.7</td>
|
||||
<td>1</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>0.8</td>
|
||||
<td>1, 2, 3</td>
|
||||
</tr>
|
||||
</table>
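
The adjacency rule described above can be expressed as a short check, along with the sequence of versions to step through during an upgrade. Both function names are hypothetical and serve only to illustrate the rule; they are not part of Consul.

```go
package main

import "fmt"

// compatible reports whether two Raft protocol versions can interoperate
// under the rule that only equal or adjacent version numbers are compatible.
func compatible(a, b int) bool {
	d := a - b
	if d < 0 {
		d = -d
	}
	return d <= 1
}

// upgradePath lists the protocol versions to step through, one at a time,
// when moving from one version up to another. It returns nil if from > to.
func upgradePath(from, to int) []int {
	if from > to {
		return nil
	}
	path := make([]int, 0, to-from+1)
	for v := from; v <= to; v++ {
		path = append(path, v)
	}
	return path
}

func main() {
	fmt.Println(compatible(1, 3)) // version 1 cannot talk to version 3
	fmt.Println(compatible(2, 3))
	fmt.Println(upgradePath(1, 3))
}
```

Each step in the returned path is compatible with the one before it, which is why the servers need to be restarted at each intermediate version rather than jumping directly.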
|
||||
|
||||
In order to enable all [Autopilot](/docs/guides/autopilot.html) features, all servers
|
||||
in a Consul cluster must be running with Raft protocol version 3 or later.
|
||||
|
||||
## Consul 0.7.1
|
||||
|
||||
#### Child Process Reaping
|
||||
|
|
|
@ -296,6 +296,10 @@
|
|||
<a href="/docs/guides/servers.html">Adding/Removing Servers</a>
|
||||
</li>
|
||||
|
||||
<li<%= sidebar_current("docs-guides-autopilot") %>>
|
||||
<a href="/docs/guides/autopilot.html">Autopilot</a>
|
||||
</li>
|
||||
|
||||
<li<%= sidebar_current("docs-guides-bootstrapping") %>>
|
||||
<a href="/docs/guides/bootstrapping.html">Bootstrapping</a>
|
||||
</li>
|
||||
|
|