mirror of https://github.com/status-im/consul.git
Website: GH-730 and cleanup for docs/guides/outage.html
parent 53ee3ffba2
commit c1e4eb2f2c
---
layout: "docs"
page_title: "Outage Recovery"
sidebar_current: "docs-guides-outage"
description: |-
  Don't panic! This is a critical first step. Depending on your deployment configuration, it may take only a single server failure for cluster unavailability. Recovery requires an operator to intervene, but the process is straightforward.
---

# Outage Recovery

Don't panic! This is a critical first step. Depending on your
[deployment configuration](/docs/internals/consensus.html#toc_4), it may
take only a single server failure for cluster unavailability. Recovery
requires an operator to intervene, but the process is straightforward.

~> This page covers recovery from Consul becoming unavailable due to a majority
of server nodes in a datacenter being lost. If you are just looking to
add or remove a server, [see this guide](/docs/guides/servers.html).

## Failure of a Single Server Cluster

If you had only a single server and it has failed, simply restart it.
Note that a single server configuration requires the
[`-bootstrap`](/docs/agent/options.html#_bootstrap) or
[`-bootstrap-expect 1`](/docs/agent/options.html#_bootstrap_expect) flag. If
the server cannot be recovered, you need to bring up a new server.
See the [bootstrapping guide](/docs/guides/bootstrapping.html) for more detail.
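
As a rough sketch of bringing up a replacement (the data and config paths below are illustrative assumptions, not values from this guide), a new single-server agent might be started with:

```text
$ consul agent -server -bootstrap-expect 1 \
    -data-dir /var/consul -config-dir /etc/consul.d
```

With `-bootstrap-expect 1`, the new server can elect itself leader as soon as it starts.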

In the case of an unrecoverable server failure in a single server cluster, data
loss is inevitable since data was not replicated to any other servers. This is
why a single server deploy is never recommended.

Any services registered with agents will be re-populated when the new server
comes online as agents perform anti-entropy.

## Failure of a Server in a Multi-Server Cluster

In a multi-server deploy, there are at least N remaining servers. The first
step is to simply stop all the servers. You can attempt a graceful leave,
but it will not work in most cases. Do not worry if the leave exits with an
error. The cluster is in an unhealthy state, so this is expected.
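
How the agents are stopped depends on how they are run, which this guide does not specify; as a minimal sketch (both commands below are assumptions), it might look like:

```text
$ consul leave     # attempt a graceful leave; an error here is expected
$ pkill consul     # otherwise, stop the agent process directly
```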

The next step is to go to the [`-data-dir`](/docs/agent/options.html#_data_dir)
of each Consul server. Inside that directory, there will be a `raft/`
sub-directory. We need to edit the `raft/peers.json` file. It should look
something like:

```javascript
[
  ...
]
```

Simply delete the entries for all the failed servers. You must confirm
those servers have indeed failed and will not later rejoin the cluster.
Ensure that this file is the same across all remaining server nodes.
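
For illustration only (the addresses and data directory below are placeholders, not values from this guide), if a three-server cluster lost one member, each remaining node would be left with a file like:

```text
$ cat /var/consul/raft/peers.json
[
  "10.0.1.5:8300",
  "10.0.1.6:8300"
]
```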

At this point, you can restart all the remaining servers. If any servers
managed to perform a graceful leave, you may need to have them rejoin
the cluster using the [`join`](/docs/commands/join.html) command:

```text
$ consul join <Node Address>
Successfully joined cluster by contacting 1 nodes.
```

It should be noted that any existing member can be used to rejoin the cluster
as the gossip protocol will take care of discovering the server nodes.
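
To double-check that the servers have rediscovered each other, `consul members` on any node lists the cluster as that node sees it. A rough sketch of the kind of output to expect (node names and addresses are placeholders, and the exact columns vary by version):

```text
$ consul members
Node     Address          Status  Type    ...
node-a   10.0.1.5:8301    alive   server  ...
node-b   10.0.1.6:8301    alive   server  ...
```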

At this point, the cluster should be in an operable state again. One of the
nodes should claim leadership and emit a log like:

```text
[INFO] consul: cluster leadership acquired
```

Additionally, the [`info`](/docs/commands/info.html) command can be a useful
debugging tool:

```text
$ consul info
...
```

You should verify that one server claims to be the `Leader` and all the
others should be in the `Follower` state. All the nodes should agree on the
peer count as well. This count is (N-1), since a server does not count itself
as a peer.
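
As a sketch of what to look for (the values are placeholders, and the exact fields vary by Consul version), the `raft` section of `consul info` on the leader of a three-server cluster would include lines like:

```text
raft:
    num_peers = 2
    state = Leader
```

Here `num_peers = 2` is the (N-1) count for N = 3 servers; the two followers should report `state = Follower` with the same peer count.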