Updating Stopping Agent Section (#8016)

Fixes #6935 to clarify agent behavior.
2020-06-03 17:08:49 -04:00 · 66ee9c3bb2
parent b5dc84757d
commit 66ee9c3bb2
1 changed files with 73 additions and 45 deletions
--- a/website/pages/docs/agent/index.mdx
+++ b/website/pages/docs/agent/index.mdx
@@ -15,23 +15,28 @@ information, registers services, runs checks, responds to queries,
 and more. The agent must run on every node that is part of a Consul cluster.
 Any agent may run in one of two modes: client or server. A server
-node takes on the additional responsibility of being part of the [consensus quorum](/docs/internals/consensus).
+node takes on the additional responsibility of being part of the 
+[consensus quorum](/docs/internals/consensus).
 These nodes take part in Raft and provide strong consistency and availability in
-the case of failure. The higher burden on the server nodes means that usually they
+the case of failure. The higher burden on the server nodes means that usually
-should be run on dedicated instances -- they are more resource intensive than a client
+they should be run on dedicated instances -- they are more resource intensive
-node. Client nodes make up the majority of the cluster, and they are very lightweight
+than a client node. Client nodes make up the majority of the cluster, and they
-as they interface with the server nodes for most operations and maintain very little state
+are very lightweight as they interface with the server nodes for most
-of their own.
+operations and maintain very little state of their own.
 ## Running an Agent
-The agent is started with the [`consul agent`](/docs/commands/agent) command. This
+The agent is started with the [`consul agent`](/docs/commands/agent) command.
-command blocks, running forever or until told to quit. You can test a local agent by following the [Getting Started guides](https://learn.hashicorp.com/consul/getting-started/install?utm_source=consul.io&utm_medium=docs).
+This command blocks, running forever or until told to quit. You can test a 
+local agent by following the 
+[Getting Started guides](https://learn.hashicorp.com/consul/getting-started/install?utm_source=consul.io&utm_medium=docs).
-The agent command takes a variety
+The agent command takes a variety of 
-of [`configuration options`](/docs/agent/options#command-line-options), but most have sane defaults.
+[`configuration options`](/docs/agent/options#command-line-options), but most
+have sane defaults.
-When running [`consul agent`](/docs/commands/agent), you should see output similar to this:
+When running [`consul agent`](/docs/commands/agent), you should see output
+similar to this:
 ```shell-session
 $ consul agent -data-dir=/tmp/consul
@@ -49,33 +54,38 @@ $ consul agent -data-dir=/tmp/consul
 ...
 ```
-There are several important messages that [`consul agent`](/docs/commands/agent) outputs:
+There are several important messages that 
+[`consul agent`](/docs/commands/agent) outputs:
 - **Node name**: This is a unique name for the agent. By default, this
  is the hostname of the machine, but you may customize it using the
  [`-node`](/docs/agent/options#_node) flag.
- **Datacenter**: This is the datacenter in which the agent is configured to run.
+- **Datacenter**: This is the datacenter in which the agent is configured to 
-  Consul has first-class support for multiple datacenters; however, to work efficiently,
+run.
-  each node must be configured to report its datacenter. The [`-datacenter`](/docs/agent/options#_datacenter)
+  Consul has first-class support for multiple datacenters; however, to work
-  flag can be used to set the datacenter. For single-DC configurations, the agent
+  efficiently, each node must be configured to report its datacenter. The 
-  will default to "dc1".
+  [`-datacenter`](/docs/agent/options#_datacenter) flag can be used to set the 
+  datacenter. For single-DC configurations, the agent will default to "dc1".
- **Server**: This indicates whether the agent is running in server or client mode.
+- **Server**: This indicates whether the agent is running in server or client
+  mode.
  Server nodes have the extra burden of participating in the consensus quorum,
  storing cluster state, and handling queries. Additionally, a server may be
  in ["bootstrap"](/docs/agent/options#_bootstrap_expect) mode. Multiple servers
-  cannot be in bootstrap mode as that would put the cluster in an inconsistent state.
+  cannot be in bootstrap mode as that would put the cluster in an inconsistent
+  state.
 - **Client Addr**: This is the address used for client interfaces to the agent.
-  This includes the ports for the HTTP and DNS interfaces. By default, this binds only
+  This includes the ports for the HTTP and DNS interfaces. By default, this
-  to localhost. If you change this address or port, you'll have to specify a `-http-addr`
+  binds only to localhost. If you change this address or port, you'll have to
-  whenever you run commands such as [`consul members`](/docs/commands/members) to
+  specify a `-http-addr` whenever you run commands such as
-  indicate how to reach the agent. Other applications can also use the HTTP address and port
+  [`consul members`](/docs/commands/members) to indicate how to reach the
+  agent. Other applications can also use the HTTP address and port
  [to control Consul](/api).
- **Cluster Addr**: This is the address and set of ports used for communication between
+- **Cluster Addr**: This is the address and set of ports used for communication
-  Consul agents in a cluster. Not all Consul agents in a cluster have to
+  between Consul agents in a cluster. Not all Consul agents in a cluster have to
  use the same port, but this address **MUST** be reachable by all other nodes.
 When running under `systemd` on Linux, Consul notifies systemd by sending
@@ -85,44 +95,62 @@ service definition file has to have `Type=notify` set.
 ## Stopping an Agent
-An agent can be stopped in two ways: gracefully or forcefully. To gracefully
+An agent can be stopped in two ways: gracefully or forcefully. Servers and
-halt an agent, send the process an interrupt signal (usually
+Clients behave differently depending on how the agent is stopped. There
-`Ctrl-C` from a terminal or running `kill -INT consul_pid` ). When gracefully exiting, the agent first notifies
+are two potential states a process can be in after a system signal is sent: 
-the cluster it intends to leave the cluster. This way, other cluster members
+_left_ and _failed_.
-notify the cluster that the node has _left_.
-Alternatively, you can force kill the agent by sending it a kill signal.
+To gracefully halt an agent, send the process an _interrupt signal_ (usually
-When force killed, the agent ends immediately. The rest of the cluster will
+`Ctrl-C` from a terminal, or running `kill -INT consul_pid`). For more
-eventually (usually within seconds) detect that the node has died and
+information on different signals sent by the `kill` command, see
-notify the cluster that the node has _failed_.
+[here](https://www.linux.org/threads/kill-signals-and-commands-revised.11625/).
-It is especially important that a server node be allowed to leave gracefully
+When a Client is gracefully exited, the agent first notifies the cluster it
-so that there will be a minimal impact on availability as the server leaves
+intends to leave the cluster. This way, other cluster members notify the
-the consensus quorum.
+cluster that the node has _left_.
+When a Server is gracefully exited, the server will not be marked as _left_.
+This is to minimally impact the consensus quorum. Instead, the Server will be
+marked as _failed_. To remove a server from the cluster, the 
+[`force-leave`](/docs/commands/force-leave) command is used. Using
+`force-leave` will put the server instance in a _left_ state so long as the
+Server agent is not alive.
+Alternatively, you can forcibly stop an agent by sending it a kill signal, for
+example with `kill -KILL consul_pid`. This stops the agent immediately. The rest
+of the cluster will eventually (usually within seconds) detect that the node has
+died and notify the cluster that the node has _failed_.
 For client agents, the difference between a node _failing_ and a node _leaving_
 may not be important for your use case. For example, for a web server and load
 balancer setup, both result in the same outcome: the web node is removed
 from the load balancer pool.
 The [`skip_leave_on_interrupt`](/docs/agent/options#skip_leave_on_interrupt) and
 [`leave_on_terminate`](/docs/agent/options#leave_on_terminate) configuration
 options allow you to adjust this behavior.
 ## Lifecycle
 Every agent in the Consul cluster goes through a lifecycle. Understanding
 this lifecycle is useful for building a mental model of an agent's interactions
 with a cluster and how the cluster treats a node.
-When an agent is first started, it does not know about any other node in the cluster.
+When an agent is first started, it does not know about any other node in the
+cluster.
 To discover its peers, it must _join_ the cluster. This is done with the
 [`join`](/docs/commands/join)
-command or by providing the proper configuration to auto-join on start. Once a node
+command or by providing the proper configuration to auto-join on start. Once a
-joins, this information is gossiped to the entire cluster, meaning all nodes will
+node joins, this information is gossiped to the entire cluster, meaning all
-eventually be aware of each other. If the agent is a server, existing servers will
+nodes will eventually be aware of each other. If the agent is a server,
-begin replicating to the new node.
+existing servers will begin replicating to the new node.
 In the case of a network failure, some nodes may be unreachable by other nodes.
-In this case, unreachable nodes are marked as _failed_. It is impossible to distinguish
+In this case, unreachable nodes are marked as _failed_. It is impossible to
-between a network failure and an agent crash, so both cases are handled the same.
+distinguish between a network failure and an agent crash, so both cases are
-Once a node is marked as failed, this information is updated in the service catalog.
+handled the same.
+Once a node is marked as failed, this information is updated in the service
+catalog.
 -> **Note:** There is some nuance here since this update is only possible if the servers can still [form a quorum](/docs/internals/consensus). Once the network recovers or a crashed agent restarts the cluster will repair itself and unmark a node as failed. The health check in the catalog will also be updated to reflect this.