mirror of https://github.com/status-im/consul.git
Updating Stopping Agent Section (#8016)
Fixes #6935 to clarify agent behavior.
Parent: b5dc84757d
Commit: 66ee9c3bb2

@@ -15,23 +15,28 @@ information, registers services, runs checks, responds to queries,
and more. The agent must run on every node that is part of a Consul cluster.

Any agent may run in one of two modes: client or server. A server node takes on
the additional responsibility of being part of the
[consensus quorum](/docs/internals/consensus). These nodes take part in Raft and
provide strong consistency and availability in the case of failure. The higher
burden on the server nodes means that they should usually be run on dedicated
instances, because they are more resource intensive than a client node. Client
nodes make up the majority of the cluster, and they are very lightweight because
they interface with the server nodes for most operations and maintain very
little state of their own.

## Running an Agent

The agent is started with the [`consul agent`](/docs/commands/agent) command.
This command blocks, running forever or until told to quit. You can test a
local agent by following the
[Getting Started guides](https://learn.hashicorp.com/consul/getting-started/install?utm_source=consul.io&utm_medium=docs).

The agent command takes a variety of
[`configuration options`](/docs/agent/options#command-line-options), but most
have sensible defaults.

When running [`consul agent`](/docs/commands/agent), you should see output
similar to this:

```shell-session
$ consul agent -data-dir=/tmp/consul
...
```

There are several important messages that
[`consul agent`](/docs/commands/agent) outputs:

- **Node name**: This is a unique name for the agent. By default, this is the
  hostname of the machine, but you may customize it using the
  [`-node`](/docs/agent/options#_node) flag.

- **Datacenter**: This is the datacenter in which the agent is configured to
  run. Consul has first-class support for multiple datacenters; however, to
  work efficiently, each node must be configured to report its datacenter. The
  [`-datacenter`](/docs/agent/options#_datacenter) flag can be used to set the
  datacenter. For single-DC configurations, the agent defaults to "dc1".

- **Server**: This indicates whether the agent is running in server or client
  mode. Server nodes have the extra burden of participating in the consensus
  quorum, storing cluster state, and handling queries. Additionally, a server
  may be in ["bootstrap"](/docs/agent/options#_bootstrap_expect) mode. Multiple
  servers cannot be in bootstrap mode, as that would put the cluster in an
  inconsistent state.

- **Client Addr**: This is the address used for client interfaces to the agent.
  This includes the ports for the HTTP and DNS interfaces. By default, this
  binds only to localhost. If you change this address or port, you'll have to
  specify a `-http-addr` whenever you run commands such as
  [`consul members`](/docs/commands/members) to indicate how to reach the
  agent. Other applications can also use the HTTP address and port
  [to control Consul](/api).

- **Cluster Addr**: This is the address and set of ports used for communication
  between Consul agents in a cluster. Not all Consul agents in a cluster have
  to use the same port, but this address **MUST** be reachable by all other
  nodes.
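
The flags behind these messages can be set together when the agent starts. The
following is a minimal sketch, not an official script: `start_agent` is a
hypothetical wrapper that passes a node name, datacenter, client address, and
cluster address to `-node`, `-datacenter`, `-client`, and `-bind`. It assumes
the cluster address is the bind address (no separate `-advertise` address), and
the `CONSUL_BIN` variable is an added assumption that lets you substitute `echo`
to print the command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around `consul agent`; the function name, argument
# order, and CONSUL_BIN variable are assumptions for illustration.
# Usage: start_agent NODE_NAME DATACENTER CLIENT_ADDR CLUSTER_ADDR
start_agent() {
  local node=$1 datacenter=$2 client_addr=$3 cluster_addr=$4
  # CONSUL_BIN defaults to the real binary; set CONSUL_BIN=echo for a dry run.
  "${CONSUL_BIN:-consul}" agent \
    -data-dir=/tmp/consul \
    -node="$node" \
    -datacenter="$datacenter" \
    -client="$client_addr" \
    -bind="$cluster_addr"
}
```

For example, after `start_agent web-1 dc1 10.0.0.10 10.0.0.10`, the HTTP
interface no longer listens on localhost, so CLI commands need the address
spelled out, as in `consul members -http-addr=http://10.0.0.10:8500` (assuming
the default HTTP port of 8500).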

When running under `systemd` on Linux, Consul notifies systemd by sending
@@ -85,44 +95,62 @@ service definition file has to have `Type=notify` set.

## Stopping an Agent

An agent can be stopped in two ways: gracefully or forcefully. Servers and
clients behave differently depending on which method is used. After a signal is
sent to the agent process, the node ends up in one of two states: _left_ or
_failed_.

To gracefully halt an agent, send the process an _interrupt signal_ (usually
`Ctrl-C` from a terminal, or `kill -INT consul_pid`). For more information on
the signals the `kill` command can send, see this
[overview of kill signals](https://www.linux.org/threads/kill-signals-and-commands-revised.11625/).

When a client is stopped gracefully, the agent first notifies the cluster that
it intends to leave. This way, the other cluster members can notify the rest of
the cluster that the node has _left_.

When a server is stopped gracefully, it is not marked as _left_. Instead, it is
marked as _failed_, which minimizes the impact on the consensus quorum. To
remove a server from the cluster, use the
[`force-leave`](/docs/commands/force-leave) command. Running `force-leave` puts
the server in the _left_ state, as long as the server agent is not alive.
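
Finding the servers that need a `force-leave` can be scripted. As a minimal
sketch, the hypothetical `failed_servers` function below reads the table
printed by [`consul members`](/docs/commands/members) from standard input and
prints the name of every server whose status is `failed`. It assumes the
default tabular output, with `Node`, `Status`, and `Type` header columns, and
finds those columns by name rather than by position.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `consul members` output on stdin and print the
# names of server nodes whose status is "failed".
failed_servers() {
  awk '
    NR == 1 {
      # Map each header name (Node, Address, Status, Type, ...) to its column.
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    $(col["Status"]) == "failed" && $(col["Type"]) == "server" {
      print $(col["Node"])
    }
  '
}
```

A possible workflow is `consul members | failed_servers`, followed by
`consul force-leave <node-name>` for each listed server once you have confirmed
that its agent is really stopped.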

Alternatively, you can forcibly stop an agent by sending it a kill signal (for
example, `kill -KILL consul_pid`). This stops any agent, client or server,
immediately. The rest of the cluster will eventually (usually within seconds)
detect that the node has died and mark it as _failed_.
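
The difference between the two methods comes from the signals themselves: a
process can install a handler for `SIGINT` and run cleanup code before exiting,
which is what gives an agent the chance to leave gracefully, whereas `SIGKILL`
cannot be caught, so the process gets no chance to clean up. The sketch below
demonstrates this with plain Bash stand-ins (Bash 4 or later, for `BASHPID`)
rather than a real Consul agent; the `graceful_demo` and `forced_demo` names and
the "leaving cluster" message are illustrative only.

```shell
#!/usr/bin/env bash
# Stand-in demo, not Consul: compare an interrupt that can be handled with a
# kill signal that cannot. Each function body runs in its own subshell.

graceful_demo() (
  # A SIGINT handler gets a chance to run cleanup code before exiting.
  trap 'echo "SIGINT received: leaving cluster"; exit 0' INT
  kill -INT "$BASHPID"   # send SIGINT to this subshell
  sleep 1                # the pending handler runs no later than here
  echo "not reached"     # skipped, because the handler calls exit
)

forced_demo() (
  sleep 30 &             # stand-in for a long-running agent process
  pid=$!
  kill -KILL "$pid"      # SIGKILL cannot be caught, so no cleanup code can run
  wait "$pid"
  echo "exit status: $?" # Bash reports 128 + 9 (SIGKILL) = 137
)
```

Running `graceful_demo` prints only the handler's message, while `forced_demo`
prints `exit status: 137`, the status Bash reports for a process killed by
signal 9.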

For client agents, the difference between a node _failing_ and a node _leaving_
may not be important for your use case. For example, for a web server and load
balancer setup, both result in the same outcome: the web node is removed from
the load balancer pool.

The [`skip_leave_on_interrupt`](/docs/agent/options#skip_leave_on_interrupt) and
[`leave_on_terminate`](/docs/agent/options#leave_on_terminate) configuration
options allow you to adjust this behavior: they control whether an agent leaves
the cluster gracefully when it receives an interrupt (`SIGINT`) or a terminate
(`SIGTERM`) signal, respectively.
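
These options can also be set in a configuration file. The minimal sketch below
uses a hypothetical `write_leave_config` function to write an HCL file named
`leave.hcl` (also an assumption) into a directory the agent loads with
`-config-dir`. The values are illustrative only; check the options reference
for the default each option takes on clients and servers before changing them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write an HCL file that sets the two leave options.
# Usage: write_leave_config CONFIG_DIR
write_leave_config() {
  local config_dir=$1
  mkdir -p "$config_dir"
  cat > "$config_dir/leave.hcl" <<'EOF'
# Leave the cluster gracefully on SIGINT (do not skip the leave).
skip_leave_on_interrupt = false

# Also leave the cluster gracefully on SIGTERM.
leave_on_terminate = true
EOF
}
```

For example, running `write_leave_config /etc/consul.d` and then starting the
agent with `-config-dir=/etc/consul.d` would apply both settings; the
`/etc/consul.d` path is a placeholder.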

## Lifecycle

Every agent in the Consul cluster goes through a lifecycle. Understanding this
lifecycle is useful for building a mental model of an agent's interactions with
a cluster and how the cluster treats a node.

When an agent is first started, it does not know about any other node in the
cluster. To discover its peers, it must _join_ the cluster. This is done with
the [`join`](/docs/commands/join) command or by providing the proper
configuration to auto-join on start. Once a node joins, this information is
gossiped to the entire cluster, meaning all nodes will eventually be aware of
each other. If the agent is a server, existing servers will begin replicating to
the new node.
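
For the configuration route, Consul's `retry_join` option lists addresses of
existing agents to join at startup and keeps retrying until a join succeeds. As
a minimal sketch, the hypothetical `write_join_config` function below writes
that option to a `join.hcl` file; the file name and the example addresses are
placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write an HCL file with a retry_join list built from
# the remaining arguments.
# Usage: write_join_config CONFIG_DIR ADDR [ADDR...]
write_join_config() {
  local config_dir=$1
  shift
  local addrs=() addr
  for addr in "$@"; do
    addrs+=("\"$addr\"")   # quote each address as an HCL string
  done
  mkdir -p "$config_dir"
  # Join the quoted addresses with commas to form an HCL list.
  local IFS=,
  printf 'retry_join = [%s]\n' "${addrs[*]}" > "$config_dir/join.hcl"
}
```

An agent started with `-config-dir` pointing at that directory joins on start;
for an agent that is already running, `consul join 10.0.0.1` achieves the same.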

In the case of a network failure, some nodes may be unreachable by other nodes.
In this case, unreachable nodes are marked as _failed_. It is impossible to
distinguish between a network failure and an agent crash, so both cases are
handled the same. Once a node is marked as failed, this information is updated
in the service catalog.

-> **Note:** There is some nuance here, since this update is only possible if the servers can still [form a quorum](/docs/internals/consensus). Once the network recovers or a crashed agent restarts, the cluster will repair itself and unmark a node as failed. The health check in the catalog will also be updated to reflect this.