consul/website/source/docs/upgrade-specific.html.md

921 lines
42 KiB
Markdown
Raw Normal View History

2015-02-19 19:03:02 +00:00
---
layout: "docs"
page_title: "Upgrading Specific Versions"
sidebar_current: "docs-upgrading-specific"
description: |-
Specific versions of Consul may have additional information about the upgrade process beyond the standard flow.
---
# Upgrading Specific Versions
The [upgrading page](/docs/upgrading.html) covers the details of doing a
standard upgrade. However, specific versions of Consul may have more details
provided for their upgrades as a result of new features or changed behavior.
This page is used to document those details separately from the standard
upgrade flow.
2015-02-19 19:03:02 +00:00
## Consul 1.7.0
Consul 1.7.0 contains three major changes that impact upgrades:
[stricter JSON decoding](#stricter-json-decoding), [modified DNS outputs](#dns-ptr-record-output),
and [backward-incompatible Session API changes](#session-api).
### Session API
Consul 1.7.0 introduced a backwards incompatible change to the Session API.
Queries to view or renew sessions from agents on earlier versions will be rejected.
This impacts features and products including: Vault, the Enterprise snapshot agent, and locks.
The issue occurs when clients are still running 1.6.4 or earlier but servers have been upgraded to 1.7.0 or 1.7.1.
For this reason, we recommend you upgrade directly to 1.7.2 when it is available as it will include a fix for this issue.
### Stricter JSON Decoding
The HTTP API will now return 400 status codes with a textual error when unknown fields
are present in the payload of a request. Previously, Consul would simply ignore the
unknown fields. You will need to ensure that your API usage only uses supported
fields which are those documented in the example payloads in the API documentation.
### DNS PTR Record Output
Consul will now return the canonical service name in response to PTR queries. For OSS users the
change is that the datacenter will be present where it was not before. For Consul Enterprise
users, both the datacenter and the services namespace will be present. For example, where a
PTR record would previously have contained `web.service.consul`, it will now be `web.service.dc1.consul`
in OSS or `web.service.ns1.dc1.consul` for Enterprise.
### Telemetry: semantics of `consul.rpc.query` changed, see `consul.rpc.queries_blocking`
Consul has changed the semantics of query counts in its [telemetry](/docs/agent/telemetry.html#metrics-reference).
`consul.rpc.query` now only increments on the *start* of a query (blocking or non-blocking), whereas before it would
measure when blocking queries polled for more data. The `consul.rpc.queries_blocking` gauge has been added
to more precisely capture the view of *active* blocking queries.
### Vault: default `http_max_conns_per_client` too low to run Vault properly
Consul 1.7.0 introduced [limiting of connections per client](/docs/agent/options.html#http_max_conns_per_client). The default value
was 100, but Vault could use up to 128, which caused problems. If you want to use Vault with Consul 1.7.0, you should change the value to 200.
Starting with Consul 1.7.1 this is the new default.
## Consul 1.6.3
### Vault: default `http_max_conns_per_client` too low to run Vault properly
Consul 1.6.3 introduced [limiting of connections per client](/docs/agent/options.html#http_max_conns_per_client). The default value
was 100, but Vault could use up to 128, which caused problems. If you want to use Vault with Consul 1.6.3, you should change the value to 200.
Starting with Consul 1.6.4 this is the new default.
## Consul 1.6.0
#### Removal of Deprecated Features
Managed proxies (which have been [deprecated](/docs/connect/proxies/managed-deprecated.html)
since Consul 1.3.0) have now been [removed](/docs/connect/proxies.html). Before
upgrading, you will need to migrate any managed proxy usage to [sidecar service
registrations](/docs/connect/registration/sidecar-service.html).
## Consul 1.4.0
There are two major features in Consul 1.4.0 that may impact upgrades: a [new
ACL system](#acl-upgrade) and [multi-datacenter support for
Connect](#connect-multi-datacenter) in the Enterprise version.
### ACL Upgrade
Consul 1.4.0 includes a [new ACL
system](https://learn.hashicorp.com/consul/security-networking/production-acls)
that is designed to have a smooth upgrade path but requires care to upgrade
components in the right order.
**Note:** As with most major version upgrades, you cannot downgrade once the
upgrade to 1.4.0 is complete as it adds new state to the raft store. As always
it is _strongly_ recommended that you test the upgrade first outside of
production and ensure you take backup snapshots of all datacenters before
upgrading.
#### Primary Datacenter
The "ACL datacenter" in 1.3.x and earlier is now referred to as the "Primary
datacenter". All configuration is backwards compatible and shouldn't need to
change prior to upgrade although it's strongly recommended to migrate ACL
configuration to the new syntax soon after upgrade. This includes moving to
`primary_datacenter` rather than `acl_datacenter` and `acl_*` to the new [ACL
block](/docs/agent/options.html#acl).
Datacenters can be upgraded in any order although secondaries will remain in
[Legacy ACL mode](#legacy-acl-mode) until the primary datacenter is fully
upgraded.
Each datacenter should follow the [standard rolling upgrade
procedure](/docs/upgrading.html#standard-upgrades).
#### Legacy ACL Mode
When a 1.4.0 server first starts, it runs in "Legacy ACL mode". In this mode,
bootstrap requests and new ACL APIs will not be functional yet and will return
an error. The server advertises it's ability to support 1.4.0 ACLs via gossip
and waits.
In the primary datacenter, the servers all wait in legacy ACL mode until they
see every server in the primary datacenter advertise 1.4.0 ACL support. Once
this happens, the leader will complete the transition out of "legacy ACL mode"
and write this into the state so future restarts don't need to go through the
same transition.
In a secondary datacenter, the same process happens except that servers
_additionally_ wait for all servers in the primary datacenter making it safe to
upgrade datacenters in any order.
It should be noted that even if you are not upgrading, starting a brand new
1.4.0 cluster will transition through legacy ACL mode so you may be unable to
bootstrap ACLs until all the expected servers are up and healthy.
#### Legacy Token Accessor Migration
As soon as all servers in the primary datacenter have been upgraded to 1.4.0,
the leader will begin the process of creating new accessor IDs for all existing
ACL tokens.
This process completes in the background and is rate limited to ensure it
doesn't overload the leader. It completes upgrades in batches of 128 tokens and
will not upgrade more than one batch per second so on a cluster with 10,000
tokens, this may take several minutes.
While this is happening both old and new ACLs will work correctly with the
caveat that new ACL [Token APIs](/api/acl/tokens.html) may not return an
accessor ID for legacy tokens that are not yet migrated.
#### Migrating Existing ACLs
New ACL policies have slightly different syntax designed to fix some
shortcomings in old ACL syntax. During and after the upgrade process, any old
ACL tokens will continue to work and grant exactly the same level of access.
After upgrade, it is still possible to create "legacy" tokens using the existing
API so existing integrations that create tokens (e.g. Vault) will continue to
work. The "legacy" tokens generated though will not be able to take advantage of
new policy features. It's recommended that you complete migration of all tokens
as soon as possible after upgrade, as well as updating any integrations to work
with the the new ACL [Token](/api/acl/tokens.html) and
[Policy](/api/acl/policies.html) APIs.
More complete details on how to upgrade "legacy" tokens is available [here](/docs/acl/acl-migrate-tokens.html).
### Connect Multi-datacenter
This only applies to users upgrading from an older version of Consul Enterprise to Consul Enterprise 1.4.0 (all license types).
In addition, this upgrade will only affect clusters where [Connect is enabled](/docs/connect/configuration.html) on your servers before the migration.
Connect multi-datacenter uses the same primary/secondary approach as ACLs and
will use the same [primary_datacenter](#primary-datacenter). When a secondary
datacenter server restarts with 1.4.0 it will detect it is not the primary and
begin an automatic bootstrap of multi-datacenter CA federation.
Datacenters can be upgraded in either order; secondary datacenters will not
switch into multi-datacenter mode until all servers in both the secondary and
primary datacenter are detected to be running at least Consul 1.4.0. Secondary
datacenters monitor this periodically (every few minutes) and will
automatically upgrade Connect to use a federated Certificate Authority when
they do.
In general, migrating a Consul cluster from OSS to Enterprise will update the
CA to be federated automatically and without impact on Connect traffic. When
upgrading Consul Enterprise 1.3.x to Consul Enterprise 1.4.0 upgrades the CA
upgrade is seamless, however depending on the size of the cluster, _new_
connection attempts in the secondary datacenter might fail for a short window
(typically seconds) while the update is propagated due to the 1.3.x Beta
authorization endpoint validating originating cluster in a way that was not
fully forwards compatible with migrating between cluster trust domains. That
issue is fixed in 1.4.0 as part of General Availability.
Once migrated (typically a few seconds). Connect will use the primary
datacenter's Certificate Authority as the root of trust for all other
datacenters. CA migration or root key changes in the primary will now rotate
automatically and without loss of connectivity throughout all datacenters and
workloads.
For more information see [Connect
Multi-datacenter](/docs/enterprise/connect-multi-datacenter/index.html).
## Consul 1.3.0
This version added support for multiple tag filters in service discovery
queries, however it introduced a subtle bug where API calls to
`/catalog/service/:name?tag=<tag>` would ignore the tag filter _only during the
upgrade_. It only occurs when clients are still running 1.2.3 or earlier but
servers have been upgraded. The `/health/service/:name?tag=<tag>` endpoint and
DNS interface were _not_ affected.
For this reason, we recommend you upgrade directly to 1.3.1 which includes only
a fix for this issue.
## Consul 1.1.0
#### Removal of Deprecated Features
The following previously deprecated fields and config options have been removed:
- `CheckID` has been removed from config file check definitions (use `id` instead).
- `script` has been removed from config file check definitions (use `args` instead).
- `enableTagOverride` is no longer valid in service definitions (use `enable_tag_override` instead).
- The [deprecated set of metric names](/docs/upgrade-specific.html#metric-names-updated) (beginning with `consul.consul.`) has been removed
along with the `enable_deprecated_names` option from the metrics configuration.
#### New defaults for Raft Snapshot Creation
Consul 1.0.1 (and earlier versions of Consul) checked for raft snapshots every
5 seconds, and created new snapshots for every 8192 writes. These defaults cause
constant disk IO in large busy clusters. Consul 1.1.0 increases these to larger values,
and makes them tunable via the [raft_snapshot_interval](/docs/agent/options.html#_raft_snapshot_interval) and
[raft_snapshot_threshold](/docs/agent/options.html#_raft_snapshot_threshold) parameters. We recommend
keeping the new defaults. However, operators can go back to the old defaults by changing their
config if they prefer more frequent snapshots. See the documentation for [raft_snapshot_interval](/docs/agent/options.html#_raft_snapshot_interval)
and [raft_snapshot_threshold](/docs/agent/options.html#_raft_snapshot_threshold) to understand the trade-offs
when tuning these.
## Consul 1.0.7
When requesting a specific service (`/v1/health/:service` or
`/v1/catalog/:service` endpoints), the `X-Consul-Index` returned is now the
index at which that _specific service_ was last modified. In version 1.0.6 and
earlier the `X-Consul-Index` returned was the index at which _any_ service was
last modified. See [GH-3890](https://github.com/hashicorp/consul/issues/3890)
for more details.
During upgrades from 1.0.6 or lower to 1.0.7 or higher, watchers are likely to
see `X-Consul-Index` for these endpoints decrease between blocking calls.
Consuls watch feature and `consul-template` should gracefully handle this case.
Other tools relying on blocking service or health queries are also likely to
work; some may require a restart. It is possible external tools could break and
either stop working or continually re-request data without blocking if they
have assumed indexes can never decrease or be reset and/or persist index
values. Please test any blocking query integrations in a controlled environment
before proceeding.
## Consul 1.0.1
#### Carefully Check and Remove Stale Servers During Rolling Upgrades
Consul 1.0 (and earlier versions of Consul when running with [Raft protocol
3](/docs/agent/options.html#_raft_protocol) had an issue where performing
rolling updates of Consul servers could result in an outage from old servers
remaining in the cluster.
[Autopilot](https://learn.hashicorp.com/consul/day-2-operations/autopilot)
would normally remove old servers when new ones come online, but it was also
waiting to promote servers to voters in pairs to maintain an odd quorum size.
The pairwise promotion feature was removed so that servers become voters as
soon as they are stable, allowing Autopilot to remove old servers in a safer
way.
When upgrading from Consul 1.0, you may need to manually
[force-leave](/docs/commands/force-leave.html) old servers as part of a rolling
update to Consul 1.0.1.
## Consul 1.0
Consul 1.0 has several important breaking changes that are documented here.
Please be sure to read over all the details here before upgrading.
#### Raft Protocol Now Defaults to 3
The [`-raft-protocol`](/docs/agent/options.html#_raft_protocol) default has
been changed from 2 to 3, enabling all
[Autopilot](https://learn.hashicorp.com/consul/day-2-operations/autopilot)
features by default.
Raft protocol version 3 requires Consul running 0.8.0 or newer on all servers
in order to work, so if you are upgrading with older servers in a cluster then
you will need to set this back to 2 in order to upgrade. See [Raft Protocol
Version
Compatibility](/docs/upgrade-specific.html#raft-protocol-version-compatibility)
for more details. Also the format of `peers.json` used for outage recovery is
different when running with the latest Raft protocol. See [Manual Recovery
Using
peers.json](https://learn.hashicorp.com/consul/day-2-operations/outage#manual-recovery-using-peers-json)
for a description of the required format.
Please note that the Raft protocol is different from Consul's internal protocol
as described on the [Protocol Compatibility Promise](/docs/compatibility.html)
page, and as is shown in commands like `consul members` and `consul version`.
To see the version of the Raft protocol in use on each server, use the `consul
operator raft list-peers` command.
The easiest way to upgrade servers is to have each server leave the cluster,
upgrade its Consul version, and then add it back. Make sure the new server
joins successfully and that the cluster is stable before rolling the upgrade
forward to the next server. It's also possible to stand up a new set of
servers, and then slowly stand down each of the older servers in a similar
fashion.
When using Raft protocol version 3, servers are identified by their
[`-node-id`](/docs/agent/options.html#_node_id) instead of their IP address
when Consul makes changes to its internal Raft quorum configuration. This means
that once a cluster has been upgraded with servers all running Raft protocol
version 3, it will no longer allow servers running any older Raft protocol
versions to be added. If running a single Consul server, restarting it in-place
will result in that server not being able to elect itself as a leader. To avoid
this, either set the Raft protocol back to 2, or use [Manual Recovery Using
peers.json](https://learn.hashicorp.com/consul/day-2-operations/outage#manual-recovery-using-peers-json)
to map the server to its node ID in the Raft quorum configuration.
#### Config Files Require an Extension
As part of supporting the [HCL](https://github.com/hashicorp/hcl#syntax) format
for Consul's config files, an `.hcl` or `.json` extension is required for all
config files loaded by Consul, even when using the
[`-config-file`](/docs/agent/options.html#_config_file) argument to specify a
file directly.
#### Service Definition Parameter Case changed
All config file formats now require snake_case fields, so all CamelCased parameter
names should be changed before upgrading.
See [Service Definition Parameter Case](https://www.consul.io/docs/agent/services.html#service-definition-parameter-case) documentation for details.
#### Deprecated Options Have Been Removed
All of Consul's previously deprecated command line flags and config options
have been removed, so these will need to be mapped to their equivalents before
upgrading. Here's the complete list of removed options and their equivalents:
| Removed Option | Equivalent |
| -------------- | ---------- |
| `-dc` | [`-datacenter`](/docs/agent/options.html#_datacenter) |
| `-retry-join-azure-tag-name` | [`-retry-join`](/docs/agent/options.html#microsoft-azure) |
| `-retry-join-azure-tag-value` | [`-retry-join`](/docs/agent/options.html#microsoft-azure) |
| `-retry-join-ec2-region` | [`-retry-join`](/docs/agent/options.html#amazon-ec2) |
| `-retry-join-ec2-tag-key` | [`-retry-join`](/docs/agent/options.html#amazon-ec2) |
| `-retry-join-ec2-tag-value` | [`-retry-join`](/docs/agent/options.html#amazon-ec2) |
| `-retry-join-gce-credentials-file` | [`-retry-join`](/docs/agent/options.html#google-compute-engine) |
| `-retry-join-gce-project-name` | [`-retry-join`](/docs/agent/options.html#google-compute-engine) |
| `-retry-join-gce-tag-name` | [`-retry-join`](/docs/agent/options.html#google-compute-engine) |
| `-retry-join-gce-zone-pattern` | [`-retry-join`](/docs/agent/options.html#google-compute-engine) |
| `addresses.rpc` | None, the RPC server for CLI commands is no longer supported. |
| `advertise_addrs` | [`ports`](/docs/agent/options.html#ports) with [`advertise_addr`](https://www.consul.io/docs/agent/options.html#advertise_addr) and/or [`advertise_addr_wan`](/docs/agent/options.html#advertise_addr_wan) |
| `dogstatsd_addr` | [`telemetry.dogstatsd_addr`](/docs/agent/options.html#telemetry-dogstatsd_addr) |
| `dogstatsd_tags` | [`telemetry.dogstatsd_tags`](/docs/agent/options.html#telemetry-dogstatsd_tags) |
| `http_api_response_headers` | [`http_config.response_headers`](/docs/agent/options.html#response_headers) |
| `ports.rpc` | None, the RPC server for CLI commands is no longer supported. |
| `recursor` | [`recursors`](https://github.com/hashicorp/consul/blob/master/website/source/docs/agent/options.html.md#recursors) |
| `retry_join_azure` | [`-retry-join`](/docs/agent/options.html#microsoft-azure) |
| `retry_join_ec2` | [`-retry-join`](/docs/agent/options.html#amazon-ec2) |
| `retry_join_gce` | [`-retry-join`](/docs/agent/options.html#google-compute-engine) |
| `statsd_addr` | [`telemetry.statsd_address`](https://github.com/hashicorp/consul/blob/master/website/source/docs/agent/options.html.md#telemetry-statsd_address) |
| `statsite_addr` | [`telemetry.statsite_address`](https://github.com/hashicorp/consul/blob/master/website/source/docs/agent/options.html.md#telemetry-statsite_address) |
| `statsite_prefix` | [`telemetry.metrics_prefix`](/docs/agent/options.html#telemetry-metrics_prefix) |
| `telemetry.statsite_prefix` | [`telemetry.metrics_prefix`](/docs/agent/options.html#telemetry-metrics_prefix) |
| (service definitions) `serviceid` | [`service_id`](/docs/agent/services.html) |
| (service definitions) `dockercontainerid` | [`docker_container_id`](/docs/agent/services.html) |
| (service definitions) `tlsskipverify` | [`tls_skip_verify`](/docs/agent/services.html) |
| (service definitions) `deregistercriticalserviceafter` | [`deregister_critical_service_after`](/docs/agent/services.html) |
#### `statsite_prefix` Renamed to `metrics_prefix`
Since the `statsite_prefix` configuration option applied to all telemetry
providers, `statsite_prefix` was renamed to
[`metrics_prefix`](/docs/agent/options.html#telemetry-metrics_prefix).
Configuration files will need to be updated when upgrading to this version of
Consul.
#### `advertise_addrs` Removed
This configuration option was removed since it was redundant with
`advertise_addr` and `advertise_addr_wan` in combination with `ports` and also
wrongly stated that you could configure both host and port.
#### Escaping Behavior Changed for go-discover Configs
The format for [`-retry-join`](/docs/agent/options.html#retry-join) and
[`-retry-join-wan`](/docs/agent/options.html#retry-join-wan) values that use
[go-discover](https://github.com/hashicorp/go-discover) cloud auto joining has
changed. Values in `key=val` sequences must no longer be URL encoded and can be
provided as literals as long as they do not contain spaces, backslashes `\` or
double quotes `"`. If values contain these characters then use double quotes as
in `"some key"="some value"`. Special characters within a double quoted string
can be escaped with a backslash `\`.
#### HTTP Verbs are Enforced in Many HTTP APIs
Many endpoints in the HTTP API that previously took any HTTP verb now check for
specific HTTP verbs and enforce them. This may break clients relying on the old
behavior. Here's the complete list of updated endpoints and required HTTP
verbs:
| Endpoint | Required HTTP Verb |
| -------- | ------------------ |
| /v1/acl/info | GET |
| /v1/acl/list | GET |
| /v1/acl/replication | GET |
| /v1/agent/check/deregister | PUT |
| /v1/agent/check/fail | PUT |
| /v1/agent/check/pass | PUT |
| /v1/agent/check/register | PUT |
| /v1/agent/check/warn | PUT |
| /v1/agent/checks | GET |
| /v1/agent/force-leave | PUT |
| /v1/agent/join | PUT |
| /v1/agent/members | GET |
| /v1/agent/metrics | GET |
| /v1/agent/self | GET |
| /v1/agent/service/register | PUT |
| /v1/agent/service/deregister | PUT |
| /v1/agent/services | GET |
| /v1/catalog/datacenters | GET |
| /v1/catalog/deregister | PUT |
| /v1/catalog/node | GET |
| /v1/catalog/nodes | GET |
| /v1/catalog/register | PUT |
| /v1/catalog/service | GET |
| /v1/catalog/services | GET |
| /v1/coordinate/datacenters | GET |
| /v1/coordinate/nodes | GET |
| /v1/health/checks | GET |
| /v1/health/node | GET |
| /v1/health/service | GET |
| /v1/health/state | GET |
| /v1/internal/ui/node | GET |
| /v1/internal/ui/nodes | GET |
| /v1/internal/ui/services | GET |
| /v1/session/info | GET |
| /v1/session/list | GET |
| /v1/session/node | GET |
| /v1/status/leader | GET |
| /v1/status/peers | GET |
| /v1/operator/area/:uuid/members | GET |
| /v1/operator/area/:uuid/join | PUT |
#### Unauthorized KV Requests Return 403
When ACLs are enabled, reading a key with an unauthorized token returns a 403.
This previously returned a 404 response.
#### Config Section of Agent Self Endpoint has Changed
The /v1/agent/self endpoint's `Config` section has often been in flux as it was
directly returning one of Consul's internal data structures. This configuration
structure has been moved under `DebugConfig`, and is documents as for debugging
use and subject to change, and a small set of elements of `Config` have been
maintained and documented. See [Read
Configuration](/api/agent.html#read-configuration) endpoint documentation for
details.
#### Deprecated `configtest` Command Removed
The `configtest` command was deprecated and has been superseded by the
`validate` command.
#### Undocumented Flags in `validate` Command Removed
The `validate` command supported the `-config-file` and `-config-dir` command
line flags but did not document them. This support has been removed since the
flags are not required.
#### Metric Names Updated
Metric names no longer start with `consul.consul`. To help with transitioning
dashboards and other metric consumers, the field `enable_deprecated_names` has
been added to the telemetry section of the config, which will enable metrics
with the old naming scheme to be sent alongside the new ones. The following
prefixes were affected:
| Prefix |
| ------ |
| consul.consul.acl |
| consul.consul.autopilot |
| consul.consul.catalog |
| consul.consul.fsm |
| consul.consul.health |
| consul.consul.http |
| consul.consul.kvs |
| consul.consul.leader |
| consul.consul.prepared-query |
| consul.consul.rpc |
| consul.consul.session |
| consul.consul.session_ttl |
| consul.consul.txn |
#### Checks Validated On Agent Startup
Consul agents now validate health check definitions in their configuration and
will fail at startup if any checks are invalid. In previous versions of Consul,
invalid health checks would get skipped.
2017-07-17 21:11:08 +00:00
## Consul 0.9.0
2017-07-20 21:48:45 +00:00
#### Script Checks Are Now Opt-In
2017-07-17 21:11:08 +00:00
A new [`enable_script_checks`](/docs/agent/options.html#_enable_script_checks)
configuration option was added, and defaults to `false`, meaning that in order
to allow an agent to run health checks that execute scripts, this will need to
be configured and set to `true`. This provides a safer out-of-the-box
configuration for Consul where operators must opt-in to allow script-based
health checks.
2017-07-17 21:11:08 +00:00
If your cluster uses script health checks please be sure to set this to `true`
as part of upgrading agents. If this is set to `true`, you should also enable
[ACLs](https://learn.hashicorp.com/consul/security-networking/production-acls)
to provide control over which users are allowed to register health checks that
could potentially execute scripts on the agent machines.
2017-07-18 14:11:59 +00:00
#### Web UI Is No Longer Released Separately
Consul releases will no longer include a `web_ui.zip` file with the compiled
web assets. These have been built in to the Consul binary since the 0.7.x
series and can be enabled with the [`-ui`](/docs/agent/options.html#_ui)
configuration option. These built-in web assets have always been identical to
the contents of the `web_ui.zip` file for each release. The
[`-ui-dir`](/docs/agent/options.html#_ui_dir) option is still available for
hosting customized versions of the web assets, but the vast majority of Consul
users can just use the built in web assets.
2017-07-17 21:11:08 +00:00
2017-02-25 02:10:46 +00:00
## Consul 0.8.0
#### Upgrade Current Cluster Leader Last
We identified a potential issue with Consul 0.8 that requires the current
cluster leader to be upgraded last when updating multiple servers. Please see
[this issue](https://github.com/hashicorp/consul/issues/2889) for more details.
2017-02-25 02:10:46 +00:00
#### Command-Line Interface RPC Deprecation
The RPC client interface has been removed. All CLI commands that used RPC and
the `-rpc-addr` flag to communicate with Consul have been converted to use the
HTTP API and the appropriate flags for it, and the `rpc` field has been removed
from the port and address binding configs. You will need to remove these fields
from your config files and update any scripts that passed a custom `-rpc-addr`
to the following commands:
2017-02-25 02:10:46 +00:00
* `force-leave`
* `info`
* `join`
* `keyring`
* `leave`
* `members`
* `monitor`
* `reload`
2017-03-25 00:45:24 +00:00
#### Version 8 ACLs Are Now Opt-Out
The [`acl_enforce_version_8`](/docs/agent/options.html#acl_enforce_version_8)
configuration now defaults to `true` to enable full version 8 ACL support by
default. If you are upgrading an existing cluster with ACLs enabled, you will
need to set this to `false` during the upgrade on **both Consul agents and
Consul servers**. Version 8 ACLs were also changed so that
[`acl_datacenter`](/docs/agent/options.html#acl_datacenter) must be set on
agents in order to enable the agent-side enforcement of ACLs. This makes for a
smoother experience in clusters where ACLs aren't enabled at all, but where the
agents would have to wait to contact a Consul server before learning that.
2017-03-25 00:45:24 +00:00
#### Remote Exec Is Now Opt-In
The default for
[`disable_remote_exec`](/docs/agent/options.html#disable_remote_exec) was
changed to "true", so now operators need to opt-in to having agents support
running commands remotely via [`consul exec`](/docs/commands/exec.html).
#### Raft Protocol Version Compatibility
2017-03-10 22:55:18 +00:00
When upgrading to Consul 0.8.0 from a version lower than 0.7.0, users will need
to set the [`-raft-protocol`](/docs/agent/options.html#_raft_protocol) option
to 1 in order to maintain backwards compatibility with the old servers during
the upgrade. After the servers have been migrated to version 0.8.0,
`-raft-protocol` can be moved up to 2 and the servers restarted to match the
default.
2017-03-10 22:55:18 +00:00
The Raft protocol must be stepped up in this way; only adjacent version numbers
are compatible (for example, version 1 cannot talk to version 3). Here is a
table of the Raft Protocol versions supported by each Consul version:
2017-03-10 22:55:18 +00:00
<table class="table table-bordered table-striped">
<tr>
<th>Version</th>
<th>Supported Raft Protocols</th>
</tr>
<tr>
<td>0.6 and earlier</td>
<td>0</td>
</tr>
<tr>
<td>0.7</td>
<td>1</td>
</tr>
<tr>
<td>0.8</td>
<td>1, 2, 3</td>
</tr>
</table>
In order to enable all
[Autopilot](https://learn.hashicorp.com/consul/day-2-operations/autopilot)
features, all servers in a Consul cluster must be running with Raft protocol
version 3 or later.
2016-11-08 20:12:57 +00:00
## Consul 0.7.1
#### Child Process Reaping
Child process reaping support has been removed, along with the `reap`
configuration option. Reaping is also done via
[dumb-init](https://github.com/Yelp/dumb-init) in the [Consul Docker
image](https://github.com/hashicorp/docker-consul), so removing it from Consul
itself simplifies the code and eases future maintenance for Consul. If you are
running Consul as PID 1 in a container you will need to arrange for a wrapper
process to reap child processes.
2016-11-08 20:12:57 +00:00
#### DNS Resiliency Defaults
The default for [`max_stale`](/docs/agent/options.html#max_stale) has been
increased from 5 seconds to a near-indefinite threshold (10 years) to allow DNS
queries to continue to be served in the event of a long outage with no leader.
A new telemetry counter was added at `consul.dns.stale_queries` to track when
agents serve DNS queries that are stale by more than 5 seconds.
2016-11-08 20:12:57 +00:00
## Consul 0.7
Consul version 0.7 is a very large release with many important changes. Changes
to be aware of during an upgrade are categorized below.
2016-09-01 07:22:09 +00:00
#### Performance Timing Defaults and Tuning
Consul 0.7 now defaults the DNS configuration to allow for stale queries by
defaulting [`allow_stale`](/docs/agent/options.html#allow_stale) to true for
better utilization of available servers. If you want to retain the previous
behavior, set the following configuration:
```javascript
{
"dns_config": {
"allow_stale": false
}
}
```
Consul also 0.7 introduced support for tuning Raft performance using a new
[performance configuration block](/docs/agent/options.html#performance). Also,
the default Raft timing is set to a lower-performance mode suitable for
[minimal Consul servers](/docs/install/performance.html#minimum).
To continue to use the high-performance settings that were the default prior to
Consul 0.7 (recommended for production servers), add the following
configuration to all Consul servers when upgrading:
```javascript
{
"performance": {
"raft_multiplier": 1
}
}
```
See the [Server Performance](/docs/install/performance.html) guide for more details.
2016-09-01 07:22:09 +00:00
#### Leave-Related Configuration Defaults
2016-09-01 07:22:09 +00:00
The default behavior of [`leave_on_terminate`](/docs/agent/options.html#leave_on_terminate)
and [`skip_leave_on_interrupt`](/docs/agent/options.html#skip_leave_on_interrupt)
are now dependent on whether or not the agent is acting as a server or client:
* For servers, `leave_on_terminate` defaults to "false" and `skip_leave_on_interrupt`
defaults to "true".
* For clients, `leave_on_terminate` defaults to "true" and `skip_leave_on_interrupt`
defaults to "false".
These defaults are designed to be safer for servers so that you must explicitly
configure them to leave the cluster. This also results in a better experience for
clients, especially in cloud environments where they may be created and destroyed
often and users prefer not to wait for the 72 hour reap time for cleanup.
#### Dropped Support for Protocol Version 1
Consul version 0.7 dropped support for protocol version 1, which means it
is no longer compatible with versions of Consul prior to 0.3. You will need
to upgrade all agents to a newer version of Consul before upgrading to Consul
0.7.
#### Prepared Query Changes
Consul version 0.7 adds a feature which allows prepared queries to store a
2017-04-04 16:33:22 +00:00
[`Near` parameter](/api/query.html#near) in the query definition
itself. This feature enables using the distance sorting features of prepared
queries without explicitly providing the node to sort near in requests, but
requires the agent servicing a request to send additional information about
itself to the Consul servers when executing the prepared query. Agents prior
to 0.7 do not send this information, which means they are unable to properly
execute prepared queries configured with a `Near` parameter. Similarly, any
server nodes prior to version 0.7 are unable to store the `Near` parameter,
making them unable to properly serve requests for prepared queries using the
feature. It is recommended that all agents be running version 0.7 prior to
using this feature.
#### WAN Address Translation in HTTP Endpoints
Consul version 0.7 added support for translating WAN addresses in certain
[HTTP endpoints](/docs/agent/options.html#translate_wan_addrs). The servers
and the agents need to be running version 0.7 or later in order to use this
feature.
2016-09-01 07:22:09 +00:00
These translated addresses could break HTTP endpoint consumers that are
2017-04-04 16:52:00 +00:00
expecting local addresses, so a new [`X-Consul-Translate-Addresses`](/api/index.html#translate_header)
header was added to allow clients to detect if translation is enabled for HTTP
2016-09-01 07:22:09 +00:00
responses. A "lan" tag was added to `TaggedAddresses` for clients that need
the local address regardless of translation.
2016-09-01 07:22:09 +00:00
#### Outage Recovery and `peers.json` Changes
The `peers.json` file is no longer present by default and is only used when
performing recovery. This file will be deleted after Consul starts and ingests
2016-09-01 07:22:09 +00:00
the file. Consul 0.7 also uses a new, automatically-created raft/peers.info file
to avoid ingesting the `peers.json` file on the first start after upgrading (the
`peers.json` file is simply deleted on the first start after upgrading).
Please be sure to review the [Outage Recovery Guide](https://learn.hashicorp.com/consul/day-2-operations/outage)
before upgrading for more details.
## Consul 0.6.4
Consul 0.6.4 made some substantial changes to how ACLs work with prepared
queries. Existing queries will execute with no changes, but there are important
differences to understand about how prepared queries are managed before you
upgrade. In particular, prepared queries with no `Name` defined will no longer
require any ACL to manage them, and prepared queries with a `Name` defined are
2016-03-05 00:32:53 +00:00
now governed by a new `query` ACL policy that will need to be configured
after the upgrade.
See the [ACL rules documentation](/docs/acl/acl-rules.html#prepared-query-rules) for more details
about the new behavior and how it compares to previous versions of Consul.
## Consul 0.6
Consul version 0.6 is a very large release with many enhancements and
optimizations. Changes to be aware of during an upgrade are categorized below.
2016-08-04 13:39:50 +00:00
#### Data Store Changes
Consul changed the format used to store data on the server nodes in version 0.5
(see 0.5.1 notes below for details). Previously, Consul would automatically
detect data directories using the old LMDB format, and convert them to the newer
BoltDB format. This automatic upgrade has been removed for Consul 0.6, and
instead a safeguard has been put in place which will prevent Consul from booting
if the old directory format is detected.
It is still possible to migrate from a 0.5.x version of Consul to 0.6+ using the
[consul-migrate](https://github.com/hashicorp/consul-migrate) CLI utility. This
is the same tool that was previously embedded into Consul. See the
[releases](https://github.com/hashicorp/consul-migrate/releases) page for
downloadable versions of the tool.
Also, in this release Consul switched from LMDB to a fully in-memory database for
the state store. Because LMDB is a disk-based backing store, it was able to store
more data than could fit in RAM in some cases (though this is not a recommended
configuration for Consul). If you have an extremely large data set that won't fit
into RAM, you may encounter issues upgrading to Consul 0.6.0 and later. Consul
should be provisioned with physical memory approximately 2X the data set size to
allow for bursty allocations and subsequent garbage collection.
#### ACL Enhancements
Consul 0.6 introduces enhancements to the ACL system which may require special
handling:
* Service ACLs are enforced during service discovery (REST + DNS)
Previously, service discovery was wide open, and any client could query
information about any service without providing a token. Consul now requires
read-level access at a minimum when ACLs are enabled to return service
information over the REST or DNS interfaces. If clients depend on an open
service discovery system, then the following should be added to all ACL tokens
which require it:
# Enable discovery of all services
service "" {
policy = "read"
}
When the DNS interface is queried, the agent's
[`acl_token`](/docs/agent/options.html#acl_token) is used, so be sure
that token has sufficient privileges to return the DNS records you
expect to retrieve from it.
* Event and keyring ACLs
Similar to service discovery, the new event and keyring ACLs will block access
to these operations if the `acl_default_policy` is set to `deny`. If clients depend
on open access to these, then the following should be added to all ACL tokens which
require them:
event "" {
policy = "write"
}
keyring = "write"
Unfortunately, these are new ACLs for Consul 0.6, so they must be added after the
upgrade is complete.
#### Prepared Queries
Prepared queries introduce a new Raft log entry type that isn't supported on older
versions of Consul. It's important to not use the prepared query features of Consul
until all servers in a cluster have been upgraded to version 0.6.0.
#### Single Private IP Enforcement
Consul will refuse to start if there are multiple private IPs available, so
if this is the case you will need to configure Consul's advertise or bind addresses
before upgrading.
#### New Web UI File Layout
The release .zip file for Consul's web UI no longer contains a `dist` sub-folder;
everything has been moved up one level. If you have any automated scripts that
expect the old layout you may need to update them.
2015-04-11 00:52:49 +00:00
## Consul 0.5.1
Consul version 0.5.1 uses a different backend store for persisting the Raft
log. Because of this change, a data migration is necessary to move the log
entries out of LMDB and into the newer backend, BoltDB.
Consul version 0.5.1+ makes this transition seamless and easy. As a user, there
are no special steps you need to take. When Consul starts, it checks
2015-04-11 00:52:49 +00:00
for presence of the legacy LMDB data files, and migrates them automatically
if any are found. You will see a log emitted when Raft data is migrated, like
this:
```
==> Successfully migrated raft data in 5.839642ms
```
2015-04-11 00:52:49 +00:00
This automatic upgrade will only exist in Consul 0.5.1+ and it will
2017-07-20 21:48:45 +00:00
be removed starting with Consul 0.6.0+. It will still be possible to upgrade directly
from pre-0.5.1 versions by using the consul-migrate utility, which is available on the
2015-04-11 00:52:49 +00:00
[Consul Tools page](/downloads_tools.html).
2015-02-19 19:03:02 +00:00
## Consul 0.5
Consul version 0.5 adds two features that complicate the upgrade process:
* ACL system includes service discovery and registration
* Internal use of tombstones to fix behavior of blocking queries
in certain edge cases.
Users of the ACL system need to be aware that deploying Consul 0.5 will
cause service registration to be enforced. This means if an agent
attempts to register a service without proper privileges it will be denied.
If the `acl_default_policy` is "allow" then clients will continue to
work without an updated policy. If the policy is "deny", then all clients
will begin to have their registration rejected causing issues.
To avoid this situation, all the ACL policies should be updated to
add something like this:
# Enable all services to be registered
service "" {
policy = "write"
}
This will set the service policy to `write` level for all services.
The blank service name is the catch-all value. A more specific service
can also be specified:
# Enable only the API service to be registered
service "api" {
policy = "write"
}
The ACL policy can be updated while running 0.4, and enforcement will
being with the upgrade to 0.5. The policy updates will ensure the
availability of the cluster.
The second major change is the new internal command used for tombstones.
The details of the change are not important, however to function the leader
node will replicate a new command to its followers. Consul is designed
defensively, and when a command that is not recognized is received, the
server will panic. This is a purposeful design decision to avoid the possibility
2017-09-27 18:20:01 +00:00
of data loss, inconsistencies, or security issues caused by future incompatibility.
2015-02-19 19:03:02 +00:00
In practice, this means if a Consul 0.5 node is the leader, all of its
followers must also be running 0.5. There are a number of ways to do this
to ensure cluster availability:
* Add new 0.5 nodes, then remove the old servers. This will add the new
nodes as followers, and once the old servers are removed, one of the
0.5 nodes will become leader.
* Upgrade the followers first, then the leader last. Using `consul info`,
you can determine which nodes are followers. Do an in-place upgrade
on them first, and finally upgrade the leader last.
* Upgrade them in any order, but ensure all are done within 15 minutes.
Even if the leader is upgraded to 0.5 first, as long as all of the followers
are running 0.5 within 15 minutes there will be no issues.
Finally, even if any of the methods above are not possible or the process
fails for some reason, it is not fatal. The older version of the server
will simply panic and stop. At that point, you can upgrade to the new version
and restart the agent. There will be no data loss and the cluster will
resume operations.