From 6abc6a293c55bad788dd7d3dbb4101ea6f261a67 Mon Sep 17 00:00:00 2001
From: Luke Kysow <1034429+lkysow@users.noreply.github.com>
Date: Thu, 1 Oct 2020 14:36:15 -0700
Subject: [PATCH] Update k8s upgrade docs (#8789)

* Update k8s upgrade docs
---
 website/pages/docs/k8s/upgrade/index.mdx | 305 +++++++++++++++++++++--
 1 file changed, 279 insertions(+), 26 deletions(-)

diff --git a/website/pages/docs/k8s/upgrade/index.mdx b/website/pages/docs/k8s/upgrade/index.mdx
index 40d2d7e16d..614d6596d2 100644
--- a/website/pages/docs/k8s/upgrade/index.mdx
+++ b/website/pages/docs/k8s/upgrade/index.mdx
@@ -7,28 +7,252 @@ description: Upgrade Consul on Kubernetes

# Upgrade Consul on Kubernetes

-To upgrade Consul on Kubernetes, we follow the same pattern as
-[generally upgrading Consul](/docs/upgrading), except we can use
-the Helm chart to step through a rolling deploy. It is important to understand
-how to [generally upgrade Consul](/docs/upgrading) before reading this
-section.
+## Upgrade Types

-Upgrading Consul on Kubernetes will follow the same pattern: each server
-will be updated one-by-one. After that is successful, the clients will
-be updated in batches.
+Consul on Kubernetes will need to be upgraded or updated if you change your Helm configuration,
+if a new Helm chart is released, or if you wish to upgrade your Consul version.

### Helm Configuration Changes

If you make a change to your Helm values file, you will need to perform a `helm upgrade`
for those changes to take effect.

For example, if you've installed Consul with the following:

```yaml
global:
  name: consul
connectInject:
  enabled: false
```

And you wish to set `connectInject.enabled` to `true`:

```diff
global:
  name: consul
connectInject:
-  enabled: false
+  enabled: true
```

Perform the following steps:

1. Determine your current installed chart version.

```bash
helm list -f consul
NAME    NAMESPACE  REVISION  UPDATED         STATUS    CHART          APP VERSION
consul  default    2         2020-09-30 ...  deployed  consul-0.24.0  1.8.2
```

In this example, version `0.24.0` (from `consul-0.24.0`) is being used.

1. Perform a `helm upgrade`:

   ```bash
   helm upgrade consul hashicorp/consul --version 0.24.0 -f /path/to/my/values.yaml
   ```

   **Before performing the upgrade, be sure you've read the other sections on this page,
   continuing at [Determining What Will Change](#determining-what-will-change).**

~> NOTE: It's important to always set the `--version` flag, because otherwise Helm
will use the most up-to-date version in its local cache, which may result in an
unintended upgrade.

### Helm Chart Version Upgrade

You may wish to upgrade your Helm chart version to take advantage of new features,
bug fixes, or because you want to upgrade your Consul version and it requires a
certain Helm chart version.

1. Update your local Helm repository cache:

```bash
helm repo update
```

1. List all available versions:

```bash
helm search repo hashicorp/consul -l
NAME              CHART VERSION  APP VERSION  DESCRIPTION
hashicorp/consul  0.24.1         1.8.2        Official HashiCorp Consul Chart
hashicorp/consul  0.24.0         1.8.1        Official HashiCorp Consul Chart
...
```

Here we can see that the latest version is `0.24.1`.

1. To determine which version you have installed, issue the following command:

```bash
helm list -f consul
NAME    NAMESPACE  REVISION  UPDATED         STATUS    CHART          APP VERSION
consul  default    2         2020-09-30 ...  deployed  consul-0.24.0  1.8.2
```

In this example, version `0.24.0` (from `consul-0.24.0`) is being used.
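
If you also want to confirm which Consul version is actually running, not just the app version
reported by the chart, one option is to ask a server pod directly. This is a minimal sketch that
assumes the default release name `consul` and that the servers run in your current namespace;
adjust the pod name and add `-n <namespace>` if your installation differs.

```bash
# Print the Consul version running inside the first server pod.
# "consul-server-0" assumes the default release name "consul".
kubectl exec consul-server-0 -- consul version
```

The version it reports should line up with the `APP VERSION` column from `helm list` above.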

If you want to upgrade to the latest `0.24.1` version, use the following procedure:

1. Check the changelog for any breaking changes from that version and any versions in between:
   [CHANGELOG.md](https://github.com/hashicorp/consul-helm/blob/master/CHANGELOG.md).

1. Upgrade by performing a `helm upgrade` with the `--version` flag:

```bash
helm upgrade consul hashicorp/consul --version 0.24.1 -f /path/to/my/values.yaml
```

**Before performing the upgrade, be sure you've read the other sections on this page,
continuing at [Determining What Will Change](#determining-what-will-change).**

### Consul Version Upgrade

If a new version of Consul is released, you will need to perform a Helm upgrade
to update to the new version.

1. Ensure you've read the [Upgrading Consul](/docs/upgrading) documentation.
1. Ensure you've read any [specific instructions](/docs/upgrading/upgrade-specific) for the version you're upgrading
   to and the Consul [changelog](https://github.com/hashicorp/consul/blob/master/CHANGELOG.md) for that version.
1. Read our [Compatibility Matrix](/docs/k8s/upgrade/compatibility) to ensure
   your current Helm chart version supports this Consul version. If it does not,
   you may need to also upgrade your Helm chart version at the same time.
1. Set `global.image` in your `values.yaml` to the desired version:

   ```yaml
   global:
     image: consul:1.8.3
   ```

1. Determine your current installed chart version:

```bash
helm list -f consul
NAME    NAMESPACE  REVISION  UPDATED         STATUS    CHART          APP VERSION
consul  default    2         2020-09-30 ...  deployed  consul-0.24.0  1.8.2
```

In this example, version `0.24.0` (from `consul-0.24.0`) is being used.

1. Perform a `helm upgrade`:

```bash
helm upgrade consul hashicorp/consul --version 0.24.0 -f /path/to/my/values.yaml
```

**Before performing the upgrade, be sure you've read the other sections on this page,
continuing at [Determining What Will Change](#determining-what-will-change).**

~> NOTE: It's important to always set the `--version` flag, because otherwise Helm
will use the most up-to-date version in its local cache, which may result in an
unintended upgrade.

## Determining What Will Change

Before upgrading, it's important to understand what changes will be made to your
cluster. For example, you will need to take more care if your upgrade will result
in the Consul server statefulset being redeployed.

There is no built-in functionality in Helm that shows what a `helm upgrade` will
change. There is, however, a Helm plugin, [helm-diff](https://github.com/databus23/helm-diff),
that can be used.

1. Install `helm-diff` with:

```bash
helm plugin install https://github.com/databus23/helm-diff
```

1. If you are updating your `values.yaml` file, do so now.
1. Take the same `helm upgrade` command you were planning to issue but perform `helm diff upgrade` instead of `helm upgrade`:

```bash
helm diff upgrade consul hashicorp/consul --version 0.24.1 -f /path/to/your/values.yaml
```

This will print out the manifests that will be updated and their diffs.

1. To see only the objects that will be updated, add `| grep "has changed"`:

```bash
helm diff upgrade consul hashicorp/consul --version 0.24.1 -f /path/to/your/values.yaml |
  grep "has changed"
```

1. Take specific note if `consul, DaemonSet` or `consul-server, StatefulSet` are listed.
   This means that your Consul client daemonset or Consul server statefulset (or both) will be redeployed.
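
If you will be running this check repeatedly, it can help to script it. The following is only a
rough sketch, not an official workflow: it assumes a release named `consul`, the chart version and
values path used above, and the `helm-diff` plugin already installed. It prints the changed objects
and calls out whether the client daemonset or server statefulset is among them.

```bash
# List the objects a helm upgrade would change and flag client/server redeploys.
changed=$(helm diff upgrade consul hashicorp/consul --version 0.24.1 \
  -f /path/to/your/values.yaml | grep "has changed")

echo "$changed"

if echo "$changed" | grep -qE "consul, DaemonSet|consul-server, StatefulSet"; then
  echo "Consul client daemonset and/or server statefulset will be redeployed."
  echo "Follow the ordered server-then-client upgrade steps below."
fi
```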

If either is being redeployed, we will follow the same pattern for upgrades as
on other platforms: the servers will be redeployed one-by-one, and then the
clients will be redeployed in batches. Read [Upgrading Consul](/docs/upgrading) and then continue
reading below.

If neither the client daemonset nor the server statefulset is being redeployed,
then you can continue with the `helm upgrade` without any specific sequence to follow.

## Service Mesh

If you are using Consul's service mesh features, as opposed to the [service sync](/docs/k8s/service-sync)
functionality, you must be aware of the behavior of the service mesh during upgrades.

Consul clients operate as a daemonset across all Kubernetes nodes. During an upgrade,
if the Consul client daemonset has changed, the client pods will need to be restarted
because their spec has changed.

When a Consul client pod is restarted, it will deregister itself from Consul when it stops.
When the pod restarts, it will re-register itself with Consul.
Thus, during the period between the Consul client on a node stopping and restarting,
the following will occur:

1. The node will be deregistered from Consul. It will not show up in the Consul UI
   nor in API requests.
1. Because the node is deregistered, all service pods that were on that node will
   also be deregistered. This means they will not receive service mesh traffic
   until the Consul client pod restarts.
1. Service pods on that node can continue to make requests through the service
   mesh because each Envoy proxy maintains a cache of the locations of upstream
   services. However, if the upstream services change IPs, Envoy will not be able
   to refresh its cluster information until its local Consul client is restarted.
   So services can continue to make requests without downtime for a short period
   of time; however, it's important for the local Consul client to be restarted
   as quickly as possible.

Once the local Consul client pod restarts, each service pod needs to re-register
itself with the client. This is done automatically by the `consul-connect-lifecycle-sidecar`
sidecar container that is injected alongside each service.

Because service mesh pods are briefly deregistered during a Consul client restart,
it's **important that you do not restart all Consul clients at once**. Otherwise
you may experience downtime because no replicas of a specific service will be in the mesh.

In addition, it's **important that you have multiple replicas** for each service.
If you only have one replica, then during restart of the Consul client on the
node hosting that replica, it will be briefly deregistered from the mesh. Since
it's the only replica, other services will not be able to make calls to that
service. (NOTE: This can also be avoided by stopping that replica so it is rescheduled to
a node whose Consul client has already been updated.)

Given the above, we recommend that after Consul servers are upgraded, the Consul
client daemonset is set to use the `OnDelete` update strategy and Consul clients
are deleted one by one or in batches. See [Upgrading Consul Servers](#upgrading-consul-servers)
and [Upgrading Consul Clients](#upgrading-consul-clients) for more details.
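
Before restarting any clients, it can also be worth sanity-checking the multiple-replicas
recommendation above by counting how many passing instances each service has registered. This is an
illustrative sketch only: it assumes the Consul HTTP API is reachable at `$CONSUL_HTTP_ADDR`, that
any required ACL token is already exported, and that `jq` is installed.

```bash
# Print the number of passing instances for every service in the Consul catalog.
# A service with a single passing instance will briefly have no healthy instances
# in the mesh while the Consul client on its node restarts.
for svc in $(curl -s "$CONSUL_HTTP_ADDR/v1/catalog/services" | jq -r 'keys[]'); do
  count=$(curl -s "$CONSUL_HTTP_ADDR/v1/health/service/$svc?passing" | jq 'length')
  echo "$svc: $count passing instance(s)"
done
```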

## Upgrading Consul Servers

-To initiate the upgrade, change the `server.image` value to the
-desired Consul version. For illustrative purposes, the example below will
-use `consul:123.456`. Also, set the `server.updatePartition` value
-_equal to the number of server replicas_:
+To initiate the upgrade:
+
+1. Change the `global.image` value to the desired Consul version
+1. Set the `server.updatePartition` value _equal to the number of server replicas_.
+   By default there are 3 servers, so you would set this value to `3`
+1. Set the `updateStrategy` for clients to `OnDelete`

```yaml
-server:
+global:
   image: 'consul:123.456'
-  replicas: 3
+server:
   updatePartition: 3
+client:
+  updateStrategy: |
+    type: OnDelete
```

The `updatePartition` value controls how many instances of the server
@@ -36,34 +260,63 @@ cluster are updated. Only instances with an index _greater than_ the
`updatePartition` value are updated (zero-indexed). Therefore, by setting it
equal to replicas, none should update yet.

-Next, run the upgrade. You should run this with `--dry-run` first to verify
-the changes that will be sent to the Kubernetes cluster.
+The `updateStrategy` controls how Kubernetes rolls out changes to the client daemonset.
+By setting it to `OnDelete`, no clients will be restarted until their pods are deleted.
+Without this, they would be redeployed alongside the servers because their Docker
+image versions have changed. This is not desirable because we want the Consul
+servers to be upgraded _before_ the clients.

-```shell-session
-$ helm upgrade consul ./
-...
+1. Next, perform the upgrade:
+
+```bash
+helm upgrade consul hashicorp/consul --version <your-version> -f /path/to/your/values.yaml
```

-This should cause no changes (although the resource will be updated). If
+This will not cause the servers to redeploy (although the resource will be updated). If
everything is stable, begin by decreasing the `updatePartition` value by one,
-and running `helm upgrade` again. This should cause the first Consul server
+and performing `helm upgrade` again. This will cause the first Consul server
to be stopped and restarted with the new image.

-Wait until the Consul server cluster is healthy again (30s to a few minutes)
-then decrease `updatePartition` and upgrade again. Continue until
+1. Wait until the Consul server cluster is healthy again (30s to a few minutes).
+   This can be confirmed by issuing `consul members` on one of the previous servers,
+   and ensuring that all servers are listed and are `alive`.
+
+Decrease `updatePartition` by one and upgrade again. Continue until
`updatePartition` is `0`. At this point, you may remove the
`updatePartition` configuration. Your server upgrade is complete.

## Upgrading Consul Clients

-With the servers upgraded, it is time to upgrade the clients. To upgrade
-the clients, set the `client.image` value to the desired Consul version.
+With the servers upgraded, it is time to upgrade the clients.
+If you are using Consul's service mesh features, you will want to be careful
+restarting the clients as outlined in [Service Mesh](#service-mesh).
+
+You can either:
+
+1. Manually issue `kubectl delete pod <pod-name>` for each consul daemonset pod
+   (a sketch of this approach follows this list), or
+2. Set the `updateStrategy` to a rolling update with a small `maxUnavailable` number:
+
+```yaml
+client:
+  updateStrategy: |
+    rollingUpdate:
+      maxUnavailable: 2
+    type: RollingUpdate
+```
+
Then, run `helm upgrade`. This will upgrade the clients in batches, waiting
until the clients come up healthy before continuing.
+3. Cordon and drain each node to ensure there are no connect pods active on it, and then delete the
+   consul client pod on that node.
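
For the first option, deleting and waiting on each client pod by hand can be tedious on larger
clusters, so a loop is often used. The following is a sketch only: it assumes `client.updateStrategy`
is set to `OnDelete`, the chart's default client labels for a release named `consul`, and the current
namespace; adjust the label selector, namespace, and timeout for your installation.

```bash
# Delete Consul client pods one at a time, waiting for each replacement to be Ready.
for pod in $(kubectl get pods -l app=consul,component=client,release=consul -o name); do
  kubectl delete "$pod"
  sleep 10  # give the daemonset controller time to create the replacement pod
  kubectl wait --for=condition=Ready pods \
    -l app=consul,component=client,release=consul --timeout=5m
done
```

Between deletions you can re-run the instance-count check from the [Service Mesh](#service-mesh)
section to confirm that every service still has healthy instances in the mesh.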

-> NOTE: If you are using only the Service Sync functionality, you can perform an upgrade without
following a specific sequence since that component is more resilient to brief restarts of
Consul clients.

## Configuring TLS on an Existing Cluster

If you already have a Consul cluster deployed on Kubernetes and would like to
turn on TLS for internal Consul communication, please see
-[Configuring TLS on an Existing Cluster](/docs/k8s/operations/tls-on-existing-cluster).
+[Configuring TLS on an Existing Cluster](/docs/k8s/tls-on-existing-cluster).