diff --git a/website/content/docs/nia/cli/start.mdx b/website/content/docs/nia/cli/start.mdx
index 023e2938a6..801b14ab6f 100644
--- a/website/content/docs/nia/cli/start.mdx
+++ b/website/content/docs/nia/cli/start.mdx
@@ -28,7 +28,7 @@ The following table describes all of the available flags.
| `-inspect` | Optional | boolean | Starts CTS in inspect mode (refer to [Modes](#modes) for additional information). In inspect mode, CTS displays the proposed state changes for all tasks once and exits. No changes are applied. If an error occurs before displaying all changes, CTS exits with a non-zero status. | `false` |
| `-inspect-task` | Optional | string | Starts CTS in inspect mode for the specified task. CTS displays the proposed state changes for the specified task and exits. No changes are applied.
You can specify the flag multiple times to display more than one task.
If an error occurs before displaying all changes, CTS exits with a non-zero status. | none |
| `-once` | Optional | boolean | Starts CTS in once-mode (refer to [Modes](#modes) for additional information). In once-mode, CTS renders templates and runs tasks once. CTS does not start the process in long-running mode and disables buffer periods. | `false` |
-| `-reset-storage` | Optional | boolean | Directs CTS to overwrite the state storage with new state information when the instance you are starting is elected the cluster leader.
Only use this flag when running CTS in high availability mode. | false |
+| `-reset-storage` | Optional | boolean | Directs CTS to overwrite the state storage with new state information when the instance you are starting is elected the cluster leader.
Only use this flag when running CTS in [high availability mode](/docs/nia/usage/run-ha). | `false` |
| `-h`, `-help` | Optional | boolean | Prints the CTS command line help. | `false` |
## Modes
@@ -40,4 +40,4 @@ By default, CTS starts in long-running mode. The following table describes all a
| Long-running mode | CTS starts in once-mode and switches to a long-running process.
During the once-mode phase, the daemon exits with a non-zero status if it encounters an error.
After successfully operating in once-mode, CTS begins a long-running process in which it logs errors and exits.
When the long-running process begins, the CTS daemon serves API and command requests.
| No additional flags.
This is the default mode. |
| Once mode | In once-mode, CTS renders templates and runs tasks once. CTS does not start the process in long-running mode and disables buffer periods. Use once-mode before starting CTS in long-running mode to verify that your configuration is accurate and tasks update network infrastructure as expected.
| Add the `-once` flag when starting CTS. |
| Inspect mode | CTS displays the proposed state changes for all tasks once and exits. No changes are applied. If an error occurs before displaying all changes, CTS exits with a non-zero status. Use inspect mode before starting CTS in long-running mode to debug one or more tasks and to verify that your tasks update network infrastructure as expected.
| Add the `-inspect` flag to verify all tasks. Add the `-inspect-task` flag to inspect a single task. Use multiple flags to verify more than one task.
|
-| High availability mode | Ensures that all changes to Consul that occur during a failover transition are processed and that CTS continues to operate as expected. CTS logs the errors and continues to operate without interruption. Refer to Run Consul-Terraform-Sync with High Availability for additional information. | Add the `high_availability` block to your CTS instance configuration. Refer to Run Consul-Terraform-Sync with High Availability for additional information.
|
+| High availability mode | Ensures that all changes to Consul that occur during a failover transition are processed and that CTS continues to operate as expected. CTS logs the errors and continues to operate without interruption. Refer to [Run Consul-Terraform-Sync with High Availability](/docs/nia/usage/run-ha) for additional information. | Add the `high_availability` block to your CTS instance configuration. Refer to [Run Consul-Terraform-Sync with High Availability](/docs/nia/usage/run-ha) for additional information.
|
diff --git a/website/content/docs/nia/usage/run-ha.mdx b/website/content/docs/nia/usage/run-ha.mdx
new file mode 100644
index 0000000000..50e8dda098
--- /dev/null
+++ b/website/content/docs/nia/usage/run-ha.mdx
@@ -0,0 +1,184 @@
+---
+layout: docs
+page_title: Run Consul-Terraform-Sync with High Availability
+description: >-
+ Improve network automation resiliency by enabling high availability for Consul-Terraform-Sync. HA enables persistent task and event data so that CTS functions as expected during a failover event.
+---
+
+# Run Consul-Terraform-Sync with High Availability
+
+This topic describes how to run Consul-Terraform-Sync (CTS) configured for high availability. High availability is an enterprise capability that ensures that all changes to Consul that occur during a failover transition are processed and that CTS continues to operate as expected.
+
+## Introduction
+
+Exactly one instance in a CTS cluster is the designated leader at any time. The leader is responsible for monitoring and running tasks. If the leader fails and CTS is configured for high availability, CTS triggers the following process:
+
+1. The CTS cluster promotes a new leader from the pool of followers in the network.
+1. The new leader begins running all existing tasks in `once-mode` in order to process changes that occurred during the failover transition period. In this mode, CTS runs all existing tasks one time.
+1. The new leader logs any errors that occur during `once-mode` operation and continues to monitor Consul for changes.
+
+In a standard configuration, CTS exits if errors occur when the CTS instance runs tasks in `once-mode`. In a high availability configuration, CTS logs the errors and continues to operate without interruption.
+
+The following diagram shows the operating state of the CTS cluster when high availability is enabled:
+
+![Consul-Terraform-Sync architecture configured for high availability before a shutdown event](/img/nia/cts-ha-before.png)
+
+The following diagram shows the CTS cluster state after the leader stops. CTS Instance B becomes the leader responsible for monitoring and running tasks.
+
+![Consul-Terraform-Sync architecture configured for high availability after a shutdown event](/img/nia/cts-ha-after.png)
+
+### Failover details
+
+* The time it takes for a new leader to be elected is determined by the `high_availability.cluster.storage.session_ttl` configuration. The minimum failover time is equal to the `session_ttl` value. The maximum failover time is double the `session_ttl` value.
+* If failover occurs during task execution, CTS elects a new leader that attempts to run all tasks once before continuing to monitor for changes. What happens to the interrupted task depends on the driver:
+  * If using the [Terraform Cloud (TFC) driver](/docs/nia/network-drivers/terraform-cloud), the task finishes and the new leader attempts to queue a run for each task in TFC in once-mode.
+  * If using the [Terraform driver](/docs/nia/network-drivers/terraform), the task may complete depending on the cause of the failover. The new leader starts and attempts to run each task in [once-mode](/docs/nia/cli/start#modes). Depending on the module and provider, the task may require manual intervention to fix any inconsistencies between the infrastructure and Terraform state.
+* If failover occurs when no task is executing, CTS elects a new leader that attempts to run all tasks in once-mode.
+
+Note that driver behavior is consistent whether or not CTS is running in high availability mode.
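+
+As a worked illustration of the failover window, assume a `session_ttl` of `30s`. The bounds described above then imply that a new leader is elected between 30 and 60 seconds after the leader fails. The fragment below is a sketch that shows only the relevant field and is not a complete configuration:
+
+```hcl
+high_availability {
+  cluster {
+    storage "consul" {
+      # Assumed value for illustration.
+      # Minimum failover time: 30s (1 x session_ttl)
+      # Maximum failover time: 60s (2 x session_ttl)
+      session_ttl = "30s"
+    }
+  }
+}
+```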
+
+## Requirements
+
+* Verify that you have met the [basic requirements](/docs/nia/usage/requirements) for running CTS.
+* CTS Enterprise 0.7 or later
+* Terraform CLI 0.13 or later
+* All instances in a cluster must be in the same datacenter.
+
+You must configure appropriate ACL permissions for your cluster. Refer to [ACL permissions](#acl-permissions) for details.
+
+It’s not required, but we recommend specifying the [TFC driver](/docs/nia/network-drivers/terraform-cloud) in your CTS configuration if you want to run in high availability mode.
+
+## Configuration
+
+Add the `high_availability` block in your CTS configuration and configure the required settings to enable high availability. Refer to the [Configuration reference](#) for details about the configuration fields for the `high_availability` block.
+
+The following example configures high availability functionality for a cluster named `cts-cluster`:
+
+
+
+```hcl
+high_availability {
+  cluster {
+    name = "cts-cluster"
+    storage "consul" {
+      parent_path = "cts"
+      namespace   = "ns"
+      session_ttl = "30s"
+    }
+  }
+
+  instance {
+    address = "cts-01.example.com"
+  }
+}
+```
+
+
+### ACL permissions
+
+The `session` and `keys` resources in your Consul environment must have `write` permissions. Refer to the [ACL documentation](/docs/security/acl) for details on how to define ACL policies.
+
+If the `high_availability.cluster.storage.namespace` field is configured, then your ACL policy must also enable `write` permissions for the `namespace` resource.
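+
+The following sketch shows what such a policy might look like in Consul's ACL rule language. The empty session prefix, the `cts` key prefix, and the `ns` namespace name are assumptions based on the example configuration on this page. Adjust them for your environment and confirm the rules against the ACL documentation before you apply them:
+
+```hcl
+# Grants write access to the session resource.
+session_prefix "" {
+  policy = "write"
+}
+
+# Grants write access to keys under the assumed parent_path ("cts").
+key_prefix "cts" {
+  policy = "write"
+}
+
+# Only needed when high_availability.cluster.storage.namespace is set.
+# "ns" is the namespace assumed from the example configuration.
+namespace "ns" {
+  policy = "write"
+}
+```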
+
+## Start a new CTS cluster
+
+We recommend deploying a cluster of three CTS instances so that the cluster has one leader and two followers.
+
+1. Create an HCL configuration file that contains the settings you need, including the `high_availability` block. Refer to [Configuration Options for Consul-Terraform-Sync](/docs/nia/installation/configure) for all configuration options.
+1. Issue the startup command and pass the configuration file. Refer to the [`start` command reference](/docs/nia/cli/start#modes) for additional information about CTS startup modes.
+ ```shell-session
+ $ consul-terraform-sync start -config-file ha-config.hcl
+ ```
+1. You can call the `/status` API endpoint to verify the status of tasks CTS is configured to monitor. Refer to the [`/status` API reference documentation](/docs/nia/api/status) for information about usage and responses.
+
+ ```shell-session
+ $ curl localhost:<port>/status/tasks
+ ```
+
+## Modify an instance configuration
+
+You can implement a rolling update to update a non-task configuration for a CTS instance, such as the Consul connection settings, when high availability is enabled. If you need to update a task in the instance configuration, refer to [Modify tasks](#modify-tasks).
+
+1. Identify the leader CTS instance by either making a call to the [`status` API endpoint](/docs/nia/api/status) or by checking the logs for the following entry:
+ ```shell-session
+ [INFO] ha: acquired leadership lock: id=
+ ```
+1. Stop one of the follower CTS instances and apply the new configuration.
+1. Restart the follower instance.
+1. Repeat steps 2 and 3 for other follower instances in your cluster.
+1. Stop the leader instance. One of the follower instances becomes the leader.
+1. Apply the new configuration to the former leader instance and restart it.
+
+## Modify tasks
+
+When high availability is enabled, CTS persists task and event data (refer to State storage and persistence for additional information). You can use the following methods to modify tasks when high availability is enabled. We recommend choosing a single method for all task configuration changes to limit the inconsistencies between the state and the configuration that can occur when mixing methods.
+
+### Delete and recreate the task (recommended)
+
+Use the CTS API to identify the CTS leader instance and delete and replace a task.
+
+1. Identify the leader CTS instance by either making a call to the [`status` API endpoint](/docs/nia/api/status) or by checking the logs for the following entry:
+
+ ```shell-session
+ [INFO] ha: acquired leadership lock: id=
+ ```
+1. Send a `DELETE` call to the [`/task/` endpoint](/docs/nia/api/tasks#delete-task) to delete the task. In the following example, the leader instance is at `localhost:8558`:
+
+ ```shell-session
+ $ curl --request DELETE localhost:8558/v1/tasks/task_a
+ ```
+
+ You can also use the [`task delete` command](/docs/nia/cli/task#task-delete) to complete this step.
+
+1. Send a `POST` call to the `/task/` endpoint and include the updated task in your payload.
+   ```shell-session
+   $ curl --header "Content-Type: application/json" \
+       --request POST \
+       --data @payload.json \
+       localhost:8558/v1/tasks
+   ```
+   You can also use the [`task create` command](/docs/nia/cli/task#task-create) to complete this step.
+
+### Discard data with the `-reset-storage` flag
+
+If you need to update a task, you can restart the CTS cluster using the [`-reset-storage` flag](/docs/nia/cli/start) to discard persisted data.
+
+1. Stop a follower instance.
+1. Update the instance’s task configuration.
+1. Restart the instance and include the `-reset-storage` flag.
+1. Stop all other instances so that the updated instance becomes the leader.
+1. Start all other instances again.
+1. Restart the instance you restarted in step 3 without the `-reset-storage` flag so that it starts up with the current state. If you continue to run an instance with the `-reset-storage` flag enabled, then CTS will reset the state data whenever the instance becomes the leader.
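+
+For step 2, an updated task definition in the instance configuration might look like the following sketch. The task name `task_a` matches the earlier API example. The module path and service names are hypothetical placeholders, so substitute the values from your own configuration:
+
+```hcl
+task {
+  name        = "task_a"
+  description = "Updated task definition (illustrative only)"
+  module      = "path/to/module" # hypothetical module path
+
+  condition "services" {
+    names = ["web", "api"] # hypothetical service names
+  }
+}
+```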
+
+## Troubleshooting
+
+Use the following troubleshooting procedure if a previous leader had been running a task successfully but the new leader logs an error after a failover:
+
+1. Check the logs printed to the console for errors. Refer to the [`syslog` configuration](/docs/nia/configuration#syslog) for information on how to locate the logs. In the following example output, CTS reported a `401: Bad credentials` error:
+ ```shell-session
+ 2022-08-23T09:25:09.501-0700 [ERROR] tasksmanager: error applying task: task_name=config-task
+ error=
+ | error tf-apply for 'config-task': exit status 1
+ |
+ | Error: GET https://api.github.com/user: 401 Bad credentials []
+ |
+ | with module.config-task.provider["registry.terraform.io/integrations/github"],
+ | on .terraform/modules/config-task/main.tf line 11, in provider "github":
+ | 11: provider "github" {
+ |
+ ```
+1. Check for differences between the previous leader and new leader, such as differences in configurations, environment variables, and local resources.
+1. Start a new instance with the fix that resolves the issue.
+1. Tear down the leader instance that has the issue and any other instances that may have the same issue.
+1. Restart the instance(s) to implement the fix.
diff --git a/website/content/docs/nia/usage/run.mdx b/website/content/docs/nia/usage/run.mdx
index d4746eb03d..a90cfae1af 100644
--- a/website/content/docs/nia/usage/run.mdx
+++ b/website/content/docs/nia/usage/run.mdx
@@ -7,13 +7,15 @@ description: >-
# Run Consul-Terraform-Sync
+This topic describes the basic procedure for running Consul-Terraform-Sync (CTS). Verify that you have met the [basic requirements](/docs/nia/usage/requirements) before attempting to run CTS.
+
1. Move the `consul-terraform-sync` binary to a location available on your `PATH`.
```shell-session
$ mv ~/Downloads/consul-terraform-sync /usr/local/bin/consul-terraform-sync
```
-2. Create the config.hcl file, all the options are available [here](/docs/nia/configuration).
+2. Create the `config.hcl` file and configure the options for your use case. Refer to the [configuration reference](/docs/nia/configuration) for details about all CTS configuration options.
3. Run Consul-Terraform-Sync (CTS).
@@ -29,6 +31,11 @@ description: >-
## Other Run modes
-CTS allows you to inspect your configuration before applying any change and to run in once mode, meaning that you can verify the changes are correctly applied in a test run before running it in unsupervised daemon mode.
+You can [configure CTS for high availability](/docs/nia/usage/run-ha), which is an enterprise capability that ensures that all changes to Consul that occur during a failover transition are processed and that CTS continues to operate as expected.
+
+You can start CTS in [inspect mode](/docs/nia/cli/start#modes) to review and test your configuration before applying any changes. Inspect mode allows you to verify that the changes work as expected before running them in an unsupervised daemon mode.
+
+For hands-on instructions on using inspect mode, refer to the [Consul-Terraform-Sync Run Modes and Status Inspection](https://learn.hashicorp.com/tutorials/consul/consul-terraform-sync-run-and-inspect?utm_source=WEBSITE&utm_medium=WEB_IO&utm_offer=ARTICLE_PAGE&utm_content=DOCS) tutorial.
+
+
-To learn more on these options check the [Consul-Terraform-Sync Run Modes and Status Inspection](https://learn.hashicorp.com/tutorials/consul/consul-terraform-sync-run-and-inspect?utm_source=WEBSITE&utm_medium=WEB_IO&utm_offer=ARTICLE_PAGE&utm_content=DOCS) tutorial.
diff --git a/website/data/docs-nav-data.json b/website/data/docs-nav-data.json
index d89223189f..eedbd4d9be 100644
--- a/website/data/docs-nav-data.json
+++ b/website/data/docs-nav-data.json
@@ -770,6 +770,10 @@
{
"title": "Run Consul-Terraform-Sync",
"path": "nia/usage/run"
+ },
+ {
+ "title": "Run Consul-Terraform-Sync with High Availability",
+ "path": "nia/usage/run-ha"
}
]
},
diff --git a/website/public/img/nia/cts-ha-after.png b/website/public/img/nia/cts-ha-after.png
new file mode 100644
index 0000000000..5cd256056a
Binary files /dev/null and b/website/public/img/nia/cts-ha-after.png differ
diff --git a/website/public/img/nia/cts-ha-before.png b/website/public/img/nia/cts-ha-before.png
new file mode 100644
index 0000000000..fc602de964
Binary files /dev/null and b/website/public/img/nia/cts-ha-before.png differ