consul/website/content/docs/ecs/architecture.mdx
Chris Thain 1502936c12
Consul on ECS 0.4.0 (#12694)
Update website docs for Consul on ECS 0.4.0
2022-04-07 11:43:12 -07:00

127 lines
7.0 KiB
Plaintext
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

---
layout: docs
page_title: Architecture - AWS ECS
description: >-
Architecture of Consul Service Mesh on AWS ECS (Elastic Container Service).
---
# Architecture
The following diagram shows the main components of the Consul architecture when deployed to an ECS cluster:
![Consul on ECS Architecture](/img/consul-ecs-arch.png)
1. **Consul servers:** Production-ready Consul server cluster
1. **Application tasks:** Runs user application containers along with two helper containers:
1. **Consul client:** The Consul client container runs Consul. The Consul client communicates
with the Consul server and configures the Envoy proxy sidecar. This communication
is called _control plane_ communication.
1. **Sidecar proxy:** The sidecar proxy container runs [Envoy](https://envoyproxy.io/). All requests
to and from the application container(s) run through the sidecar proxy. This communication
is called _data plane_ communication.
1. **Mesh Init:** Each task runs a short-lived container, called `mesh-init`, which sets up initial configuration
for Consul and Envoy.
1. **Health Syncing:** Optionally, an additional `health-sync` container can be included in a task to sync health statuses
from ECS into Consul.
1. **ACL Controller:** Automatically provisions Consul ACL tokens for Consul clients and service mesh services
in an ECS Cluster.
For more information about how Consul works in general, see Consul's [Architecture Overview](/docs/architecture).
## Task Startup
This diagram shows the timeline of a task starting up and all its containers:
<ImageConfig width={400}>
![Task Startup Timeline](/img/ecs-task-startup.svg)
</ImageConfig>
- **T0:** ECS starts the task. The `consul-client` and `mesh-init` containers start:
- `consul-client` uses the `retry-join` option to join the Consul cluster
- `mesh-init` registers the service for the current task and its sidecar proxy with
Consul. It runs `consul connect envoy -bootstrap` to generate Envoys
bootstrap JSON file and write it to a shared volume. `mesh-init` exits after completing these operations.
- **T1:** The following containers start:
- The `sidecar-proxy` container starts and runs Envoy by executing `envoy -c <path-to-bootstrap-json>`.
- If applicable, the `health-sync` container syncs health checks from ECS to Consul (see [ECS Health Check Syncing](#ecs-health-check-syncing)).
- **T2:** The `sidecar-proxy` container is marked as healthy by ECS. It uses a health check that detects if its public listener port is open. At this time, your application containers are started since all Consul machinery is ready to service requests. The only running containers are `consul-client`, `sidecar-proxy`, and your application container(s).
## Task Shutdown
This diagram shows an example timeline of a task shutting down:
<ImageConfig width={400}>
![Task Shutdown Timeline](/img/ecs-task-shutdown.svg)
</ImageConfig>
- **T0**: ECS sends a TERM signal to all containers. Each container reacts to the TERM signal:
- `consul-client` begins to gracefully leave the Consul cluster.
- `health-sync` stops syncing health status from ECS into Consul checks.
- `sidecar-proxy` ignores the TERM signal and continues running until the `user-app` container exits. This allows the application container to continue making outgoing requests through the proxy to the mesh.
- `user-app` exits if it is not configured to ignore the TERM signal. The `user-app` container will continue running if it is configured to ignore the TERM signal.
- **T1**:
- `health-sync` updates its Consul checks to critical status and exits. This ensures this service instance is marked unhealthy.
- `sidecar-proxy` notices the `user-app` container has stopped and exits.
- **T2**: `consul-client` finishes gracefully leaving the Consul datacenter and exits.
- **T3**:
- ECS notices all containers have exited, and will soon change the Task status to `STOPPED`
- Updates about this task have reached the rest of the Consul cluster, so downstream proxies have been updated to stopped sending traffic to this task.
- **T4**: At this point task shutdown should be complete. Otherwise, ECS will send a KILL signal to any containers still running. The KILL signal cannot be ignored and will forcefully stop containers. This will interrupt in-progress operations and possibly cause errors.
## ACL Controller
The ACL controller performs the following operations:
* Provisions Consul ACL tokens for Consul clients and service mesh services.
* Manages Consul admin partitions and namespaces.
### Automatic ACL Token Provisioning
Consul ACL tokens secure communication between agents and services.
The following containers in a task require an ACL token:
- `consul-client`: The Consul client uses a token to authorize itself with Consul servers.
All `consul-client` containers share the same token.
- `mesh-init`: The `mesh-init` container uses a token to register the service with Consul.
This token is unique for the Consul service, and is shared by instances of the service.
The ACL controller automatically creates ACL tokens for mesh-enabled tasks in an ECS cluster.
The `acl-controller` Terraform module creates the ACL controller task. The controller creates the
ACL token used by `consul-client` containers at startup and then watches for tasks in the cluster. It checks tags
to determine if the task is mesh-enabled. If so, it creates the service ACL token for the task, if the
token does not yet exist.
The ACL controller stores all ACL tokens in AWS Secrets Manager, and tasks are configured to pull these
tokens from AWS Secrets Manager when they start.
### Admin Partitions and Namespaces<EnterpriseAlert inline />
When [admin partitions and namespaces](/docs/ecs/enterprise#admin-partitions-and-namespaces) are enabled,
the ACL controller is assigned to its configured admin partition. The ACL controller provisions ACL
tokens for tasks in a single admin partition and supports one ACL controller instance per ECS
cluster. This results in an architecture with one admin partition per ECS cluster.
The ACL controller automatically performs the following actions:
* Creates its admin partition at startup if it does not exist.
* Inspects ECS task tags for the task's intended partition and namespace.
The ACL controller ignores tasks that do not match the `partition` tag.
* Creates namespaces when tasks start up. Namespaces are only created if they do not exist.
* Provisions ACL tokens and ACL policies that are scoped to the applicable admin partition and namespace.
* Provision ACL tokens that allow services to communicate with upstreams across admin partitions and namespaces.
## ECS Health Check Syncing
If the following conditions apply, ECS health checks automatically sync with Consul health checks for all application containers:
* marked as `essential`
* have ECS `healthChecks`
* are not configured with native Consul health checks
The `mesh-init` container creates a TTL health check for
every container that fits these criteria and the `health-sync` container ensures
that the ECS and Consul health checks remain in sync.