---
layout: docs
page_title: How Connect Works
description: >-
  This page details the internals of Consul Connect: mutual TLS, agent caching
  and performance, intention and certificate authority replication.
---

# How Service Mesh Works

This page details the inner workings of some of Consul service mesh's core features.
Understanding how these features work isn't a prerequisite for using Consul service mesh,
but will help you build a mental model of what's going on under the hood, which
may help you reason about service mesh's behavior in more complex deployment
scenarios.

Consul Connect is another name for Consul service mesh. This document uses
"Connect" to refer to service mesh functionality within Consul.

To try service mesh locally, complete the [Getting Started with Consul service
mesh](https://learn.hashicorp.com/tutorials/consul/service-mesh?utm_source=WEBSITE&utm_medium=WEB_IO&utm_offer=ARTICLE_PAGE&utm_content=DOCS)
tutorial.

## Mutual Transport Layer Security (mTLS)

The core of Connect is based on [mutual TLS](https://en.wikipedia.org/wiki/Mutual_authentication).

Connect provides each service with an identity encoded as a TLS certificate.
This certificate is used to establish and accept connections to and from other
services. The identity is encoded in the TLS certificate in compliance with
the [SPIFFE X.509 Identity Document](https://github.com/spiffe/spiffe/blob/master/standards/X509-SVID.md).
This enables Connect services to establish and accept connections with
other SPIFFE-compliant systems.
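
As a rough illustration of what an identity encoded in a certificate can look like, the sketch below parses a SPIFFE ID of the kind carried in a certificate's URI SAN. The path layout (`/ns/<namespace>/dc/<datacenter>/svc/<service>`), the example trust domain, and the `parse_service_identity` helper are assumptions made for illustration; inspect the certificates issued in your own deployment for the exact format.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class ServiceIdentity:
    trust_domain: str
    namespace: str
    datacenter: str
    service: str


def parse_service_identity(spiffe_id: str) -> ServiceIdentity:
    """Parse a SPIFFE ID that uses the assumed ns/dc/svc path layout."""
    parsed = urlparse(spiffe_id)
    if parsed.scheme != "spiffe" or not parsed.netloc:
        raise ValueError(f"not a SPIFFE ID: {spiffe_id!r}")

    # Expect exactly: ns/<namespace>/dc/<datacenter>/svc/<service>
    parts = parsed.path.strip("/").split("/")
    if len(parts) != 6 or parts[0::2] != ["ns", "dc", "svc"]:
        raise ValueError(f"unexpected SPIFFE path: {parsed.path!r}")

    return ServiceIdentity(parsed.netloc, parts[1], parts[3], parts[5])


# Hypothetical example value; the trust domain below is made up.
identity = parse_service_identity(
    "spiffe://11111111-2222-3333-4444-555555555555.consul/ns/default/dc/dc1/svc/web"
)
print(identity.service, identity.datacenter)
```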

The client service verifies the destination service certificate
against the [public CA bundle](/api-docs/connect/ca#list-ca-root-certificates).
This is very similar to a typical HTTPS web browser connection. In addition
to this, the client provides its own client certificate to show its
identity to the destination service. If the connection handshake succeeds,
the connection is encrypted and authorized.

The destination service verifies the client certificate against the [public CA
bundle](/api-docs/connect/ca#list-ca-root-certificates). After verifying the
certificate, the next step depends upon the configured application protocol of
the destination service. TCP (L4) services must authorize incoming _connections_
against the configured set of Consul [intentions](/docs/connect/intentions),
whereas HTTP (L7) services must authorize incoming _requests_ against those same
intentions. If the intention check succeeds, the connection/request is
established. Otherwise, the connection/request is rejected.
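
The authorization step can be pictured with a deliberately simplified model. Everything in this sketch is an assumption for illustration: the `Intention` record, the `*` wildcard handling, the precedence ordering, and the `default_allow` fallback are not taken from Consul's implementation. The same check would run once per connection for L4 services and once per request for L7 services.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Intention:
    source: str       # service name, or "*" as a wildcard
    destination: str  # service name, or "*" as a wildcard
    allow: bool


def _matches(pattern: str, name: str) -> bool:
    return pattern == "*" or pattern == name


def _precedence(intention: Intention) -> int:
    # Assumed ordering: an exact destination outranks an exact source,
    # and any exact name outranks a wildcard.
    return 2 * (intention.destination != "*") + (intention.source != "*")


def authorize(source: str, destination: str,
              intentions: Iterable[Intention],
              default_allow: bool = False) -> bool:
    """Return True if `source` may reach `destination` in this toy model."""
    candidates = [
        i for i in intentions
        if _matches(i.source, source) and _matches(i.destination, destination)
    ]
    if not candidates:
        return default_allow
    return max(candidates, key=_precedence).allow


intentions = [
    Intention("*", "db", allow=False),
    Intention("web", "db", allow=True),
]
print(authorize("web", "db", intentions))    # the exact match wins: True
print(authorize("batch", "db", intentions))  # only the wildcard deny matches: False
```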

To generate and distribute certificates, Consul has a built-in CA that
requires no other dependencies, and
also ships with built-in support for [Vault](/docs/connect/ca/vault). The PKI system is designed to be pluggable
and can be extended to support any system by adding additional CA providers.

All APIs required for Connect typically respond in microseconds and impose
minimal overhead on existing services. To ensure this, Connect-related API calls
are all made to the local Consul agent over a loopback interface, and all [agent
Connect endpoints](/api-docs/agent/connect) implement local caching and background
updating, and support blocking queries. Most API calls operate on purely local
in-memory data.
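
To make the local-agent pattern concrete, here is a small sketch that builds request URLs for the agent Connect endpoints, optionally as blocking queries. The agent address assumes the default local HTTP listener, and the helper names are hypothetical; no request is actually sent.

```python
from typing import Optional
from urllib.parse import quote, urlencode

AGENT_ADDR = "http://127.0.0.1:8500"  # assumed default local agent address


def ca_roots_url() -> str:
    """URL for the CA root certificates served by the local agent."""
    return f"{AGENT_ADDR}/v1/agent/connect/ca/roots"


def leaf_cert_url(service: str, index: Optional[int] = None,
                  wait: str = "5m") -> str:
    """URL for a service's leaf certificate on the local agent.

    When `index` is given, the request becomes a blocking query that returns
    once the data changes past that index or the `wait` duration elapses.
    """
    url = f"{AGENT_ADDR}/v1/agent/connect/ca/leaf/{quote(service, safe='')}"
    if index is not None:
        url += "?" + urlencode({"index": index, "wait": wait})
    return url


print(ca_roots_url())
print(leaf_cert_url("web"))
print(leaf_cert_url("web", index=42))
```

A client would typically fetch one of these URLs with any HTTP client, read the `X-Consul-Index` response header, and pass that value back as `index` on the next call to long-poll for changes.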

## Agent Caching and Performance

To enable fast responses on endpoints such as the [agent Connect
API](/api-docs/agent/connect), the Consul agent locally caches most Connect-related
data and sets up [blocking queries](/api-docs/features/blocking) against
the server to update the cache in the background. This allows most API calls
such as retrieving certificates or authorizing connections to use in-memory
data and respond very quickly.

All data cached locally by the agent is populated on demand. Therefore, if
Connect is not used at all, the cache does not store any data. On first request,
the data is loaded from the server and cached. The set of data cached is: public
CA root certificates, leaf certificates, intentions, and service discovery
results for upstreams. For leaf certificates and intentions, only data related
to the service requested is cached, not the full set of data.

Further, the cache is partitioned by ACL token and datacenter. This is done
to minimize the complexity of the cache and prevent bugs where an ACL token
may see data it shouldn't from the cache. This results in higher memory usage
for cached data since it is duplicated per ACL token, but with the benefit
of simplicity and security.
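
The on-demand population and per-token partitioning described above can be sketched as a dictionary keyed by token, datacenter, and requested item. This is a minimal model, not the agent's actual cache: the `PartitionedCache` class, its key layout, and the `fake_fetch` stand-in for a server request are all hypothetical, and background refresh is left out.

```python
from typing import Any, Callable, Dict, Tuple

CacheKey = Tuple[str, str, str, str]  # (token, datacenter, kind, name)


class PartitionedCache:
    def __init__(self, fetch: Callable[[str, str, str, str], Any]) -> None:
        self._fetch = fetch  # stands in for a request to the Consul servers
        self._entries: Dict[CacheKey, Any] = {}
        self.fetch_count = 0

    def get(self, token: str, datacenter: str, kind: str, name: str) -> Any:
        key = (token, datacenter, kind, name)
        if key not in self._entries:
            # Populated on demand: nothing is stored until first requested.
            self.fetch_count += 1
            self._entries[key] = self._fetch(token, datacenter, kind, name)
        return self._entries[key]

    def __len__(self) -> int:
        return len(self._entries)


def fake_fetch(token: str, datacenter: str, kind: str, name: str) -> str:
    return f"{kind}:{name}@{datacenter}"


cache = PartitionedCache(fake_fetch)
cache.get("token-a", "dc1", "leaf", "web")
cache.get("token-a", "dc1", "leaf", "web")  # served from memory
cache.get("token-b", "dc1", "leaf", "web")  # separate entry for another token
print(len(cache), cache.fetch_count)
```

Because the token is part of the key, one token can never be handed an entry that was fetched on behalf of another, at the cost of storing the same data twice.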

With Connect enabled, you'll likely see increased memory usage by the
local Consul agent. The total memory is dependent on the number of intentions
related to the services registered with the agent accepting Connect-based
connections. The other data (leaf certificates and public CA certificates)
is a relatively fixed size per service. In most cases, the overhead per
service should be relatively small: single-digit kilobytes at most.

The cache does not evict entries due to memory pressure. If memory capacity
is reached, the process will attempt to swap. If swap is disabled, the Consul
agent may begin failing and eventually crash. Cache entries do have TTLs
associated with them and are evicted if they're not used. Given
a long period of inactivity (3 days by default), the cache will empty itself.
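
The eviction behavior can be modeled as an idle timeout that restarts on every use. The sketch below is an assumption-laden illustration, not the agent's code: the `IdleTTLCache` class and its injectable clock are hypothetical, and only the 3-day default comes from the text. Note that nothing here reacts to memory pressure.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

DEFAULT_TTL_SECONDS = 3 * 24 * 60 * 60  # 3 days


class IdleTTLCache:
    def __init__(self, ttl: float = DEFAULT_TTL_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._clock = clock
        # key -> (value, time of last use)
        self._entries: Dict[str, Tuple[Any, float]] = {}

    def put(self, key: str, value: Any) -> None:
        self._entries[key] = (value, self._clock())

    def get(self, key: str) -> Optional[Any]:
        self.evict_idle()
        if key not in self._entries:
            return None
        value, _ = self._entries[key]
        self._entries[key] = (value, self._clock())  # using an entry resets its timer
        return value

    def evict_idle(self) -> int:
        """Drop entries that have not been used for a full TTL."""
        now = self._clock()
        stale = [k for k, (_, used) in self._entries.items()
                 if now - used >= self._ttl]
        for key in stale:
            del self._entries[key]
        return len(stale)


# A fake clock makes the behavior visible without waiting three days.
now = [0.0]
ttl_cache = IdleTTLCache(clock=lambda: now[0])
ttl_cache.put("leaf/web", "cert")
now[0] += DEFAULT_TTL_SECONDS - 1
print(ttl_cache.get("leaf/web"))  # still cached; the idle timer restarts
now[0] += DEFAULT_TTL_SECONDS
print(ttl_cache.get("leaf/web"))  # unused for a full TTL, so evicted
```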

## Connections Across Datacenters

A sidecar proxy's [upstream configuration](/docs/connect/registration/service-registration#upstream-configuration-reference)
may specify an alternative datacenter or a prepared query that can address services
in multiple datacenters (such as the [geo failover](https://learn.hashicorp.com/tutorials/consul/automate-geo-failover) pattern).
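
As an illustration, the snippet below assembles a service registration with two upstreams, one pinned to another datacenter and one resolved through a prepared query, and prints it as JSON. The service names, ports, datacenter name, and prepared query name are hypothetical placeholders; consult the upstream configuration reference for the full set of supported fields.

```python
import json

proxy_upstreams = [
    {
        # Upstream pinned to a service in another datacenter.
        "destination_name": "db",
        "datacenter": "dc2",
        "local_bind_port": 9191,
    },
    {
        # Upstream resolved through a prepared query, which may fail over to
        # other datacenters depending on how the query is defined.
        "destination_type": "prepared_query",
        "destination_name": "db-geo-failover",
        "local_bind_port": 9192,
    },
]

registration = {
    "service": {
        "name": "web",
        "port": 8080,
        "connect": {
            "sidecar_service": {"proxy": {"upstreams": proxy_upstreams}}
        },
    }
}

print(json.dumps(registration, indent=2))
```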

[Intentions](/docs/connect/intentions) verify connections between services by
source and destination name seamlessly across datacenters.

Connections can be made via gateways to enable communicating across network
topologies, allowing connections between services in each datacenter without
externally routable IPs at the service level.

## Intention Replication

Intention replication happens automatically but requires the
[`primary_datacenter`](/docs/agent/config/config-files#primary_datacenter)
configuration to be set to specify a datacenter that is authoritative
for intentions. In production setups with ACLs enabled, the
[replication token](/docs/agent/config/config-files#acl_tokens_replication) must also
be set in the configuration of the servers in each secondary datacenter.
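
A minimal sketch of the relevant settings for a server in a secondary datacenter follows, built as a Python dictionary and rendered as JSON agent configuration. The datacenter names and the token placeholder are hypothetical, and a real server needs many more settings than shown; the `secondary_server_config` helper exists only for this example.

```python
import json


def secondary_server_config(datacenter: str, primary_datacenter: str,
                            replication_token: str) -> dict:
    """Return the replication-related settings for a secondary server."""
    if datacenter == primary_datacenter:
        raise ValueError("a secondary datacenter must differ from the primary")
    return {
        "datacenter": datacenter,
        # Names the datacenter that is authoritative for intentions.
        "primary_datacenter": primary_datacenter,
        "server": True,
        "acl": {
            "enabled": True,
            # Token the secondary servers use when replicating from the primary.
            "tokens": {"replication": replication_token},
        },
    }


config = secondary_server_config("dc2", "dc1", "<replication-token>")
print(json.dumps(config, indent=2))
```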

## Certificate Authority Federation

The primary datacenter also acts as the root Certificate Authority (CA) for Connect.
The primary datacenter generates a trust-domain UUID and obtains a root certificate
from the configured CA provider, which defaults to the built-in one.

Secondary datacenters fetch the root CA public key and trust-domain ID from the
primary and generate their own key and Certificate Signing Request (CSR) for an
intermediate CA certificate. This CSR is signed by the root in the primary
datacenter and the certificate is returned. The secondary datacenter can now use
this intermediate to sign new Connect certificates in the secondary datacenter
without WAN communication. CA keys are never replicated between datacenters.
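
The resulting chain of trust can be pictured with a toy model that follows issuer links from a leaf certificate up to a trusted root. This sketch involves no real cryptography: the `Cert` record, the certificate names, and `verify_chain` are hypothetical, and real verification also checks signatures, validity periods, and more.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Cert:
    subject: str
    issuer: str  # subject of the certificate that signed this one
    is_ca: bool


def verify_chain(leaf: Cert, intermediates: List[Cert],
                 roots: List[Cert]) -> bool:
    """Walk issuer links from the leaf until a trusted root is reached."""
    trusted = {r.subject for r in roots}
    by_subject: Dict[str, Cert] = {c.subject: c for c in intermediates}
    current, seen = leaf, set()
    while current.issuer not in trusted:
        issuer = by_subject.get(current.issuer)
        if issuer is None or not issuer.is_ca or issuer.subject in seen:
            return False
        seen.add(issuer.subject)
        current = issuer
    return True


# Root held in the primary datacenter.
root = Cert("dc1-root", "dc1-root", is_ca=True)
# Intermediate for the secondary, produced by the primary signing its CSR.
dc2_intermediate = Cert("dc2-intermediate", "dc1-root", is_ca=True)
# Leaf signed locally in the secondary, with no WAN round trip.
leaf = Cert("web.dc2", "dc2-intermediate", is_ca=False)

print(verify_chain(leaf, [dc2_intermediate], [root]))  # True
print(verify_chain(leaf, [], [root]))                  # False: intermediate missing
```

A verifier in any datacenter only needs the root certificate and the intermediate presented alongside the leaf, which is why the private keys themselves never have to leave the datacenter that generated them.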

The secondary maintains watches on the root CA certificate in the primary. If the
CA root changes for any reason such as rotation or migration to a new CA, the
secondary automatically generates new keys and has them signed by the primary
datacenter's new root before initiating an automatic rotation of all issued
certificates in use throughout the secondary datacenter. This makes CA root key
rotation fully automatic, with zero downtime across multiple datacenters.
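
The rotation sequence can be sketched as a callback that fires when a watch reports a new root. This is a toy state model under stated assumptions: `SecondaryDatacenter`, `on_root_change`, and the string identifiers are hypothetical, and no keys or certificates are actually generated.

```python
from typing import Dict, Iterable, Optional


class SecondaryDatacenter:
    def __init__(self, root_id: str, services: Iterable[str]) -> None:
        self.root_id: Optional[str] = None
        self.generation = 0
        self.intermediate: Optional[str] = None
        # service name -> intermediate that issued its current leaf certificate
        self.leaf_issuer: Dict[str, Optional[str]] = {s: None for s in services}
        self.on_root_change(root_id)

    def on_root_change(self, new_root_id: str) -> bool:
        """Handle a watch notification; return True if a rotation happened."""
        if new_root_id == self.root_id:
            return False
        # 1. Generate a new key and have it signed by the primary's new root.
        self.generation += 1
        self.root_id = new_root_id
        self.intermediate = f"intermediate-{self.generation}/{new_root_id}"
        # 2. Reissue every leaf certificate from the new intermediate.
        for service in self.leaf_issuer:
            self.leaf_issuer[service] = self.intermediate
        return True


dc2 = SecondaryDatacenter("root-1", ["web", "db"])
print(dc2.on_root_change("root-1"))  # False: the root did not change
print(dc2.on_root_change("root-2"))  # True: new intermediate, leaves reissued
print(dc2.leaf_issuer["web"])        # intermediate-2/root-2
```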