---
layout: docs
page_title: WAN Federation via Mesh Gateways
sidebar_title: WAN Federation
description: |-
  WAN federation via mesh gateways allows for Consul servers in different datacenters to be federated exclusively through mesh gateways.
---

# WAN Federation via Mesh Gateways

-> **1.8.0+:** This feature is available in Consul versions 1.8.0 and higher.

~> This topic requires familiarity with [mesh gateways](/docs/connect/gateways/mesh-gateway).

WAN federation via mesh gateways allows Consul servers in different datacenters
to be federated exclusively through mesh gateways.

When setting up a
[multi-datacenter](https://learn.hashicorp.com/tutorials/consul/federarion-gossip-wan)
Consul cluster, operators must ensure that all Consul servers in every
datacenter are directly reachable from each other over their WAN-advertised
network addresses.

This requires that operators setting up the virtual machines or containers
hosting the servers take additional steps to ensure the necessary routing and
firewall rules are in place to allow the servers to speak to each other over
the WAN.

Sometimes this prerequisite is difficult or undesirable to meet:

- **Difficult:** The datacenters may exist in multiple Kubernetes clusters that
  unfortunately have overlapping pod IP subnets, or may exist in different
  cloud provider VPCs that have overlapping subnets.
- **Undesirable:** Network security teams may not approve of granting so many
  firewall rules, and when using platform autoscaling, keeping the rules up to
  date becomes untenable.

Operators looking to simplify their WAN deployment and minimize the exposed
security surface area can elect to join these datacenters together using [mesh
gateways](/docs/connect/gateways/mesh-gateway).

## Architecture

Two main kinds of communication occur over the WAN link between Consul
datacenters:

- **WAN gossip:** We leverage the serf and memberlist libraries to gossip
  failure-detector knowledge about the Consul servers in each datacenter.
  By default this operates point to point between servers over `8302/udp`,
  with a fallback to `8302/tcp` (which logs a warning indicating the network is
  misconfigured).
- **Cross-datacenter RPCs:** Consul servers expose a special multiplexed port
  on `8300/tcp`. Several distinct kinds of messages can be received on this
  port, such as RPC requests forwarded from servers in other datacenters.
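
For reference, both ports above are Consul's defaults. They correspond to the
`serf_wan` and `server` entries of the agent's `ports` block. The fragment below
only restates those defaults. You would set these entries explicitly only if
your deployment has changed them.

```hcl
# Default ports for the two kinds of WAN traffic described above.
# Restated for illustration only; these match Consul's built-in defaults.
ports {
  serf_wan = 8302 # WAN gossip (udp, with a tcp fallback)
  server   = 8300 # multiplexed server RPC port (tcp)
}
```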

In this network topology, individual Consul client agents on a LAN in one
datacenter never need to dial servers in other datacenters directly. This
means you could, for example, introduce a set of firewall rules prohibiting
`10.0.0.0/24` from sending any traffic at all to `10.1.2.0/24` for security
isolation.

You may already have configured [mesh
gateways](https://learn.hashicorp.com/tutorials/consul/service-mesh-gateways)
to allow services in the service mesh to connect freely between datacenters
regardless of the lateral connectivity of the nodes hosting the Consul client
agents.

By activating WAN federation via mesh gateways, the servers can similarly use
the existing mesh gateways to reach each other without themselves being
directly reachable.

## Configuration

### TLS

All Consul servers in all datacenters should have TLS configured with
certificates containing these SAN fields:

- `server.<this_datacenter>.<domain>` (normal)
- `<node_name>.server.<this_datacenter>.<domain>` (needed for WAN federation)

This can be achieved using any number of tools, including `consul tls cert create` with the `-node` flag.
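
The following is a minimal sketch of the matching server TLS settings. It uses
the top-level TLS options available in Consul 1.8. The file names are
assumptions based on the default names written by `consul tls ca create` and
`consul tls cert create`, so substitute your own paths. The server certificate
is assumed to have been generated with the `-node` flag, so that it also carries
the `<node_name>.server.<this_datacenter>.<domain>` SAN.

```hcl
# Placeholder paths: replace with your own CA and server certificate files.
# The server certificate is assumed to come from something like:
#   consul tls cert create -server -dc dc1 -node <node_name>
ca_file   = "consul-agent-ca.pem"
cert_file = "dc1-server-consul-0.pem"
key_file  = "dc1-server-consul-0-key.pem"

# Require TLS on incoming and outgoing connections, and verify that server
# certificates present the expected server.<datacenter>.<domain> name.
verify_incoming        = true
verify_outgoing        = true
verify_server_hostname = true
```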

### Mesh Gateways

At least one mesh gateway must be configured to opt in to exposing the servers
in its configuration. When using the `consul connect envoy` CLI, this is done
with the `-expose-servers` flag. All this flag does is register the mesh
gateway in the catalog with the additional service metadata
`{"consul-wan-federation":"1"}`. If you register the mesh gateways in the
catalog out of band, you can simply add this metadata to your existing
registration payload.
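
For out-of-band registration, an agent service definition carrying this
metadata might look like the sketch below. The service name and port are
hypothetical placeholders. Gateway LAN/WAN addresses are omitted for brevity.
Only the `meta` entry is what this page requires. A working mesh gateway also
needs a running Envoy proxy, which this fragment does not start.

```hcl
# Hypothetical mesh gateway registration; name and port are placeholders.
service {
  kind = "mesh-gateway"
  name = "mesh-gateway"
  port = 8443

  # Opts this gateway in to exposing the local Consul servers for WAN
  # federation (the same metadata that -expose-servers adds).
  meta {
    "consul-wan-federation" = "1"
  }
}
```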

!> Before activating the feature on an existing cluster, ensure that there is
at least one mesh gateway prepared to expose the servers registered in each
datacenter; otherwise, the WAN will become only partly connected.

### Consul Server Options

A few additional pieces of configuration are necessary beyond those required
for standing up a
[multi-datacenter](https://learn.hashicorp.com/tutorials/consul/federarion-gossip-wan)
Consul cluster.

Consul servers in the _primary_ datacenter should add this snippet to their
configuration file:

```hcl
connect {
  enabled                            = true
  enable_mesh_gateway_wan_federation = true
}
```

Consul servers in all _secondary_ datacenters should add this snippet to their
configuration file:

```hcl
primary_gateways = [ "<primary-mesh-gateway-ip>:<primary-mesh-gateway-port>", ... ]

connect {
  enabled                            = true
  enable_mesh_gateway_wan_federation = true
}
```

Any references to [`start_join_wan`](/docs/agent/options#start_join_wan) or [`retry_join_wan`](/docs/agent/options#retry_join_wan) should be omitted.

-> The `primary_gateways` configuration can also use `go-discover` syntax, just
like `retry_join_wan`.

### Bootstrapping

For ease of debugging (such as avoiding a flurry of misleading error messages)
when activating WAN federation via mesh gateways, it is best to follow this
general procedure:

### New secondary

1. Upgrade to the desired version of the Consul binary for all servers,
   clients, and CLI.
2. Start all Consul servers and clients on the new version in the primary
   datacenter.
3. Ensure the primary datacenter has at least one running, registered mesh
   gateway with the service metadata `{"consul-wan-federation":"1"}` set.
4. Ensure you are _prepared_ to launch corresponding mesh gateways in all
   secondaries. When ACLs are enabled, actually registering these requires
   upstream connectivity to the primary datacenter to authorize catalog
   registration.
5. Ensure all servers in the primary datacenter have updated configuration,
   and restart them.
6. Ensure all servers in the secondary datacenter have updated configuration.
7. Start all Consul servers and clients on the new version in the secondary
   datacenter.
8. When ACLs are enabled, it should shortly afterwards become possible to
   resolve ACL tokens from the secondary, at which point you can launch the
   mesh gateways in the secondary datacenter.
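
Taken together, the updated configuration for a server in a new secondary
datacenter might resemble the sketch below. The datacenter names `dc1` and
`dc2` are hypothetical placeholders. The gateway address comes from the
`192.0.2.0/24` documentation range and is also a placeholder. The TLS settings
described in the TLS section are assumed to live in a separate configuration
file.

```hcl
# Hypothetical server configuration for secondary datacenter "dc2",
# federated with primary datacenter "dc1" through mesh gateways.
datacenter         = "dc2"
primary_datacenter = "dc1"
server             = true

# Mesh gateways in the primary datacenter, used to reach it before replicated
# catalog data about those gateways is available. Placeholder address/port.
primary_gateways = ["192.0.2.10:8443"]

connect {
  enabled                            = true
  enable_mesh_gateway_wan_federation = true
}

# Intentionally no start_join_wan or retry_join_wan entries.
```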

### Existing secondary

1. Upgrade to the desired version of the Consul binary for all servers,
   clients, and CLI.
2. Restart all Consul servers and clients on the new version.
3. Ensure each datacenter has at least one running, registered mesh gateway
   with the service metadata `{"consul-wan-federation":"1"}` set.
4. Ensure all servers in the primary datacenter have updated configuration,
   and restart them.
5. Ensure all servers in the secondary datacenter have updated configuration,
   and restart them.

### Verification

From any two datacenters joined together, double check that the following give
the expected results:

- Check that `consul members -wan` lists all servers in all datacenters with
  their _local_ IP addresses and that they are listed as `alive`.
- Ensure that any API request that activates datacenter request forwarding,
  such as
  [`/v1/catalog/services?dc=<OTHER_DATACENTER_NAME>`](/api/catalog#dc-1),
  succeeds.