consul/agent/proxycfg/snapshot.go

package proxycfg

import (
	"context"

	"github.com/hashicorp/consul/agent/structs"
	"github.com/mitchellh/copystructure"
)

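// configSnapshotConnectProxy holds the snapshot state that applies only to
// services of kind connect-proxy.
type configSnapshotConnectProxy struct {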
	Leaf                     *structs.IssuedCert
	DiscoveryChain           map[string]*structs.CompiledDiscoveryChain // this is keyed by the Upstream.Identifier(), not the chain name
	WatchedUpstreams         map[string]map[string]context.CancelFunc
	WatchedUpstreamEndpoints map[string]map[string]structs.CheckServiceNodes
	WatchedGateways          map[string]map[string]context.CancelFunc
	WatchedGatewayEndpoints  map[string]map[string]structs.CheckServiceNodes
	WatchedServiceChecks     map[string][]structs.CheckType

	UpstreamEndpoints map[string]structs.CheckServiceNodes // Deprecated: see WatchedUpstreamEndpoints
}

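// configSnapshotMeshGateway holds the snapshot state that applies only to
// services of kind mesh-gateway.
type configSnapshotMeshGateway struct {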
	WatchedServices    map[string]context.CancelFunc
	WatchedDatacenters map[string]context.CancelFunc
	ServiceGroups      map[string]structs.CheckServiceNodes
	ServiceResolvers   map[string]*structs.ServiceResolverConfigEntry
	GatewayGroups      map[string]structs.CheckServiceNodes
}

// ConfigSnapshot captures all the resulting config needed for a proxy instance.
// It is meant to be point-in-time coherent and is used to deliver the current
// config state to observers that need it pushed to them (e.g. the xDS server).
type ConfigSnapshot struct {
	Kind            structs.ServiceKind
	Service         string
	ProxyID         string
	Address         string
	Port            int
	TaggedAddresses map[string]structs.ServiceAddress
	Proxy           structs.ConnectProxyConfig
	Datacenter      string
	Roots           *structs.IndexedCARoots

	// connect-proxy specific
	ConnectProxy configSnapshotConnectProxy

	// mesh-gateway specific
	MeshGateway configSnapshotMeshGateway

	// Skip intentions for now as we don't push those down yet, just pre-warm them.
}

// Valid reports whether the snapshot has all required fields populated yet.
func (s *ConfigSnapshot) Valid() bool {
	switch s.Kind {
	case structs.ServiceKindConnectProxy:
		// TODO(rb): sanity check discovery chain things here?
		return s.Roots != nil && s.ConnectProxy.Leaf != nil
	case structs.ServiceKindMeshGateway:
		// TODO (mesh-gateway) - what happens if all the connect services go away
		return s.Roots != nil && len(s.MeshGateway.ServiceGroups) > 0
	default:
		return false
	}
}

// Clone makes a deep copy of the snapshot we can send to other goroutines
// without worrying that they will racily read or mutate shared maps etc.
func (s *ConfigSnapshot) Clone() (*ConfigSnapshot, error) {
	snapCopy, err := copystructure.Copy(s)
	if err != nil {
		return nil, err
	}

	snap := snapCopy.(*ConfigSnapshot)

	// Nil these out: anything receiving one of these clones does not need them
	// and should never "cancel" our watches.
	switch s.Kind {
	case structs.ServiceKindConnectProxy:
		snap.ConnectProxy.WatchedUpstreams = nil
		snap.ConnectProxy.WatchedGateways = nil
	case structs.ServiceKindMeshGateway:
		snap.MeshGateway.WatchedDatacenters = nil
		snap.MeshGateway.WatchedServices = nil
	}

	return snap, nil
}