consul

Commit Graph

Author	SHA1	Message	Date
Mark Anderson	06f0f79218	Continue working through proxy and agent Rework/listeners, rename makeListener Refactor, tests pass Signed-off-by: Mark Anderson <manderson@hashicorp.com>	2021-05-04 12:41:43 -07:00
Freddy	ed1082510d	Fixup discovery chain handling in transparent mode (#10168 ) Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com> Previously we would associate the address of a discovery chain target with the discovery chain's filter chain. This was broken for a few reasons: - If the upstream is a virtual service, the client proxy has no way of dialing it because virtual services are not targets of their discovery chains. The targets are distinct services. This is addressed by watching the endpoints of all upstream services, not just their discovery chain targets. - If multiple discovery chains resolve to the same target, that would lead to multiple filter chains attempting to match on the target's virtual IP. This is addressed by only matching on the upstream's virtual IP. NOTE: this implementation requires an intention to the redirecting virtual service and not just to the final destination. This is how we can know that the virtual service is an upstream to watch. A later PR will look into traversing discovery chains when computing upstreams so that intentions are only required to the discovery chain targets.	2021-05-04 08:45:19 -06:00
Daniel Nephin	62efaaab21	config-entry: remove Kind and Name field from Mesh config entry No config entry needs a Kind field. It is only used to determine the Go type to target. As we introduce new config entries (like this one) we can remove the kind field and have the GetKind method return the single supported value. In this case (similar to proxy-defaults) the Name field is also unnecessary. We always use the same value. So we can omit the name field entirely.	2021-04-29 17:11:21 -04:00
R.B. Boyer	71d45a3460	Support Incremental xDS mode (#9855 ) This adds support for the Incremental xDS protocol when using xDS v3. This is best reviewed commit-by-commit and will not be squashed when merged. Union of all commit messages follows to give an overarching summary: xds: exclusively support incremental xDS when using xDS v3 Attempts to use SoTW via v3 will fail, much like attempts to use incremental via v2 will fail. Work around a strange older envoy behavior involving empty CDS responses over incremental xDS. xds: various cleanups and refactors that don't strictly concern the addition of incremental xDS support Dissolve the connectionInfo struct in favor of per-connection ResourceGenerators instead. Do a better job of ensuring the xds code uses a well configured logger that accurately describes the connected client. xds: pull out checkStreamACLs method in advance of a later commit xds: rewrite SoTW xDS protocol tests to use protobufs rather than hand-rolled json strings In the test we very lightly reuse some of the more boring protobuf construction helper code that is also technically under test. The important thing of the protocol tests is testing the protocol. The actual inputs and outputs are largely already handled by the xds golden output tests now so these protocol tests don't have to do double-duty. This also updates the SoTW protocol test to exclusively use xDS v2 which is the only variant of SoTW that will be supported in Consul 1.10. xds: default xds.Server.AuthCheckFrequency at use-time instead of construction-time	2021-04-29 13:54:05 -05:00
Freddy	078c40425f	Rename "cluster" config entry to "mesh" (#10127) This config entry is being renamed primarily because in k8s the name cluster could be confusing given that the config entry applies across federated datacenters. Additionally, this config entry will only apply to Consul as a service mesh, so the more generic "cluster" name is not needed.	2021-04-28 16:13:29 -06:00
Daniel Nephin	2a26085b2c	connect: do not set QuerySource.Node Setting this field to a value is equivalent to using the 'near' query parameter. The intent is to sort the results by proximity to the node requesting them. However with connect we send the results to envoy, which doesn't care about the order, so setting this field is increasing the work performed for no gain. It is necessary to unset this field now because we would like connect to use streaming, but streaming does not support sorting by proximity.	2021-04-27 19:03:16 -04:00
Freddy	439a7fce2d	Split Upstream.Identifier() so non-empty namespace is always prepended in ent (#10031)	2021-04-15 13:54:40 -06:00
freddygv	8857195437	Fixup wildcard ent assertion	2021-04-12 17:04:33 -06:00
freddygv	7bd51ff536	Replace TransparentProxy bool with ProxyMode This PR replaces the original boolean used to configure transparent proxy mode. It was replaced with a string mode that can be set to: - "": Empty string is the default for when the setting should be defaulted from other configuration like config entries. - "direct": Direct mode is how applications originally opted into the mesh. Proxy listeners need to be dialed directly. - "transparent": Transparent mode enables configuring Envoy as a transparent proxy. Traffic must be captured and redirected to the inbound and outbound listeners. This PR also adds a struct for transparent proxy specific configuration. Initially this is not stored as a pointer. Will revisit that decision before GA.	2021-04-12 09:35:14 -06:00
freddygv	b21224a4c8	PR comments	2021-04-08 11:16:03 -06:00
freddygv	49a4a78fd5	Ensure mesh gateway mode override is set for upstreams for intentions	2021-04-07 09:32:48 -06:00
freddygv	5140c3e51f	Finish resolving upstream defaults in proxycfg	2021-04-07 09:32:48 -06:00
R.B. Boyer	499fee73b3	connect: add toggle to globally disable wildcard outbound network access when transparent proxy is enabled (#9973) This adds a new config entry kind "cluster" with a single special name "cluster" where this can be controlled.	2021-04-06 13:19:59 -05:00
freddygv	098b9af901	Fixup enterprise tests from tproxy changes	2021-03-17 23:05:00 -06:00
freddygv	eb1e0a1751	Cancel watch on all errors	2021-03-17 21:44:14 -06:00
freddygv	f4f45af6d0	Merge master and fix upstream config protocol defaulting	2021-03-17 21:13:40 -06:00
freddygv	0da8702f34	PR comments	2021-03-17 16:18:56 -06:00
freddygv	a54d6a9010	Update proxycfg for transparent proxy	2021-03-17 13:40:39 -06:00
Daniel Nephin	f40b76af2d	proxycfg: use rpcclient/health.Client instead of passing around cache name This should allow us to swap out the implementation with something other than `agent/cache` without making further code changes.	2021-03-12 11:46:04 -05:00
Daniel Nephin	906834ce8e	proxycfg: Use streaming in connect state	2021-03-12 11:35:42 -05:00
Freddy	82c269a7c5	Avoid potential proxycfg/xDS deadlock using non-blocking send	2021-02-08 16:14:06 -07:00
freddygv	ec5f75776b	Update comments on avoiding proxycfg deadlock	2021-02-08 09:45:45 -07:00
R.B. Boyer	43193a35c6	xds: prevent LDS flaps in mesh gateways due to unstable datacenter lists (#9651) Also fix a similar issue in Terminating Gateways that was masked by an overzealous test.	2021-02-08 10:19:57 -06:00
freddygv	6e443e5536	Retry send after timer fires, in case no updates occur	2021-02-05 18:00:59 -07:00
freddygv	95e7641faa	Update proxycfg logging, labels were already attached	2021-02-05 15:14:49 -07:00
freddygv	5ba14ad41d	Add trace logs to proxycfg state runner and xds srv	2021-02-02 12:26:38 -07:00
freddygv	37190c0d0d	Avoid potential deadlock using non-blocking send Deadlock scenario: 1. Due to scheduling, the state runner sends one snapshot into snapCh and then attempts to send a second. The first send succeeds because the channel is buffered, but the second blocks. 2. Separately, Manager.Watch is called by the xDS server after getting a discovery request from Envoy. This function acquires the manager lock and then blocks on receiving the CurrentSnapshot from the state runner. 3. Separately, there is a Manager goroutine that reads the snapshots from the channel in step 1. These reads are done to notify proxy watchers, but they require holding the manager lock. This goroutine goes to acquire that lock, but can't because it is held by step 2. Now, the goroutine from step 3 is waiting on the one from step 2 to release the lock. The goroutine from step 2 won't release the lock until the goroutine in step 1 advances. But the goroutine in step 1 is waiting for the one in step 3. Deadlock. By making this send non-blocking step 1 above can proceed. The coalesce timer will be reset and a new valid snapshot will be delivered after it elapses or when one is requested by xDS.	2021-02-02 11:31:14 -07:00
Daniel Nephin	b9e60c0775	testing: skip slow tests with -short Add a skip condition to all tests slower than 100ms. This change was made using `gotestsum tool slowest` with data from the last 3 CI runs of master. See https://github.com/gotestyourself/gotestsum#finding-and-skipping-slow-tests With this change: ``` $ time go test -count=1 -short ./agent ok github.com/hashicorp/consul/agent 0.743s real 0m4.791s $ time go test -count=1 -short ./agent/consul ok github.com/hashicorp/consul/agent/consul 4.229s real 0m8.769s ```	2020-12-07 13:42:55 -05:00
freddygv	856d5a25ee	Fix text type assertion	2020-09-14 16:28:40 -06:00
freddygv	7fd518ff1d	Merge master	2020-09-14 16:17:43 -06:00
freddygv	87541ab80a	Fix type assertion	2020-09-14 16:12:21 -06:00
freddygv	768dbaa68d	Add session flag to cookie config	2020-09-11 18:34:03 -06:00
freddygv	eab90ea9fa	Revert EnvoyConfig nesting	2020-09-11 09:21:43 -06:00
freddygv	30ba080d25	Add explicit protocol overrides in tgw xds test cases	2020-09-03 08:57:48 -06:00
freddygv	f81fe6a1a1	Remove LB infix and move injection to xds	2020-09-02 15:13:50 -06:00
freddygv	63f79e5f9b	Restructure structs and other PR comments	2020-09-02 09:10:50 -06:00
freddygv	28d0602fc1	Pass LB config to Envoy via xDS	2020-08-28 14:27:40 -06:00
R.B. Boyer	74d5df7c7a	xds: use envoy's rbac filter to handle intentions entirely within envoy (#8569)	2020-08-27 12:20:58 -05:00
Matt Keeler	be01c4241d	Default Cache rate limiting options in New Also get rid of the TestCache helper which was where these defaults were happening previously.	2020-07-28 12:34:35 -04:00
Pierre Souchay	505de6dc29	Added ratelimit to handle throttling cache (#8226) This implements a solution for #7863 It does: Add a new config cache.entry_fetch_rate to limit the number of calls/s for a given cache entry, default value = rate.Inf Add cache.entry_fetch_max_burst size of rate limit (default value = 2) The new configuration now supports the following syntax for instance to allow 1 query every 3s: command line HCL: -hcl 'cache = { entry_fetch_rate = 0.333}' in JSON { "cache": { "entry_fetch_rate": 0.333 } }	2020-07-27 23:11:11 +02:00
Matt Keeler	12acdd7481	Disable background cache refresh for Connect Leaf Certs The rationale behind removing them is that all of our own code (xDS, builtin connect proxy) use the cache notification mechanism. This ensures that the blocking fetch behind the scenes is always executing. Therefore the only way you might go to get a certificate and have to wait is when 1) the request has never been made for that cert before or 2) you are using the v1/agent/connect/ca/leaf API for retrieving the cert yourself. In the first case, the refresh change doesn’t alter the behavior. In the second case, it can be mitigated by using blocking queries with that API which just like normal cache notification mechanism will cause the blocking fetch to be initiated and to get leaf certs as soon as needed. If you are not using blocking queries, or Envoy/xDS, or the builtin connect proxy but are retrieving the certs yourself then the HTTP endpoint might take a little longer to respond. This also renames the RefreshTimeout field on the register options to QueryTimeout to more accurately reflect that it is used for any type that supports blocking queries.	2020-07-21 12:19:25 -04:00
Daniel Nephin	010a609912	Fix a bunch of unparam lint issues	2020-06-24 13:00:14 -04:00
Freddy	5baa7b1b04	Always return a gateway cluster (#8158)	2020-06-19 13:31:39 -06:00
Daniel Nephin	5afcf5c1bc	Merge pull request #8034 from hashicorp/dnephin/add-linter-staticcheck-4 ci: enable SA4006 staticcheck check and add ineffassign	2020-06-17 12:16:02 -04:00
Daniel Nephin	068b43df90	Enable gofmt simplify Code changes done automatically with 'gofmt -s -w'	2020-06-16 13:21:11 -04:00
Daniel Nephin	cb050b280c	ci: enable SA4006 staticcheck check And fix the 'value not used' issues. Many of these are not bugs, but a few are tests not checking errors, and one appears to be a missed error in non-test code.	2020-06-16 13:10:11 -04:00
freddygv	19e3954603	Move compound service names to use ServiceName type	2020-06-12 13:47:43 -06:00
Freddy	166a8b2a58	Only pass one hostname via EDS and prefer healthy ones (#8084 ) Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com> Currently when passing hostname clusters to Envoy, we set each service instance registered with Consul as an LbEndpoint for the cluster. However, Envoy can only handle one per cluster: [2020-06-04 18:32:34.094][1][warning][config] [source/common/config/grpc_subscription_impl.cc:87] gRPC config for type.googleapis.com/envoy.api.v2.Cluster rejected: Error adding/updating cluster(s) dc2.internal.ddd90499-9b47-91c5-4616-c0cbf0fc358a.consul: LOGICAL_DNS clusters must have a single locality_lb_endpoint and a single lb_endpoint, server.dc2.consul: LOGICAL_DNS clusters must have a single locality_lb_endpoint and a single lb_endpoint Envoy is currently handling this gracefully by only picking one of the endpoints. However, we should avoid passing multiple to avoid these warning logs. This PR: * Ensures we only pass one endpoint, which is tied to one service instance. * We prefer sending an endpoint which is marked as Healthy by Consul. * If no endpoints are healthy we emit a warning and skip the cluster. * If multiple unique hostnames are spread across service instances we emit a warning and let the user know which will be resolved.	2020-06-12 13:46:17 -06:00
Freddy	9ed325ba8b	Enable gateways to resolve hostnames to IPv4 addresses (#7999 ) The DNS resolution will be handled by Envoy and defaults to LOGICAL_DNS. This discovery type can be overridden on a per-gateway basis with the envoy_dns_discovery_type Gateway Option. If a service contains an instance with a hostname as an address we set the Envoy cluster to use DNS as the discovery type rather than EDS. Since both mesh gateways and terminating gateways route to clusters using SNI, whenever there is a mix of hostnames and IP addresses associated with a service we use the hostname + CDS rather than the IPs + EDS. Note that we detect hostnames by attempting to parse the service instance's address as an IP. If it is not a valid IP we assume it is a hostname.	2020-06-03 15:28:45 -06:00
Daniel Nephin	c88fae0aac	ci: Add staticcheck and fix most errors Three of the checks are temporarily disabled to limit the size of the diff, and allow us to enable all the other checks in CI. In a follow up we can fix the issues reported by the other checks one at a time, and enable them.	2020-05-28 11:59:58 -04:00
Kyle Havlovitz	b14696e32a	Standardize support for Tagged and BindAddresses in Ingress Gateways (#7924 ) * Standardize support for Tagged and BindAddresses in Ingress Gateways This updates the TaggedAddresses and BindAddresses behavior for Ingress to match Mesh/Terminating gateways. The `consul connect envoy` command now also allows passing an address without a port for tagged/bind addresses. * Update command/connect/envoy/envoy.go Co-authored-by: Freddy <freddygv@users.noreply.github.com> * PR comments * Check to see if address is an actual IP address * Update agent/xds/listeners.go Co-authored-by: Freddy <freddygv@users.noreply.github.com> * fix whitespace Co-authored-by: Chris Piraino <cpiraino@hashicorp.com> Co-authored-by: Freddy <freddygv@users.noreply.github.com>	2020-05-21 09:08:12 -05:00
Chris Piraino	9d9e23cc44	Add service id context to the proxycfg logger This is especially useful when multiple proxies are all querying the same Consul agent.	2020-05-18 09:08:05 -05:00
Kyle Havlovitz	136549205c	Merge pull request #7759 from hashicorp/ingress/tls-hosts Add TLS option for Ingress Gateway listeners	2020-05-11 09:18:43 -07:00
Chris Piraino	a0e1f57ac2	Remove development log line	2020-05-08 20:24:18 -07:00
Chris Piraino	26f92e74f6	Compute all valid DNSSANs for ingress gateways For DNSSANs we take into account the following and compute the appropriate wildcard values: - source datacenter - namespaces - alt domains	2020-05-08 20:23:17 -07:00
Freddy	c32a4f1ece	Fix up enterprise compatibility for gateways (#7813)	2020-05-08 09:44:34 -06:00
Chris Piraino	0bd5618cb2	Cleanup proxycfg for TLS - Use correct enterprise metadata for finding config entry - nil out cancel functions on config snapshot copy - Look at HostsSet when checking validity	2020-05-07 10:22:57 -05:00
Freddy	b069887b2a	Remove timeout and call to Fatal from goroutine (#7797)	2020-05-06 14:33:17 -06:00
Kyle Havlovitz	f14c54e25e	Add TLS option and DNS SAN support to ingress config xds: Only set TLS context for ingress listener when requested	2020-05-06 15:12:02 -05:00
Chris Piraino	881760f701	xds: Use only the port number as the configured route name This removes duplication of protocol from the stats_prefix	2020-05-06 15:06:13 -05:00
Chris Piraino	f40833d094	Allow Hosts field to be set on an ingress config entry - Validate that this cannot be set on a 'tcp' listener nor on a wildcard service. - Add Hosts field to api and test in consul config write CLI - xds: Configure envoy with user-provided hosts from ingress gateways	2020-05-06 15:06:13 -05:00
Kyle Havlovitz	711d1389aa	Support multiple listeners referencing the same service in gateway definitions	2020-05-06 15:06:13 -05:00
Kyle Havlovitz	247f9eaf13	Allow ingress gateways to route traffic based on Host header This commit adds the necessary changes to allow an ingress gateway to route traffic from a single defined port to multiple different upstream services in the Consul mesh. To do this, we now require all HTTP requests coming into the ingress gateway to specify a Host header that matches "<service-name>.*" in order to correctly route traffic to the correct service. - Differentiate multiple listener's route names by port - Adds a case in xds for allowing default discovery chains to create a route configuration when on an ingress gateway. This allows default services to easily use host header routing - ingress-gateways have a single route config for each listener that utilizes domain matching to route to different services.	2020-05-06 15:06:13 -05:00
Freddy	137a2c32c6	TLS Origination for Terminating Gateways (#7671)	2020-04-27 16:25:37 -06:00
freddygv	034d7d83d4	Fix snapshot IsEmpty	2020-04-27 11:08:41 -06:00
Freddy	3b1b24c2ce	Update agent/proxycfg/state_test.go	2020-04-27 11:08:41 -06:00
freddygv	eddd5bd73b	PR comments	2020-04-27 11:08:41 -06:00
freddygv	6abc71f915	Skip filter chain creation if no client cert	2020-04-27 11:08:41 -06:00
freddygv	09a8e5f36d	Use golden files for gateway certs and fix listener test flakiness	2020-04-27 11:08:41 -06:00
freddygv	840d27a9d5	Un-nest switch in gateway update handler	2020-04-27 11:08:40 -06:00
freddygv	c0e1751878	Allow terminating-gateway to setup listener before servicegroups are known	2020-04-27 11:08:40 -06:00
freddygv	913b13f31f	Add subset support	2020-04-27 11:08:40 -06:00
freddygv	219c78e586	Add xds cluster/listener/endpoint management	2020-04-27 11:08:40 -06:00
freddygv	24207226ca	Add proxycfg state management for terminating-gateways	2020-04-27 11:07:06 -06:00
Chris Piraino	cb9df538d5	Add all the xds ingress tests This commit copies many of the connect-proxy xds testcases and reuses them for ingress gateways. This allows us to more easily see changes to the envoy configuration when we make updates to ingress gateways.	2020-04-24 09:31:32 -05:00
Chris Piraino	0ca9b606e8	Pull out setupTestVariationConfigEntriesAndSnapshot in proxycfg This allows us to reuse the same variations for ingress gateway testing	2020-04-24 09:31:32 -05:00
Daniel Nephin	5fe7043439	agent/cache: Make all cache options RegisterOptions Previously the SupportsBlocking option was specified by a method on the type, and all the other options were specified from RegisterOptions. This change moves RegisterOptions to a method on the type, and moves SupportsBlocking into the options struct. Currently there are only 2 cache-types. So all cache-types can implement this method by embedding a struct with those predefined values. In the future if a cache type needs to be registered more than once with different options it can remove the embedded type and implement the method in a way that allows for parameterization.	2020-04-16 18:56:34 -04:00
Kyle Havlovitz	e9e8c0e730	Ingress Gateways for TCP services (#7509 ) * Implements a simple, tcp ingress gateway workflow This adds a new type of gateway for allowing Ingress traffic into Connect from external services. Co-authored-by: Chris Piraino <cpiraino@hashicorp.com>	2020-04-16 14:00:48 -07:00
Chris Piraino	584f90bbeb	Fix flapping of mesh gateway connect-service watches (#7575)	2020-04-02 10:12:13 -05:00
Andy Lindeman	c1cb18c648	proxycfg: support path exposed with non-HTTP2 protocol (#7510 ) If a proxied service is a gRPC or HTTP2 service, but a path is exposed using the HTTP1 or TCP protocol, Envoy should not be configured with `http2ProtocolOptions` for the cluster backing the path. A situation where this comes up is a gRPC service whose healthcheck or metrics route (e.g. for Prometheus) is an HTTP1 service running on a different port. Previously, if these were exposed either using `Expose: { Checks: true }` or `Expose: { Paths: ... }`, Envoy would still be configured to communicate with the path over HTTP2, which would not work properly.	2020-04-02 09:35:04 +02:00
R.B. Boyer	6adad71125	wan federation via mesh gateways (#6884) This is like a Möbius strip of code due to the fact that low-level components (serf/memberlist) are connected to high-level components (the catalog and mesh-gateways) in a twisty maze of references which make it hard to dive into. With that in mind here's a high level summary of what you'll find in the patch: There are several distinct chunks of code that are affected: * new flags and config options for the server * retry join WAN is slightly different * retry join code is shared to discover primary mesh gateways from secondary datacenters * because retry join logic runs in the agent and the results of that operation for primary mesh gateways are needed in the server there are some methods like `RefreshPrimaryGatewayFallbackAddresses` that must occur at multiple layers of abstraction just to pass the data down to the right layer. * new cache type `FederationStateListMeshGatewaysName` for use in `proxycfg/xds` layers * the function signature for RPC dialing picked up a new required field (the node name of the destination) * several new RPCs for manipulating a FederationState object: `FederationState:{Apply,Get,List,ListMeshGateways}` * 3 read-only internal APIs for debugging use to invoke those RPCs from curl * raft and fsm changes to persist these FederationStates * replication for FederationStates as they are canonically stored in the Primary and replicated to the Secondaries. * a special derivative of anti-entropy that runs in secondaries to snapshot their local mesh gateway `CheckServiceNodes` and sync them into their upstream FederationState in the primary (this works in conjunction with the replication to distribute addresses for all mesh gateways in all DCs to all other DCs) * a "gateway locator" convenience object to make use of this data to choose the addresses of gateways to use for any given RPC or gossip operation to a remote DC. This gets data from the "retry join" logic in the agent and also directly calls into the FSM. * RPC (`:8300`) on the server sniffs the first byte of a new connection to determine if it's actually doing native TLS. If so it checks the ALPN header for protocol determination (just like how the existing system uses the type-byte marker). * 2 new kinds of protocols are exclusively decoded via this native TLS mechanism: one for ferrying "packet" operations (udp-like) from the gossip layer and one for "stream" operations (tcp-like). The packet operations re-use sockets (using length-prefixing) to cut down on TLS re-negotiation overhead. * the server instances specially wrap the `memberlist.NetTransport` when running with gateway federation enabled (in a `wanfed.Transport`). The general gist is that if it tries to dial a node in the SAME datacenter (deduced by looking at the suffix of the node name) there is no change. If dialing a DIFFERENT datacenter it is wrapped up in a TLS+ALPN blob and sent through some mesh gateways to eventually end up in a server's :8300 port. * a new flag when launching a mesh gateway via `consul connect envoy` to indicate that the servers are to be exposed. This sets a special service meta when registering the gateway into the catalog. * `proxycfg/xds` notice this metadata blob to activate additional watches for the FederationState objects as well as the location of all of the consul servers in that datacenter. * `xds:` if the extra metadata is in place additional clusters are defined in a DC to bulk sink all traffic to another DC's gateways. For the current datacenter we listen on a wildcard name (`server.<dc>.consul`) that load balances all servers as well as one mini-cluster per node (`<node>.server.<dc>.consul`) * the `consul tls cert create` command got a new flag (`-node`) to help create an additional SAN in certs that can be used with this flavor of federation.	2020-03-09 15:59:02 -05:00
Matt Keeler	4c9577678e	xDS Mesh Gateway Resolver Subset Fixes (#7294 ) * xDS Mesh Gateway Resolver Subset Fixes The first fix was that clusters were being generated for every service resolver subset regardless of there being any service instances of the associated service in that dc. The previous logic didn’t care at all but now it will omit generating those clusters unless we also have service instances that should be proxied. The second fix was to respect the DefaultSubset of a service resolver so that mesh-gateways would configure the endpoints of the unnamed subset cluster to only those endpoints matched by the default subsets filters. * Refactor the gateway endpoint generation to be a little easier to read	2020-02-19 11:57:55 -05:00
Lars Lehtonen	6bcd596539	agent/proxycfg: fix dropped error in state.initWatchesMeshGateway() (#7267)	2020-02-18 14:41:01 +01:00
Matt Keeler	9e5fd7f925	OSS Changes for various config entry namespacing bugs (#7226)	2020-02-06 10:52:25 -05:00
Matt Keeler	dfb0177dbc	Testing updates to support namespaced testing of the agent/xds… (#7185) * Various testing updates to support namespaced testing of the agent/xds package * agent/proxycfg package updates to support better namespace testing	2020-02-03 09:26:47 -05:00
Chris Piraino	401221de58	Allow users to configure either unstructured or JSON logging (#7130) * hclog Allow users to choose between unstructured and JSON logging	2020-01-28 17:50:41 -06:00
Matt Keeler	c09693e545	Updates to Config Entries and Connect for Namespaces (#7116)	2020-01-24 10:04:58 -05:00
Aestek	ba8fd8296f	Add support for dual stack IPv4/IPv6 network (#6640) * Use consts for well known tagged address keys * Add ipv4 and ipv6 tagged addresses for node lan and wan * Add ipv4 and ipv6 tagged addresses for service lan and wan * Use IPv4 and IPv6 address in DNS	2020-01-17 09:54:17 -05:00
Matt Keeler	27f49eede9	Move where the service-resolver watch is done so that it happen… (#7025) Before we were issuing 1 watch for every service in the services listing which would have caused the agent to process many more identical events simultaneously.	2020-01-10 10:30:13 -05:00
Matt Keeler	5934f803bf	Sync of OSS changes to support namespaces (#6909)	2019-12-09 21:26:41 -05:00
R.B. Boyer	2011f3d7dc	xds: mesh gateway CDS requests are now allowed to receive an empty CDS reply (#6787) This is the rest of the fix for #6543 that was incompletely fixed in #6576.	2019-11-26 15:55:13 -06:00
R.B. Boyer	97aa050c20	agent: allow mesh gateways to initialize even if there are no connect services registered yet (#6576) Fixes #6543 Also improved some of the proxycfg tests to cover snapshot validity better.	2019-10-17 16:46:49 -05:00
R.B. Boyer	9566df524e	agent: cache notifications work after error if the underlying RPC returns index=1 (#6547 ) Fixes #6521 Ensure that initial failures to fetch an agent cache entry using the notify API where the underlying RPC returns a synthetic index of 1 correctly recovers when those RPCs resume working. The bug in the Cache.notifyBlockingQuery used to incorrectly "fix" the index for the next query from 0 to 1 for all queries, when it should have not done so for queries that errored. Also fixed some things that made debugging difficult: - config entry read/list endpoints send back QueryMeta headers - xds event loops don't swallow the cache notification errors	2019-09-26 10:42:17 -05:00
Freddy	fdd10dd8b8	Expose HTTP-based paths through Connect proxy (#6446) Fixes: #5396 This PR adds a proxy configuration stanza called expose. These flags register listeners in Connect sidecar proxies to allow requests to specific HTTP paths from outside of the node. This allows services to protect themselves by only listening on the loopback interface, while still accepting traffic from non Connect-enabled services. Under expose there is a boolean checks flag that would automatically expose all registered HTTP and gRPC check paths. This stanza also accepts a paths list to expose individual paths. The primary use case for this functionality would be to expose paths for third parties like Prometheus or the kubelet. Listeners for requests to exposed paths are configured dynamically at run time. Any time a proxy, or check can be registered, a listener can also be created. In this initial implementation requests to these paths are not authenticated/encrypted.	2019-09-25 20:55:52 -06:00
R.B. Boyer	af01d397a5	connect: don't colon-hex-encode the AuthorityKeyId and SubjectKeyId fields in connect certs (#6492) The fields in the certs are meant to hold the original binary representation of this data, not some ascii-encoded version. The only time we should be colon-hex-encoding fields is for display purposes or marshaling through non-TLS mediums (like RPC).	2019-09-23 12:52:35 -05:00
R.B. Boyer	dfcdc41ef8	connect: allow 'envoy_cluster_json' escape hatch to continue to function (#6378)	2019-08-22 15:11:56 -05:00
R.B. Boyer	561b2fe606	connect: generate the full SNI names for discovery targets in the compiler rather than in the xds package (#6340)	2019-08-19 13:03:03 -05:00
R.B. Boyer	ae79cdab1b	connect: introduce ExternalSNI field on service-defaults (#6324 ) Compiling this will set an optional SNI field on each DiscoveryTarget. When set this value should be used for TLS connections to the instances of the target. If not set the default should be used. Setting ExternalSNI will disable mesh gateway use for that target. It also disables several service-resolver features that do not make sense for an external service.	2019-08-19 12:19:44 -05:00
Mike Morris	65be58703c	connect: remove managed proxies (#6220 ) * connect: remove managed proxies implementation and all supporting config options and structs * connect: remove deprecated ProxyDestination * command: remove CONNECT_PROXY_TOKEN env var * agent: remove entire proxyprocess proxy manager * test: remove all managed proxy tests * test: remove irrelevant managed proxy note from TestService_ServerTLSConfig * test: update ContentHash to reflect managed proxy removal * test: remove deprecated ProxyDestination test * telemetry: remove managed proxy note * http: remove /v1/agent/connect/proxy endpoint * ci: remove deprecated test exclusion * website: update managed proxies deprecation page to note removal * website: remove managed proxy configuration API docs * website: remove managed proxy note from built-in proxy config * website: add note on removing proxy subdirectory of data_dir	2019-08-09 15:19:30 -04:00
R.B. Boyer	8e22d80e35	connect: fix failover through a mesh gateway to a remote datacenter (#6259) Failover is pushed entirely down to the data plane by creating envoy clusters and putting each successive destination in a different load assignment priority band. For example this shows that normally requests go to 1.2.3.4:8080 but when that fails they go to 6.7.8.9:8080: - name: foo load_assignment: cluster_name: foo policy: overprovisioning_factor: 100000 endpoints: - priority: 0 lb_endpoints: - endpoint: address: socket_address: address: 1.2.3.4 port_value: 8080 - priority: 1 lb_endpoints: - endpoint: address: socket_address: address: 6.7.8.9 port_value: 8080 Mesh gateways route requests based solely on the SNI header tacked onto the TLS layer. Envoy currently only lets you configure the outbound SNI header at the cluster layer. If you try to failover through a mesh gateway you ideally would configure the SNI value per endpoint, but that's not possible in envoy today. This PR introduces a simpler way around the problem for now: 1. We identify any target of failover that will use mesh gateway mode local or remote and then further isolate any resolver node in the compiled discovery chain that has a failover destination set to one of those targets. 2. For each of these resolvers we will perform a small measurement of comparative healths of the endpoints that come back from the health API for the set of primary target and serial failover targets. We walk the list of targets in order and if any endpoint is healthy we return that target, otherwise we move on to the next target. 3. The CDS and EDS endpoints both perform the measurements in (2) for the affected resolver nodes. 4. For CDS this measurement selects which TLS SNI field to use for the cluster (note the cluster is always going to be named for the primary target) 5. For EDS this measurement selects which set of endpoints will populate the cluster. Priority tiered failover is ignored. One of the big downsides to this approach to failover is that the failover detection and correction is going to be controlled by consul rather than deferring that entirely to the data plane as with the prior version. This also means that we are bound to only failover using official health signals and cannot make use of data plane signals like outlier detection to affect failover. In this specific scenario the lack of data plane signals is ok because the effectiveness is already muted by the fact that the ultimate destination endpoints will have their data plane signals scrambled when they pass through the mesh gateway wrapper anyway so we're not losing much. Another related fix is that we now use the endpoint health from the underlying service, not the health of the gateway (regardless of failover mode).	2019-08-05 13:30:35 -05:00
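
Commit 37190c0d0d above fixes a deadlock by making the snapshot send non-blocking. The following is a minimal sketch of that general Go idiom, not the actual proxycfg code: `trySend` and the `int` payload are hypothetical stand-ins. A `select` with a `default` case lets the sender give up when the buffered channel is full instead of waiting on a reader.

```go
package main

import "fmt"

// trySend is a hypothetical helper: it attempts to deliver snap on ch
// without blocking and reports whether the value was delivered.
func trySend(ch chan<- int, snap int) bool {
	select {
	case ch <- snap:
		// The channel had buffer space (or a waiting receiver).
		return true
	default:
		// A plain send would have blocked here; skip it instead.
		return false
	}
}

func main() {
	snapCh := make(chan int, 1) // buffered with capacity 1, an assumption for the sketch

	fmt.Println(trySend(snapCh, 1)) // buffer has room
	fmt.Println(trySend(snapCh, 2)) // buffer is full, so the send is skipped
}
```

A skipped send means the value is dropped, so the caller needs another path to deliver a fresh value later. The commit message describes resetting the coalesce timer for that purpose.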


174 Commits
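
Commit 9ed325ba8b above notes that hostnames are detected by trying to parse a service instance's address as an IP, and treating anything that fails to parse as a hostname. A rough sketch of that check using the standard library's `net.ParseIP` follows. `isHostname` is a hypothetical name chosen for illustration, not necessarily what the Consul code uses.

```go
package main

import (
	"fmt"
	"net"
)

// isHostname is a hypothetical helper: it reports true when addr does not
// parse as an IPv4 or IPv6 address. Note that under this rule any
// non-IP string, including an empty one, counts as a hostname.
func isHostname(addr string) bool {
	return net.ParseIP(addr) == nil
}

func main() {
	for _, addr := range []string{"10.0.0.1", "::1", "api.example.com"} {
		fmt.Printf("%s hostname=%v\n", addr, isHostname(addr))
	}
}
```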