consul

Commit Graph

Author	SHA1	Message	Date
Daniel Nephin	8b887af0d3	streaming: store services with a unique ID that includes namespace	2020-10-06 16:54:56 -04:00
Daniel Nephin	5972bdc87c	streaming: improve godoc for cache-type And fix a bug where any error that implemented the temporary interface was considered a temporary error, even when the method would return false.	2020-10-06 13:52:02 -04:00
Daniel Nephin	3fa08beecf	submatview: add a test for handling of NewSnapshotToFollow Also add some godoc Rename some vars and functions Fix a data race in the new cache test for entry closing.	2020-10-06 13:22:02 -04:00
Daniel Nephin	534d8b45bb	submatview: refactor Materializer Refactor of Materializer.Run Use handlers to manage state in Materializer Rename Materializer receiver rename m.l to m.lock, and flip some conditionals to remove the negative. Improve godoc, rename Deps, move resetErr, and pass err into notifyUpdate Update for NewSnapshotToFollow events Refactor to move context cancel out of Materializer	2020-10-06 13:22:02 -04:00
Daniel Nephin	e849f6d7ac	submatview: Move the 'use materialize from result.State' logic No need to do all this other work if we have one already. This logic moved closer to this call site 3 times during the process of refactoring.	2020-10-06 13:22:02 -04:00
Daniel Nephin	edf30b2714	submatview: Move Materializer to submatview package	2020-10-06 13:22:02 -04:00
Daniel Nephin	ed45957ffb	submatview: Refactor MaterializeView Replace InitFilter with Reset. Removes the need to store a fatalErr and the cache-type, and removes the need to recreate the filter each time. Pass dependencies into MaterializedView. Remove context from MaterializedView. Rename state to view. Rename MaterializedView to Materializer. Rename to NewMaterializer Pass in retry.Waiter	2020-10-06 13:22:02 -04:00
Daniel Nephin	b576a2d3c7	cache-types: Update Streaming health cache-type To use latest protobuf types	2020-10-06 13:22:02 -04:00
Daniel Nephin	132b76acef	agent/cache: Add cache-type and materialized view for streaming health Extracted from `d97412ce4c` Co-authored-by: Paul Banks <banks@banksco.de>	2020-10-06 13:21:57 -04:00
freddygv	7fd518ff1d	Merge master	2020-09-14 16:17:43 -06:00
freddygv	c8f5215e9d	Fix test build	2020-08-06 11:31:56 -06:00
Daniel Nephin	ba3ace1219	Return nil value on error. The main bug was fixed in `cb050b280c`, but the return value of 'result' is still misleading. Change the return value to nil to make the code more clear.	2020-08-05 13:10:17 -04:00
freddygv	aa6c59dbfc	end to end changes to pass gatewayservices to /ui/services/	2020-07-30 10:21:11 -06:00
Matt Keeler	be01c4241d	Default Cache rate limiting options in New Also get rid of the TestCache helper which was where these defaults were happening previously.	2020-07-28 12:34:35 -04:00
Matt Keeler	83d09de230	Fix some broken code in master There were several PRs that while all passed CI independently, when they all got merged into the same branch caused compilation errors in test code. The main changes that caused issues were changing agent/cache.Cache.New to require a concrete options struct instead of a pointer. This broke the cert monitor tests and the catalog_list_services_test.go. Another change was made to unembed the http.Server from the agent.HTTPServer struct. That coupled with another change to add a test to ensure cache rate limiting coming from HTTP requests was working as expected caused compilation failures.	2020-07-28 09:50:10 -04:00
Matt Keeler	12acdd7481	Disable background cache refresh for Connect Leaf Certs The rationale behind removing them is that all of our own code (xDS, builtin connect proxy) use the cache notification mechanism. This ensures that the blocking fetch behind the scenes is always executing. Therefore the only way you might go to get a certificate and have to wait is when 1) the request has never been made for that cert before or 2) you are using the v1/agent/connect/ca/leaf API for retrieving the cert yourself. In the first case, the refresh change doesn’t alter the behavior. In the second case, it can be mitigated by using blocking queries with that API which just like normal cache notification mechanism will cause the blocking fetch to be initiated and to get leaf certs as soon as needed. If you are not using blocking queries, or Envoy/xDS, or the builtin connect proxy but are retrieving the certs yourself then the HTTP endpoint might take a little longer to respond. This also renames the RefreshTimeout field on the register options to QueryTimeout to more accurately reflect that it is used for any type that supports blocking queries.	2020-07-21 12:19:25 -04:00
Daniel Nephin	2ec3760b70	agent/cache: Use AllowNotModifiedResponse in CatalogListServices Co-authored-by: Pierre Souchay <pierresouchay@users.noreply.github.com>	2020-07-14 18:58:20 -04:00
Matt Keeler	a5a9560bbd	Initialize the agent leaf cert cache result with a state to prevent unnecessary second certificate signing	2020-06-30 09:59:07 -04:00
Matt Keeler	39b567a55a	Fix auto_encrypt IP/DNS SANs The initial auto encrypt CSR wasn’t containing the user supplied IP and DNS SANs. This fixes that. Also, we were configuring a default :: IP SAN. This should be ::1 instead and was fixed.	2020-06-30 09:59:07 -04:00
Daniel Nephin	5afcf5c1bc	Merge pull request #8034 from hashicorp/dnephin/add-linter-staticcheck-4 ci: enable SA4006 staticcheck check and add ineffassign	2020-06-17 12:16:02 -04:00
Daniel Nephin	068b43df90	Enable gofmt simplify Code changes done automatically with 'gofmt -s -w'	2020-06-16 13:21:11 -04:00
Daniel Nephin	cb050b280c	ci: enable SA4006 staticcheck check And fix the 'value not used' issues. Many of these are not bugs, but a few are tests not checking errors, and one appears to be a missed error in non-test code.	2020-06-16 13:10:11 -04:00
Matt Keeler	8837907de4	Make the Agent Cache more Context aware (#8092 ) Blocking queries issues will still be uncancellable (that cannot be helped until we get rid of net/rpc). However this makes it so that if calling getWithIndex (like during a cache Notify goroutine) we can cancel the outer routine. Previously it would keep issuing more blocking queries until the result state actually changed.	2020-06-15 11:01:25 -04:00
freddygv	19e3954603	Move compound service names to use ServiceName type	2020-06-12 13:47:43 -06:00
freddygv	15c74d6943	Move GatewayServices out of Internal	2020-06-12 13:46:47 -06:00
Daniel Nephin	caa692deea	ci: Enabled SA2002 staticcheck check And handle errors in the main test goroutine	2020-06-05 17:50:11 -04:00
Daniel Nephin	c662f0f0de	Fix a number of problems found by staticcheck Some of these problems are minor (unused vars), but others are real bugs (ignored errors). Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>	2020-05-19 16:50:14 -04:00
Freddy	b3ec383d04	Gateway Services Nodes UI Endpoint (#7685 ) The endpoint supports queries for both Ingress Gateways and Terminating Gateways. Used to display a gateway's linked services in the UI.	2020-05-11 11:35:17 -06:00
Chris Piraino	30792e933b	Add test for adding DNSSAN for ConnectCALeaf cache type	2020-05-06 15:12:02 -05:00
Kyle Havlovitz	f14c54e25e	Add TLS option and DNS SAN support to ingress config xds: Only set TLS context for ingress listener when requested	2020-05-06 15:12:02 -05:00
Freddy	137a2c32c6	TLS Origination for Terminating Gateways (#7671 )	2020-04-27 16:25:37 -06:00
Daniel Nephin	5fe7043439	agent/cache: Make all cache options RegisterOptions Previously the SupportsBlocking option was specified by a method on the type, and all the other options were specified from RegisterOptions. This change moves RegisterOptions to a method on the type, and moves SupportsBlocking into the options struct. Currently there are only 2 cache-types. So all cache-types can implement this method by embedding a struct with those predefined values. In the future if a cache type needs to be registered more than once with different options it can remove the embedded type and implement the method in a way that allows for parameterization.	2020-04-16 18:56:34 -04:00
Kyle Havlovitz	e9e8c0e730	Ingress Gateways for TCP services (#7509 ) * Implements a simple, tcp ingress gateway workflow This adds a new type of gateway for allowing Ingress traffic into Connect from external services. Co-authored-by: Chris Piraino <cpiraino@hashicorp.com>	2020-04-16 14:00:48 -07:00
sasha	ac9b330f6b	add DNSSAN and IPSAN to cache key (#7597 )	2020-04-15 10:11:11 -05:00
R.B. Boyer	6adad71125	wan federation via mesh gateways (#6884 ) This is like a Möbius strip of code due to the fact that low-level components (serf/memberlist) are connected to high-level components (the catalog and mesh-gateways) in a twisty maze of references which make it hard to dive into. With that in mind here's a high level summary of what you'll find in the patch: There are several distinct chunks of code that are affected: * new flags and config options for the server * retry join WAN is slightly different * retry join code is shared to discover primary mesh gateways from secondary datacenters * because retry join logic runs in the agent and the results of that operation for primary mesh gateways are needed in the server there are some methods like `RefreshPrimaryGatewayFallbackAddresses` that must occur at multiple layers of abstraction just to pass the data down to the right layer. * new cache type `FederationStateListMeshGatewaysName` for use in `proxycfg/xds` layers * the function signature for RPC dialing picked up a new required field (the node name of the destination) * several new RPCs for manipulating a FederationState object: `FederationState:{Apply,Get,List,ListMeshGateways}` * 3 read-only internal APIs for debugging use to invoke those RPCs from curl * raft and fsm changes to persist these FederationStates * replication for FederationStates as they are canonically stored in the Primary and replicated to the Secondaries. * a special derivative of anti-entropy that runs in secondaries to snapshot their local mesh gateway `CheckServiceNodes` and sync them into their upstream FederationState in the primary (this works in conjunction with the replication to distribute addresses for all mesh gateways in all DCs to all other DCs) * a "gateway locator" convenience object to make use of this data to choose the addresses of gateways to use for any given RPC or gossip operation to a remote DC. 
This gets data from the "retry join" logic in the agent and also directly calls into the FSM. * RPC (`:8300`) on the server sniffs the first byte of a new connection to determine if it's actually doing native TLS. If so it checks the ALPN header for protocol determination (just like how the existing system uses the type-byte marker). * 2 new kinds of protocols are exclusively decoded via this native TLS mechanism: one for ferrying "packet" operations (udp-like) from the gossip layer and one for "stream" operations (tcp-like). The packet operations re-use sockets (using length-prefixing) to cut down on TLS re-negotiation overhead. * the server instances specially wrap the `memberlist.NetTransport` when running with gateway federation enabled (in a `wanfed.Transport`). The general gist is that if it tries to dial a node in the SAME datacenter (deduced by looking at the suffix of the node name) there is no change. If dialing a DIFFERENT datacenter it is wrapped up in a TLS+ALPN blob and sent through some mesh gateways to eventually end up in a server's :8300 port. * a new flag when launching a mesh gateway via `consul connect envoy` to indicate that the servers are to be exposed. This sets a special service meta when registering the gateway into the catalog. * `proxycfg/xds` notice this metadata blob to activate additional watches for the FederationState objects as well as the location of all of the consul servers in that datacenter. * `xds:` if the extra metadata is in place additional clusters are defined in a DC to bulk sink all traffic to another DC's gateways. For the current datacenter we listen on a wildcard name (`server.<dc>.consul`) that load balances all servers as well as one mini-cluster per node (`<node>.server.<dc>.consul`) * the `consul tls cert create` command got a new flag (`-node`) to help create an additional SAN in certs that can be used with this flavor of federation.	2020-03-09 15:59:02 -05:00
Matt Keeler	e231d62bc9	Make the config entry and leaf cert cache types ns aware (#7256 )	2020-02-10 19:26:01 -05:00
Anthony Scalisi	beb928f8de	fix spelling errors (#7135 )	2020-01-27 07:00:33 -06:00
Matt Keeler	c09693e545	Updates to Config Entries and Connect for Namespaces (#7116 )	2020-01-24 10:04:58 -05:00
Hans Hasselberg	87f32c8ba6	auto_encrypt: set dns and ip san for k8s and provide configuration (#6944 ) * Add CreateCSRWithSAN * Use CreateCSRWithSAN in auto_encrypt and cache * Copy DNSNames and IPAddresses to cert * Verify auto_encrypt.sign returns cert with SAN * provide configuration options for auto_encrypt dnssan and ipsan * rename CreateCSRWithSAN to CreateCSR	2020-01-17 23:25:26 +01:00
Matt Keeler	5934f803bf	Sync of OSS changes to support namespaces (#6909 )	2019-12-09 21:26:41 -05:00
Todd Radel	54f92e2924	Make all Connect Cert Common Names valid FQDNs (#6423 )	2019-11-11 17:11:54 +00:00
Paul Banks	87699eca2f	Fix support for RSA CA keys in Connect. (#6638 ) * Allow RSA CA certs for consul and vault providers to correctly sign EC leaf certs. * Ensure key type and bits are populated from CA cert and clean up tests * Add integration test and fix error when initializing secondary CA with RSA key. * Add more tests, fix review feedback * Update docs with key type config and output * Apply suggestions from code review Co-Authored-By: R.B. Boyer <rb@hashicorp.com>	2019-11-01 13:20:26 +00:00
R.B. Boyer	c4b92d5534	connect: connect CA Roots in secondary datacenters should use a SigningKeyID derived from their local intermediate (#6513 ) This fixes an issue where leaf certificates issued in secondary datacenters would be reissued very frequently (every ~20 seconds) because the logic meant to detect root rotation was errantly triggering because a hash of the ultimate root (in the primary) was being compared against a hash of the local intermediate root (in the secondary) and always failing.	2019-09-26 11:54:14 -05:00
Freddy	fdd10dd8b8	Expose HTTP-based paths through Connect proxy (#6446 ) Fixes: #5396 This PR adds a proxy configuration stanza called expose. These flags register listeners in Connect sidecar proxies to allow requests to specific HTTP paths from outside of the node. This allows services to protect themselves by only listening on the loopback interface, while still accepting traffic from non Connect-enabled services. Under expose there is a boolean checks flag that would automatically expose all registered HTTP and gRPC check paths. This stanza also accepts a paths list to expose individual paths. The primary use case for this functionality would be to expose paths for third parties like Prometheus or the kubelet. Listeners for requests to exposed paths are configured dynamically at run time. Any time a proxy, or check can be registered, a listener can also be created. In this initial implementation requests to these paths are not authenticated/encrypted.	2019-09-25 20:55:52 -06:00
R.B. Boyer	af01d397a5	connect: don't colon-hex-encode the AuthorityKeyId and SubjectKeyId fields in connect certs (#6492 ) The fields in the certs are meant to hold the original binary representation of this data, not some ascii-encoded version. The only time we should be colon-hex-encoding fields is for display purposes or marshaling through non-TLS mediums (like RPC).	2019-09-23 12:52:35 -05:00
R.B. Boyer	20eb0d3e94	cache: remove data race in agent cache In normal operations there is a read/write race related to request QueryOptions fields. An example race: WARNING: DATA RACE Read at 0x00c000836950 by goroutine 30: github.com/hashicorp/consul/agent/structs.(*ServiceConfigRequest).CacheInfo() /go/src/github.com/hashicorp/consul/agent/structs/config_entry.go:506 +0x109 github.com/hashicorp/consul/agent/cache.(*Cache).getWithIndex() /go/src/github.com/hashicorp/consul/agent/cache/cache.go:262 +0x5c github.com/hashicorp/consul/agent/cache.(*Cache).notifyBlockingQuery() /go/src/github.com/hashicorp/consul/agent/cache/watch.go:89 +0xd7 Previous write at 0x00c000836950 by goroutine 147: github.com/hashicorp/consul/agent/cache-types.(*ResolvedServiceConfig).Fetch() /go/src/github.com/hashicorp/consul/agent/cache-types/resolved_service_config.go:31 +0x219 github.com/hashicorp/consul/agent/cache.(*Cache).fetch.func1() /go/src/github.com/hashicorp/consul/agent/cache/cache.go:495 +0x112 This patch does a lightweight copy of the request struct so that the embedded QueryOptions fields that are mutated during Fetch() are scoped to just that one RPC.	2019-09-12 16:18:01 -05:00
Alvin Huang	c516fabfac	revert commits on master (#6413 )	2019-08-27 17:45:58 -04:00
tradel	7f36a5b676	construct a common name for each CSR	2019-08-27 14:12:56 -07:00
R.B. Boyer	561b2fe606	connect: generate the full SNI names for discovery targets in the compiler rather than in the xds package (#6340 )	2019-08-19 13:03:03 -05:00
R.B. Boyer	8e22d80e35	connect: fix failover through a mesh gateway to a remote datacenter (#6259 ) Failover is pushed entirely down to the data plane by creating envoy clusters and putting each successive destination in a different load assignment priority band. For example this shows that normally requests go to 1.2.3.4:8080 but when that fails they go to 6.7.8.9:8080: - name: foo load_assignment: cluster_name: foo policy: overprovisioning_factor: 100000 endpoints: - priority: 0 lb_endpoints: - endpoint: address: socket_address: address: 1.2.3.4 port_value: 8080 - priority: 1 lb_endpoints: - endpoint: address: socket_address: address: 6.7.8.9 port_value: 8080 Mesh gateways route requests based solely on the SNI header tacked onto the TLS layer. Envoy currently only lets you configure the outbound SNI header at the cluster layer. If you try to failover through a mesh gateway you ideally would configure the SNI value per endpoint, but that's not possible in envoy today. This PR introduces a simpler way around the problem for now: 1. We identify any target of failover that will use mesh gateway mode local or remote and then further isolate any resolver node in the compiled discovery chain that has a failover destination set to one of those targets. 2. For each of these resolvers we will perform a small measurement of comparative healths of the endpoints that come back from the health API for the set of primary target and serial failover targets. We walk the list of targets in order and if any endpoint is healthy we return that target, otherwise we move on to the next target. 3. The CDS and EDS endpoints both perform the measurements in (2) for the affected resolver nodes. 4. For CDS this measurement selects which TLS SNI field to use for the cluster (note the cluster is always going to be named for the primary target) 5. For EDS this measurement selects which set of endpoints will populate the cluster. Priority tiered failover is ignored. 
One of the big downsides to this approach to failover is that the failover detection and correction is going to be controlled by consul rather than deferring that entirely to the data plane as with the prior version. This also means that we are bound to only failover using official health signals and cannot make use of data plane signals like outlier detection to affect failover. In this specific scenario the lack of data plane signals is ok because the effectiveness is already muted by the fact that the ultimate destination endpoints will have their data plane signals scrambled when they pass through the mesh gateway wrapper anyway so we're not losing much. Another related fix is that we now use the endpoint health from the underlying service, not the health of the gateway (regardless of failover mode).	2019-08-05 13:30:35 -05:00


90 Commits