consul

Commit Graph

Author	SHA1	Message	Date
Andrew Stucki	4c848a554d	Fix missing references to enterprise metadata (#16237 )	2023-02-10 20:47:16 +00:00
Andrew Stucki	318ba215ab	[API Gateway] Add integration test for conflicted TCP listeners (#16225 )	2023-02-10 11:34:01 -06:00
Derek Menteer	4f2ce60654	Fix peering acceptors in secondary datacenters. (#16230 ) Prior to this commit, secondary datacenters could not be initialized as peering acceptors if ACLs were enabled. This is due to the fact that internal server-to-server API calls would fail because the management token was not generated. This PR makes it so that both primary and secondary datacenters generate their own management token whenever a leader is elected in their respective clusters.	2023-02-10 09:47:17 -06:00
Andrew Stucki	3b9c569561	Simple API Gateway e2e test for tcp routes (#16222 ) * Simple API Gateway e2e test for tcp routes * Drop DNSSans since we don't front the Gateway with a leaf cert	2023-02-09 16:20:12 -05:00
skpratt	db2bd404bf	Synthesize anonymous token pre-bootstrap when needed (#16200 ) * add bootstrapping detail for acl errors * error detail improvements * update acl bootstrapping test coverage * update namespace errors * update test coverage * consolidate error message code and update changelog * synthesize anonymous token * Update token language to distinguish Accessor and Secret ID usage (#16044) * remove legacy tokens * remove lingering legacy token references from docs * update language and naming for token secrets and accessor IDs * updates all tokenID references to clarify accessorID * remove token type references and lookup tokens by accessorID index * remove unnecessary constants * replace additional tokenID param names * Add warning info for deprecated -id parameter Co-authored-by: Paul Glass <pglass@hashicorp.com> * Update field comment Co-authored-by: Paul Glass <pglass@hashicorp.com> --------- Co-authored-by: Paul Glass <pglass@hashicorp.com> * revert naming change * add testing * revert naming change --------- Co-authored-by: Paul Glass <pglass@hashicorp.com>	2023-02-09 20:34:02 +00:00
Thomas Eckert	e81a0c2855	API Gateway to Ingress Gateway Snapshot Translation and Routes to Virtual Routers and Splitters (#16127 ) * Stub proxycfg handler for API gateway * Add Service Kind constants/handling for API Gateway * Begin stubbing for SDS * Add new Secret type to xDS order of operations * Continue stubbing of SDS * Iterate on proxycfg handler for API gateway * Handle BoundAPIGateway config entry subscription in proxycfg-glue * Add API gateway to config snapshot validation * Add API gateway to config snapshot clone, leaf, etc. * Subscribe to bound route + cert config entries on bound-api-gateway * Track routes + certs on API gateway config snapshot * Generate DeepCopy() for types used in watch.Map * Watch all active references on api-gateway, unwatch inactive * Track loading of initial bound-api-gateway config entry * Use proper proto package for SDS mapping * Use ResourceReference instead of ServiceName, collect resources * Fix typo, add + remove TODOs * Watch discovery chains for TCPRoute * Add TODO for updating gateway services for api-gateway * make proto * Regenerate deep-copy for proxycfg * Set datacenter on upstream ID from query source * Watch discovery chains for http-route service backends * Add ServiceName getter to HTTP+TCP Service structs * Clean up unwatched discovery chains on API Gateway * Implement watch for ingress leaf certificate * Collect upstreams on http-route + tcp-route updates * Remove unused GatewayServices update handler * Remove unnecessary gateway services logic for API Gateway * Remove outdate TODO * Use .ToIngress where appropriate, including TODO for cleaning up * Cancel before returning error * Remove GatewayServices subscription * Add godoc for handlerAPIGateway functions * Update terminology from Connect => Consul Service Mesh Consistent with terminology changes in https://github.com/hashicorp/consul/pull/12690 * Add missing TODO * Remove duplicate switch case * Rerun deep-copy generator * Use correct property on config 
snapshot * Remove unnecessary leaf cert watch * Clean up based on code review feedback * Note handler properties that are initialized but set elsewhere * Add TODO for moving helper func into structs pkg * Update generated DeepCopy code * gofmt * Begin stubbing for SDS * Start adding tests * Remove second BoundAPIGateway case in glue * TO BE PICKED: fix formatting of str * WIP * Fix merge conflict * Implement HTTP Route to Discovery Chain config entries * Stub out function to create discovery chain * Add discovery chain merging code (#16131) * Test adding TCP and HTTP routes * Add some tests for the synthesizer * Run go mod tidy * Pairing with N8 * Run deep copy * Clean up GatewayChainSynthesizer * Fix missing assignment of BoundAPIGateway topic * Separate out synthesizeChains and toIngressTLS * Fix build errors * Ensure synthesizer skips non-matching routes by protocol * Rebase on N8s work * Generate DeepCopy() for API gateway listener types * Improve variable name * Regenerate DeepCopy() code * Fix linting issue * fix protobuf import * Fix more merge conflict errors * Fix synthesize test * Run deep copy * Add URLRewrite to proto * Update agent/consul/discoverychain/gateway_tcproute.go Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com> * Remove APIGatewayConfigEntry that was extra * Error out if route kind is unknown * Fix formatting errors in proto --------- Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com> Co-authored-by: Andrew Stucki <andrew.stucki@hashicorp.com>	2023-02-09 17:58:55 +00:00
Andrew Stucki	f4210d47dd	Add basic smoke test to make sure an APIGateway runs (#16217 )	2023-02-09 11:32:10 -05:00
Andrew Stucki	0891b4554d	Clean-up Gateway Controller Binding Logic (#16214 ) * Fix detecting when a route doesn't bind to a gateway because it's already bound * Clean up status setting code * rework binding a bit * More cleanup * Flatten all files * Fix up docstrings	2023-02-09 10:17:25 -05:00
skpratt	6f0b226b0d	ACL error improvements: incomplete bootstrapping and non-existent token (#16105 ) * add bootstrapping detail for acl errors * error detail improvements * update acl bootstrapping test coverage * update namespace errors * update test coverage * add changelog * update message for unbootstrapped error * consolidate error message code and update changelog * logout message change	2023-02-08 23:49:44 +00:00
Nathan Coleman	72a73661c9	Implement APIGateway proxycfg snapshot (#16194 ) * Stub proxycfg handler for API gateway * Add Service Kind constants/handling for API Gateway * Begin stubbing for SDS * Add new Secret type to xDS order of operations * Continue stubbing of SDS * Iterate on proxycfg handler for API gateway * Handle BoundAPIGateway config entry subscription in proxycfg-glue * Add API gateway to config snapshot validation * Add API gateway to config snapshot clone, leaf, etc. * Subscribe to bound route + cert config entries on bound-api-gateway * Track routes + certs on API gateway config snapshot * Generate DeepCopy() for types used in watch.Map * Watch all active references on api-gateway, unwatch inactive * Track loading of initial bound-api-gateway config entry * Use proper proto package for SDS mapping * Use ResourceReference instead of ServiceName, collect resources * Fix typo, add + remove TODOs * Watch discovery chains for TCPRoute * Add TODO for updating gateway services for api-gateway * make proto * Regenerate deep-copy for proxycfg * Set datacenter on upstream ID from query source * Watch discovery chains for http-route service backends * Add ServiceName getter to HTTP+TCP Service structs * Clean up unwatched discovery chains on API Gateway * Implement watch for ingress leaf certificate * Collect upstreams on http-route + tcp-route updates * Remove unused GatewayServices update handler * Remove unnecessary gateway services logic for API Gateway * Remove outdate TODO * Use .ToIngress where appropriate, including TODO for cleaning up * Cancel before returning error * Remove GatewayServices subscription * Add godoc for handlerAPIGateway functions * Update terminology from Connect => Consul Service Mesh Consistent with terminology changes in https://github.com/hashicorp/consul/pull/12690 * Add missing TODO * Remove duplicate switch case * Rerun deep-copy generator * Use correct property on config snapshot * Remove unnecessary leaf cert watch * Clean 
up based on code review feedback * Note handler properties that are initialized but set elsewhere * Add TODO for moving helper func into structs pkg * Update generated DeepCopy code * gofmt * Generate DeepCopy() for API gateway listener types * Improve variable name * Regenerate DeepCopy() code * Fix linting issue * Temporarily remove the secret type from resource generation	2023-02-08 15:52:12 -06:00
Nitya Dhanushkodi	1f25289048	troubleshoot: output messages for the troubleshoot proxy command (#16208 )	2023-02-08 13:03:15 -08:00
Kyle Havlovitz	898e59b13c	Add the `operator usage instances` command and api endpoint (#16205 ) This endpoint shows total services, connect service instances and billable service instances in the local datacenter or globally. Billable instances = total service instances - connect services - consul server instances.	2023-02-08 12:07:21 -08:00
Andrew Stucki	df03b45bbc	Add additional controller implementations (#16188 ) * Add additional controller implementations * remove additional interface * Fix comparison checks and mark unused contexts * Switch to time.Now().UTC() * Add a pointer helper for shadowing loop variables * Extract anonymous functions for readability * clean up logging * Add Type to the Condition proto * Update some comments and add additional space for readability * Address PR feedback * Fix up dirty checks and change to pointer receiver	2023-02-08 14:50:17 -05:00
Paul Banks	5397e9ee7f	Adding experimental support for a more efficient LogStore implementation (#16176 ) * Adding experimental support for a more efficient LogStore implementation * Adding changelog entry * Fix go mod tidy issues	2023-02-08 16:50:22 +00:00
cskh	e91bc9c058	feat: envoy extension - http local rate limit (#16196 ) - http local rate limit - Apply rate limit only to local_app - unit test and integ test	2023-02-07 21:56:15 -05:00
John Eikenberry	ed7367b6f4	remove redundant vault api retry logic (#16143 ) remove redundant vault api retry logic We upgraded Vault API module version to a version that has built-in retry logic. So this code is no longer necessary. Also add mention of re-configuring the provider in comments.	2023-02-07 20:52:22 +00:00
skpratt	1e7e52e3ef	revert method name change in xds server protocol for version compatibility (#16195 )	2023-02-07 14:19:09 -06:00
skpratt	9199e99e21	Update token language to distinguish Accessor and Secret ID usage (#16044 ) * remove legacy tokens * remove lingering legacy token references from docs * update language and naming for token secrets and accessor IDs * updates all tokenID references to clarify accessorID * remove token type references and lookup tokens by accessorID index * remove unnecessary constants * replace additional tokenID param names * Add warning info for deprecated -id parameter Co-authored-by: Paul Glass <pglass@hashicorp.com> * Update field comment Co-authored-by: Paul Glass <pglass@hashicorp.com> --------- Co-authored-by: Paul Glass <pglass@hashicorp.com>	2023-02-07 12:26:30 -06:00
wangxinyi7	906ebb97f6	change log level (#16128 )	2023-02-06 12:58:13 -08:00
Dhia Ayachi	c680a35b36	Net 2229/rpc reduce max retries 2 (#16165 ) * feat: calculate retry wait time with exponential back-off * test: add test for getWaitTime method * feat: enforce random jitter between min value from previous iteration and current * extract randomStagger to simplify tests and use Milliseconds to avoid float math. * rename variables * add test and rename comment --------- Co-authored-by: Poonam Jadhav <poonam.jadhav@hashicorp.com>	2023-02-06 14:07:41 -05:00
Nitya Dhanushkodi	b8b37c2357	refactor: remove troubleshoot module dependency on consul top level module (#16162 ) Ensure nothing in the troubleshoot go module depends on consul's top level module. This is so we can import troubleshoot into consul-k8s and not import all of consul. * turns troubleshoot into a go module [authored by @curtbushko] * gets the envoy protos into the troubleshoot module [authored by @curtbushko] * adds a new go module `envoyextensions` which has xdscommon and extensioncommon folders that both the xds package and the troubleshoot package can import * adds testing and linting for the new go modules * moves the unit tests in `troubleshoot/validateupstream` that depend on proxycfg/xds into the xds package, with a comment describing why those tests cannot be in the troubleshoot package * fixes all the imports everywhere as a result of these changes Co-authored-by: Curt Bushko <cbushko@gmail.com>	2023-02-06 09:14:35 -08:00
Poonam Jadhav	24c431270c	feat: client RPC retries on ErrRetryElsewhere error and forwardRequestToLeader method retries on ErrRetryLater error (#16099 )	2023-02-06 11:31:25 -05:00
skpratt	a010902978	Remove legacy acl policies (#15922 ) * remove legacy tokens * remove legacy acl policies * flatten test policies to _prefix address oss feedback re: phrasing and tests	2023-02-06 15:35:52 +00:00
John Eikenberry	5c836f2aa9	fix goroutine leak in renew testing (#16142 ) fix goroutine leak in renew testing Test overwrote the stopWatcher() function variable for the test without keeping and calling the original value. The original value is the function that stops the goroutine... so it needs to be called.	2023-02-03 22:09:34 +00:00
sarahalsmiller	143b2bc1f0	API Gateway Controller Logic (#16058 ) * Add initial API gateway controller logic --------- Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com> Co-authored-by: Andrew Stucki <andrew.stucki@hashicorp.com> Co-authored-by: Thomas Eckert <teckert@hashicorp.com>	2023-02-03 21:55:48 +00:00
Derek Menteer	2f149d60cc	[OSS] Add Peer field to service-defaults upstream overrides (#15956 ) * Add Peer field to service-defaults upstream overrides. * add api changes, compat mode for service default overrides * Fixes based on testing --------- Co-authored-by: DanStough <dan.stough@hashicorp.com>	2023-02-03 10:51:53 -05:00
Paul Glass	a884d0d7c7	Use agent token for service/check deregistration during anti-entropy (#16097 ) Use only the agent token for deregistration during anti-entropy The previous behavior had the agent attempt to use the "service" token (i.e. from the `token` field in a service definition file), and if that was not set then it would use the agent token. The previous behavior was problematic because, if the service token had been deleted, the deregistration request would fail. The agent would retry the deregistration during each anti-entropy sync, and the situation would never resolve. The new behavior is to only/always use the agent token for service and check deregistration during anti-entropy. This approach is: * Simpler: No fallback logic to try different tokens * Faster (slightly): No time spent attempting the service token * Correct: The agent token is able to deregister services on that agent's node, because: * node:write permissions allow deregistration of services/checks on that node. * The agent token must have node:write permission, or else the agent is not able to (de)register itself into the catalog Co-authored-by: Vesa Hagström <weeezes@gmail.com>	2023-02-03 08:45:11 -06:00
Dan Upton	e40b731a52	rate: add prometheus definitions, docs, and clearer names (#15945 )	2023-02-03 12:01:57 +00:00
Nitya Dhanushkodi	8d4c3aa42c	refactor: move service to service validation to troubleshoot package (#16132 ) This is to reduce the dependency on xds from within the troubleshoot package.	2023-02-02 22:18:10 -08:00
Derek Menteer	06338c8ee7	Add unit test and update golden files. (#16115 )	2023-02-01 09:51:08 -06:00
Andrew Stucki	1fbfb5905b	APIGateway HTTPRoute scaffolding (#15859 ) * Stub Config Entries for Consul Native API Gateway (#15644) * Add empty InlineCertificate struct and protobuf * apigateway stubs * new files * Stub HTTPRoute in api pkg * checkpoint * Stub HTTPRoute in structs pkg * Simplify api.APIGatewayConfigEntry to be consistent w/ other entries * Update makeConfigEntry switch, add docstring for HTTPRouteConfigEntry * Add TCPRoute to MakeConfigEntry, return unique Kind * proto generated files * Stub BoundAPIGatewayConfigEntry in agent Since this type is only written by a controller and read by xDS, it doesn't need to be defined in the `api` pkg * Add RaftIndex to APIGatewayConfigEntry stub * Add new config entry kinds to validation allow-list * Add RaftIndex to other added config entry stubs * fix panic * Update usage metrics assertions to include new cfg entries * Regenerate proto w/ Go 1.19 * Run buf formatter on config_entry.proto * Add Meta and acl.EnterpriseMeta to all new ConfigEntry types * Remove optional interface method Warnings() for now Will restore later if we wind up needing it * Remove unnecessary Services field from added config entry types * Implement GetMeta(), GetEnterpriseMeta() for added config entry types * Add meta field to proto, name consistently w/ existing config entries * Format config_entry.proto * Add initial implementation of CanRead + CanWrite for new config entry types * Add unit tests for decoding of new config entry types * Add unit tests for parsing of new config entry types * Add unit tests for API Gateway config entry ACLs * Return typed PermissionDeniedError on BoundAPIGateway CanWrite * Add unit tests for added config entry ACLs * Add BoundAPIGateway type to AllConfigEntryKinds * Return proper kind from BoundAPIGateway * Add docstrings for new config entry types * Add missing config entry kinds to proto def * Update usagemetrics_oss_test.go * Use utility func for returning PermissionDeniedError * Add BoundAPIGateway to 
proto def Co-authored-by: Sarah Alsmiller <sarah.alsmiller@hashicorp.com> Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com> * Add APIGateway validation * Fix comment * Add additional validations * Add cert ref validation * Add protobuf definitions * Tabs to spaces * Fix up field types * Add API structs * Move struct fields around a bit * EventPublisher subscriptions for Consul Native API Gateway (#15757) * Create new event topics in subscribe proto * Add tests for PBSubscribe func * Make configs singular, add all configs to PBToStreamSubscribeRequest * Add snapshot methods * Add config_entry_events tests * Add config entry kind to topic for new configs * Add unit tests for snapshot methods * Start adding integration test * Test using the new controller code * Update agent/consul/state/config_entry_events.go Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com> * Check value of error Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com> * Add controller stubs for API Gateway (#15837) * update initial stub implementation * move files, clean up mutex references * Remove embed, use idiomatic names for constructors * Remove stray file introduced in merge Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com> * Initial server-side and proto defs * drop trailing whitespace * Add APIGateway validation (#15847) * Add APIGateway validation * Fix comment * Add additional validations * Add cert ref validation * Add protobuf definitions * Tabs to spaces * Fix up field types * Add API structs * Move struct fields around a bit * APIGateway InlineCertificate validation (#15856) * Add APIGateway validation * Add additional validations * Add protobuf definitions * Tabs to spaces * Add API structs * Move struct fields around a bit * Add validation for InlineCertificate * Fix ACL test * APIGateway BoundAPIGateway validation (#15858) * Add APIGateway validation * Fix comment * Add additional validations * Add cert ref validation * Add protobuf 
definitions * Tabs to spaces * Fix up field types * Add API structs * Move struct fields around a bit * Add validation for BoundAPIGateway * drop trailing whitespace * APIGateway TCPRoute validation (#15855) * Add APIGateway validation * Fix comment * Add additional validations * Add cert ref validation * Add protobuf definitions * Tabs to spaces * Fix up field types * Add API structs * Move struct fields around a bit * Add TCPRoute normalization and validation * Address PR feedback * Add forgotten Status * Add some more field docs in api package * Fix test * Fix bad merge * Remove duplicate helpers * Fix up proto defs * Fix up stray changes * remove extra newline --------- Co-authored-by: Thomas Eckert <teckert@hashicorp.com> Co-authored-by: Sarah Alsmiller <sarah.alsmiller@hashicorp.com> Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com> Co-authored-by: sarahalsmiller <100602640+sarahalsmiller@users.noreply.github.com>	2023-02-01 07:59:49 -05:00
Derek Menteer	b19c5a94c7	Add Envoy extension metrics. (#16114 )	2023-01-31 14:50:30 -06:00
cskh	f6da81c9d0	improvement: prevent filter being added twice from any envoy extension (#16112 ) * improvement: prevent filter being added twice from any envoy extension * break if error != nil * update test	2023-01-31 16:49:45 +00:00
Poonam Jadhav	9db5b7d896	feat: apply retry policy to read only grpc endpoints (#16085 )	2023-01-31 10:44:25 -05:00
Derek Menteer	1b02749375	Add extension validation on config save and refactor extensions. (#16110 )	2023-01-30 15:35:26 -06:00
Nitya Dhanushkodi	8728a4496c	troubleshoot: service to service validation (#16096 ) * Add Tproxy support to Envoy Extensions (this is needed for service to service validation) * Add validation for Envoy configuration for an upstream service * Use both /config_dump and /cluster to validate Envoy configuration This is because of a bug in Envoy where the EndpointsConfigDump does not include a cluster_name, making it impossible to match an endpoint to verify it exists. This removes endpoints support for builtin extensions since only the validate plugin was using it, and it is no longer used. It also removes test cases for endpoint validation. Endpoints validation now only occurs in the top level test from config_dump and clusters json files. Co-authored-by: Eric <eric@haberkorn.co>	2023-01-27 11:43:16 -08:00
Andrew Stucki	da99514ac8	Add a server-only method for updating ConfigEntry Statuses (#16053 ) * Add a server-only method for updating ConfigEntry Statuses * Address PR feedback * Regen proto	2023-01-27 14:34:11 -05:00
skpratt	ad43846755	Remove legacy acl tokens (#15947 ) * remove legacy tokens * Update test comment Co-authored-by: Paul Glass <pglass@hashicorp.com> * fix imports * update docs for additional CLI changes * add test case for anonymous token * set deprecated api fields to json ignore and fix patch errors * update changelog to breaking-change * fix import * update api docs to remove legacy reference * fix docs nav data --------- Co-authored-by: Paul Glass <pglass@hashicorp.com>	2023-01-27 09:17:07 -06:00
Thomas Eckert	7814471159	Match route and listener protocols when binding (#16057 ) * Add GatewayMeta for matching routes to listeners based on protocols * Add GetGatewayMeta * Apply suggestions from code review Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com> * Make GatewayMeta private * Bound -> BoundGateway * Document gatewayMeta more * Simplify conditional * Parallelize tests and simplify bind conditional * gofmt * 💧 getGatewayMeta --------- Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>	2023-01-27 09:41:03 -05:00
Michael Wilkerson	a1498b015d	Mw/lambda envoy extension parse region (#4107 ) (#16069 ) * updated builtin extension to parse region directly from ARN - added a unit test - added some comments/light refactoring * updated golden files with proper ARNs - ARNs need to be right format now that they are being processed * updated tests and integration tests - removed 'region' from all EnvoyExtension arguments - added properly formatted ARN which includes the same region found in the removed "Region" field: 'us-east-1'	2023-01-26 15:44:52 -08:00
Andrew Stucki	3febdbff39	Add trigger for doing reconciliation based on watch sets (#16052 ) * Add trigger for doing reconciliation based on watch sets * update doc string * Fix my grammar fail	2023-01-26 15:20:37 -05:00
Poonam Jadhav	f4f62b5da6	feat: panic handler in rpc rate limit interceptor (#16022 ) * feat: handle panic in rpc rate limit interceptor * test: additional test cases to rpc rate limiting interceptor * refactor: remove unused listener	2023-01-25 14:13:38 -05:00
Nathan Coleman	e0f4f6c152	Run config entry controller routines on leader (#16054 )	2023-01-25 12:21:46 -06:00
Ronald	6167aef641	Warn when the token query param is used for auth (#16009 )	2023-01-24 16:21:41 +00:00
Thomas Eckert	20146f2916	Implement BindRoutesToGateways (#15950 ) * Stub out bind code * Move into a new package and flesh out binding * Fill in the actual binding logic * Bind to all listeners if not specified * Move bind code up to gateways package * Fix resource type check * Add UpsertRoute to listeners * Add RemoveRoute to listener * Implement binding as associated functions * Pass in gateways to BindRouteToGateways * Add a bunch of tests * Fix hopping from one listener on a gateway to another * Remove parents from HTTPRoute * Apply suggestions from code review * Fix merge conflict * Unify binding into a single variadic function 🙌 @nathancoleman * Remove vestigial error * Add TODO on protocol check	2023-01-20 15:11:16 -05:00
cskh	25396d81c9	Apply agent partition to load services and agent api (#16024 ) * Apply agent partition to load services and agent api changelog	2023-01-20 12:59:26 -05:00
Derek Menteer	5f5e6864ca	Fix proxy-defaults incorrectly merging config on upstreams. (#16021 )	2023-01-20 11:25:51 -06:00
John Murret	794277371f	Integration test for server rate limiting (#15960 ) * rate limit test * Have tests for the 3 modes * added assertions for logs and metrics * add comments to test sections * add check for rate limit exceeded text in log assertion section. * fix linting error * updating test to use KV get and put. move log assertion to last. * Adding logging for blocking messages in enforcing mode. refactoring tests. * modified test description * formatting * Apply suggestions from code review Co-authored-by: Dan Upton <daniel@floppy.co> * Update test/integration/consul-container/test/ratelimit/ratelimit_test.go Co-authored-by: Dhia Ayachi <dhia@hashicorp.com> * expand log checking so that it ensures both logs are there when they are supposed to be and not there when they are not expected to be. * add retry on test * Warn once when rate limit exceeded regardless of enforcing vs permissive. * Update test/integration/consul-container/test/ratelimit/ratelimit_test.go Co-authored-by: Dan Upton <daniel@floppy.co> Co-authored-by: Dan Upton <daniel@floppy.co> Co-authored-by: Dhia Ayachi <dhia@hashicorp.com>	2023-01-19 08:43:33 -07:00
Thomas Eckert	13da1a5285	Native API Gateway Config Entries (#15897 ) * Stub Config Entries for Consul Native API Gateway (#15644) * Add empty InlineCertificate struct and protobuf * apigateway stubs * Stub HTTPRoute in api pkg * Stub HTTPRoute in structs pkg * Simplify api.APIGatewayConfigEntry to be consistent w/ other entries * Update makeConfigEntry switch, add docstring for HTTPRouteConfigEntry * Add TCPRoute to MakeConfigEntry, return unique Kind * Stub BoundAPIGatewayConfigEntry in agent * Add RaftIndex to APIGatewayConfigEntry stub * Add new config entry kinds to validation allow-list * Add RaftIndex to other added config entry stubs * Update usage metrics assertions to include new cfg entries * Add Meta and acl.EnterpriseMeta to all new ConfigEntry types * Remove unnecessary Services field from added config entry types * Implement GetMeta(), GetEnterpriseMeta() for added config entry types * Add meta field to proto, name consistently w/ existing config entries * Format config_entry.proto * Add initial implementation of CanRead + CanWrite for new config entry types * Add unit tests for decoding of new config entry types * Add unit tests for parsing of new config entry types * Add unit tests for API Gateway config entry ACLs * Return typed PermissionDeniedError on BoundAPIGateway CanWrite * Add unit tests for added config entry ACLs * Add BoundAPIGateway type to AllConfigEntryKinds * Return proper kind from BoundAPIGateway * Add docstrings for new config entry types * Add missing config entry kinds to proto def * Update usagemetrics_oss_test.go * Use utility func for returning PermissionDeniedError * EventPublisher subscriptions for Consul Native API Gateway (#15757) * Create new event topics in subscribe proto * Add tests for PBSubscribe func * Make configs singular, add all configs to PBToStreamSubscribeRequest * Add snapshot methods * Add config_entry_events tests * Add config entry kind to topic for new configs * Add unit tests for snapshot methods * 
Start adding integration test * Test using the new controller code * Update agent/consul/state/config_entry_events.go * Check value of error * Add controller stubs for API Gateway (#15837) * update initial stub implementation * move files, clean up mutex references * Remove embed, use idiomatic names for constructors * Remove stray file introduced in merge * Add APIGateway validation (#15847) * Add APIGateway validation * Add additional validations * Add cert ref validation * Add protobuf definitions * Fix up field types * Add API structs * Move struct fields around a bit * APIGateway InlineCertificate validation (#15856) * Add APIGateway validation * Add additional validations * Add protobuf definitions * Tabs to spaces * Add API structs * Move struct fields around a bit * Add validation for InlineCertificate * Fix ACL test * APIGateway BoundAPIGateway validation (#15858) * Add APIGateway validation * Add additional validations * Add cert ref validation * Add protobuf definitions * Fix up field types * Add API structs * Move struct fields around a bit * Add validation for BoundAPIGateway * APIGateway TCPRoute validation (#15855) * Add APIGateway validation * Add additional validations * Add cert ref validation * Add protobuf definitions * Fix up field types * Add API structs * Add TCPRoute normalization and validation * Add forgotten Status * Add some more field docs in api package * Fix test * Format imports * Rename snapshot test variable names * Add plumbing for Native API GW Subscriptions (#16003) Co-authored-by: Sarah Alsmiller <sarah.alsmiller@hashicorp.com> Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com> Co-authored-by: sarahalsmiller <100602640+sarahalsmiller@users.noreply.github.com> Co-authored-by: Andrew Stucki <andrew.stucki@hashicorp.com>	2023-01-18 22:14:34 +00:00
Chris Thain	2f4c8e50f2	Support Vault agent auth config for AWS/GCP CA provider auth (#15970 )	2023-01-18 11:53:04 -08:00
Derek Menteer	2facf50923	Fix configuration merging for implicit tproxy upstreams. (#16000 ) Fix configuration merging for implicit tproxy upstreams. Change the merging logic so that the wildcard upstream has correct proxy-defaults and service-defaults values combined into it. It did not previously merge all fields, and the wildcard upstream did not exist unless service-defaults existed (it ignored proxy-defaults, essentially). Change the way we fetch upstream configuration in the xDS layer so that it falls back to the wildcard when no matching upstream is found. This is what allows implicit peer upstreams to have the correct "merged" config. Change proxycfg to always watch local mesh gateway endpoints whenever a peer upstream is found. This simplifies the logic so that we do not have to inspect the "merged" configuration on peer upstreams to extract the mesh gateway mode.	2023-01-18 13:43:53 -06:00
Dan Upton	7a55de375c	xds: don't attempt to load-balance sessions for local proxies (#15789 ) Previously, we'd begin a session with the xDS concurrency limiter regardless of whether the proxy was registered in the catalog or in the server's local agent state. This caused problems for users who run `consul connect envoy` directly against a server rather than a client agent, as the server's locally registered proxies wouldn't be included in the limiter's capacity. Now, the `ConfigSource` is responsible for beginning the session and we only do so for services in the catalog. Fixes: https://github.com/hashicorp/consul/issues/15753	2023-01-18 12:33:21 -06:00
Chris S. Kim	e4a268e33e	Warn if ACL is enabled but no token is provided to Envoy (#15967 )	2023-01-16 12:31:56 -05:00
Dhia Ayachi	87ff8c1c95	avoid logging RPC errors when it's specific rate limiter errors (#15968 ) * avoid logging RPC errors when it's specific rate limiter errors * simplify if statements	2023-01-16 12:08:09 -05:00
Derek Menteer	19a46d6ca4	Enforce lowercase peer names. (#15697 ) Enforce lowercase peer names. Prior to this change peer names could be mixed case. This can cause issues, as peer names are used as DNS labels in various locations. It also caused issues with envoy configuration.	2023-01-13 14:20:28 -06:00
Dan Stough	6d2880e894	feat: add access logs to dataplane bootstrap rpc (#15951 )	2023-01-11 13:40:09 -05:00
Matt Keeler	5afd4657ec	Protobuf Modernization (#15949 ) * Protobuf Modernization Remove direct usage of golang/protobuf in favor of google.golang.org/protobuf Marshallers (protobuf and json) needed some changes to account for different APIs. Moved to using the google.golang.org/protobuf/types/known/* for the well known types including replacing some custom Struct manipulation with what's available in the structpb well known type package. This also updates our devtools script to install protoc-gen-go from the right location so that files it generates conform to the correct interfaces. * Fix go-mod-tidy make target to work on all modules	2023-01-11 09:39:10 -05:00
Paul Glass	f5231b9157	Add new config_file_service_registration token (#15828 )	2023-01-10 10:24:02 -06:00
Chris S. Kim	a7b34d50fc	Output user-friendly name for anonymous token (#15884 )	2023-01-09 12:28:53 -06:00
Dan Upton	644cd864a5	Rate limit improvements and fixes (#15917 ) - Fixes a panic when Operation.SourceAddr is nil (internal net/rpc calls) - Adds proper HTTP response codes (429 and 503) for rate limit errors - Makes the error messages clearer - Enables automatic retries for rate-limit errors in the net/rpc stack	2023-01-09 10:20:05 +00:00
Semir Patel	40c0bb24ae	emit metrics for global rate limiting (#15891 )	2023-01-06 17:49:33 -06:00
Dhia Ayachi	233eacf0a4	inject logger and create logdrop sink (#15822 ) * inject logger and create logdrop sink * init sink with an empty struct instead of nil * wrap a logger instead of a sink and add a discard logger to avoid double logging * fix compile errors * fix linter errors * Fix bug where log arguments aren't properly formatted * Move log sink construction outside of handler * Add prometheus definition and docs for log drop counter Co-authored-by: Daniel Upton <daniel@floppy.co>	2023-01-06 11:33:53 -07:00
Eric Haberkorn	8d923c1789	Add the Lua Envoy extension (#15906 )	2023-01-06 12:13:40 -05:00
Paul Glass	666c2b2e2b	Fix TLS_BadVerify test assertions on macOS (#15903 )	2023-01-05 11:47:45 -06:00
Dan Upton	b78de5a7a2	grpc/acl: fix bug where ACL token was required even if disabled (#15904 ) Fixes a bug introduced by #15346 where we'd always require an ACL token even if ACLs were disabled because we were erroneously treating `nil` identity as anonymous.	2023-01-05 16:31:18 +00:00
Dan Upton	d53ce39c32	grpc: switch servers and retry on error (#15892 ) This is the OSS portion of enterprise PR 3822. Adds a custom gRPC balancer that replicates the router's server cycling behavior. Also enables automatic retries for RESOURCE_EXHAUSTED errors, which we now get for free.	2023-01-05 10:21:27 +00:00
Nick Irvine	6fb628c07d	fix: return error when config file with unknown extension is passed (#15107 )	2023-01-04 16:57:00 -08:00
Florian Apolloner	077b0a48a3	Allow Operator Generated bootstrap token (#14437 ) Add support to provide an initial token via the bootstrap HTTP API, similar to hashicorp/nomad#12520	2023-01-04 20:19:33 +00:00
Semir Patel	a6482341a5	Wire up the rate limiter to net/rpc calls (#15879 )	2023-01-04 13:38:44 -06:00
Dan Upton	d4c435856b	grpc: `protoc` plugin for generating gRPC rate limit specifications (#15564 ) Adds automation for generating the map of `gRPC Method Name → Rate Limit Type` used by the middleware introduced in #15550, and will ensure we don't forget to add new endpoints. Engineers must annotate their RPCs in the proto file like so: ``` rpc Foo(FooRequest) returns (FooResponse) { option (consul.internal.ratelimit.spec) = { operation_type: READ, }; } ``` When they run `make proto` a protoc plugin `protoc-gen-consul-rate-limit` will be installed that writes rate-limit specs as a JSON array to a file called `.ratelimit.tmp` (one per protobuf package/directory). After running Buf, `make proto` will execute a post-process script that will ingest all of the `.ratelimit.tmp` files and generate a Go file containing the mappings in the `agent/grpc-middleware` package. In the enterprise repository, it will write an additional file with the enterprise-only endpoints. If an engineer forgets to add the annotation to a new RPC, the plugin will return an error like so: ``` RPC Foo is missing rate-limit specification, fix it with: import "proto-public/annotations/ratelimit/ratelimit.proto"; service Bar { rpc Foo(...) returns (...) { option (hashicorp.consul.internal.ratelimit.spec) = { operation_type: OPERATION_READ \| OPERATION_WRITE \| OPERATION_EXEMPT, }; } } ``` In the future, this annotation can be extended to support rate-limit category (e.g. KV vs Catalog) and to determine the retry policy.	2023-01-04 16:07:02 +00:00
Dan Upton	7c7503c849	grpc/acl: relax permissions required for "core" endpoints (#15346 ) Previously, these endpoints required `service:write` permission on _any_ service as a sort of proxy for "is the caller allowed to participate in the mesh?". Now, they're called as part of the process of establishing a server connection by any consumer of the consul-server-connection-manager library, which will include non-mesh workloads (e.g. Consul KV as a storage backend for Vault) as well as ancillary components such as consul-k8s' acl-init process, which likely won't have `service:write` permission. So this commit relaxes those requirements to accept any valid ACL token on the following gRPC endpoints: - `hashicorp.consul.dataplane.DataplaneService/GetSupportedDataplaneFeatures` - `hashicorp.consul.serverdiscovery.ServerDiscoveryService/WatchServers` - `hashicorp.consul.connectca.ConnectCAService/WatchRoots`	2023-01-04 12:40:34 +00:00
Derek Menteer	1f7e7abeac	Fix issue with incorrect proxycfg watch on upstream peer-targets. (#15865 ) This fixes an issue where the incorrect partition was given to the upstream target watch, which meant that failover logic would not work correctly.	2023-01-03 10:44:08 -06:00
Derek Menteer	f3776894bf	Fix agent cache incorrectly notifying unchanged protobufs. (#15866 ) Fix agent cache incorrectly notifying unchanged protobufs. This change fixes a situation where the protobuf private fields would be read by reflect.DeepEqual() and indicate data was modified. This resulted in change notifications being fired every time, which could cause performance problems in proxycfg.	2023-01-03 10:11:56 -06:00
Dan Upton	7747384f1f	Wire in rate limiter to handle internal and external gRPC calls (#15857 )	2022-12-23 13:42:16 -06:00
Dan Stough	b3bd3a6586	[OSS] feat: access logs for listeners and listener filters (#15864 ) * feat: access logs for listeners and listener filters * changelog * fix integration test	2022-12-22 15:18:15 -05:00
Nitya Dhanushkodi	24f01f96b1	add extensions for local service to GetExtensionConfigurations (#15871 ) This gets the extensions information for the local service into the snapshot and ExtensionConfigurations for a proxy. It grabs the extensions from config entries and puts them in structs.NodeService.Proxy field, which already is copied into the config snapshot. Also: * add EnvoyExtensions to api.AgentService so that it matches structs.NodeService	2022-12-22 10:03:33 -08:00
Nitya Dhanushkodi	c7ef04c597	[OSS] extensions: refactor PluginConfiguration into a more generic type ExtensionConfiguration (#15846 ) * extensions: refactor PluginConfiguration into a more generic type ExtensionConfiguration Also: * adds endpoints configuration to lambda golden tests * uses string constant for builtin/aws/lambda Co-authored-by: Eric <eric@haberkorn.co>	2022-12-20 22:26:20 -08:00
John Murret	f5e01f8c6b	Rate Limit Handler - ensure rate limiting is not in the code path when not configured (#15819 ) * Rate limiting handler - ensure configuration has changed before modifying limiters * Updating test to validate arguments to UpdateConfig * Removing duplicate test. Updating mock. * Renaming NullRateLimiter to NullRequestLimitsHandler * Rate Limit Handler - ensure rate limiting is not in the code path when not configured * Update agent/consul/rate/handler.go Co-authored-by: Dhia Ayachi <dhia@hashicorp.com> * formatting handler.go * Rate limiting handler - ensure configuration has changed before modifying limiters * Updating test to validate arguments to UpdateConfig * Removing duplicate test. Updating mock. * adding logging for when UpdateConfig is called but the config has not changed. * Update agent/consul/rate/handler.go Co-authored-by: Dhia Ayachi <dhia@hashicorp.com> * Update agent/consul/rate/handler_test.go Co-authored-by: Dan Upton <daniel@floppy.co> * modifying existing variable name based on pr feedback * updating a broken merge conflict; Co-authored-by: Dhia Ayachi <dhia@hashicorp.com> Co-authored-by: Dan Upton <daniel@floppy.co>	2022-12-20 15:00:22 -07:00
John Murret	aba43d85d9	Rate limiting handler - ensure configuration has changed before modifying limiters (#15805 ) * Rate limiting handler - ensure configuration has changed before modifying limiters * Updating test to validate arguments to UpdateConfig * Removing duplicate test. Updating mock. * adding logging for when UpdateConfig is called but the config has not changed. * Update agent/consul/rate/handler.go Co-authored-by: Dhia Ayachi <dhia@hashicorp.com> Co-authored-by: Dhia Ayachi <dhia@hashicorp.com>	2022-12-20 14:12:03 -07:00
Michael Wilkerson	1b28b89439	Enhancement: Consul Compatibility Checking (#15818 ) * add functions for returning the max and min Envoy major versions - added an UnsupportedEnvoyVersions list - removed an unused error from TestDetermineSupportedProxyFeaturesFromString - modified minSupportedVersion to use the function for getting the Min Envoy major version. Using just the major version without the patch is equivalent to using `.0` * added a function for executing the envoy --version command - added a new exec.go file to not be locked to unix system * added envoy version check when using consul connect envoy * added changelog entry * added docs change	2022-12-20 09:58:19 -08:00
Derek Menteer	74b11c416c	Fix incorrect protocol check on discovery chains with peer targets. (#15833 )	2022-12-20 10:15:03 -06:00
Semir Patel	799b34f1a9	Map net/rpc endpoints to a read/write/exempt op for rate-limiting (#15825 ) Also fixed TestRequestRecorder flaky tests due to loss of precision in elapsed time in the test.	2022-12-19 16:04:52 -06:00
Nitya Dhanushkodi	d382ca0aec	extensions: refactor serverless plugin to use extensions from config entry fields (#15817 ) docs: update config entry docs and the Lambda manual registration docs Co-authored-by: Nitya Dhanushkodi <nitya@hashicorp.com> Co-authored-by: Eric <eric@haberkorn.co>	2022-12-19 12:19:37 -08:00
Chris S. Kim	d44b23cb31	Break instead (#15844 )	2022-12-19 11:53:05 -07:00
Chris S. Kim	831680d2c5	Add custom balancer to always remove subConns (#15701 ) The new balancer is a patched version of gRPC's default pick_first balancer which removes the behavior of preserving the active subconnection if a list of new addresses contains the currently active address.	2022-12-19 17:39:31 +00:00
Andrew Stucki	ab199a11b0	Add async reconciliation controller subpackage (#15534 ) * Add async reconciliation controller subpackage * Address initial feedback * Add tests for panic assertions * Fix comment	2022-12-16 16:49:26 -05:00
Dhia Ayachi	f04f88e4b9	add missing code and fix enterprise specific code (#15375 ) * add missing code and fix enterprise specific code * fix retry * fix flaky tests * fix linter error in test	2022-12-16 16:31:05 -05:00
Dhia Ayachi	2d902b26ac	add log-drop package (#15670 ) * add log-drop package * refactor to extract level * extract metrics * Apply suggestions from code review Co-authored-by: Dan Upton <daniel@floppy.co> * fix compile errors * change to implement a log sink * fix tests to remove sleep * rename and add go docs * fix expanding variadic Co-authored-by: Dan Upton <daniel@floppy.co>	2022-12-15 12:52:48 -05:00
Paul Glass	619032cfcd	Deprecate -join and -join-wan (#15598 )	2022-12-14 20:28:25 +00:00
Dhia Ayachi	6468e3e09c	Server side rate limiter: handle the race condition for limiters tree write in multilimiter (#15767 ) * change to perform all tree writes in the same go routine to avoid race condition. * rename runStoreOnce to reconcile * Apply suggestions from code review Co-authored-by: Dan Upton <daniel@floppy.co> * reduce nesting Co-authored-by: Dan Upton <daniel@floppy.co>	2022-12-14 17:32:11 +00:00
Semir Patel	bafa5c7156	Pass remote addr of incoming HTTP requests through to RPC(..) calls (#15700 )	2022-12-14 09:24:22 -06:00
John Murret	e027c94b52	adding config for request_limits (#15531 ) * server: add placeholder glue for rate limit handler This commit adds a no-op implementation of the rate-limit handler and adds it to the `consul.Server` struct and setup code. This allows us to start working on the net/rpc and gRPC interceptors and config logic. * Add handler errors * Set the global read and write limits * fixing multilimiter moving packages * Fix typo * Simplify globalLimit usage * add multilimiter and tests * exporting LimitedEntity * Apply suggestions from code review Co-authored-by: John Murret <john.murret@hashicorp.com> * add config update and rename config params * add doc string and split config * Apply suggestions from code review Co-authored-by: Dan Upton <daniel@floppy.co> * use timer to avoid go routine leak and change the interface * add comments to tests * fix failing test * add prefix with config edge, refactor tests * Apply suggestions from code review Co-authored-by: Dan Upton <daniel@floppy.co> * refactor to apply configs for limiters under a prefix * add fuzz tests and fix bugs found. Refactor reconcile loop to have a simpler logic * make KeyType an exported type * split the config and limiter trees to fix race conditions in config update * rename variables * fix race in test and remove dead code * fix reconcile loop to not create a timer on each loop * add extra benchmark tests and fix tests * fix benchmark test to pass value to func * server: add placeholder glue for rate limit handler This commit adds a no-op implementation of the rate-limit handler and adds it to the `consul.Server` struct and setup code. This allows us to start working on the net/rpc and gRPC interceptors and config logic. * Set the global read and write limits * fixing multilimiter moving packages * add server configuration for global rate limiting. 
* remove agent test * remove added stuff from handler * remove added stuff from multilimiter * removing unnecessary TODOs * Removing TODO comment from handler * adding in defaulting to infinite * add disabled status in there * adding in documentation for disabled mode. * make disabled the default. * Add mock and agent test * adding documentation and missing mock file. * Fixing test TestLoad_IntegrationWithFlags * updating docs based on PR feedback. * Updating Request Limits mode to use int based on PR feedback. * Adding RequestLimits struct so we have a nested struct in ReloadableConfig. * fixing linting references * Update agent/consul/rate/handler.go Co-authored-by: Dan Upton <daniel@floppy.co> * Update agent/consul/config.go Co-authored-by: Dan Upton <daniel@floppy.co> * removing the ignore of the request limits in JSON. adding builder logic to convert any read rate or write rate less than 0 to rate.Inf * added conversion function to convert request limits object to handler config. * Updating docs to reflect gRPC and RPC are rate limit and as a result, HTTP requests are as well. * Updating values for TestLoad_FullConfig() so that they were different and discernable. * Updating TestRuntimeConfig_Sanitize * Fixing TestLoad_IntegrationWithFlags test * putting nil check in place * fixing rebase * removing change for missing error checks. will put in another PR * Rebasing after default multilimiter config change * resolving rebase issues * updating reference for incomingRPCLimiter to use interface * updating interface * Updating interfaces * Fixing mock reference Co-authored-by: Daniel Upton <daniel@floppy.co> Co-authored-by: Dhia Ayachi <dhia@hashicorp.com>	2022-12-13 13:09:55 -07:00
Dan Stough	233dbcb67f	feat: add access logging API to proxy defaults (#15780 )	2022-12-13 14:52:18 -05:00
cskh	04bf24c8c1	feat(ingress-gateway): support outlier detection of upstream service for ingress gateway (#15614 ) * feat(ingress-gateway): support outlier detection of upstream service for ingress gateway * changelog Co-authored-by: Eric Haberkorn <erichaberkorn@gmail.com>	2022-12-13 11:51:37 -05:00
Derek Menteer	e87d35e313	Fix DialedDirectly configuration for Consul dataplane. (#15760 ) Fix DialedDirectly configuration for Consul dataplane.	2022-12-13 09:16:31 -06:00
Dan Upton	c692802dec	grpc: add rate-limiting middleware (#15550 ) Implements the gRPC middleware for rate-limiting as a tap.ServerInHandle function (executed before the request is unmarshaled). Mappings between gRPC methods and their operation type are generated by a protoc plugin introduced by #15564.	2022-12-13 15:01:56 +00:00
Dan Upton	eef38c2199	server: add placeholder glue for rate limit handler (#15539 ) Adds a no-op implementation of the rate-limit handler and exposes it on the consul.Server struct. It allows us to start working on the net/rpc and gRPC interceptors and config (re)loading logic, without having to implement the full handler up-front. Co-authored-by: John Murret <john.murret@hashicorp.com> Co-authored-by: Dhia Ayachi <dhia@hashicorp.com>	2022-12-13 11:41:54 +00:00
John Murret	cd53120cd7	agent: Fix assignment of error when auto-reloading cert and key file changes. (#15769 ) * Adding the setting of errors missing in config file watcher code in agent. * add changelog	2022-12-12 12:24:39 -07:00
R.B. Boyer	4a32070210	test: remove variable shadowing in TestDNS_ServiceLookup_ARecordLimits (#15740 )	2022-12-09 10:19:02 -06:00
Eric Haberkorn	4268c1c25c	Remove the `connect.enable_serverless_plugin` agent configuration option (#15710 )	2022-12-08 14:46:42 -05:00
Dhia Ayachi	81e40c1fac	add multilimiter and tests (#15467 ) * add multilimiter and tests * exporting LimitedEntity * go mod tidy * Apply suggestions from code review Co-authored-by: John Murret <john.murret@hashicorp.com> * add config update and rename config params * add doc string and split config * Apply suggestions from code review Co-authored-by: Dan Upton <daniel@floppy.co> * use timer to avoid go routine leak and change the interface * add comments to tests * fix failing test * add prefix with config edge, refactor tests * Apply suggestions from code review Co-authored-by: Dan Upton <daniel@floppy.co> * refactor to apply configs for limiters under a prefix * add fuzz tests and fix bugs found. Refactor reconcile loop to have a simpler logic * make KeyType an exported type * split the config and limiter trees to fix race conditions in config update * rename variables * fix race in test and remove dead code * fix reconcile loop to not create a timer on each loop * add extra benchmark tests and fix tests * fix benchmark test to pass value to func * use a separate go routine to write limiters (#15643) * use a separate go routine to write limiters * Add updating limiter when another limiter is created * fix waiter to be a ticker, so we commit more than once. * fix tests and add tests for coverage * unexport members and add tests * make UpdateConfig thread safe and multi call to Run safe * replace switch with if * fix review comments * replace time.sleep with retries * fix flaky test and remove unnecessary init * fix test races * remove unnecessary negative test case * remove fixed todo Co-authored-by: John Murret <john.murret@hashicorp.com> Co-authored-by: Dan Upton <daniel@floppy.co>	2022-12-08 14:42:07 -05:00
cskh	3df68751f5	Flakiness test: case-cfg-splitter-peering-ingress-gateways (#15707 ) * integ-test: fix flaky test - case-cfg-splitter-peering-ingress-gateways * add retry peering to all peering cases Co-authored-by: Dan Stough <dan.stough@hashicorp.com>	2022-12-07 20:19:34 -05:00
Derek Menteer	97ec5279aa	Fix local mesh gateway with peering discovery chains. (#15690 ) Fix local mesh gateway with peering discovery chains. Prior to this patch, discovery chains with peers would not properly honor the mesh gateway mode for two reasons. 1. An incorrect target upstream ID was used to lookup the mesh gateway mode. To fix this, the parent upstream uid is now used instead of the discovery-chain-target-uid to find the intended mesh gateway mode. 2. The watch for local mesh gateways was never initialized for discovery chains. To fix this, the discovery chains are now scanned, and a local GW watch is spawned if: the mesh gateway mode is local and the target is a peering connection.	2022-12-07 13:07:42 -06:00
R.B. Boyer	5af94fb2a0	connect: use -dev-no-store-token for test vaults to reduce source of flakes (#15691 ) It turns out that by default the dev mode vault server will attempt to interact with the filesystem to store the provided root token. If multiple vault instances are running they'll all awkwardly share the filesystem and if timing results in one server stopping while another one is starting then the starting one will error with: Error initializing Dev mode: rename /home/circleci/.vault-token.tmp /home/circleci/.vault-token: no such file or directory This change uses `-dev-no-store-token` to bypass that source of flakes. Also the stdout/stderr from the vault process is included if the test fails. The introduction of more `t.Parallel` use in https://github.com/hashicorp/consul/pull/15669 increased the likelihood of this failure, but any of the tests with multiple vaults in use (or running multiple package tests in parallel that all use vault) were eventually going to flake on this.	2022-12-06 13:15:13 -06:00
R.B. Boyer	900584ca82	connect: ensure all vault connect CA tests use limited privilege tokens (#15669 ) All of the current integration tests where Vault is the Connect CA now use non-root tokens for the test. This helps us detect privilege changes in the vault model so we can keep our guides up to date. One larger change was that the RenewIntermediate function got refactored slightly so it could be used from a test, rather than the large duplicated function we were testing in a test which seemed error prone.	2022-12-06 10:06:36 -06:00
R.B. Boyer	4940a728ab	Detect Vault 1.11+ import in secondary datacenters and update default issuer (#15661 ) The fix outlined and merged in #15253 fixed the issue as it occurs in the primary DC. There is a similar issue that arises when vault is used as the Connect CA in a secondary datacenter that is fixed by this PR. Additionally: this PR adds support to run the existing suite of vault related integration tests against the last 4 versions of vault (1.9, 1.10, 1.11, 1.12)	2022-12-05 15:39:21 -06:00
Chris S. Kim	c046d1a4d8	Add warn log when all ACL policies are filtered out (#15632 )	2022-12-05 11:26:10 -05:00
cskh	36f05bc8fb	integ-test: test consul upgrade from the snapshot of a running cluster (#15595 ) * integ-test: test consul upgrade from the snapshot of a running cluster * use Target version as default Co-authored-by: Dan Stough <dan.stough@hashicorp.com>	2022-12-01 10:39:09 -05:00
R.B. Boyer	11a277f372	peering: better represent non-passing states during peer check flattening (#15615 ) During peer stream replication we flatten checks from the source cluster and build one thin overall check to hide the irrelevant details from the consuming cluster. This flattening logic did correctly flip to non-passing if there were any non-passing checks, but WHICH status it got during that was random (warn/error). Also it didn't represent "maintenance" operations. There is an api package call AggregatedStatus which more correctly flattened check statuses. This PR replicated the more complete logic into the peer stream package.	2022-11-30 11:29:21 -06:00
Freddy	941f6da202	Remove log line about server mgmt token init (#15610 ) * Remove log line about server mgmt token init Currently the server management token is only being bootstrapped in the primary datacenter. That means that servers on the secondary datacenter will never have this token available, and would log this line any time a token is resolved. Bootstrapping the token in secondary datacenters will be done in a follow-up. * Add changelog entry	2022-11-29 17:56:03 -05:00
James Oulman	7e78fb7818	Add support for configuring Envoys route idle_timeout (#14340 ) * Add idleTimeout Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com> Co-authored-by: Dhia Ayachi <dhia@hashicorp.com>	2022-11-29 17:43:15 -05:00
Derek Menteer	95dc0c7b30	Add peering `.service` and `.node` DNS lookups. (#15596 ) Add peering `.service` and `.node` DNS lookups.	2022-11-29 12:23:18 -06:00
cskh	97c9432843	fix(peering): increase the gRPC limit to 8MB (#15503 ) * fix(peering): increase the gRPC limit to 50MB * changelog * update gRPC limit to 8MB	2022-11-28 17:48:43 -05:00
Chris S. Kim	c9ec9fa320	Fix Vault managed intermediate PKI bug (#15525 )	2022-11-28 16:17:58 -05:00
Chris S. Kim	27c53f6c82	Use backport-compatible assertion (#15546 ) * Use backport-compatible assertion * Add workaround for broken apt-get	2022-11-24 11:44:20 -05:00
Chris S. Kim	386da5439a	Use rpcHoldTimeout to calculate blocking timeout (#15541 ) Adds buffer to clients so that servers have time to respond to blocking queries.	2022-11-24 10:13:02 -05:00
Jared Kirschner	3e7e8ae9c5	Support RFC 2782 for prepared query DNS lookups (#14465 ) Format: _<query id or name>._tcp.query[.<datacenter>].<domain>	2022-11-20 17:21:24 -05:00
Alexander Scheel	2b90307f6d	Detect Vault 1.11+ import, update default issuer (#15253 ) Consul used to rely on implicit issuer selection when calling Vault endpoints to issue new CSRs. Vault 1.11+ changed that behavior, which caused Consul to check the wrong (previous) issuer when renewing its Intermediate CA. This patch allows Consul to explicitly set a default issuer when it detects that the response from Vault is 1.11+. Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com> Co-authored-by: Chris S. Kim <ckim@hashicorp.com>	2022-11-17 16:29:49 -05:00
cskh	435e16ecda	fix: clarifying error message when acquiring a lock in remote dc (#15394 ) * fix: clarifying error message when acquiring a lock in remote dc * Update website/content/commands/lock.mdx Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>	2022-11-16 15:27:37 -05:00
Kyle Havlovitz	f4c3e54b11	auto-config: relax node name validation for JWT authorization (#15370 ) * auto-config: relax node name validation for JWT authorization This changes the JWT authorization logic to allow all non-whitespace, non-quote characters when validating node names. Consul had previously allowed these characters in node names, until this validation was added to fix a security vulnerability with whitespace/quotes being passed to the `bexpr` library. This unintentionally broke node names with characters like `.` which aren't related to this vulnerability. * Update website/content/docs/agent/config/cli-flags.mdx Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com> Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>	2022-11-14 18:24:40 -06:00
Dhia Ayachi	225ae55e83	Leadership transfer cmd (#14132 ) * add leadership transfer command * add RPC call test (flaky) * add missing import * add changelog * add command registration * Apply suggestions from code review Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com> * add the possibility of providing an id to raft leadership transfer. Add few tests. * delete old file from cherry pick * rename changelog filename to PR # * rename changelog and fix import * fix failing test * check for OperatorWrite Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com> * rename from leader-transfer to transfer-leader * remove version check and add test for operator read * move struct to operator.go * first pass * add code for leader transfer in the grpc backend and tests * wire the http endpoint to the new grpc endpoint * remove the RPC endpoint * remove non needed struct * fix naming * add mog glue to API * fix comment * remove dead code * fix linter error * change package name for proto file * remove error wrapping * fix failing test * add command registration * add grpc service mock tests * fix receiver to be pointer * use defined values Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com> * reuse MockAclAuthorizer * add documentation * remove usage of external.TokenFromContext * fix failing tests * fix proto generation * Apply suggestions from code review Co-authored-by: Jared Kirschner <85913323+jkirschner-hashicorp@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Jared Kirschner <85913323+jkirschner-hashicorp@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Jared Kirschner <85913323+jkirschner-hashicorp@users.noreply.github.com> * Apply suggestions from code review * add more context in doc for the reason * Apply suggestions from docs code review Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com> * regenerate proto * fix linter errors Co-authored-by: 
github-team-consul-core <github-team-consul-core@hashicorp.com> Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com> Co-authored-by: Jared Kirschner <85913323+jkirschner-hashicorp@users.noreply.github.com> Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>	2022-11-14 15:35:12 -05:00
Freddy	706866fa00	Ensure that NodeDump imported nodes are filtered (#15356 )	2022-11-14 12:35:20 -07:00
Freddy	c58f86a00f	Fixup authz for data imported from peers (#15347 ) There are a few changes that needed to be made to handle authorizing reads for imported data: - If the data was imported from a peer we should not attempt to read the data using the traditional authz rules. This is because the names of services/nodes in a peer cluster are not equivalent to those of the importing cluster. - If the data was imported from a peer we need to check whether the token corresponds to a service, meaning that it has service:write permissions, or to a local read only token that can read all nodes/services in a namespace. This required changes at the policyAuthorizer level, since that is the only view available to OSS Consul, and at the enterprise partition/namespace level.	2022-11-14 11:36:27 -07:00
Kyle Havlovitz	dde5c524ad	connect: strip port from DNS SANs for ingress gateway leaf cert (#15320 ) * connect: strip port from DNS SANs for ingress gateway leaf cert * connect: format DNS SANs in CreateCSR * connect: Test wildcard case when formatting SANs	2022-11-14 10:27:03 -08:00
Derek Menteer	931cec42b3	Prevent serving TLS via ports.grpc (#15339 ) Prevent serving TLS via ports.grpc We remove the ability to run the ports.grpc in TLS mode to avoid confusion and to simplify configuration. This breaking change ensures that any user currently using ports.grpc in an encrypted mode will receive an error message indicating that ports.grpc_tls must be explicitly used. The suggested action for these users is to simply swap their ports.grpc to ports.grpc_tls in the configuration file. If both ports are defined, or if the user has not configured TLS for grpc, then the error message will not be printed.	2022-11-11 14:29:22 -06:00
Dan Stough	626249fbf5	[OSS] fix: wait and try longer to peer through mesh gw (#15328 )	2022-11-10 13:54:00 -05:00
Kyle Schochenmaier	bf0f61a878	removes ioutil usage everywhere which was deprecated in go1.16 (#15297 ) * update go version to 1.18 for api and sdk, go mod tidy * removes ioutil usage everywhere which was deprecated in go1.16 in favour of io and os packages. Also introduces a lint rule which forbids use of ioutil going forward. Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>	2022-11-10 10:26:01 -06:00
malizz	b51f0e25e9	update ACLs for cluster peering (#15317 ) * update ACLs for cluster peering * add changelog * Update .changelog/15317.txt Co-authored-by: Eric Haberkorn <erichaberkorn@gmail.com> Co-authored-by: Eric Haberkorn <erichaberkorn@gmail.com>	2022-11-09 13:02:58 -08:00
malizz	b9a9e1219c	update config defaults, add docs (#15302 ) * update config defaults, add docs * update grpc tls port for non-default values * add changelog * Update website/content/docs/upgrading/upgrade-specific.mdx Co-authored-by: Derek Menteer <105233703+hashi-derek@users.noreply.github.com> * Update website/content/docs/agent/config/config-files.mdx Co-authored-by: Derek Menteer <105233703+hashi-derek@users.noreply.github.com> * update logic for setting grpc tls port value * move default config to default.go, update changelog * update docs * Fix config tests. * Fix linter error. * Fix ConnectCA tests. * Cleanup markdown on upgrade notes. Co-authored-by: Derek Menteer <105233703+hashi-derek@users.noreply.github.com> Co-authored-by: Derek Menteer <derek.menteer@hashicorp.com>	2022-11-09 09:29:55 -08:00
Eric Haberkorn	c340922991	Log Warnings When Peering With Mesh Gateway Mode None (#15304 ) warn when mesh gateway mode is set to none for peering	2022-11-09 11:48:58 -05:00
Derek Menteer	418bd62c44	Fix mesh gateway configuration with proxy-defaults (#15186 ) * Fix mesh gateway proxy-defaults not affecting upstreams. * Clarify distinction with upstream settings Top-level mesh gateway mode in proxy-defaults and service-defaults gets merged into NodeService.Proxy.MeshGateway, and only gets merged with the mode attached to an upstream in proxycfg/xds. * Fix mgw mode usage for peered upstreams There were a couple issues with how mgw mode was being handled for peered upstreams. For starters, mesh gateway mode from proxy-defaults and the top-level of service-defaults gets stored in NodeService.Proxy.MeshGateway, but the upstream watch for peered data was only considering the mesh gateway config attached in NodeService.Proxy.Upstreams[i]. This means that applying a mesh gateway mode via global proxy-defaults or service-defaults on the downstream would not have an effect. Separately, transparent proxy watches for peered upstreams didn't consider mesh gateway mode at all. This commit addresses the first issue by ensuring that we overlay the upstream config for peered upstreams as we do for non-peered. The second issue is addressed by re-using setupWatchesForPeeredUpstream when handling transparent proxy updates. Note that for transparent proxies we do not yet support mesh gateway mode per upstream, so the NodeService.Proxy.MeshGateway mode is used. * Fix upstream mesh gateway mode handling in xds This commit ensures that when determining the mesh gateway mode for peered upstreams we consider the NodeService.Proxy.MeshGateway config as a baseline. In the absence of this change, setting a mesh gateway mode via proxy-defaults or the top-level of service-defaults will not have an effect for peered upstreams. * Merge service/proxy defaults in cfg resolver Previously the mesh gateway mode for connect proxies would be merged at three points: 1. On servers, in ComputeResolvedServiceConfig. 2. On clients, in MergeServiceConfig. 3. On clients, in proxycfg/xds. The first merge returns a ServiceConfigResponse where there is a top-level MeshGateway config from proxy/service-defaults, along with per-upstream config. The second merge combines per-upstream config specified at the service instance with per-upstream config specified centrally. The third merge combines the NodeService.Proxy.MeshGateway config containing proxy/service-defaults data with the per-upstream mode. This third merge is easy to miss, which led to peered upstreams not considering the mesh gateway mode from proxy-defaults. This commit removes the third merge, and ensures that all mesh gateway config is available at the upstream. This way proxycfg/xds do not need to do additional overlays. * Ensure that proxy-defaults is considered in wc Upstream defaults become a synthetic Upstream definition under a wildcard key "". Now that proxycfg/xds expect Upstream definitions to have the final MeshGateway values, this commit ensures that values from proxy-defaults/service-defaults are the default for this synthetic upstream. Add changelog. Co-authored-by: freddygv <freddy@hashicorp.com>	2022-11-09 10:14:29 -06:00
Dan Upton	7b2d08d461	chore: remove unused argument from MergeNodeServiceWithCentralConfig (#15024 ) Previously, the MergeNodeServiceWithCentralConfig method accepted a ServiceSpecificRequest argument, of which only the Datacenter and QueryOptions fields were used. Digging a little deeper, it turns out these fields were only passed down to the ComputeResolvedServiceConfig method (through the ServiceConfigRequest struct) which didn't actually use them. As such, not all call-sites passed a valid ServiceSpecificRequest so it's safer to remove the argument altogether to prevent future changes from depending on it.	2022-11-09 14:54:57 +00:00
Derek Menteer	b64972d486	Bring back parameter ServerExternalAddresses in GenerateToken endpoint (#15267 ) Re-add ServerExternalAddresses parameter in GenerateToken endpoint This reverts commit `5e156772f6` and adds extra functionality to support newer peering behaviors.	2022-11-08 14:55:18 -06:00
cskh	a3f57cc5e8	fix(mesh-gateway): remove deregistered service from mesh gateway (#15272 ) * fix(mesh-gateway): remove deregistered service from mesh gateway * changelog Co-authored-by: Derek Menteer <105233703+hashi-derek@users.noreply.github.com> Co-authored-by: Evan Culver <eculver@users.noreply.github.com>	2022-11-07 20:30:15 -05:00
Freddy	7f5f7e9cf9	Avoid blocking child type updates on parent ack (#15083 )	2022-11-07 18:10:42 -07:00
Derek Menteer	c064ddf606	Backport test fix from ent. (#15279 )	2022-11-07 12:17:46 -06:00
Chris S. Kim	985a4ee1b1	Update hcp-scada-provider to fix diamond dependency problem with go-msgpack (#15185 )	2022-11-07 11:34:30 -05:00
Eric Haberkorn	1804b58799	Fix a bug in mesh gateway proxycfg where ACL tokens aren't passed. (#15273 )	2022-11-07 10:00:11 -05:00
Dan Stough	553312ef61	fix: persist peering CA updates to dialing clusters (#15243 ) fix: persist peering CA updates to dialing clusters	2022-11-04 12:53:20 -04:00
Derek Menteer	18d6c338f4	Backport tests from ent. (#15260 ) * Backport agent tests. Original commit: 0710b2d12fb51a29cedd1119b5fb086e5c71f632 Original commit: aaedb3c28bfe247266f21013d500147d8decb7cd (partial) * Backport test fix and reduce flaky failures.	2022-11-04 10:19:24 -05:00
Derek Menteer	0834fe349b	Backport test from ENT: "Fix missing test fields" (#15258 ) * Backport test from ENT: "Fix missing test fields" Original Author: Sarah Pratt Original Commit: a5c88bef7a969ea5d06ed898d142ab081ba65c69 * Update with proper linting.	2022-11-04 09:29:16 -05:00
Derek Menteer	f4cb2f82bf	Backport various fixes from ENT. (#15254 ) * Regenerate golden files. * Backport from ENT: "Avoid race" Original commit: 5006c8c858b0e332be95271ef9ba35122453315b Original author: freddygv * Backport from ENT: "chore: fix flake peerstream test" Original commit: b74097e7135eca48cc289798c5739f9ef72c0cc8 Original author: DanStough	2022-11-03 16:34:57 -05:00
malizz	617a5f2dc2	convert stream status time fields to pointers (#15252 )	2022-11-03 11:51:22 -07:00
sarahalsmiller	436160e155	Added check for empty peeringsni in restrictPeeringEndpoints (#15239 ) Add check for empty peeringSNI in restrictPeeringEndpoints Co-authored-by: Derek Menteer <derek.menteer@hashicorp.com>	2022-11-02 17:20:52 -05:00
Derek Menteer	bd1019fadb	Prevent peering acceptor from subscribing to addr updates. (#15214 )	2022-11-02 07:55:41 -05:00
Dan Stough	05e93f7569	test: refactor testcontainers and add peering integ tests (#15084 )	2022-11-01 15:03:23 -04:00
Derek Menteer	fa5d87c116	Decrease retry time for failed peering connections.	2022-10-31 14:30:27 -05:00
R.B. Boyer	97b9fcbf48	test: fix flaky TestSubscribeBackend_IntegrationWithServer_DeliversAllMessages test (#15195 ) Allow for some message duplication in subscription events during assertions. I'm pretty sure the subscriptions machinery allows for messages to occasionally be duplicated instead of dropping them, as a once-and-only-once queue is a pipe dream and you have to pick one of the other two options.	2022-10-31 12:10:43 -05:00
Evan Culver	62d4517f9e	connect: Add Envoy 1.24 to integration tests, remove Envoy 1.20 (#15093 )	2022-10-31 10:50:45 -05:00
Derek Menteer	693c8a4706	Allow peering endpoints to bypass verify_incoming.	2022-10-31 09:56:30 -05:00
Derek Menteer	2d4b62be3c	Add tests.	2022-10-31 08:45:00 -05:00
Derek Menteer	1483c94531	Fix peered service protocols using proxy-defaults.	2022-10-31 08:45:00 -05:00
Eric Haberkorn	cf50bdbe20	Fix peering metrics bug (#15178 ) This bug was caused by the peering health metric being set to NaN.	2022-10-28 10:51:12 -04:00
Chris S. Kim	0e176dd6aa	Allow consul debug on non-ACL consul servers (#15155 )	2022-10-27 09:25:18 -04:00
cskh	a9427e1310	fix(peering): nil pointer in calling handleUpdateService (#15160 ) * fix(peering): nil pointer in calling handleUpdateService * changelog	2022-10-26 11:50:34 -04:00
Eric Haberkorn	1bdad89026	fix bug that resulted in generating Envoy configs that use CDS with an EDS configuration (#15140 )	2022-10-25 14:49:57 -04:00
Luke Kysow	d3aa2bd9c5	ingress-gateways: don't log error when registering gateway (#15001 ) * ingress-gateways: don't log error when registering gateway Previously, when an ingress gateway was registered without a corresponding ingress gateway config entry, an error was logged because the watch on the config entry returned a nil result. This is expected so don't log an error.	2022-10-25 10:55:44 -07:00
Luke Kysow	9999672fd7	autoencrypt: helpful error for clients with wrong dc (#14832 ) * autoencrypt: helpful error for clients with wrong dc If clients have set a different datacenter than the servers they're connecting with for autoencrypt, give a helpful error message.	2022-10-25 10:13:41 -07:00
R.B. Boyer	3c44116a8f	cache: refactor agent cache fetching to prevent unnecessary fetches on error (#14956 ) This continues the work done in #14908 where a crude solution to prevent a goroutine leak was implemented. The former code would launch a perpetual goroutine family every iteration (+1 +1) and the fixed code simply caused a new goroutine family to first cancel the prior one to prevent the leak (-1 +1 == 0). This PR refactors this code completely to: - make it more understandable - remove the recursion-via-goroutine strangeness - prevent unnecessary RPC fetches when the prior one has errored. The core issue arose from a conflation of the entry.Fetching field to mean: - there is an RPC (blocking query) in flight right now - there is a goroutine running to manage the RPC fetch retry loop The problem is that the goroutine-leak-avoidance check would treat Fetching like (2), but within the body of a goroutine it would flip that boolean back to false before the retry sleep. This would cause a new chain of goroutines to launch which #14908 would correct crudely. The refactored code uses a plain for-loop and changes the semantics to track state for "is there a goroutine associated with this cache entry" instead of the former. We use a uint64 unique identity per goroutine instead of a boolean so that any orphaned goroutines can tell when they've been replaced when the expiry loop deletes a cache entry while the goroutine is still running and is later replaced.	2022-10-25 10:27:26 -05:00
R.B. Boyer	da70daba43	test: ensure that all dependencies in a test agent use the test logger (#14996 )	2022-10-24 17:02:38 -05:00
Chris S. Kim	9f0ed81cfd	Remove invalid 1xx HTTP codes These tests started failing in go1.19, presumably due to support for valid 1xx responses being added. https://github.com/golang/go/issues/56346	2022-10-24 16:12:08 -04:00
Chris S. Kim	bde57c0dd0	Regenerate files according to 1.19.2 formatter	2022-10-24 16:12:08 -04:00
cskh	db82ffe503	fix(peering): replicating wan address (#15108 ) * fix(peering): replicating wan address * add changelog * unit test	2022-10-24 15:44:57 -04:00
Iryna Shustava	176abb5ff2	proxycfg: watch service-defaults config entries (#15025 ) To support Destinations on the service-defaults (for tproxy with terminating gateway), we need to now also make servers watch service-defaults config entries.	2022-10-24 12:50:28 -06:00
Chris S. Kim	b236e86030	Move oss-only test to its own file	2022-10-24 14:17:43 -04:00
R.B. Boyer	d04cf25fa8	test: fix flaky TestHealthServiceNodes_NodeMetaFilter by waiting until the streaming subsystem has a valid grpc connection (#15019 ) Also potentially unflakes TestHealthIngressServiceNodes for similar reasons.	2022-10-24 13:09:53 -05:00
R.B. Boyer	300860412c	chore: update golangci-lint to v1.50.1 (#15022 )	2022-10-24 11:48:02 -05:00
Venu Yanamandra	efc813e92d	Update error message when restoring ENT snapshot in OSS (#15066 )	2022-10-24 11:40:26 -04:00
freddygv	d65e60de86	Return forbidden on permission denied This commit updates the establish endpoint to bubble up a 403 status code to callers when the establishment secret from the token is invalid. This is a signal that a new peering token must be generated.	2022-10-20 17:11:49 -06:00
Chris S. Kim	a7ea26192b	Update expected encoding in test go-memdb was updated in v1.3.3 to make integers in indexes sortable, which changed how integers were encoded.	2022-10-20 14:32:42 -04:00
freddygv	6d9be5fb15	Use plain TaggedAddressWAN	2022-10-19 16:32:44 -06:00
freddygv	8d211cc9cc	Add unit test	2022-10-19 16:26:15 -06:00
cskh	058ee4fb84	fix: wan address isn't used by peering token	2022-10-19 16:33:25 -04:00
Nitya Dhanushkodi	5e156772f6	Remove ability to specify external addresses in GenerateToken endpoint (#14930 ) * Reverts "update generate token endpoint to take external addresses (#13844)" This reverts commit `f47319b7c6`.	2022-10-19 09:31:36 -07:00
Kyle Havlovitz	5c3427608b	Merge pull request #15035 from hashicorp/vault-ttl-update-warn Warn instead of returning error when missing intermediate mount tune permissions	2022-10-18 15:41:52 -07:00
cskh	d562d363fc	peering: skip registering duplicate node and check from the peer (#14994 ) * peering: skip register duplicate node and check from the peer * Prebuilt the nodes map and checks map to avoid repeated for loop * use key type to struct: node id, service id, and check id	2022-10-18 16:19:24 -04:00
Chris S. Kim	29a297d3e9	Refactor client RPC timeouts (#14965 ) Fix an issue where rpc_hold_timeout was being used as the timeout for non-blocking queries. Users should be able to tune read timeouts without fiddling with rpc_hold_timeout. A new configuration `rpc_read_timeout` is created. Refactor some implementation from the original PR 11500 to remove the misleading linkage between RPCInfo's timeout (used to retry in case of certain modes of failures) and the client RPC timeouts.	2022-10-18 15:05:09 -04:00
Kyle Havlovitz	d122108992	Warn instead of returning an error when intermediate mount tune permission is missing	2022-10-18 12:01:25 -07:00
R.B. Boyer	0cca4c088d	test: possibly fix flake in TestIntentionGetExact (#15021 ) Restructure test setup to be similar to TestAgent_ServerCertificate and see if that's enough to avoid flaking after join.	2022-10-18 10:51:20 -05:00
R.B. Boyer	fe2d41ddad	cache: prevent goroutine leak in agent cache (#14908 ) A bug was discovered in the error handling code for the Agent cache subsystem: 1. NotifyCallback calls notifyBlockingQuery which calls getWithIndex in a loop (which backs off on-error up to 1 minute) 2. getWithIndex calls fetch if there’s no valid entry in the cache 3. fetch starts a goroutine which calls Fetch on the cache-type, waits for a while (again with backoff up to 1 minute for errors) and then calls fetch to trigger a refresh The end result being that every 1 minute notifyBlockingQuery spawns an ancestry of goroutines that essentially lives forever. This PR ensures that the goroutine started by `fetch` cancels any prior goroutine spawned by the same line for the same key. In isolated testing where a cache type was tweaked to indefinitely error, this patch prevented goroutine counts from skyrocketing.	2022-10-17 14:38:10 -05:00
R.B. Boyer	02a858efa0	ca: fix a masked bug in leaf cert generation that would not be notified of root cert rotation after the first one (#15005 ) In practice this was masked by #14956 and was only uncovered fixing the other bug. go test ./agent -run TestAgentConnectCALeafCert_goodNotLocal would fail when only #14956 was fixed.	2022-10-17 13:24:27 -05:00
Chris S. Kim	3d2dffff16	Merge pull request #13388 from deblasis/feature/health-checks_windows_service Feature: Health checks windows service	2022-10-17 09:26:19 -04:00
Dan Upton	f8b4b41205	proxycfg: fix goroutine leak when service is re-registered (#14988 ) Fixes a bug where we'd leak a goroutine in state.run when the given context was canceled while there was a pending update.	2022-10-17 11:31:10 +01:00
Kyle Havlovitz	aaf892a383	Extend tcp keepalive settings to work for terminating gateways as well	2022-10-14 17:05:46 -07:00
Kyle Havlovitz	2c569f6b9c	Update docs and add tcp_keepalive_probes setting	2022-10-14 17:05:46 -07:00
Kyle Havlovitz	2242d1ec4a	Add TCP keepalive settings to proxy config for mesh gateways	2022-10-14 17:05:46 -07:00
Derek Menteer	2a33d0ff96	Fix issue with incorrect method signature on test.	2022-10-14 11:04:57 -05:00
Freddy	24d0c8801a	Merge pull request #14981 from hashicorp/peering/dial-through-gateways	2022-10-14 09:44:56 -06:00
Dan Upton	328e3ff563	proxycfg: rate-limit delivery of config snapshots (#14960 ) Adds a user-configurable rate limiter to proxycfg snapshot delivery, with a default limit of 250 updates per second. This addresses a problem observed in our load testing of Consul Dataplane where updating a "global" resource such as a wildcard intention or the proxy-defaults config entry could starve the Raft or Memberlist goroutines of CPU time, causing general cluster instability.	2022-10-14 15:52:00 +01:00
Derek Menteer	29ebcf5ff0	Add tests for peering state snapshots / restores.	2022-10-14 09:48:04 -05:00
Derek Menteer	e3ff9912d0	Add test for ExportedServicesForAllPeersByName	2022-10-14 09:48:04 -05:00
Dan Upton	e6b55d1d81	perf: remove expensive reflection from xDS hot path (#14934 ) Replaces the reflection-based implementation of proxycfg's ConfigSnapshot.Clone with code generated by deep-copy. While load testing server-based xDS (for consul-dataplane) we discovered this method is extremely expensive. The ConfigSnapshot struct, directly or indirectly, contains a copy of many of the structs in the agent/structs package, which creates a large graph for copystructure.Copy to traverse at runtime, on every proxy reconfiguration.	2022-10-14 10:26:42 +01:00
freddygv	c77123a2aa	Use split var in tests	2022-10-13 17:12:47 -06:00
freddygv	bf51021c07	Use split wildcard partition name This way OSS avoids passing a non-empty label, which will be rejected in OSS consul.	2022-10-13 16:55:28 -06:00
Freddy	ee4cdc4985	Merge pull request #14935 from hashicorp/fix/alias-leak	2022-10-13 16:31:15 -06:00
freddygv	573aa408a1	Lint	2022-10-13 15:55:55 -06:00
Derek Menteer	0f424e3cdf	Reset wait on ensureServerAddrSubscription	2022-10-13 15:58:26 -05:00
freddygv	96fdd3728a	Fix CA init error code	2022-10-13 14:58:11 -06:00
freddygv	2c99a21596	Update leader routine to maybe use gateways	2022-10-13 14:58:00 -06:00
freddygv	e69bc727ec	Update peering establishment to maybe use gateways When peering through mesh gateways we expect outbound dials to peer servers to flow through the local mesh gateway addresses. Now when establishing a peering we get a list of dial addresses as a ring buffer that includes local mesh gateway addresses if the local DC is configured to peer through mesh gateways. The ring buffer includes the mesh gateway addresses first, but also includes the remote server addresses as a fallback. This fallback is present because it's possible that direct egress from the servers may be allowed. If not allowed then the leader will cycle back to a mesh gateway address through the ring. When attempting to dial the remote servers we retry up to a fixed timeout. If using mesh gateways we also have an initial wait in order to allow for the mesh gateways to configure themselves. Note that if we encounter a permission denied error we do not retry since that error indicates that the secret in the peering token is invalid.	2022-10-13 14:57:55 -06:00
malizz	b0b0cbb8ee	increase protobuf size limit for cluster peering (#14976 )	2022-10-13 13:46:51 -07:00
Derek Menteer	4e140c98bc	Address PR comments.	2022-10-13 14:11:02 -05:00
Derek Menteer	1e394da400	Disallow peering to the same cluster.	2022-10-13 14:11:02 -05:00
Derek Menteer	8742fbe14f	Prevent consul peer-exports by discovery chain.	2022-10-13 12:45:09 -05:00
Derek Menteer	f366edcb8d	Prevent the "consul" service from being exported.	2022-10-13 12:45:09 -05:00
Derek Menteer	caa1396255	Add remote peer partition and datacenter info.	2022-10-13 10:37:41 -05:00
Dan Upton	cbb4a030c4	xds: properly merge central config for "agentless" services (#14962 )	2022-10-13 12:04:59 +01:00
Dan Upton	0af9f16343	bug: fix goroutine leaks caused by incorrect usage of `WatchCh` (#14916 ) memdb's `WatchCh` method creates a goroutine that will publish to the returned channel when the watchset is triggered or the given context is canceled. Although this is called out in its godoc comment, it's not obvious that this method creates a goroutine whose lifecycle you need to manage. In the xDS capacity controller, we were calling `WatchCh` on each iteration of the control loop, meaning the number of goroutines would grow on each autopilot event until there was catalog churn. In the catalog config source, we were calling `WatchCh` with the background context, meaning that the goroutine would keep running after the sync loop had terminated.	2022-10-13 12:04:27 +01:00
Hans Hasselberg	0d5935ab83	adding configuration option cloud.scada_address (#14936 ) * adding scada_address * config tests * add changelog entry	2022-10-13 11:31:28 +02:00
Paul Glass	bcda205f88	Add consul.xds.server.streamStart metric (#14957 ) This adds a new consul.xds.server.streamStart metric to measure the time taken to first generate xDS resources after an xDS stream is opened.	2022-10-12 14:17:58 -05:00
Riddhi Shah	345191a0df	Service http checks data source for agentless proxies (#14924 ) Adds another datasource for proxycfg.HTTPChecks, for use on server agents. Typically these checks are performed by local client agents and there is no equivalent of this in agentless (where servers configure consul-dataplane proxies). Hence, the data source is mostly a no-op on servers but in the case where the service is present within the local state, it delegates to the cache data source.	2022-10-12 07:49:56 -07:00
Freddy	9ca8bb8ec4	Merge pull request #14958 from hashicorp/peering/nonce	2022-10-12 08:18:15 -06:00
freddygv	1b46b35041	Actually track nonce in test	2022-10-12 07:50:17 -06:00
Derek Menteer	f330438a45	Fix incorrect backoff-wait logic.	2022-10-12 08:01:10 -05:00
freddygv	7f9a5d0f58	Add basic nonce management This commit adds a monotonically increasing nonce to include in peering replication response messages. Every ack/nack from the peer handling a response will include this nonce, allowing to correlate the ack/nack with a specific resource. At the moment nothing is done with the nonce when it is received. In the future we may want to add functionality such as retries on NACKs, depending on the class of error.	2022-10-11 19:02:04 -06:00
Paul Glass	d17af23641	gRPC server metrics (#14922 ) * Move stats.go from grpc-internal to grpc-middleware * Update grpc server metrics with server type label * Add stats test to grpc-external * Remove global metrics instance from grpc server tests	2022-10-11 17:00:32 -05:00
cskh	e0356e1502	fix(peering): add missing grpc_tls_port for server address reconciliation (#14944 )	2022-10-11 10:56:29 -04:00
freddygv	f4cc4577ca	Fix alias check leak Previously when an alias check was removed it would not be stopped nor cleaned up from the associated aliasChecks map. This means that any time an alias check was deregistered we would leak a goroutine for CheckAlias.run() because the stopCh would never be closed. This issue mostly affects service mesh deployments on platforms where the client agent is mostly static but proxy services come and go regularly, since by default sidecars are registered with an alias check.	2022-10-10 16:42:29 -06:00
James Oulman	b8bd7a3058	Configure Envoy alpn_protocols based on service protocol (#14356 ) * Configure Envoy alpn_protocols based on service protocol * define alpnProtocols in a more standard way * http2 protocol should be h2 only * formatting * add test for getAlpnProtocol() * create changelog entry * change scope is connect-proxy * ignore errors on ParseProxyConfig; fixes linter * add tests for grpc and http2 public listeners * remove newlines from PR * Add alpn_protocol configuration for ingress gateway * Guard against nil tlsContext * add ingress gateway w/ TLS tests for gRPC and HTTP2 * getAlpnProtocols: add TCP protocol test * add tests for ingress gateway with grpc/http2 and per-listener TLS config * add tests for ingress gateway with grpc/http2 and per-listener TLS config * add Gateway level TLS config with mixed protocol listeners to validate ALPN * update changelog to include ingress-gateway * add http/1.1 to http2 ALPN * go fmt * fix test on custom-trace-listener	2022-10-10 13:13:56 -07:00
freddygv	bf72df7b0e	Fixup test	2022-10-10 13:20:14 -06:00
Chris S. Kim	4f4112662e	Fix nil pointer	2022-10-10 13:20:14 -06:00
Chris S. Kim	b0a4c5c563	Include stream-related information in peering endpoints	2022-10-10 13:20:14 -06:00
Paul Glass	c0c187f1c5	Merge central config for GetEnvoyBootstrapParams (#14869 ) This fixes GetEnvoyBootstrapParams to merge in proxy-defaults and service-defaults. Co-authored-by: Dan Upton <daniel@floppy.co>	2022-10-10 12:40:27 -05:00
Freddy	4abad02abd	Merge pull request #14796 from hashicorp/peering/use-connect-ca	2022-10-07 10:37:37 -06:00
freddygv	7d4da6eb22	Fixup test	2022-10-07 09:34:16 -06:00
freddygv	3034df6a5c	Require Connect and TLS to generate peering tokens By requiring Connect and a gRPC TLS listener we can automatically configure TLS for all peering control-plane traffic.	2022-10-07 09:06:29 -06:00
freddygv	fac3ddc857	Use internal server certificate for peering TLS A previous commit introduced an internally-managed server certificate to use for peering-related purposes. Now the peering token has been updated to match that behavior: - The server name matches the structure of the server cert - The CA PEMs correspond to the Connect CA Note that if Connect is disabled, and by extension the Connect CA, we fall back to the previous behavior of returning the manually configured certs and local server SNI. Several tests were updated to use the gRPC TLS port since they enable Connect by default. This means that the peering token will embed the Connect CA, and the dialer will expect a TLS listener.	2022-10-07 09:05:32 -06:00
freddygv	5f97223822	Simplify mgw watch mgmt	2022-10-07 08:54:37 -06:00
freddygv	d54db25421	Use existing query options to build ctx	2022-10-07 08:46:53 -06:00
DanStough	77ab28c5c7	feat: xDS updates for peerings control plane through mesh gw	2022-10-07 08:46:42 -06:00
Eric Haberkorn	1633cf20ea	Make the mesh gateway changes to allow `local` mode for cluster peering data plane traffic (#14817 ) Make the mesh gateway changes to allow `local` mode for cluster peering data plane traffic	2022-10-06 09:54:14 -04:00
cskh	c1b5f34fb7	fix: missing UDP field in checkType (#14885 ) * fix: missing UDP field in checkType * Add changelog * Update doc	2022-10-05 15:57:21 -04:00
Derek Menteer	a279d2d329	Fix explicit tproxy listeners with discovery chains. (#14751 ) Fix explicit tproxy listeners with discovery chains.	2022-10-05 14:38:25 -05:00
Alex Oskotsky	13da2c5fad	Add the ability to retry on reset connection to service-routers (#12890 )	2022-10-05 13:06:44 -04:00
John Murret	79a541fd7d	Upgrade serf to v0.10.1 and memberlist to v0.5.0 to get memberlist size metrics and broadcast queue depth metric (#14873 ) * updating to serf v0.10.1 and memberlist v0.5.0 to get memberlist size metrics and memberlist broadcast queue depth metric * update changelog * update changelog * correcting changelog * adding "QueueCheckInterval" for memberlist to test * updating integration test containers to grab latest api	2022-10-04 17:51:37 -06:00
Evan Culver	a3be5a5a82	connect: Bump Envoy 1.20 to 1.20.7, 1.21 to 1.21.5 and 1.22 to 1.22.5 (#14831 )	2022-10-04 13:15:01 -07:00
Eric Haberkorn	1b565444be	Rename `PeerName` to `Peer` on prepared queries and exported services (#14854 )	2022-10-04 14:46:15 -04:00
Freddy	d9fe3578ac	Merge pull request #14734 from hashicorp/NET-643-update-mesh-gateway-envoy-config-for-inbound-peering-control-plane-traffic	2022-10-03 12:54:11 -06:00
freddygv	b15d41534f	Update xds generation for peering over mesh gws This commit adds the xDS resources needed for INBOUND traffic from peer clusters: - 1 filter chain for all inbound peering requests. - 1 cluster for all inbound peering requests. - 1 endpoint per voting server with the gRPC TLS port configured. There is one filter chain and cluster because unlike with WAN federation, peer clusters will not attempt to dial individual servers. Peer clusters will only dial the local mesh gateway addresses.	2022-10-03 12:42:27 -06:00
freddygv	a8c4d6bc55	Share mgw addrs in peering stream if needed This commit adds handling so that the replication stream considers whether the user intends to peer through mesh gateways. The subscription will return server or mesh gateway addresses depending on the mesh configuration setting. These watches can be updated at runtime by modifying the mesh config entry.	2022-10-03 11:42:20 -06:00
freddygv	4ff9d475b0	Return mesh gateway addrs if peering through mgw	2022-10-03 11:35:10 -06:00
chappie	ad7295e5d9	Merge pull request #14811 from hashicorp/chappie/dns Add DNS gRPC proxying support	2022-10-03 08:02:48 -07:00
Chris Chapman	d7b5351b66	Making suggested comments	2022-09-30 15:03:33 -07:00
Chris Chapman	46bea72212	Making suggested changes	2022-09-30 14:51:12 -07:00
Chris Chapman	a05563b788	Update comment	2022-09-30 09:35:01 -07:00
DanStough	7f8971d77f	chore: fix flakey scada provider test	2022-09-30 11:56:40 -04:00
Chris Chapman	81e267171b	Bind a dns mux handler to gRPC proxy	2022-09-29 21:44:45 -07:00
Chris Chapman	7bc9cad180	Adding grpc handler for dns proxy	2022-09-29 21:19:51 -07:00
Eric Haberkorn	80e51ff907	Add exported services event to cluster peering replication. (#14797 )	2022-09-29 15:37:19 -04:00
Ashwin Venkatesh	4ba260958c	bug: watch local mesh gateways in non-default partitions with agentless (#14799 )	2022-09-29 13:19:04 -04:00
cskh	69f40df548	feat(ingress gateway: support configuring limits in ingress-gateway c… (#14749 ) * feat(ingress gateway: support configuring limits in ingress-gateway config entry - a new Defaults field with max_connections, max_pending_connections, max_requests is added to ingress gateway config entry - new field max_connections, max_pending_connections, max_requests in individual services to overwrite the value in Default - added unit test and integration test - updated doc Co-authored-by: Chris S. Kim <ckim@hashicorp.com> Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com> Co-authored-by: Dan Stough <dan.stough@hashicorp.com>	2022-09-28 14:56:46 -04:00
malizz	84b0f408fa	Support Stale Queries for Trust Bundle Lookups (#14724 ) * initial commit * add tags, add conversations * add test for query options utility functions * update previous tests * fix test * don't error out on empty context * add changelog * update decode config	2022-09-28 09:56:59 -07:00
Eric Haberkorn	6570d5f004	Enable outbound peered requests to go through local mesh gateway (#14763 )	2022-09-27 09:49:28 -04:00
Nick Ethier	1c1b0994b8	add HCP integration component (#14723 ) * add HCP integration * lint: use non-deprecated logging interface	2022-09-26 14:58:15 -04:00
Derek Menteer	aa4709ab74	Add envoy connection balancing. (#14616 ) Add envoy connection balancing config.	2022-09-26 11:29:06 -05:00
Chris S. Kim	2203cdc4db	Add new internal endpoint to list exported services to a peer	2022-09-23 09:43:56 -04:00
freddygv	d818d7b096	Manage local server watches depending on mesh cfg Routing peering control plane traffic through mesh gateways can be enabled or disabled at runtime with the mesh config entry. This commit updates proxycfg to add or cancel watches for local servers depending on this central config. Note that WAN federation over mesh gateways is determined by a service metadata flag, and any updates to the gateway service registration will force the creation of a new snapshot. If enabled, WAN-fed over mesh gateways will trigger a local server watch on initialize(). Because of this we will only add/remove server watches if WAN federation over mesh gateways is disabled.	2022-09-22 19:32:10 -06:00
Alessandro De Blasis	461b42ed48	fix(check): added missing OSService props	2022-09-21 13:10:21 +01:00
Alessandro De Blasis	5719fd6560	fix(checks): os_service OK message in output	2022-09-21 09:27:33 +01:00
Alessandro De Blasis	f440966a38	fix(checks): os_service lifecycle bugfix	2022-09-21 09:26:47 +01:00
Alessandro De Blasis	fc0dd92dcf	fix(agent): uninitialized map panic error	2022-09-21 09:25:54 +01:00
malizz	1a0aa38a82	increase the size of txn to support vault (#14599 ) * increase the size of txn to support vault * add test, revert change to acl endpoint * add changelog * update test, add passing test case * Update .changelog/14599.txt Co-authored-by: Freddy <freddygv@users.noreply.github.com> Co-authored-by: Freddy <freddygv@users.noreply.github.com>	2022-09-19 09:07:19 -07:00
freddygv	5fbb26525b	Add awareness of server mode to TLS configurator Previously the TLS configurator would default to presenting auto TLS certificates as client certificates. Server agents should not have this behavior and should instead present the manually configured certs. The autoTLS certs for servers are exclusively used for peering and should not be used as the default for outbound communication.	2022-09-16 17:57:10 -06:00
freddygv	f30bc96239	Test fixes - Pulls in CLI test fix from main - Updates psutils to fix TestAgent_Host on M1 Mac	2022-09-16 17:57:10 -06:00
freddygv	02d3ce1039	Add server certificate manager This certificate manager will request a leaf certificate for server agents and then keep them up to date.	2022-09-16 17:57:10 -06:00
freddygv	0e5131bd33	Generate ACL token for server management This commit introduces a new ACL token used for internal server management purposes. It has a few key properties: - It has unlimited permissions. - It is persisted through Raft as System Metadata rather than in the ACL tokens table. This is to avoid users seeing or modifying it. - It is re-generated on leadership establishment.	2022-09-16 17:54:34 -06:00
freddygv	0ea3353537	Add handling in agent cache for server leaf certs	2022-09-16 17:54:34 -06:00
Kyle Havlovitz	0d9ae52643	Merge pull request #14598 from hashicorp/root-removal-fix connect/ca: Don't discard old roots on primaryInitialize	2022-09-15 14:36:01 -07:00
Kyle Havlovitz	6105a7fd9f	connect/ca: don't discard old roots on primaryInitialize	2022-09-15 12:59:09 -07:00
Gabriel Santos	e53af28bd7	Middleware: `RequestRecorder` reports calls below 1ms as decimal value (#12905 ) * Typos * Test failing * Convert values <1ms to decimal * Fix test * Update docs and test error msg * Applied suggested changes to test case * Changelog file and suggested changes * Update .changelog/12905.txt Co-authored-by: Chris S. Kim <kisunji92@gmail.com> * suggested change - start duration with microseconds instead of nanoseconds * fix error * suggested change - floats Co-authored-by: alex <8968914+acpana@users.noreply.github.com> Co-authored-by: Chris S. Kim <kisunji92@gmail.com>	2022-09-15 13:04:37 -04:00
Daniel Graña	8c98172f53	[BUGFIX] Do not use interval as timeout (#14619 ) Do not use interval as timeout	2022-09-15 12:39:48 -04:00
Evan Culver	d0416f593c	connect: Bump latest Envoy to 1.23.1 in test matrix (#14573 )	2022-09-14 13:20:16 -07:00
DanStough	485e1b5d4e	fix(peering): generate token metrics only for leader	2022-09-14 11:37:30 -04:00
DanStough	2a2debee64	feat(peering): validate server name conflicts on establish	2022-09-14 11:37:30 -04:00
Kyle Havlovitz	60cee76746	Merge pull request #14516 from hashicorp/ca-ttl-fixes Fix inconsistent TTL behavior in CA providers	2022-09-13 16:07:36 -07:00
Kyle Havlovitz	d67bccd210	Update intermediate pki mount/role when reconfiguring Vault provider	2022-09-13 15:42:26 -07:00
Kyle Havlovitz	f46955101a	connect/ca: Clarify behavior around IntermediateCertTTL in CA config	2022-09-13 15:42:26 -07:00
DanStough	0150e88200	feat: add PeerThroughMeshGateways to mesh config	2022-09-13 17:19:54 -04:00
Derek Menteer	0aa13733a0	Add CSR check for number of URIs. (#14579 ) Add CSR check for number of URIs.	2022-09-13 14:21:47 -05:00
Derek Menteer	db83ff4fa6	Add input validation for auto-config JWT authorization checks.	2022-09-13 11:16:36 -05:00
cskh	f22685b969	Config-entry: Support proxy config in service-defaults (#14395 ) * Config-entry: Support proxy config in service-defaults * Update website/content/docs/connect/config-entries/service-defaults.mdx Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>	2022-09-12 10:41:58 -04:00
Eric Haberkorn	aa8268e50c	Implement Cluster Peering Redirects (#14445 ) implement cluster peering redirects	2022-09-09 13:58:28 -04:00
skpratt	b761589340	add non-double-prefixed metrics (#14193 )	2022-09-09 12:13:43 -05:00
skpratt	19f79aa9a6	PR #14057 follow up fix: service id parsing from sidecar id (#14541 ) * fix service id parsing from sidecar id * simplify suffix trimming	2022-09-09 09:47:10 -05:00
Dan Upton	1c2c975b0b	xDS Load Balancing (#14397 ) Prior to #13244, connect proxies and gateways could only be configured by an xDS session served by the local client agent. In an upcoming release, it will be possible to deploy a Consul service mesh without client agents. In this model, xDS sessions will be handled by the servers themselves, which necessitates load-balancing to prevent a single server from receiving a disproportionate amount of load and becoming overwhelmed. This introduces a simple form of load-balancing where Consul will attempt to achieve an even spread of load (xDS sessions) between all healthy servers. It does so by implementing a concurrent session limiter (limiter.SessionLimiter) and adjusting the limit according to autopilot state and proxy service registrations in the catalog. If a server is already over capacity (i.e. the session limit is lowered), Consul will begin draining sessions to rebalance the load. This will result in the client receiving a `RESOURCE_EXHAUSTED` status code. It is the client's responsibility to observe this response and reconnect to a different server. Users of the gRPC client connection brokered by the consul-server-connection-manager library will get this for free. The rate at which Consul will drain sessions to rebalance load is scaled dynamically based on the number of proxies in the catalog.	2022-09-09 15:02:01 +01:00
Derek Menteer	f7c884f0af	Merge branch 'main' of github.com:hashicorp/consul into derekm/split-grpc-ports	2022-09-08 14:53:08 -05:00
Derek Menteer	bfe7c5e8af	Remove rebuilding grpc server.	2022-09-08 13:45:44 -05:00
Derek Menteer	80d31458e5	Various cleanups.	2022-09-08 10:51:50 -05:00
Chris S. Kim	03df6c3ac6	Reuse http.DefaultTransport in UIMetricsProxy (#14521 ) http.Transport keeps a pool of connections and should be reused when possible. We instantiate a new http.DefaultTransport for every metrics request, making large numbers of concurrent requests inefficiently spin up new connections instead of reusing open ones.	2022-09-08 11:02:05 -04:00
Chris S. Kim	1c4a6eef4f	Merge pull request #14285 from hashicorp/NET-638-push-server-address-updates-to-the-peer peering: Subscribe to server address changes and push updates to peers	2022-09-07 09:30:45 -04:00
skpratt	3bf1edfb3f	move port and default check logic to locked step (#14057 )	2022-09-06 19:35:31 -05:00
Freddy	f4dfd42e0a	Add SpiffeID for Consul server agents (#14485 ) Co-authored-by: Eric Haberkorn <erichaberkorn@gmail.com> By adding a SpiffeID for server agents, servers can now request a leaf certificate from the Connect CA. This new Spiffe ID has a key property: servers are identified by their datacenter name and trust domain. All servers that share these attributes will share a ServerURI. The aim is to use these certificates to verify the server name of ANY server in a Consul datacenter.	2022-09-06 17:58:13 -06:00
Daniel Upton	8c46e48e0d	proxycfg-glue: server-local implementation of IntentionUpstreamsDestination This is the OSS portion of enterprise PR 2463. Generalises the serverIntentionUpstreams type to support matching on a service or destination.	2022-09-06 23:27:25 +01:00
Daniel Upton	f8dba7e9ac	proxycfg-glue: server-local implementation of InternalServiceDump This is the OSS portion of enterprise PR 2489. This PR introduces a server-local implementation of the proxycfg.InternalServiceDump interface that sources data from a blocking query against the server's state store. For simplicity, it only implements the subset of the Internal.ServiceDump RPC handler actually used by proxycfg - as such the result type has been changed to IndexedCheckServiceNodes to avoid confusion.	2022-09-06 23:27:25 +01:00
Daniel Upton	a31738f76f	proxycfg-glue: server-local implementation of ResolvedServiceConfig This is the OSS portion of enterprise PR 2460. Introduces a server-local implementation of the proxycfg.ResolvedServiceConfig interface that sources data from a blocking query against the server's state store. It moves the service config resolution logic into the agent/configentry package so that it can be used in both the RPC handler and data source. I've also done a little re-arranging and adding comments to call out data sources for which there is to be no server-local equivalent.	2022-09-06 23:27:25 +01:00
Derek Menteer	bf769daae4	Merge branch 'main' of github.com:hashicorp/consul into derekm/split-grpc-ports	2022-09-06 10:51:04 -05:00
Derek Menteer	02ae66bda8	Add kv txn get-not-exists operation.	2022-09-06 10:28:59 -05:00
Chris S. Kim	953808e899	PR feedback on terminated state checking	2022-09-06 10:28:20 -04:00
Chris S. Kim	ddb9375cb6	Add testcase for parsing grpc_port	2022-09-06 10:17:44 -04:00
Kyle Havlovitz	d97ccccdd5	Merge pull request #14429 from hashicorp/ca-prune-intermediates Prune old expired intermediate certs when appending a new one	2022-09-02 15:34:33 -07:00
cskh	0f7d4efac3	fix(txn api): missing proxy config in registering proxy service (#14471 ) * fix(txn api): missing proxy config in registering proxy service	2022-09-02 14:28:05 -04:00
Chris S. Kim	ec36755cc0	Properly assert for ServerAddresses replication request	2022-09-02 11:44:54 -04:00
Chris S. Kim	d1d9dbff8e	Fix terminate not returning early	2022-09-02 11:44:38 -04:00
Derek Menteer	f64771c707	Address PR comments.	2022-09-01 16:54:24 -05:00
Kyle Havlovitz	0c2fb7252d	Prune intermediates before appending new one	2022-09-01 14:24:30 -07:00
Luke Kysow	81d7cc41dc	Use proxy address for default check (#14433 ) When a sidecar proxy is registered, a check is automatically added. Previously, the address this check used was the underlying service's address instead of the proxy's address, even though the check is testing if the proxy is up. This worked in most cases because the proxy ran on the same IP as the underlying service but it's not guaranteed and so the proper default address should be the proxy's address.	2022-09-01 14:03:35 -07:00
malizz	f1054dada9	fix TestProxyConfigEntry (#14435 )	2022-09-01 11:37:47 -07:00
malizz	b3ac8f48ca	Add additional parameters to envoy passive health check config (#14238 ) * draft commit * add changelog, update test * remove extra param * fix test * update type to account for nil value * add test for custom passive health check * update comments and tests * update description in docs * fix missing commas	2022-09-01 09:59:11 -07:00
Chris S. Kim	f2b147e575	Add Internal.ServiceDump support for querying by PeerName	2022-09-01 10:32:59 -04:00
Chris S. Kim	e62f830fa8	Merge pull request #13998 from jorgemarey/f-new-tracing-envoy Add new envoy tracing configuration	2022-09-01 08:57:23 -04:00
Derek Menteer	cf7f24a6ec	Change serf-tag references to field references.	2022-08-31 16:38:42 -05:00
malizz	a80e0bcd00	validate args before deleting proxy defaults (#14290 ) * validate args before deleting proxy defaults * add changelog * validate name when normalizing proxy defaults * add test for proxyConfigEntry * add comments	2022-08-31 13:03:38 -07:00
Kyle Havlovitz	113454645d	Prune old expired intermediate certs when appending a new one	2022-08-31 11:41:58 -07:00
Alessandro De Blasis	60c7c831c6	Merge remote-tracking branch 'hashicorp/main' into feature/health-checks_windows_service	2022-08-30 18:49:20 +01:00
Eric Haberkorn	3726a0ab7a	Finish up cluster peering failover (#14396 )	2022-08-30 11:46:34 -04:00
Chris S. Kim	560d410c6d	Merge branch 'main' into NET-638-push-server-address-updates-to-the-peer # Conflicts: # agent/grpc-external/services/peerstream/stream_test.go	2022-08-30 11:09:25 -04:00
Jorge Marey	3f3bb8831e	Fix typos. Add test. Add documentation	2022-08-30 16:59:02 +02:00
Jorge Marey	ed7b34128f	Add new tracing configuration	2022-08-30 16:59:02 +02:00
Freddy	97d1db759f	Merge pull request #13496 from maxb/fix-kv_entries-metric	2022-08-29 15:35:11 -06:00
Freddy	829a2a8722	Merge pull request #14364 from hashicorp/peering/term-delete	2022-08-29 15:33:18 -06:00
Max Bowsher	decc9231ee	Merge branch 'main' into fix-kv_entries-metric	2022-08-29 22:22:10 +01:00
Chris S. Kim	5010fa5c03	Merge pull request #14371 from hashicorp/kisunji/peering-metrics-update Adjust metrics reporting for peering tracker	2022-08-29 17:16:19 -04:00
Chris S. Kim	74ddf040dd	Add heartbeat timeout grace period when accounting for peering health	2022-08-29 16:32:26 -04:00
Derek Menteer	0ceec9017b	Expose `grpc_tls` via serf for cluster peering.	2022-08-29 13:43:49 -05:00
Derek Menteer	1255a8a20d	Add separate grpc_tls port. To ease the transition for users, the original gRPC port can still operate in a deprecated mode as either plain-text or TLS mode. This behavior should be removed in a future release whenever we no longer support this. The resulting behavior from this commit is: `ports.grpc > 0 && ports.grpc_tls > 0` spawns both plain-text and tls ports. `ports.grpc > 0 && grpc.tls == undefined` spawns a single plain-text port. `ports.grpc > 0 && grpc.tls != undefined` spawns a single tls port (backwards compat mode).	2022-08-29 13:43:43 -05:00
freddygv	310608fb19	Add validation to prevent switching dialing mode This prevents unexpected changes to the output of ShouldDial, which should never change unless a peering is deleted and recreated.	2022-08-29 12:31:13 -06:00
Eric Haberkorn	72f90754ae	Update max_ejection_percent on outlier detection for peered clusters to 100% (#14373 ) We can't trust health checks on peered services when service resolvers, splitters and routers are used.	2022-08-29 13:46:41 -04:00
Alessandro De Blasis	26cc56bc68	fix(agent): removed redundant code in docker check as well	2022-08-29 18:15:59 +01:00
Alessandro De Blasis	c0d647d11e	fix(agent): removed redundant check on prev. running check	2022-08-29 17:53:39 +01:00
Chris S. Kim	def529edd3	Rename test	2022-08-29 10:34:50 -04:00
Chris S. Kim	93271f649c	Fix test	2022-08-29 10:20:30 -04:00
Eric Haberkorn	1099665473	Update the structs and discovery chain for service resolver redirects to cluster peers. (#14366 )	2022-08-29 09:51:32 -04:00
Alessandro De Blasis	f3437eaf05	Merge remote-tracking branch 'hashicorp/main' into feature/health-checks_windows_service Signed-off-by: Alessandro De Blasis <alex@deblasis.net>	2022-08-28 18:09:31 +01:00
Alessandro De Blasis	f634e36811	fix(OSServiceCheck): fixes following code-review	2022-08-28 17:56:30 +01:00
Chris S. Kim	4d97e2f936	Adjust metrics reporting for peering tracker	2022-08-26 17:34:17 -04:00
freddygv	650e48624d	Allow terminated peerings to be deleted Peerings are terminated when a peer decides to delete the peering from their end. Deleting a peering sends a termination message to the peer and triggers them to mark the peering as terminated but does NOT delete the peering itself. This is to prevent peerings from disappearing from both sides just because one side deleted them. Previously the Delete endpoint was skipping the deletion if the peering was not marked as active. However, terminated peerings are also inactive. This PR makes some updates so that peerings marked as terminated can be deleted by users.	2022-08-26 10:52:47 -06:00
Chris S. Kim	937a8ec742	Fix casing	2022-08-26 11:56:26 -04:00
Chris S. Kim	87962b9713	Merge branch 'main' into catalog-service-list-filter	2022-08-26 11:16:06 -04:00
Chris S. Kim	e2fe8b8d65	Fix tests for enterprise	2022-08-26 11:14:02 -04:00
Chris S. Kim	1c43a1a7b4	Merge branch 'main' into NET-638-push-server-address-updates-to-the-peer # Conflicts: # agent/grpc-external/services/peerstream/stream_test.go	2022-08-26 10:43:56 -04:00
Chris S. Kim	6ddcc04613	Replace ring buffer with async version (#14314 ) We need to watch for changes to peerings and update the server addresses which get served by the ring buffer. Also, if there is an active connection for a peer, we are getting up-to-date server addresses from the replication stream and can safely ignore the token's addresses which may be stale.	2022-08-26 10:27:13 -04:00
alex	30ff2e9a35	peering: add peer health metric (#14004 ) Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>	2022-08-25 16:32:59 -07:00
Chris S. Kim	181063cd23	Exit loop when context is cancelled	2022-08-25 11:48:25 -04:00
cskh	41aea65214	Fix: the inboundconnection limit filter should be placed in front of http co… (#14325 ) * fix: the inboundconnection limit should be placed in front of http connection manager Co-authored-by: Freddy <freddygv@users.noreply.github.com>	2022-08-24 14:13:10 -04:00
Chris S. Kim	8c94d1a80c	Update test comment	2022-08-24 13:50:24 -04:00
Chris S. Kim	5f2959329f	Add check for zero-length server addresses	2022-08-24 13:30:52 -04:00
skpratt	919da33331	no-op: refactor usagemetrics tests for clarity and DRY cases (#14313 )	2022-08-24 12:00:09 -05:00
Pablo Ruiz García	1f293e5244	Added new auto_encrypt.grpc_server_tls config option to control AutoTLS enabling of GRPC Server's TLS usage Fix for #14253 Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>	2022-08-24 12:31:38 -04:00
Dan Upton	3b993f2da7	dataplane: update envoy bootstrap params for consul-dataplane (#14017 ) Contains 2 changes to the GetEnvoyBootstrapParams response to support consul-dataplane. Exposing node_name and node_id: consul-dataplane will support providing either the node_id or node_name in its configuration. Unfortunately, supporting both in the xDS meta adds a fair amount of complexity (partly because most tables are currently indexed on node_name) so for now we're going to return them both from the bootstrap params endpoint, allowing consul-dataplane to exchange a node_id for a node_name (which it will supply in the xDS meta). Properly setting service for gateways: To avoid the need to special case gateways in consul-dataplane, service will now either be the destination service name for connect proxies, or the gateway service name. This means it can be used as-is in Envoy configuration (i.e. as a cluster name or in metric tags).	2022-08-24 12:03:15 +01:00
Daniel Upton	13c04a13af	proxycfg: terminate stream on irrecoverable errors This is the OSS portion of enterprise PR 2339. It improves our handling of "irrecoverable" errors in proxycfg data sources. The canonical example of this is what happens when the ACL token presented by Envoy is deleted/revoked. Previously, the stream would get "stuck" until the xDS server re-checked the token (after 5 minutes) and terminated the stream. Materializers would also sit burning resources retrying something that could never succeed. Now, it is possible for data sources to mark errors as "terminal" which causes the xDS stream to be closed immediately. Similarly, the submatview.Store will evict materializers when it observes they have encountered such an error.	2022-08-23 20:17:49 +01:00
Chris S. Kim	81e965479b	PR feedback to specify Node name in test mock	2022-08-23 11:51:04 -04:00
Eric Haberkorn	58901ad7df	Cluster peering failover disco chain changes (#14296 )	2022-08-23 09:13:43 -04:00
Chris S. Kim	cdc8b0634d	Fix flakes	2022-08-22 14:45:31 -04:00
Chris S. Kim	03e92826aa	Increase heartbeat rate to reduce test flakes	2022-08-22 14:24:05 -04:00
Chris S. Kim	06ba9775ee	Remove check for ResponseNonce	2022-08-22 13:55:01 -04:00
Chris S. Kim	547fb9570e	Add missing mock assertions	2022-08-22 13:55:01 -04:00
Chris S. Kim	adff2eef16	Fix data race newMockSnapshotHandler has an assertion on t.Cleanup which gets called before the event publisher is cancelled. This commit reorders the context.WithCancel so it properly gets cancelled before the assertion is made.	2022-08-22 13:55:01 -04:00
cskh	060531a29a	Fix: add missing ent meta for test (#14289 )	2022-08-22 13:51:04 -04:00
Chris S. Kim	4e40e1d222	Handle server addresses update as client	2022-08-22 13:42:12 -04:00
Chris S. Kim	584d3409c4	Send server addresses on update from server	2022-08-22 13:41:44 -04:00
Chris S. Kim	c9d8ad3939	Add new subscription for server addresses	2022-08-22 13:40:25 -04:00
Chris S. Kim	028b87d51f	Cleanup unused logger	2022-08-22 13:40:23 -04:00
Chris S. Kim	df951bd601	Expose external gRPC port in autopilot The grpc_port was added to a NodeService's meta in `ea58f235f5`	2022-08-22 10:07:00 -04:00
cskh	527ebd068a	fix: missing MaxInboundConnections field in service-defaults config entry (#14072 ) * fix: missing max_inbound_connections field in merge config	2022-08-19 14:11:21 -04:00
cskh	e84e4b8868	Fix: upgrade pkg imdario/mergo to prevent merge config panic (#14237 ) * upgrade imdario/mergo to prevent merge config panic * test: service definition takes precedence over service-defaults in merged results	2022-08-17 21:14:04 -04:00
James Hartig	f92883bbce	Use the maximum jitter when calculating the timeout The timeout should include the maximum possible jitter since the server will randomly add jitter to its timeout. If the server's timeout is less than the client's timeout then the client will return an i/o deadline reached error. Before: ``` time curl 'http://localhost:8500/v1/catalog/service/service?dc=other-dc&stale=&wait=600s&index=15820644' rpc error making call: i/o deadline reached real 10m11.469s user 0m0.018s sys 0m0.023s ``` After: ``` time curl 'http://localhost:8500/v1/catalog/service/service?dc=other-dc&stale=&wait=600s&index=15820644' [...] real 10m35.835s user 0m0.021s sys 0m0.021s ```	2022-08-17 10:24:09 -04:00
Eric Haberkorn	1a73b0ca20	Add `Targets` field to service resolver failovers. (#14162 ) This field will be used for cluster peering failover.	2022-08-15 09:20:25 -04:00
Alessandro De Blasis	5dee555888	Merge remote-tracking branch 'hashicorp/main' into feature/health-checks_windows_service Signed-off-by: Alessandro De Blasis <alex@deblasis.net>	2022-08-15 08:26:55 +01:00
Alessandro De Blasis	ab611eabc3	Merge remote-tracking branch 'hashicorp/main' into feature/health-checks_windows_service Signed-off-by: Alessandro De Blasis <alex@deblasis.net>	2022-08-15 08:09:56 +01:00
cskh	d46b515b64	fix: missing segment and partition (#14194 )	2022-08-12 15:21:39 -04:00
Eric Haberkorn	ebd5513d4b	Refactor failover code to use Envoy's aggregate clusters (#14178 )	2022-08-12 14:30:46 -04:00
cskh	81931e52c3	feat(telemetry): add labels to serf and memberlist metrics (#14161 ) * feat(telemetry): add labels to serf and memberlist metrics * changelog * doc update Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>	2022-08-11 22:09:56 -04:00
Chris S. Kim	4c928cb2f7	Handle breaking change for ServiceVirtualIP restore (#14149 ) Consul 1.13.0 changed ServiceVirtualIP to use PeeredServiceName instead of ServiceName, which was a breaking change for service mesh users who wanted to restore their snapshot after upgrading to 1.13.0. This commit handles existing data with older ServiceName and converts it during restore so that there are no issues when restoring from older snapshots.	2022-08-11 14:47:10 -04:00
Chris S. Kim	3926009405	Add test to verify forwarding	2022-08-11 11:16:02 -04:00
Chris S. Kim	1ef22360c3	Register peerStreamServer internally to enable RPC forwarding	2022-08-11 11:16:02 -04:00
Chris S. Kim	de73171202	Handle wrapped errors in isFailedPreconditionErr	2022-08-11 11:16:02 -04:00
Daniel Kimsey	3c4fa9b468	Add support for filtering the 'List Services' API 1. Create a bexpr filter for performing the filtering 2. Change the state store functions to return the raw (not aggregated) list of ServiceNodes. 3. Move the aggregate service tags by name logic out of the state store functions into a new function called from the RPC endpoint 4. Perform the filtering in the endpoint before aggregation.	2022-08-10 16:52:32 -05:00
cskh	11e7a0d547	fix: shadowed err in retryJoin() (#14112 ) - err value will be used later to surface the error message if r.join() returns any err.	2022-08-10 10:53:57 -04:00
skpratt	79c23a7cd2	Merge pull request #14056 from hashicorp/proxy-register-port-race Refactor sidecar_service method to separate port assignment	2022-08-10 09:46:29 -05:00
skpratt	aa77559819	Merge branch 'main' into proxy-register-port-race	2022-08-10 08:40:45 -05:00
Chris S. Kim	e3046120b3	Close active listeners on error If startListeners successfully created listeners for some of its input addresses but eventually failed, the function would return an error and existing listeners would not be cleaned up.	2022-08-09 12:22:39 -04:00
Chris S. Kim	6311c651de	Add retry in TestAgentConnectCALeafCert_good	2022-08-09 11:20:37 -04:00
Kyle Havlovitz	6938b8c755	Merge pull request #13958 from hashicorp/gateway-wildcard-fix Fix wildcard picking up services it shouldn't for ingress/terminating gateways	2022-08-08 12:54:40 -07:00
Kyle Havlovitz	fe1fcea34f	Add some extra handling for destination deletes	2022-08-08 11:38:13 -07:00
freddygv	d421e18172	Update snapshot test	2022-08-08 09:17:15 -06:00
freddygv	1031ffc3c7	Re-validate existing secrets at state store Previously establishment and pending secrets were only checked at the RPC layer. However, given that these are Check-and-Set transactions we should ensure that the given secrets are still valid when persisting a secret exchange or promotion. Otherwise it would be possible for concurrent requests to overwrite each other.	2022-08-08 09:06:07 -06:00
freddygv	0ea4bfae94	Test fixes	2022-08-08 08:31:47 -06:00
freddygv	c04515a844	Use proto message for each secrets write op Previously there was a field indicating the operation that triggered a secrets write. Now there is a message for each operation and it contains the secret ID being persisted.	2022-08-08 01:41:00 -06:00
Kyle Havlovitz	6580566c3b	Update ingress/terminating wildcard logic and handle destinations	2022-08-05 07:56:10 -07:00
freddygv	8067890787	Inherit active secret when exchanging	2022-08-03 17:32:53 -05:00
freddygv	60d6e28c97	Pass explicit signal with op for secrets write Previously the updates to the peering secrets UUID table relied on inferring what action triggered the update based on a reconciliation against the existing secrets. Instead we now explicitly require the operation to be given so that the inference isn't necessary. This makes the UUID table logic easier to reason about and fixes some related bugs. There is also an update so that the peering secrets get handled on snapshots/restores.	2022-08-03 17:25:12 -05:00
freddygv	9ca687bc7c	Avoid deleting peering secret UUIDs at dialers Dialers do not keep track of peering secret UUIDs, so they should not attempt to clean up data from that table when their peering is deleted. We also now keep peer server addresses when marking peerings for deletion. Peer server addresses are used by the ShouldDial() helper when determining whether the peering is for a dialer or an acceptor. We need to keep this data so that peering secrets can be cleaned up accordingly.	2022-08-03 16:34:57 -05:00
skpratt	58eed6b049	Merge pull request #13906 from skpratt/validate-port-agent-split Separate port and socket path validation for local agent	2022-08-02 16:58:41 -05:00
Dhia Ayachi	7154367892	add token to the request when creating a cacheIntentions query (#14005 )	2022-08-02 14:27:34 -04:00
Kyle Havlovitz	499211f907	Fix wildcard picking up services it shouldn't for ingress/terminating gateways	2022-08-02 09:41:31 -07:00
Daniel Upton	6452118c15	proxycfg-sources: fix hot loop when service not found in catalog Fixes a bug where a service getting deleted from the catalog would cause the ConfigSource to spin in a hot loop attempting to look up the service. This is because we were returning a nil WatchSet which would always unblock the select. Kudos to @freddygv for discovering this!	2022-08-02 15:42:29 +01:00
Freddy	42996411cc	Various peering fixes (#13979 ) * Avoid logging StreamSecretID * Wrap additional errors in stream handler * Fix flakiness in leader test and rename servers for clarity. There was a race condition where the peering was being deleted in the test before the stream was active. Now the test waits for the stream to be connected on both sides before deleting the associated peering. * Run flaky test serially	2022-08-01 15:06:18 -06:00
DanStough	169ff71132	fix: ipv4 destination dns resolution	2022-08-01 16:45:57 -04:00
Luke Kysow	988e1fd35d	peering: default to false (#13963 ) * defaulting to false because peering will be released as beta * Ignore peering disabled error in bundles cachetype Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com> Co-authored-by: freddygv <freddy@hashicorp.com> Co-authored-by: Matt Keeler <mjkeeler7@gmail.com>	2022-08-01 15:22:36 -04:00
Freddy	dacf703d20	Merge branch 'main' into fix-kv_entries-metric	2022-08-01 13:19:27 -06:00
Freddy	72b6d69652	Merge pull request #13499 from maxb/delete-unused-metric Delete definition of metric `consul.acl.blocked.node.deregistration`	2022-08-01 12:31:05 -06:00
Dhia Ayachi	6fd65a4a45	Tgtwy egress HTTP support (#13953 ) * add golden files * add support to http in tgateway egress destination * fix slice sorting to include both address and port when using server_names * fix listener loop for http destination * fix routes to generate a route per port and a virtualhost per port-address combination * sort virtual hosts list to have a stable order * extract redundant serviceNode	2022-08-01 14:12:43 -04:00
Matt Keeler	f74d0cef7a	Implement/Utilize secrets for Peering Replication Stream (#13977 )	2022-08-01 10:33:18 -04:00
alex	a45bb1f06b	block PeerName register requests (#13887 ) Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>	2022-07-29 14:36:22 -07:00
Luke Kysow	95096e2c03	peering: retry establishing connection more quickly on certain errors (#13938 ) When we receive a FailedPrecondition error, retry that more quickly because we expect it will resolve shortly. This is particularly important in the context of Consul servers behind a load balancer because when establishing a connection we have to retry until we randomly land on a leader node. The default retry backoff goes from 2s, 4s, 8s, etc. which can result in very long delays quite quickly. Instead, this backoff retries in 8ms five times, then goes exponentially from there: 16ms, 32ms, ... up to a max of 8152ms.	2022-07-29 13:04:32 -07:00
Sarah Pratt	10a4999a87	Separate port and socket path requirement in case of local agent assignment	2022-07-29 13:28:21 -05:00
alex	92c615c35f	Merge pull request #13952 from hashicorp/sync-more-acl sync more acl enforcement	2022-07-28 12:31:02 -07:00
Dhia Ayachi	256694b603	inject gateway addons to destination clusters (#13951 )	2022-07-28 15:17:35 -04:00
acpana	eae4e71492	sync more acl enforcement sync w ent at 32756f7 Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>	2022-07-28 12:01:52 -07:00
alex	41f3343eac	Merge pull request #13929 from hashicorp/fix-validation [sync] fix empty partitions matching	2022-07-28 10:14:49 -07:00
Sarah Pratt	a3ef6f016e	refactor sidecar_service method into parts	2022-07-28 09:07:13 -05:00
Ashwin Venkatesh	eef9edaed9	Add peer counts to emitted metrics. (#13930 )	2022-07-27 18:34:04 -04:00
Luke Kysow	465a9801e1	Merge pull request #13924 from hashicorp/lkysow/util-metric-peering peering: don't track imported services/nodes in usage	2022-07-27 14:49:55 -07:00
acpana	6033584349	use EqualPartitions Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>	2022-07-27 14:48:30 -07:00
acpana	0351ca5136	better fix Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>	2022-07-27 14:28:08 -07:00
acpana	8b2ef80336	sync w ent Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>	2022-07-27 11:41:39 -07:00
Chris S. Kim	0999e05a7d	Reduce arm64 flakes for TestConnectCA_ConfigurationSet_ChangeKeyConfig_Primary There were 16 combinations of tests but 4 of them were duplicates since the default key type and bits were "ec" and 256. That entry was commented out to reduce the subtest count to 12. testrpc.WaitForLeader was failing on arm64 environments; the cause is unknown but it might be due to the environment being flooded with parallel tests making RPC calls. The RPC polling+retry was replaced with a simpler check for leadership based on raft.	2022-07-27 13:54:34 -04:00
Chris S. Kim	8ead1caf53	Retry checks for virtual IP metadata	2022-07-27 13:54:34 -04:00
Chris S. Kim	62ed0250c3	Sort slice of ServiceNames deterministically	2022-07-27 13:54:34 -04:00
Sarah Pratt	f520f6dd0f	Separate port and socket path requirement in case of local agent assignment	2022-07-27 12:30:52 -05:00
Luke Kysow	740d54e730	peering: don't track imported services/nodes in usage Services/nodes that are imported from other peers are stored in state. We don't want to count those as part of our own cluster's usage.	2022-07-27 09:08:51 -07:00
cskh	4e292b7b72	chore: clarify the error message: service.service must not be empty (#13907) - When registering a service using the catalog endpoint, the key of the service name should actually be "service". Adding this information to the error message will help users quickly fix the request.	2022-07-27 10:16:46 -04:00
cskh	59e81a728e	chore: removed unused method AddService (#13905) - AddService is not used anywhere; AddServiceWithChecks is used in place of AddService - Test code is updated	2022-07-26 16:54:53 -04:00
Luke Kysow	021b00e321	Remove duplicate comment	2022-07-26 10:19:49 -07:00
alex	437a28d18a	peering: prevent peering in same partition (#13851 ) Co-authored-by: Chris S. Kim <ckim@hashicorp.com>	2022-07-25 18:00:48 -07:00
Nitya Dhanushkodi	27bd895ac8	peering: remove validation that forces peering token server addresses to be an IP, allow hostname based addresses (#13874 )	2022-07-25 16:33:47 -07:00
Luke Kysow	8c5b70d227	Rename receive to recv in tracker (#13896 ) Because it's shorter	2022-07-25 16:08:03 -07:00
Luke Kysow	3530d3782d	peering: read endpoints can now return failing status (#13849 ) Track streams that have been disconnected due to an error and set their statuses to failing.	2022-07-25 14:27:53 -07:00
Kyle Havlovitz	93de25f87c	Merge pull request #13872 from hashicorp/remove-upstream-log Remove extra logging from ingress upstream watch shutdown	2022-07-25 12:55:30 -07:00
Chris S. Kim	73a84f256f	Preserve PeeringState on upsert (#13666 ) Fixes a bug where if the generate token is called twice, the second call upserts the zero-value (undefined) of PeeringState.	2022-07-25 14:37:56 -04:00
Chris S. Kim	8ed49ea4d0	Update envoy metrics label extraction for peered clusters and listeners (#13818 ) Now that peered upstreams can generate envoy resources (#13758), we need a way to disambiguate local from peered resources in our metrics. The key difference is that datacenter and partition will be replaced with peer, since in the context of peered resources partition is ambiguous (could refer to the partition in a remote cluster or one that exists locally). The partition and datacenter of the proxy will always be that of the source service. Regexes were updated to make emitting datacenter and partition labels mutually exclusive with peer labels. Listener filter names were updated to better match the existing regex. Cluster names assigned to peered upstreams were updated to be synthesized from local peer name (it previously used the externally provided primary SNI, which contained the peer name from the other side of the peering). Integration tests were updated to assert for the new peer labels.	2022-07-25 13:49:00 -04:00
DanStough	2da8949d78	feat: convert destination address to slice	2022-07-25 12:31:58 -04:00
Freddy	f03cca7576	[OSS] Add ACL enforcement to peering endpoints (#13878 )	2022-07-25 10:04:10 -06:00
Matt Keeler	58e4d8235b	Enable/Disable Peering Support in the UI (#13816) We enable/disable based on the config flag.	2022-07-25 11:50:11 -04:00
freddygv	b544ce6485	Add ACL enforcement to peering endpoints	2022-07-25 09:34:29 -06:00
Kyle Havlovitz	016f963e7e	Remove excess debug log from ingress upstream shutdown	2022-07-22 17:29:38 -07:00
alex	279d458e6e	peering: use ShouldDial to validate peer role (#13823 ) Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>	2022-07-22 15:56:25 -07:00
Luke Kysow	a1e6d69454	peering: add config to enable/disable peering (#13867 ) * peering: add config to enable/disable peering Add config: ``` peering { enabled = true } ``` Defaults to true. When disabled: 1. All peering RPC endpoints will return an error 2. Leader won't start its peering establishment goroutines 3. Leader won't start its peering deletion goroutines	2022-07-22 15:20:21 -07:00
Kyle Havlovitz	0786517b56	Merge pull request #13847 from hashicorp/gateway-goroutine-leak Fix goroutine leaks in proxycfg when using ingress gateway	2022-07-22 14:43:22 -07:00
Freddy	f99df57840	[OSS] Add new peering ACL rule (#13848) This commit adds a new ACL rule named "peering" to authorize actions taken against peering-related endpoints. The "peering" rule has several key properties: - It is scoped to a partition, and MUST be defined in the default namespace. - Its access level must be "read", "write", or "deny". - Granting an access level will apply to all peerings. This ACL rule cannot be used to selectively grant access to some peerings but not others. - If the peering rule is not specified, we fall back to the "operator" rule and then the default ACL rule.	2022-07-22 14:42:23 -06:00
alex	927cee692b	peering: emit exported services count metric (#13811 ) Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>	2022-07-22 12:05:08 -07:00
Daniel Upton	a8df87f574	proxycfg-glue: server-local implementation of `ExportedPeeredServices` This is the OSS portion of enterprise PR 2377. Adds a server-local implementation of the proxycfg.ExportedPeeredServices interface that sources data from a blocking query against the server's state store.	2022-07-22 15:23:23 +01:00
Eric Haberkorn	501089292e	Add Cluster Peering Failover Support to Prepared Queries (#13835 ) Add peering failover support to prepared queries	2022-07-22 09:14:43 -04:00
Nitya Dhanushkodi	f47319b7c6	update generate token endpoint to take external addresses (#13844 ) Update generate token endpoint (rpc, http, and api module) If ServerExternalAddresses are set, it will override any addresses gotten from the "consul" service, and be used in the token instead, and dialed by the dialer. This allows for setting up a load balancer for example, in front of the consul servers.	2022-07-21 14:56:11 -07:00
acpana	12b773ab02	Rename peering internal to ~ sync ENT to 5679392c81 Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>	2022-07-21 10:51:05 -07:00
Luke Kysow	0c87be0845	peering: Add heartbeating to peering streams (#13806 ) * Add heartbeating to peering streams	2022-07-21 10:03:27 -07:00
Daniel Upton	3655802fdc	proxycfg-glue: server-local implementation of `PeeredUpstreams` This is the OSS portion of enterprise PR 2352. It adds a server-local implementation of the proxycfg.PeeredUpstreams interface based on a blocking query against the server's state store. It also fixes an omission in the Virtual IP freeing logic where we were never updating the max index (and therefore blocking queries against VirtualIPsForAllImportedServices would not return on service deletion).	2022-07-21 13:51:59 +01:00
Luke Kysow	c411e6b326	Add send mutex to protect against concurrent sends (#13805 )	2022-07-20 15:48:18 -07:00
Kyle Havlovitz	0be7d923dc	Cancel upstream watches when the discovery chain has been removed	2022-07-20 14:26:52 -07:00
Kyle Havlovitz	31318d7049	Fix duplicate Notify calls for discovery chains in ingress gateways	2022-07-20 14:25:20 -07:00
Evan Culver	4116537b83	connect: Add support for Envoy 1.23, remove 1.19 (#13807 )	2022-07-19 14:51:04 -07:00
Paul Glass	77afe0e76e	Extract AWS auth implementation out of Consul (#13760 )	2022-07-19 16:26:44 -05:00
Chris S. Kim	495936300e	Make envoy resources for inferred peered upstreams (#13758) Peered upstreams have a separate loop in xds from discovery chain upstreams. This PR adds similar but slightly modified code to add filters for peered upstream listeners, clusters, and endpoints in the case of transparent proxy.	2022-07-19 14:56:28 -04:00
alex	de5a991d8c	peering: refactor reconcile, cleanup (#13795 ) Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>	2022-07-19 11:43:29 -07:00
Luke Kysow	e8d965e56f	peerstream: set keepalive enforcement to 15s (#13796 ) The client is set to send keepalive pings every 30s. The server keepalive enforcement must be set to a number less than that, otherwise it will disconnect clients for sending pings too often. MinTime governs the minimum amount of time between pings.	2022-07-18 16:12:03 -07:00
alex	a9ae2ff4fa	peering: track exported services (#13784 ) Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>	2022-07-18 10:20:04 -07:00
R.B. Boyer	cd513aeead	peerstream: require a resource subscription to receive updates of that type (#13767) This mimics xDS's discovery protocol where you must request a resource explicitly for the exporting side to send those events to you. As part of this I aligned the overall ResourceURL with the TypeURL that gets embedded into the encoded protobuf Any construct. The CheckServiceNodes is now wrapped in a better-named "ExportedService" struct.	2022-07-15 15:03:40 -05:00
R.B. Boyer	c737301093	peerstream: fix test assertions (#13780 )	2022-07-15 14:43:24 -05:00
Luke Kysow	46381b1a7f	Add docs for peerStreamServer vs peeringServer. (#13781 )	2022-07-15 12:23:05 -07:00
Luke Kysow	ca3d7c964c	peerstream: dialer should reconnect when stream closes (#13745 ) * peerstream: dialer should reconnect when stream closes If the stream is closed unexpectedly (i.e. when we haven't received a terminated message), the dialer should attempt to re-establish the stream. Previously, the `HandleStream` would return `nil` when the stream was closed. The caller then assumed the stream was terminated on purpose and so didn't reconnect when instead it was stopped unexpectedly and the dialer should have attempted to reconnect.	2022-07-15 11:58:33 -07:00
R.B. Boyer	bb4d4040fb	server: ensure peer replication can successfully use TLS over external gRPC (#13733 ) Ensure that the peer stream replication rpc can successfully be used with TLS activated. Also: - If key material is configured for the gRPC port but HTTPS is not enabled now TLS will still be activated for the gRPC port. - peerstream replication stream opened by the establishing-side will now ignore grpc.WithBlock so that TLS errors will bubble up instead of being awkwardly delayed or suppressed	2022-07-15 13:15:50 -05:00
alex	adb5ffa1a6	peering: track imported services (#13718 )	2022-07-15 10:20:43 -07:00
Matt Keeler	257f88d4df	Use Node Name for peering healthSnapshot instead of ID (#13773) A Node ID is not a required field with Consul’s data model. Therefore we cannot reliably expect all uses to have it. However, the node name is required and must be unique, so it is equally good as a key for the internal healthSnapshot node tracking.	2022-07-15 10:51:38 -04:00
Matt Keeler	05b5e7e2ca	Enable partition support for peering establishment (#13772 ) Prior to this the dialing side of the peering would only ever work within the default partition. This commit allows properly parsing the partition field out of the API struct request body, query param and header.	2022-07-15 10:07:07 -04:00
Dan Stough	49f3dadb8f	feat: connect proxy xDS for destinations Signed-off-by: Dhia Ayachi <dhia@hashicorp.com>	2022-07-14 15:27:02 -04:00
Daniel Upton	3d74efa8ad	proxycfg-glue: server-local implementation of `FederationStateListMeshGateways` This is the OSS portion of enterprise PR 2265. This PR provides a server-local implementation of the proxycfg.FederationStateListMeshGateways interface based on blocking queries.	2022-07-14 18:22:12 +01:00
Daniel Upton	ccc672013e	proxycfg-glue: server-local implementation of `GatewayServices` This is the OSS portion of enterprise PR 2259. This PR provides a server-local implementation of the proxycfg.GatewayServices interface based on blocking queries.	2022-07-14 18:22:12 +01:00
Daniel Upton	15a319dbfe	proxycfg-glue: server-local implementation of `TrustBundle` and `TrustBundleList` This is the OSS portion of enterprise PR 2250. This PR provides server-local implementations of the proxycfg.TrustBundle and proxycfg.TrustBundleList interfaces, based on local blocking queries.	2022-07-14 18:22:12 +01:00
Daniel Upton	673d02d30f	proxycfg-glue: server-local implementation of the `Health` interface This is the OSS portion of enterprise PR 2249. This PR introduces an implementation of the proxycfg.Health interface based on a local materialized view of the health events. It reuses the view and request machinery from agent/rpcclient/health, which made it super straightforward.	2022-07-14 18:22:12 +01:00
Daniel Upton	3c533ceea8	proxycfg-glue: server-local implementation of `ServiceList` This is the OSS portion of enterprise PR 2242. This PR introduces a server-local implementation of the proxycfg.ServiceList interface, backed by streaming events and a local materializer.	2022-07-14 18:22:12 +01:00
Daniel Upton	fbf88d3b19	proxycfg-glue: server-local compiled discovery chain data source This is the OSS portion of enterprise PR 2236. Adds a local blocking query-based implementation of the proxycfg.CompiledDiscoveryChain interface.	2022-07-14 18:22:12 +01:00
Chris S. Kim	f56810132f	Check if an upstream is implicit from either intentions or peered services	2022-07-13 16:53:20 -04:00
Chris S. Kim	02cff2394d	Use new maps for proxycfg peered data	2022-07-13 16:05:10 -04:00
Chris S. Kim	7f32cba735	Add new watch.Map type to refactor proxycfg	2022-07-13 16:05:10 -04:00
Chris S. Kim	b4ffa9ae0c	Scrub VirtualIPs before exporting	2022-07-13 16:05:10 -04:00
Kyle Havlovitz	9097e2b0f0	Merge pull request #13699 from hashicorp/tgate-http2-upstream Respect http2 protocol for upstreams of terminating gateways	2022-07-13 09:41:15 -07:00
Dan Upton	b9e525d689	grpc: rename public/private directories to external/internal (#13721 ) Previously, public referred to gRPC services that are both exposed on the dedicated gRPC port and have their definitions in the proto-public directory (so were considered usable by 3rd parties). Whereas private referred to services on the multiplexed server port that are only usable by agents and other servers. Now, we're splitting these definitions, such that external/internal refers to the port and public/private refers to whether they can be used by 3rd parties. This is necessary because the peering replication API needs to be exposed on the dedicated port, but is not (yet) suitable for use by 3rd parties.	2022-07-13 16:33:48 +01:00
R.B. Boyer	30fffd0c90	peerstream: some cosmetic refactors to make this easier to follow (#13732 ) - Use some protobuf construction helper methods for brevity. - Rename a local variable to avoid later shadowing. - Rename the Nonce field to be more like xDS's naming. - Be more explicit about which PeerID fields are empty.	2022-07-13 10:00:35 -05:00
Kyle Havlovitz	7d0c692374	Use protocol from resolved config entry, not gateway service	2022-07-12 16:23:40 -07:00
Kyle Havlovitz	7162e3bde2	Enable http2 options for grpc protocol	2022-07-12 14:38:44 -07:00
R.B. Boyer	c5c216008d	peering: always send the mesh gateway SpiffeID even for tcp services (#13728 ) If someone were to switch a peer-exported service from L4 to L7 there would be a brief SAN validation hiccup as traffic shifted to the mesh gateway for termination. This PR sends the mesh gateway SpiffeID down all the time so the clients always expect a switch.	2022-07-12 11:38:13 -05:00
R.B. Boyer	f0e6e4e697	state: prohibit changing an exported tcp discovery chain in a way that would break SAN validation (#13727) For L4/tcp exported services the mesh gateways will not be terminating TLS. A caller in one peer will be directly establishing TLS connections to the ultimate exported service in the other peer. The caller will be doing SAN validation using the replicated SpiffeID values shipped from the exporting side. There is a class of discovery chain edits that could be done on the exporting side that would cause the introduction of a new SpiffeID value. In between the time of the config entry update on the exporting side and the importing side getting updated peer stream data, requests to the exported service would fail due to SAN validation errors. This is unacceptable so instead prohibit the exporting peer from making changes that would break peering in this way.	2022-07-12 11:17:33 -05:00
R.B. Boyer	2317f37b4d	state: prohibit exported discovery chains to have cross-datacenter or cross-partition references (#13726 ) Because peerings are pairwise, between two tuples of (datacenter, partition) having any exported reference via a discovery chain that crosses out of the peered datacenter or partition will ultimately not be able to work for various reasons. The biggest one is that there is no way in the ultimate destination to configure an intention that can allow an external SpiffeID to access a service. This PR ensures that a user simply cannot do this, so they won't run into weird situations like this.	2022-07-12 11:03:41 -05:00
Chris S. Kim	a6634db4a5	Return error if ServerAddresses is empty (#13714 )	2022-07-12 11:09:00 -04:00
Kyle Havlovitz	439eccdd80	Respect http2 protocol for upstreams of terminating gateways	2022-07-08 14:30:45 -07:00
R.B. Boyer	af04851637	peering: move peer replication to the external gRPC port (#13698 ) Peer replication is intended to be between separate Consul installs and effectively should be considered "external". This PR moves the peer stream replication bidirectional RPC endpoint to the external gRPC server and ensures that things continue to function.	2022-07-08 12:01:13 -05:00
R.B. Boyer	ea58f235f5	server: broadcast the public grpc port using lan serf and update the consul service in the catalog with the same data (#13687 ) Currently servers exchange information about their WAN serf port and RPC port with serf tags, so that they all learn of each other's addressing information. We intend to make larger use of the new public-facing gRPC port exposed on all of the servers, so this PR addresses that by passing around the gRPC port via serf tags and then ensuring the generated consul service in the catalog has metadata about that new port as well for ease of non-serf-based lookup.	2022-07-07 13:55:41 -05:00
Freddy	3542138e4d	Parse peer name for virtual IP DNS queries (#13602 ) This commit updates the DNS query locality parsing so that the virtual IP for an imported service can be queried. Note that: - Support for parsing a peer in other service discovery queries was not added. - Querying another datacenter for a virtual IP is not supported. This was technically allowed in 1.11 but is being rolled back for 1.13 because it is not a use-case we intended to support. Virtual IPs in different datacenters are going to collide because they are allocated sequentially.	2022-07-06 10:30:04 -06:00
R.B. Boyer	2a945facec	test: update mockery use to put mocks into test files (#13656 ) --testonly doesn't do anything anymore so switch to --filename instead	2022-07-05 16:57:15 -05:00
Chris S. Kim	f07132dacc	Revise possible states for a peering. (#13661 ) These changes are primarily for Consul's UI, where we want to be more specific about the state a peering is in. - The "initial" state was renamed to pending, and no longer applies to peerings being established from a peering token. - Upon request to establish a peering from a peering token, peerings will be set as "establishing". This will help distinguish between the two roles: the cluster that generates the peering token and the cluster that establishes the peering. - When marked for deletion, peering state will be set to "deleting". This way the UI determines the deletion via the state rather than the "DeletedAt" field. Co-authored-by: freddygv <freddy@hashicorp.com>	2022-07-04 10:47:58 -04:00
Daniel Upton	45886848b4	proxycfg: server-local intention upstreams data source This is the OSS portion of enterprise PR 2157. It builds on the local blocking query work in #13438 to implement the proxycfg.IntentionUpstreams interface using server-local data. Also moves the ACL filtering logic from agent/consul into the acl/filter package so that it can be reused here.	2022-07-04 10:48:36 +01:00
Daniel Upton	37ccbd2826	proxycfg: server-local intentions data source This is the OSS portion of enterprise PR 2141. This commit provides a server-local implementation of the `proxycfg.Intentions` interface that sources data from streaming events. It adds events for the `service-intentions` config entry type, and then consumes event streams (via materialized views) for the service's explicit intentions and any applicable wildcard intentions, merging them into a single list of intentions. An alternative approach I considered was to consume _all_ intention events (via `SubjectWildcard`) and filter out the irrelevant ones. This would admittedly remove some complexity in the `agent/proxycfg-glue` package but at the expense of considerable overhead from waking potentially many thousands of connect proxies every time any intention is updated.	2022-07-04 10:48:36 +01:00
Daniel Upton	653b8c4f9d	proxycfg: server-local config entry data sources This is the OSS portion of enterprise PR 2056. This commit provides server-local implementations of the proxycfg.ConfigEntry and proxycfg.ConfigEntryList interfaces, that source data from streaming events. It makes use of the LocalMaterializer type introduced for peering replication, adding the necessary support for authorization. It also adds support for "wildcard" subscriptions (within a topic) to the event publisher, as this is needed to fetch service-resolvers for all services when configuring mesh gateways. Currently, events will be emitted for just the ingress-gateway, service-resolver, and mesh config entry types, as these are the only entries required by proxycfg — the events will be emitted on topics named IngressGateway, ServiceResolver, and MeshConfig topics respectively. Though these events will only be consumed "locally" for now, they can also be consumed via the gRPC endpoint (confirmed using grpcurl) so using them from client agents should be a case of swapping the LocalMaterializer for an RPCMaterializer.	2022-07-04 10:48:36 +01:00
alex	cd9ca4290a	peering: add imported/exported counts to peering (#13644 ) Signed-off-by: acpana <8968914+acpana@users.noreply.github.com> Co-authored-by: Chris S. Kim <ckim@hashicorp.com>	2022-06-29 14:07:30 -07:00
Chris S. Kim	b186731a2e	Fix ENT drift in files (#13647 )	2022-06-29 16:53:22 -04:00
Chris S. Kim	d8b7940e40	Add internal endpoint to fetch peered upstream candidates from VirtualIP table (#13642 ) For initial cluster peering TProxy support we consider all imported services of a partition to be potential upstreams. We leverage the VirtualIP table because it stores plain service names (e.g. "api", not "api-sidecar-proxy").	2022-06-29 16:34:58 -04:00
Eric Haberkorn	653cb42944	Fix spelling mistake in serverless patcher (#13607 ) passhthrough -> passthrough	2022-06-29 15:21:21 -04:00
alex	07bc22e405	no 1.9 style metrics (#13532 ) Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>	2022-06-29 09:46:37 -07:00
alex	beb8b03e8a	peering: reconcile/ hint active state for list (#13619 ) Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>	2022-06-29 09:43:50 -07:00
R.B. Boyer	31b95c747b	xds: modify rbac rules to use the XFCC header for peered L7 enforcement (#13629) When the protocol is http-like, and an intention has a peered source, then the normal RBAC mTLS SAN field check is replaced with a joint combo of: mTLS SAN field must be the service's local mesh gateway leaf cert AND the first XFCC header (from the MGW) must have a URI field that matches the original intention source. Also: - Update the regex program limit to be much higher than the teeny defaults, since the RBAC regex constructions are more complicated now. - Fix a few stray panics in xds generation.	2022-06-29 10:29:54 -05:00
R.B. Boyer	de0f9ac519	xds: have mesh gateways forward peered SpiffeIDs using the XFCC header (#13625 )	2022-06-28 15:32:42 -05:00
R.B. Boyer	1a9c86ea8f	xds: mesh gateways now correctly load up peer-exported discovery chains using L7 protocols (#13624 ) A mesh gateway will now configure the filter chains for L7 exported services using the correct discovery chain information.	2022-06-28 14:52:25 -05:00
R.B. Boyer	0fa828db76	peering: replicate all SpiffeID values necessary for the importing side to do SAN validation (#13612 ) When traversing an exported peered service, the discovery chain evaluation at the other side may re-route the request to a variety of endpoints. Furthermore we intend to terminate mTLS at the mesh gateway for arriving peered traffic that is http-like (L7), so the caller needs to know the mesh gateway's SpiffeID in that case as well. The following new SpiffeID values will be shipped back in the peerstream replication: - tcp: all possible SpiffeIDs resulting from the service-resolver component of the exported discovery chain - http-like: the SpiffeID of the mesh gateway	2022-06-27 14:37:18 -05:00
Max Bowsher	ef4b9e541f	Merge branch 'main' into fix-kv_entries-metric	2022-06-27 18:57:03 +01:00
alex	53f0cf5835	peering, internal: support UIServices, UINodes, UINodeInfo (#13577 )	2022-06-24 15:17:35 -07:00
Chris S. Kim	2e4cb6f77d	Add new index for PeeredServiceName and ServiceVirtualIP (#13582 ) For TProxy we will be leveraging the VirtualIP table, which needs to become peer-aware	2022-06-24 14:38:39 -04:00
alex	20ecf0febd	Merge pull request #13570 from hashicorp/acpance/peering-oss-intentions oss: peering, http: get peer service intentions (#2098)	2022-06-23 08:15:59 -07:00
Will Jordan	34ecbc1d71	Add per-node max indexes (#12399 ) Adds fine-grained node.[node] entries to the index table, allowing blocking queries to return fine-grained indexes that prevent them from returning immediately when unrelated nodes/services are updated. Co-authored-by: kisunji <ckim@hashicorp.com>	2022-06-23 11:13:25 -04:00
Chris S. Kim	ba89a7d9b0	Make memdb indexers generic (#13558 ) We have many indexer functions in Consul which take interface{} and type assert before building the index. We can use generics to get rid of the initial plumbing and pass around functions with better defined signatures. This has two benefits: 1) Less verbosity; 2) Developers can parse the argument types to memdb schemas without having to introspect the function for the type assertion.	2022-06-23 11:07:19 -04:00
Matt Keeler	7a4d13b0b2	Port over the index 0 -> 1 code that lived in the old rpc setQueryMeta function. (#13561 )	2022-06-23 09:34:47 -04:00
acpana	99c2e11328	oss: peering, http: get peer service intentions (#2098 ) Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>	2022-06-22 16:25:09 -07:00
R.B. Boyer	e8ea3d7c3b	state: peering ID assignment cannot happen inside of the state store (#13525) Move peering ID assignment outside of the FSM, so that the ID is written to the raft log and the same ID is used by all voters, and after restarts.	2022-06-21 13:04:08 -05:00
Matt Keeler	cb01702cd2	Add server local blocking queries and watches (#13438 ) Co-authored-by: Dan Upton <daniel@floppy.co>	2022-06-21 13:36:49 -04:00
Chris S. Kim	fb5eb20563	Pass trust domain to RBAC to validate and fix use of wrong peer trust bundles (#13508 )	2022-06-20 22:47:14 -04:00
Max Bowsher	7b97b8abd2	Delete definition of metric `consul.acl.blocked.node.registration` Although the metric is defined, there is no code which ever sets its value - the code in question is genuinely asymmetric - there are 3 types of object for which registration can be tracked, but only 2 for which deregistration can be tracked.	2022-06-19 17:38:04 +01:00
Max Bowsher	7c19c701e1	Fix incorrect name and doc for kv_entries metric The name of the metric as registered with the metrics library to provide the help string, was incorrect compared with the actual code that sets the metric value - bring them into sync. Also, the help message was incorrect. Rather than copy the help message from telemetry.mdx, which was correct, but felt a bit unnatural in the way it was worded, update both of them to a new wording.	2022-06-19 11:58:23 +01:00
Dan Upton	e00e3a0bc3	Move ACLResolveResult into acl/resolver package (#13467 ) Having this type live in the agent/consul package makes it difficult to put anything that relies on token resolution (e.g. the new gRPC services) in separate packages without introducing import cycles. For example, if package foo imports agent/consul for the ACLResolveResult type it means that agent/consul cannot import foo to register its service. We've previously worked around this by wrapping the ACLResolver to "downgrade" its return type to an acl.Authorizer - aside from the added complexity, this also loses the resolved identity information. In the future, we may want to move the whole ACLResolver into the acl/resolver package. For now, putting the result type there at least, fixes the immediate import cycle issues.	2022-06-17 10:24:43 +01:00
DanStough	4b402e3119	feat: tgtwy xDS generation for destinations Signed-off-by: Dhia Ayachi <dhia@hashicorp.com>	2022-06-16 16:17:49 -04:00
alex	bd4ddb3720	peering: block Intention.Apply ops (#13451) Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>	2022-06-16 12:07:28 -07:00
alex	b3e99784a6	peering, state: account for peer intentions (#13443) Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>	2022-06-16 10:27:31 -07:00
R.B. Boyer	da8cea58c9	xds: begin refactor to always pass test snapshots through all xDS types (#13461)	2022-06-15 14:58:28 -05:00
R.B. Boyer	201d1458c3	xds: mesh gateways now have their own leaf certificate when involved in a peering (#13460) This is only configured in xDS when a service with an L7 protocol is exported. They also load any relevant trust bundles for the peered services to eventually use for L7 SPIFFE validation during mTLS termination.	2022-06-15 14:36:18 -05:00
Riddhi Shah	411edc876b	[OSS] Support merge-central-config option in node services list API (#13450) Adds the merge-central-config query param option to the /catalog/node-services/:node-name API, to get a service definition in the response that is merged with central defaults (proxy-defaults/service-defaults). Updated the consul connect envoy command to use this option when retrieving the proxy service details so as to render the bootstrap configuration correctly.	2022-06-15 08:30:31 -07:00
Evan Culver	7f8c650d61	connect: Use Envoy 1.22.2 instead of 1.22.1 (#13444)	2022-06-14 15:29:41 -07:00
freddygv	f3843809da	Avoid deleting peerings marked as terminated. When our peer deletes the peering it is locally marked as terminated. This termination should kick off deleting all imported data, but should not delete the peering object itself. Keeping peerings marked as terminated acts as a signal that the action took place.	2022-06-14 15:37:09 -06:00
freddygv	6453375ab2	Add leader routine to clean up peerings Once a peering is marked for deletion a new leader routine will now clean up all imported resources and then the peering itself. A lot of the logic was grabbed from the namespace/partitions deferred deletions but with a handful of simplifications: - The rate limiting is not configurable. - Deleting imported nodes/services/checks is done by deleting nodes with the Txn API. The services and checks are deleted as a side-effect. - There is no "round rate limiter" like with namespaces and partitions. This is because peerings are purely local, and deleting a peering in the datacenter does not depend on deleting data from other DCs like with WAN-federated namespaces. All rate limiting is handled by the Raft rate limiter.	2022-06-14 15:36:50 -06:00
Evan Culver	ba6136eb42	connect: Update Envoy support matrix to latest patch releases (#13431)	2022-06-14 13:19:09 -07:00
alex	a0a49ce2a6	peering: intentions list test (#13435)	2022-06-14 10:59:53 -07:00
freddygv	6c8ab1bbac	Fixup stream tear-down steps. 1. Fix a bug where the peering leader routine would not track all active peerings in the "stored" reconciliation map. This could lead to tearing down streams where the token was generated, since the ConnectedStreams() method used for reconciliation returns all streams and not just the ones initiated by this leader routine. 2. Fix a race where stream contexts were being canceled before termination messages were being processed by a peer. Previously the leader routine would tear down streams by canceling their context right after the termination message was sent. This context cancelation could be propagated to the server side faster than the termination message. Now there is a change where the dialing peer uses CloseSend() to signal when no more messages will be sent. Eventually the server peer will read an EOF after receiving and processing the preceding termination message. Using CloseSend() is actually not enough to address the issue mentioned, since it doesn't wait for the server peer to finish processing messages. Because of this now the dialing peer also reads from the stream until an error signals that there are no more messages. Receiving an EOF from our peer indicates that they processed the termination message and have no additional work to do. Given that the stream is being closed, all the messages received by Recv are discarded. We only check for errors to avoid importing new data.	2022-06-13 12:10:42 -06:00
freddygv	cc921a9c78	Update peering state and RPC for deferred deletion When deleting a peering we do not want to delete the peering and all imported data in a single operation, since deleting a large amount of data at once could overload Consul. Instead we defer deletion of peerings so that: 1. When a peering deletion request is received via gRPC the peering is marked for deletion by setting the DeletedAt field. 2. A leader routine will monitor for peerings that are marked for deletion and kick off a throttled deletion of all imported resources before deleting the peering itself. This commit mostly addresses point #1 by modifying the peering service to mark peerings for deletion. Another key change is to add a PeeringListDeleted state store function which can return all peerings marked for deletion. This function is what will be watched by the deferred deletion leader routine.	2022-06-13 12:10:32 -06:00
Freddy	71b254522e	Clean up imported nodes/services/checks as needed (#13367) Previously, imported data would never be deleted. As nodes/services/checks were registered and deregistered, resources deleted from the exporting cluster would accumulate in the imported cluster. This commit makes updates to replication so that whenever an update is received for a service name we reconcile what was present in the catalog against what was received. This handleUpdateService method can handle both updates and deletions.	2022-06-13 11:52:28 -06:00
Mark Anderson	edbf19f4e8	Merge pull request #13357 from hashicorp/ma/add-build-date-oss Add build date (oss)	2022-06-13 08:43:20 -07:00
Chris S. Kim	a02e9abcc1	Update RBAC to handle imported services (#13404) When converting from Consul intentions to xds RBAC rules, services imported from other peers must encode additional data like partition (from the remote cluster) and trust domain. This PR updates the PeeringTrustBundle to hold the sending side's local partition as ExportedPartition. It also updates RBAC code to encode SpiffeIDs of imported services with the ExportedPartition and TrustDomain.	2022-06-10 17:15:22 -04:00
R.B. Boyer	f557509e58	xds: allow for peered upstreams to use tagged addresses that are hostnames (#13422) Mesh gateways can use hostnames in their tagged addresses (#7999). This is useful if you were to expose a mesh gateway using a cloud networking load balancer appliance that gives you a DNS name but no reliable static IPs. Envoy cannot accept hostnames via EDS and those must be configured using CDS. There was already logic when configuring gateways in other locations in the code, but given the illusions in play for peering the downstream of a peered service wasn't aware that it should be doing that. Also: - ensuring that we always try to use wan-like addresses to cross peer boundaries.	2022-06-10 16:11:40 -05:00
Kyle Havlovitz	7f62571419	Add dns node lookup support in partitions	2022-06-10 11:23:51 -07:00
R.B. Boyer	7001e1151c	peering: rename initiate to establish in the context of the APIs (#13419)	2022-06-10 11:10:46 -05:00
Mark Anderson	dd22ceccd1	Change default dates Signed-off-by: Mark Anderson <manderson@hashicorp.com>	2022-06-09 17:07:41 -07:00
Mark Anderson	f65093f1c6	Fixup some more tests Signed-off-by: Mark Anderson <manderson@hashicorp.com>	2022-06-09 17:04:05 -07:00
Mark Anderson	19c87be3a6	Add build date to self endpoint Signed-off-by: Mark Anderson <manderson@hashicorp.com>	2022-06-09 17:04:05 -07:00
Mark Anderson	ec060e5e37	Build date in config file Signed-off-by: Mark Anderson <manderson@hashicorp.com>	2022-06-09 17:04:05 -07:00
R.B. Boyer	bba3eb8cdd	peering: mesh gateways are required for cross-peer service mesh communication (#13410) Require use of mesh gateways in order for service mesh data plane traffic to flow between peers. This also adds plumbing for envoy integration tests involving peers, and one starter peering test.	2022-06-09 11:05:18 -05:00
Alessandro De Blasis	06304bfb0d	lint: conversion	2022-06-09 16:17:20 +01:00
Alessandro De Blasis	28f19e4627	tests: removed redundant probe test	2022-06-09 15:49:45 +01:00
Alessandro De Blasis	af083cc5ba	tests: added syscall mocking and tests for Check_OSService	2022-06-09 15:48:34 +01:00
kisunji	196a1c468a	Add missing index for read	2022-06-08 13:53:31 -04:00
kisunji	d026d84880	Add IntentionMatch tests for source peers	2022-06-08 13:53:31 -04:00
kisunji	bb0b42da12	Update ServiceIntentionSourceIndex to handle peer	2022-06-08 13:53:31 -04:00
Chris S. Kim	bb832e2bba	Add SourcePeer fields to relevant Intentions types (#13390)	2022-06-08 13:24:10 -04:00
R.B. Boyer	7423886136	peering: allow protobuf requests to populate the default partition or namespace (#13398)	2022-06-08 11:55:18 -05:00
Dhia Ayachi	ec0d267a35	Fix intentions wildcard dest (#13397) * when enterprise meta are wildcard assume it's a service intention * fix partition and namespace * move kind outside the loops * get the kind check outside the loop and add a comment Co-authored-by: github-team-consul-core <github-team-consul-core@hashicorp.com>	2022-06-08 10:38:55 -04:00

5381 Commits