213 Commits

Author SHA1 Message Date
Iryna Shustava
d747b51dab
Handle ACL errors consistently when blocking query timeout is reached. (#20876)
Currently, when a client starts a blocking query and an ACL token expires within
that time, Consul will return ACL not found error with a 403 status code. However,
sometimes if an ACL token is invalidated at the same time as the query's deadline is reached,
Consul will instead return an empty response with a 200 status code.

This is because of the events being executed.
1. Client issues a blocking query request with timeout `t`.
2. ACL is deleted.
3. Server detects a change in ACLs and force closes the gRPC stream.
4. Client resubscribes with the same token and resets its state (view).
5. Client sees "ACL not found" error.

If ACL is deleted before step 4, the client is unaware that the stream was closed due to
an ACL error and will return an empty view (from the reset state) with the 200 status code.

To fix this problem, we introduce another state to the subsciption to indicate when a change
to ACLs has occured. If the server sees that there was an error due to ACL change, it will
re-authenticate the request and return an error if the token is no longer valid.

Fixes #20790
2024-03-22 14:59:54 -06:00
Derek Menteer
eabff257d7
Various bug-fixes and improvements (#20866)
* Shuffle the list of servers returned by `pbserverdiscovery.WatchServers`.

This randomizes the list of servers to help reduce the chance of clients
all connecting to the same server simultaneously. Consul-dataplane is one
such client that does not randomize its own list of servers.

* Fix potential goroutine leak in xDS recv loop.

This commit ensures that the goroutine which receives xDS messages from
proxies will not block forever if the stream's context is cancelled but
the `processDelta()` function never consumes the message (due to being
terminated).

* Add changelog.
2024-03-15 13:10:48 -05:00
Semir Patel
943426bc79
v2tenancy: add optional LicenseFeature to type Registration struct (#20673) 2024-02-20 14:42:31 -06:00
Dan Stough
14efb28086
fix(v2dns): add node ttl to workloads, comment cleanup, and changelog (#20643)
* fix(v2dns): add node ttl to workloads, plus comment cleanup

* docs(v2dns): changelog
2024-02-14 17:38:11 -05:00
Dan Stough
9602b43183
feat(v2dns): catalog v2 workload query support (#20466) 2024-02-02 18:29:38 -05:00
R.B. Boyer
c029b20615
v2: ensure the controller caches are fully populated before first use (#20421)
The new controller caches are initialized before the DependencyMappers or the 
Reconciler run, but importantly they are not populated. The expectation is that 
when the WatchList call is made to the resource service it will send an initial 
snapshot of all resources matching a single type, and then perpetually send 
UPSERT/DELETE events afterward. This initial snapshot will cycle through the 
caching layer and will catch it up to reflect the stored data.

Critically the dependency mappers and reconcilers will race against the restoration 
of the caches on server startup or leader election. During this time it is possible a
 mapper or reconciler will use the cache to lookup a specific relationship and 
not find it. That very same reconciler may choose to then recompute some 
persisted resource and in effect rewind it to a prior computed state.

Change

- Since we are updating the behavior of the WatchList RPC, it was aligned to 
  match that of pbsubscribe and pbpeerstream using a protobuf oneof instead of the enum+fields option.

- The WatchList rpc now has 3 alternating response events: Upsert, Delete, 
  EndOfSnapshot. When set the initial batch of "snapshot" Upserts sent on a new 
  watch, those operations will be followed by an EndOfSnapshot event before beginning 
  the never-ending sequence of Upsert/Delete events.

- Within the Controller startup code we will launch N+1 goroutines to execute WatchList 
  queries for the watched types. The UPSERTs will be applied to the nascent cache
   only (no mappers will execute).

- Upon witnessing the END operation, those goroutines will terminate.

- When all cache priming routines complete, then the normal set of N+1 long lived 
watch routines will launch to officially witness all events in the system using the 
primed cached.
2024-02-02 15:11:05 -06:00
wangxinyi7
fb2b696c0e
missing prefix / (#20447)
* missing prefix / and fix typos
2024-02-02 12:48:45 -08:00
wangxinyi7
3b44be530d
only forwarding the resource service traffic in client agent to server agent (#20347)
* only forwarding the resource service traffic in client agent to server agent
2024-01-31 12:05:47 -08:00
Matt Keeler
34a32d4ce5
Remove V2 PeerName field from pbresource.Tenancy (#19865)
The peer name will eventually show up elsewhere in the resource. For now though this rips it out of where we don’t want it to be.
2024-01-29 15:08:31 -05:00
Semir Patel
efdf80413c
resource: add MutateAndValidate endpoint (#20311) 2024-01-25 13:12:30 -06:00
Tauhid Anjum
5d294b26d3
NET-5824 Exported services api (#20015)
* Exported services api implemented

* Tests added, refactored code

* Adding server tests

* changelog added

* Proto gen added

* Adding codegen changes

* changing url, response object

* Fixing lint error by having namespace and partition directly

* Tests changes

* refactoring tests

* Simplified uniqueness logic for exported services, sorted the response in order of service name

* Fix lint errors, refactored code
2024-01-23 10:06:59 +05:30
Dan Stough
97ae244d8a
feat(v2dns): add grpc DNS support (#20296) 2024-01-22 10:10:03 -05:00
Semir Patel
6d9e8fdd05
resource: retry non-CAS deletes automatically (#20292) 2024-01-22 06:45:01 -08:00
Derek Menteer
b8b8ad46fc
Various race condition and test fixes. (#20212)
* Increase timeouts for flakey peering test.

* Various test fixes.

* Fix race condition in reconcilePeering.

This resolves an issue where a peering object in the state store was
incorrectly mutated by a function, resulting in the test being flagged as
failing when the -race flag was used.
2024-01-16 08:57:43 -06:00
Matt Keeler
326c0ecfbe
In-Memory gRPC (#19942)
* Implement In-Process gRPC for use by controller caching/indexing

This replaces the pipe base listener implementation we were previously using. The new style CAN avoid cloning resources which our controller caching/indexing is taking advantage of to not duplicate resource objects in memory.

To maintain safety for controllers and for them to be able to modify data they get back from the cache and the resource service, the client they are presented in their runtime will be wrapped with an autogenerated client which clones request and response messages as they pass through the client.

Another sizable change in this PR is to consolidate how server specific gRPC services get registered and managed. Before this was in a bunch of different methods and it was difficult to track down how gRPC services were registered. Now its all in one place.

* Fix race in tests

* Ensure the resource service is registered to the multiplexed handler for forwarding from client agents

* Expose peer streaming on the internal handler
2024-01-12 11:54:07 -05:00
Matt Keeler
123bc95e1a
Add Common Controller Caching Infrastructure (#19767)
* Add Common Controller Caching Infrastructure
2023-12-13 10:06:39 -05:00
Matt Keeler
efe279f802
Retry lint fixes (#19151)
* Add a make target to run lint-consul-retry on all the modules
* Cleanup sdk/testutil/retry
* Fix a bunch of retry.Run* usage to not use the outer testing.T
* Fix some more recent retry lint issues and pin to v1.4.0 of lint-consul-retry
* Fix codegen copywrite lint issues
* Don’t perform cleanup after each retry attempt by default.
* Use the common testutil.TestingTB interface in test-integ/tenancy
* Fix retry tests
* Update otel access logging extension test to perform requests within the retry block
2023-12-06 12:11:32 -05:00
Semir Patel
c1bbda8128
resource: block default namespace deletion + test refactorings (#19822) 2023-12-05 14:00:06 -05:00
Semir Patel
5930748cb0
resource: ListByOwner returns empty list on non-existent tenancy (#19742) 2023-11-27 14:56:08 -06:00
Semir Patel
75c2def1ca
resource: preserve deferred deletion metadata on non-CAS writes (#19674) 2023-11-17 10:51:25 -06:00
Semir Patel
1eed205286
resource: freeze resources after marked for deletion (4 of 5) (#19603) 2023-11-15 10:58:27 -06:00
Kumar Kavish
68e7f27fd2
[NET-6438] Add tenancy to xDS Tests (#19551)
* [NET-6438] Add tenancy to xDS Tests

* [NET-6438] Add tenancy to xDS Tests
- Fixing imports

* [NET-6438] Add tenancy to xDS Tests
- Added cleanup post test run

* [NET-6356] Add tenancy to xDS Tests
- Added cleanup post test run

* [NET-6438] Add tenancy to xDS Tests
- using t.Cleanup instead of defer delete

* [NET-6438] Add tenancy to xDS Tests
- rebased

* [NET-6438] Add tenancy to xDS Tests
- rebased
2023-11-10 15:32:36 +05:30
Kumar Kavish
f09dbb99e9
[NET-6356] Add tenancy to Failover Tests (#19547)
* [NET-6356] Add tenancy to Failover Tests

* [NET-6438] Add tenancy to xDS Tests
- Added cleanup post test run

* [NET-6356] Add tenancy to failover Tests
- using t.Cleanup instead of defer delete
2023-11-10 01:14:09 +05:30
Poonam Jadhav
c3c836edae
Net-6291/fix/watch resources (#19467)
* fix: update watch endpoint to default based on scope

* test: additional test

* refactor: rename list validate function

* refactor: rename validate<Op>Request() -> ensure<Op>RequestValid() for consistency
2023-11-03 16:03:07 -04:00
Semir Patel
ef35525cf1
resource: finalizer aware delete endpoint (2 of 5) (#19493)
resource: make delete endpoint finalizer aware
2023-11-03 10:10:58 -04:00
Semir Patel
0abd96c0d9
resource: resource service now checks for v2tenancy feature flag (#19400) 2023-10-27 08:55:02 -05:00
Poonam Jadhav
1806bcb38c
test: add missing tests for list endpoint (#19364) 2023-10-26 11:24:33 -04:00
Derek Menteer
48c4a5b736
Add grpc keepalive configuration. (#19339)
Prior to the introduction of this configuration, grpc keepalive messages were
sent after 2 hours of inactivity on the stream. This posed issues in various
scenarios where the server-side xds connection balancing was unaware that envoy
instances were uncleanly killed / force-closed, since the connections would
only be cleaned up after ~5 minutes of TCP timeouts occurred. Setting this
config to a 30 second interval with a 20 second timeout ensures that at most,
it should take up to 50 seconds for a dead xds connection to be closed.
2023-10-24 08:05:31 -05:00
Semir Patel
96606d114c
resource: default peername to local in list endpoints (#19340) 2023-10-23 16:30:47 -05:00
Dhia Ayachi
d5c9f11b59
Tenancy Bridge v2 (#19220)
* tenancy bridge v2 for v2 resources

* add missing copywrite headers
2023-10-20 14:49:54 -04:00
Semir Patel
ad177698f7
resource: enforce lowercase v2 resource names (#19218) 2023-10-16 12:55:30 -05:00
Iryna Shustava
105ebfdd00
catalog, mesh: implement missing ACL hooks (#19143)
This change adds ACL hooks to the remaining catalog and mesh resources, excluding any computed ones. Those will for now continue using the default operator:x permissions.

It refactors a lot of the common testing functions so that they can be re-used between resources.

There are also some types that we don't yet support (e.g. virtual IPs) that this change adds ACL hooks to for future-proofing.
2023-10-13 23:16:26 +00:00
Dhia Ayachi
5fbf0c00d3
Add namespace read write tests (#19173) 2023-10-13 12:03:06 -04:00
Iryna Shustava
25283f0ec2
get-envoy-bootstrap-params: when v2 is enabled, use computed proxy configuration (#19175) 2023-10-12 14:01:36 -06:00
Semir Patel
830c4ea81c
v2tenancy: cluster scoped reads (#19082) 2023-10-10 13:30:23 -05:00
Chris S. Kim
92ce814693
Remove old build tags (#19128) 2023-10-10 10:58:06 -04:00
John Murret
d67e5c6e35
NET-5590 - authorization: check for identity:write in CA certs, xds server, and getting envoy bootstrap params (#19049)
* NET-5590 - authorization: check for identity:write in CA certs, xds server, and getting envoy bootstrap params

* gofmt file
2023-10-03 22:02:23 +00:00
Iryna Shustava
e6b724d062
catalog,mesh,auth: Move resource types to the proto-public module (#18935) 2023-09-22 15:50:56 -06:00
R.B. Boyer
7688178ad2
peerstream: fix flaky test related to autopilot integration (#18979) 2023-09-22 13:12:00 -05:00
Iryna Shustava
d88888ee8b
catalog,mesh,auth: Bump versions to v2beta1 (#18930) 2023-09-22 10:51:15 -06:00
R.B. Boyer
ef6f2494c7
resource: allow for the ACLs.Read hook to request the entire data payload to perform the authz check (#18925)
The ACLs.Read hook for a resource only allows for the identity of a 
resource to be passed in for use in authz consideration. For some 
resources we wish to allow for the current stored value to dictate how 
to enforce the ACLs (such as reading a list of applicable services from 
the payload and allowing service:read on any of them to control reading the enclosing resource).

This change update the interface to usually accept a *pbresource.ID, 
but if the hook decides it needs more data it returns a sentinel error 
and the resource service knows to defer the authz check until after
 fetching the data from storage.
2023-09-22 09:53:55 -05:00
Nitya Dhanushkodi
3a2e62053a
v2: various fixes to make K8s tproxy multiport acceptance tests and manual explicit upstreams (single port) tests pass (#18874)
Adding coauthors who mobbed/paired at various points throughout last week.
Co-authored-by: Dan Stough <dan.stough@hashicorp.com>
Co-authored-by: Iryna Shustava <iryna@hashicorp.com>
Co-authored-by: John Murret <john.murret@hashicorp.com>
Co-authored-by: Michael Zalimeni <michael.zalimeni@hashicorp.com>
Co-authored-by: Ashwin Venkatesh <ashwin@hashicorp.com>
Co-authored-by: Michael Wilkerson <mwilkerson@hashicorp.com>
2023-09-20 00:02:01 +00:00
Semir Patel
62796a1454
resource: mutate and validate before acls on write (#18868) 2023-09-18 17:04:29 -05:00
Dhia Ayachi
4435e4a420
add v2 tenancy bridge Flag and v2 Tenancy Bridge initial implementation (#18830)
* add v2 tenancy bridge and a feature flag for v2 tenancy

* move tenancy bridge v2 under resource package
2023-09-18 12:25:05 -04:00
skpratt
1fda2965e8
Allow empty data writes for resources (#18819)
* allow nil data writes for resources

* update demo to test valid type with no data
2023-09-15 14:00:23 -05:00
Semir Patel
d3dad14030
resource: default peername to "local" for now (#18822) 2023-09-15 09:34:18 -05:00
Iryna Shustava
4eb2197e82
dataplane: Allow getting bootstrap parameters when using V2 APIs (#18504)
This PR enables the GetEnvoyBootstrapParams endpoint to construct envoy bootstrap parameters from v2 catalog and mesh resources.

   * Make bootstrap request and response parameters less specific to services so that we can re-use them for workloads or service instances.
   * Remove ServiceKind from bootstrap params response. This value was unused previously and is not needed for V2.
   * Make access logs generation generic so that we can generate them using v1 or v2 resources.
2023-09-06 16:46:25 -06:00
Semir Patel
b96cff7436
resource: Require scope for resource registration (#18635) 2023-09-01 09:44:53 -05:00
Semir Patel
7b9e243297
resource: Allow nil tenancy (#18618) 2023-08-31 09:24:09 -05:00
Semir Patel
2225bf0550
resource: Make resource writestatus tenancy aware (#18577) 2023-08-24 19:18:47 +00:00