4626 Commits

Author SHA1 Message Date
Daniel Upton
13c04a13af proxycfg: terminate stream on irrecoverable errors
This is the OSS portion of enterprise PR 2339.

It improves our handling of "irrecoverable" errors in proxycfg data sources.

The canonical example of this is what happens when the ACL token presented by
Envoy is deleted/revoked. Previously, the stream would get "stuck" until the
xDS server re-checked the token (after 5 minutes) and terminated the stream.

Materializers would also sit burning resources retrying something that could
never succeed.

Now, it is possible for data sources to mark errors as "terminal" which causes
the xDS stream to be closed immediately. Similarly, the submatview.Store will
evict materializers when it observes they have encountered such an error.
2022-08-23 20:17:49 +01:00
Chris S. Kim
81e965479b PR feedback to specify Node name in test mock 2022-08-23 11:51:04 -04:00
Eric Haberkorn
58901ad7df
Cluster peering failover disco chain changes (#14296) 2022-08-23 09:13:43 -04:00
Chris S. Kim
cdc8b0634d Fix flakes 2022-08-22 14:45:31 -04:00
Chris S. Kim
03e92826aa Increase heartbeat rate to reduce test flakes 2022-08-22 14:24:05 -04:00
Chris S. Kim
06ba9775ee Remove check for ResponseNonce 2022-08-22 13:55:01 -04:00
Chris S. Kim
547fb9570e Add missing mock assertions 2022-08-22 13:55:01 -04:00
Chris S. Kim
adff2eef16 Fix data race
newMockSnapshotHandler has an assertion on t.Cleanup which gets called before the event publisher is cancelled. This commit reorders the context.WithCancel so it properly gets cancelled before the assertion is made.
2022-08-22 13:55:01 -04:00
cskh
060531a29a
Fix: add missing ent meta for test (#14289) 2022-08-22 13:51:04 -04:00
Chris S. Kim
4e40e1d222 Handle server addresses update as client 2022-08-22 13:42:12 -04:00
Chris S. Kim
584d3409c4 Send server addresses on update from server 2022-08-22 13:41:44 -04:00
Chris S. Kim
c9d8ad3939 Add new subscription for server addresses 2022-08-22 13:40:25 -04:00
Chris S. Kim
028b87d51f Cleanup unused logger 2022-08-22 13:40:23 -04:00
Chris S. Kim
df951bd601 Expose external gRPC port in autopilot
The grpc_port was added to a NodeService's meta in ea58f235f5da416224ba615405269661ba1f4d8d
2022-08-22 10:07:00 -04:00
cskh
527ebd068a
fix: missing MaxInboundConnections field in service-defaults config entry (#14072)
* fix:  missing max_inbound_connections field in merge config
2022-08-19 14:11:21 -04:00
cskh
e84e4b8868
Fix: upgrade pkg imdario/merg to prevent merge config panic (#14237)
* upgrade imdario/merg to prevent merge config panic

* test: service definition takes precedence over service-defaults in merged results
2022-08-17 21:14:04 -04:00
James Hartig
f92883bbce Use the maximum jitter when calculating the timeout
The timeout should include the maximum possible
jitter since the server will randomly add to it's
timeout a jitter. If the server's timeout is less
than the client's timeout then the client will
return an i/o deadline reached error.

Before:
```
time curl 'http://localhost:8500/v1/catalog/service/service?dc=other-dc&stale=&wait=600s&index=15820644'
rpc error making call: i/o deadline reached
real    10m11.469s
user    0m0.018s
sys     0m0.023s
```

After:
```
time curl 'http://localhost:8500/v1/catalog/service/service?dc=other-dc&stale=&wait=600s&index=15820644'
[...]
real    10m35.835s
user    0m0.021s
sys     0m0.021s
```
2022-08-17 10:24:09 -04:00
Eric Haberkorn
1a73b0ca20
Add Targets field to service resolver failovers. (#14162)
This field will be used for cluster peering failover.
2022-08-15 09:20:25 -04:00
cskh
d46b515b64
fix: missing segment and partition (#14194) 2022-08-12 15:21:39 -04:00
Eric Haberkorn
ebd5513d4b
Refactor failover code to use Envoy's aggregate clusters (#14178) 2022-08-12 14:30:46 -04:00
cskh
81931e52c3
feat(telemetry): add labels to serf and memberlist metrics (#14161)
* feat(telemetry): add labels to serf and memberlist metrics
* changelog
* doc update

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
2022-08-11 22:09:56 -04:00
Chris S. Kim
4c928cb2f7
Handle breaking change for ServiceVirtualIP restore (#14149)
Consul 1.13.0 changed ServiceVirtualIP to use PeeredServiceName instead of ServiceName which was a breaking change for those using service mesh and wanted to restore their snapshot after upgrading to 1.13.0.

This commit handles existing data with older ServiceName and converts it during restore so that there are no issues when restoring from older snapshots.
2022-08-11 14:47:10 -04:00
Chris S. Kim
3926009405 Add test to verify forwarding 2022-08-11 11:16:02 -04:00
Chris S. Kim
1ef22360c3 Register peerStreamServer internally to enable RPC forwarding 2022-08-11 11:16:02 -04:00
Chris S. Kim
de73171202 Handle wrapped errors in isFailedPreconditionErr 2022-08-11 11:16:02 -04:00
Daniel Kimsey
3c4fa9b468 Add support for filtering the 'List Services' API
1. Create a bexpr filter for performing the filtering
2. Change the state store functions to return the raw (not aggregated)
   list of ServiceNodes.
3. Move the aggregate service tags by name logic out of the state store
   functions into a new function called from the RPC endpoint
4. Perform the filtering in the endpoint before aggregation.
2022-08-10 16:52:32 -05:00
cskh
11e7a0d547
fix: shadowed err in retryJoin() (#14112)
- err value will be used later to surface the error message
  if r.join() returns any err.
2022-08-10 10:53:57 -04:00
skpratt
79c23a7cd2
Merge pull request #14056 from hashicorp/proxy-register-port-race
Refactor sidecar_service method to separate port assignment
2022-08-10 09:46:29 -05:00
skpratt
aa77559819 Merge branch 'main' into proxy-register-port-race 2022-08-10 08:40:45 -05:00
Chris S. Kim
e3046120b3 Close active listeners on error
If startListeners successfully created listeners for some of its input addresses but eventually failed, the function would return an error and existing listeners would not be cleaned up.
2022-08-09 12:22:39 -04:00
Chris S. Kim
6311c651de Add retry in TestAgentConnectCALeafCert_good 2022-08-09 11:20:37 -04:00
Kyle Havlovitz
6938b8c755
Merge pull request #13958 from hashicorp/gateway-wildcard-fix
Fix wildcard picking up services it shouldn't for ingress/terminating gateways
2022-08-08 12:54:40 -07:00
Kyle Havlovitz
fe1fcea34f Add some extra handling for destination deletes 2022-08-08 11:38:13 -07:00
freddygv
d421e18172 Update snapshot test 2022-08-08 09:17:15 -06:00
freddygv
1031ffc3c7 Re-validate existing secrets at state store
Previously establishment and pending secrets were only checked at the
RPC layer. However, given that these are Check-and-Set transactions we
should ensure that the given secrets are still valid when persisting a
secret exchange or promotion.

Otherwise it would be possible for concurrent requests to overwrite each
other.
2022-08-08 09:06:07 -06:00
freddygv
0ea4bfae94 Test fixes 2022-08-08 08:31:47 -06:00
freddygv
c04515a844 Use proto message for each secrets write op
Previously there was a field indicating the operation that triggered a
secrets write. Now there is a message for each operation and it contains
the secret ID being persisted.
2022-08-08 01:41:00 -06:00
Kyle Havlovitz
6580566c3b Update ingress/terminating wildcard logic and handle destinations 2022-08-05 07:56:10 -07:00
freddygv
8067890787 Inherit active secret when exchanging 2022-08-03 17:32:53 -05:00
freddygv
60d6e28c97 Pass explicit signal with op for secrets write
Previously the updates to the peering secrets UUID table relied on
inferring what action triggered the update based on a reconciliation
against the existing secrets.

Instead we now explicitly require the operation to be given so that the
inference isn't necessary. This makes the UUID table logic easier to
reason about and fixes some related bugs.

There is also an update so that the peering secrets get handled on
snapshots/restores.
2022-08-03 17:25:12 -05:00
freddygv
9ca687bc7c Avoid deleting peering secret UUIDs at dialers
Dialers do not keep track of peering secret UUIDs, so they should not
attempt to clean up data from that table when their peering is deleted.

We also now keep peer server addresses when marking peerings for
deletion. Peer server addresses are used by the ShouldDial() helper
when determining whether the peering is for a dialer or an acceptor.
We need to keep this data so that peering secrets can be cleaned up
accordingly.
2022-08-03 16:34:57 -05:00
skpratt
58eed6b049
Merge pull request #13906 from skpratt/validate-port-agent-split
Separate port and socket path validation for local agent
2022-08-02 16:58:41 -05:00
Dhia Ayachi
7154367892
add token to the request when creating a cacheIntentions query (#14005) 2022-08-02 14:27:34 -04:00
Kyle Havlovitz
499211f907 Fix wildcard picking up services it shouldn't for ingress/terminating gateways 2022-08-02 09:41:31 -07:00
Daniel Upton
6452118c15 proxycfg-sources: fix hot loop when service not found in catalog
Fixes a bug where a service getting deleted from the catalog would cause
the ConfigSource to spin in a hot loop attempting to look up the service.

This is because we were returning a nil WatchSet which would always
unblock the select.

Kudos to @freddygv for discovering this!
2022-08-02 15:42:29 +01:00
Freddy
42996411cc
Various peering fixes (#13979)
* Avoid logging StreamSecretID
* Wrap additional errors in stream handler
* Fix flakiness in leader test and rename servers for clarity. There was
  a race condition where the peering was being deleted in the test
  before the stream was active. Now the test waits for the stream to be
  connected on both sides before deleting the associated peering.
* Run flaky test serially
2022-08-01 15:06:18 -06:00
DanStough
169ff71132 fix: ipv4 destination dns resolution 2022-08-01 16:45:57 -04:00
Luke Kysow
988e1fd35d
peering: default to false (#13963)
* defaulting to false because peering will be released as beta
* Ignore peering disabled error in bundles cachetype

Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>
Co-authored-by: freddygv <freddy@hashicorp.com>
Co-authored-by: Matt Keeler <mjkeeler7@gmail.com>
2022-08-01 15:22:36 -04:00
Freddy
dacf703d20
Merge branch 'main' into fix-kv_entries-metric 2022-08-01 13:19:27 -06:00
Freddy
72b6d69652
Merge pull request #13499 from maxb/delete-unused-metric
Delete definition of metric `consul.acl.blocked.node.deregistration`
2022-08-01 12:31:05 -06:00