Commit Graph

4577 Commits

Author SHA1 Message Date
Dan Upton 3b993f2da7
dataplane: update envoy bootstrap params for consul-dataplane (#14017)
Contains 2 changes to the GetEnvoyBootstrapParams response to support
consul-dataplane.

Exposing node_name and node_id:

consul-dataplane will support providing either the node_id or node_name in its
configuration. Unfortunately, supporting both in the xDS meta adds a fair amount
of complexity (partly because most tables are currently indexed on node_name)
so for now we're going to return them both from the bootstrap params endpoint,
allowing consul-dataplane to exchange a node_id for a node_name (which it will
supply in the xDS meta).

Properly setting service for gateways:

To avoid the need to special case gateways in consul-dataplane, service will now
either be the destination service name for connect proxies, or the gateway
service name. This means it can be used as-is in Envoy configuration (i.e. as a
cluster name or in metric tags).
2022-08-24 12:03:15 +01:00
Daniel Upton 13c04a13af proxycfg: terminate stream on irrecoverable errors
This is the OSS portion of enterprise PR 2339.

It improves our handling of "irrecoverable" errors in proxycfg data sources.

The canonical example of this is what happens when the ACL token presented by
Envoy is deleted/revoked. Previously, the stream would get "stuck" until the
xDS server re-checked the token (after 5 minutes) and terminated the stream.

Materializers would also sit burning resources retrying something that could
never succeed.

Now, it is possible for data sources to mark errors as "terminal" which causes
the xDS stream to be closed immediately. Similarly, the submatview.Store will
evict materializers when it observes they have encountered such an error.
2022-08-23 20:17:49 +01:00
Chris S. Kim 81e965479b PR feedback to specify Node name in test mock 2022-08-23 11:51:04 -04:00
Eric Haberkorn 58901ad7df
Cluster peering failover disco chain changes (#14296) 2022-08-23 09:13:43 -04:00
Chris S. Kim cdc8b0634d Fix flakes 2022-08-22 14:45:31 -04:00
Chris S. Kim 03e92826aa Increase heartbeat rate to reduce test flakes 2022-08-22 14:24:05 -04:00
Chris S. Kim 06ba9775ee Remove check for ResponseNonce 2022-08-22 13:55:01 -04:00
Chris S. Kim 547fb9570e Add missing mock assertions 2022-08-22 13:55:01 -04:00
Chris S. Kim adff2eef16 Fix data race
newMockSnapshotHandler has an assertion on t.Cleanup which gets called before the event publisher is cancelled. This commit reorders the context.WithCancel so it properly gets cancelled before the assertion is made.
2022-08-22 13:55:01 -04:00
cskh 060531a29a
Fix: add missing ent meta for test (#14289) 2022-08-22 13:51:04 -04:00
Chris S. Kim 4e40e1d222 Handle server addresses update as client 2022-08-22 13:42:12 -04:00
Chris S. Kim 584d3409c4 Send server addresses on update from server 2022-08-22 13:41:44 -04:00
Chris S. Kim c9d8ad3939 Add new subscription for server addresses 2022-08-22 13:40:25 -04:00
Chris S. Kim 028b87d51f Cleanup unused logger 2022-08-22 13:40:23 -04:00
Chris S. Kim df951bd601 Expose external gRPC port in autopilot
The grpc_port was added to a NodeService's meta in ea58f235f5
2022-08-22 10:07:00 -04:00
cskh 527ebd068a
fix: missing MaxInboundConnections field in service-defaults config entry (#14072)
* fix:  missing max_inbound_connections field in merge config
2022-08-19 14:11:21 -04:00
cskh e84e4b8868
Fix: upgrade pkg imdario/merg to prevent merge config panic (#14237)
* upgrade imdario/merg to prevent merge config panic

* test: service definition takes precedence over service-defaults in merged results
2022-08-17 21:14:04 -04:00
James Hartig f92883bbce Use the maximum jitter when calculating the timeout
The timeout should include the maximum possible
jitter since the server will randomly add to it's
timeout a jitter. If the server's timeout is less
than the client's timeout then the client will
return an i/o deadline reached error.

Before:
```
time curl 'http://localhost:8500/v1/catalog/service/service?dc=other-dc&stale=&wait=600s&index=15820644'
rpc error making call: i/o deadline reached
real    10m11.469s
user    0m0.018s
sys     0m0.023s
```

After:
```
time curl 'http://localhost:8500/v1/catalog/service/service?dc=other-dc&stale=&wait=600s&index=15820644'
[...]
real    10m35.835s
user    0m0.021s
sys     0m0.021s
```
2022-08-17 10:24:09 -04:00
Eric Haberkorn 1a73b0ca20
Add `Targets` field to service resolver failovers. (#14162)
This field will be used for cluster peering failover.
2022-08-15 09:20:25 -04:00
cskh d46b515b64
fix: missing segment and partition (#14194) 2022-08-12 15:21:39 -04:00
Eric Haberkorn ebd5513d4b
Refactor failover code to use Envoy's aggregate clusters (#14178) 2022-08-12 14:30:46 -04:00
cskh 81931e52c3
feat(telemetry): add labels to serf and memberlist metrics (#14161)
* feat(telemetry): add labels to serf and memberlist metrics
* changelog
* doc update

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
2022-08-11 22:09:56 -04:00
Chris S. Kim 4c928cb2f7
Handle breaking change for ServiceVirtualIP restore (#14149)
Consul 1.13.0 changed ServiceVirtualIP to use PeeredServiceName instead of ServiceName which was a breaking change for those using service mesh and wanted to restore their snapshot after upgrading to 1.13.0.

This commit handles existing data with older ServiceName and converts it during restore so that there are no issues when restoring from older snapshots.
2022-08-11 14:47:10 -04:00
Chris S. Kim 3926009405 Add test to verify forwarding 2022-08-11 11:16:02 -04:00
Chris S. Kim 1ef22360c3 Register peerStreamServer internally to enable RPC forwarding 2022-08-11 11:16:02 -04:00
Chris S. Kim de73171202 Handle wrapped errors in isFailedPreconditionErr 2022-08-11 11:16:02 -04:00
Daniel Kimsey 3c4fa9b468 Add support for filtering the 'List Services' API
1. Create a bexpr filter for performing the filtering
2. Change the state store functions to return the raw (not aggregated)
   list of ServiceNodes.
3. Move the aggregate service tags by name logic out of the state store
   functions into a new function called from the RPC endpoint
4. Perform the filtering in the endpoint before aggregation.
2022-08-10 16:52:32 -05:00
cskh 11e7a0d547
fix: shadowed err in retryJoin() (#14112)
- err value will be used later to surface the error message
  if r.join() returns any err.
2022-08-10 10:53:57 -04:00
skpratt 79c23a7cd2
Merge pull request #14056 from hashicorp/proxy-register-port-race
Refactor sidecar_service method to separate port assignment
2022-08-10 09:46:29 -05:00
skpratt aa77559819 Merge branch 'main' into proxy-register-port-race 2022-08-10 08:40:45 -05:00
Chris S. Kim e3046120b3 Close active listeners on error
If startListeners successfully created listeners for some of its input addresses but eventually failed, the function would return an error and existing listeners would not be cleaned up.
2022-08-09 12:22:39 -04:00
Chris S. Kim 6311c651de Add retry in TestAgentConnectCALeafCert_good 2022-08-09 11:20:37 -04:00
Kyle Havlovitz 6938b8c755
Merge pull request #13958 from hashicorp/gateway-wildcard-fix
Fix wildcard picking up services it shouldn't for ingress/terminating gateways
2022-08-08 12:54:40 -07:00
Kyle Havlovitz fe1fcea34f Add some extra handling for destination deletes 2022-08-08 11:38:13 -07:00
freddygv d421e18172 Update snapshot test 2022-08-08 09:17:15 -06:00
freddygv 1031ffc3c7 Re-validate existing secrets at state store
Previously establishment and pending secrets were only checked at the
RPC layer. However, given that these are Check-and-Set transactions we
should ensure that the given secrets are still valid when persisting a
secret exchange or promotion.

Otherwise it would be possible for concurrent requests to overwrite each
other.
2022-08-08 09:06:07 -06:00
freddygv 0ea4bfae94 Test fixes 2022-08-08 08:31:47 -06:00
freddygv c04515a844 Use proto message for each secrets write op
Previously there was a field indicating the operation that triggered a
secrets write. Now there is a message for each operation and it contains
the secret ID being persisted.
2022-08-08 01:41:00 -06:00
Kyle Havlovitz 6580566c3b Update ingress/terminating wildcard logic and handle destinations 2022-08-05 07:56:10 -07:00
freddygv 8067890787 Inherit active secret when exchanging 2022-08-03 17:32:53 -05:00
freddygv 60d6e28c97 Pass explicit signal with op for secrets write
Previously the updates to the peering secrets UUID table relied on
inferring what action triggered the update based on a reconciliation
against the existing secrets.

Instead we now explicitly require the operation to be given so that the
inference isn't necessary. This makes the UUID table logic easier to
reason about and fixes some related bugs.

There is also an update so that the peering secrets get handled on
snapshots/restores.
2022-08-03 17:25:12 -05:00
freddygv 9ca687bc7c Avoid deleting peering secret UUIDs at dialers
Dialers do not keep track of peering secret UUIDs, so they should not
attempt to clean up data from that table when their peering is deleted.

We also now keep peer server addresses when marking peerings for
deletion. Peer server addresses are used by the ShouldDial() helper
when determining whether the peering is for a dialer or an acceptor.
We need to keep this data so that peering secrets can be cleaned up
accordingly.
2022-08-03 16:34:57 -05:00
skpratt 58eed6b049
Merge pull request #13906 from skpratt/validate-port-agent-split
Separate port and socket path validation for local agent
2022-08-02 16:58:41 -05:00
Dhia Ayachi 7154367892
add token to the request when creating a cacheIntentions query (#14005) 2022-08-02 14:27:34 -04:00
Kyle Havlovitz 499211f907 Fix wildcard picking up services it shouldn't for ingress/terminating gateways 2022-08-02 09:41:31 -07:00
Daniel Upton 6452118c15 proxycfg-sources: fix hot loop when service not found in catalog
Fixes a bug where a service getting deleted from the catalog would cause
the ConfigSource to spin in a hot loop attempting to look up the service.

This is because we were returning a nil WatchSet which would always
unblock the select.

Kudos to @freddygv for discovering this!
2022-08-02 15:42:29 +01:00
Freddy 42996411cc
Various peering fixes (#13979)
* Avoid logging StreamSecretID
* Wrap additional errors in stream handler
* Fix flakiness in leader test and rename servers for clarity. There was
  a race condition where the peering was being deleted in the test
  before the stream was active. Now the test waits for the stream to be
  connected on both sides before deleting the associated peering.
* Run flaky test serially
2022-08-01 15:06:18 -06:00
DanStough 169ff71132 fix: ipv4 destination dns resolution 2022-08-01 16:45:57 -04:00
Luke Kysow 988e1fd35d
peering: default to false (#13963)
* defaulting to false because peering will be released as beta
* Ignore peering disabled error in bundles cachetype

Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>
Co-authored-by: freddygv <freddy@hashicorp.com>
Co-authored-by: Matt Keeler <mjkeeler7@gmail.com>
2022-08-01 15:22:36 -04:00
Freddy dacf703d20
Merge branch 'main' into fix-kv_entries-metric 2022-08-01 13:19:27 -06:00
Freddy 72b6d69652
Merge pull request #13499 from maxb/delete-unused-metric
Delete definition of metric `consul.acl.blocked.node.deregistration`
2022-08-01 12:31:05 -06:00
Dhia Ayachi 6fd65a4a45
Tgtwy egress HTTP support (#13953)
* add golden files

* add support to http in tgateway egress destination

* fix slice sorting to include both address and port when using server_names

* fix listener loop for http destination

* fix routes to generate a route per port and a virtualhost per port-address combination

* sort virtual hosts list to have a stable order

* extract redundant serviceNode
2022-08-01 14:12:43 -04:00
Matt Keeler f74d0cef7a
Implement/Utilize secrets for Peering Replication Stream (#13977) 2022-08-01 10:33:18 -04:00
alex a45bb1f06b
block PeerName register requests (#13887)
Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>
2022-07-29 14:36:22 -07:00
Luke Kysow 95096e2c03
peering: retry establishing connection more quickly on certain errors (#13938)
When we receive a FailedPrecondition error, retry that more quickly
because we expect it will resolve shortly. This is particularly
important in the context of Consul servers behind a load balancer
because when establishing a connection we have to retry until we
randomly land on a leader node.

The default retry backoff goes from 2s, 4s, 8s, etc. which can result in
very long delays quite quickly. Instead, this backoff retries in 8ms
five times, then goes exponentially from there: 16ms, 32ms, ... up to a
max of 8152ms.
2022-07-29 13:04:32 -07:00
Sarah Pratt 10a4999a87 Separate port and socket path requirement in case of local agent assignment 2022-07-29 13:28:21 -05:00
alex 92c615c35f
Merge pull request #13952 from hashicorp/sync-more-acl
sync more acl enforcement
2022-07-28 12:31:02 -07:00
Dhia Ayachi 256694b603
inject gateway addons to destination clusters (#13951) 2022-07-28 15:17:35 -04:00
acpana eae4e71492
sync more acl enforcement
sync w ent at 32756f7

Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>
2022-07-28 12:01:52 -07:00
alex 41f3343eac
Merge pull request #13929 from hashicorp/fix-validation
[sync] fix empty partitions matching
2022-07-28 10:14:49 -07:00
Sarah Pratt a3ef6f016e refactor sidecare_service method into parts 2022-07-28 09:07:13 -05:00
Ashwin Venkatesh eef9edaed9
Add peer counts to emitted metrics. (#13930) 2022-07-27 18:34:04 -04:00
Luke Kysow 465a9801e1
Merge pull request #13924 from hashicorp/lkysow/util-metric-peering
peering: don't track imported services/nodes in usage
2022-07-27 14:49:55 -07:00
acpana 6033584349
use EqualPartitions
Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>
2022-07-27 14:48:30 -07:00
acpana 0351ca5136
better fix
Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>
2022-07-27 14:28:08 -07:00
acpana 8b2ef80336
sync w ent
Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>
2022-07-27 11:41:39 -07:00
Chris S. Kim 0999e05a7d Reduce arm64 flakes for TestConnectCA_ConfigurationSet_ChangeKeyConfig_Primary
There were 16 combinations of tests but 4 of them were duplicates since the default key type and bits were "ec" and 256. That entry was commented out to reduce the subtest count to 12.

testrpc.WaitForLeader was failing on arm64 environments; the cause is unknown but it might be due to the environment being flooded with parallel tests making RPC calls. The RPC polling+retry was replaced with a simpler check for leadership based on raft.
2022-07-27 13:54:34 -04:00
Chris S. Kim 8ead1caf53 Retry checks for virtual IP metadata 2022-07-27 13:54:34 -04:00
Chris S. Kim 62ed0250c3 Sort slice of ServiceNames deterministically 2022-07-27 13:54:34 -04:00
Sarah Pratt f520f6dd0f Separate port and socket path requirement in case of local agent assignment 2022-07-27 12:30:52 -05:00
Luke Kysow 740d54e730 peering: don't track imported services/nodes in usage
Services/nodes that are imported from other peers are stored in
state. We don't want to count those as part of our own cluster's usage.
2022-07-27 09:08:51 -07:00
cskh 4e292b7b72
chore: clarify the error message: service.service must not be empty (#13907)
- when register service using catalog endpoint, the key of service
  name actually should be "service". Add this information to the
  error message will help user to quickly fix in the request.
2022-07-27 10:16:46 -04:00
cskh 59e81a728e
chore: removed unused method AddService (#13905)
- This AddService is not used anywhere.
  AddServiceWithChecks is place of AddService
- Test code is updated
2022-07-26 16:54:53 -04:00
Luke Kysow 021b00e321 Remove duplicate comment 2022-07-26 10:19:49 -07:00
alex 437a28d18a
peering: prevent peering in same partition (#13851)
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
2022-07-25 18:00:48 -07:00
Nitya Dhanushkodi 27bd895ac8
peering: remove validation that forces peering token server addresses to be an IP, allow hostname based addresses (#13874) 2022-07-25 16:33:47 -07:00
Luke Kysow 8c5b70d227
Rename receive to recv in tracker (#13896)
Because it's shorter
2022-07-25 16:08:03 -07:00
Luke Kysow 3530d3782d
peering: read endpoints can now return failing status (#13849)
Track streams that have been disconnected due to an error and
set their statuses to failing.
2022-07-25 14:27:53 -07:00
Kyle Havlovitz 93de25f87c
Merge pull request #13872 from hashicorp/remove-upstream-log
Remove extra logging from ingress upstream watch shutdown
2022-07-25 12:55:30 -07:00
Chris S. Kim 73a84f256f
Preserve PeeringState on upsert (#13666)
Fixes a bug where if the generate token is called twice, the second call upserts the zero-value (undefined) of PeeringState.
2022-07-25 14:37:56 -04:00
Chris S. Kim 8ed49ea4d0
Update envoy metrics label extraction for peered clusters and listeners (#13818)
Now that peered upstreams can generate envoy resources (#13758), we need a way to disambiguate local from peered resources in our metrics. The key difference is that datacenter and partition will be replaced with peer, since in the context of peered resources partition is ambiguous (could refer to the partition in a remote cluster or one that exists locally). The partition and datacenter of the proxy will always be that of the source service.

Regexes were updated to make emitting datacenter and partition labels mutually exclusive with peer labels.

Listener filter names were updated to better match the existing regex.

Cluster names assigned to peered upstreams were updated to be synthesized from local peer name (it previously used the externally provided primary SNI, which contained the peer name from the other side of the peering). Integration tests were updated to assert for the new peer labels.
2022-07-25 13:49:00 -04:00
DanStough 2da8949d78 feat: convert destination address to slice 2022-07-25 12:31:58 -04:00
Freddy f03cca7576
[OSS] Add ACL enforcement to peering endpoints (#13878) 2022-07-25 10:04:10 -06:00
Matt Keeler 58e4d8235b
Enable/Disable Peering Support in the UI (#13816)
We enabled/disable based on the config flag.
2022-07-25 11:50:11 -04:00
freddygv b544ce6485 Add ACL enforcement to peering endpoints 2022-07-25 09:34:29 -06:00
Kyle Havlovitz 016f963e7e Remove excess debug log from ingress upstream shutdown 2022-07-22 17:29:38 -07:00
alex 279d458e6e
peering: use ShouldDial to validate peer role (#13823)
Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>
2022-07-22 15:56:25 -07:00
Luke Kysow a1e6d69454
peering: add config to enable/disable peering (#13867)
* peering: add config to enable/disable peering

Add config:

```
peering {
  enabled = true
}
```

Defaults to true. When disabled:
1. All peering RPC endpoints will return an error
2. Leader won't start its peering establishment goroutines
3. Leader won't start its peering deletion goroutines
2022-07-22 15:20:21 -07:00
Kyle Havlovitz 0786517b56
Merge pull request #13847 from hashicorp/gateway-goroutine-leak
Fix goroutine leaks in proxycfg when using ingress gateway
2022-07-22 14:43:22 -07:00
Freddy f99df57840
[OSS] Add new peering ACL rule (#13848)
This commit adds a new ACL rule named "peering" to authorize
actions taken against peering-related endpoints.

The "peering" rule has several key properties:
- It is scoped to a partition, and MUST be defined in the default
  namespace.

- Its access level must be "read', "write", or "deny".

- Granting an access level will apply to all peerings. This ACL rule
  cannot be used to selective grant access to some peerings but not
  others.

- If the peering rule is not specified, we fall back to the "operator"
  rule and then the default ACL rule.
2022-07-22 14:42:23 -06:00
alex 927cee692b
peering: emit exported services count metric (#13811)
Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>
2022-07-22 12:05:08 -07:00
Daniel Upton a8df87f574 proxycfg-glue: server-local implementation of `ExportedPeeredServices`
This is the OSS portion of enterprise PR 2377.

Adds a server-local implementation of the proxycfg.ExportedPeeredServices
interface that sources data from a blocking query against the server's
state store.
2022-07-22 15:23:23 +01:00
Eric Haberkorn 501089292e
Add Cluster Peering Failover Support to Prepared Queries (#13835)
Add peering failover support to prepared queries
2022-07-22 09:14:43 -04:00
Nitya Dhanushkodi f47319b7c6
update generate token endpoint to take external addresses (#13844)
Update generate token endpoint (rpc, http, and api module)

If ServerExternalAddresses are set, it will override any addresses gotten from the "consul" service, and be used in the token instead, and dialed by the dialer. This allows for setting up a load balancer for example, in front of the consul servers.
2022-07-21 14:56:11 -07:00
acpana 12b773ab02
Rename peering internal to ~
sync ENT to 5679392c81

Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>
2022-07-21 10:51:05 -07:00
Luke Kysow 0c87be0845
peering: Add heartbeating to peering streams (#13806)
* Add heartbeating to peering streams
2022-07-21 10:03:27 -07:00
Daniel Upton 3655802fdc proxycfg-glue: server-local implementation of `PeeredUpstreams`
This is the OSS portion of enterprise PR 2352.

It adds a server-local implementation of the proxycfg.PeeredUpstreams interface
based on a blocking query against the server's state store.

It also fixes an omission in the Virtual IP freeing logic where we were never
updating the max index (and therefore blocking queries against
VirtualIPsForAllImportedServices would not return on service deletion).
2022-07-21 13:51:59 +01:00
Luke Kysow c411e6b326
Add send mutex to protect against concurrent sends (#13805) 2022-07-20 15:48:18 -07:00
Kyle Havlovitz 0be7d923dc Cancel upstream watches when the discovery chain has been removed 2022-07-20 14:26:52 -07:00
Kyle Havlovitz 31318d7049 Fix duplicate Notify calls for discovery chains in ingress gateways 2022-07-20 14:25:20 -07:00