Commit Graph

4775 Commits

Author SHA1 Message Date
freddygv d818d7b096 Manage local server watches depending on mesh cfg
Routing peering control plane traffic through mesh gateways can be
enabled or disabled at runtime with the mesh config entry.

This commit updates proxycfg to add or cancel watches for local servers
depending on this central config.

Note that WAN federation over mesh gateways is determined by a service
metadata flag, and any updates to the gateway service registration will
force the creation of a new snapshot. If enabled, WAN-fed over mesh
gateways will trigger a local server watch on initialize().

Because of this we will only add/remove server watches if WAN federation
over mesh gateways is disabled.
2022-09-22 19:32:10 -06:00
Alessandro De Blasis 461b42ed48 fix(check): added missing OSService props 2022-09-21 13:10:21 +01:00
Alessandro De Blasis 5719fd6560 fix(checks): os_service OK message in output 2022-09-21 09:27:33 +01:00
Alessandro De Blasis f440966a38 fix(checks): os_service lifecycle bugfix 2022-09-21 09:26:47 +01:00
Alessandro De Blasis fc0dd92dcf fix(agent): uninitialized map panic error 2022-09-21 09:25:54 +01:00
malizz 1a0aa38a82
increase the size of txn to support vault (#14599)
* increase the size of txn to support vault

* add test, revert change to acl endpoint

* add changelog

* update test, add passing test case

* Update .changelog/14599.txt

Co-authored-by: Freddy <freddygv@users.noreply.github.com>

Co-authored-by: Freddy <freddygv@users.noreply.github.com>
2022-09-19 09:07:19 -07:00
freddygv 5fbb26525b Add awareness of server mode to TLS configurator
Preivously the TLS configurator would default to presenting auto TLS
certificates as client certificates.

Server agents should not have this behavior and should instead present
the manually configured certs. The autoTLS certs for servers are
exclusively used for peering and should not be used as the default for
outbound communication.
2022-09-16 17:57:10 -06:00
freddygv f30bc96239 Test fixes
- Pulls in CLI test fix from main
- Updates psutils to fix TestAgent_Host on M1 Mac
2022-09-16 17:57:10 -06:00
freddygv 02d3ce1039 Add server certificate manager
This certificate manager will request a leaf certificate for server
agents and then keep them up to date.
2022-09-16 17:57:10 -06:00
freddygv 0e5131bd33 Generate ACL token for server management
This commit introduces a new ACL token used for internal server
management purposes.

It has a few key properties:
- It has unlimited permissions.
- It is persisted through Raft as System Metadata rather than in the
ACL tokens table. This is to avoid users seeing or modifying it.
- It is re-generated on leadership establishment.
2022-09-16 17:54:34 -06:00
freddygv 0ea3353537 Add handling in agent cache for server leaf certs 2022-09-16 17:54:34 -06:00
Kyle Havlovitz 0d9ae52643
Merge pull request #14598 from hashicorp/root-removal-fix
connect/ca: Don't discard old roots on primaryInitialize
2022-09-15 14:36:01 -07:00
Kyle Havlovitz 6105a7fd9f connect/ca: don't discard old roots on primaryInitialize 2022-09-15 12:59:09 -07:00
Gabriel Santos e53af28bd7
Middleware: `RequestRecorder` reports calls below 1ms as decimal value (#12905)
* Typos

* Test failing

* Convert values <1ms to decimal

* Fix test

* Update docs and test error msg

* Applied suggested changes to test case

* Changelog file and suggested changes

* Update .changelog/12905.txt

Co-authored-by: Chris S. Kim <kisunji92@gmail.com>

* suggested change - start duration with microseconds instead of nanoseconds

* fix error

* suggested change - floats

Co-authored-by: alex <8968914+acpana@users.noreply.github.com>
Co-authored-by: Chris S. Kim <kisunji92@gmail.com>
2022-09-15 13:04:37 -04:00
Daniel Graña 8c98172f53
[BUGFIX] Do not use interval as timeout (#14619)
Do not use interval as timeout
2022-09-15 12:39:48 -04:00
Evan Culver d0416f593c
connect: Bump latest Envoy to 1.23.1 in test matrix (#14573) 2022-09-14 13:20:16 -07:00
DanStough 485e1b5d4e fix(peering): generate token metrics only for leader 2022-09-14 11:37:30 -04:00
DanStough 2a2debee64 feat(peering): validate server name conflicts on establish 2022-09-14 11:37:30 -04:00
Kyle Havlovitz 60cee76746
Merge pull request #14516 from hashicorp/ca-ttl-fixes
Fix inconsistent TTL behavior in CA providers
2022-09-13 16:07:36 -07:00
Kyle Havlovitz d67bccd210 Update intermediate pki mount/role when reconfiguring Vault provider 2022-09-13 15:42:26 -07:00
Kyle Havlovitz f46955101a connect/ca: Clarify behavior around IntermediateCertTTL in CA config 2022-09-13 15:42:26 -07:00
DanStough 0150e88200 feat: add PeerThroughMeshGateways to mesh config 2022-09-13 17:19:54 -04:00
Derek Menteer 0aa13733a0
Add CSR check for number of URIs. (#14579)
Add CSR check for number of URIs.
2022-09-13 14:21:47 -05:00
Derek Menteer db83ff4fa6 Add input validation for auto-config JWT authorization checks. 2022-09-13 11:16:36 -05:00
cskh f22685b969
Config-entry: Support proxy config in service-defaults (#14395)
* Config-entry: Support proxy config in service-defaults

* Update website/content/docs/connect/config-entries/service-defaults.mdx

Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
2022-09-12 10:41:58 -04:00
Eric Haberkorn aa8268e50c
Implement Cluster Peering Redirects (#14445)
implement cluster peering redirects
2022-09-09 13:58:28 -04:00
skpratt b761589340
add non-double-prefixed metrics (#14193) 2022-09-09 12:13:43 -05:00
skpratt 19f79aa9a6
PR #14057 follow up fix: service id parsing from sidecar id (#14541)
* fix service id parsing from sidecar id

* simplify suffix trimming
2022-09-09 09:47:10 -05:00
Dan Upton 1c2c975b0b
xDS Load Balancing (#14397)
Prior to #13244, connect proxies and gateways could only be configured by an
xDS session served by the local client agent.

In an upcoming release, it will be possible to deploy a Consul service mesh
without client agents. In this model, xDS sessions will be handled by the
servers themselves, which necessitates load-balancing to prevent a single
server from receiving a disproportionate amount of load and becoming
overwhelmed.

This introduces a simple form of load-balancing where Consul will attempt to
achieve an even spread of load (xDS sessions) between all healthy servers.
It does so by implementing a concurrent session limiter (limiter.SessionLimiter)
and adjusting the limit according to autopilot state and proxy service
registrations in the catalog.

If a server is already over capacity (i.e. the session limit is lowered),
Consul will begin draining sessions to rebalance the load. This will result
in the client receiving a `RESOURCE_EXHAUSTED` status code. It is the client's
responsibility to observe this response and reconnect to a different server.

Users of the gRPC client connection brokered by the
consul-server-connection-manager library will get this for free.

The rate at which Consul will drain sessions to rebalance load is scaled
dynamically based on the number of proxies in the catalog.
2022-09-09 15:02:01 +01:00
Derek Menteer f7c884f0af Merge branch 'main' of github.com:hashicorp/consul into derekm/split-grpc-ports 2022-09-08 14:53:08 -05:00
Derek Menteer bfe7c5e8af Remove rebuilding grpc server. 2022-09-08 13:45:44 -05:00
Derek Menteer 80d31458e5 Various cleanups. 2022-09-08 10:51:50 -05:00
Chris S. Kim 03df6c3ac6
Reuse http.DefaultTransport in UIMetricsProxy (#14521)
http.Transport keeps a pool of connections and should be reused when possible. We instantiate a new http.DefaultTransport for every metrics request, making large numbers of concurrent requests inefficiently spin up new connections instead of reusing open ones.
2022-09-08 11:02:05 -04:00
Chris S. Kim 1c4a6eef4f
Merge pull request #14285 from hashicorp/NET-638-push-server-address-updates-to-the-peer
peering: Subscribe to server address changes and push updates to peers
2022-09-07 09:30:45 -04:00
skpratt 3bf1edfb3f
move port and default check logic to locked step (#14057) 2022-09-06 19:35:31 -05:00
Freddy f4dfd42e0a
Add SpiffeID for Consul server agents (#14485)
Co-authored-by: Eric Haberkorn <erichaberkorn@gmail.com>

By adding a SpiffeID for server agents, servers can now request a leaf
certificate from the Connect CA.

This new Spiffe ID has a key property: servers are identified by their
datacenter name and trust domain. All servers that share these
attributes will share a ServerURI.

The aim is to use these certificates to verify the server name of ANY
server in a Consul datacenter.
2022-09-06 17:58:13 -06:00
Daniel Upton 8c46e48e0d proxycfg-glue: server-local implementation of IntentionUpstreamsDestination
This is the OSS portion of enterprise PR 2463.

Generalises the serverIntentionUpstreams type to support matching on a
service or destination.
2022-09-06 23:27:25 +01:00
Daniel Upton f8dba7e9ac proxycfg-glue: server-local implementation of InternalServiceDump
This is the OSS portion of enterprise PR 2489.

This PR introduces a server-local implementation of the
proxycfg.InternalServiceDump interface that sources data from a blocking query
against the server's state store.

For simplicity, it only implements the subset of the Internal.ServiceDump RPC
handler actually used by proxycfg - as such the result type has been changed
to IndexedCheckServiceNodes to avoid confusion.
2022-09-06 23:27:25 +01:00
Daniel Upton a31738f76f proxycfg-glue: server-local implementation of ResolvedServiceConfig
This is the OSS portion of enterprise PR 2460.

Introduces a server-local implementation of the proxycfg.ResolvedServiceConfig
interface that sources data from a blocking query against the server's state
store.

It moves the service config resolution logic into the agent/configentry package
so that it can be used in both the RPC handler and data source.

I've also done a little re-arranging and adding comments to call out data
sources for which there is to be no server-local equivalent.
2022-09-06 23:27:25 +01:00
Derek Menteer bf769daae4 Merge branch 'main' of github.com:hashicorp/consul into derekm/split-grpc-ports 2022-09-06 10:51:04 -05:00
Derek Menteer 02ae66bda8 Add kv txn get-not-exists operation. 2022-09-06 10:28:59 -05:00
Chris S. Kim 953808e899 PR feedback on terminated state checking 2022-09-06 10:28:20 -04:00
Chris S. Kim ddb9375cb6 Add testcase for parsing grpc_port 2022-09-06 10:17:44 -04:00
Kyle Havlovitz d97ccccdd5
Merge pull request #14429 from hashicorp/ca-prune-intermediates
Prune old expired intermediate certs when appending a new one
2022-09-02 15:34:33 -07:00
cskh 0f7d4efac3
fix(txn api): missing proxy config in registering proxy service (#14471)
* fix(txn api): missing proxy config in registering proxy service
2022-09-02 14:28:05 -04:00
Chris S. Kim ec36755cc0 Properly assert for ServerAddresses replication request 2022-09-02 11:44:54 -04:00
Chris S. Kim d1d9dbff8e Fix terminate not returning early 2022-09-02 11:44:38 -04:00
Derek Menteer f64771c707 Address PR comments. 2022-09-01 16:54:24 -05:00
Kyle Havlovitz 0c2fb7252d Prune intermediates before appending new one 2022-09-01 14:24:30 -07:00
Luke Kysow 81d7cc41dc
Use proxy address for default check (#14433)
When a sidecar proxy is registered, a check is automatically added.
Previously, the address this check used was the underlying service's
address instead of the proxy's address, even though the check is testing
if the proxy is up.

This worked in most cases because the proxy ran on the same IP as the
underlying service but it's not guaranteed and so the proper default
address should be the proxy's address.
2022-09-01 14:03:35 -07:00
malizz f1054dada9
fix TestProxyConfigEntry (#14435) 2022-09-01 11:37:47 -07:00
malizz b3ac8f48ca
Add additional parameters to envoy passive health check config (#14238)
* draft commit

* add changelog, update test

* remove extra param

* fix test

* update type to account for nil value

* add test for custom passive health check

* update comments and tests

* update description in docs

* fix missing commas
2022-09-01 09:59:11 -07:00
Chris S. Kim f2b147e575 Add Internal.ServiceDump support for querying by PeerName 2022-09-01 10:32:59 -04:00
Chris S. Kim e62f830fa8
Merge pull request #13998 from jorgemarey/f-new-tracing-envoy
Add new envoy tracing configuration
2022-09-01 08:57:23 -04:00
Derek Menteer cf7f24a6ec Change serf-tag references to field references. 2022-08-31 16:38:42 -05:00
malizz a80e0bcd00
validate args before deleting proxy defaults (#14290)
* validate args before deleting proxy defaults

* add changelog

* validate name when normalizing proxy defaults

* add test for proxyConfigEntry

* add comments
2022-08-31 13:03:38 -07:00
Kyle Havlovitz 113454645d Prune old expired intermediate certs when appending a new one 2022-08-31 11:41:58 -07:00
Alessandro De Blasis 60c7c831c6 Merge remote-tracking branch 'hashicorp/main' into feature/health-checks_windows_service 2022-08-30 18:49:20 +01:00
Eric Haberkorn 3726a0ab7a
Finish up cluster peering failover (#14396) 2022-08-30 11:46:34 -04:00
Chris S. Kim 560d410c6d Merge branch 'main' into NET-638-push-server-address-updates-to-the-peer
# Conflicts:
#	agent/grpc-external/services/peerstream/stream_test.go
2022-08-30 11:09:25 -04:00
Jorge Marey 3f3bb8831e Fix typos. Add test. Add documentation 2022-08-30 16:59:02 +02:00
Jorge Marey ed7b34128f Add new tracing configuration 2022-08-30 16:59:02 +02:00
Freddy 97d1db759f
Merge pull request #13496 from maxb/fix-kv_entries-metric 2022-08-29 15:35:11 -06:00
Freddy 829a2a8722
Merge pull request #14364 from hashicorp/peering/term-delete 2022-08-29 15:33:18 -06:00
Max Bowsher decc9231ee Merge branch 'main' into fix-kv_entries-metric 2022-08-29 22:22:10 +01:00
Chris S. Kim 5010fa5c03
Merge pull request #14371 from hashicorp/kisunji/peering-metrics-update
Adjust metrics reporting for peering tracker
2022-08-29 17:16:19 -04:00
Chris S. Kim 74ddf040dd Add heartbeat timeout grace period when accounting for peering health 2022-08-29 16:32:26 -04:00
Derek Menteer 0ceec9017b Expose `grpc_tls` via serf for cluster peering. 2022-08-29 13:43:49 -05:00
Derek Menteer 1255a8a20d Add separate grpc_tls port.
To ease the transition for users, the original gRPC
port can still operate in a deprecated mode as either
plain-text or TLS mode. This behavior should be removed
in a future release whenever we no longer support this.

The resulting behavior from this commit is:
  `ports.grpc > 0 && ports.grpc_tls > 0` spawns both plain-text and tls ports.
  `ports.grpc > 0 && grpc.tls == undefined` spawns a single plain-text port.
  `ports.grpc > 0 && grpc.tls != undefined` spawns a single tls port (backwards compat mode).
2022-08-29 13:43:43 -05:00
freddygv 310608fb19 Add validation to prevent switching dialing mode
This prevents unexpected changes to the output of ShouldDial, which
should never change unless a peering is deleted and recreated.
2022-08-29 12:31:13 -06:00
Eric Haberkorn 72f90754ae
Update max_ejection_percent on outlier detection for peered clusters to 100% (#14373)
We can't trust health checks on peered services when service resolvers,
splitters and routers are used.
2022-08-29 13:46:41 -04:00
Alessandro De Blasis 26cc56bc68 fix(agent): removed redundant code in docker check as well 2022-08-29 18:15:59 +01:00
Alessandro De Blasis c0d647d11e fix(agent): removed redundant check on prev. running check 2022-08-29 17:53:39 +01:00
Chris S. Kim def529edd3 Rename test 2022-08-29 10:34:50 -04:00
Chris S. Kim 93271f649c Fix test 2022-08-29 10:20:30 -04:00
Eric Haberkorn 1099665473
Update the structs and discovery chain for service resolver redirects to cluster peers. (#14366) 2022-08-29 09:51:32 -04:00
Alessandro De Blasis f3437eaf05 Merge remote-tracking branch 'hashicorp/main' into feature/health-checks_windows_service
Signed-off-by: Alessandro De Blasis <alex@deblasis.net>
2022-08-28 18:09:31 +01:00
Alessandro De Blasis f634e36811 fix(OSServiceCheck): fixes following code-review 2022-08-28 17:56:30 +01:00
Chris S. Kim 4d97e2f936 Adjust metrics reporting for peering tracker 2022-08-26 17:34:17 -04:00
freddygv 650e48624d Allow terminated peerings to be deleted
Peerings are terminated when a peer decides to delete the peering from
their end. Deleting a peering sends a termination message to the peer
and triggers them to mark the peering as terminated but does NOT delete
the peering itself. This is to prevent peerings from disappearing from
both sides just because one side deleted them.

Previously the Delete endpoint was skipping the deletion if the peering
was not marked as active. However, terminated peerings are also
inactive.

This PR makes some updates so that peerings marked as terminated can be
deleted by users.
2022-08-26 10:52:47 -06:00
Chris S. Kim 937a8ec742 Fix casing 2022-08-26 11:56:26 -04:00
Chris S. Kim 87962b9713 Merge branch 'main' into catalog-service-list-filter 2022-08-26 11:16:06 -04:00
Chris S. Kim e2fe8b8d65 Fix tests for enterprise 2022-08-26 11:14:02 -04:00
Chris S. Kim 1c43a1a7b4 Merge branch 'main' into NET-638-push-server-address-updates-to-the-peer
# Conflicts:
#	agent/grpc-external/services/peerstream/stream_test.go
2022-08-26 10:43:56 -04:00
Chris S. Kim 6ddcc04613
Replace ring buffer with async version (#14314)
We need to watch for changes to peerings and update the server addresses which get served by the ring buffer.

Also, if there is an active connection for a peer, we are getting up-to-date server addresses from the replication stream and can safely ignore the token's addresses which may be stale.
2022-08-26 10:27:13 -04:00
alex 30ff2e9a35
peering: add peer health metric (#14004)
Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>
2022-08-25 16:32:59 -07:00
Chris S. Kim 181063cd23 Exit loop when context is cancelled 2022-08-25 11:48:25 -04:00
cskh 41aea65214
Fix: the inboundconnection limit filter should be placed in front of http co… (#14325)
* fix: the inboundconnection limit should be placed in front of http connection manager

Co-authored-by: Freddy <freddygv@users.noreply.github.com>
2022-08-24 14:13:10 -04:00
Chris S. Kim 8c94d1a80c Update test comment 2022-08-24 13:50:24 -04:00
Chris S. Kim 5f2959329f Add check for zero-length server addresses 2022-08-24 13:30:52 -04:00
skpratt 919da33331
no-op: refactor usagemetrics tests for clarity and DRY cases (#14313) 2022-08-24 12:00:09 -05:00
Pablo Ruiz García 1f293e5244
Added new auto_encrypt.grpc_server_tls config option to control AutoTLS enabling of GRPC Server's TLS usage
Fix for #14253

Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>
2022-08-24 12:31:38 -04:00
Dan Upton 3b993f2da7
dataplane: update envoy bootstrap params for consul-dataplane (#14017)
Contains 2 changes to the GetEnvoyBootstrapParams response to support
consul-dataplane.

Exposing node_name and node_id:

consul-dataplane will support providing either the node_id or node_name in its
configuration. Unfortunately, supporting both in the xDS meta adds a fair amount
of complexity (partly because most tables are currently indexed on node_name)
so for now we're going to return them both from the bootstrap params endpoint,
allowing consul-dataplane to exchange a node_id for a node_name (which it will
supply in the xDS meta).

Properly setting service for gateways:

To avoid the need to special case gateways in consul-dataplane, service will now
either be the destination service name for connect proxies, or the gateway
service name. This means it can be used as-is in Envoy configuration (i.e. as a
cluster name or in metric tags).
2022-08-24 12:03:15 +01:00
Daniel Upton 13c04a13af proxycfg: terminate stream on irrecoverable errors
This is the OSS portion of enterprise PR 2339.

It improves our handling of "irrecoverable" errors in proxycfg data sources.

The canonical example of this is what happens when the ACL token presented by
Envoy is deleted/revoked. Previously, the stream would get "stuck" until the
xDS server re-checked the token (after 5 minutes) and terminated the stream.

Materializers would also sit burning resources retrying something that could
never succeed.

Now, it is possible for data sources to mark errors as "terminal" which causes
the xDS stream to be closed immediately. Similarly, the submatview.Store will
evict materializers when it observes they have encountered such an error.
2022-08-23 20:17:49 +01:00
Chris S. Kim 81e965479b PR feedback to specify Node name in test mock 2022-08-23 11:51:04 -04:00
Eric Haberkorn 58901ad7df
Cluster peering failover disco chain changes (#14296) 2022-08-23 09:13:43 -04:00
Chris S. Kim cdc8b0634d Fix flakes 2022-08-22 14:45:31 -04:00
Chris S. Kim 03e92826aa Increase heartbeat rate to reduce test flakes 2022-08-22 14:24:05 -04:00
Chris S. Kim 06ba9775ee Remove check for ResponseNonce 2022-08-22 13:55:01 -04:00
Chris S. Kim 547fb9570e Add missing mock assertions 2022-08-22 13:55:01 -04:00
Chris S. Kim adff2eef16 Fix data race
newMockSnapshotHandler has an assertion on t.Cleanup which gets called before the event publisher is cancelled. This commit reorders the context.WithCancel so it properly gets cancelled before the assertion is made.
2022-08-22 13:55:01 -04:00
cskh 060531a29a
Fix: add missing ent meta for test (#14289) 2022-08-22 13:51:04 -04:00
Chris S. Kim 4e40e1d222 Handle server addresses update as client 2022-08-22 13:42:12 -04:00
Chris S. Kim 584d3409c4 Send server addresses on update from server 2022-08-22 13:41:44 -04:00
Chris S. Kim c9d8ad3939 Add new subscription for server addresses 2022-08-22 13:40:25 -04:00
Chris S. Kim 028b87d51f Cleanup unused logger 2022-08-22 13:40:23 -04:00
Chris S. Kim df951bd601 Expose external gRPC port in autopilot
The grpc_port was added to a NodeService's meta in ea58f235f5
2022-08-22 10:07:00 -04:00
cskh 527ebd068a
fix: missing MaxInboundConnections field in service-defaults config entry (#14072)
* fix:  missing max_inbound_connections field in merge config
2022-08-19 14:11:21 -04:00
cskh e84e4b8868
Fix: upgrade pkg imdario/merg to prevent merge config panic (#14237)
* upgrade imdario/merg to prevent merge config panic

* test: service definition takes precedence over service-defaults in merged results
2022-08-17 21:14:04 -04:00
James Hartig f92883bbce Use the maximum jitter when calculating the timeout
The timeout should include the maximum possible
jitter since the server will randomly add to it's
timeout a jitter. If the server's timeout is less
than the client's timeout then the client will
return an i/o deadline reached error.

Before:
```
time curl 'http://localhost:8500/v1/catalog/service/service?dc=other-dc&stale=&wait=600s&index=15820644'
rpc error making call: i/o deadline reached
real    10m11.469s
user    0m0.018s
sys     0m0.023s
```

After:
```
time curl 'http://localhost:8500/v1/catalog/service/service?dc=other-dc&stale=&wait=600s&index=15820644'
[...]
real    10m35.835s
user    0m0.021s
sys     0m0.021s
```
2022-08-17 10:24:09 -04:00
Eric Haberkorn 1a73b0ca20
Add `Targets` field to service resolver failovers. (#14162)
This field will be used for cluster peering failover.
2022-08-15 09:20:25 -04:00
Alessandro De Blasis 5dee555888 Merge remote-tracking branch 'hashicorp/main' into feature/health-checks_windows_service
Signed-off-by: Alessandro De Blasis <alex@deblasis.net>
2022-08-15 08:26:55 +01:00
Alessandro De Blasis ab611eabc3 Merge remote-tracking branch 'hashicorp/main' into feature/health-checks_windows_service
Signed-off-by: Alessandro De Blasis <alex@deblasis.net>
2022-08-15 08:09:56 +01:00
cskh d46b515b64
fix: missing segment and partition (#14194) 2022-08-12 15:21:39 -04:00
Eric Haberkorn ebd5513d4b
Refactor failover code to use Envoy's aggregate clusters (#14178) 2022-08-12 14:30:46 -04:00
cskh 81931e52c3
feat(telemetry): add labels to serf and memberlist metrics (#14161)
* feat(telemetry): add labels to serf and memberlist metrics
* changelog
* doc update

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
2022-08-11 22:09:56 -04:00
Chris S. Kim 4c928cb2f7
Handle breaking change for ServiceVirtualIP restore (#14149)
Consul 1.13.0 changed ServiceVirtualIP to use PeeredServiceName instead of ServiceName which was a breaking change for those using service mesh and wanted to restore their snapshot after upgrading to 1.13.0.

This commit handles existing data with older ServiceName and converts it during restore so that there are no issues when restoring from older snapshots.
2022-08-11 14:47:10 -04:00
Chris S. Kim 3926009405 Add test to verify forwarding 2022-08-11 11:16:02 -04:00
Chris S. Kim 1ef22360c3 Register peerStreamServer internally to enable RPC forwarding 2022-08-11 11:16:02 -04:00
Chris S. Kim de73171202 Handle wrapped errors in isFailedPreconditionErr 2022-08-11 11:16:02 -04:00
Daniel Kimsey 3c4fa9b468 Add support for filtering the 'List Services' API
1. Create a bexpr filter for performing the filtering
2. Change the state store functions to return the raw (not aggregated)
   list of ServiceNodes.
3. Move the aggregate service tags by name logic out of the state store
   functions into a new function called from the RPC endpoint
4. Perform the filtering in the endpoint before aggregation.
2022-08-10 16:52:32 -05:00
cskh 11e7a0d547
fix: shadowed err in retryJoin() (#14112)
- err value will be used later to surface the error message
  if r.join() returns any err.
2022-08-10 10:53:57 -04:00
skpratt 79c23a7cd2
Merge pull request #14056 from hashicorp/proxy-register-port-race
Refactor sidecar_service method to separate port assignment
2022-08-10 09:46:29 -05:00
skpratt aa77559819 Merge branch 'main' into proxy-register-port-race 2022-08-10 08:40:45 -05:00
Chris S. Kim e3046120b3 Close active listeners on error
If startListeners successfully created listeners for some of its input addresses but eventually failed, the function would return an error and existing listeners would not be cleaned up.
2022-08-09 12:22:39 -04:00
Chris S. Kim 6311c651de Add retry in TestAgentConnectCALeafCert_good 2022-08-09 11:20:37 -04:00
Kyle Havlovitz 6938b8c755
Merge pull request #13958 from hashicorp/gateway-wildcard-fix
Fix wildcard picking up services it shouldn't for ingress/terminating gateways
2022-08-08 12:54:40 -07:00
Kyle Havlovitz fe1fcea34f Add some extra handling for destination deletes 2022-08-08 11:38:13 -07:00
freddygv d421e18172 Update snapshot test 2022-08-08 09:17:15 -06:00
freddygv 1031ffc3c7 Re-validate existing secrets at state store
Previously establishment and pending secrets were only checked at the
RPC layer. However, given that these are Check-and-Set transactions we
should ensure that the given secrets are still valid when persisting a
secret exchange or promotion.

Otherwise it would be possible for concurrent requests to overwrite each
other.
2022-08-08 09:06:07 -06:00
freddygv 0ea4bfae94 Test fixes 2022-08-08 08:31:47 -06:00
freddygv c04515a844 Use proto message for each secrets write op
Previously there was a field indicating the operation that triggered a
secrets write. Now there is a message for each operation and it contains
the secret ID being persisted.
2022-08-08 01:41:00 -06:00
Kyle Havlovitz 6580566c3b Update ingress/terminating wildcard logic and handle destinations 2022-08-05 07:56:10 -07:00
freddygv 8067890787 Inherit active secret when exchanging 2022-08-03 17:32:53 -05:00
freddygv 60d6e28c97 Pass explicit signal with op for secrets write
Previously the updates to the peering secrets UUID table relied on
inferring what action triggered the update based on a reconciliation
against the existing secrets.

Instead we now explicitly require the operation to be given so that the
inference isn't necessary. This makes the UUID table logic easier to
reason about and fixes some related bugs.

There is also an update so that the peering secrets get handled on
snapshots/restores.
2022-08-03 17:25:12 -05:00
freddygv 9ca687bc7c Avoid deleting peering secret UUIDs at dialers
Dialers do not keep track of peering secret UUIDs, so they should not
attempt to clean up data from that table when their peering is deleted.

We also now keep peer server addresses when marking peerings for
deletion. Peer server addresses are used by the ShouldDial() helper
when determining whether the peering is for a dialer or an acceptor.
We need to keep this data so that peering secrets can be cleaned up
accordingly.
2022-08-03 16:34:57 -05:00
skpratt 58eed6b049
Merge pull request #13906 from skpratt/validate-port-agent-split
Separate port and socket path validation for local agent
2022-08-02 16:58:41 -05:00
Dhia Ayachi 7154367892
add token to the request when creating a cacheIntentions query (#14005) 2022-08-02 14:27:34 -04:00
Kyle Havlovitz 499211f907 Fix wildcard picking up services it shouldn't for ingress/terminating gateways 2022-08-02 09:41:31 -07:00
Daniel Upton 6452118c15 proxycfg-sources: fix hot loop when service not found in catalog
Fixes a bug where a service getting deleted from the catalog would cause
the ConfigSource to spin in a hot loop attempting to look up the service.

This is because we were returning a nil WatchSet which would always
unblock the select.

Kudos to @freddygv for discovering this!
2022-08-02 15:42:29 +01:00
Freddy 42996411cc
Various peering fixes (#13979)
* Avoid logging StreamSecretID
* Wrap additional errors in stream handler
* Fix flakiness in leader test and rename servers for clarity. There was
  a race condition where the peering was being deleted in the test
  before the stream was active. Now the test waits for the stream to be
  connected on both sides before deleting the associated peering.
* Run flaky test serially
2022-08-01 15:06:18 -06:00
DanStough 169ff71132 fix: ipv4 destination dns resolution 2022-08-01 16:45:57 -04:00
Luke Kysow 988e1fd35d
peering: default to false (#13963)
* defaulting to false because peering will be released as beta
* Ignore peering disabled error in bundles cachetype

Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>
Co-authored-by: freddygv <freddy@hashicorp.com>
Co-authored-by: Matt Keeler <mjkeeler7@gmail.com>
2022-08-01 15:22:36 -04:00
Freddy dacf703d20
Merge branch 'main' into fix-kv_entries-metric 2022-08-01 13:19:27 -06:00
Freddy 72b6d69652
Merge pull request #13499 from maxb/delete-unused-metric
Delete definition of metric `consul.acl.blocked.node.deregistration`
2022-08-01 12:31:05 -06:00
Dhia Ayachi 6fd65a4a45
Tgtwy egress HTTP support (#13953)
* add golden files

* add support to http in tgateway egress destination

* fix slice sorting to include both address and port when using server_names

* fix listener loop for http destination

* fix routes to generate a route per port and a virtualhost per port-address combination

* sort virtual hosts list to have a stable order

* extract redundant serviceNode
2022-08-01 14:12:43 -04:00
Matt Keeler f74d0cef7a
Implement/Utilize secrets for Peering Replication Stream (#13977) 2022-08-01 10:33:18 -04:00
alex a45bb1f06b
block PeerName register requests (#13887)
Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>
2022-07-29 14:36:22 -07:00
Luke Kysow 95096e2c03
peering: retry establishing connection more quickly on certain errors (#13938)
When we receive a FailedPrecondition error, retry that more quickly
because we expect it will resolve shortly. This is particularly
important in the context of Consul servers behind a load balancer
because when establishing a connection we have to retry until we
randomly land on a leader node.

The default retry backoff goes from 2s, 4s, 8s, etc. which can result in
very long delays quite quickly. Instead, this backoff retries in 8ms
five times, then goes exponentially from there: 16ms, 32ms, ... up to a
max of 8152ms.
2022-07-29 13:04:32 -07:00
Sarah Pratt 10a4999a87 Separate port and socket path requirement in case of local agent assignment 2022-07-29 13:28:21 -05:00
alex 92c615c35f
Merge pull request #13952 from hashicorp/sync-more-acl
sync more acl enforcement
2022-07-28 12:31:02 -07:00
Dhia Ayachi 256694b603
inject gateway addons to destination clusters (#13951) 2022-07-28 15:17:35 -04:00
acpana eae4e71492
sync more acl enforcement
sync w ent at 32756f7

Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>
2022-07-28 12:01:52 -07:00
alex 41f3343eac
Merge pull request #13929 from hashicorp/fix-validation
[sync] fix empty partitions matching
2022-07-28 10:14:49 -07:00
Sarah Pratt a3ef6f016e refactor sidecare_service method into parts 2022-07-28 09:07:13 -05:00
Ashwin Venkatesh eef9edaed9
Add peer counts to emitted metrics. (#13930) 2022-07-27 18:34:04 -04:00
Luke Kysow 465a9801e1
Merge pull request #13924 from hashicorp/lkysow/util-metric-peering
peering: don't track imported services/nodes in usage
2022-07-27 14:49:55 -07:00
acpana 6033584349
use EqualPartitions
Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>
2022-07-27 14:48:30 -07:00
acpana 0351ca5136
better fix
Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>
2022-07-27 14:28:08 -07:00
acpana 8b2ef80336
sync w ent
Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>
2022-07-27 11:41:39 -07:00
Chris S. Kim 0999e05a7d Reduce arm64 flakes for TestConnectCA_ConfigurationSet_ChangeKeyConfig_Primary
There were 16 combinations of tests but 4 of them were duplicates since the default key type and bits were "ec" and 256. That entry was commented out to reduce the subtest count to 12.

testrpc.WaitForLeader was failing on arm64 environments; the cause is unknown but it might be due to the environment being flooded with parallel tests making RPC calls. The RPC polling+retry was replaced with a simpler check for leadership based on raft.
2022-07-27 13:54:34 -04:00
Chris S. Kim 8ead1caf53 Retry checks for virtual IP metadata 2022-07-27 13:54:34 -04:00
Chris S. Kim 62ed0250c3 Sort slice of ServiceNames deterministically 2022-07-27 13:54:34 -04:00
Sarah Pratt f520f6dd0f Separate port and socket path requirement in case of local agent assignment 2022-07-27 12:30:52 -05:00
Luke Kysow 740d54e730 peering: don't track imported services/nodes in usage
Services/nodes that are imported from other peers are stored in
state. We don't want to count those as part of our own cluster's usage.
2022-07-27 09:08:51 -07:00
cskh 4e292b7b72
chore: clarify the error message: service.service must not be empty (#13907)
- when register service using catalog endpoint, the key of service
  name actually should be "service". Add this information to the
  error message will help user to quickly fix in the request.
2022-07-27 10:16:46 -04:00
cskh 59e81a728e
chore: removed unused method AddService (#13905)
- This AddService is not used anywhere.
  AddServiceWithChecks is place of AddService
- Test code is updated
2022-07-26 16:54:53 -04:00
Luke Kysow 021b00e321 Remove duplicate comment 2022-07-26 10:19:49 -07:00
alex 437a28d18a
peering: prevent peering in same partition (#13851)
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
2022-07-25 18:00:48 -07:00
Nitya Dhanushkodi 27bd895ac8
peering: remove validation that forces peering token server addresses to be an IP, allow hostname based addresses (#13874) 2022-07-25 16:33:47 -07:00
Luke Kysow 8c5b70d227
Rename receive to recv in tracker (#13896)
Because it's shorter
2022-07-25 16:08:03 -07:00
Luke Kysow 3530d3782d
peering: read endpoints can now return failing status (#13849)
Track streams that have been disconnected due to an error and
set their statuses to failing.
2022-07-25 14:27:53 -07:00
Kyle Havlovitz 93de25f87c
Merge pull request #13872 from hashicorp/remove-upstream-log
Remove extra logging from ingress upstream watch shutdown
2022-07-25 12:55:30 -07:00
Chris S. Kim 73a84f256f
Preserve PeeringState on upsert (#13666)
Fixes a bug where if the generate token is called twice, the second call upserts the zero-value (undefined) of PeeringState.
2022-07-25 14:37:56 -04:00
Chris S. Kim 8ed49ea4d0
Update envoy metrics label extraction for peered clusters and listeners (#13818)
Now that peered upstreams can generate envoy resources (#13758), we need a way to disambiguate local from peered resources in our metrics. The key difference is that datacenter and partition will be replaced with peer, since in the context of peered resources partition is ambiguous (could refer to the partition in a remote cluster or one that exists locally). The partition and datacenter of the proxy will always be that of the source service.

Regexes were updated to make emitting datacenter and partition labels mutually exclusive with peer labels.

Listener filter names were updated to better match the existing regex.

Cluster names assigned to peered upstreams were updated to be synthesized from local peer name (it previously used the externally provided primary SNI, which contained the peer name from the other side of the peering). Integration tests were updated to assert for the new peer labels.
2022-07-25 13:49:00 -04:00
DanStough 2da8949d78 feat: convert destination address to slice 2022-07-25 12:31:58 -04:00
Freddy f03cca7576
[OSS] Add ACL enforcement to peering endpoints (#13878) 2022-07-25 10:04:10 -06:00
Matt Keeler 58e4d8235b
Enable/Disable Peering Support in the UI (#13816)
We enabled/disable based on the config flag.
2022-07-25 11:50:11 -04:00
freddygv b544ce6485 Add ACL enforcement to peering endpoints 2022-07-25 09:34:29 -06:00
Kyle Havlovitz 016f963e7e Remove excess debug log from ingress upstream shutdown 2022-07-22 17:29:38 -07:00
alex 279d458e6e
peering: use ShouldDial to validate peer role (#13823)
Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>
2022-07-22 15:56:25 -07:00
Luke Kysow a1e6d69454
peering: add config to enable/disable peering (#13867)
* peering: add config to enable/disable peering

Add config:

```
peering {
  enabled = true
}
```

Defaults to true. When disabled:
1. All peering RPC endpoints will return an error
2. Leader won't start its peering establishment goroutines
3. Leader won't start its peering deletion goroutines
2022-07-22 15:20:21 -07:00
Kyle Havlovitz 0786517b56
Merge pull request #13847 from hashicorp/gateway-goroutine-leak
Fix goroutine leaks in proxycfg when using ingress gateway
2022-07-22 14:43:22 -07:00
Freddy f99df57840
[OSS] Add new peering ACL rule (#13848)
This commit adds a new ACL rule named "peering" to authorize
actions taken against peering-related endpoints.

The "peering" rule has several key properties:
- It is scoped to a partition, and MUST be defined in the default
  namespace.

- Its access level must be "read', "write", or "deny".

- Granting an access level will apply to all peerings. This ACL rule
  cannot be used to selective grant access to some peerings but not
  others.

- If the peering rule is not specified, we fall back to the "operator"
  rule and then the default ACL rule.
2022-07-22 14:42:23 -06:00
alex 927cee692b
peering: emit exported services count metric (#13811)
Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>
2022-07-22 12:05:08 -07:00
Daniel Upton a8df87f574 proxycfg-glue: server-local implementation of `ExportedPeeredServices`
This is the OSS portion of enterprise PR 2377.

Adds a server-local implementation of the proxycfg.ExportedPeeredServices
interface that sources data from a blocking query against the server's
state store.
2022-07-22 15:23:23 +01:00
Eric Haberkorn 501089292e
Add Cluster Peering Failover Support to Prepared Queries (#13835)
Add peering failover support to prepared queries
2022-07-22 09:14:43 -04:00
Nitya Dhanushkodi f47319b7c6
update generate token endpoint to take external addresses (#13844)
Update generate token endpoint (rpc, http, and api module)

If ServerExternalAddresses are set, it will override any addresses gotten from the "consul" service, and be used in the token instead, and dialed by the dialer. This allows for setting up a load balancer for example, in front of the consul servers.
2022-07-21 14:56:11 -07:00
acpana 12b773ab02
Rename peering internal to ~
sync ENT to 5679392c81

Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>
2022-07-21 10:51:05 -07:00
Luke Kysow 0c87be0845
peering: Add heartbeating to peering streams (#13806)
* Add heartbeating to peering streams
2022-07-21 10:03:27 -07:00
Daniel Upton 3655802fdc proxycfg-glue: server-local implementation of `PeeredUpstreams`
This is the OSS portion of enterprise PR 2352.

It adds a server-local implementation of the proxycfg.PeeredUpstreams interface
based on a blocking query against the server's state store.

It also fixes an omission in the Virtual IP freeing logic where we were never
updating the max index (and therefore blocking queries against
VirtualIPsForAllImportedServices would not return on service deletion).
2022-07-21 13:51:59 +01:00
Luke Kysow c411e6b326
Add send mutex to protect against concurrent sends (#13805) 2022-07-20 15:48:18 -07:00
Kyle Havlovitz 0be7d923dc Cancel upstream watches when the discovery chain has been removed 2022-07-20 14:26:52 -07:00
Kyle Havlovitz 31318d7049 Fix duplicate Notify calls for discovery chains in ingress gateways 2022-07-20 14:25:20 -07:00
Evan Culver 4116537b83
connect: Add support for Envoy 1.23, remove 1.19 (#13807) 2022-07-19 14:51:04 -07:00
Paul Glass 77afe0e76e
Extract AWS auth implementation out of Consul (#13760) 2022-07-19 16:26:44 -05:00
Chris S. Kim 495936300e
Make envoy resources for inferred peered upstreams (#13758)
Peered upstreams has a separate loop in xds from discovery chain upstreams. This PR adds similar but slightly modified code to add filters for peered upstream listeners, clusters, and endpoints in the case of transparent proxy.
2022-07-19 14:56:28 -04:00
alex de5a991d8c
peering: refactor reconcile, cleanup (#13795)
Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>
2022-07-19 11:43:29 -07:00
Luke Kysow e8d965e56f
peerstream: set keepalive enforcement to 15s (#13796)
The client is set to send keepalive pings every 30s. The server
keepalive enforcement must be set to a number less than that,
otherwise it will disconnect clients for sending pings too often.
MinTime governs the minimum amount of time between pings.
2022-07-18 16:12:03 -07:00
alex a9ae2ff4fa
peering: track exported services (#13784)
Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>
2022-07-18 10:20:04 -07:00