20263 Commits

Author SHA1 Message Date
Eric Haberkorn
8bb16567cd
sidecar-proxy refactor (#17328) 2023-05-12 16:49:42 -04:00
cskh
2edfda998a
consul-container: mitigate the drift from ent repo (#17323) 2023-05-12 13:03:30 -04:00
Chris Thain
b9102c295d
Add Network Filter Support for Envoy Extensions (#17325) 2023-05-12 09:52:50 -07:00
Matt Keeler
456156ebec
Add type validations for the catalog resources (#17211)
Also adding some common resource validation error types to the internal/resource package.
2023-05-12 09:24:55 -04:00
Kyle Havlovitz
81d8332524
Attach service virtual IP info to compiled discovery chain (#17295)
* Add v1/internal/service-virtual-ip for manually setting service VIPs

* Attach service virtual IP info to compiled discovery chain

* Separate auto-assigned and manual VIPs in response
2023-05-12 02:28:16 +00:00
Kyle Havlovitz
bd0eb07ed3
Add /v1/internal/service-virtual-ip for manually setting service VIPs (#17294) 2023-05-12 00:38:52 +00:00
cskh
c61e994fc0
Container test: fix container test slow image build (#17316)
Container integ test: fix container test slow image build
2023-05-11 22:49:49 +00:00
Tu Nguyen
30eee13cb9
Update consul-k8s install command so it is valid (#17310) 2023-05-11 11:55:23 -07:00
R.B. Boyer
cd80ea18ff
grpc: ensure grpc resolver correctly uses lan/wan addresses on servers (#17270)
The grpc resolver implementation is fed from changes to the
router.Router. Within the router there is a map of various areas storing
the addressing information for servers in those areas. All map entries
are of the WAN variety except a single special entry for the LAN.

Addressing information in the LAN "area" are local addresses intended
for use when making a client-to-server or server-to-server request.

The client agent correctly updates this LAN area when receiving lan serf
events, so by extension the grpc resolver works fine in that scenario.

The server agent only initially populates a single entry in the LAN area
(for itself) on startup, and then never mutates that area map again.
For normal RPCs a different structure is used for LAN routing.

Additionally when selecting a server to contact in the local datacenter
it will randomly select addresses from either the LAN or WAN addressed
entries in the map.

Unfortunately this means that the grpc resolver stack as it exists on
server agents is either broken or only accidentally functions by having
servers dial each other over the WAN-accessible address. If the operator
disables the serf wan port completely likely this incidental functioning
would break.

This PR enforces that local requests for servers (both for stale reads
or leader forwarded requests) exclusively use the LAN "area" information
and also fixes it so that servers keep that area up to date in the
router.

A test for the grpc resolver logic was added, as well as a higher level
full-stack test to ensure the externally perceived bug does not return.
2023-05-11 11:08:57 -05:00
R.B. Boyer
0ee95df4e0
proto: clear out old ratelimit.tmp files before making new ones (#17292) 2023-05-11 10:36:41 -05:00
John Murret
e9986e3774
ci:upload test results to datadog (#17206)
* WIP

* ci:upload test results to datadog

* fix use of envvar in expression

* getting correct permission in reusable-unit.yml

* getting correct permission in reusable-unit.yml

* fixing DATADOG_API_KEY envvar expresssion

* pass datadog-api-key

* removing type from datadog-api-key
2023-05-10 14:49:18 -06:00
Dan Upton
5030101cdb
resource: add missing validation to the List and WatchList endpoints (#17213) 2023-05-10 10:38:48 +01:00
Dan Upton
6c24a66f73
resource: optionally compare timestamps in EqualStatus (#17275) 2023-05-10 10:37:54 +01:00
Derek Menteer
5ecab506a6
Fix ent bug caused by #17241. (#17278)
Fix ent bug caused by #17241

All tests passed in OSS, but not ENT. This is a patch to resolve
the problem for both.
2023-05-09 16:36:29 -05:00
cskh
48f7d99305
snapshot: some improvments to the snapshot process (#17236)
* snapshot: some improvments to the snapshot process

Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
2023-05-09 15:28:52 -04:00
Semir Patel
40eefaba18
Reaper controller for cascading deletes of owner resources (#17256) 2023-05-09 13:57:40 -05:00
Freddy
0f23def80c
Post a PR comment if the backport runner fails (#17197) 2023-05-09 12:28:34 -06:00
Freddy
7c3e9cd862
Hash namespace+proxy ID when creating socket path (#17204)
UNIX domain socket paths are limited to 104-108 characters, depending on
the OS. This limit was quite easy to exceed when testing the feature on
Kubernetes, due to how proxy IDs encode the Pod ID eg:
metrics-collector-59467bcb9b-fkkzl-hcp-metrics-collector-sidecar-proxy

To ensure we stay under that character limit this commit makes a
couple changes:
- Use a b64 encoded SHA1 hash of the namespace + proxy ID to create a
  short and deterministic socket file name.
- Add validation to proxy registrations and proxy-defaults to enforce a
  limit on the socket directory length.
2023-05-09 12:20:26 -06:00
Dan Upton
d53a1d4a27
resource: add helpers for more efficiently comparing IDs etc (#17224) 2023-05-09 19:02:24 +01:00
Derek Menteer
4f6da20fe5
Fix multiple issues related to proxycfg health queries. (#17241)
Fix multiple issues related to proxycfg health queries.

1. The datacenter was not being provided to a proxycfg query, which resulted in
bypassing agentless query optimizations and using the normal API instead.

2. The health rpc endpoint would return a zero index when insufficient ACLs were
detected. This would result in the agent cache performing an infinite loop of
queries in rapid succession without backoff.
2023-05-09 12:37:58 -05:00
Dan Upton
972998203e
controller: deduplicate items in queue (#17168) 2023-05-09 18:14:20 +01:00
Dan Bond
5f079eb05b
Revert "ci: remove test splitting for compatibility tests (#17166)" (#17262)
This reverts commit 861a8151d50377315c6c391833fef85b71b54d18.
2023-05-09 10:44:31 -06:00
Dan Upton
6e1bc57469
Controller Runtime 2023-05-09 15:25:55 +01:00
Dan Stough
5e4b736b70
chore(ci): fix backport assistant branch creation race (#17249) 2023-05-08 20:30:45 +00:00
John Murret
861a8151d5
ci: remove test splitting for compatibility tests (#17166)
* remove test splitting from compatibility-integration-tests

* enable on push

* remove ipv6 loopback fix

* re-add ipv6 loopback fix

* remove test splitting from upgrade-integration-tests

* remove test splitting from upgrade-integration-tests

* put test splitting back in for upgrade tests

* upgrade-integration tests-o
ne runner no retries
2023-05-08 20:26:16 +00:00
Matt Keeler
34915670f2
Register new catalog & mesh protobuf types with the resource registry (#17225) 2023-05-08 15:36:35 -04:00
Derek Menteer
50ef6a697e
Fix issue with peer stream node cleanup. (#17235)
Fix issue with peer stream node cleanup.

This commit encompasses a few problems that are closely related due to their
proximity in the code.

1. The peerstream utilizes node IDs in several locations to determine which
nodes / services / checks should be cleaned up or created. While VM deployments
with agents will likely always have a node ID, agentless uses synthetic nodes
and does not populate the field. This means that for consul-k8s deployments, all
services were likely bundled together into the same synthetic node in some code
paths (but not all), resulting in strange behavior. The Node.Node field should
be used instead as a unique identifier, as it should always be populated.

2. The peerstream cleanup process for unused nodes uses an incorrect query for
node deregistration. This query is NOT namespace aware and results in the node
(and corresponding services) being deregistered prematurely whenever it has zero
default-namespace services and 1+ non-default-namespace services registered on
it. This issue is tricky to find due to the incorrect logic mentioned in #1,
combined with the fact that the affected services must be co-located on the same
node as the currently deregistering service for this to be encountered.

3. The stream tracker did not understand differences between services in
different namespaces and could therefore report incorrect numbers. It was
updated to utilize the full service name to avoid conflicts and return proper
results.
2023-05-08 13:13:25 -05:00
John Murret
6fa104409e
security: update go version to 1.20.4 (#17240)
* update go version to 1.20.3

* add changelog

* rename changelog file to remove underscore

* update to use 1.20.4

* update change log entry to reflect 1.20.4
2023-05-08 11:57:11 -06:00
Jared Kirschner
f908ad82d0
docs: correct misspelling (#17229) 2023-05-08 13:30:48 -04:00
Semir Patel
991a002fcc
resource: List resources by owner (#17190) 2023-05-08 12:26:19 -05:00
cskh
db253b6395
upgrade test: use docker.mirror.hashicorp.services to avoid docker login (#17186)
* upgrade test: use docker.mirror.hashicorp.services to avoid docker login

* upgrade tests: remove docker login

Signed-off-by: Dan Bond <danbond@protonmail.com>

---------

Signed-off-by: Dan Bond <danbond@protonmail.com>
Co-authored-by: Dan Bond <danbond@protonmail.com>
2023-05-08 13:15:38 -04:00
cskh
83ad0dfa74
Upgrade test target image (#17226)
* upgrade test: add targetimage name as parameter to upgrade function

- the image name of latest version and target version could be
  different. Add the parameter of targetImage to the upgrade
  function

* fix a bug of expected error
2023-05-08 12:02:31 -04:00
Jared Kirschner
166d7a39e8
docs: consistently name Consul service mesh (#17222)
Remove outdated usage of "Consul Connect" instead of Consul service mesh.

The connect subsystem in Consul provides Consul's service mesh capabilities.
However, the term "Consul Connect" should not be used as an alternative to
the name "Consul service mesh".
2023-05-05 13:41:40 -04:00
Dan Upton
917afcf3c6
controller: make the WorkQueue generic (#16982) 2023-05-05 15:38:22 +01:00
Matt Keeler
4715a86358
Initial Catalog & Mesh Protobuf Definitions (#17123) 2023-05-05 09:47:28 -04:00
John Eikenberry
bd76fdeaeb
enable auto-tidy expired issuers in vault (as CA)
When using vault as a CA and generating the local signing cert, try to
enable the PKI endpoint's auto-tidy feature with it set to tidy expired
issuers.
2023-05-03 20:30:37 +00:00
Nathan Coleman
bdef22354b
Use auth context when evaluating service read permissions (#17207)
Co-authored-by: Blake Covarrubias <1812+blake@users.noreply.github.com>
2023-05-02 16:23:42 -04:00
Eddie Rowe
90fc9bd9e5
Fix broken lightstep link (#17201) 2023-05-01 14:24:52 +00:00
Poonam Jadhav
ef5d54fd4c
feat: add no-op reporting background routine (#17178) 2023-04-28 20:07:03 -04:00
Freddy
0fc4fc6429
Revert "[CC-4519] Include Consul NodeID in Envoy bootstrap metadata" (#17191) 2023-04-28 15:23:55 -06:00
Eric Haberkorn
2c0da88ce7
fix panic in injectSANMatcher when tlsContext is nil (#17185) 2023-04-28 16:27:57 -04:00
Paul Glass
e4a341c88a
Permissive mTLS: Config entry filtering and CLI warnings (#17183)
This adds filtering for service-defaults: consul config list -filter 'MutualTLSMode == "permissive"'.

It adds CLI warnings when the CLI writes a config entry and sees that either service-defaults or proxy-defaults contains MutualTLSMode=permissive, or sees that the mesh config entry contains AllowEnablingPermissiveMutualTLSMode=true.
2023-04-28 12:51:36 -05:00
R.B. Boyer
6b4986907d
peering: ensure that merged central configs of peered upstreams for partitioned downstreams work (#17179)
Partitioned downstreams with peered upstreams could not properly merge central config info (i.e. proxy-defaults and service-defaults things like mesh gateway modes) if the upstream had an empty DestinationPartition field in Enterprise.

Due to data flow, if this setup is done using Consul client agents the field is never empty and thus does not experience the bug.

When a service is registered directly to the catalog as is the case for consul-dataplane use this field may be empty and and the internal machinery of the merging function doesn't handle this well.

This PR ensures the internal machinery of that function is referentially self-consistent.
2023-04-28 12:36:08 -05:00
Semir Patel
1037bf7f69
Sync .golangci.yml from ENT (#17180) 2023-04-28 17:14:37 +00:00
John Landa
eded58b62a
Remove artificial ACLTokenMaxTTL limit for configuring acl token expiry (#17066)
* Remove artificial ACLTokenMaxTTL limit for configuring acl token expiry

* Add changelog

* Remove test on default MaxTokenTTL

* Change to imperitive tense for changelog entry
2023-04-28 10:57:30 -05:00
Semir Patel
9fef1c7f17
Create tombstone on resource Delete (#17108) 2023-04-28 10:49:08 -05:00
Dan Upton
eff5dd1812
resource: owner references must include a uid (#17169) 2023-04-28 11:22:42 +01:00
Freddy
e02ef16f02
Update HCP bootstrapping to support existing clusters (#16916)
* Persist HCP management token from server config

We want to move away from injecting an initial management token into
Consul clusters linked to HCP. The reasoning is that by using a separate
class of token we can have more flexibility in terms of allowing HCP's
token to co-exist with the user's management token.

Down the line we can also more easily adjust the permissions attached to
HCP's token to limit it's scope.

With these changes, the cloud management token is like the initial
management token in that iit has the same global management policy and
if it is created it effectively bootstraps the ACL system.

* Update SDK and mock HCP server

The HCP management token will now be sent in a special field rather than
as Consul's "initial management" token configuration.

This commit also updates the mock HCP server to more accurately reflect
the behavior of the CCM backend.

* Refactor HCP bootstrapping logic and add tests

We want to allow users to link Consul clusters that already exist to
HCP. Existing clusters need care when bootstrapped by HCP, since we do
not want to do things like change ACL/TLS settings for a running
cluster.

Additional changes:

* Deconstruct MaybeBootstrap so that it can be tested. The HCP Go SDK
  requires HTTPS to fetch a token from the Auth URL, even if the backend
  server is mocked. By pulling the hcp.Client creation out we can modify
  its TLS configuration in tests while keeping the secure behavior in
  production code.

* Add light validation for data received/loaded.

* Sanitize initial_management token from received config, since HCP will
  only ever use the CloudConfig.MangementToken.

* Add changelog entry
2023-04-27 22:27:39 +02:00
John Maguire
391ed069c4
APIGW: Update how status conditions for certificates are handled (#17115)
* Move status condition for invalid certifcate to reference the listener
that is using the certificate

* Fix where we set the condition status for listeners and certificate
refs, added tests

* Add changelog
2023-04-27 15:54:44 +00:00
Anita Akaeze
f03d6a06be
Merge pull request #5288 from hashicorp/NET-3648_fix (#17163)
NET-3648: perform envoy version verification
2023-04-26 20:29:43 -04:00