Commit Graph

4296 Commits

Author SHA1 Message Date
Riddhi Shah d8d8c8603e
Add support for merge-central-config query param (#13001)
Adds a new query param merge-central-config for use with the below endpoints:

/catalog/service/:service
/catalog/connect/:service
/health/service/:service
/health/connect/:service

If set on the request, the response will include a fully resolved service definition which is merged with the proxy-defaults/global and service-defaults/:service config entries (on-demand style). This is useful to view the full service definition for a mesh service (connect-proxy kind or gateway kind) which might not be merged before being written into the catalog (example: in case of services in the agentless model).
2022-05-25 13:20:17 -07:00
R.B. Boyer 31526139fd
remove a source of test panics (#13227) 2022-05-25 14:33:00 -05:00
R.B. Boyer a85b8a4705
api: ensure peering API endpoints do not use protobufs (#13204)
I noticed that the JSON api endpoints for peerings json encodes protobufs directly, rather than converting them into their `api` package equivalents before marshal/unmarshaling them.

I updated this and used `mog` to do the annoying part in the middle. 

Other changes:
- the status enum was converted into the friendlier string form of the enum for readability with tools like `curl`
- some of the `api` library functions were slightly modified to match other similar endpoints in UX (cc: @ndhanushkodi )
- peeringRead returns `nil` if not found
- partitions are NOT inferred from the agent's partition (matching 1.11-style logic)
2022-05-25 13:43:35 -05:00
R.B. Boyer 1a8834e1c8
peering: replicate expected SNI, SPIFFE, and service protocol to peers (#13218)
The importing peer will need to know what SNI and SPIFFE name
corresponds to each exported service. Additionally it will need to know
at a high level the protocol in use (L4/L7) to generate the appropriate
connection pool and local metrics.

For replicated connect synthetic entities we edit the `Connect{}` part
of a `NodeService` to have a new section:

    {
      "PeerMeta": {
        "SNI": [
          "web.default.default.owt.external.183150d5-1033-3672-c426-c29205a576b8.consul"
        ],
        "SpiffeID": [
          "spiffe://183150d5-1033-3672-c426-c29205a576b8.consul/ns/default/dc/dc1/svc/web"
        ],
        "Protocol": "tcp"
      }
    }

This data is then replicated and saved as-is at the importing side. Both
SNI and SpiffeID are slices for now until I can be sure we don't need
them for how mesh gateways will ultimately work.
2022-05-25 12:37:44 -05:00
R.B. Boyer be631ebdce
peering: disable requirement for mesh gateways initially (#13213) 2022-05-25 10:13:23 -05:00
Kyle Havlovitz 0ed9ff8ef7
Merge pull request #13143 from hashicorp/envoy-connection-limit
Add connection limit setting to service defaults
2022-05-25 07:48:50 -07:00
Kyle Havlovitz f2fbe8aec9 Fix proto lint errors after version bump 2022-05-24 18:44:54 -07:00
Kyle Havlovitz dbed8ae10b Specify go_package explicitly 2022-05-24 10:22:53 -07:00
cskh 8712a088b1
fix: non-leader agents return 404 on Get Intention exact api (#13179)
* fix: non-leader agents return 404 on Get Intention exact api

- rpc call method appends extra error message, so change == to
  "Strings.Contains"

Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
2022-05-24 13:21:15 -04:00
Kyle Havlovitz 4bc6c23357 Add connection limit setting to service defaults 2022-05-24 10:13:38 -07:00
DanStough 817449041d chore(test): Update bats version 2022-05-24 11:56:08 -04:00
DanStough 147fd96d97 feat: add endpoint struct to ServiceConfigEntry 2022-05-24 11:56:08 -04:00
alex 876f3bb971
peering: expose IsLeader, hung up on dialer if follower (#13164)
Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
2022-05-23 11:30:58 -07:00
Matt Keeler 26f4ea3f01
Migrate from `protoc` to `buf` (#12841)
* Install `buf` instead of `protoc`
* Created `buf.yaml` and `buf.gen.yaml` files in the two proto directories to control how `buf` generates/lints proto code.
* Invoke `buf` instead of `protoc`
* Added a `proto-format` make target.
* Committed the reformatted proto files.
* Added a `proto-lint` make target.
* Integrated proto linting with CI
* Fixed tons of proto linter warnings.
* Got rid of deprecated builtin protoc-gen-go grpc plugin usage. Moved to direct usage of protoc-gen-go-grpc.
* Unified all proto directories / go packages around using pb prefixes but ensuring all proto packages do not have the prefix.
2022-05-23 10:37:52 -04:00
cskh c986940fda
Upgrade golangci-lint for go v1.18 (#13176) 2022-05-23 10:26:45 -04:00
R.B. Boyer 21bb0eef4a
test: fix flaky test TestEventBufferFuzz (#13175) 2022-05-23 09:22:30 -05:00
Matt Keeler d0fdf22f83
Fix tests broken in #13173 (#13178)
I changed the error type returned in a situation but didn’t update the tests to expect that error.
2022-05-23 10:00:06 -04:00
Matt Keeler 3c1e17cbd5
Fix flaky tests in the agent/grpc/public/services/serverdiscovery package (#13173)
Occasionally we had seen the TestWatchServers_ACLToken_PermissionDenied be flagged as flaky in circleci. This change should fix that.

Why it fixes it is complicated. The test was failing with a panic when a mocked ACL Resolver was being called more times than expected. I struggled for a while to determine how that could be. This test should call authorize once and only once and the error returned should cause the stream to be terminated and the error returned to the gRPC client. Another oddity was no amount of running this test locally seemed to be able to reproduce the issue. I ran the test hundreds of thousands of time and it always passed.

It turns out that there is nothing wrong with the test. It just so happens that the panic from unexpected invocation of a mocked call happened during the test but was caused by a previous test (specifically the TestWatchServers_StreamLifecycle test)

The stream from the previous test remained open after all the test Cleanup functions were run and it just so happened that when the EventPublisher eventually picked up that the context was cancelled during cleanup, it force closes all subscriptions which causes some loops to be re-entered and the streams to be reauthorized. Its that looping in response to forced subscription closures that causes the mock to eventually panic. All the components, publisher, server, client all operate based on contexts. We cancel all those contexts but there is no syncrhonous way to know when they are stopped.

We could have implemented a syncrhonous stop but in the context of an actual running Consul, context cancellation + async stopping is perfectly fine. What we (Dan and I) eventually thought was that the behavior of grpc streams such as this when a server was shutting down wasn’t super helpful. What we would want is for a client to be able to distinguish between subscription closed because something may have changed requiring re-authentication and subscription closed because the server is shutting down. That way we can send back appropriate error messages to detail that the server is shutting down and not confuse users with potentially needing to resubscribe.

So thats what this PR does. We have introduced a shutting down state to our event subscriptions and the various streaming gRPC services that rely on the event publisher will all just behave correctly and actually stop the stream (not attempt transparent reauthorization) if this particular error is the one we get from the stream. Additionally the error that gets transmitted back through gRPC when this does occur indicates to the consumer that the server is going away. That is more helpful so that a client can then attempt to reconnect to another server.
2022-05-23 08:59:13 -04:00
R.B. Boyer bbcb1fa805
agent: allow for service discovery queries involving peer name to use streaming (#13168) 2022-05-20 15:27:01 -05:00
Dan Upton d7f8a8e4ef
proxycfg: remove dependency on `cache.UpdateEvent` (#13144)
OSS portion of enterprise PR 1857.

This removes (most) references to the `cache.UpdateEvent` type in the
`proxycfg` package.

As we're going to be direct usage of the agent cache with interfaces that
can be satisfied by alternative server-local datasources, it doesn't make
sense to depend on this type everywhere anymore (particularly on the
`state.ch` channel).

We also plan to extract `proxycfg` out of Consul into a shared library in
the future, which would require removing this dependency.

Aside from a fairly rote find-and-replace, the main change is that the
`cache.Cache` and `health.Client` types now accept a callback function
parameter, rather than a `chan<- cache.UpdateEvents`. This allows us to
do the type conversion without running another goroutine.
2022-05-20 15:47:40 +01:00
R.B. Boyer 2e72f44fda
peering: accept replication stream of discovery chain information at the importing side (#13151) 2022-05-19 16:37:52 -05:00
R.B. Boyer c27e186334
test: TestServer_RPC_MetricsIntercept should use a concurrency-safe metrics store (#13157) 2022-05-19 15:39:28 -05:00
cskh 364d4f5efe
Retry on bad dogstatsd connection (#13091)
- Introduce a new telemetry configurable parameter retry_failed_connection. User can set the value to true to let consul agent continue its start process on failed connection to datadog server. When set to false, agent will stop on failed start. The default behavior is true.

Co-authored-by: Dan Upton <daniel@floppy.co>
Co-authored-by: Evan Culver <eculver@users.noreply.github.com>
2022-05-19 16:03:46 -04:00
R.B. Boyer 3e4a522882 peering: replicate discovery chains information to importing peers
Treat each exported service as a "discovery chain" and replicate one
synthetic CheckServiceNode for each chain and remote mesh gateway.

The health will be a flattened generated check of the checks for that
mesh gateway node.
2022-05-19 14:21:44 -05:00
R.B. Boyer 5a03536040 prefactor some functions out of the monolithic file 2022-05-19 14:21:29 -05:00
R.B. Boyer 1e31dc891a
test: fix incorrect use of t instead of r in retry test (#13146) 2022-05-19 14:00:07 -05:00
Dan Upton a76f63a695
config: prevent top-level `verify_incoming` enabling mTLS on gRPC port (#13118)
Fixes #13088

This is a backwards-compatibility bug introduced in 1.12.
2022-05-18 16:15:57 +01:00
Freddy b38be4c0ed
Patches to peering initiation for POC demo (#13076)
Co-authored-by: R.B. Boyer <rb@hashicorp.com>
2022-05-13 13:01:00 -06:00
Dhia Ayachi a0455774c0
When a host header is defined override `req.Host` in the metrics ui (#13071)
* When a host header is defined override the req.Host in the metrics ui endpoint.

* add changelog
2022-05-13 14:05:22 -04:00
Freddy e874b860c0
Actually block when syncing subscriptions (#13066)
By changing to use WatchCtx we will actually block for changes to the peering list. WatchCh creates a goroutine to collect errors from WatchCtx and returns immediately.

The existing behavior wouldn't result in a tight loop because of the rate limiting in the surrounding function, but it would still lead to more work than is necessary.
2022-05-12 17:36:14 -06:00
Evan Culver 0fa5e7be5a
peering: add TrustBundleListByService endpoint (#13048) 2022-05-12 15:58:22 -07:00
Freddy 4e215dc411
[OSS] Add upsert handling for receiving CheckServiceNode (#13061) 2022-05-12 15:04:44 -06:00
Matt Keeler b788691fa6
Watch the singular service resolver instead of the list + filtering to 1 (#13012)
* Watch the singular service resolver instead of the list + filtering to 1

* Rename the ConfigEntries cache type to ConfigEntryList
2022-05-12 16:34:17 -04:00
R.B. Boyer 93b164aac3
structs: add convenience methods to sort slices of ServiceName values (#13038) 2022-05-12 10:08:50 -05:00
R.B. Boyer cc15a11f9c
test: ensure this package uses freeport for port allocation (#13036) 2022-05-11 14:20:50 -05:00
R.B. Boyer 901fd4dd68
remove remaining shim runStep functions (#13015)
Wraps up the refactor from #13013
2022-05-10 16:24:45 -05:00
R.B. Boyer 0d6d16ddfb
add general runstep test helper instead of copying it all over the place (#13013) 2022-05-10 15:25:51 -05:00
Jared Kirschner f4e1ade46a
Merge pull request #12463 from hashicorp/docs/consistency-mode-improvements
Improve consistency mode docs
2022-05-09 23:04:00 -04:00
Jared Kirschner 05a648f530 docs: clarify consistency mode operation
Changes include:
- Add diagrams of the operation of different consistency modes
- Note that only stale reads benefit from horizontal scaling
- Increase scannability with headings
- Document consistency mode defaults and how to override for
  DNS and HTTP API interfaces
- Document X-Consul-Effective-Consistency response header
2022-05-09 16:39:48 -07:00
FFMMM b8ce8e36fb
add err msg on PeeringRead not found (#12986)
Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>

Co-authored-by: Freddy <freddygv@users.noreply.github.com>
2022-05-09 15:22:42 -07:00
FFMMM 37a1e33834
expose meta tags for peering (#12964) 2022-05-09 13:47:37 -07:00
Mark Anderson 4364e440db Add oss test
Signed-off-by: Mark Anderson <manderson@hashicorp.com>
2022-05-09 10:07:19 -07:00
Mark Anderson 346b68a441 Fix up enterprise version tag.
Changes to how the version string was handled created small regression with the release of consul 1.12.0 enterprise.

Many tools use the Config:Version field reported by the agent/self resource to determine whether Consul is an enterprise or OSS instance, expect something like 1.12.0+ent for enterprise and simply 1.12.0 for OSS. This was accidentally broken during the runup to 1.12.x

This work fixes the value returned by both the self endpoint in ["Config"]["Version"] and the metrics consul.version field.

Signed-off-by: Mark Anderson <manderson@hashicorp.com>
2022-05-09 10:07:19 -07:00
Evan Culver 9c8606e138
peering: add store.PeeringsForService implementation (#12957) 2022-05-06 12:35:31 -07:00
Eric Haberkorn e7b9d025a4
Merge pull request #12956 from hashicorp/suport-lambda-connect-proxy
Support Invoking Lambdas from Sidecar Proxies
2022-05-06 08:17:38 -04:00
Eric 21c3134575 Support making requests to lambda from connect proxies. 2022-05-05 17:42:30 -04:00
FFMMM 745bd15b15
api: add PeeeringList, polish (#12934) 2022-05-05 14:15:42 -07:00
Riddhi Shah 0c855fab98
Validate port on mesh service registration (#12881)
Add validation to ensure connect native services have a port or socketpath specified on catalog registration.
This was the only missing piece to ensure all mesh services are validated for a port (or socketpath) specification on catalog registration.
2022-05-05 09:13:30 -07:00
Mark Anderson c6ff4ba7d8
Support vault namespaces in connect CA (#12904)
* Support vault namespaces in connect CA

Follow on to some missed items from #12655

From an internal ticket "Support standard "Vault namespace in the
path" semantics for Connect Vault CA Provider"

Vault allows the namespace to be specified as a prefix in the path of
a PKI definition, but our usage of the Vault API includes calls that
don't support a namespaced key. In particular the sys.* family of
calls simply appends the key, instead of prefixing the namespace in
front of the path.

Unfortunately it is difficult to reliably parse a path with a
namespace; only vault knows what namespaces are present, and the '/'
separator can be inside a key name, as well as separating path
elements. This is in use in the wild; for example
'dc1/intermediate-key' is a relatively common naming schema.

Instead we add two new fields: RootPKINamespace and
IntermediatePKINamespace, which are the absolute namespace paths
'prefixed' in front of the respective PKI Paths.

Signed-off-by: Mark Anderson <manderson@hashicorp.com>
2022-05-04 19:41:55 -07:00
Chris S. Kim abc472f2a3
Default discovery chain when upstream targets a DestinationPeer (#12942) 2022-05-04 16:25:25 -04:00