Commit Graph

4388 Commits

Author SHA1 Message Date
trujillo-adam 4151dc097a fixing merge conflicts part 3 2022-03-15 15:25:03 -07:00
Eric Haberkorn e92dd9dc9a
Merge pull request #12556 from hashicorp/wire-up-serverless-patcher
Create and wire up the serverless patcher
2022-03-15 14:05:40 -04:00
Eric Haberkorn fc3c0f312c
Merge pull request #12557 from hashicorp/remove-healthcheck-gogo-stdduration
Remove Gogo Stdduration From the Healthcheck Protobufs
2022-03-15 13:20:49 -04:00
Eric 4e6b34725d Remove gogo stdduration from the healthcheck protobufs 2022-03-15 10:51:40 -04:00
Eric cf3e517d0e Create and wire up the serverless patcher 2022-03-15 10:12:57 -04:00
trujillo-adam 76d55ac2b4 merging new hierarchy for agent configuration 2022-03-14 15:44:41 -07:00
Mark Anderson 676ea58bc4
Refactor config checks oss (#12550)
Currently the config_entry.go subsystem delegates authorization decisions via the ConfigEntry interface CanRead and CanWrite code. Unfortunately this returns a true/false value and loses the details of the source.

This is not helpful, especially since it the config subsystem can be more complex to understand, since it covers so many domains.

This refactors CanRead/CanWrite to return a structured error message (PermissionDenied or the like) with more details about the reason for denial.

Part of #12241

Signed-off-by: Mark Anderson <manderson@hashicorp.com>
2022-03-11 13:45:51 -08:00
Eric Haberkorn d59364fa7f
Merge pull request #12536 from hashicorp/add-serverless-config
Add the `connect.enable_serverless_plugin` configuration option
2022-03-11 09:39:36 -05:00
Eric Haberkorn 44609c0ca5
Merge pull request #12539 from hashicorp/make-xds-lib
Make the xdscommon package
2022-03-11 09:21:10 -05:00
Eric 3302b2eec2 Add the `connect.enable_serverless_plugin` configuration option. 2022-03-11 09:16:00 -05:00
Mark Anderson aaefe15613
Bulk acl message fixup oss (#12470)
* First pass for helper for bulk changes

Signed-off-by: Mark Anderson <manderson@hashicorp.com>

* Convert ACLRead and ACLWrite to new form

Signed-off-by: Mark Anderson <manderson@hashicorp.com>

* AgentRead and AgentWRite

Signed-off-by: Mark Anderson <manderson@hashicorp.com>

* Fix EventWrite

Signed-off-by: Mark Anderson <manderson@hashicorp.com>

* KeyRead, KeyWrite, KeyList

Signed-off-by: Mark Anderson <manderson@hashicorp.com>

* KeyRing

Signed-off-by: Mark Anderson <manderson@hashicorp.com>

* NodeRead NodeWrite

Signed-off-by: Mark Anderson <manderson@hashicorp.com>

* OperatorRead and OperatorWrite

Signed-off-by: Mark Anderson <manderson@hashicorp.com>

* PreparedQuery

Signed-off-by: Mark Anderson <manderson@hashicorp.com>

* Intention partial

Signed-off-by: Mark Anderson <manderson@hashicorp.com>

* Fix ServiceRead, Write ,etc

Signed-off-by: Mark Anderson <manderson@hashicorp.com>

* Error check ServiceRead?

Signed-off-by: Mark Anderson <manderson@hashicorp.com>

* Fix Sessionread/Write

Signed-off-by: Mark Anderson <manderson@hashicorp.com>

* Fixup snapshot ACL

Signed-off-by: Mark Anderson <manderson@hashicorp.com>

* Error fixups for txn

Signed-off-by: Mark Anderson <manderson@hashicorp.com>

* Add changelog

Signed-off-by: Mark Anderson <manderson@hashicorp.com>

* Fixup review comments

Signed-off-by: Mark Anderson <manderson@hashicorp.com>
2022-03-10 18:48:27 -08:00
Eric f5c9fa6fa6 Make an xdscommon package that will be shared between Consul and Envoy plugins 2022-03-08 14:57:23 -05:00
Eric Haberkorn abfcde1bc6
Merge pull request #12529 from hashicorp/add-meta-to-service-config-response
Add `Meta` to `ServiceConfigResponse`
2022-03-07 16:35:21 -05:00
Eric Haberkorn 9d0ec2eec2 Code review changes 2022-03-07 14:39:33 -05:00
R.B. Boyer 2a56e0055b
proxycfg: change how various proxycfg test helpers for making ConfigSnapshot copies works to be more correct and less error prone (#12531)
Prior to this PR for the envoy xDS golden tests in the agent/xds package we
were hand-creating a proxycfg.ConfigSnapshot structure in the proper format for
input to the xDS generator. Over time this intermediate structure has gotten
trickier to build correctly for the various tests.

This PR proposes to switch to using the existing mechanism for turning a
structs.NodeService and a sequence of cache.UpdateEvent copies into a
proxycfg.ConfigSnapshot, as that is less error prone to construct and aligns
more with how the data arrives.

NOTE: almost all of this is in test-related code. I tried super hard to craft
correct event inputs to get the golden files to be the same, or similar enough
after construction to feel ok that i recreated the spirit of the original test
cases.
2022-03-07 11:47:14 -06:00
Eric f7cc7ff5cd Add `Meta` to `ServiceConfigResponse` 2022-03-07 10:05:18 -05:00
R.B. Boyer 8307e40f2b
reduce flakiness/raciness of errNotFound and errNotChanged blocking query tests (#12518)
Improves tests from #12362

These tests try to setup the following concurrent scenario:

1. (goroutine 1) execute read RPC with index=0
2. (goroutine 1) get response from (1) @ index=10
3. (goroutine 1) execute read RPC with index=10 and block
4. (goroutine 2) WHILE (3) is blocking, start slamming the system with stray writes that will cause the WatchSet to wakeup
5. (goroutine 2) after doing all writes, shut down the reader above
6. (goroutine 1) stops reading, double checks that it only ever woke up once (from 1)
2022-03-04 11:20:01 -06:00
R.B. Boyer 9268715697
server: fix spurious blocking query suppression for discovery chains (#12512)
Minor fix for behavior in #12362

IsDefault sometimes returns true even if there was a proxy-defaults or service-defaults config entry that was consulted. This PR fixes that.
2022-03-03 16:54:41 -06:00
Daniel Nephin 5ba994a73f
Merge pull request #12298 from jorgemarey/b-persistnewrootandconfig
Avoid raft change when no config is provided on persistNewRootAndConfig
2022-03-03 11:03:50 -05:00
Daniel Nephin 161206e24d ca: make sure the test fails without the fix
Also change the path used for the secondary so that both primary and secondary do not overwrite each other.
2022-03-02 18:22:49 -05:00
R.B. Boyer 58e053c336
raft: upgrade to v1.3.6 (#12496)
Add additional protections on the Consul side to prevent NonVoters from bootstrapping raft.

This should un-flake TestServer_Expect_NonVoters
2022-03-02 17:00:02 -06:00
Daniel Nephin 73c91ed80f
Merge pull request #12467 from hashicorp/dnephin/ci-vault-test-safer
ca: require that tests that use Vault are named correctly
2022-03-01 12:54:02 -05:00
R.B. Boyer 6666832077
test: parallelize more of TestLeader_ReapOrLeftMember_IgnoreSelf (#12468)
before:

    $ go test ./agent/consul -run TestLeader_ReapOrLeftMember_IgnoreSelf
    ok  	github.com/hashicorp/consul/agent/consul	21.147s

after:

    $ go test ./agent/consul -run TestLeader_ReapOrLeftMember_IgnoreSelf
    ok  	github.com/hashicorp/consul/agent/consul	5.402s
2022-03-01 10:30:06 -06:00
Jorge Marey f429c1a5d9 Fix vault test with suggested changes 2022-03-01 10:20:00 +01:00
Jorge Marey 1a0baf4024 Add test case to verify #12298 2022-03-01 09:25:52 +01:00
Jorge Marey 4375dd2409 Avoid raft change when no config is provided on CAmanager
- This avoids a change to the raft store when no roots or config
are provided to persistNewRootAndConfig
2022-03-01 09:25:52 +01:00
Daniel Nephin d669226784 ca: fix a test
This test does not use Vault, so does not need ca.SkipIfVaultNotPresent
2022-02-28 16:26:18 -05:00
Daniel Nephin 1f00ede559 ca: require that tests that use Vault are named correctly
Previously we were using two different criteria to decide where to run a
test.  The main `go-test` job would skip Vault tests based on the
presence of the `vault` binary, but the `test-connect-ca-providers` job
would run tests based on the name.

This led to a scenario where a test may never run in CI.

To fix this problem I added a name check to the function we use to skip
the test. This should ensure that any test that requires vault is named
correctly to be run as part of the `test-connect-ca-providers` job.

At the same time I relaxed the regex we use. I verified this runs the
same tests using `go test --list Vault`.  I made this change because a
bunch of tests in `agent/connect/ca` used `Vault` in the name, without
the underscores. Instead of changing a bunch of test names, this seemed
easier.

With this approach, the worst case is that we run a few extra tests in
the `test-connect-ca-providers` job, which doesn't seem like a problem.
2022-02-28 16:13:53 -05:00
R.B. Boyer 7b0548dd8d
server: suppress spurious blocking query returns where multiple config entries are involved (#12362)
Starting from and extending the mechanism introduced in #12110 we can specially handle the 3 main special Consul RPC endpoints that react to many config entries in a single blocking query in Connect:

- `DiscoveryChain.Get`
- `ConfigEntry.ResolveServiceConfig`
- `Intentions.Match`

All of these will internally watch for many config entries, and at least one of those will likely be not found in any given query. Because these are blends of multiple reads the exact solution from #12110 isn't perfectly aligned, but we can tweak the approach slightly and regain the utility of that mechanism.

### No Config Entries Found

In this case, despite looking for many config entries none may be found at all. Unlike #12110 in this scenario we do not return an empty reply to the caller, but instead synthesize a struct from default values to return. This can be handled nearly identically to #12110 with the first 1-2 replies being non-empty payloads followed by the standard spurious wakeup suppression mechanism from #12110.

### No Change Since Last Wakeup

Once a blocking query loop on the server has completed and slept at least once, there is a further optimization we can make here to detect if any of the config entries that were present at specific versions for the prior execution of the loop are identical for the loop we just woke up for. In that scenario we can return a slightly different internal sentinel error and basically externally handle it similar to #12110.

This would mean that even if 20 discovery chain read RPC handling goroutines wakeup due to the creation of an unrelated config entry, the only ones that will terminate and reply with a blob of data are those that genuinely have new data to report.

### Extra Endpoints

Since this pattern is pretty reusable, other key config-entry-adjacent endpoints used by `agent/proxycfg` also were updated:

- `ConfigEntry.List`
- `Internal.IntentionUpstreams` (tproxy)
2022-02-25 15:46:34 -06:00
Chris S. Kim 25f4a425d1
Merge pull request #12442 from danieleva/12422-keyring
Allows keyring operations on client agents
2022-02-25 16:28:56 -05:00
Evan Culver 522676ed8d
connect: Update supported Envoy versions to include 1.19.3 and 1.18.6 2022-02-24 16:59:33 -08:00
Evan Culver b95f010ac0
connect: Upgrade Envoy 1.20 to 1.20.2 (#12443) 2022-02-24 16:19:39 -08:00
R.B. Boyer ca112f8721
fix flaky test panic (#12446) 2022-02-24 17:35:46 -06:00
R.B. Boyer 957146401e
catalog: compare node names case insensitively in more places (#12444)
Many places in consul already treated node names case insensitively.
The state store indexes already do it, but there are a few places that
did a direct byte comparison which have now been corrected.

One place of particular consideration is ensureCheckIfNodeMatches
which is executed during snapshot restore (among other places). If a
node check used a slightly different casing than the casing of the node
during register then the snapshot restore here would deterministically
fail. This has been fixed.

Primary approach:

    git grep -i "node.*[!=]=.*node" -- ':!*_test.go' ':!docs'
    git grep -i '\[[^]]*member[^]]*\]
    git grep -i '\[[^]]*\(member\|name\|node\)[^]]*\]' -- ':!*_test.go' ':!website' ':!ui' ':!agent/proxycfg/testing.go:' ':!*.md'
2022-02-24 16:54:47 -06:00
Daniele Vazzola e76ca318dc Allows keyring operations on client agents 2022-02-24 17:24:57 +00:00
R.B. Boyer 64271289ec
server: partly fix config entry replication issue that prevents replication in some circumstances (#12307)
There are some cross-config-entry relationships that are enforced during
"graph validation" at persistence time that are required to be
maintained. This means that config entries may form a digraph at times.

Config entry replication procedes in a particular sorted order by kind
and name.

Occasionally there are some fixups to these digraphs that end up
replicating in the wrong order and replicating the leaves
(ingress-gateway) before the roots (service-defaults) leading to
replication halting due to a graph validation error related to things
like mismatched service protocol requirements.

This PR changes replication to give each computed change (upsert/delete)
a fair shot at being applied before deciding to terminate that round of
replication in error. In the case where we've simply tried to do the
operations in the wrong order at least ONE of the outstanding requests
will complete in the right order, leading the subsequent round to have
fewer operations to do, with a smaller likelihood of graph validation
errors.

This does not address all scenarios, but for scenarios where the edits
are being applied in the wrong order this should avoid replication
halting.

Fixes #9319

The scenario that is NOT ADDRESSED by this PR is as follows:

1. create: service-defaults: name=new-web, protocol=http
2. create: service-defaults: name=old-web, protocol=http
3. create: service-resolver: name=old-web, redirect-to=new-web
4. delete: service-resolver: name=old-web
5. update: service-defaults: name=old-web, protocol=grpc
6. update: service-defaults: name=new-web, protocol=grpc
7. create: service-resolver: name=old-web, redirect-to=new-web

If you shutdown dc2 just before (4) and turn it back on after (7)
replication is impossible as there is no single edit you can make to
make forward progress.
2022-02-23 17:27:48 -06:00
Chris S. Kim ea47f066d7
Merge pull request #12430 from hashicorp/ci/main-assetfs-build
auto-updated agent/uiserver/bindata_assetfs.go from commit 73b6687c5
2022-02-23 18:19:30 -05:00
Daniel Nephin 771df290d7
Merge pull request #11910 from hashicorp/dnephin/ca-provider-interface-for-ica-in-primary
ca: add support for an external trusted CA
2022-02-22 13:14:52 -05:00
R.B. Boyer 8b987a4d59
configentry: make a new package to hold shared config entry structs that aren't used for RPC or the FSM (#12384)
First two candidates are ConfigEntryKindName and DiscoveryChainConfigEntries.
2022-02-22 10:36:36 -06:00
Dhia Ayachi cd9d8d44a5
file watcher to be used for configuration auto-reload feature (#12301)
* add config watcher to the config package

* add logging to watcher

* add test and refactor to add WatcherEvent.

* add all API calls and fix a bug with recreated files

* add tests for watcher

* remove the unnecessary use of context

* Add debug log and a test for file rename

* use inode to detect if the file is recreated/replaced and only listen to create events.

* tidy ups (#1535)

* tidy ups

* Add tests for inode reconcile

* fix linux vs windows syscall

* fix linux vs windows syscall

* fix windows compile error

* increase timeout

* use ctime ID

* remove remove/creation test as it's a use case that fail in linux

* fix linux/windows to use Ino/CreationTime

* fix the watcher to only overwrite current file id

* fix linter error

* fix remove/create test

* set reconcile loop to 200 Milliseconds

* fix watcher to not trigger event on remove, add more tests

* on a remove event try to add the file back to the watcher and trigger the handler if success

* fix race condition

* fix flaky test

* fix race conditions

* set level to info

* fix when file is removed and get an event for it after

* fix to trigger handler when we get a remove but re-add fail

* fix error message

* add tests for directory watch and fixes

* detect if a file is a symlink and return an error on Add

* rename Watcher to FileWatcher and remove symlink deref

* add fsnotify@v1.5.1

* fix go mod

* fix flaky test

* Apply suggestions from code review

Co-authored-by: Ashwin Venkatesh <ashwin@hashicorp.com>

* fix a possible stack overflow

* do not reset timer on errors, rename OS specific files

* start the watcher when creating it

* fix data race in tests

* rename New func

* do not call handler when a remove event happen

* events trigger on write and rename

* fix watcher tests

* make handler async

* remove recursive call

* do not produce events for sub directories

* trim "/" at the end of a directory when adding

* add missing test

* fix logging

* add todo

* fix failing test

* fix flaking tests

* fix flaky test

* add logs

* fix log text

* increase timeout

* reconcile when remove

* check reconcile when removed

* fix reconcile move test

* fix logging

* delete invalid file

* Apply suggestions from code review

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* fix review comments

* fix is watched to properly catch a remove

* change test timeout

* fix test and rename id

* fix test to create files with different mod time.

* fix deadlock when stopping watcher

* Apply suggestions from code review

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* fix a deadlock when calling stop while emitting event is blocked

* make sure to close the event channel after the event loop is done

* add go doc

* back date file instead of sleeping

* Apply suggestions from code review

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* check error

Co-authored-by: Ashwin Venkatesh <ashwin@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
2022-02-21 11:36:52 -05:00
hc-github-team-consul-core ad14a2bffd auto-updated agent/uiserver/bindata_assetfs.go from commit 73b6687c5 2022-02-21 12:27:52 +00:00
Evan Culver 602e08ada7
checks: populate interval and timeout when registering services (#11138) 2022-02-18 12:05:33 -08:00
Kyle Havlovitz 362753cad7
Merge pull request #12385 from hashicorp/tproxy-http-upstream-fix
xds: respect chain protocol on default discovery chain
2022-02-18 10:08:59 -08:00
Daniel Nephin dc484ee09e rpc: set response to nil when not found
Otherwise when the query times out we might incorrectly send a value for
the reply, when we should send an empty reply.

Also document errNotFound and how to handle the result in that case.
2022-02-18 12:26:06 -05:00
Daniel Nephin 6021105dfc ca: test that original certs from secondary still verify
There's a chance this could flake if the secondary hasn't received the
update yet, but running this test many times doesn't show any flakes
yet.
2022-02-17 18:45:16 -05:00
Daniel Nephin 6b679aa9d4 Update TODOs to reference an issue with more details
And remove a no longer needed TODO
2022-02-17 18:21:30 -05:00
Daniel Nephin 1853a32df6 ca: add test cases for rotating external trusted CA 2022-02-17 18:21:30 -05:00
Daniel Nephin 5e8ea2a039 ca: add a test for secondary with external CA 2022-02-17 18:21:30 -05:00
Daniel Nephin 42ec34d101 ca: examine the full chain in newCARoot
make TestNewCARoot much more strict
compare the full result instead of only a few fields.
add a test case with 2 and 3 certificates in the pem
2022-02-17 18:21:30 -05:00
Daniel Nephin 71f3ae04e2 ca: small docs improvements 2022-02-17 18:21:30 -05:00
Daniel Nephin 86994812ed ca: cleanup validateSetIntermediate 2022-02-17 18:21:30 -05:00
Daniel Nephin c1c1580bf8 ca: only return the leaf cert from Sign in vault provider
The interface is documented as 'Sign will only return the leaf', and the other providers
only return the leaf. It seems like this was added during the initial implementation, so
is likely just something we missed. It doesn't break anything , but it does cause confusing cert chains
in the API response which could break something in the future.
2022-02-17 18:21:30 -05:00
Daniel Nephin 85ecbaf109
Merge pull request #12110 from hashicorp/dnephin/blocking-queries-not-found
rpc: make blocking queries for non-existent items more efficient
2022-02-17 18:09:39 -05:00
Ashwin Venkatesh 6e6cd928a2
Parse datacenter from request (#12370)
* Parse datacenter from request
- Parse the value of the datacenter from the create/delete requests for AuthMethods and BindingRules so that they can be created in and deleted from the datacenters specified in the request.
2022-02-17 16:41:27 -05:00
Kyle Havlovitz 3fe358b831 xds: respect chain protocol on default discovery chain 2022-02-17 11:47:20 -08:00
Florian Apolloner f01f00fc84
Support for connect native services in topology view. (#12098) 2022-02-16 16:51:54 -05:00
Chris S. Kim 154b781bc8
Move IndexEntryName helpers to common files (#12365) 2022-02-16 12:56:38 -05:00
Daniel Nephin 8a6e75ac81 rpc: add errNotFound to all Get queries
Any query that returns a list of items is not part of this commit.
2022-02-15 18:24:34 -05:00
Daniel Nephin 4b33bdf396 Make blockingQuery efficient with 'not found' results.
By using the query results as state.

Blocking queries are efficient when the query matches some results,
because the ModifyIndex of those results, returned as queryMeta.Mindex,
will never change unless the items themselves change.

Blocking queries for non-existent items are not efficient because the
queryMeta.Index can (and often does) change when other entities are
written.

This commit reduces the churn of these queries by using a different
comparison for "has changed". Instead of using the modified index, we
use the existence of the results. If the previous result was "not found"
and the new result is still "not found", we know we can ignore the
modified index and continue to block.

This is done by setting the minQueryIndex to the returned
queryMeta.Index, which prevents the query from returning before a state
change is observed.
2022-02-15 18:24:33 -05:00
Daniel Nephin 897b953f66 Add a test for blocking query on non-existent entry
This test shows how blocking queries are not efficient when the query
returns no results.  The test fails with 100+ calls instead of the
expected 2.

This test is still a bit flaky because it depends on the timing of the
writes. It can sometimes return 3 calls.

A future commit should fix this and make blocking queries even more
optimal for not-found results.
2022-02-15 18:23:17 -05:00
Daniel Nephin 3301f94004 rpc: improve docs for blockingQuery
Follow the Go convention of accepting a small interface that documents
the methods used by the function.

Clarify the rules for implementing a query function passed to
blockingQuery.
2022-02-15 14:20:14 -05:00
R.B. Boyer 115946da99
server: conditionally avoid writing a config entry to raft if it was already the same (#12321)
This will both save on unnecessary raft operations as well as
unnecessarily incrementing the raft modify index of config entries
subject to no-op updates.
2022-02-14 14:39:12 -06:00
FFMMM 78264a8030
Vendor in rpc mono repo for net/rpc fork, go-msgpack, msgpackrpc. (#12311)
This commit syncs ENT changes to the OSS repo.

Original commit details in ENT:

```
commit 569d25f7f4578981c3801e6e067295668210f748
Author: FFMMM <FFMMM@users.noreply.github.com>
Date:   Thu Feb 10 10:23:33 2022 -0800

    Vendor fork net rpc (#1538)

    * replace net/rpc w consul-net-rpc/net/rpc

    Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>

    * replace msgpackrpc and go-msgpack with fork from mono repo

    Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>

    * gofmt all files touched

    Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>
```

Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>
2022-02-14 09:45:45 -08:00
R.B. Boyer 52009ae86a
missed this test adjustment (#12331) 2022-02-14 11:39:00 -06:00
R.B. Boyer fa4577d1a9
local: fixes a data race in anti-entropy sync (#12324)
The race detector noticed this initially in `TestAgentConfigWatcherSidecarProxy` but it is not restricted to just tests.

The two main changes here were:

- ensure that before we mutate the internal `agent/local` representation of a Service (for tags or VIPs) we clone those fields
- ensure that there's no function argument joint ownership between the caller of a function and the local state when calling `AddService`, `AddCheck`, and related using `copystructure` for now.
2022-02-14 10:41:33 -06:00
Dao Thanh Tung add15e12f7
URL-encode/decode resource names for HTTP API part 5 (#12297) 2022-02-14 10:47:06 -05:00
Mark Anderson 1a16f7ee70 Refactor to make ACL errors more structured. (#12308)
* First phase of refactoring PermissionDeniedError

Add extended type PermissionDeniedByACLError that captures information
about the accessor, particular permission type and the object and name
of the thing being checked.

It may be worth folding the test and error return into a single helper
function, that can happen at a later date.

Signed-off-by: Mark Anderson <manderson@hashicorp.com>
2022-02-11 12:53:23 -08:00
Freddy 9580f79f86
Merge pull request #12223 from hashicorp/proxycfg/passthrough-cleanup 2022-02-10 17:35:51 -07:00
freddygv ceb52d649a Account for upstream targets in another DC.
Transparent proxies typically cannot dial upstreams in remote
datacenters. However, if their upstream configures a redirect to a
remote DC then the upstream targets will be in another datacenter.

In that sort of case we should use the WAN address for the passthrough.
2022-02-10 17:01:57 -07:00
freddygv cbea3d203c Fix race of upstreams with same passthrough ip
Due to timing, a transparent proxy could have two upstreams to dial
directly with the same address.

For example:
- The orders service can dial upstreams shipping and payment directly.
- An instance of shipping at address 10.0.0.1 is deregistered.
- Payments is scaled up and scheduled to have address 10.0.0.1.
- The orders service receives the event for the new payments instance
before seeing the deregistration for the shipping instance. At this
point two upstreams have the same passthrough address and Envoy will
reject the listener configuration.

To disambiguate this commit considers the Raft index when storing
passthrough addresses. In the example above, 10.0.0.1 would only be
associated with the newer payments service instance.
2022-02-10 17:01:57 -07:00
freddygv 659ebc05a9 Ensure passthrough addresses get cleaned up
Transparent proxies can set up filter chains that allow direct
connections to upstream service instances. Services that can be dialed
directly are stored in the PassthroughUpstreams map of the proxycfg
snapshot.

Previously these addresses were not being cleaned up based on new
service health data. The list of addresses associated with an upstream
service would only ever grow.

As services scale up and down, eventually they will have instances
assigned to an IP that was previously assigned to a different service.
When IP addresses are duplicated across filter chain match rules the
listener config will be rejected by Envoy.

This commit updates the proxycfg snapshot management so that passthrough
addresses can get cleaned up when no longer associated with a given
upstream.

There is still the possibility of a race condition here where due to
timing an address is shared between multiple passthrough upstreams.
That concern is mitigated by #12195, but will be further addressed
in a follow-up.
2022-02-10 17:01:57 -07:00
Freddy 378a7258e3
Prevent xDS tight loop on cfg errors (#12195) 2022-02-10 15:37:36 -07:00
Dhia Ayachi 4f0a71d7b4
fix race when starting a service while the agent `serviceManager` is … (#12302)
* fix race when starting a service while the agent `serviceManager` is stopping

* add changelog
2022-02-10 13:30:49 -05:00
Daniel Nephin 01784470f3
Merge pull request #12277 from hashicorp/dnephin/panic-in-service-register
catalog: initialize the refs map to prevent a nil panic
2022-02-09 19:48:22 -05:00
Daniel Nephin 82c264b2b3 config-entry: fix a panic when registering a service or ingress gateway 2022-02-09 18:49:48 -05:00
R.B. Boyer 89bd1f57b5
xds: allow only one outstanding delta request at a time (#12236)
Fixes #11876

This enforces that multiple xDS mutations are not issued on the same ADS connection at once, so that we can 100% control the order that they are applied. The original code made assumptions about the way multiple in-flight mutations were applied on the Envoy side that was incorrect.
2022-02-08 10:36:48 -06:00
Daniel Nephin 7ec658b7ac
Merge pull request #12265 from hashicorp/dnephin/logging-in-tests
sdk: add TestLogLevel for setting log level in tests
2022-02-07 16:11:23 -05:00
Daniel Nephin 437f769916 A test to reproduce the issue 2022-02-04 14:04:12 -05:00
Daniel Nephin 51b0f82d0e Make test more readable
And fix typo
2022-02-03 18:44:09 -05:00
Daniel Nephin 608597c7b6 ca: relax and move private key type/bit validation for vault
This commit makes two changes to the validation.

Previously we would call this validation in GenerateRoot, which happens
both on initialization (when a follower becomes leader), and when a
configuration is updated. We only want to do this validation during
config update so the logic was moved to the UpdateConfiguration
function.

Previously we would compare the config values against the actual cert.
This caused problems when the cert was created manually in Vault (not
created by Consul).  Now we compare the new config against the previous
config. Using a already created CA cert should never error now.

Adding the key bit and types to the config should only error when
the previous values were not the defaults.
2022-02-03 17:21:20 -05:00
Daniel Nephin d707173253 ca: small cleanup of TestConnectCAConfig_Vault_TriggerRotation_Fails
Before adding more test cases
2022-02-03 17:21:20 -05:00
Daniel Nephin 3f590bb8a1 testing: fix test failures caused by new log level
These two tests require debug logging enabled, because they look for log lines.

Also switched to testify assertions because the previous errors were not clear.
2022-02-03 17:07:39 -05:00
Daniel Nephin b058845110 sdk: add TestLogLevel for setting log level in tests
And default log level to WARN.
2022-02-03 13:42:28 -05:00
Daniel Nephin 7839b2d7e0 ca: add a test that uses an intermediate CA as the primary CA
This test found a bug in the secondary. We were appending the root cert
to the PEM, but that cert was already appended. This was failing
validation in Vault here:
https://github.com/hashicorp/vault/blob/sdk/v0.3.0/sdk/helper/certutil/types.go#L329

Previously this worked because self signed certs have the same
SubjectKeyID and AuthorityKeyID. So having the same self-signed cert
repeated doesn't fail that check.

However with an intermediate that is not self-signed, those values are
different, and so we fail the check. A test I added in a previous commit
should show that this continues to work with self-signed root certs as
well.
2022-02-02 13:41:35 -05:00
Daniel Nephin ac732ce82b acl: un-embed ACLIdentity
This is safer than embedding two interface because there are a number of
places where we check the concrete type. If we check the concrete type
on the top-level interface it will fail. So instead expose the
ACLIdentity from a method.
2022-02-02 12:07:31 -05:00
Daniel Nephin 9d80c1886a
Merge pull request #12167 from hashicorp/dnephin/acl-resolve-token-3
acl: rename ResolveTokenToIdentityAndAuthorizer to ResolveToken
2022-01-31 19:21:06 -05:00
Daniel Nephin 997bf1e5a4
Merge pull request #12166 from hashicorp/dnephin/acl-resolve-token-2
acl: remove ResolveTokenToIdentity
2022-01-31 19:19:21 -05:00
Daniel Nephin 343b6deb79 acl: rename ResolveTokenToIdentityAndAuthorizer to ResolveToken
This change allows us to remove one of the last remaining duplicate
resolve token methods (Server.ResolveToken).

With this change we are down to only 2, where the second one also
handles setting the default EnterpriseMeta from the token.
2022-01-31 18:04:19 -05:00
Daniel Nephin d363cc0f07 acl: remove unused methods on fakes, and add changelog
Also document the metric that was removed in a previous commit.
2022-01-31 17:53:53 -05:00
Daniel Nephin b2b84e7fc6
Merge pull request #12165 from hashicorp/dnephin/acl-resolve-token
acl: remove some of the duplicate resolve token methods
2022-01-31 13:27:49 -05:00
Mathew Estafanous c5d2bea92c
Change error-handling across handlers. (#12225) 2022-01-31 11:17:35 -05:00
Fulvio 66f0173355
URL-encode/decode resource names for HTTP API part 4 (#12190) 2022-01-28 15:01:47 -05:00
Dan Upton fdfe079674
streaming: split event buffer by key (#12080) 2022-01-28 12:27:00 +00:00
freddygv c31c1158a6 Add failing test
The updated test fails because passthrough upstream addresses are not
being cleaned up.
2022-01-27 18:56:47 -07:00
Daniel Nephin 9b7468f99e ca/provider: remove ActiveRoot from Provider 2022-01-27 13:07:37 -05:00
Daniel Nephin c2b9c81a55 ca: update MockProvider for new interface 2022-01-27 12:51:35 -05:00
Daniel Nephin f05bad4a1d ca: update GenerateRoot godoc 2022-01-27 12:51:35 -05:00
Daniel Nephin 9a59733b7d
Merge pull request #11663 from hashicorp/dnephin/ca-remove-one-call-to-active-root-2
ca: remove second call to Provider.ActiveRoot
2022-01-27 12:41:05 -05:00
Daniel Nephin db0478265b
Merge pull request #12109 from hashicorp/dnephin/blocking-query-1
rpc: make blockingQuery easier to read
2022-01-26 18:13:55 -05:00
Daniel Nephin 7a6e03c19b acl: Remove a call to aclAccessorID
I missed this on the first pass, we no longer need to look up this ID, because we have it
from the Authorizer.
2022-01-26 17:21:45 -05:00
Daniel Nephin 7125fec346
Merge pull request #11221 from hashicorp/dnephin/acl-resolver-5
acl: extract a backend type for the ACLResolverBackend
2022-01-26 16:57:03 -05:00
Dao Thanh Tung 759dd93544
URL-encode/decode resource names for HTTP API part 3 (#12103) 2022-01-26 13:12:42 -05:00
Daniel Nephin f9aef8018b Apply suggestions from code review
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
2022-01-26 12:24:13 -05:00
Daniel Nephin 737c0097e0 acl: extract a backend type for the ACLResolverBackend
This is a small step to isolate the functionality that is used for the
ACLResolver from the large Client and Server structs.
2022-01-26 12:24:10 -05:00
R.B. Boyer d2c0945f52
xds: fix for delta xDS reconnect bug in LDS/CDS (#12174)
When a wildcard xDS type (LDS/CDS/SRDS) reconnects from a delta xDS stream,
prior to envoy `1.19.0` it would populate the `ResourceNamesSubscribe` field
with the full list of currently subscribed items, instead of simply omitting it
to infer that it wanted everything (which is what wildcard mode means).

This upstream issue was filed in envoyproxy/envoy#16063 and fixed in
envoyproxy/envoy#16153 which went out in Envoy `1.19.0` and is fixed in later
versions (later refactored in envoyproxy/envoy#16855).

This PR conditionally forces LDS/CDS to be wildcard-only even when the
connected Envoy requests a non-wildcard subscription, but only does so on
versions prior to `1.19.0`, as we should not need to do this on later versions.

This fixes the failure case as described here: #11833 (comment)

Co-authored-by: Huan Wang <fredwanghuan@gmail.com>
2022-01-25 11:24:27 -06:00
Daniel Nephin e134e43da6 acl: remove calls to ResolveIdentityFromToken
We already have an ACLResolveResult, so we can get the accessor ID from
it.
2022-01-22 15:05:42 -05:00
Daniel Nephin edca8d61a3 acl: remove ResolveTokenToIdentity
By exposing the AccessorID from the primary ResolveToken method we can
remove this duplication.
2022-01-22 14:47:59 -05:00
Daniel Nephin a5e8af79c3 acl: return a resposne from ResolveToken that includes the ACLIdentity
So that we can duplicate duplicate methods.
2022-01-22 14:33:09 -05:00
Daniel Nephin 8c9c48e219 acl: remove duplicate methods
Now that ACLResolver is embedded we don't need ResolveTokenToIdentity on
Client and Server.

Moving ResolveTokenAndDefaultMeta to ACLResolver removes the duplicate
implementation.
2022-01-22 14:12:08 -05:00
Daniel Nephin 241663a046 acl: embed ACLResolver in Client and Server
In preparation for removing duplicate resolve token methods.
2022-01-22 14:07:26 -05:00
Chris S. Kim bee18f4a1d
Generate bindata_assetfs.go (#12146) 2022-01-21 16:06:44 -05:00
R.B. Boyer b60d89e7ef bulk rewrite using this script
set -euo pipefail

    unset CDPATH

    cd "$(dirname "$0")"

    for f in $(git grep '\brequire := require\.New(' | cut -d':' -f1 | sort -u); do
        echo "=== require: $f ==="
        sed -i '/require := require.New(t)/d' $f
        # require.XXX(blah) but not require.XXX(tblah) or require.XXX(rblah)
        sed -i 's/\brequire\.\([a-zA-Z0-9_]*\)(\([^tr]\)/require.\1(t,\2/g' $f
        # require.XXX(tblah) but not require.XXX(t, blah)
        sed -i 's/\brequire\.\([a-zA-Z0-9_]*\)(\(t[^,]\)/require.\1(t,\2/g' $f
        # require.XXX(rblah) but not require.XXX(r, blah)
        sed -i 's/\brequire\.\([a-zA-Z0-9_]*\)(\(r[^,]\)/require.\1(t,\2/g' $f
        gofmt -s -w $f
    done

    for f in $(git grep '\bassert := assert\.New(' | cut -d':' -f1 | sort -u); do
        echo "=== assert: $f ==="
        sed -i '/assert := assert.New(t)/d' $f
        # assert.XXX(blah) but not assert.XXX(tblah) or assert.XXX(rblah)
        sed -i 's/\bassert\.\([a-zA-Z0-9_]*\)(\([^tr]\)/assert.\1(t,\2/g' $f
        # assert.XXX(tblah) but not assert.XXX(t, blah)
        sed -i 's/\bassert\.\([a-zA-Z0-9_]*\)(\(t[^,]\)/assert.\1(t,\2/g' $f
        # assert.XXX(rblah) but not assert.XXX(r, blah)
        sed -i 's/\bassert\.\([a-zA-Z0-9_]*\)(\(r[^,]\)/assert.\1(t,\2/g' $f
        gofmt -s -w $f
    done
2022-01-20 10:46:23 -06:00
R.B. Boyer 31f6f55bbe test: normalize require.New and assert.New syntax 2022-01-20 10:45:56 -06:00
R.B. Boyer 424f3cdd2c
proxycfg: introduce explicit UpstreamID in lieu of bare string (#12125)
The gist here is that now we use a value-type struct proxycfg.UpstreamID
as the map key in ConfigSnapshot maps where we used to use "upstream
id-ish" strings. These are internal only and used just for bidirectional
trips through the agent cache keyspace (like the discovery chain target
struct).

For the few places where the upstream id needs to be projected into xDS,
that's what (proxycfg.UpstreamID).EnvoyID() is for. This lets us ALWAYS
inject the partition and namespace into these things without making
stuff like the golden testdata diverge.
2022-01-20 10:12:04 -06:00
Dan Upton ca3aca92c4
[OSS] Remove remaining references to master (#11827) 2022-01-20 12:47:50 +00:00
VictorBac 31a39c9528
Add GRPC and GRPCUseTLS to api.HealthCheckDefinition (#12108)
* Add GRPC to HealthCheckDefinition

* add GRPC and GRPCUseTLS
2022-01-19 16:09:15 -05:00
Evan Culver e35dd08a63
connect: Upgrade Envoy 1.20 to 1.20.1 (#11895) 2022-01-18 14:35:27 -05:00
Daniel Nephin 71767f1b3e rpc: cleanup exit and blocking condition logic in blockingQuery
Remove some unnecessary comments around query_blocking metric. The only
line that needs any comments in the atomic decrement.

Cleanup the block and return comments and logic. The old comment about
AbandonCh may have been relevant before, but it is expected behaviour
now.

The logic was simplified by inverting the err condition.
2022-01-17 16:59:25 -05:00
Daniel Nephin 72a733bed8 rpc: extract rpcQueryTimeout method
This helps keep the logic in blockingQuery more focused. In the future we
may have a separate struct for RPC queries which may allow us to move this
off of Server.
2022-01-17 16:59:25 -05:00
Daniel Nephin fd0a9fd4f3 rpc: move the index defaulting to setQueryMeta.
This safeguard should be safe to apply in general. We are already
applying it to non-blocking queries that call blockingQuery, so it
should be fine to apply it to others.
2022-01-17 16:59:25 -05:00
Daniel Nephin 4b67d6c18b rpc: add subtests to blockingQuery test 2022-01-17 16:59:25 -05:00
Daniel Nephin f92dc11002 rpc: refactor blocking query
To remove the TODO, and make it more readable.

In general this reduces the scope of variables, making them easier to reason about.
It also introduces more early returns so that we can see the flow from the structure
of the function.
2022-01-17 16:58:47 -05:00
Daniel Nephin f31e0b8b1a
Merge pull request #11661 from hashicorp/dnephin/ca-remove-one-call-to-active-root
ca: remove one call to Provider.ActiveRoot
2022-01-13 16:48:12 -05:00
Kyle Havlovitz 0db874c38b Add virtual IP generation for term gateway backed services 2022-01-12 12:08:49 -08:00
Chris S. Kim 98ea6d1cf1
Fix race with tags (#12041) 2022-01-12 11:24:51 -05:00
Chris S. Kim a0acf9978f
Fix races in anti-entropy tests (#12028) 2022-01-11 14:28:51 -05:00
Mike Morris 1b1a97e8f9
ingress: allow setting TLS min version and cipher suites in ingress gateway config entries (#11576)
* xds: refactor ingress listener SDS configuration

* xds: update resolveListenerSDS call args in listeners_test

* ingress: add TLS min, max and cipher suites to GatewayTLSConfig

* xds: implement envoyTLSVersions and envoyTLSCipherSuites

* xds: merge TLS config

* xds: configure TLS parameters with ingress TLS context from leaf

* xds: nil check in resolveListenerTLSConfig validation

* xds: nil check in makeTLSParameters* functions

* changelog: add entry for TLS params on ingress config entries

* xds: remove indirection for TLS params in TLSConfig structs

* xds: return tlsContext, nil instead of ambiguous err

Co-authored-by: Chris S. Kim <ckim@hashicorp.com>

* xds: switch zero checks to types.TLSVersionUnspecified

* ingress: add validation for ingress config entry TLS params

* ingress: validate listener TLS config

* xds: add basic ingress with TLS params tests

* xds: add ingress listeners mixed TLS min version defaults precedence test

* xds: add more explicit tests for ingress listeners inheriting gateway defaults

* xds: add test for single TLS listener on gateway without TLS defaults

* xds: regen golden files for TLSVersionInvalid zero value, add TLSVersionAuto listener test

* types/tls: change TLSVersion to string

* types/tls: update TLSCipherSuite to string type

* types/tls: implement validation functions for TLSVersion and TLSCipherSuites, make some maps private

* api: add TLS params to GatewayTLSConfig, add tests

* api: add TLSMinVersion to ingress gateway config entry test JSON

* xds: switch to Envoy TLS cipher suite encoding from types package

* xds: fixup validation for TLSv1_3 min version with cipher suites

* add some kitchen sink tests and add a missing struct tag

* xds: check if mergedCfg.TLSVersion is in TLSVersionsWithConfigurableCipherSuites

* xds: update connectTLSEnabled comment

* xds: remove unsued resolveGatewayServiceTLSConfig function

 * xds: add makeCommonTLSContextFromLeafWithoutParams

* types/tls: add LessThan comparator function for concrete values

* types/tls: change tlsVersions validation map from string to TLSVersion keys

* types/tls: remove unused envoyTLSCipherSuites

* types/tls: enable chacha20 cipher suites for Consul agent

* types/tls: remove insecure cipher suites from allowed config

TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256 and TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256 are both explicitly listed as insecure and disabled in the Go source.

Refs https://cs.opensource.google/go/go/+/refs/tags/go1.17.3:src/crypto/tls/cipher_suites.go;l=329-330

* types/tls: add ValidateConsulAgentCipherSuites function, make direct lookup map private

* types/tls: return all unmatched cipher suites in validation errors

* xds: check that Envoy API value matching TLS version is found when building TlsParameters

* types/tls: check that value is found in map before appending to slice in MarshalEnvoyTLSCipherSuiteStrings

* types/tls: cast to string rather than fmt.Printf in TLSCihperSuite.String()

* xds: add TLSVersionUnspecified to list of configurable cipher suites

* structs: update note about config entry warning

* xds: remove TLS min version cipher suite unconfigurable test placeholder

* types/tls: update tests to remove assumption about private map values

Co-authored-by: R.B. Boyer <rb@hashicorp.com>
2022-01-11 11:46:42 -05:00
Dao Thanh Tung 88c7cfa578
URL-encode/decode resource names for HTTP API part 2 (#11957) 2022-01-11 08:52:45 -05:00
Daniel Nephin d57dec5878 ca: remove unnecessary var, and slightly reduce cyclo complexity
`newIntermediate` is always equal to `needsNewIntermediate`, so we can
remove the extra variable and use the original directly.

Also remove the `activeRoot.ID != newActiveRoot.ID` case from an if,
because that case is already checked above, and `needsNewIntermediate` will
already be true in that case.

This condition now reads a lot better:

> Persist a new root if we did not have one before, or if generated a new intermediate.
2022-01-06 16:56:49 -05:00
Daniel Nephin 0de7efb316 ca: remove unused provider.ActiveRoot call
In the previous commit the single use of this storedRoot was removed.

In this commit the original objective is completed. The
Provider.ActiveRoot is being removed because

1. the secondary should get the active root from the Consul primary DC,
   not the provider, so that secondary DCs do not need to communicate
   with a provider instance in a different DC.
2. so that the Provider.ActiveRoot interface can be changed without
   impacting other code paths.
2022-01-06 16:56:48 -05:00
Daniel Nephin d0578c6dfc ca: extract the lookup of the active primary CA
This method had only one caller, which always looked for the active
root. This commit moves the lookup into the method to reduce the logic
in the one caller.

This is being done in preparation for a larger change. Keeping this
separate so it is easier to see.

The `storedRootID != primaryRoots.ActiveRootID` is being removed because
these can never be different.

The `storedRootID` comes from `provider.ActiveRoot`, the
`primaryRoots.ActiveRootID` comes from the store `CARoot` from the
primary. In both cases the source of the data is the primary DC.

Technically they could be different if someone modified the provider
outside of Consul, but that would break many things, so is not a
supported flow.

If these were out of sync because of ordering of events then the
secondary will soon receive an update to `primaryRoots` and everything
will be sorted out again.
2022-01-06 16:56:48 -05:00
Daniel Nephin 7121c78d34 ca: update godoc
To clarify what to expect from the data stored in this field, and the
behaviour of this function.
2022-01-06 16:56:48 -05:00
Daniel Nephin abac8baa5d ca: remove one call to provider.ActiveRoot
ActiveRoot should not be called from the secondary DC, because there
should not be a requirement to run the same Vault instance in a
secondary DC. SignIntermediate is called in a secondary DC, so it should
not call ActiveRoot

We would also like to change the interface of ActiveRoot so that we can
support using an intermediate cert as the primary CA in Consul. In
preparation for making that change I am reducing the number of calls to
ActiveRoot, so that there are fewer code paths to modify when the
interface changes.

This change required a change to the mockCAServerDelegate we use in
tests. It was returning the RootCert for SignIntermediate, but that is
not an accurate fake of production. In production this would also be a
separate cert.
2022-01-06 16:55:50 -05:00
Daniel Nephin eaa084fd41 ca: remove redundant append of an intermediate cert
Immediately above this line we are already appending the full list of
intermediates. The `provider.ActiveIntermediate` MUST be in this list of
intermediates because it must be available to all the other non-leader
Servers.  If it was not in this list of intermediates then any proxy
that received data from a non-leader would have the wrong certs.

This is being removed now because we are planning on changing the
`Provider.ActiveIntermediate` interface, and removing these extra calls ahead of
time helps make that change easier.
2022-01-06 16:55:50 -05:00
Daniel Nephin 11f4cdaa49 ca: only generate a single private key for the whole test case
Using tracing and cpu profiling I found that the majority of the time in
these test cases is spent generating a private key. We really don't need
separate private keys, so we can generate only one and use it for all
cases.

With this change the test runs much faster.
2022-01-06 16:55:50 -05:00
Daniel Nephin b3ffe7ac72 ca: cleanup a test
Fix the name to match the function it is testing

Remove unused code

Fix the signature, instead of returning (error, string) which should be (string, error)
accept a testing.T to emit errors.

Handle the error from encode.
2022-01-06 16:55:49 -05:00
Daniel Nephin 1fd6b16399 ca: use the new leaf signing lookup func in leader metrics 2022-01-06 16:55:49 -05:00
Blake Covarrubias 4bd92921f4
api: Return 404 when deregistering a non-existent check (#11950)
Update the `/agent/check/deregister/` API endpoint to return a 404
HTTP response code when an attempt is made to de-register a check ID
that does not exist on the agent.

This brings the behavior of /agent/check/deregister/ in line with the
behavior of /agent/service/deregister/ which was changed in #10632 to
similarly return a 404 when de-registering non-existent services.

Fixes #5821
2022-01-06 12:38:37 -08:00
Dhia Ayachi 1eac39ae9c
clone the service under lock to avoid a data race (#11940)
* clone the service under lock to avoid a data race

* add change log

* create a struct and copy the pointer to mutate it to avoid a data race

* fix failing test

* revert added space

* add comments, to clarify the data race.
2022-01-06 14:33:06 -05:00
Daniel Nephin 065f6f89fb
Merge pull request #11918 from hashicorp/dnephin/tob-followup
Fix a few small bugs
2022-01-05 18:50:48 -05:00
Daniel Nephin abfc1e4840 snapshot: return the error from replyFn
The only function passed to SnapshotRPC today always returns a nil error, so there's no
way to exercise this bug in practice. This change is being made for correctness so that
it doesn't become a problem in the future, if we ever pass a different function to
SnapshotRPC.
2022-01-05 17:51:03 -05:00
Daniel Nephin 0166b0839c config: correctly capture all errors.
Some calls to multierror.Append were not using the existing b.err, which meant we
were losing all previous errors.
2022-01-05 17:51:03 -05:00
Chris S. Kim 4cd2542a3e
Fix test for ENT (#11946) 2022-01-05 15:18:08 -05:00
Chris S. Kim e4bcaac08c
Fix test for ENT (#11941) 2022-01-05 12:24:44 -05:00
Dhia Ayachi e653f81919
reset `coalesceTimer` to nil as soon as the event is consumed (#11924)
* reset `coalesceTimer` to nil as soon as the event is consumed

* add change log

* refactor to add relevant test.

* fix linter

* Apply suggestions from code review

Co-authored-by: Freddy <freddygv@users.noreply.github.com>

* remove non needed check

Co-authored-by: Freddy <freddygv@users.noreply.github.com>
2022-01-05 12:17:47 -05:00
Mathew Estafanous 0fdd1318e9
Ensure consistency with error-handling across all handlers. (#11599) 2022-01-05 12:11:03 -05:00
Jared Kirschner b393c90ce7 Clarify service and check error messages (use ID)
Error messages related to service and check operations previously included
the following substrings:
- service %q
- check %q

From this error message, it isn't clear that the expected field is the ID for
the entity, not the name. For example, if the user has a service named test,
the error message would read 'Unknown service "test"'. This is misleading -
a service with that *name* does exist, but not with that *ID*.

The substrings above have been modified to make it clear that ID is needed,
not name:
- service with ID %q
- check with ID %q
2022-01-04 11:42:37 -08:00
Jared Kirschner a36ddc31c7
Merge pull request #11335 from littlestar642/url-encoded-args
URL-encode/decode resource names for HTTP API
2022-01-04 14:00:14 -05:00
Chris S. Kim 30550f2c63
testing: Revert assertion for virtual IP flag (#11932) 2022-01-04 11:24:56 -05:00
Jared Kirschner e0ddb9e4c5
Merge pull request #11820 from hashicorp/improve-ui-disabled-api-response
http: improve UI not enabled response message
2022-01-03 12:00:01 -05:00
littlestar642 634c72d22f add path escape and unescape to path params 2022-01-03 08:18:32 -08:00
Daniel Nephin 1683da66b0
Merge pull request #11796 from hashicorp/dnephin/cleanup-test-server
testing: stop using an old version in testServer
2021-12-22 16:04:04 -05:00
freddygv 21f2c2e68d Purge chain if it shouldn't be there 2021-12-13 18:56:44 -07:00
freddygv fe85138453 additional test fixes 2021-12-13 18:56:44 -07:00
freddygv d26b4860fd Account for new upstreams constraint in tests 2021-12-13 18:56:28 -07:00
freddygv 2fe27b748d Check ingress upstreams when gating chain watches 2021-12-13 18:56:28 -07:00
freddygv 6814e84459 Use ptr receiver in all Upstream methods 2021-12-13 18:56:14 -07:00
freddygv 6af9a0d8cf Avoid storing chain without an upstream 2021-12-13 18:56:14 -07:00
freddygv ba12dc215b Clean up chains separately from their watches 2021-12-13 18:56:14 -07:00
freddygv c5c290c503 Validate chains are associated with upstreams
Previously we could get into a state where discovery chain entries were
not cleaned up after the associated watch was cancelled. These changes
add handling for that case where stray chain references are encountered.
2021-12-13 18:56:13 -07:00
freddygv 70d6358426 Store intention upstreams in snapshot 2021-12-13 18:56:13 -07:00
R.B. Boyer 81ea8129d7
proxycfg: ensure all of the watches are canceled if they are cancelable (#11824) 2021-12-13 15:56:17 -06:00
Jared Kirschner f81dd817ff
Merge pull request #11818 from hashicorp/improve-url-not-found-response
http: improve 404 Not Found response message
2021-12-13 16:08:50 -05:00
R.B. Boyer 4aabbe529c
proxycfg: use external addresses in tproxy when crossing partition boundaries (#11823) 2021-12-13 14:34:49 -06:00
Jared Kirschner 2de79abc00 http: improve 404 Not Found response message
When a URL path is not found, return a non-empty message with the 404 status
code to help the user understand what went wrong. If the URL path was not
prefixed with '/v1/', suggest that may be the cause of the problem (which is a
common mistake).
2021-12-13 11:03:25 -08:00
Freddy 85fe875d07
Use anonymousToken when querying by secret ID (#11813)
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: Dan Upton <daniel@floppy.co>

This query has been incorrectly querying by accessor ID since New ACLs
were added. However, the legacy token compat allowed this to continue to
work, since it made a fallback query for the anonymousToken ID.

PR #11184 removed this legacy token query, which means that the query by
accessor ID is now the only check for the anonymous token's existence.

This PR updates the GetBySecret call to use the secret ID of the token.
2021-12-13 10:56:09 -07:00
R.B. Boyer 631c649291
various partition related todos (#11822) 2021-12-13 11:43:33 -06:00
Jared Kirschner 34ea9ae8c9 http: improve UI not enabled response message
Response now clearly indicates:
- the UI is disabled
- how to enable the UI
2021-12-13 08:48:33 -08:00
Kyle Havlovitz b50ef696c6
Merge pull request #11812 from hashicorp/metrics-ui-acls
oss: use wildcard partition in metrics proxy ui endpoint
2021-12-10 16:24:47 -08:00
Kyle Havlovitz 9dcaf0539c
Merge pull request #11798 from hashicorp/vip-goroutine-check
leader: move the virtual IP version check into a goroutine
2021-12-10 15:59:35 -08:00
Kyle Havlovitz 018693b6ee acl: use wildcard partition in metrics proxy ui endpoint 2021-12-10 15:58:17 -08:00
Kyle Havlovitz 80a4489844 state: fix freed VIP table id index 2021-12-10 14:41:45 -08:00
Kyle Havlovitz ecbd3eb2a6 Exit before starting the vip check routine if possible 2021-12-10 14:30:50 -08:00
Daniel Nephin 0a9cb62859 testing: Deprecate functions for creating a server.
These helper functions actually end up hiding important setup details
that should be visible from the test case. We already have a convenient
way of setting this config when calling newTestServerWithConfig.
2021-12-09 20:09:29 -05:00
Daniel Nephin c9a992f5e8 testing: remove old config.Build version
DefaultConfig already sets the version to version.Version, so by removing this
our tests will run with the version that matches the code.
2021-12-09 20:09:29 -05:00
Kyle Havlovitz 04ef1c3fa0 leader: move the virtual IP version check into a goroutine 2021-12-09 17:00:33 -08:00
FFMMM 74eb257b1c
[sync ent] increase segment max limit to 4*64, make configurable (#1424) (#11795)
* commit b6eb27563e747a78b7647d2b5da405e46364cc46
Author: FFMMM <FFMMM@users.noreply.github.com>
Date:   Thu Dec 9 13:53:44 2021 -0800

    increase segment max limit to 4*64, make configurable (#1424)

    Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>

* fix: rename ent changelog file

Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>
2021-12-09 15:36:11 -08:00
Daniel Nephin f9647ece05
Merge pull request #11780 from hashicorp/dnephin/ca-test-vault-in-secondary
ca: improve test coverage for RenewIntermediate
2021-12-09 12:29:43 -05:00
R.B. Boyer bb75e63eb4
agent: ensure service maintenance checks for matching partitions ahead of other errors (#11788)
This matches behavior in most other agent api endpoints.
2021-12-09 10:05:02 -06:00
Daniel Nephin 4116a143e0 fix misleading errors on vault shutdown 2021-12-08 18:42:52 -05:00
Daniel Nephin 968aeff1bb ca: prune some unnecessary lookups in the tests 2021-12-08 18:42:52 -05:00
Daniel Nephin 305655a8b1 ca: remove duplicate WaitFor function 2021-12-08 18:42:52 -05:00
Daniel Nephin 1dec6bb815 ca: fix flakes in RenewIntermediate tests
I suspect one problem was that we set structs.IntermediateCertRenewInterval to 1ms, which meant
that in some cases the intermediate could renew before we stored the original value.

Another problem was that the 'wait for intermediate' loop was calling the provider.ActiveIntermediate,
but the comparison needs to use the RPC endpoint to accurately represent a user request. So
changing the 'wait for' to use the state store ensures we don't race.

Also moves the patching into a separate function.

Removes the addition of ca.CertificateTimeDriftBuffer as part of calculating halfTime. This was added
in a previous commit to attempt to fix the flake, but it did not appear to fix the problem. Adding the
time here was making the tests fail when using the shared patch
function. It's not clear to me why, but there's no reason we should be
including this time in the halfTime calculation.
2021-12-08 18:42:52 -05:00
Daniel Nephin 2e4e8bd791 ca: improve RenewIntermediate tests
Use the new verifyLearfCert to show the cert verifies with intermediates
from both sources. This required using the RPC interface so that the
leaf pem was constructed correctly.

Add IndexedCARoots.Active since that is a common operation we see in a
few places.
2021-12-08 18:42:52 -05:00
Daniel Nephin a4ba1f348d ca: add a test for Vault in secondary DC 2021-12-08 18:42:51 -05:00
Daniel Nephin a5d9b1d322 ca: Add CARoots.Active method
Which will be used in the next commit.
2021-12-08 18:41:51 -05:00
R.B. Boyer 5f5720837b
acl: ensure that the agent recovery token is properly partitioned (#11782) 2021-12-08 17:11:55 -06:00
Daniel Nephin f72e285fe8
Merge pull request #11721 from hashicorp/dnephin/ca-export-fsm-operation
ca: use the real FSM operation in tests
2021-12-08 17:49:00 -05:00
Daniel Nephin 214dcf8d0d ca: use the real FSM operation in tests
Previously we had a couple copies that reproduced the FSM operation.
These copies introduce risk that the test does not accurately match
production.

This PR removes the test versions of the FSM operation, and exports the
real production FSM operation so that it can be used in tests.

The consul provider tests did need to change because of this. Previously
we would return a hardcoded value of 2, but in production this value is
always incremented.
2021-12-08 17:29:44 -05:00
R.B. Boyer 592ac8f96a
test: test server should auto cleanup (#11779) 2021-12-08 13:26:06 -06:00
Evan Culver 7a365fa0da
rpc: Unset partition before forwarding to remote datacenter (#11758) 2021-12-08 11:02:14 -08:00
Daniel Nephin dccd3f5806 Merge remote-tracking branch 'origin/main' into serve-panic-recovery 2021-12-07 16:30:41 -05:00
Dan Upton 7efab269c0
Rename `Master` and `AgentMaster` fields in config protobuf (#11764) 2021-12-07 19:59:38 +00:00
Chris S. Kim f8f8580ab2
Godocs updates for catalog endpoints (#11716) 2021-12-07 10:18:28 -05:00
Mathew Estafanous 0a9621ec7a
Transition all endpoint tests in agent_endpoint_test.go to go through ServeHTTP (#11499) 2021-12-07 09:44:03 -05:00
Dan Upton 205ce9a69d
Remove references to "master" ACL tokens in tests (#11751) 2021-12-07 12:48:50 +00:00
Dan Upton 7fe81171d9
Rename `ACLMasterToken` => `ACLInitialManagementToken` (#11746) 2021-12-07 12:39:28 +00:00
Dan Upton 3a91815169
agent/token: rename `agent_master` to `agent_recovery` (internally) (#11744) 2021-12-07 12:12:47 +00:00
R.B. Boyer 9315a9812f return the max 2021-12-06 15:36:52 -06:00
freddygv 60fe5f75bb Remove support for failover to partition
Failing over to a partition is more siimilar to failing over to another
datacenter than it is to failing over to a namespace. In a future
release we should update how localities for failover are specified. We
should be able to accept a list of localities which can include both
partition and datacenter.
2021-12-06 12:32:24 -07:00
freddygv 5c1f7aa372 Allow cross-partition references in disco chain
* Add partition fields to targets like service route destinations
* Update validation to prevent cross-DC + cross-partition references
* Handle partitions when reading config entries for disco chain
* Encode partition in compiled targets
2021-12-06 12:32:19 -07:00
R.B. Boyer b1605639fc
light refactors to support making partitions and serf-based wan federation are mutually exclusive (#11755) 2021-12-06 13:18:02 -06:00
R.B. Boyer e20e6348dd
areas: make the gRPC server tracker network area aware (#11748)
Fixes a bug whereby servers present in multiple network areas would be
properly segmented in the Router, but not in the gRPC mirror. This would
lead servers in the current datacenter leaving from a network area
(possibly during the network area's removal) from deleting their own
records that still exist in the standard WAN area.

The gRPC client stack uses the gRPC server tracker to execute all RPCs,
even those targeting members of the current datacenter (which is unlike
the net/rpc stack which has a bypass mechanism).

This would manifest as a gRPC method call never opening a socket because
it would block forever waiting for the current datacenter's pool of
servers to be non-empty.
2021-12-06 09:55:54 -06:00
Freddy a725f06c83
Merge pull request #11739 from hashicorp/ap/exports-rename 2021-12-06 08:20:50 -07:00
freddygv e91509383f Clean up additional refs to partition exports 2021-12-04 15:16:40 -07:00
freddygv ed6076db26 Rename partition-exports to exported-services
Using a name less tied to partitions gives us more flexibility to use
this config entry in OSS for exports between datacenters/meshes.
2021-12-03 17:47:31 -07:00
freddygv f5b25401b3 Update intention topology to use new table 2021-12-03 17:28:31 -07:00
freddygv 55970c6ccd Avoid updating default decision from wildcard ixn
Given that we do not allow wildcard partitions in intentions, no one ixn
can override the DefaultAllow setting. Only the default ACL policy
applies across all partitions.
2021-12-03 17:28:12 -07:00
freddygv 497aab669f Add a new table to query service names by kind
This table purposefully does not index by partition/namespace. It's a
global view into all service names.

This table is intended to replace the current serviceListTxn watch in
intentionTopologyTxn. For cross-partition transparent proxying we need
to be able to calculate upstreams from intentions in any partition. This
means that the existing serviceListTxn function is insufficient since
it's scoped to a partition.

Moving away from that function is also beneficial because it watches the
main "services" table, so watchers will wake up when any instance is
registered or deregistered.
2021-12-03 17:28:12 -07:00
freddygv e7a7042c69 Update listener generation to account for consul VIP 2021-12-03 17:27:56 -07:00
Freddy f032d6ef05
Merge pull request #11680 from hashicorp/ap/partition-exports-oss 2021-12-03 16:57:50 -07:00
Dan Upton 3b9dfca88d
internal: support `ResultsFilteredByACLs` flag/header (#11643) 2021-12-03 23:04:24 +00:00
Dan Upton c8204330ed
query: support `ResultsFilteredByACLs` in query list endpoint (#11620) 2021-12-03 23:04:09 +00:00
Dhia Ayachi ce326b6074
port oss changes (#11736) 2021-12-03 17:23:55 -05:00
Freddy e246defb6c
Merge pull request #11720 from hashicorp/bbolt 2021-12-03 14:44:36 -07:00
Dan Upton 047aa2ffb0
fedstate: support `ResultsFilteredByACLs` in `ListMeshGateways` endpoint (#11644) 2021-12-03 20:56:55 +00:00
Dan Upton 361d9c2862
catalog: support `ResultsFilteredByACLs` flag/header (#11594) 2021-12-03 20:56:14 +00:00
Dan Upton 4c0956c03a
coordinate: support `ResultsFilteredByACLs` flag/header (#11617) 2021-12-03 20:51:02 +00:00
Dan Upton bf1e2ca551
sessions: support `ResultsFilteredByACLs` flag/header (#11606) 2021-12-03 20:43:43 +00:00
Dan Upton d92f0d84c6
txn: support `ResultsFilteredByACLs` flag in `Read` endpoint (#11632) 2021-12-03 20:41:03 +00:00
Dan Upton 547aa219ea
agent: support `X-Consul-Results-Filtered-By-ACLs` header in agent-local endpoints (#11610) 2021-12-03 20:36:28 +00:00
Dhia Ayachi 86159c6ed8
sessions partitioning tests (#11734)
* state: port KV and Tombstone tables to new pattern

* go fmt'ed

* handle wildcards for tombstones

* Fix graveyard ent vs oss

* fix oss compilation error

* add partition to tombstones and kv state store indexes

* refactor to use `indexWithEnterpriseIndexable`

* Apply suggestions from code review

Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* add `singleValueID` implementation assertions

* partition `tableSessions` table

* fix sessions to use UUID and fix prefix index

* fix oss build

* clean up unused functions

* fix oss compilation

* add a partition indexer for sessions

* Fix oss to not have partition index

* fix oss tests

* remove unused operations_ent.go and operations_oss.go func

* remove unused const

* convert `IndexID` of `session_checks` table

* convert `indexSession` of `session_checks` table

* convert `indexNodeCheck` of `session_checks` table

* partition `indexID` and `indexSession` of `tableSessionChecks`

* fix oss linter

* fix review comments

* remove partition for Checks as it's always use the session partition

* fix tests

* fix tests

* do not namespace nodeChecks index

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
2021-12-03 15:36:07 -05:00
Dan Upton c314be2ff9
intention: support `ResultsFilteredByACLs` flag/header (#11612) 2021-12-03 20:35:54 +00:00
Mark Anderson a89ffba2d4
Cross port of ent #1383 (#11726)
Cross port of ent #1383 "Reject non-default datacenter when making partitioned ACLs"

On the OSS side this is a minor refactor to add some more checks that are only applicable to enterprise code.

Signed-off-by: Mark Anderson <manderson@hashicorp.com>
2021-12-03 10:20:25 -08:00
Dan Upton 599a4d6619
config: support `ResultsFilteredByACLs` in list/list all endpoints (#11621) 2021-12-03 17:39:47 +00:00
Dan Upton c4c68915c9
event: support `X-Consul-Results-Filtered-By-ACLs` header in list (#11616) 2021-12-03 17:38:59 +00:00
Dan Upton 474ef7cc1f
kv: support `ResultsFilteredByACLs` in list/list keys (#11593) 2021-12-03 17:31:48 +00:00
Dan Upton cf1bd585f6
health: support `ResultsFilteredByACLs` flag/header (#11602) 2021-12-03 17:31:32 +00:00
Dan Upton 1e47e3c82b
Groundwork for exposing when queries are filtered by ACLs (#11569) 2021-12-03 17:11:26 +00:00
Kyle Havlovitz 0546bbe08a dns: add endpoint for querying service virtual IPs 2021-12-02 16:40:28 -08:00
Kyle Havlovitz 6f34a4f777
Merge pull request #11724 from hashicorp/service-virtual-ips
oss: add virtual IP generation for connect services
2021-12-02 16:16:57 -08:00
Kyle Havlovitz 4f2cfee4b0 consul: add virtual IP generation for connect services 2021-12-02 15:42:47 -08:00
R.B. Boyer c46f9f9f31
agent: add variation of force-leave that exclusively works on the WAN (#11722)
Fixes #6548
2021-12-02 17:15:10 -06:00
Matt Keeler c7a94843ee Emit raft-boltdb metrics 2021-12-02 16:56:15 -05:00
Daniel Nephin e47cecc653 config: add NoFreelistSync option
# Conflicts:
#	agent/config/testdata/TestRuntimeConfig_Sanitize-enterprise.golden
#	agent/consul/server.go
2021-12-02 16:56:15 -05:00
Matt Keeler 42a5635bc3 Use raft-boltdb/v2 2021-12-02 16:56:15 -05:00
Daniel Nephin 17a2d14d49 ca: set the correct SigningKeyID after config update with Vault provider
The test added in this commit shows the problem. Previously the
SigningKeyID was set to the RootCert not the local leaf signing cert.

This same bug was fixed in two other places back in 2019, but this last one was
missed.

While fixing this bug I noticed I had the same few lines of code in 3
places, so I extracted a new function for them.

There would be 4 places, but currently the InitializeCA flow sets this
SigningKeyID in a different way, so I've left that alone for now.
2021-12-02 16:07:11 -05:00
Daniel Nephin 96f95889db
Merge pull request #11713 from hashicorp/dnephin/ca-test-names
ca: make test naming consistent
2021-12-02 16:05:42 -05:00
Daniel Nephin ff4581092e
Merge pull request #11671 from hashicorp/dnephin/ca-fix-storing-vault-intermediate
ca: fix storing the leaf signing cert with Vault provider
2021-12-02 16:02:24 -05:00
Daniel Nephin 81afb208ac
Merge pull request #11677 from hashicorp/dnephin/freeport-interface
sdk: use t.Cleanup in freeport and remove unnecessary calls
2021-12-02 15:58:41 -05:00
Daniel Nephin 447097b166 ca: make test naming consistent
While working on the CA system it is important to be able to run all the
tests related to the system, without having to wait for unrelated tests.
There are many slow and unrelated tests in agent/consul, so we need some
way to filter to only the relevant tests.

This PR renames all the CA system related tests to start with either
`TestCAMananger` for tests of internal operations that don't have RPC
endpoint, or `TestConnectCA` for tests of RPC endpoints. This allows us
to run all the test with:

    go test -run 'TestCAMananger|TestConnectCA' ./agent/consul

The test naming follows an undocumented convention of naming tests as
follows:

    Test[<struct name>_]<function name>[_<test case description>]

I tried to always keep Primary/Secondary at the end of the description,
and _Vault_ has to be in the middle because of our regex to run those
tests as a separate CI job.

You may notice some of the test names changed quite a bit. I did my best
to identify the underlying method being tested, but I may have been
slightly off in some cases.
2021-12-02 14:57:09 -05:00
FFMMM 384d497f26
add MustRevalidate flag to connect_ca_leaf cache type; always use on non-blocking queries (#11693)
* always use MustRevalidate on non-blocking queries for connect ca leaf

Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>

* Update agent/agent_endpoint_test.go

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>

* pr feedback

Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
2021-12-02 11:32:15 -08:00
Daniel Nephin 28a8a64019 ca: make getLeafSigningCertFromRoot safer
As a method on the struct type this would not be safe to call without first checking
c.isIntermediateUsedToSignLeaf.

So for now, move this logic to the CAMananger, so that it is always correct.
2021-12-02 12:42:49 -05:00
Daniel Nephin b29faa3e50 ca: fix stored CARoot representation with Vault provider
We were not adding the local signing cert to the CARoot. This commit
fixes that bug, and also adds support for fixing existing CARoot on
upgrade.

Also update the tests for both primary and secondary to be more strict.
Check the SigningKeyID is correct after initialization and rotation.
2021-12-02 12:42:49 -05:00
Dan Upton bf56a2c495
Rename `agent_master` ACL token in the API and CLI (#11669) 2021-12-02 17:05:27 +00:00
Dan Upton d8afd2f6c8
Rename `master` and `agent_master` ACL tokens in the config file format (#11665) 2021-12-01 21:08:14 +00:00
Chris S. Kim 54e4d1b7b2
ENT to OSS sync (#11703) 2021-12-01 14:56:10 -05:00
R.B. Boyer db91cbf484
auto-config: ensure the feature works properly with partitions (#11699) 2021-12-01 13:32:34 -06:00
Daniel Nephin 32ef9c5d5c ca: add some godoc and func for finding leaf signing cert
This will be used in a follow up commit.
2021-11-30 18:36:41 -05:00
Daniel Nephin 4185045a7f sdk/freeport: rename Port to GetOne
For better consistency with GetN
2021-11-30 17:32:41 -05:00
Chris S. Kim 56fab21582
Refactor test helper (#11689)
Allow custom ACL root tokens to be passed
2021-11-30 13:22:07 -05:00
Chris S. Kim 36246c5791
acl: Fill authzContext from token in Coordinate endpoints (#11688) 2021-11-30 13:17:41 -05:00
freddygv dd662d7058 Move ent config test to ent file 2021-11-29 12:15:17 -07:00
freddygv 5e1f7b7c36 Prevent partition-exports entry from OSS usage
Validation was added on the config entry kind since that is called when
validating config entries to bootstrap via agent configuration and when
applying entries via the config RPC endpoint.
2021-11-29 11:24:16 -07:00
Daniel Nephin e8312d6b5a testing: remove unnecessary calls to freeport
Previously we believe it was necessary for all code that required ports
to use freeport to prevent conflicts.

https://github.com/dnephin/freeport-test shows that it is actually save
to use port 0 (`127.0.0.1:0`) as long as it is passed directly to
`net.Listen`, and the listener holds the port for as long as it is
needed.

This works because freeport explicitly avoids the ephemeral port range,
and port 0 always uses that range. As you can see from the test output
of https://github.com/dnephin/freeport-test, the two systems never use
overlapping ports.

This commit converts all uses of freeport that were being passed
directly to a net.Listen to use port 0 instead. This allows us to remove
a bit of wrapping we had around httptest, in a couple places.
2021-11-29 12:19:43 -05:00
Daniel Nephin d795a73f78 testing: use the new freeport interfaces 2021-11-27 15:39:46 -05:00
Daniel Nephin 56f9238d15 go-sso: remove returnFunc now that freeport handles return 2021-11-27 15:29:38 -05:00
Daniel Nephin 8c7475d95e sdk: add freeport functions that use t.Cleanup 2021-11-27 15:04:43 -05:00
Daniel Nephin 59204598c8 ca: clean up unnecessary raft.Apply response checking
In d2ab767fef raftApply was changed to handle this check in
a single place, instad of having every caller check it. It looks like these few places
were missed when I did that clean up.

This commit removes the remaining resp.(error) checks, since they are all no-ops now.
2021-11-26 17:57:55 -05:00
Daniel Nephin 52f0853ff9
Merge pull request #11339 from hashicorp/dnephin/ca-manager-isolate-secondary-2
ca: reduce use of state in the secondary
2021-11-26 14:41:45 -05:00
Daniel Nephin 91a0c25932 ca: remove state check in secondarySetPrimaryRoots
This function is only ever called from operations that have already acquired the state lock, so checking
the value of state can never fail.

This change is being made in preparation for splitting out a separate type for the secondary logic. The
state can't easily be shared, so really only the expored top-level functions should acquire the 'state lock'.
2021-11-26 14:14:47 -05:00
Daniel Nephin f1944458e4 ca: remove actingSecondaryCA
This commit removes the actingSecondaryCA field, and removes the stateLock around it. This field
was acting as a proxy for providerRoot != nil, so replace it with that check instead.

The two methods which called secondarySetCAConfigured already set the state, so checking the
state again at this point will not catch runtime errors (only programming errors, which we can catch with tests).
In general, handling state transitions should be done on the "entrypoint" methods where execution starts, not
in every internal method.

This is being done to remove some unnecessary references to c.state, in preparations for extracting
types for primary/secondary.
2021-11-26 14:14:47 -05:00
Daniel Nephin b92084b8e8 ca: reduce consul provider backend interface a bit
This makes it easier to fake, which will allow me to use the ConsulProvider as
an 'external PKI' to test a customer setup where the actual root CA is not
the root we use for the Consul CA.

Replaces a call to the state store to fetch the clusterID with the
clusterID field already available on the built-in provider.
2021-11-25 11:46:06 -05:00
Dhia Ayachi 3820e09a47
Partition/kv indexid sessions (#11639)
* state: port KV and Tombstone tables to new pattern

* go fmt'ed

* handle wildcards for tombstones

* Fix graveyard ent vs oss

* fix oss compilation error

* add partition to tombstones and kv state store indexes

* refactor to use `indexWithEnterpriseIndexable`

* Apply suggestions from code review

Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* partition `tableSessions` table

* fix sessions to use UUID and fix prefix index

* fix oss build

* clean up unused functions

* fix oss compilation

* add a partition indexer for sessions

* Fix oss to not have partition index

* fix oss tests

* remove unused operations_ent.go and operations_oss.go func

* convert `indexNodeCheck` of `session_checks` table

* partition `indexID` and `indexSession` of `tableSessionChecks`

* remove partition for Checks as it's always use the session partition

* partition sessions index id table

* fix rebase issues

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
2021-11-24 11:34:36 -05:00
Dhia Ayachi bb83624950
Partition session checks store (#11638)
* state: port KV and Tombstone tables to new pattern

* go fmt'ed

* handle wildcards for tombstones

* Fix graveyard ent vs oss

* fix oss compilation error

* add partition to tombstones and kv state store indexes

* refactor to use `indexWithEnterpriseIndexable`

* Apply suggestions from code review

Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* add `singleValueID` implementation assertions

* partition `tableSessions` table

* fix sessions to use UUID and fix prefix index

* fix oss build

* clean up unused functions

* fix oss compilation

* add a partition indexer for sessions

* Fix oss to not have partition index

* fix oss tests

* remove unused operations_ent.go and operations_oss.go func

* remove unused const

* convert `IndexID` of `session_checks` table

* convert `indexSession` of `session_checks` table

* convert `indexNodeCheck` of `session_checks` table

* partition `indexID` and `indexSession` of `tableSessionChecks`

* fix oss linter

* fix review comments

* remove partition for Checks as it's always use the session partition

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
2021-11-24 09:10:38 -05:00
Chris S. Kim 2350e7e56a
cleanup: Clarify deprecated legacy intention endpoints (#11635) 2021-11-23 19:32:18 -05:00
Chris S. Kim db5ee0e4d2
Merge from ent (#11506) 2021-11-19 11:50:44 -05:00
R.B. Boyer dd4a59db8e
agent: purge service/check registration files for incorrect partitions on reload (#11607) 2021-11-18 14:44:20 -06:00
Iryna Shustava 0ee456649f
connect: Support auth methods for the vault connect CA provider (#11573)
* Support vault auth methods for the Vault connect CA provider
* Rotate the token (re-authenticate to vault using auth method) when the token can no longer be renewed
2021-11-18 13:15:28 -07:00
Daniel Nephin b4080bc0dc ca: use the cluster ID passed to the primary
instead of fetching it from the state store.
2021-11-16 16:57:22 -05:00
Daniel Nephin b9ab9bae12 ca: accept only the cluster ID to SpiffeIDSigningForCluster
To make it more obivous where ClusterID is used, and remove the need to create a struct
when only one field is used.
2021-11-16 16:57:21 -05:00
Will Jordan 68efecafed
Update node info sync comment (#11465) 2021-11-16 11:16:11 -08:00
R.B. Boyer 1e02460bd1
re-run gofmt on 1.17 (#11579)
This should let freshly recompiled golangci-lint binaries using Go 1.17
pass 'make lint'
2021-11-16 12:04:01 -06:00
R.B. Boyer eb21649f82
partitions: various refactors to support partitioning the serf LAN pool (#11568) 2021-11-15 09:51:14 -06:00
freddygv 0e507492d0 Update proxycfg for ingress service partitions 2021-11-12 14:33:31 -07:00
freddygv e5b7c4713f Accept partition for ingress services 2021-11-12 14:33:14 -07:00
freddygv 400697507b Move assertion to after config fetch 2021-11-10 10:50:08 -07:00
freddygv da5bcc574e Use ClusterID to check for readiness
The TrustDomain is populated from the Host() method which includes the
hard-coded "consul" domain. This means that despite having an empty
cluster ID, the TrustDomain won't be empty.
2021-11-10 10:45:22 -07:00
freddygv 6976044bc4 Prevent replicating partition-exports 2021-11-09 16:42:42 -07:00
freddygv 5c121d7a48 handle error scenario of empty local DC 2021-11-09 16:42:42 -07:00
freddygv af29cda415 Restrict DC for partition-exports writes
There are two restrictions:
- Writes from the primary DC which explicitly target a secondary DC.
- Writes to a secondary DC that do not explicitly target the primary DC.

The first restriction is because the config entry is not supported in
secondary datacenters.

The second restriction is to prevent the scenario where a user writes
the config entry to a secondary DC, the write gets forwarded to the
primary, but then the config entry does not apply in the secondary.
This makes the scope more explicit.
2021-11-09 16:42:42 -07:00
Freddy 00b5b0a0a2
Update filter chain creation for sidecar/ingress listeners (#11245)
The duo of `makeUpstreamFilterChainForDiscoveryChain` and `makeListenerForDiscoveryChain` were really hard to reason about, and led to concealing a bug in their branching logic. There were several issues here:

- They tried to accomplish too much: determining filter name, cluster name, and whether RDS should be used. 
- They embedded logic to handle significantly different kinds of upstream listeners (passthrough, prepared query, typical services, and catch-all)
- They needed to coalesce different data sources (Upstream and CompiledDiscoveryChain)

Rather than handling all of those tasks inside of these functions, this PR pulls out the RDS/clusterName/filterName logic.

This refactor also fixed a bug with the handling of [UpstreamDefaults](https://www.consul.io/docs/connect/config-entries/service-defaults#defaults). These defaults get stored as UpstreamConfig in the proxy snapshot with a DestinationName of "*", since they apply to all upstreams. However, this wildcard destination name must not be used when creating the name of the associated upstream cluster. The coalescing logic in the original functions here was in some situations creating clusters with a `*.` prefix, which is not a valid destination.
2021-11-09 14:43:51 -07:00
Kyle Havlovitz 6c0bd0550f
Merge pull request #11461 from deblasis/feature/empty_client_addr_warning
config: warn the user if client_addr is empty
2021-11-09 09:37:38 -08:00
Daniel Upton 50a1f20ff9
xds: prefer fed state gateway definitions if they're fresher (#11522)
Fixes an issue described in #10132, where if two DCs are WAN federated
over mesh gateways, and the gateway in the non-primary DC is terminated
and receives a new IP address (as is commonly the case when running them
on ephemeral compute instances) the primary DC is unable to re-establish
its connection until the agent running on its own gateway is restarted.

This was happening because we always preferred gateways discovered by
the `Internal.ServiceDump` RPC (which would fail because there's no way
to dial the remote DC) over those discovered in the federation state,
which is replicated as long as the primary DC's gateway is reachable.
2021-11-09 16:45:36 +00:00
Freddy 7d95d90fce
Merge pull request #11514 from hashicorp/dnephin/ca-fix-secondary-init
ca: properly handle the case where the secondary initializes after the primary
2021-11-08 17:16:16 -07:00
freddygv cc5a7ed36c Avoid returning empty roots with uninitialized CA
Currently getCARoots could return an empty object with an empty trust
domain before the CA is initialized. This commit returns an error while
there is no CA config or no trust domain.

There could be a CA config and no trust domain because the CA config can
be created in InitializeCA before initialization succeeds.
2021-11-08 16:51:49 -07:00
Dhia Ayachi 7916268c40
refactor session state store tables to use the new index pattern (#11525)
* state: port KV and Tombstone tables to new pattern

* go fmt'ed

* handle wildcards for tombstones

* Fix graveyard ent vs oss

* fix oss compilation error

* add partition to tombstones and kv state store indexes

* refactor to use `indexWithEnterpriseIndexable`

* Apply suggestions from code review

Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* add `singleValueID` implementation assertions

* partition `tableSessions` table

* fix sessions to use UUID and fix prefix index

* fix oss build

* clean up unused functions

* fix oss compilation

* add a partition indexer for sessions

* Fix oss to not have partition index

* fix oss tests

* remove unused func `prefixIndexFromServiceNameAsString`

* fix test error check

* remove unused operations_ent.go and operations_oss.go func

* remove unused const

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
2021-11-08 16:20:50 -05:00
Dhia Ayachi 98735a6d12
KV refactoring, part 2 (#11512)
* add partition to the kv get pretty print

* fix failing test

* add test for kvs RPC endpoint
2021-11-08 11:43:21 -05:00
Dhia Ayachi 520cb5858c
KV state store refactoring and partitioning (#11510)
* state: port KV and Tombstone tables to new pattern

* go fmt'ed

* handle wildcards for tombstones

* Fix graveyard ent vs oss

* fix oss compilation error

* add partition to tombstones and kv state store indexes

* refactor to use `indexWithEnterpriseIndexable`

* partition kvs indexID table

* add `partitionedIndexEntryName` in oss for test purpose

* Apply suggestions from code review

Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* add `singleValueID` implementation assertions

* remove entmeta reference from oss

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
2021-11-08 09:35:56 -05:00
Giulio Micheloni af7b7b5693
Merge branch 'main' into serve-panic-recovery 2021-11-06 16:12:06 +01:00
Daniel Nephin d9110136f2 ca: Only initialize clusterID in the primary
The secondary must get the clusterID from the primary
2021-11-05 18:08:44 -04:00
Daniel Nephin 01bd3d118d ca: return an error when secondary fails to initialize
Previously secondaryInitialize would return nil in this case, which prevented the
deferred initialize from happening, and left the CA in an uninitialized state until a config
update or root rotation.

To fix this I extracted the common parts into the delegate implementation. However looking at this
again, it seems like the handling in secondaryUpdateRoots is impossible, because that function
should never be called before the secondary is initialzied. I beleive we can remove some of that
logic in a follow up.
2021-11-05 18:02:51 -04:00
Daniel Nephin 8ba760a2fc acl: remove id and revision from Policy constructors
The fields were removed in a previous commit.

Also remove an unused constructor for PolicyMerger
2021-11-05 15:45:08 -04:00
Daniel Nephin 7c679c11e6 acl: remove Policy.ID and Policy.Revision
These two fields do not appear to be used anywhere. We use the structs.ACLPolicy ID in the
ACLResolver cache, but the acl.Policy ID and revision are not used.
2021-11-05 15:43:52 -04:00
R.B. Boyer c7c5013edd
rename helper method to reflect the non-deprecated terminology (#11509) 2021-11-05 13:51:50 -05:00
Connor efe4b21287
Support Vault Namespaces explicitly in CA config (#11477)
* Support Vault Namespaces explicitly in CA config

If there is a Namespace entry included in the Vault CA configuration,
set it as the Vault Namespace on the Vault client

Currently the only way to support Vault namespaces in the Consul CA
config is by doing one of the following:
1) Set the VAULT_NAMESPACE environment variable which will be picked up
by the Vault API client
2) Prefix all Vault paths with the namespace

Neither of these are super pleasant. The first requires direct access
and modification to the Consul runtime environment. It's possible and
expected, not super pleasant.

The second requires more indepth knowledge of Vault and how it uses
Namespaces and could be confusing for anyone without that context. It
also infers that it is not supported

* Add changelog

* Remove fmt.Fprint calls

* Make comment clearer

* Add next consul version to website docs

* Add new test for default configuration

* go mod tidy

* Add skip if vault not present

* Tweak changelog text
2021-11-05 11:42:28 -05:00
R.B. Boyer 44c023a302
segments: ensure that the serf_lan_allowed_cidrs applies to network segments (#11495) 2021-11-04 17:17:19 -05:00
Mark Anderson 7e8228a20b
Remove some usage of md5 from the system (#11491)
* Remove some usage of md5 from the system

OSS side of https://github.com/hashicorp/consul-enterprise/pull/1253

This is a potential security issue because an attacker could conceivably manipulate inputs to cause persistence files to collide, effectively deleting the persistence file for one of the colliding elements.

Signed-off-by: Mark Anderson <manderson@hashicorp.com>
2021-11-04 13:07:54 -07:00
FFMMM 61bd417a82
plumb thru root cert tll to the aws ca provider (#11449)
* plumb thru root cert ttl to the aws ca provider

Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>

* Update .changelog/11449.txt

Co-authored-by: Dhia Ayachi <dhia@hashicorp.com>

Co-authored-by: Dhia Ayachi <dhia@hashicorp.com>
2021-11-04 12:19:08 -07:00
FFMMM 6004a21f35
fix aws pca certs (#11470)
Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>
2021-11-03 12:21:24 -07:00