Commit Graph

2113 Commits

Author SHA1 Message Date
Mark Anderson aaefe15613
Bulk acl message fixup oss (#12470)
* First pass for helper for bulk changes

Signed-off-by: Mark Anderson <manderson@hashicorp.com>

* Convert ACLRead and ACLWrite to new form

Signed-off-by: Mark Anderson <manderson@hashicorp.com>

* AgentRead and AgentWRite

Signed-off-by: Mark Anderson <manderson@hashicorp.com>

* Fix EventWrite

Signed-off-by: Mark Anderson <manderson@hashicorp.com>

* KeyRead, KeyWrite, KeyList

Signed-off-by: Mark Anderson <manderson@hashicorp.com>

* KeyRing

Signed-off-by: Mark Anderson <manderson@hashicorp.com>

* NodeRead NodeWrite

Signed-off-by: Mark Anderson <manderson@hashicorp.com>

* OperatorRead and OperatorWrite

Signed-off-by: Mark Anderson <manderson@hashicorp.com>

* PreparedQuery

Signed-off-by: Mark Anderson <manderson@hashicorp.com>

* Intention partial

Signed-off-by: Mark Anderson <manderson@hashicorp.com>

* Fix ServiceRead, Write ,etc

Signed-off-by: Mark Anderson <manderson@hashicorp.com>

* Error check ServiceRead?

Signed-off-by: Mark Anderson <manderson@hashicorp.com>

* Fix Sessionread/Write

Signed-off-by: Mark Anderson <manderson@hashicorp.com>

* Fixup snapshot ACL

Signed-off-by: Mark Anderson <manderson@hashicorp.com>

* Error fixups for txn

Signed-off-by: Mark Anderson <manderson@hashicorp.com>

* Add changelog

Signed-off-by: Mark Anderson <manderson@hashicorp.com>

* Fixup review comments

Signed-off-by: Mark Anderson <manderson@hashicorp.com>
2022-03-10 18:48:27 -08:00
Eric Haberkorn 9d0ec2eec2 Code review changes 2022-03-07 14:39:33 -05:00
Eric f7cc7ff5cd Add `Meta` to `ServiceConfigResponse` 2022-03-07 10:05:18 -05:00
R.B. Boyer 8307e40f2b
reduce flakiness/raciness of errNotFound and errNotChanged blocking query tests (#12518)
Improves tests from #12362

These tests try to setup the following concurrent scenario:

1. (goroutine 1) execute read RPC with index=0
2. (goroutine 1) get response from (1) @ index=10
3. (goroutine 1) execute read RPC with index=10 and block
4. (goroutine 2) WHILE (3) is blocking, start slamming the system with stray writes that will cause the WatchSet to wakeup
5. (goroutine 2) after doing all writes, shut down the reader above
6. (goroutine 1) stops reading, double checks that it only ever woke up once (from 1)
2022-03-04 11:20:01 -06:00
R.B. Boyer 9268715697
server: fix spurious blocking query suppression for discovery chains (#12512)
Minor fix for behavior in #12362

IsDefault sometimes returns true even if there was a proxy-defaults or service-defaults config entry that was consulted. This PR fixes that.
2022-03-03 16:54:41 -06:00
Daniel Nephin 5ba994a73f
Merge pull request #12298 from jorgemarey/b-persistnewrootandconfig
Avoid raft change when no config is provided on persistNewRootAndConfig
2022-03-03 11:03:50 -05:00
Daniel Nephin 161206e24d ca: make sure the test fails without the fix
Also change the path used for the secondary so that both primary and secondary do not overwrite each other.
2022-03-02 18:22:49 -05:00
R.B. Boyer 58e053c336
raft: upgrade to v1.3.6 (#12496)
Add additional protections on the Consul side to prevent NonVoters from bootstrapping raft.

This should un-flake TestServer_Expect_NonVoters
2022-03-02 17:00:02 -06:00
Daniel Nephin 73c91ed80f
Merge pull request #12467 from hashicorp/dnephin/ci-vault-test-safer
ca: require that tests that use Vault are named correctly
2022-03-01 12:54:02 -05:00
R.B. Boyer 6666832077
test: parallelize more of TestLeader_ReapOrLeftMember_IgnoreSelf (#12468)
before:

    $ go test ./agent/consul -run TestLeader_ReapOrLeftMember_IgnoreSelf
    ok  	github.com/hashicorp/consul/agent/consul	21.147s

after:

    $ go test ./agent/consul -run TestLeader_ReapOrLeftMember_IgnoreSelf
    ok  	github.com/hashicorp/consul/agent/consul	5.402s
2022-03-01 10:30:06 -06:00
Jorge Marey f429c1a5d9 Fix vault test with suggested changes 2022-03-01 10:20:00 +01:00
Jorge Marey 1a0baf4024 Add test case to verify #12298 2022-03-01 09:25:52 +01:00
Jorge Marey 4375dd2409 Avoid raft change when no config is provided on CAmanager
- This avoids a change to the raft store when no roots or config
are provided to persistNewRootAndConfig
2022-03-01 09:25:52 +01:00
Daniel Nephin d669226784 ca: fix a test
This test does not use Vault, so does not need ca.SkipIfVaultNotPresent
2022-02-28 16:26:18 -05:00
R.B. Boyer 7b0548dd8d
server: suppress spurious blocking query returns where multiple config entries are involved (#12362)
Starting from and extending the mechanism introduced in #12110 we can specially handle the 3 main special Consul RPC endpoints that react to many config entries in a single blocking query in Connect:

- `DiscoveryChain.Get`
- `ConfigEntry.ResolveServiceConfig`
- `Intentions.Match`

All of these will internally watch for many config entries, and at least one of those will likely be not found in any given query. Because these are blends of multiple reads the exact solution from #12110 isn't perfectly aligned, but we can tweak the approach slightly and regain the utility of that mechanism.

### No Config Entries Found

In this case, despite looking for many config entries none may be found at all. Unlike #12110 in this scenario we do not return an empty reply to the caller, but instead synthesize a struct from default values to return. This can be handled nearly identically to #12110 with the first 1-2 replies being non-empty payloads followed by the standard spurious wakeup suppression mechanism from #12110.

### No Change Since Last Wakeup

Once a blocking query loop on the server has completed and slept at least once, there is a further optimization we can make here to detect if any of the config entries that were present at specific versions for the prior execution of the loop are identical for the loop we just woke up for. In that scenario we can return a slightly different internal sentinel error and basically externally handle it similar to #12110.

This would mean that even if 20 discovery chain read RPC handling goroutines wakeup due to the creation of an unrelated config entry, the only ones that will terminate and reply with a blob of data are those that genuinely have new data to report.

### Extra Endpoints

Since this pattern is pretty reusable, other key config-entry-adjacent endpoints used by `agent/proxycfg` also were updated:

- `ConfigEntry.List`
- `Internal.IntentionUpstreams` (tproxy)
2022-02-25 15:46:34 -06:00
R.B. Boyer ca112f8721
fix flaky test panic (#12446) 2022-02-24 17:35:46 -06:00
R.B. Boyer 957146401e
catalog: compare node names case insensitively in more places (#12444)
Many places in consul already treated node names case insensitively.
The state store indexes already do it, but there are a few places that
did a direct byte comparison which have now been corrected.

One place of particular consideration is ensureCheckIfNodeMatches
which is executed during snapshot restore (among other places). If a
node check used a slightly different casing than the casing of the node
during register then the snapshot restore here would deterministically
fail. This has been fixed.

Primary approach:

    git grep -i "node.*[!=]=.*node" -- ':!*_test.go' ':!docs'
    git grep -i '\[[^]]*member[^]]*\]
    git grep -i '\[[^]]*\(member\|name\|node\)[^]]*\]' -- ':!*_test.go' ':!website' ':!ui' ':!agent/proxycfg/testing.go:' ':!*.md'
2022-02-24 16:54:47 -06:00
R.B. Boyer 64271289ec
server: partly fix config entry replication issue that prevents replication in some circumstances (#12307)
There are some cross-config-entry relationships that are enforced during
"graph validation" at persistence time that are required to be
maintained. This means that config entries may form a digraph at times.

Config entry replication procedes in a particular sorted order by kind
and name.

Occasionally there are some fixups to these digraphs that end up
replicating in the wrong order and replicating the leaves
(ingress-gateway) before the roots (service-defaults) leading to
replication halting due to a graph validation error related to things
like mismatched service protocol requirements.

This PR changes replication to give each computed change (upsert/delete)
a fair shot at being applied before deciding to terminate that round of
replication in error. In the case where we've simply tried to do the
operations in the wrong order at least ONE of the outstanding requests
will complete in the right order, leading the subsequent round to have
fewer operations to do, with a smaller likelihood of graph validation
errors.

This does not address all scenarios, but for scenarios where the edits
are being applied in the wrong order this should avoid replication
halting.

Fixes #9319

The scenario that is NOT ADDRESSED by this PR is as follows:

1. create: service-defaults: name=new-web, protocol=http
2. create: service-defaults: name=old-web, protocol=http
3. create: service-resolver: name=old-web, redirect-to=new-web
4. delete: service-resolver: name=old-web
5. update: service-defaults: name=old-web, protocol=grpc
6. update: service-defaults: name=new-web, protocol=grpc
7. create: service-resolver: name=old-web, redirect-to=new-web

If you shutdown dc2 just before (4) and turn it back on after (7)
replication is impossible as there is no single edit you can make to
make forward progress.
2022-02-23 17:27:48 -06:00
Daniel Nephin 771df290d7
Merge pull request #11910 from hashicorp/dnephin/ca-provider-interface-for-ica-in-primary
ca: add support for an external trusted CA
2022-02-22 13:14:52 -05:00
R.B. Boyer 8b987a4d59
configentry: make a new package to hold shared config entry structs that aren't used for RPC or the FSM (#12384)
First two candidates are ConfigEntryKindName and DiscoveryChainConfigEntries.
2022-02-22 10:36:36 -06:00
Daniel Nephin dc484ee09e rpc: set response to nil when not found
Otherwise when the query times out we might incorrectly send a value for
the reply, when we should send an empty reply.

Also document errNotFound and how to handle the result in that case.
2022-02-18 12:26:06 -05:00
Daniel Nephin 6021105dfc ca: test that original certs from secondary still verify
There's a chance this could flake if the secondary hasn't received the
update yet, but running this test many times doesn't show any flakes
yet.
2022-02-17 18:45:16 -05:00
Daniel Nephin 6b679aa9d4 Update TODOs to reference an issue with more details
And remove a no longer needed TODO
2022-02-17 18:21:30 -05:00
Daniel Nephin 1853a32df6 ca: add test cases for rotating external trusted CA 2022-02-17 18:21:30 -05:00
Daniel Nephin 5e8ea2a039 ca: add a test for secondary with external CA 2022-02-17 18:21:30 -05:00
Daniel Nephin 42ec34d101 ca: examine the full chain in newCARoot
make TestNewCARoot much more strict
compare the full result instead of only a few fields.
add a test case with 2 and 3 certificates in the pem
2022-02-17 18:21:30 -05:00
Daniel Nephin 85ecbaf109
Merge pull request #12110 from hashicorp/dnephin/blocking-queries-not-found
rpc: make blocking queries for non-existent items more efficient
2022-02-17 18:09:39 -05:00
Florian Apolloner f01f00fc84
Support for connect native services in topology view. (#12098) 2022-02-16 16:51:54 -05:00
Chris S. Kim 154b781bc8
Move IndexEntryName helpers to common files (#12365) 2022-02-16 12:56:38 -05:00
Daniel Nephin 8a6e75ac81 rpc: add errNotFound to all Get queries
Any query that returns a list of items is not part of this commit.
2022-02-15 18:24:34 -05:00
Daniel Nephin 4b33bdf396 Make blockingQuery efficient with 'not found' results.
By using the query results as state.

Blocking queries are efficient when the query matches some results,
because the ModifyIndex of those results, returned as queryMeta.Mindex,
will never change unless the items themselves change.

Blocking queries for non-existent items are not efficient because the
queryMeta.Index can (and often does) change when other entities are
written.

This commit reduces the churn of these queries by using a different
comparison for "has changed". Instead of using the modified index, we
use the existence of the results. If the previous result was "not found"
and the new result is still "not found", we know we can ignore the
modified index and continue to block.

This is done by setting the minQueryIndex to the returned
queryMeta.Index, which prevents the query from returning before a state
change is observed.
2022-02-15 18:24:33 -05:00
Daniel Nephin 897b953f66 Add a test for blocking query on non-existent entry
This test shows how blocking queries are not efficient when the query
returns no results.  The test fails with 100+ calls instead of the
expected 2.

This test is still a bit flaky because it depends on the timing of the
writes. It can sometimes return 3 calls.

A future commit should fix this and make blocking queries even more
optimal for not-found results.
2022-02-15 18:23:17 -05:00
Daniel Nephin 3301f94004 rpc: improve docs for blockingQuery
Follow the Go convention of accepting a small interface that documents
the methods used by the function.

Clarify the rules for implementing a query function passed to
blockingQuery.
2022-02-15 14:20:14 -05:00
R.B. Boyer 115946da99
server: conditionally avoid writing a config entry to raft if it was already the same (#12321)
This will both save on unnecessary raft operations as well as
unnecessarily incrementing the raft modify index of config entries
subject to no-op updates.
2022-02-14 14:39:12 -06:00
FFMMM 78264a8030
Vendor in rpc mono repo for net/rpc fork, go-msgpack, msgpackrpc. (#12311)
This commit syncs ENT changes to the OSS repo.

Original commit details in ENT:

```
commit 569d25f7f4578981c3801e6e067295668210f748
Author: FFMMM <FFMMM@users.noreply.github.com>
Date:   Thu Feb 10 10:23:33 2022 -0800

    Vendor fork net rpc (#1538)

    * replace net/rpc w consul-net-rpc/net/rpc

    Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>

    * replace msgpackrpc and go-msgpack with fork from mono repo

    Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>

    * gofmt all files touched

    Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>
```

Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>
2022-02-14 09:45:45 -08:00
Mark Anderson 1a16f7ee70 Refactor to make ACL errors more structured. (#12308)
* First phase of refactoring PermissionDeniedError

Add extended type PermissionDeniedByACLError that captures information
about the accessor, particular permission type and the object and name
of the thing being checked.

It may be worth folding the test and error return into a single helper
function, that can happen at a later date.

Signed-off-by: Mark Anderson <manderson@hashicorp.com>
2022-02-11 12:53:23 -08:00
Freddy 9580f79f86
Merge pull request #12223 from hashicorp/proxycfg/passthrough-cleanup 2022-02-10 17:35:51 -07:00
freddygv cbea3d203c Fix race of upstreams with same passthrough ip
Due to timing, a transparent proxy could have two upstreams to dial
directly with the same address.

For example:
- The orders service can dial upstreams shipping and payment directly.
- An instance of shipping at address 10.0.0.1 is deregistered.
- Payments is scaled up and scheduled to have address 10.0.0.1.
- The orders service receives the event for the new payments instance
before seeing the deregistration for the shipping instance. At this
point two upstreams have the same passthrough address and Envoy will
reject the listener configuration.

To disambiguate this commit considers the Raft index when storing
passthrough addresses. In the example above, 10.0.0.1 would only be
associated with the newer payments service instance.
2022-02-10 17:01:57 -07:00
Daniel Nephin 01784470f3
Merge pull request #12277 from hashicorp/dnephin/panic-in-service-register
catalog: initialize the refs map to prevent a nil panic
2022-02-09 19:48:22 -05:00
Daniel Nephin 82c264b2b3 config-entry: fix a panic when registering a service or ingress gateway 2022-02-09 18:49:48 -05:00
Daniel Nephin 7ec658b7ac
Merge pull request #12265 from hashicorp/dnephin/logging-in-tests
sdk: add TestLogLevel for setting log level in tests
2022-02-07 16:11:23 -05:00
Daniel Nephin 437f769916 A test to reproduce the issue 2022-02-04 14:04:12 -05:00
Daniel Nephin 51b0f82d0e Make test more readable
And fix typo
2022-02-03 18:44:09 -05:00
Daniel Nephin 608597c7b6 ca: relax and move private key type/bit validation for vault
This commit makes two changes to the validation.

Previously we would call this validation in GenerateRoot, which happens
both on initialization (when a follower becomes leader), and when a
configuration is updated. We only want to do this validation during
config update so the logic was moved to the UpdateConfiguration
function.

Previously we would compare the config values against the actual cert.
This caused problems when the cert was created manually in Vault (not
created by Consul).  Now we compare the new config against the previous
config. Using a already created CA cert should never error now.

Adding the key bit and types to the config should only error when
the previous values were not the defaults.
2022-02-03 17:21:20 -05:00
Daniel Nephin d707173253 ca: small cleanup of TestConnectCAConfig_Vault_TriggerRotation_Fails
Before adding more test cases
2022-02-03 17:21:20 -05:00
Daniel Nephin 3f590bb8a1 testing: fix test failures caused by new log level
These two tests require debug logging enabled, because they look for log lines.

Also switched to testify assertions because the previous errors were not clear.
2022-02-03 17:07:39 -05:00
Daniel Nephin b058845110 sdk: add TestLogLevel for setting log level in tests
And default log level to WARN.
2022-02-03 13:42:28 -05:00
Daniel Nephin 7839b2d7e0 ca: add a test that uses an intermediate CA as the primary CA
This test found a bug in the secondary. We were appending the root cert
to the PEM, but that cert was already appended. This was failing
validation in Vault here:
https://github.com/hashicorp/vault/blob/sdk/v0.3.0/sdk/helper/certutil/types.go#L329

Previously this worked because self signed certs have the same
SubjectKeyID and AuthorityKeyID. So having the same self-signed cert
repeated doesn't fail that check.

However with an intermediate that is not self-signed, those values are
different, and so we fail the check. A test I added in a previous commit
should show that this continues to work with self-signed root certs as
well.
2022-02-02 13:41:35 -05:00
Daniel Nephin ac732ce82b acl: un-embed ACLIdentity
This is safer than embedding two interface because there are a number of
places where we check the concrete type. If we check the concrete type
on the top-level interface it will fail. So instead expose the
ACLIdentity from a method.
2022-02-02 12:07:31 -05:00
Daniel Nephin 9d80c1886a
Merge pull request #12167 from hashicorp/dnephin/acl-resolve-token-3
acl: rename ResolveTokenToIdentityAndAuthorizer to ResolveToken
2022-01-31 19:21:06 -05:00
Daniel Nephin 997bf1e5a4
Merge pull request #12166 from hashicorp/dnephin/acl-resolve-token-2
acl: remove ResolveTokenToIdentity
2022-01-31 19:19:21 -05:00
Daniel Nephin 343b6deb79 acl: rename ResolveTokenToIdentityAndAuthorizer to ResolveToken
This change allows us to remove one of the last remaining duplicate
resolve token methods (Server.ResolveToken).

With this change we are down to only 2, where the second one also
handles setting the default EnterpriseMeta from the token.
2022-01-31 18:04:19 -05:00
Daniel Nephin d363cc0f07 acl: remove unused methods on fakes, and add changelog
Also document the metric that was removed in a previous commit.
2022-01-31 17:53:53 -05:00
Daniel Nephin b2b84e7fc6
Merge pull request #12165 from hashicorp/dnephin/acl-resolve-token
acl: remove some of the duplicate resolve token methods
2022-01-31 13:27:49 -05:00
Dan Upton fdfe079674
streaming: split event buffer by key (#12080) 2022-01-28 12:27:00 +00:00
Daniel Nephin 9b7468f99e ca/provider: remove ActiveRoot from Provider 2022-01-27 13:07:37 -05:00
Daniel Nephin 9a59733b7d
Merge pull request #11663 from hashicorp/dnephin/ca-remove-one-call-to-active-root-2
ca: remove second call to Provider.ActiveRoot
2022-01-27 12:41:05 -05:00
Daniel Nephin db0478265b
Merge pull request #12109 from hashicorp/dnephin/blocking-query-1
rpc: make blockingQuery easier to read
2022-01-26 18:13:55 -05:00
Daniel Nephin f9aef8018b Apply suggestions from code review
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
2022-01-26 12:24:13 -05:00
Daniel Nephin 737c0097e0 acl: extract a backend type for the ACLResolverBackend
This is a small step to isolate the functionality that is used for the
ACLResolver from the large Client and Server structs.
2022-01-26 12:24:10 -05:00
Daniel Nephin e134e43da6 acl: remove calls to ResolveIdentityFromToken
We already have an ACLResolveResult, so we can get the accessor ID from
it.
2022-01-22 15:05:42 -05:00
Daniel Nephin edca8d61a3 acl: remove ResolveTokenToIdentity
By exposing the AccessorID from the primary ResolveToken method we can
remove this duplication.
2022-01-22 14:47:59 -05:00
Daniel Nephin a5e8af79c3 acl: return a resposne from ResolveToken that includes the ACLIdentity
So that we can duplicate duplicate methods.
2022-01-22 14:33:09 -05:00
Daniel Nephin 8c9c48e219 acl: remove duplicate methods
Now that ACLResolver is embedded we don't need ResolveTokenToIdentity on
Client and Server.

Moving ResolveTokenAndDefaultMeta to ACLResolver removes the duplicate
implementation.
2022-01-22 14:12:08 -05:00
Daniel Nephin 241663a046 acl: embed ACLResolver in Client and Server
In preparation for removing duplicate resolve token methods.
2022-01-22 14:07:26 -05:00
R.B. Boyer b60d89e7ef bulk rewrite using this script
set -euo pipefail

    unset CDPATH

    cd "$(dirname "$0")"

    for f in $(git grep '\brequire := require\.New(' | cut -d':' -f1 | sort -u); do
        echo "=== require: $f ==="
        sed -i '/require := require.New(t)/d' $f
        # require.XXX(blah) but not require.XXX(tblah) or require.XXX(rblah)
        sed -i 's/\brequire\.\([a-zA-Z0-9_]*\)(\([^tr]\)/require.\1(t,\2/g' $f
        # require.XXX(tblah) but not require.XXX(t, blah)
        sed -i 's/\brequire\.\([a-zA-Z0-9_]*\)(\(t[^,]\)/require.\1(t,\2/g' $f
        # require.XXX(rblah) but not require.XXX(r, blah)
        sed -i 's/\brequire\.\([a-zA-Z0-9_]*\)(\(r[^,]\)/require.\1(t,\2/g' $f
        gofmt -s -w $f
    done

    for f in $(git grep '\bassert := assert\.New(' | cut -d':' -f1 | sort -u); do
        echo "=== assert: $f ==="
        sed -i '/assert := assert.New(t)/d' $f
        # assert.XXX(blah) but not assert.XXX(tblah) or assert.XXX(rblah)
        sed -i 's/\bassert\.\([a-zA-Z0-9_]*\)(\([^tr]\)/assert.\1(t,\2/g' $f
        # assert.XXX(tblah) but not assert.XXX(t, blah)
        sed -i 's/\bassert\.\([a-zA-Z0-9_]*\)(\(t[^,]\)/assert.\1(t,\2/g' $f
        # assert.XXX(rblah) but not assert.XXX(r, blah)
        sed -i 's/\bassert\.\([a-zA-Z0-9_]*\)(\(r[^,]\)/assert.\1(t,\2/g' $f
        gofmt -s -w $f
    done
2022-01-20 10:46:23 -06:00
R.B. Boyer 31f6f55bbe test: normalize require.New and assert.New syntax 2022-01-20 10:45:56 -06:00
Dan Upton ca3aca92c4
[OSS] Remove remaining references to master (#11827) 2022-01-20 12:47:50 +00:00
Daniel Nephin 71767f1b3e rpc: cleanup exit and blocking condition logic in blockingQuery
Remove some unnecessary comments around query_blocking metric. The only
line that needs any comments in the atomic decrement.

Cleanup the block and return comments and logic. The old comment about
AbandonCh may have been relevant before, but it is expected behaviour
now.

The logic was simplified by inverting the err condition.
2022-01-17 16:59:25 -05:00
Daniel Nephin 72a733bed8 rpc: extract rpcQueryTimeout method
This helps keep the logic in blockingQuery more focused. In the future we
may have a separate struct for RPC queries which may allow us to move this
off of Server.
2022-01-17 16:59:25 -05:00
Daniel Nephin fd0a9fd4f3 rpc: move the index defaulting to setQueryMeta.
This safeguard should be safe to apply in general. We are already
applying it to non-blocking queries that call blockingQuery, so it
should be fine to apply it to others.
2022-01-17 16:59:25 -05:00
Daniel Nephin 4b67d6c18b rpc: add subtests to blockingQuery test 2022-01-17 16:59:25 -05:00
Daniel Nephin f92dc11002 rpc: refactor blocking query
To remove the TODO, and make it more readable.

In general this reduces the scope of variables, making them easier to reason about.
It also introduces more early returns so that we can see the flow from the structure
of the function.
2022-01-17 16:58:47 -05:00
Daniel Nephin f31e0b8b1a
Merge pull request #11661 from hashicorp/dnephin/ca-remove-one-call-to-active-root
ca: remove one call to Provider.ActiveRoot
2022-01-13 16:48:12 -05:00
Kyle Havlovitz 0db874c38b Add virtual IP generation for term gateway backed services 2022-01-12 12:08:49 -08:00
Daniel Nephin d57dec5878 ca: remove unnecessary var, and slightly reduce cyclo complexity
`newIntermediate` is always equal to `needsNewIntermediate`, so we can
remove the extra variable and use the original directly.

Also remove the `activeRoot.ID != newActiveRoot.ID` case from an if,
because that case is already checked above, and `needsNewIntermediate` will
already be true in that case.

This condition now reads a lot better:

> Persist a new root if we did not have one before, or if generated a new intermediate.
2022-01-06 16:56:49 -05:00
Daniel Nephin 0de7efb316 ca: remove unused provider.ActiveRoot call
In the previous commit the single use of this storedRoot was removed.

In this commit the original objective is completed. The
Provider.ActiveRoot is being removed because

1. the secondary should get the active root from the Consul primary DC,
   not the provider, so that secondary DCs do not need to communicate
   with a provider instance in a different DC.
2. so that the Provider.ActiveRoot interface can be changed without
   impacting other code paths.
2022-01-06 16:56:48 -05:00
Daniel Nephin d0578c6dfc ca: extract the lookup of the active primary CA
This method had only one caller, which always looked for the active
root. This commit moves the lookup into the method to reduce the logic
in the one caller.

This is being done in preparation for a larger change. Keeping this
separate so it is easier to see.

The `storedRootID != primaryRoots.ActiveRootID` is being removed because
these can never be different.

The `storedRootID` comes from `provider.ActiveRoot`, the
`primaryRoots.ActiveRootID` comes from the store `CARoot` from the
primary. In both cases the source of the data is the primary DC.

Technically they could be different if someone modified the provider
outside of Consul, but that would break many things, so is not a
supported flow.

If these were out of sync because of ordering of events then the
secondary will soon receive an update to `primaryRoots` and everything
will be sorted out again.
2022-01-06 16:56:48 -05:00
Daniel Nephin 7121c78d34 ca: update godoc
To clarify what to expect from the data stored in this field, and the
behaviour of this function.
2022-01-06 16:56:48 -05:00
Daniel Nephin abac8baa5d ca: remove one call to provider.ActiveRoot
ActiveRoot should not be called from the secondary DC, because there
should not be a requirement to run the same Vault instance in a
secondary DC. SignIntermediate is called in a secondary DC, so it should
not call ActiveRoot

We would also like to change the interface of ActiveRoot so that we can
support using an intermediate cert as the primary CA in Consul. In
preparation for making that change I am reducing the number of calls to
ActiveRoot, so that there are fewer code paths to modify when the
interface changes.

This change required a change to the mockCAServerDelegate we use in
tests. It was returning the RootCert for SignIntermediate, but that is
not an accurate fake of production. In production this would also be a
separate cert.
2022-01-06 16:55:50 -05:00
Daniel Nephin eaa084fd41 ca: remove redundant append of an intermediate cert
Immediately above this line we are already appending the full list of
intermediates. The `provider.ActiveIntermediate` MUST be in this list of
intermediates because it must be available to all the other non-leader
Servers.  If it was not in this list of intermediates then any proxy
that received data from a non-leader would have the wrong certs.

This is being removed now because we are planning on changing the
`Provider.ActiveIntermediate` interface, and removing these extra calls ahead of
time helps make that change easier.
2022-01-06 16:55:50 -05:00
Daniel Nephin 11f4cdaa49 ca: only generate a single private key for the whole test case
Using tracing and cpu profiling I found that the majority of the time in
these test cases is spent generating a private key. We really don't need
separate private keys, so we can generate only one and use it for all
cases.

With this change the test runs much faster.
2022-01-06 16:55:50 -05:00
Daniel Nephin b3ffe7ac72 ca: cleanup a test
Fix the name to match the function it is testing

Remove unused code

Fix the signature, instead of returning (error, string) which should be (string, error)
accept a testing.T to emit errors.

Handle the error from encode.
2022-01-06 16:55:49 -05:00
Daniel Nephin 1fd6b16399 ca: use the new leaf signing lookup func in leader metrics 2022-01-06 16:55:49 -05:00
Daniel Nephin abfc1e4840 snapshot: return the error from replyFn
The only function passed to SnapshotRPC today always returns a nil error, so there's no
way to exercise this bug in practice. This change is being made for correctness so that
it doesn't become a problem in the future, if we ever pass a different function to
SnapshotRPC.
2022-01-05 17:51:03 -05:00
Jared Kirschner b393c90ce7 Clarify service and check error messages (use ID)
Error messages related to service and check operations previously included
the following substrings:
- service %q
- check %q

From this error message, it isn't clear that the expected field is the ID for
the entity, not the name. For example, if the user has a service named test,
the error message would read 'Unknown service "test"'. This is misleading -
a service with that *name* does exist, but not with that *ID*.

The substrings above have been modified to make it clear that ID is needed,
not name:
- service with ID %q
- check with ID %q
2022-01-04 11:42:37 -08:00
Chris S. Kim 30550f2c63
testing: Revert assertion for virtual IP flag (#11932) 2022-01-04 11:24:56 -05:00
Daniel Nephin 1683da66b0
Merge pull request #11796 from hashicorp/dnephin/cleanup-test-server
testing: stop using an old version in testServer
2021-12-22 16:04:04 -05:00
Freddy 85fe875d07
Use anonymousToken when querying by secret ID (#11813)
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: Dan Upton <daniel@floppy.co>

This query has been incorrectly querying by accessor ID since New ACLs
were added. However, the legacy token compat allowed this to continue to
work, since it made a fallback query for the anonymousToken ID.

PR #11184 removed this legacy token query, which means that the query by
accessor ID is now the only check for the anonymous token's existence.

This PR updates the GetBySecret call to use the secret ID of the token.
2021-12-13 10:56:09 -07:00
R.B. Boyer 631c649291
various partition related todos (#11822) 2021-12-13 11:43:33 -06:00
Kyle Havlovitz 9dcaf0539c
Merge pull request #11798 from hashicorp/vip-goroutine-check
leader: move the virtual IP version check into a goroutine
2021-12-10 15:59:35 -08:00
Kyle Havlovitz 80a4489844 state: fix freed VIP table id index 2021-12-10 14:41:45 -08:00
Kyle Havlovitz ecbd3eb2a6 Exit before starting the vip check routine if possible 2021-12-10 14:30:50 -08:00
Daniel Nephin 0a9cb62859 testing: Deprecate functions for creating a server.
These helper functions actually end up hiding important setup details
that should be visible from the test case. We already have a convenient
way of setting this config when calling newTestServerWithConfig.
2021-12-09 20:09:29 -05:00
Daniel Nephin c9a992f5e8 testing: remove old config.Build version
DefaultConfig already sets the version to version.Version, so by removing this
our tests will run with the version that matches the code.
2021-12-09 20:09:29 -05:00
Kyle Havlovitz 04ef1c3fa0 leader: move the virtual IP version check into a goroutine 2021-12-09 17:00:33 -08:00
Daniel Nephin 968aeff1bb ca: prune some unnecessary lookups in the tests 2021-12-08 18:42:52 -05:00
Daniel Nephin 305655a8b1 ca: remove duplicate WaitFor function 2021-12-08 18:42:52 -05:00
Daniel Nephin 1dec6bb815 ca: fix flakes in RenewIntermediate tests
I suspect one problem was that we set structs.IntermediateCertRenewInterval to 1ms, which meant
that in some cases the intermediate could renew before we stored the original value.

Another problem was that the 'wait for intermediate' loop was calling the provider.ActiveIntermediate,
but the comparison needs to use the RPC endpoint to accurately represent a user request. So
changing the 'wait for' to use the state store ensures we don't race.

Also moves the patching into a separate function.

Removes the addition of ca.CertificateTimeDriftBuffer as part of calculating halfTime. This was added
in a previous commit to attempt to fix the flake, but it did not appear to fix the problem. Adding the
time here was making the tests fail when using the shared patch
function. It's not clear to me why, but there's no reason we should be
including this time in the halfTime calculation.
2021-12-08 18:42:52 -05:00
Daniel Nephin 2e4e8bd791 ca: improve RenewIntermediate tests
Use the new verifyLearfCert to show the cert verifies with intermediates
from both sources. This required using the RPC interface so that the
leaf pem was constructed correctly.

Add IndexedCARoots.Active since that is a common operation we see in a
few places.
2021-12-08 18:42:52 -05:00
Daniel Nephin a4ba1f348d ca: add a test for Vault in secondary DC 2021-12-08 18:42:51 -05:00
Daniel Nephin a5d9b1d322 ca: Add CARoots.Active method
Which will be used in the next commit.
2021-12-08 18:41:51 -05:00
R.B. Boyer 5f5720837b
acl: ensure that the agent recovery token is properly partitioned (#11782) 2021-12-08 17:11:55 -06:00
Daniel Nephin f72e285fe8
Merge pull request #11721 from hashicorp/dnephin/ca-export-fsm-operation
ca: use the real FSM operation in tests
2021-12-08 17:49:00 -05:00
Daniel Nephin 214dcf8d0d ca: use the real FSM operation in tests
Previously we had a couple copies that reproduced the FSM operation.
These copies introduce risk that the test does not accurately match
production.

This PR removes the test versions of the FSM operation, and exports the
real production FSM operation so that it can be used in tests.

The consul provider tests did need to change because of this. Previously
we would return a hardcoded value of 2, but in production this value is
always incremented.
2021-12-08 17:29:44 -05:00
R.B. Boyer 592ac8f96a
test: test server should auto cleanup (#11779) 2021-12-08 13:26:06 -06:00
Evan Culver 7a365fa0da
rpc: Unset partition before forwarding to remote datacenter (#11758) 2021-12-08 11:02:14 -08:00
Daniel Nephin dccd3f5806 Merge remote-tracking branch 'origin/main' into serve-panic-recovery 2021-12-07 16:30:41 -05:00
Chris S. Kim f8f8580ab2
Godocs updates for catalog endpoints (#11716) 2021-12-07 10:18:28 -05:00
Dan Upton 7fe81171d9
Rename `ACLMasterToken` => `ACLInitialManagementToken` (#11746) 2021-12-07 12:39:28 +00:00
Dan Upton 3a91815169
agent/token: rename `agent_master` to `agent_recovery` (internally) (#11744) 2021-12-07 12:12:47 +00:00
R.B. Boyer 9315a9812f return the max 2021-12-06 15:36:52 -06:00
freddygv 60fe5f75bb Remove support for failover to partition
Failing over to a partition is more siimilar to failing over to another
datacenter than it is to failing over to a namespace. In a future
release we should update how localities for failover are specified. We
should be able to accept a list of localities which can include both
partition and datacenter.
2021-12-06 12:32:24 -07:00
freddygv 5c1f7aa372 Allow cross-partition references in disco chain
* Add partition fields to targets like service route destinations
* Update validation to prevent cross-DC + cross-partition references
* Handle partitions when reading config entries for disco chain
* Encode partition in compiled targets
2021-12-06 12:32:19 -07:00
R.B. Boyer b1605639fc
light refactors to support making partitions and serf-based wan federation are mutually exclusive (#11755) 2021-12-06 13:18:02 -06:00
Freddy a725f06c83
Merge pull request #11739 from hashicorp/ap/exports-rename 2021-12-06 08:20:50 -07:00
freddygv e91509383f Clean up additional refs to partition exports 2021-12-04 15:16:40 -07:00
freddygv ed6076db26 Rename partition-exports to exported-services
Using a name less tied to partitions gives us more flexibility to use
this config entry in OSS for exports between datacenters/meshes.
2021-12-03 17:47:31 -07:00
freddygv f5b25401b3 Update intention topology to use new table 2021-12-03 17:28:31 -07:00
freddygv 55970c6ccd Avoid updating default decision from wildcard ixn
Given that we do not allow wildcard partitions in intentions, no one ixn
can override the DefaultAllow setting. Only the default ACL policy
applies across all partitions.
2021-12-03 17:28:12 -07:00
freddygv 497aab669f Add a new table to query service names by kind
This table purposefully does not index by partition/namespace. It's a
global view into all service names.

This table is intended to replace the current serviceListTxn watch in
intentionTopologyTxn. For cross-partition transparent proxying we need
to be able to calculate upstreams from intentions in any partition. This
means that the existing serviceListTxn function is insufficient since
it's scoped to a partition.

Moving away from that function is also beneficial because it watches the
main "services" table, so watchers will wake up when any instance is
registered or deregistered.
2021-12-03 17:28:12 -07:00
Freddy f032d6ef05
Merge pull request #11680 from hashicorp/ap/partition-exports-oss 2021-12-03 16:57:50 -07:00
Dan Upton 3b9dfca88d
internal: support `ResultsFilteredByACLs` flag/header (#11643) 2021-12-03 23:04:24 +00:00
Dan Upton c8204330ed
query: support `ResultsFilteredByACLs` in query list endpoint (#11620) 2021-12-03 23:04:09 +00:00
Dhia Ayachi ce326b6074
port oss changes (#11736) 2021-12-03 17:23:55 -05:00
Freddy e246defb6c
Merge pull request #11720 from hashicorp/bbolt 2021-12-03 14:44:36 -07:00
Dan Upton 047aa2ffb0
fedstate: support `ResultsFilteredByACLs` in `ListMeshGateways` endpoint (#11644) 2021-12-03 20:56:55 +00:00
Dan Upton 361d9c2862
catalog: support `ResultsFilteredByACLs` flag/header (#11594) 2021-12-03 20:56:14 +00:00
Dan Upton 4c0956c03a
coordinate: support `ResultsFilteredByACLs` flag/header (#11617) 2021-12-03 20:51:02 +00:00
Dan Upton bf1e2ca551
sessions: support `ResultsFilteredByACLs` flag/header (#11606) 2021-12-03 20:43:43 +00:00
Dan Upton d92f0d84c6
txn: support `ResultsFilteredByACLs` flag in `Read` endpoint (#11632) 2021-12-03 20:41:03 +00:00
Dhia Ayachi 86159c6ed8
sessions partitioning tests (#11734)
* state: port KV and Tombstone tables to new pattern

* go fmt'ed

* handle wildcards for tombstones

* Fix graveyard ent vs oss

* fix oss compilation error

* add partition to tombstones and kv state store indexes

* refactor to use `indexWithEnterpriseIndexable`

* Apply suggestions from code review

Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* add `singleValueID` implementation assertions

* partition `tableSessions` table

* fix sessions to use UUID and fix prefix index

* fix oss build

* clean up unused functions

* fix oss compilation

* add a partition indexer for sessions

* Fix oss to not have partition index

* fix oss tests

* remove unused operations_ent.go and operations_oss.go func

* remove unused const

* convert `IndexID` of `session_checks` table

* convert `indexSession` of `session_checks` table

* convert `indexNodeCheck` of `session_checks` table

* partition `indexID` and `indexSession` of `tableSessionChecks`

* fix oss linter

* fix review comments

* remove partition for Checks as it's always use the session partition

* fix tests

* fix tests

* do not namespace nodeChecks index

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
2021-12-03 15:36:07 -05:00
Dan Upton c314be2ff9
intention: support `ResultsFilteredByACLs` flag/header (#11612) 2021-12-03 20:35:54 +00:00
Mark Anderson a89ffba2d4
Cross port of ent #1383 (#11726)
Cross port of ent #1383 "Reject non-default datacenter when making partitioned ACLs"

On the OSS side this is a minor refactor to add some more checks that are only applicable to enterprise code.

Signed-off-by: Mark Anderson <manderson@hashicorp.com>
2021-12-03 10:20:25 -08:00
Dan Upton 599a4d6619
config: support `ResultsFilteredByACLs` in list/list all endpoints (#11621) 2021-12-03 17:39:47 +00:00
Dan Upton 474ef7cc1f
kv: support `ResultsFilteredByACLs` in list/list keys (#11593) 2021-12-03 17:31:48 +00:00
Dan Upton cf1bd585f6
health: support `ResultsFilteredByACLs` flag/header (#11602) 2021-12-03 17:31:32 +00:00
Dan Upton 1e47e3c82b
Groundwork for exposing when queries are filtered by ACLs (#11569) 2021-12-03 17:11:26 +00:00
Kyle Havlovitz 6f34a4f777
Merge pull request #11724 from hashicorp/service-virtual-ips
oss: add virtual IP generation for connect services
2021-12-02 16:16:57 -08:00
Kyle Havlovitz 4f2cfee4b0 consul: add virtual IP generation for connect services 2021-12-02 15:42:47 -08:00
R.B. Boyer c46f9f9f31
agent: add variation of force-leave that exclusively works on the WAN (#11722)
Fixes #6548
2021-12-02 17:15:10 -06:00
Matt Keeler c7a94843ee Emit raft-boltdb metrics 2021-12-02 16:56:15 -05:00
Daniel Nephin e47cecc653 config: add NoFreelistSync option
# Conflicts:
#	agent/config/testdata/TestRuntimeConfig_Sanitize-enterprise.golden
#	agent/consul/server.go
2021-12-02 16:56:15 -05:00
Matt Keeler 42a5635bc3 Use raft-boltdb/v2 2021-12-02 16:56:15 -05:00
Daniel Nephin 17a2d14d49 ca: set the correct SigningKeyID after config update with Vault provider
The test added in this commit shows the problem. Previously the
SigningKeyID was set to the RootCert not the local leaf signing cert.

This same bug was fixed in two other places back in 2019, but this last one was
missed.

While fixing this bug I noticed I had the same few lines of code in 3
places, so I extracted a new function for them.

There would be 4 places, but currently the InitializeCA flow sets this
SigningKeyID in a different way, so I've left that alone for now.
2021-12-02 16:07:11 -05:00
Daniel Nephin 96f95889db
Merge pull request #11713 from hashicorp/dnephin/ca-test-names
ca: make test naming consistent
2021-12-02 16:05:42 -05:00
Daniel Nephin ff4581092e
Merge pull request #11671 from hashicorp/dnephin/ca-fix-storing-vault-intermediate
ca: fix storing the leaf signing cert with Vault provider
2021-12-02 16:02:24 -05:00
Daniel Nephin 81afb208ac
Merge pull request #11677 from hashicorp/dnephin/freeport-interface
sdk: use t.Cleanup in freeport and remove unnecessary calls
2021-12-02 15:58:41 -05:00
Daniel Nephin 447097b166 ca: make test naming consistent
While working on the CA system it is important to be able to run all the
tests related to the system, without having to wait for unrelated tests.
There are many slow and unrelated tests in agent/consul, so we need some
way to filter to only the relevant tests.

This PR renames all the CA system related tests to start with either
`TestCAMananger` for tests of internal operations that don't have RPC
endpoint, or `TestConnectCA` for tests of RPC endpoints. This allows us
to run all the test with:

    go test -run 'TestCAMananger|TestConnectCA' ./agent/consul

The test naming follows an undocumented convention of naming tests as
follows:

    Test[<struct name>_]<function name>[_<test case description>]

I tried to always keep Primary/Secondary at the end of the description,
and _Vault_ has to be in the middle because of our regex to run those
tests as a separate CI job.

You may notice some of the test names changed quite a bit. I did my best
to identify the underlying method being tested, but I may have been
slightly off in some cases.
2021-12-02 14:57:09 -05:00
Daniel Nephin 28a8a64019 ca: make getLeafSigningCertFromRoot safer
As a method on the struct type this would not be safe to call without first checking
c.isIntermediateUsedToSignLeaf.

So for now, move this logic to the CAMananger, so that it is always correct.
2021-12-02 12:42:49 -05:00
Daniel Nephin b29faa3e50 ca: fix stored CARoot representation with Vault provider
We were not adding the local signing cert to the CARoot. This commit
fixes that bug, and also adds support for fixing existing CARoot on
upgrade.

Also update the tests for both primary and secondary to be more strict.
Check the SigningKeyID is correct after initialization and rotation.
2021-12-02 12:42:49 -05:00
Chris S. Kim 54e4d1b7b2
ENT to OSS sync (#11703) 2021-12-01 14:56:10 -05:00
R.B. Boyer db91cbf484
auto-config: ensure the feature works properly with partitions (#11699) 2021-12-01 13:32:34 -06:00
Daniel Nephin 32ef9c5d5c ca: add some godoc and func for finding leaf signing cert
This will be used in a follow up commit.
2021-11-30 18:36:41 -05:00
Daniel Nephin 4185045a7f sdk/freeport: rename Port to GetOne
For better consistency with GetN
2021-11-30 17:32:41 -05:00
Chris S. Kim 56fab21582
Refactor test helper (#11689)
Allow custom ACL root tokens to be passed
2021-11-30 13:22:07 -05:00
Chris S. Kim 36246c5791
acl: Fill authzContext from token in Coordinate endpoints (#11688) 2021-11-30 13:17:41 -05:00
freddygv dd662d7058 Move ent config test to ent file 2021-11-29 12:15:17 -07:00
Daniel Nephin e8312d6b5a testing: remove unnecessary calls to freeport
Previously we believe it was necessary for all code that required ports
to use freeport to prevent conflicts.

https://github.com/dnephin/freeport-test shows that it is actually save
to use port 0 (`127.0.0.1:0`) as long as it is passed directly to
`net.Listen`, and the listener holds the port for as long as it is
needed.

This works because freeport explicitly avoids the ephemeral port range,
and port 0 always uses that range. As you can see from the test output
of https://github.com/dnephin/freeport-test, the two systems never use
overlapping ports.

This commit converts all uses of freeport that were being passed
directly to a net.Listen to use port 0 instead. This allows us to remove
a bit of wrapping we had around httptest, in a couple places.
2021-11-29 12:19:43 -05:00
Daniel Nephin d795a73f78 testing: use the new freeport interfaces 2021-11-27 15:39:46 -05:00
Daniel Nephin 56f9238d15 go-sso: remove returnFunc now that freeport handles return 2021-11-27 15:29:38 -05:00
Daniel Nephin 59204598c8 ca: clean up unnecessary raft.Apply response checking
In d2ab767fef raftApply was changed to handle this check in
a single place, instad of having every caller check it. It looks like these few places
were missed when I did that clean up.

This commit removes the remaining resp.(error) checks, since they are all no-ops now.
2021-11-26 17:57:55 -05:00
Daniel Nephin 52f0853ff9
Merge pull request #11339 from hashicorp/dnephin/ca-manager-isolate-secondary-2
ca: reduce use of state in the secondary
2021-11-26 14:41:45 -05:00
Daniel Nephin 91a0c25932 ca: remove state check in secondarySetPrimaryRoots
This function is only ever called from operations that have already acquired the state lock, so checking
the value of state can never fail.

This change is being made in preparation for splitting out a separate type for the secondary logic. The
state can't easily be shared, so really only the expored top-level functions should acquire the 'state lock'.
2021-11-26 14:14:47 -05:00
Daniel Nephin f1944458e4 ca: remove actingSecondaryCA
This commit removes the actingSecondaryCA field, and removes the stateLock around it. This field
was acting as a proxy for providerRoot != nil, so replace it with that check instead.

The two methods which called secondarySetCAConfigured already set the state, so checking the
state again at this point will not catch runtime errors (only programming errors, which we can catch with tests).
In general, handling state transitions should be done on the "entrypoint" methods where execution starts, not
in every internal method.

This is being done to remove some unnecessary references to c.state, in preparations for extracting
types for primary/secondary.
2021-11-26 14:14:47 -05:00
Daniel Nephin b92084b8e8 ca: reduce consul provider backend interface a bit
This makes it easier to fake, which will allow me to use the ConsulProvider as
an 'external PKI' to test a customer setup where the actual root CA is not
the root we use for the Consul CA.

Replaces a call to the state store to fetch the clusterID with the
clusterID field already available on the built-in provider.
2021-11-25 11:46:06 -05:00
Dhia Ayachi 3820e09a47
Partition/kv indexid sessions (#11639)
* state: port KV and Tombstone tables to new pattern

* go fmt'ed

* handle wildcards for tombstones

* Fix graveyard ent vs oss

* fix oss compilation error

* add partition to tombstones and kv state store indexes

* refactor to use `indexWithEnterpriseIndexable`

* Apply suggestions from code review

Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* partition `tableSessions` table

* fix sessions to use UUID and fix prefix index

* fix oss build

* clean up unused functions

* fix oss compilation

* add a partition indexer for sessions

* Fix oss to not have partition index

* fix oss tests

* remove unused operations_ent.go and operations_oss.go func

* convert `indexNodeCheck` of `session_checks` table

* partition `indexID` and `indexSession` of `tableSessionChecks`

* remove partition for Checks as it's always use the session partition

* partition sessions index id table

* fix rebase issues

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
2021-11-24 11:34:36 -05:00
Dhia Ayachi bb83624950
Partition session checks store (#11638)
* state: port KV and Tombstone tables to new pattern

* go fmt'ed

* handle wildcards for tombstones

* Fix graveyard ent vs oss

* fix oss compilation error

* add partition to tombstones and kv state store indexes

* refactor to use `indexWithEnterpriseIndexable`

* Apply suggestions from code review

Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* add `singleValueID` implementation assertions

* partition `tableSessions` table

* fix sessions to use UUID and fix prefix index

* fix oss build

* clean up unused functions

* fix oss compilation

* add a partition indexer for sessions

* Fix oss to not have partition index

* fix oss tests

* remove unused operations_ent.go and operations_oss.go func

* remove unused const

* convert `IndexID` of `session_checks` table

* convert `indexSession` of `session_checks` table

* convert `indexNodeCheck` of `session_checks` table

* partition `indexID` and `indexSession` of `tableSessionChecks`

* fix oss linter

* fix review comments

* remove partition for Checks as it's always use the session partition

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
2021-11-24 09:10:38 -05:00
Daniel Nephin b9ab9bae12 ca: accept only the cluster ID to SpiffeIDSigningForCluster
To make it more obivous where ClusterID is used, and remove the need to create a struct
when only one field is used.
2021-11-16 16:57:21 -05:00
R.B. Boyer 1e02460bd1
re-run gofmt on 1.17 (#11579)
This should let freshly recompiled golangci-lint binaries using Go 1.17
pass 'make lint'
2021-11-16 12:04:01 -06:00
R.B. Boyer eb21649f82
partitions: various refactors to support partitioning the serf LAN pool (#11568) 2021-11-15 09:51:14 -06:00
freddygv 400697507b Move assertion to after config fetch 2021-11-10 10:50:08 -07:00
freddygv da5bcc574e Use ClusterID to check for readiness
The TrustDomain is populated from the Host() method which includes the
hard-coded "consul" domain. This means that despite having an empty
cluster ID, the TrustDomain won't be empty.
2021-11-10 10:45:22 -07:00
freddygv 6976044bc4 Prevent replicating partition-exports 2021-11-09 16:42:42 -07:00
freddygv 5c121d7a48 handle error scenario of empty local DC 2021-11-09 16:42:42 -07:00
freddygv af29cda415 Restrict DC for partition-exports writes
There are two restrictions:
- Writes from the primary DC which explicitly target a secondary DC.
- Writes to a secondary DC that do not explicitly target the primary DC.

The first restriction is because the config entry is not supported in
secondary datacenters.

The second restriction is to prevent the scenario where a user writes
the config entry to a secondary DC, the write gets forwarded to the
primary, but then the config entry does not apply in the secondary.
This makes the scope more explicit.
2021-11-09 16:42:42 -07:00
Freddy 7d95d90fce
Merge pull request #11514 from hashicorp/dnephin/ca-fix-secondary-init
ca: properly handle the case where the secondary initializes after the primary
2021-11-08 17:16:16 -07:00
freddygv cc5a7ed36c Avoid returning empty roots with uninitialized CA
Currently getCARoots could return an empty object with an empty trust
domain before the CA is initialized. This commit returns an error while
there is no CA config or no trust domain.

There could be a CA config and no trust domain because the CA config can
be created in InitializeCA before initialization succeeds.
2021-11-08 16:51:49 -07:00
Dhia Ayachi 7916268c40
refactor session state store tables to use the new index pattern (#11525)
* state: port KV and Tombstone tables to new pattern

* go fmt'ed

* handle wildcards for tombstones

* Fix graveyard ent vs oss

* fix oss compilation error

* add partition to tombstones and kv state store indexes

* refactor to use `indexWithEnterpriseIndexable`

* Apply suggestions from code review

Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* add `singleValueID` implementation assertions

* partition `tableSessions` table

* fix sessions to use UUID and fix prefix index

* fix oss build

* clean up unused functions

* fix oss compilation

* add a partition indexer for sessions

* Fix oss to not have partition index

* fix oss tests

* remove unused func `prefixIndexFromServiceNameAsString`

* fix test error check

* remove unused operations_ent.go and operations_oss.go func

* remove unused const

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
2021-11-08 16:20:50 -05:00
Dhia Ayachi 98735a6d12
KV refactoring, part 2 (#11512)
* add partition to the kv get pretty print

* fix failing test

* add test for kvs RPC endpoint
2021-11-08 11:43:21 -05:00
Dhia Ayachi 520cb5858c
KV state store refactoring and partitioning (#11510)
* state: port KV and Tombstone tables to new pattern

* go fmt'ed

* handle wildcards for tombstones

* Fix graveyard ent vs oss

* fix oss compilation error

* add partition to tombstones and kv state store indexes

* refactor to use `indexWithEnterpriseIndexable`

* partition kvs indexID table

* add `partitionedIndexEntryName` in oss for test purpose

* Apply suggestions from code review

Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* add `singleValueID` implementation assertions

* remove entmeta reference from oss

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
2021-11-08 09:35:56 -05:00
Giulio Micheloni af7b7b5693
Merge branch 'main' into serve-panic-recovery 2021-11-06 16:12:06 +01:00
Daniel Nephin d9110136f2 ca: Only initialize clusterID in the primary
The secondary must get the clusterID from the primary
2021-11-05 18:08:44 -04:00
Daniel Nephin 01bd3d118d ca: return an error when secondary fails to initialize
Previously secondaryInitialize would return nil in this case, which prevented the
deferred initialize from happening, and left the CA in an uninitialized state until a config
update or root rotation.

To fix this I extracted the common parts into the delegate implementation. However looking at this
again, it seems like the handling in secondaryUpdateRoots is impossible, because that function
should never be called before the secondary is initialzied. I beleive we can remove some of that
logic in a follow up.
2021-11-05 18:02:51 -04:00
Daniel Nephin 8ba760a2fc acl: remove id and revision from Policy constructors
The fields were removed in a previous commit.

Also remove an unused constructor for PolicyMerger
2021-11-05 15:45:08 -04:00
R.B. Boyer c7c5013edd
rename helper method to reflect the non-deprecated terminology (#11509) 2021-11-05 13:51:50 -05:00
FFMMM 4ddf973a31
add root_cert_ttl option for consul connect, vault ca providers (#11428)
* add root_cert_ttl option for consul connect, vault ca providers

Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Chris S. Kim <ckim@hashicorp.com>

* add changelog, pr feedback

Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>

* Update .changelog/11428.txt, more docs

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>

* Update website/content/docs/agent/options.mdx

Co-authored-by: Kyle Havlovitz <kylehav@gmail.com>

Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Co-authored-by: Kyle Havlovitz <kylehav@gmail.com>
2021-11-02 11:02:10 -07:00
Daniel Nephin b57cae94de
Merge pull request #10771 from hashicorp/dnephin/emit-telemetry-metrics-immediately
telemetry: improve cert expiry metrics
2021-11-01 18:31:03 -04:00
Daniel Nephin eee598e91c
Merge pull request #11338 from hashicorp/dnephin/ca-manager-isolate-secondary
ca: clearly identify methods that are primary-only or secondary-only
2021-11-01 14:10:31 -04:00
Daniel Upton d47b7311b8
Support Check-And-Set deletion of config entries (#11419)
Implements #11372
2021-11-01 16:42:01 +00:00
Dhia Ayachi 2801785710
regenerate expired certs (#11462)
* regenerate expired certs

* add documentation to generate tests certificates
2021-11-01 11:40:16 -04:00
R.B. Boyer af9ffc214d
agent: add a clone function for duplicating the serf lan configuration (#11443) 2021-10-28 16:11:26 -05:00
Daniel Nephin a8e2e1c365 agent: move agent tls metric monitor to a more appropriate place
And add a test for it
2021-10-27 16:26:09 -04:00
Daniel Nephin c92513ec16 telemetry: set cert expiry metrics to NaN on start
So that followers do not report 0, which would make alerting difficult.
2021-10-27 15:19:25 -04:00
Daniel Nephin 9264ce89d2 telemetry: fix cert expiry metrics by removing labels
These labels should be set by whatever process scrapes Consul (for
prometheus), or by the agent that receives them (for datadog/statsd).

We need to remove them here because the labels are part of the "metric
key", so we'd have to pre-declare the metrics with the labels. We could
do that, but that is extra work for labels that should be added from
elsewhere.

Also renames the closure to be more descriptive.
2021-10-27 15:19:25 -04:00
Daniel Nephin 7948720bbb telemetry: only emit leader cert expiry metrics on the servers 2021-10-27 15:19:25 -04:00
Daniel Nephin 7fe60e5989 telemetry: prevent stale values from cert monitors
Prometheus scrapes metrics from each process, so when leadership transfers to a different node
the previous leader would still be reporting the old cached value.

By setting NaN, I believe we should zero-out the value, so that prometheus should only consider the
value from the new leader.
2021-10-27 15:19:25 -04:00
Daniel Nephin 0cc58f54de telemetry: improve cert expiry metrics
Emit the metric immediately so that after restarting an agent, the new expiry time will be
emitted. This is particularly important when this metric is being monitored, because we want
the alert to resovle itself immediately.

Also fixed a bug that was exposed in one of these metrics. The CARoot can be nil, so we have
to handle that case.
2021-10-27 15:19:25 -04:00
Daniel Nephin a3c781682d subscribe: attempt to fix a flaky test
TestSubscribeBackend_IntegrationWithServer_DeliversAllMessages has been
flaking a few times. This commit cleans up the test a bit, and improves
the failure output.

I don't believe this actually fixes the flake, but I'm not able to
reproduce it reliably.

The failure appears to be that the event with Port=0 is being sent in
both the snapshot and as the first event after the EndOfSnapshot event.

Hopefully the improved logging will show us if these are really
duplicate events, or actually different events with different indexes.
2021-10-27 15:09:09 -04:00
freddygv 43360eb216 Rework acl exports interface 2021-10-27 12:50:39 -06:00
Freddy ec7e94d129
Merge pull request #11433 from hashicorp/exported-service-acls
[OSS] acl: Expand ServiceRead and NodeRead to account for partition exports
2021-10-27 12:48:08 -06:00
Freddy a8762be529
Merge pull request #11431 from hashicorp/ap/exports-proxycfg
[OSS] Update partitioned mesh gw handling for connect proxies
2021-10-27 11:27:43 -06:00
Freddy b1b6f682e1
Merge pull request #11416 from hashicorp/ap/exports-update
Rename service-exports to partition-exports
2021-10-27 11:27:31 -06:00
freddygv aa931682ea Avoid mixing named and unnamed params 2021-10-26 23:42:25 -06:00
freddygv bf350224a0 Avoid passing nil config pointer 2021-10-26 23:42:25 -06:00
freddygv df7b5af6f0 Avoid panic on nil partitionAuthorizer config
partitionAuthorizer.config can be nil if it wasn't provided on calls to
newPartitionAuthorizer outside of the ACLResolver. This usage happens
often in tests.

This commit: adds a nil check when the config is going to be used,
updates non-test usage of NewPolicyAuthorizerWithDefaults to pass a
non-nil config, and dettaches setEnterpriseConf from the ACLResolver.
2021-10-26 23:42:25 -06:00
freddygv 22bdf279d1 Update NodeRead for partition-exports
When issuing cross-partition service discovery requests, ACL filtering
often checks for NodeRead privileges. This is because the common return
type is a CheckServiceNode, which contains node data.
2021-10-26 23:42:11 -06:00
Kyle Havlovitz 65c9109396 acl: pass PartitionInfo through ent ACLConfig 2021-10-26 23:41:52 -06:00
Kyle Havlovitz d03f849e49 acl: Expand ServiceRead logic to look at service-exports for cross-partition 2021-10-26 23:41:32 -06:00
freddygv b9b6447977 Finish removing useInDatacenter 2021-10-26 23:36:01 -06:00
freddygv eacb73cb78 Remove useInDatacenter from disco chain requests
useInDatacenter was used to determine whether the mesh gateway mode of
the upstream should be returned in the discovery chain target. This
commit makes it so that the mesh gateway mode is returned every time,
and it is up to the caller to decide whether mesh gateways should be
watched or used.
2021-10-26 23:35:21 -06:00
R.B. Boyer ef559dfdd4
agent: refactor the agent delegate interface to be partition friendly (#11429) 2021-10-26 15:08:55 -05:00
Chris S. Kim fa293362be
agent: Ensure partition is considered in agent endpoints (#11427) 2021-10-26 15:20:57 -04:00
freddygv 5c24ed61a8 Rename service-exports to partition-exports
Existing config entries prefixed by service- are specific to individual
services. Since this config entry applies to partitions it is being
renamed.

Additionally, the Partition label was changed to Name because using
Partition at the top-level and in the enterprise meta was leading to the
enterprise meta partition being dropped by msgpack.
2021-10-25 17:58:48 -06:00
Daniel Nephin 4ae2c8de9d
Merge pull request #11232 from hashicorp/dnephin/acl-legacy-remove-docs
acl: add docs and changelog for the removal of the legacy ACL system
2021-10-25 18:38:00 -04:00
Daniel Nephin 5d41b4d2f4 Update agent/consul/acl_client.go
Co-authored-by: Freddy <freddygv@users.noreply.github.com>
2021-10-25 17:25:14 -04:00
Daniel Nephin 65d48e5042 state: remove support for updating legacy ACL tokens 2021-10-25 17:25:14 -04:00
Daniel Nephin 0784a31e85 acl: remove init check for legacy anon token
This token should always already be migrated from a previous version.
2021-10-25 17:25:14 -04:00
Daniel Nephin daba3c2309 acl: remove legacy parameter to ACLDatacenter
It is no longer used now that legacy ACLs have been removed.
2021-10-25 17:25:14 -04:00
Daniel Nephin 3390f85ab4 acl: remove ACLTokenTypeManagement 2021-10-25 17:25:14 -04:00
Daniel Nephin aea4cc5a6d acl: remove legacy arg to store.ACLTokenSet
And remove the tests for legacy=true
2021-10-25 17:25:14 -04:00
Daniel Nephin c77e5747b1 acl: remove EmbeddedPolicy
This method is no longer. It only existed for legacy tokens, which are no longer supported.
2021-10-25 17:25:14 -04:00
Daniel Nephin 121431bf17 acl: remove tests for resolving legacy tokens
The code for this was already removed, which suggests this is not actually testing what it claims.

I'm guessing these are still resolving because the tokens are converted to non-legacy tokens?
2021-10-25 17:25:14 -04:00
Daniel Nephin 0d0761927a acl: stop replication on leadership lost
It seems like this was missing. Previously this was only called by init of ACLs during an upgrade.
Now that legacy ACLs are  removed, nothing was calling stop.

Also remove an unused method from client.
2021-10-25 17:24:12 -04:00
Daniel Nephin 98823e573f Remove incorrect TODO 2021-10-25 17:20:06 -04:00
Daniel Nephin 1344137ce2 acl: move the legacy ACL struct to the one package where it is used
It is now only used for restoring snapshots. We can remove it in phase 2.
2021-10-25 17:20:06 -04:00
Daniel Nephin 531f2f8a3f acl: remove most of the rest of structs/acl_legacy.go 2021-10-25 17:20:06 -04:00
FFMMM fea6f08bf9
fix autopilot_failure_tolerance, add autopilot metrics test case (#11399)
Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>
2021-10-25 10:55:59 -07:00
Dhia Ayachi 58f5686c08
fix leadership transfer on leave suggestions (#11387)
* add suggestions

* set isLeader to false when leadership transfer succeed
2021-10-21 14:02:26 -04:00
Dhia Ayachi f424faffdd
try to perform a leadership transfer when leaving (#11376)
* try to perform a leadership transfer when leaving

* add a changelog
2021-10-21 12:44:31 -04:00
Kyle Havlovitz 04cd2c983e Add new service-exports config entry 2021-10-20 12:24:18 -07:00
Giulio Micheloni 0c78ddacde Merge branch 'main' of https://github.com/hashicorp/consul into hashicorp-main 2021-10-16 16:59:32 +01:00
R.B. Boyer cc2abb79ba
acl: small OSS refactors to help ensure that auth methods with namespace rules work with partitions (#11323) 2021-10-14 15:38:05 -05:00
freddygv e22f0cc033 Use stored entmeta to fill authzContext 2021-10-14 08:57:40 -06:00
freddygv 53ea1f634a Ensure partition is handled by auto-encrypt 2021-10-14 08:32:45 -06:00
Chris S. Kim c6906b4d37
Update Intentions.List with partitions (#11299) 2021-10-13 10:47:12 -04:00
Connor 257d00c908
Merge pull request #11222 from hashicorp/clly/service-mesh-metrics
Start tracking connect service mesh usage metrics
2021-10-11 14:35:03 -05:00
Connor Kelly 786d2896ff
Replace fmt.Sprintf with function 2021-10-11 12:43:38 -05:00
Daniel Nephin 1d14889eca ca: extract primaryUpdateRootCA
This function is only run when the CAManager is a primary. Extracting this function
makes it clear which parts of UpdateConfiguration are run only in the primary and
also makes the cleanup logic simpler. Instead of both a defer and a local var we
can call the cleanup function in two places.
2021-10-10 15:26:55 -04:00
Daniel Nephin 0bc812a8e5 ca: rename functions to use a primary or secondary prefix
This commit renames functions to use a consistent pattern for identifying the functions that
can only be called when the Manager is run as the primary or secondary.

This is a step toward eventually creating separate types and moving these methods off of CAManager.
2021-10-10 15:26:55 -04:00
Daniel Nephin eaea56c7b2 ca: make receiver variable name consistent
Every other method uses c not ca
2021-10-10 15:26:55 -04:00
FFMMM a0bba9171d
fix consul_autopilot_healthy metric emission (#11231)
https://github.com/hashicorp/consul/issues/10730
2021-10-08 10:31:50 -07:00
Connor Kelly a5cf4a9b57
Rename ConfigUsageEnterprise to EnterpriseConfigEntryUsage 2021-10-08 10:53:34 -05:00
Connor Kelly 8c519d5458
Rename and prefix ConfigEntry in Usage table
Rename ConfigUsage functions to ConfigEntry

prefix ConfigEntry kinds with the ConfigEntry table name to prevent
potential conflicts
2021-10-07 16:19:55 -05:00
Connor Kelly 533e7dbe85
Add connect specific prefix to Usage table
Ensure that connect Kind's are separate from ConfigEntry Kind's to
prevent miscounting
2021-10-07 16:16:23 -05:00
Daniel Nephin b4e3367e63 docs: add notice that legacy ACLs have been removed.
Add changelog

Also remove a metric that is no longer emitted that was missed in a
previous step.
2021-10-05 18:30:22 -04:00
Connor Kelly 024715eb11
Add changelog, website and metric docs
Add changelog to document what changed.
Add entry to telemetry section of the website to document what changed
Add docs to the usagemetric endpoint to help document the metrics in code
2021-10-05 13:34:24 -05:00
Daniel Nephin ab587f5221
Merge pull request #11182 from hashicorp/dnephin/acl-legacy-remove-upgrade
acl: remove upgrade from legacy, start in non-legacy mode
2021-10-04 17:25:39 -04:00
Daniel Nephin c7f74deb17 acl: remove updateEnterpriseSerfTags
The only remaining caller is a test helper, and the tests don't use the enterprise gossip
pools.
2021-10-04 17:01:51 -04:00
Daniel Nephin 9b1d2685bf
Merge pull request #11126 from hashicorp/dnephin/acl-legacy-remove-resolve-and-get-policy
acl: remove ACL.GetPolicy RPC endpoint and ACLResolver.resolveTokenLegacy
2021-10-04 16:29:51 -04:00
Connor Kelly c2583a1b7f
Add metrics to count the number of service-mesh config entries 2021-10-04 14:50:17 -05:00
Connor Kelly 536838b004
Add metrics to count connect native service mesh instances
This will add the counts of the service mesh instances tagged by
whether or not it is connect native
2021-10-04 14:37:05 -05:00
Connor Kelly 46bf882620
Add metrics to count service mesh Kind instance counts
This will add the counts of service mesh instances tagged by the
different ServiceKind's.
2021-10-04 14:36:59 -05:00
Daniel Nephin a1e3fa818c acl: fix test failures caused by remocving legacy ACLs
This commit two test failures:

1. Remove check for "in legacy ACL mode", the actual upgrade will be removed in a following commit.
2. Remove the early WaitForLeader in dc2, because with it the test was
   failing with ACL not found.
2021-10-01 18:03:10 -04:00
Dhia Ayachi a5b09493ab
fix token list by auth method (#11196)
* add tests to OIDC authmethod and fix entMeta when retrieving auth-methods

* fix oss compilation error
2021-10-01 12:00:43 -04:00
Daniel Nephin 4faf805716 acl: call stop for the upgrade goroutine when done
TestAgentLeaks_Server was reporting a goroutine leak without this. Not sure if it would actually
be a leak in production or if this is due to the test setup, but seems easy enough to call it
this way until we remove legacyACLTokenUpgrade.
2021-09-29 17:36:43 -04:00
Daniel Nephin 02da08ce77 acl: only run startACLUpgrade once
Since legacy ACL tokens can no longer be created we only need to run this upgrade a single
time when leadership is estalbished.
2021-09-29 16:22:01 -04:00
Daniel Nephin 3ac910606c acl: remove reading of serf acl tags
We no long need to read the acl serf tag, because servers are always either ACL enabled or
ACL disabled.

We continue to write the tag so that during an upgarde older servers will see the tag.
2021-09-29 15:45:11 -04:00
Daniel Nephin 3d7c07e1e4 acl: fix test failure
For some reason removing legacy ACL upgrade requires using an ACL token now
for this WaitForLeader.
2021-09-29 15:21:30 -04:00
Daniel Nephin 5c721832dc acl: remove legacy ACL upgrades from Server
As part of removing the legacy ACL system
2021-09-29 15:19:23 -04:00
Daniel Nephin 94be1835b2 acl: fix test failures caused by remocving legacy ACLs
This commit two test failures:

1. Remove check for "in legacy ACL mode", the actual upgrade will be removed in a following commit.
2. Use the root token in WaitForLeader, because without it the test was
   failing with ACL not found.
2021-09-29 15:15:50 -04:00
Daniel Nephin 8e9773e20b acl: remove ACL.GetPolicy endpoint and resolve legacy acls
And all code that was no longer used once those two were removed.
2021-09-29 14:33:19 -04:00
Daniel Nephin d12dd48c61 acl: remove ACL upgrading from Clients
As part of removing the legacy ACL system ACL upgrading and the flag for
legacy ACLs is removed from Clients.

This commit also removes the 'acls' serf tag from client nodes. The tag is only ever read
from server nodes.

This commit also introduces a constant for the acl serf tag, to make it easier to track where
it is used.
2021-09-29 14:02:38 -04:00
Daniel Nephin 19040586ce
Merge pull request #11136 from hashicorp/dnephin/acl-resolver-fix-default-authz
acl: fix default Authorizer for down_policy extend-cache/async-cache
2021-09-29 13:45:12 -04:00
Daniel Nephin 6e1ebd3df7 acl: remove the last of the legacy FSM
Replace it with an implementation that returns an error, and rename some symbols
to use a Deprecated suffix to make it clear.

Also remove the ACLRequest struct, which is no longer referenced.
2021-09-29 12:42:23 -04:00
Daniel Nephin ed928511ca acl: remove bootstrap-init FSM operation 2021-09-29 12:42:23 -04:00
Daniel Nephin dab5d1bdc8 acl: remove initializeLegacyACL from leader init 2021-09-29 12:42:23 -04:00
Daniel Nephin 05f0cc3993 acl: remove ACLDelete FSM command, and state store function
These are no longer used now that ACL.Apply has been removed.
2021-09-29 12:42:23 -04:00
Daniel Nephin 966e50e00e acl: remove legacy field to ACLBoostrap 2021-09-29 12:42:23 -04:00
Daniel Nephin 0330966315
Merge pull request #11101 from hashicorp/dnephin/acl-legacy-remove-rpc-2
acl: remove legacy ACL.Apply RPC
2021-09-29 12:23:55 -04:00
Daniel Nephin ea4a8343cd
Merge pull request #11177 from hashicorp/dnephin/remove-entmeta-methods
structs: remove EnterpriseMeta helper methods
2021-09-29 12:08:07 -04:00
Daniel Nephin 4c579a49ed
Merge pull request #10986 from hashicorp/dnephin/acl-legacy-remove-rpc
acl: remove legacy ACL RPC - part 1
2021-09-29 12:04:09 -04:00
Daniel Nephin eb632c53a2 structs: rename the last helper method.
This one gets used a bunch, but we can rename it to make the behaviour more obvious.
2021-09-29 11:48:38 -04:00
Daniel Nephin 8d8c1f9d5e structs: remove another helper
We already have a helper funtion.
2021-09-29 11:48:03 -04:00
Chris S. Kim 5c37819d09
Cleanup unnecessary normalizing method (#11169) 2021-09-28 15:31:12 -04:00
Daniel Nephin bf2a3f6e79
Merge pull request #11084 from krastin/krastin-autopilot-loggingtypo
Fix a tiny typo in logging in autopilot.go
2021-09-28 15:11:11 -04:00
Daniel Nephin cd4e70b34c acl: fix default authorizer for down_policy
This was causing a nil panic because a nil authorizer is no longer valid after the cleanup done
in https://github.com/hashicorp/consul/pull/10632.
2021-09-23 18:12:22 -04:00
Daniel Nephin 6bb7aef15c Remove t.Parallel from TestACLResolver_DownPolicy
These tests run in under 10ms, t.Parallel does nothing but slow them down and
make failures harder to debug when one panics.
2021-09-23 18:12:22 -04:00
Dhia Ayachi e664dbc352
Refactor table index acl phase 2 (#11133)
* extract common methods from oss and ent

* remove unreachable code

* add missing normalize for binding rules

* fix oss to use Query
2021-09-23 15:26:09 -04:00
Dhia Ayachi 5904e4ac79
Refactor table index (#11131)
* convert tableIndex to use the new pattern

* make `indexFromString` available for oss as well

* refactor `indexUpdateMaxTxn`
2021-09-23 11:06:23 -04:00
Daniel Nephin c84867feda acl: remove ACL.Apply
As part of removing the legacy ACL system.
2021-09-22 18:28:08 -04:00
Daniel Nephin ae05419aea acl: made acl rules in tests slightly more specific
When converting these tests from the legacy ACL system to the new RPC endpoints I
initially changed most things to use _prefix rules, because that was equivalent to
the old legacy rules.

This commit modifies a few of those rules to be a bit more specific by replacing the _prefix
rule with a non-prefix one where possible.
2021-09-22 18:24:56 -04:00
Mark Anderson d88d9e71c2
partitions/authmethod-index work from enterprise (#11056)
* partitions/authmethod-index work from enterprise

Signed-off-by: Mark Anderson <manderson@hashicorp.com>
2021-09-22 13:19:20 -07:00
R.B. Boyer 706fc8bcd0
grpc: strip local ACL tokens from RPCs during forwarding if crossing datacenters (#11099)
Fixes #11086
2021-09-22 13:14:26 -05:00
Connor 1e3ba26223
Merge pull request #11090 from hashicorp/clly/kv-usage-metrics
Add KVUsage to consul state usage metrics
2021-09-22 11:26:56 -05:00
Connor Kelly ba706501e1
Strip out go 1.17 bits 2021-09-22 11:04:48 -05:00
Daniel Nephin 72f2199ea1 acl: remove remaining tests that use ACL.Apply
In preparation for removing ACL.Apply.

Tests for ACL.Apply, ACL.GetPolicy, and ACL upgrades were removed
because all 3 of those will be removed shortly.

The forth test appears to be for the ACLResolver cache, so the test was moved to the correct
test file, and the name was updated to make it obvious what is being tested.
2021-09-21 19:35:26 -04:00
Daniel Nephin eb991c18c2 fsm: restore the legacy commands
and emit a helpful error message.
2021-09-21 18:35:12 -04:00
Daniel Nephin fa4b33ab8f Convert tests to the new ACL system
In preparation for removing ACL.Apply
2021-09-21 18:35:12 -04:00
Daniel Nephin 5779af1f62 config: use the new ACL system in tests
In preparation for removing ACL.Apply
2021-09-21 17:57:29 -04:00
Daniel Nephin d64409f66f catalog: use the new ACL system in tests
In preparation for removing ACL.Apply
2021-09-21 17:57:29 -04:00
Daniel Nephin 746f67b3a1 acl: remove two commented out tests for legacy ACL replication
They were commented out in 2018.
2021-09-21 17:57:29 -04:00
Daniel Nephin abd9cd0e15 acl: replace legacy Get and List RPCs with an error impl
These endpoints are being removed as part of the legacy ACL system.
2021-09-21 17:57:29 -04:00
Daniel Nephin e7c63004a8 acl: remove a couple legacy ACL operation constants
structs.ACLForceSet was deprecated 4 years ago, it should be safe to remove now.
ACLBootstrapNow was removed in a recent commit. While it is technically possible that a cluster with mixed version
could still attempt a legacy boostrap, we documented that the legacy system was deprecated in 1.4, so no
clusters that are being upgraded should be attempting a legacy boostrap.
2021-09-21 17:57:29 -04:00
Daniel Nephin aee8a9511d
Merge pull request #10985 from hashicorp/dnephin/acl-legacy-remove-replication
acl: remove legacy ACL replication
2021-09-21 17:56:54 -04:00
Connor 1ddee0680c
Apply suggestions from code review
Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>
2021-09-21 10:52:46 -05:00
Connor Kelly ae3457b96a
Fix test 2021-09-20 13:44:43 -05:00
Connor Kelly 052f224ee5
Add KVUsage to consul state usage metrics
This change will add the number of entries in the consul KV store to the
already existing usage metrics.
2021-09-20 12:41:54 -05:00
Krastin Krastev f7b0938633
Update autopilot.go
Fixing a minuscule typo in logging
2021-09-20 14:40:58 +02:00
Freddy 8591620b5d
Merge pull request #11071 from hashicorp/partitions/ixn-decisions 2021-09-16 15:18:23 -06:00
R.B. Boyer faa6fd0919
acl: ensure the global management policy grants all necessary partition privileges (#11072) 2021-09-16 15:53:10 -05:00
freddygv 47109e0c0c Default the partition in ixn check 2021-09-16 14:39:01 -06:00
freddygv 82d2caa288 Fixup test 2021-09-16 14:39:01 -06:00
freddygv 95a6db9cfa Account for partitions in ixn match/decision 2021-09-16 14:39:01 -06:00
Jeff Widman 2dc62aa0c4
Bump `go-discover` to fix broken dep tree (#10898) 2021-09-16 15:31:22 -04:00
R.B. Boyer ca73abdea1
acl: fix intention:*:write checks (#11061)
This is a partial revert of #10793
2021-09-16 11:08:45 -05:00
Freddy cd08a36ce0
Merge pull request #11051 from hashicorp/partitions/fixes 2021-09-16 09:29:00 -06:00
Freddy fcef19f94b
acl: small resolver changes to account for partitions (#11052)
Also refactoring the enterprise side of a test to make it easier to reason about.
2021-09-16 09:17:02 -05:00
freddygv 99c6e4fe41 Default partition in match endpoint 2021-09-15 17:23:52 -06:00
Mark Anderson 9f12fbd3cc
ACL Binding Rules table partitioning (#11044)
* ACL Binding Rules table partitioning

Signed-off-by: Mark Anderson <manderson@hashicorp.com>
2021-09-15 13:26:08 -07:00
Dhia Ayachi af21578039
use const instead of literals for `tableIndex` (#11039) 2021-09-15 10:24:04 -04:00
Mark Anderson 6be54052f7
Refactor `indexAuthMethod` in `tableACLBindingRules` (#11029)
* Port consul-enterprise #1123 to OSS

Signed-off-by: Mark Anderson <manderson@hashicorp.com>

* Fixup missing query field

Signed-off-by: Mark Anderson <manderson@hashicorp.com>

* change to re-trigger ci system

Signed-off-by: Mark Anderson <manderson@hashicorp.com>
2021-09-15 09:34:19 -04:00
Dhia Ayachi b4d5860197
convert expiration indexed in ACLToken table to use `indexerSingle` (#11018)
* move intFromBool to be available for oss

* add expiry indexes

* remove dead code: `TokenExpirationIndex`

* fix remove indexer `TokenExpirationIndex`

* fix rebase issue
2021-09-13 14:37:16 -04:00
Dhia Ayachi 11f44dfcf8
add locality indexer partitioning (#11016)
* convert `Roles` index to use `indexerSingle`

* split authmethod write indexer to oss and ent

* add index locality

* add locality unit tests

* move intFromBool to be available for oss

* use Bool func

* refactor `aclTokenList` to merge func
2021-09-13 11:53:00 -04:00
Dhia Ayachi ba4ee6e67c
convert `indexAuthMethod` index to use `indexerSingle` (#11014)
* convert `Roles` index to use `indexerSingle`

* fix oss build

* split authmethod write indexer to oss and ent

* add auth method unit tests
2021-09-10 16:56:56 -04:00
Paul Banks 3004eadd08 Fix enterprise discovery chain tests; Fix multi-level split merging 2021-09-10 21:11:00 +01:00
Paul Banks f1c0876b4c Fix discovery chain test fixtures 2021-09-10 21:09:24 +01:00
Paul Banks e22cc9c53a Header manip for split legs plumbing 2021-09-10 21:09:24 +01:00
Dhia Ayachi 6cac30aa22
convert `Roles` index to use `indexerMulti` (#11013)
* convert `Roles` index to use `indexerMulti`

* add role test in oss

* fix oss to use the right index func

* preallocate slice
2021-09-10 16:04:33 -04:00
Dhia Ayachi f3f0654038
convert indexPolicies in ACLTokens table to the new index (#11011) 2021-09-10 14:57:37 -04:00
Dhia Ayachi 584faec6e3
convert indexSecret to the new index (#11007) 2021-09-10 09:10:11 -04:00
Dhia Ayachi 6e6cf1c043
convert indexAccessor to the new index (#11002) 2021-09-09 16:28:04 -04:00
Hans Hasselberg 13238dbab6
tls: consider presented intermediates during server connection tls handshake. (#10964)
* use intermediates when verifying

* extract connection state

* remove useless import

* add changelog entry

* golint

* better error

* wording

* collect errors

* use SAN.DNSName instead of CommonName

* Add test for unknown intermediate

* improve changelog entry
2021-09-09 21:48:54 +02:00
Chris S. Kim 9bbfa048a2
Sync enterprise changes to oss (#10994)
This commit updates OSS with files for enterprise-specific admin partitions feature work
2021-09-08 11:59:30 -04:00
Kyle Havlovitz a14950025a
Merge pull request #10984 from hashicorp/mesh-resource
acl: adding a new mesh resource
2021-09-07 15:06:20 -07:00
Dhia Ayachi bc0e4f2f46
partition dicovery chains (#10983)
* partition dicovery chains

* fix default partition for OSS
2021-09-07 16:29:32 -04:00
Daniel Nephin d63cef1219 acl: remove legacy ACL replication 2021-09-03 12:42:06 -04:00
R.B. Boyer ee372a854a acl: adding a new mesh resource 2021-09-03 09:12:03 -04:00
Evan Culver 79c7e73618
rpc: authorize raft requests (#10925) 2021-08-26 15:04:32 -07:00
Chris S. Kim 1a9b2f09dd
ent->oss test fix (#10926) 2021-08-26 14:06:49 -04:00
Chris S. Kim 45dcc8b553
api: expose upstream routing configurations in topology view (#10811)
Some users are defining routing configurations that do not have associated services. This commit surfaces these configs in the topology visualization. Also fixes a minor internal bug with non-transparent proxy upstream/downstream references.
2021-08-25 15:20:32 -04:00
R.B. Boyer a6d22efb49
acl: some acl authz refactors for nodes (#10909) 2021-08-25 13:43:11 -05:00
R.B. Boyer 5b6d96d27d
grpc: ensure that streaming gRPC requests work over mesh gateway based wan federation (#10838)
Fixes #10796
2021-08-24 16:28:44 -05:00
Giulio Micheloni 655da1fc42
Merge branch 'main' into serve-panic-recovery 2021-08-22 20:31:11 +02:00
Giulio Micheloni 4b0eaa4bff grpc, xds: recovery middleware to return and log error in case of panic
1) xds and grpc servers:
   1.1) to use recovery middleware with callback that prints stack trace to log
   1.2) callback turn the panic into a core.Internal error
2) added unit test for grpc server
2021-08-22 19:06:26 +01:00
R.B. Boyer d66a43f5f2
fixing various bits of enterprise meta plumbing to be more correct (#10889) 2021-08-20 14:34:23 -05:00
Dhia Ayachi 1950ebbe1f
oss portion of ent #1069 (#10883) 2021-08-20 12:57:45 -04:00
R.B. Boyer ac41e30614
state: partition the nodes.uuid and nodes.meta indexes as well (#10882) 2021-08-19 16:17:59 -05:00
R.B. Boyer 097e1645e3
agent: ensure that most agent behavior correctly respects partition configuration (#10880) 2021-08-19 15:09:42 -05:00
R.B. Boyer e44bce3c4f
state: partition the usage metrics subsystem (#10867) 2021-08-18 09:27:15 -05:00
R.B. Boyer 613dd7d053
state: adjust streaming event generation to account for partitioned nodes (#10860)
Also re-enabled some tests that had to be disabled in the prior PR.
2021-08-17 16:49:26 -05:00
R.B. Boyer 310e775a8a
state: partition nodes and coordinates in the state store (#10859)
Additionally:

- partitioned the catalog indexes appropriately for partitioning
- removed a stray reference to a non-existent index named "node.checks"
2021-08-17 13:29:39 -05:00
Daniel Nephin 01bf115c2b acl: small improvements to ACLResolver disable due to RPC error
Remove the error return, so that not handling is not reported as an
error by errcheck. It was returning the error passed as an arg
unmodified so there is no reason to return the same value that was
passed in.

Remove the term upstreams to remove any confusion with the term used in
service mesh.

Remove the AutoDisable field, and replace it with the TTL value, using 0
to indicate the setting is turned off.

Replace "not Before" with "After".

Add some test coverage to show the behaviour is still correct.
2021-08-17 13:34:18 -04:00
Daniel Nephin d5498770fa acl: make ACLDisabledTTL a constant
This field was never user-configurable. We always overwrote the value with 120s from
NonUserSource. However, we also never copied the value from RuntimeConfig to consul.Config,
So the value in NonUserSource was always ignored, and we used the default value of 30s
set by consul.DefaultConfig.

All of this code is an unnecessary distraction because a user can not actually configure
this value.

This commit removes the fields and uses a constant value instad. Someone attempting to set
acl.disabled_ttl in their config will now get an error about an unknown field, but previously
the value was completely ignored, so the new behaviour seems more correct.

We have to keep this field in the AutoConfig response for backwards compatibility, but the value
will be ignored by the client, so it doesn't really matter what value we set.
2021-08-17 13:34:18 -04:00
Daniel Nephin abd2e160f9 Fix test failures
Tests only specified one of the fields, but in production we copy the
value from a single place, so we can do the same in tests.

The AutoConfig test broke because of the problem noticed in a previous
commit. The DisabledTTL is not wired up properly so it reports 0s here.
Changed the test to use an explicit value.
2021-08-17 13:32:52 -04:00
Daniel Nephin 17841248dd config: remove ACLResolver settings from RuntimeConfig 2021-08-17 13:32:52 -04:00
Daniel Nephin 31e034215f acl: remove ACLResolver config fields from consul.Config 2021-08-17 13:32:52 -04:00
Daniel Nephin d4701903f6 acl: replace ACLResolver.Config with its own struct
This is step toward decoupling ACLResolver from the agent/consul
package.
2021-08-17 13:32:52 -04:00
Daniel Nephin b7bced9bcf acl: remove legacy bootstrap
Return an explicit error from the RPC, and remove the flag from the HTTP API.
2021-08-17 13:10:00 -04:00
Daniel Nephin 9671dd6b97 acl: add some notes about removing legacy ACL system 2021-08-17 13:08:29 -04:00
Daniel Nephin 887d11923b
Merge pull request #10792 from hashicorp/dnephin/rename-authz-vars
acl: use authz consistently as the variable name for an acl.Authorizer
2021-08-17 13:07:17 -04:00
Daniel Nephin c85c62dffb
Merge pull request #10807 from hashicorp/dnephin/remove-acl-datacenter
config: remove ACLDatacenter
2021-08-17 13:07:09 -04:00
Daniel Nephin e637cd71f3 acl: use authz consistently as the variable name for an acl.Authorizer
Follow up to https://github.com/hashicorp/consul/pull/10737#discussion_r682147950

Renames all variables for acl.Authorizer to use `authz`. Previously some
places used `rule` which I believe was an old name carried over from the
legacy ACL system.

A couple places also used authorizer.

This commit also removes another couple of authorizer nil checks that
are no longer necessary.
2021-08-17 12:14:10 -04:00
Daniel Nephin 67fc97522f server: remove defaulting of PrimaryDatacenter
The constructor for Server is not at all the appropriate place to be setting default
values for a config struct that was passed in.

In production this value is always set from agent/config. In tests we should set the
default in a test helper.
2021-08-06 18:45:24 -04:00
Daniel Nephin d3325b0253
Merge pull request #10612 from bigmikes/acl-replication-fix
acl: acl replication routine to report the last error message
2021-08-06 18:29:51 -04:00
Daniel Nephin 7160f7a614 acl: remove ACLDatacenter
This field has been unnecessary for a while now. It was always set to the same value
as PrimaryDatacenter. So we can remove the duplicate field and use PrimaryDatacenter
directly.

This change was made by GoLand refactor, which did most of the work for me.
2021-08-06 18:27:00 -04:00
Giulio Micheloni d4a3fe33e8 String type instead of error type and changelog. 2021-08-06 22:35:27 +01:00
Daniel Nephin b837ba35a0 acl: remove Server.ResolveTokenIdentityAndDefaultMeta
This method suffered from similar naming to a couple other methods on Server, and had not great
re-use (2 callers). By copying a few of the lines into one of the callers we can move the
implementation into the second caller.

Once moved, we can see that ResolveTokenAndDefaultMeta is identical in both Client and Server, and
likely should be further refactored, possibly into ACLResolver.

This change is being made to make ACL resolution easier to trace.
2021-08-05 15:20:13 -04:00
Daniel Nephin 6cf6e7c5fe acl: remove Server.ResolveTokenToIdentityAndAuthorizer
This method was an alias for ACLResolver.ResolveTokenToIdentityAndAuthorizer. By removing the
method that does nothing the code becomes easier to trace.
2021-08-05 15:20:13 -04:00
Daniel Nephin cc4f155801 acl: recouple acl filtering from ACLResolver
ACL filtering only needs an authorizer and a logger. We can decouple filtering from
the ACLResolver by passing in the necessary logger.

This change is being made in preparation for moving the ACLResolver into an acl package
2021-08-05 15:20:13 -04:00
Daniel Nephin 111f3620a8 acl: remove unused error return
filterACLWithAuthorizer could never return an error. This change moves us a little bit
closer to being able to enable errcheck and catch problems caused by unhandled error
return values.
2021-08-05 15:20:13 -04:00
Daniel Nephin c0100543d0 acl: rename acl.Authorizer vars to authz
For consistency
2021-08-05 15:19:47 -04:00
Daniel Nephin 4f5477ccfa acl: move vet functions
These functions are moved to the one place they are called to improve code locality.

They are being moved out of agent/consul/acl.go in preparation for moving
ACLResolver to an acl package.
2021-08-05 15:19:24 -04:00
Daniel Nephin c4eadb6b96 acl: move vetRegisterWithACL and vetDeregisterWithACL
These functions are used in only one place. Move the functions next to their one caller
to improve code locality.

This change is being made in preparation for moving the ACLResolver into an
acl package. The moved functions were previously in the same file as the ACLResolver.
By moving them out of that file we may be able to move the entire file
with fewer modifications.
2021-08-05 15:17:54 -04:00
Daniel Nephin 1deba421c7
Merge pull request #10770 from hashicorp/dnephin/log-cert-expiration
telemetry: add log message when certs are about to expire
2021-08-05 15:17:20 -04:00
Daniel Nephin 0c42b38c92
Merge pull request #10793 from hashicorp/dnephin/acl-intentions
acl: small cleanup of a couple Authorization flows
2021-08-05 15:16:49 -04:00
Dhia Ayachi b495036823
defer setting the state before returning to avoid stuck in `INITIALIZING` state (#10630)
* defer setting the state before returning to avoid being stuck in `INITIALIZING` state

* add changelog

* move comment with the right if statement

* ca: report state transition error from setSTate

* update comment to reflect state transition

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
2021-08-05 14:51:19 -04:00
Daniel Nephin e94016872a
Merge pull request #10768 from hashicorp/dnephin/agent-tls-cert-expiration-metric
telemetry: add Agent TLS Certificate expiration metric
2021-08-04 18:42:02 -04:00
Daniel Nephin c718de730b acl: remove special handling of services in txn_endpoint
Follow up to: https://github.com/hashicorp/consul/pull/10738#discussion_r680190210

Previously we were passing an Authorizer that would always allow the
operation, then later checking the authorization using vetServiceTxnOp.

On the surface this seemed strange, but I think it was actually masking
a bug as well. Over time `servicePreApply` was changed to add additional
authorization for `service.Proxy.DestinationServiceName`, but because
we were passing a nil Authorizer, that authorization was not handled on
the txn_endpoint.

`TxnServiceOp.FillAuthzContext` has some special handling in enterprise,
so we need to make sure to continue to use that from the Txn endpoint.

This commit removes the `vetServiceTxnOp` function, and passes in the
`FillAuthzContext` function so that `servicePreApply` can be used by
both the catalog and txn endpoints. This should be much less error prone
and prevent bugs like this in the future.
2021-08-04 18:32:20 -04:00
Daniel Nephin bbce192b4d
Merge pull request #10738 from hashicorp/dnephin/remove-authorizer-nil-checks-2
acl: remove the last of the authz == nil checks
2021-08-04 17:41:40 -04:00
Daniel Nephin 9cdd823ffc
Merge pull request #10737 from hashicorp/dnephin/remove-authorizer-nil-checks
acl: remove authz == nil checks
2021-08-04 17:39:34 -04:00
Daniel Nephin 0e58e1ac4b telemetry: add log message when certs are about to expire 2021-08-04 14:18:59 -04:00
Daniel Nephin 9420506fae telemetry: fix a couple bugs in cert expiry metrics
1. do not emit the metric if Query fails
2. properly check for PrimaryUsersIntermediate, the logic was inverted

Also improve the logging by including the metric name in the log message
2021-08-04 13:51:44 -04:00
Daniel Nephin 8c575445da telemetry: add a metric for agent TLS cert expiry 2021-08-04 13:51:44 -04:00
Dhia Ayachi cfa9cf6d84
fix state index for `CAOpSetRootsAndConfig` op (#10675)
* fix state index for `CAOpSetRootsAndConfig` op

* add changelog

* Update changelog

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>

* remove the change log as it's not needed

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
2021-08-04 13:07:49 -04:00
Daniel Nephin 8cf1aa1bda acl: Remove the remaining authz == nil checks
These checks were a bit more involved. They were previously skipping some code paths
when the authorizer was nil. After looking through these it seems correct to remove the
authz == nil check, since it will never evaluate to true.
2021-07-30 14:55:35 -04:00
Daniel Nephin dc50b36b0f acl: remove acl == nil checks 2021-07-30 14:28:19 -04:00
Daniel Nephin 4f1a36629a acl: remove authz == nil checks
These case are already impossible conditions, because most of these functions already start
with a check for ACLs being disabled. So the code path being removed could never be reached.

The one other case (ConnectAuthorized) was already changed in a previous commit. This commit
removes an impossible branch because authz == nil can never be true.
2021-07-30 13:58:35 -04:00
Daniel Nephin f497d5ab30 acl: remove many instances of authz == nil 2021-07-30 13:58:35 -04:00
Daniel Nephin 9dd6d26d05 acl: remove rule == nil checks 2021-07-30 13:58:35 -04:00
Daniel Nephin 84fac3ce0e acl: use acl.ManangeAll when ACLs are disabled
Instead of returning nil and checking for nilness

Removes a bunch of nil checks, and fixes one test failures.
2021-07-30 12:58:24 -04:00
Freddy ff9700b068
Reset root prune interval after TestLeader_CARootPruning completes
#10645

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
2021-07-26 15:43:40 -06:00
R.B. Boyer 3343c7cb3a
state: refactor some node/coordinate state store functions to take an EnterpriseMeta (#10687)
Note the field is not used yet.
2021-07-23 13:42:23 -05:00
R.B. Boyer fc9b1a277d
sync changes to oss files made in enterprise (#10670) 2021-07-22 13:58:08 -05:00
R.B. Boyer 188e8dc51f
agent/structs: add a bunch more EnterpriseMeta helper functions to help with partitioning (#10669) 2021-07-22 13:20:45 -05:00
Dhia Ayachi c6859b3fb0
config raft apply silent error (#10657)
* return an error when the index is not valid

* check response as bool when applying `CAOpSetConfig`

* remove check for bool response

* fix error message and add check to test

* fix comment

* add changelog
2021-07-22 10:32:27 -04:00
Daniel Nephin a77575e93e acl: use SetHash consistently in testPolicyForID
A previous commit used SetHash on two of the cases to fix a data race. This commit applies
that change to all cases. Using SetHash in this test helper should ensure that the
test helper behaves closer to production.
2021-07-16 17:59:56 -04:00
Giulio Micheloni 814ef6b103 acl: fix error type into a string type for serialization issue
acl_endpoint_test.go:507:
        	Error Trace:	acl_endpoint_test.go:507
        	            				retry.go:148
        	            				retry.go:149
        	            				retry.go:103
        	            				acl_endpoint_test.go:504
        	Error:      	Received unexpected error:
        	            	codec.decoder: decodeValue: Cannot decode non-nil codec value into nil error (1 methods)
        	Test:       	TestACLEndpoint_ReplicationStatus
2021-07-15 11:31:44 +02:00
Daniel Nephin fa47c04065 Fix a data race in TestACLResolver_Client
By setting the hash when we create the policy.

```
WARNING: DATA RACE
Read at 0x00c0028b4b10 by goroutine 1182:
  github.com/hashicorp/consul/agent/structs.(*ACLPolicy).SetHash()
      /home/daniel/pers/code/consul/agent/structs/acl.go:701 +0x40d
  github.com/hashicorp/consul/agent/structs.ACLPolicies.resolveWithCache()
      /home/daniel/pers/code/consul/agent/structs/acl.go:779 +0xfe
  github.com/hashicorp/consul/agent/structs.ACLPolicies.Compile()
      /home/daniel/pers/code/consul/agent/structs/acl.go:809 +0xf1
  github.com/hashicorp/consul/agent/consul.(*ACLResolver).ResolveTokenToIdentityAndAuthorizer()
      /home/daniel/pers/code/consul/agent/consul/acl.go:1226 +0x6ef
  github.com/hashicorp/consul/agent/consul.resolveTokenAsync()
      /home/daniel/pers/code/consul/agent/consul/acl_test.go:66 +0x5c

Previous write at 0x00c0028b4b10 by goroutine 1509:
  github.com/hashicorp/consul/agent/structs.(*ACLPolicy).SetHash()
      /home/daniel/pers/code/consul/agent/structs/acl.go:730 +0x3a8
  github.com/hashicorp/consul/agent/structs.ACLPolicies.resolveWithCache()
      /home/daniel/pers/code/consul/agent/structs/acl.go:779 +0xfe
  github.com/hashicorp/consul/agent/structs.ACLPolicies.Compile()
      /home/daniel/pers/code/consul/agent/structs/acl.go:809 +0xf1
  github.com/hashicorp/consul/agent/consul.(*ACLResolver).ResolveTokenToIdentityAndAuthorizer()
      /home/daniel/pers/code/consul/agent/consul/acl.go:1226 +0x6ef
  github.com/hashicorp/consul/agent/consul.resolveTokenAsync()
      /home/daniel/pers/code/consul/agent/consul/acl_test.go:66 +0x5c

Goroutine 1182 (running) created at:
  github.com/hashicorp/consul/agent/consul.TestACLResolver_Client.func4()
      /home/daniel/pers/code/consul/agent/consul/acl_test.go:1669 +0x459
  testing.tRunner()
      /usr/lib/go/src/testing/testing.go:1193 +0x202

Goroutine 1509 (running) created at:
  github.com/hashicorp/consul/agent/consul.TestACLResolver_Client.func4()
      /home/daniel/pers/code/consul/agent/consul/acl_test.go:1668 +0x415
  testing.tRunner()
      /usr/lib/go/src/testing/testing.go:1193 +0x202
```
2021-07-14 18:58:16 -04:00
Daniel Nephin baa2b8628e consul: fix data race in leader CA tests
Some global variables are patched to shorter values in these tests. But the goroutines that read
them can outlive the test because nothing waited for them to exit.

This commit adds a Wait() method to the routine manager, so that tests can wait for the goroutines
to exit. This prevents the data race because the 'reset to original value' can happen
after all other goroutines have stopped.
2021-07-14 18:58:15 -04:00
Giulio Micheloni 529fe737ef acl: acl replication routine to report the last error message 2021-07-14 11:50:23 +02:00
Dhia Ayachi 58bd817336
check expiry date of the root/intermediate before using it to sign a leaf (#10500)
* ca: move provider creation into CAManager

This further decouples the CAManager from Server. It reduces the interface between them and
removes the need for the SetLogger method on providers.

* ca: move SignCertificate to CAManager

To reduce the scope of Server, and keep all the CA logic together

* ca: move SignCertificate to the file where it is used

* auto-config: move autoConfigBackend impl off of Server

Most of these methods are used exclusively for the AutoConfig RPC
endpoint. This PR uses a pattern that we've used in other places as an
incremental step to reducing the scope of Server.

* fix linter issues

* check error when `raftApplyMsgpack`

* ca: move SignCertificate to CAManager

To reduce the scope of Server, and keep all the CA logic together

* check expiry date of the intermediate before using it to sign a leaf

* fix typo in comment

Co-authored-by: Kyle Havlovitz <kylehav@gmail.com>

* Fix test name

* do not check cert start date

* wrap error to mention it is the intermediate expired

* Fix failing test

* update comment

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>

* use shim to avoid sleep in test

* add root cert validation

* remove duplicate code

* Revert "fix linter issues"

This reverts commit 6356302b54f06c8f2dee8e59740409d49e84ef24.

* fix import issue

* gofmt leader_connect_ca

* add changelog entry

* update error message

Co-authored-by: Freddy <freddygv@users.noreply.github.com>

* fix error message in test

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Co-authored-by: Kyle Havlovitz <kylehav@gmail.com>
Co-authored-by: Freddy <freddygv@users.noreply.github.com>
2021-07-13 12:15:06 -04:00
R.B. Boyer 6c47efd532
connect/ca: ensure edits to the key type/bits for the connect builtin CA will regenerate the roots (#10330)
progress on #9572
2021-07-13 11:12:07 -05:00
R.B. Boyer 7bf9ea55cf
connect/ca: require new vault mount points when updating the key type/bits for the vault connect CA provider (#10331)
progress on #9572
2021-07-13 11:11:46 -05:00
Daniel Nephin 0ccad1d6f7
Merge pull request #10479 from hashicorp/dnephin/ca-provider-explore-2
ca: move Server.SignIntermediate to CAManager
2021-07-12 19:03:43 -04:00
Daniel Nephin b89ec826c8
Merge pull request #10445 from hashicorp/dnephin/ca-provider-explore
ca: isolate more of the CA logic in CAManager
2021-07-12 15:26:23 -04:00
Daniel Nephin bf292cbae4 ca: use provider constructors to be more consistent
Adds a contructor for the one provider that did not have one.
2021-07-12 14:04:34 -04:00
Dhia Ayachi 5ed56fc786 check error when `raftApplyMsgpack` 2021-07-12 13:42:51 -04:00
Daniel Nephin 3091026e02 auto-config: move autoConfigBackend impl off of Server
Most of these methods are used exclusively for the AutoConfig RPC
endpoint. This PR uses a pattern that we've used in other places as an
incremental step to reducing the scope of Server.
2021-07-12 13:42:40 -04:00
Daniel Nephin 0512cb2813 ca: move SignCertificate to the file where it is used 2021-07-12 13:42:39 -04:00
Daniel Nephin dfebfe508e ca: move SignCertificate to CAManager
To reduce the scope of Server, and keep all the CA logic together
2021-07-12 13:42:39 -04:00
Dhia Ayachi 047537833d add missing state reset when stopping ca manager 2021-07-12 09:32:36 -04:00
Daniel Nephin 6228c4a53c ca: fix mockCAServerDelegate to work with the new interface
raftApply was removed so ApplyCARequest needs to handle all the possible operations

Also set the providerShim to use the mock provider.

other changes are small test improvements that were necessary to debug the failures.
2021-07-12 09:32:36 -04:00
Daniel Nephin 7dae65cd56 ca: remove unused method
and small refactor to getCAProvider so that GoLand is less confused about what it is doing.
Previously it was reporting that the for condition was always true, which was not the case.
2021-07-12 09:32:35 -04:00
Daniel Nephin 1960c717af ca: remove raftApply from delegate interface
After moving ca.ConsulProviderStateDelegate into the interface we now
have the ApplyCARequest method which does the same thing. Use this more
specific method instead of raftApply.
2021-07-12 09:32:35 -04:00
Daniel Nephin 1f4cdde9cc ca: move generateCASignRequest to the delegate
This method on Server was only used by the caDelegateWithState, so move it there
until we can move it entirely into CAManager.
2021-07-12 09:32:35 -04:00
Daniel Nephin fc14f5ab14 ca: move provider creation into CAManager
This further decouples the CAManager from Server. It reduces the interface between them and
removes the need for the SetLogger method on providers.
2021-07-12 09:32:33 -04:00
Daniel Nephin b1877660d5 ca-manager: move provider shutdown into CAManager
Reducing the coupling between Server and CAManager
2021-07-12 09:27:28 -04:00
Daniel Nephin be8c675942 config: remove misleading UseTLS field
This field was documented as enabling TLS for outgoing RPC, but that was not the case.
All this field did was set the use_tls serf tag.

Instead of setting this field in a place far from where it is used, move the logic to where
the serf tag is set, so that the code is much more obvious.
2021-07-09 19:01:45 -04:00
Daniel Nephin 70770db345 config: remove duplicate TLSConfig fields from agent/consul.Config
tlsutil.Config already presents an excellent structure for this
configuration. Copying the runtime config fields to agent/consul.Config
makes code harder to trace, and provides no advantage.

Instead of copying the fields around, use the tlsutil.Config struct
directly instead.

This is one small step in removing the many layers of duplicate
configuration.
2021-07-09 18:49:42 -04:00
Evan Culver 13bd86527b
Add support for returning ACL secret IDs for accessors with acl:write (#10546) 2021-07-08 15:13:08 -07:00
Dhia Ayachi 6390e91be5
Add ca certificate metrics (#10504)
* add intermediate ca metric routine

* add Gauge config for intermediate cert

* Stop metrics routine when stopping leader

* add changelog entry

* updage changelog

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>

* use variables instead of a map

* go imports sort

* Add metrics for primary and secondary ca

* start metrics routine in the right DC

* add telemetry documentation

* update docs

* extract expiry fetching in a func

* merge metrics for primary and secondary into signing ca metric

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
2021-07-07 09:41:01 -04:00
Daniel Nephin 2c4f22a9f0
Merge pull request #10552 from hashicorp/dnephin/ca-remove-rotation-period
ca: remove unused RotationPeriod field
2021-07-06 18:49:33 -04:00
jkirschner-hashicorp 5f73de6fbc
Merge pull request #10560 from jkirschner-hashicorp/change-sane-to-reasonable
Replace use of 'sane' where appropriate
2021-07-06 11:46:04 -04:00
Daniel Nephin 3a045cca8d ca: remove unused RotationPeriod field
This field was never used. Since it is persisted as part of a map[string]interface{} it
is pretty easy to remove it.
2021-07-05 19:15:44 -04:00
Jared Kirschner bd536151e1 Replace use of 'sane' where appropriate
HashiCorp voice, style, and language guidelines recommend avoiding ableist
language unless its reference to ability is accurate in a particular use.
2021-07-02 12:18:46 -04:00
Dhia Ayachi 9b45107c1e
Format certificates properly (rfc7468) with a trailing new line (#10411)
* trim carriage return from certificates when inserting rootCA in the inMemDB

* format rootCA properly when returning the CA on the connect CA endpoint

* Fix linter warnings

* Fix providers to trim certs before returning it

* trim newlines on write when possible

* add changelog

* make sure all provider return a trailing newline after the root and intermediate certs

* Fix endpoint to return trailing new line

* Fix failing test with vault provider

* make test more robust

* make sure all provider return a trailing newline after the leaf certs

* Check for suffix before removing newline and use function

* Add comment to consul provider

* Update change log

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* fix typo

* simplify code callflow

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* extract requireNewLine as shared func

* remove dependency to testify in testing file

* remove extra newline in vault provider

* Add cert newline fix to envoy xds

* remove new line from mock provider

* Remove adding a new line from provider and fix it when the cert is read

* Add a comment to explain the fix

* Add missing for leaf certs

* fix missing new line

* fix missing new line in leaf certs

* remove extra new line in test

* updage changelog

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>

* fix in vault provider and when reading cache (RPC call)

* fix AWS provider

* fix failing test in the provider

* remove comments and empty lines

* add check for empty cert in test

* fix linter warnings

* add new line for leaf and private key

* use string concat instead of Sprintf

* fix new lines for leaf signing

* preallocate slice and remove append

* Add new line to `SignIntermediate` and `CrossSignCA`

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
2021-06-30 20:48:29 -04:00
Daniel Nephin 7531a6681d docs: correct some misleading telemetry docs
The query metrics are actually reported for all read queries, not only
ones that use a MinIndex to block for updates.

Also clarify the raft.apply metric is only on the leader.
2021-06-28 12:20:53 -04:00
R.B. Boyer a2876453a5
connect/ca: cease including the common name field in generated certs (#10424)
As part of this change, we ensure that the SAN extensions are marked as
critical when the subject is empty so that AWS PCA tolerates the loss of
common names well and continues to function as a Connect CA provider.

Parts of this currently hack around a bug in crypto/x509 and can be
removed after https://go-review.googlesource.com/c/go/+/329129 lands in
a Go release.

Note: the AWS PCA tests do not run automatically, but the following
passed locally for me:

    ENABLE_AWS_PCA_TESTS=1 go test ./agent/connect/ca -run TestAWS
2021-06-25 13:00:00 -05:00
Daniel Nephin f52d76f096 ca: replace ca.PrimaryIntermediateProviders
With an optional interface that providers can use to indicate if they
use an intermediate cert in the primary DC.

This removes the need to look up the provider config when renewing the
intermediate.
2021-06-23 15:47:30 -04:00
Daniel Nephin d81f527be8
Merge pull request #9924 from hashicorp/dnephin/cert-expiration-metric
connect: emit a metric for the seconds until root CA expiry
2021-06-18 14:18:55 -04:00
Daniel Nephin aec7e798b0 Update metric name
and handle the case where there is no active root CA.
2021-06-14 17:01:16 -04:00
Daniel Nephin 1c980e4700 connect: emit a metric for the number of seconds until root CA expiration 2021-06-14 16:57:01 -04:00
Freddy ffb13f35f1
Rename CatalogDestinationsOnly (#10397)
CatalogDestinationsOnly is a passthrough that would enable dialing
addresses outside of Consul's catalog. However, when this flag is set to
true only _connect_ endpoints for services can be dialed.

This flag is being renamed to signal that non-Connect endpoints can't be
dialed by transparent proxies when the value is set to true.
2021-06-14 14:15:09 -06:00
Freddy 429f9d8bb8
Add flag for transparent proxies to dial individual instances (#10329) 2021-06-09 14:34:17 -06:00
Daniel Nephin eb0c0d7740 stream: remove bufferItem.NextLink
Both NextLink and NextNoBlock had the same logic, with slightly
different return values. By adding a bool return value (similar to map
lookups) we can remove the duplicate method.
2021-06-07 17:04:46 -04:00
Daniel Nephin 5ef8a045f3 stream: fix a bug with creating a snapshot
The head of the topic buffer was being ignored when creating a snapshot. This commit fixes
the bug by ensuring that the head of the topic buffer is included in the snapshot
before handing it off to the subscription.
2021-06-04 18:33:04 -04:00
Paul Ewing 42a51b1a2c
usagemetrics: add cluster members to metrics API (#10340)
This PR adds cluster members to the metrics API. The number of members per
segment are reported as well as the total number of members.

Tested by running a multi-node cluster locally and ensuring the numbers were
correct. Also added unit test coverage to add the new expected gauges to
existing test cases.
2021-06-03 08:25:53 -07:00
Daniel Nephin 29e93f6338 grpc: fix a data race by using a static resolver
We have seen test flakes caused by 'concurrent map read and map write', and the race detector
reports the problem as well (prevent us from running some tests with -race).

The root of the problem is the grpc expects resolvers to be registered at init time
before any requests are made, but we were using a separate resolver for each test.

This commit introduces a resolver registry. The registry is registered as the single
resolver for the consul scheme. Each test uses the Authority section of the target
(instead of the scheme) to identify the resolver that should be used for the test.
The scheme is used for lookup, which is why it can no longer be used as the unique
key.

This allows us to use a lock around the map of resolvers, preventing the data race.
2021-06-02 11:35:38 -04:00
Dhia Ayachi f785c5b332
RPC Timeout/Retries account for blocking requests (#8978) 2021-05-27 17:29:43 -04:00
Matt Keeler da31e0449e Move some things around to allow for license updating via config reload
The bulk of this commit is moving the LeaderRoutineManager from the agent/consul package into its own package: lib/gort. It also got a renaming and its Start method now requires a context. Requiring that context required updating a whole bunch of other places in the code.
2021-05-25 09:57:50 -04:00
Matt Keeler caafc02449 hcs-1936: Prepare for adding license auto-retrieval to auto-config in enterprise 2021-05-24 13:20:30 -04:00
Matt Keeler 234d0a3c2a Preparation for changing where license management is done. 2021-05-24 10:19:31 -04:00
Daniel Nephin ed37a2dbc2 Refactor of serf feature flag tags.
This refactor is to make it easier to see how serf feature flags are
encoded as serf tags, and where those feature flags are read.

- use constants for both the prefix and feature flag name. A constant
  makes it much easier for an IDE to locate the read and write location.
- isolate the feature-flag encoding logic in the metadata package, so
  that the feature flag prefix can be unexported. Only expose a function
  for encoding the flags into tags. This logic is now next to the logic
  which reads the tags.
- remove the duplicate `addEnterpriseSerfTags` functions. Both Client
  and Server structs had the same implementation. And neither
  implementation needed the method receiver.
2021-05-20 12:57:06 -04:00
R.B. Boyer 597448da47
server: ensure that central service config flattening properly resets the state each time (#10239)
The prior solution to call reply.Reset() aged poorly since newer fields
were added to the reply, but not added to Reset() leading serial
blocking query loops on the server to blend replies.

This could manifest as a service-defaults protocol change from
default=>http not reverting back to default after the config entry
reponsible was deleted.
2021-05-14 10:21:44 -05:00
Daniel Nephin 9c8b0b451f testing: don't run t.Parallel in a goroutine
TestACLEndpoint_Login_with_TokenLocality was reguardly being reported as failed even though
it was not failing. I took another look and I suspect it is because t.Parllel was being
called in a goroutine.

This would lead to strange behaviour which apparently confused the 'go test' runner.
2021-05-10 13:30:10 -04:00
Daniel Nephin d19137a429
Merge pull request #10075 from hashicorp/dnephin/handle-raft-apply-errors
rpc: some cleanup of canRetry and ForwardRPC
2021-05-06 16:59:53 -04:00
Daniel Nephin 45c5ba46f3 state: reduce arguments to validateProposedConfigEntryInServiceGraph 2021-05-06 13:47:40 -04:00
Daniel Nephin 6b513c1ba4 rpc: add tests for canRetry
Also accept an RPCInfo instead of interface{}. Accepting an interface
lead to a bug where the caller was expecting the arg to be the response
when in fact it was always passed the request. By accepting RPCInfo
it should indicate that this is actually the request value.

One caller of canRetry already passed an RPCInfo, the second handles
the type assertion before calling canRetry.
2021-05-06 13:30:07 -04:00
Daniel Nephin 5a6f15713c rpc: remove unnecessary arg to ForwardRPC 2021-05-06 13:30:07 -04:00
Daniel Nephin 347f3d2128
Merge pull request #10155 from hashicorp/dnephin/config-entry-remove-fields
config-entry: remove Kind and Name field from Mesh config entry
2021-05-04 17:27:56 -04:00
Daniel Nephin c8c85523e1 config-entries: add a test for the API client
Also fixes a bug with listing kind=mesh config entries. ValidateConfigEntryKind was only being used by
the List endpoint, and was yet another place where we have to enumerate all the kinds.

This commit removes ValidateConfigEntryKind and uses MakeConfigEntry instead. This change removes
the need to maintain two separate functions at the cost of creating an instance of the config entry which will be thrown away immediately.
2021-05-04 17:14:21 -04:00
Daniel Nephin 6713afdff3 lint: fix warning by removing reference to deprecated interface 2021-05-04 14:09:14 -04:00
Paul Banks 3ad754ca7b
Make Raft trailing logs and snapshot timing reloadable (#10129)
* WIP reloadable raft config

* Pre-define new raft gauges

* Update go-metrics to change gauge reset behaviour

* Update raft to pull in new metric and reloadable config

* Add snapshot persistance timing and installSnapshot to our 'protected' list as they can be infrequent but are important

* Update telemetry docs

* Update config and telemetry docs

* Add note to oldestLogAge on when it is visible

* Add changelog entry

* Update website/content/docs/agent/options.mdx

Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>

Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>
2021-05-04 15:36:53 +01:00
Luke Kysow 8d6cbe7281
Give descriptive error if auth method not found (#10163)
* Give descriptive error if auth method not found

Previously during a `consul login -method=blah`, if the auth method was not found, the
error returned would be "ACL not found". This is potentially confusing
because there may be many different ACLs involved in a login: the ACL of
the Consul client, perhaps the binding rule or the auth method.

Now the error will be "auth method blah not found", which is much easier
to debug.
2021-05-03 13:39:13 -07:00
Daniel Nephin 62efaaab21 config-entry: remove Kind and Name field from Mesh config entry
No config entry needs a Kind field. It is only used to determine the Go type to
target. As we introduce new config entries (like this one) we can remove the kind field
and have the GetKind method return the single supported value.

In this case (similar to proxy-defaults) the Name field is also unnecessary. We always
use the same value. So we can omit the name field entirely.
2021-04-29 17:11:21 -04:00
Freddy 078c40425f
Rename "cluster" config entry to "mesh" (#10127)
This config entry is being renamed primarily because in k8s the name
cluster could be confusing given that the config entry applies across
federated datacenters.

Additionally, this config entry will only apply to Consul as a service
mesh, so the more generic "cluster" name is not needed.
2021-04-28 16:13:29 -06:00
Daniel Nephin e229b877d8 health: create health.Client in Agent.New 2021-04-27 19:03:16 -04:00
Matt Keeler 65d73771a5
Add prometheus guage definitions for replication metrics. (#10109) 2021-04-23 17:05:33 -04:00
Matt Keeler ecbccdc261
Add replication metrics (#10073) 2021-04-22 11:20:53 -04:00