Commit Graph

1279 Commits

Author SHA1 Message Date
Daniel Nephin 1c4e0cfa2a
Merge pull request #9718 from hashicorp/oss/dnephin/ent-meta-in-state-store-3
state: convert all table name constants to the new prefix pattern
2021-02-05 14:02:07 -05:00
Daniel Nephin 0814f22715
Merge pull request #9665 from hashicorp/dnephin/state-store-indexes-2
state: move config-entries table definition to config_entries_schema.go
2021-02-05 14:01:08 -05:00
Daniel Nephin 912dbb4cb4
Merge pull request #9664 from hashicorp/dnephin/state-store-indexes
state: move ACL schema and index definitions to acl_schema.go
2021-02-05 13:38:31 -05:00
Daniel Nephin 05d5ec4804 state: remove the need for registerSchema
registerSchema creates some indirection which is not necessary in this
case. newDBSchema can call each of the tables.

Enterprise tables can be added from the existing withEnterpriseSchema
shim.
2021-02-05 12:19:56 -05:00
Daniel Nephin 2cbf8b5fd0 state: rename table name constants to use pattern
the 'table' prefix is shorter, and also reads better in queries.
2021-02-05 12:12:19 -05:00
Daniel Nephin 8ac9d54ccc state: rename connect constants 2021-02-05 12:12:19 -05:00
Daniel Nephin 0c34e474c5 state: rename table name constants to new pattern
Using Apps Hungarian Notation for these constants makes the memdb queries more readable.
2021-02-05 12:12:18 -05:00
Pierre Souchay 7a024ed074 Streaming filter tags + case insensitive lookups for Service Names
Will fix:
 * https://github.com/hashicorp/consul/issues/9695
 * https://github.com/hashicorp/consul/issues/9702
2021-02-04 11:00:51 +01:00
Daniel Nephin 2d5b5afec1 state: Remove unnecessary entMeta arg to EnsureConfigEntry 2021-02-03 18:10:38 -05:00
Kyle Havlovitz 7dac583863 connect/ca: Allow ForceWithoutCrossSigning for all providers
This allows setting ForceWithoutCrossSigning when reconfiguring the CA
for any provider, in order to forcibly move to a new root in cases where
the old provider isn't reachable or able to cross-sign for whatever
reason.
2021-01-29 13:38:11 -08:00
Daniel Nephin 5b4703f0e4 state: rename config-entries table const to match new pattern 2021-01-28 20:34:34 -05:00
Daniel Nephin cd06b5728c state: move config-entries table to new pattern 2021-01-28 20:34:15 -05:00
Daniel Nephin e8931b868c state: use indexID
this change was already made to enterprise, so backporting it.
2021-01-28 20:30:08 -05:00
Daniel Nephin 1cccdc45c2 state: Move ACL schema indexes to match Ent
and use constants for table and index names.
2021-01-28 20:05:09 -05:00
Matt Keeler f561462064
Upgrade raft-autopilot and wait for autopilot it to stop when revoking leadership (#9644)
Fixes: 9626
2021-01-27 11:14:52 -05:00
Hans Hasselberg 444cdeb8fb
Add flags to support CA generation for Connect (#9585) 2021-01-27 08:52:15 +01:00
R.B. Boyer c608dc0d60
server: initialize mgw-wanfed to use local gateways more on startup (#9528)
Fixes #9342
2021-01-25 17:30:38 -06:00
Daniel Nephin 48112e6298
Merge pull request #9420 from hashicorp/dnephin/reduce-duplicate-in-catalog-schema
state: reduce interface for Enterprise schema
2021-01-25 17:04:25 -05:00
R.B. Boyer 51e3ca6cbb
server: use the presense of stored federation state data as a sign that we already activated the federation state feature flag (#9519)
This way we only have to wait for the serf barrier to pass once before
we can make use of federation state APIs Without this patch every
restart needs to re-compute the change.
2021-01-25 13:24:32 -06:00
R.B. Boyer 9ef3f20127
server: when wan federating via mesh gateways only do heuristic primary DC bypass on the leader (#9366)
Fixes #9341
2021-01-22 10:03:24 -06:00
Freddy e50019b092
Update topology mapping Refs on all proxy instance deletions (#9589)
* Insert new upstream/downstream mapping to persist new Refs

* Avoid upserting mapping copy if it's a no-op

* Add test with panic repro

* Avoid deleting up/downstreams from inside memdb iterator

* Avoid deleting gateway mappings from inside memdb iterator

* Add CHANGELOG entry

* Tweak changelog entry

Co-authored-by: Paul Banks <banks@banksco.de>
2021-01-20 15:17:26 +00:00
Daniel Nephin 810424a61e state: do not delete from inside an iteration
Deleting from memdb inside an interation can cause a panic from Iterator.Next. This
case is technically safe (for now) because the iterator is using the root radix tree
not a modified one.

However this could break at any time if someone adds an insert or delete to the coordinates table
before this place in the function.

It also sets a bad example, because generally deletes in an interator are not safe. So this
commit uses the pattern we have in other places to move the deletes out of the iteration.
2021-01-19 17:00:07 -05:00
Matt Keeler d9d4c492ab
Ensure that CA initialization does not block leader election.
After fixing that bug I uncovered a couple more:

Fix an issue where we might try to cross sign a cert when we never had a valid root.
Fix a potential issue where reconfiguring the CA could cause either the Vault or AWS PCA CA providers to delete resources that are still required by the new incarnation of the CA.
2021-01-19 15:27:48 -05:00
Daniel Nephin a6000e6ad8 state: add a regression test for state store schema
To allow the index to be refactored without accidental changes.

To update the expected value run: 'go test ./agent/consul/state -update'
2021-01-15 18:49:55 -05:00
Daniel Nephin 24312f8c96 state: reduce interface for Enterprise schema
Using withEnterpriseSchema() we can apply any enterprise schema changes
with a single shim, removing the need to duplicate all of the table
definitions.

Also move all the catalog schemas to a new file to shrink catalog.go a bit.
2021-01-15 18:49:55 -05:00
Daniel Nephin e66af1a559 agent/consuk: Rename RPCRate -> RPCRateLimit
so that the field name is consistent across config structs.
2021-01-14 17:26:00 -05:00
Daniel Nephin 5684223e36 agent/consul: make Client/Server config reloading more obvious
I believe this commit also fixes a bug. Previously RPCMaxConnsPerClient was not being re-read from the RuntimeConfig, so passing it to Server.ReloadConfig was never changing the value.

Also improve the test runtime by not doing a lot of unnecessary work.
2021-01-14 17:21:10 -05:00
Daniel Nephin 8fdc789ded
Merge pull request #9460 from hashicorp/dnephin/fix-data-races
Fix a couple data races in tests
2021-01-14 17:07:01 -05:00
Chris Piraino 0712e03f33
Fix bug in usage metrics when multiple service instances are changed in a single transaction (#9440)
* Fix bug in usage metrics that caused a negative count to occur

There were a couple of instances were usage metrics would do the wrong
thing and result in incorrect counts, causing the count to attempt to
decrement below zero and return an error. The usage metrics did not
account for various places where a single transaction could
delete/update/add multiple service instances at once.

We also remove the error when attempting to decrement below zero, and
instead just make sure we do not accidentally underflow the unsigned
integer. This is a more graceful failure than returning an error and not
allowing a transaction to commit.

* Add changelog
2021-01-12 15:31:47 -06:00
Chris Piraino aabdccdfa0
Log replication warnings when no error suppression is defined (#9320)
* Log replication warnings when no error suppression is defined

* Add changelog file
2021-01-08 14:03:06 -06:00
Daniel Nephin d113f0e690 structs: Fix printing of IDs
These types are used as values (not pointers) in other structs. Using a pointer receiver causes
problems when the value is printed. fmt will not call the String method if it is passed a value
and the String method has a pointer receiver. By using a value receiver the correct string is printed.

Also remove some unused methods.
2021-01-07 18:47:38 -05:00
Daniel Nephin d64425d2e4
Merge pull request #9213 from hashicorp/dnephin/resolve-tokens-take-2
acl: Remove some unused things and document delegate method
2021-01-06 18:51:51 -05:00
R.B. Boyer 19baf4bc25
acl: use the presence of a management policy in the state store as a sign that we already migrated to v2 acls (#9505)
This way we only have to wait for the serf barrier to pass once before
we can upgrade to v2 acls. Without this patch every restart needs to
re-compute the change, and potentially if a stray older node joins after
a migration it might regress back to v1 mode which would be problematic.
2021-01-05 17:04:27 -06:00
Matt Keeler 85e5da53d5
Special case the error returned when we have a Raft leader but are not tracking it in the ServerLookup (#9487)
This can happen when one other node in the cluster such as a client is unable to communicate with the leader server and sees it as failed. When that happens its failing status eventually gets propagated to the other servers in the cluster and eventually this can result in RPCs returning “No cluster leader” error.

That error is misleading and unhelpful for determing the root cause of the issue as its not raft stability but rather and client -> server networking issue. Therefore this commit will add a new error that will be returned in that case to differentiate between the two cases.
2021-01-04 14:05:23 -05:00
R.B. Boyer d5d62d9e08
server: deletions of intentions by name using the intention API is now idempotent (#9278)
Restoring a behavior inadvertently changed while fixing #9254
2021-01-04 11:27:00 -06:00
Daniel Nephin 71b82a7e5b Maybe fix another data race in a test 2020-12-22 18:53:54 -05:00
Daniel Nephin bae9125fc1 Fix one race caused by t.Parallel 2020-12-22 18:27:18 -05:00
Daniel Nephin cb3dbc92f9
Merge pull request #9340 from hashicorp/dnephin/skip-slow-tests-with-short
testing: skip slow tests with -short
2020-12-11 13:33:44 -05:00
R.B. Boyer d921690bfe
acl: global tokens created by auth methods now correctly replicate to secondary datacenters (#9351)
Previously the tokens would fail to insert into the secondary's state
store because the AuthMethod field of the ACLToken did not point to a
known auth method from the primary.
2020-12-09 15:22:29 -06:00
Daniel Nephin b9e60c0775 testing: skip slow tests with -short
Add a skip condition to all tests slower than 100ms.

This change was made using `gotestsum tool slowest` with data from the
last 3 CI runs of master.
See https://github.com/gotestyourself/gotestsum#finding-and-skipping-slow-tests

With this change:

```
$ time go test -count=1 -short ./agent
ok      github.com/hashicorp/consul/agent       0.743s

real    0m4.791s

$ time go test -count=1 -short ./agent/consul
ok      github.com/hashicorp/consul/agent/consul        4.229s

real    0m8.769s
```
2020-12-07 13:42:55 -05:00
Kyle Havlovitz 88d669c0e0 connect: Fix a case where the active root would get unset even when there wasn't a new one 2020-12-02 11:42:23 -08:00
Kyle Havlovitz c4eff420be
Merge pull request #9009 from hashicorp/update-secondary-ca
connect: Fix an issue with updating CA config in a secondary datacenter
2020-11-30 14:49:28 -08:00
Kyle Havlovitz 781cae5809 Use a buffered channel for CA intermediate renew func 2020-11-30 14:37:24 -08:00
R.B. Boyer d2d1b05a4e
server: fix panic when deleting a non existent intention (#9254)
* server: fix panic when deleting a non existent intention

* add changelog

* Always return an error when deleting non-existent ixn

Co-authored-by: freddygv <gh@freddygv.xyz>
2020-11-24 13:44:20 -05:00
Hans Hasselberg 57701695c3 add missing descriptions for metrics 2020-11-23 22:06:30 +01:00
Kit Patella fcec25de40 add entries for missing fsm operations and mark duplicated metrics prefixes as deprecated 2020-11-23 12:42:51 -08:00
Kyle Havlovitz 13c31ccfce Clean up the logic in persistNewRootAndConfig 2020-11-20 15:54:44 -08:00
Kyle Havlovitz 0bfda4481f Add CA server delegate interface for testing 2020-11-19 20:08:06 -08:00
Kit Patella 5c09dc322e add telemetry and definition help entries for missing catalog and acl metrics 2020-11-19 13:29:44 -08:00
Kit Patella 9e54e897d7 remove stale entries and rename/define acl.resolveToken 2020-11-19 13:06:28 -08:00
Freddy fd5928fa4e
Require operator:write to get Connect CA config (#9240)
A vulnerability was identified in Consul and Consul Enterprise (“Consul”) such that operators with `operator:read` ACL permissions are able to read the Consul Connect CA configuration when explicitly configured with the `/v1/connect/ca/configuration` endpoint, including the private key. This allows the user to effectively privilege escalate by enabling the ability to mint certificates for any Consul Connect services. This would potentially allow them to masquerade (receive/send traffic) as any service in the mesh.

--

This PR increases the permissions required to read the Connect CA's private key when it was configured via the `/connect/ca/configuration` endpoint. They are now `operator:write`.
2020-11-19 10:14:48 -07:00
Kyle Havlovitz 9be7c6401c connect: update some function comments in CA manager 2020-11-17 16:00:19 -08:00
Daniel Nephin 3885835e8c acl: remove a test-only method 2020-11-17 18:16:34 -05:00
Daniel Nephin 0ee86935f0 Remove two unused delegate methods 2020-11-17 18:16:26 -05:00
Matt Keeler 66fd23d67f
Refactor to call non-voting servers read replicas (#9191)
Co-authored-by: Kit Patella <kit@jepsen.io>
2020-11-17 10:53:57 -05:00
Kit Patella d15b6fddd3
Merge pull request #9198 from hashicorp/mkcp/telemetry/add-all-metric-definitions
Add metric definitions for all metrics known at Consul start
2020-11-16 15:54:50 -08:00
Matt Keeler 748d56b8ab
Prevent panic if autopilot health is requested prior to leader establishment finishing. (#9204) 2020-11-16 17:08:17 -05:00
Daniel Nephin b7367467f6
Merge pull request #9114 from hashicorp/dnephin/filtering-in-stream
stream: improve naming of Payload methods
2020-11-16 14:20:07 -05:00
Kit Patella 15af5ead0b trim help strings to save a few bytes 2020-11-16 11:02:11 -08:00
Kit Patella 3966ecb02f merge master 2020-11-16 10:46:53 -08:00
Kit Patella 5da2f1efa8 finish adding static server metrics 2020-11-13 16:26:08 -08:00
Kyle Havlovitz 16e95f1d7b Reorganize some CA manager code for correctness/readability 2020-11-13 14:46:01 -08:00
Kyle Havlovitz 6fba82a4fa connect: Add CAManager for synchronizing CA operations 2020-11-13 14:33:44 -08:00
Kyle Havlovitz af34b26221 connect: Add logic for updating secondary DC intermediate on config set 2020-11-13 14:33:44 -08:00
R.B. Boyer 9eb262252a
server: intentions CRUD requires connect to be enabled (#9194)
Fixes #9123
2020-11-13 16:19:12 -06:00
Kit Patella 06d59c03b9 add the service name in the agent rather than in the definitions themselves 2020-11-13 13:18:04 -08:00
R.B. Boyer c7233ba871
server: remove config entry CAS in legacy intention API bridge code (#9151)
Change so line-item intention edits via the API are handled via the state store instead of via CAS operations.

Fixes #9143
2020-11-13 14:42:21 -06:00
R.B. Boyer c52bc632df
server: skip deleted and deleting namespaces when migrating intentions to config entries (#9186) 2020-11-13 13:56:41 -06:00
Mike Morris 7af643ac37
ci: update to Go 1.15.4 and alpine:3.12 (#9036)
* ci: stop building darwin/386 binaries

Go 1.15 drops support for 32-bit binaries on Darwin https://golang.org/doc/go1.15#darwin

* tls: ConnectionState::NegotiatedProtocolIsMutual is deprecated in Go 1.15, this value is always true

* correct error messages that changed slightly

* Completely regenerate some TLS test data

Co-authored-by: R.B. Boyer <rb@hashicorp.com>
2020-11-13 13:02:59 -05:00
R.B. Boyer c003871c54
server: break up Intention.Apply monolithic method (#9007)
The Intention.Apply RPC is quite large, so this PR attempts to break it down into smaller functions and dissolves the pre-config-entry approach to the breakdown as it only confused things.
2020-11-13 09:15:39 -06:00
Kit Patella 24a2471029 first pass on agent-configured prometheusDefs and adding defs for every consul metric 2020-11-12 18:12:12 -08:00
R.B. Boyer 61eac21f1a
agent: return the default ACL policy to callers as a header (#9101)
Header is: X-Consul-Default-ACL-Policy=<allow|deny>

This is of particular utility when fetching matching intentions, as the
fallthrough for a request that doesn't match any intentions is to
enforce using the default acl policy.
2020-11-12 10:38:32 -06:00
Matt Keeler 71da0209bf
Add a paramter in state store methods to indicate whether a resource insertion is from a snapshot restoration (#9156)
The Catalog, Config Entry, KV and Session resources potentially re-validate the input as its coming in. We need to prevent snapshot restoration failures due to missing namespaces or namespaces that are being deleted in enterprise.
2020-11-11 11:21:42 -05:00
Matt Keeler a3a653342b
Fix a bunch of linter warnings 2020-11-09 09:22:12 -05:00
Matt Keeler c048e86bb2
Switch to using the external autopilot module 2020-11-09 09:22:11 -05:00
Daniel Nephin fb70c8bac2 stream: document that Payload must be immutable
If they are sent to EventPublisher.Publish.

Also document that PayloadEvents is expected to come from a subscription and that it is
not immutable.
2020-11-06 13:00:33 -05:00
Daniel Nephin 43af0ba7a3 stream: rename FilterByKey 2020-11-05 19:21:16 -05:00
Daniel Nephin 868cfe1eac stream: Add HasReadPermission to Payload
Required now that filter is a method on PayloadEvents instead of Event
2020-11-05 19:17:18 -05:00
Daniel Nephin 36202f7938 stream: move event filtering to PayloadEvents
Removes the weirdness around PayloadEvents.FilterByKey
2020-11-05 17:50:17 -05:00
Daniel Nephin 79b5ca1ce6 stream: Remove unused method 2020-11-05 16:49:59 -05:00
Daniel Nephin a33c50ef0d
Merge pull request #9073 from hashicorp/dnephin/backport-streaming-namespaces
streaming: backport namespace changes
2020-11-05 14:19:10 -05:00
Daniel Nephin c82f6ef2d8
Merge pull request #9061 from hashicorp/dnephin/event-fields
stream: support filtering by namespace
2020-11-05 14:18:35 -05:00
Daniel Nephin b95b14e168 state: test EventPayloadCheckServiceNode.FilterByKey
Also fix a bug in that function when only one of key or namespace were the empty string.
2020-10-30 14:35:57 -04:00
Daniel Nephin 56d6079da3 stream: Add tests for filterByKey with namespace
And fix a bug where a request with a Namespace but no Key would not be properly filtered
2020-10-30 14:35:42 -04:00
Daniel Nephin 2c00045161 stream: Move FilterByKey events to a table
In preparation for adding new tests.
2020-10-30 14:35:28 -04:00
Daniel Nephin 43c5803a25 state: use enterprise meta for creating events 2020-10-30 14:34:04 -04:00
Daniel Nephin 0ad2406d7c stream: include the namespace in the snap cache key
Otherwise the wrong snapshot could be returned when the same key is used in different namespaces
2020-10-30 14:34:04 -04:00
Daniel Nephin c42fe5ae43 subscribe: set the request namespace 2020-10-30 14:34:04 -04:00
R.B. Boyer fa4b0854fb
state: ensure we unblock intentions queries upon the upgrade to config entries (#9062)
1. do a state store query to list intentions as the agent would do over in `agent/proxycfg` backing `agent/xds`
2. upgrade the database and do a fresh `service-intentions` config entry write
3. the blocking query inside of the agent cache in (1) doesn't notice (2)
2020-10-29 15:28:31 -05:00
R.B. Boyer b24b4169e1 restore prior signature of test helper so enterprise compiles 2020-10-29 13:52:15 -05:00
Daniel Nephin a5dd2001cf stream: remove Event.Key
Makes Payload a type with FilterByKey so that Payloads can implement
filtering by key. With this approach we don't need to expose a Namespace
field on Event, and we don't need to invest micro formats or require a
bunch of code to be aware of exactly how the key field is encoded.
2020-10-28 16:48:04 -04:00
Daniel Nephin 1c094da40d state: use go-cmp for comparison
The output of the previous assertions made it impossible to debug the tests without code changes.

With go-cmp comparing the entire slice we can see the full diffs making it easier to debug failures.
2020-10-28 16:33:00 -04:00
Daniel Nephin 3dfb7c224b stream: Use a no-op event publisher if streaming is disabled 2020-10-28 13:54:19 -04:00
Daniel Nephin 23eee604c9 store: use a ReadDB for snapshots
to remove the cyclic dependency between the snapshot handlers and the state.Store
2020-10-28 13:07:42 -04:00
Daniel Nephin 7b9ee25956
Merge pull request #9026 from hashicorp/dnephin/streaming-without-cache-query-param
streaming: rename config and remove requirement for cache=1
2020-10-28 12:33:25 -04:00
Daniel Nephin 477d665309
Merge pull request #8618 from hashicorp/dnephin/remove-txn-readtxn
state: Use ReadTxn everywhere
2020-10-28 12:32:47 -04:00
Daniel Nephin c398a6b272 state: disable streaming connect topic 2020-10-26 11:49:47 -04:00
R.B. Boyer 58387fef0a
server: config entry replication now correctly uses namespaces in comparisons (#9024)
Previously config entries sharing a kind & name but in different
namespaces could occasionally cause "stuck states" in replication
because the namespace fields were ignored during the differential
comparison phase.

Example:

Two config entries written to the primary:

    kind=A,name=web,namespace=bar
    kind=A,name=web,namespace=foo

Under the covers these both get saved to memdb, so they are sorted by
all 3 components (kind,name,namespace) during natural iteration. This
means that before the replication code does it's own incomplete sort,
the underlying data IS sorted by namespace ascending (bar comes before
foo).

After one pass of replication the primary and secondary datacenters have
the same set of config entries present. If
"kind=A,name=web,namespace=bar" were to be deleted, then things get
weird. Before replication the two sides look like:

primary: [
    kind=A,name=web,namespace=foo
]
secondary: [
    kind=A,name=web,namespace=bar
    kind=A,name=web,namespace=foo
]

The differential comparison phase walks these two lists in sorted order
and first compares "kind=A,name=web,namespace=foo" vs
"kind=A,name=web,namespace=bar" and falsely determines they are the SAME
and are thus cause an update of "kind=A,name=web,namespace=foo". Then it
compares "<nothing>" with "kind=A,name=web,namespace=foo" and falsely
determines that the latter should be DELETED.

During reconciliation the deletes are processed before updates, and so
for a brief moment in the secondary "kind=A,name=web,namespace=foo" is
erroneously deleted and then immediately restored.

Unfortunately after this replication phase the final state is identical
to the initial state, so when it loops around again (rate limited) it
repeats the same set of operations indefinitely.
2020-10-23 13:41:54 -05:00
Daniel Nephin 0f1fb24d19 state: convert the remaining functions to ReadTxn
Required also converting some of the transaction functions to WriteTxn
because TxnRO() called the same helper as TxnRW.

This change allows us to return a memdb.Txn for read-only txn instead of
wrapping them with state.txn.
2020-10-23 14:29:22 -04:00
Daniel Nephin 8bd1a2cd16
Merge pull request #8975 from hashicorp/dnephin/stream-close-on-unsub
stream: close the subscription on Unsubscribe
2020-10-23 12:58:12 -04:00
Freddy 9c04cbc40f
Add HasExact to topology endpoint (#9010) 2020-10-23 10:45:41 -06:00
Daniel Nephin fb57d9b26a stream: close the subscription on Unsubscribe 2020-10-22 13:39:27 -04:00
Pierre Souchay 9b7ed75552
Consul Service meta wrongly computes and exposes non_voter meta (#8731)
* Consul Service meta wrongly computes and exposes non_voter meta

In Serf Tags, entreprise members being non-voters use the tag
`nonvoter=1`, not `non_voter = false`, so non-voters in members
were wrongly displayed as voter.

Demonstration:

```
consul members -detailed|grep voter
consul20-hk5 10.200.100.110:8301   alive   acls=1,build=1.8.4+ent,dc=hk5,expect=3,ft_fs=1,ft_ns=1,id=xxxxxxxx-5629-08f2-3a79-10a1ab3849d5,nonvoter=1,port=8300,raft_vsn=3,role=consul,segment=<all>,use_tls=1,vsn=2,vsn_max=3,vsn_min=2,wan_join_port=8302
```

* Added changelog

* Added changelog entry
2020-10-09 17:18:24 -04:00
s-christoff 9bb348c6c7
Enhance the output of consul snapshot inspect (#8787) 2020-10-09 14:57:29 -05:00
Kyle Havlovitz ff12fc9f38 Stop intermediate renew routine on leader stop 2020-10-09 12:30:57 -07:00
Kyle Havlovitz e5ab1b45bc
Merge pull request #8784 from hashicorp/renew-intermediate-primary
connect: Enable renewing the intermediate cert in the primary DC
2020-10-09 12:18:59 -07:00
Daniel Nephin ea77eccb14
Merge pull request #8825 from hashicorp/streaming/add-config
streaming: add config and docs
2020-10-09 14:33:58 -04:00
Chris Piraino 30540e406b
Emit service usage metrics with correct labeling strategy (#8856)
Previously, we would emit service usage metrics both with and without a
namespace label attached. This is problematic in the case when you want
to aggregate metrics together, i.e. "sum(consul.state.services)". This
would cause services to be counted twice in that aggregate, once via the
metric emitted with a namespace label, and once in the metric emited
without any namespace label.
2020-10-09 11:01:45 -05:00
Kyle Havlovitz 876500e0dc Fix intermediate refresh test comments 2020-10-09 08:53:33 -07:00
R.B. Boyer e113dc0fe2
upstream some differences from enterprise (#8902) 2020-10-09 09:42:53 -05:00
Kyle Havlovitz 01ce9f5b18 Update CI for leader renew CA test using Vault 2020-10-09 05:48:15 -07:00
Kyle Havlovitz 4fc0f6d9a4
Merge branch 'master' into renew-intermediate-primary 2020-10-09 04:40:34 -07:00
Kyle Havlovitz e13f4af06b connect: Check for expired root cert when cross-signing 2020-10-09 04:35:56 -07:00
Freddy 13df5d5bf8
Add protocol to the topology endpoint response (#8868) 2020-10-08 17:31:54 -06:00
Matt Keeler 38f5ddce2a
Add per-agent reconnect timeouts (#8781)
This allows for client agent to be run in a more stateless manner where they may be abruptly terminated and not expected to come back. If advertising a per-agent reconnect timeout using the advertise_reconnect_timeout configuration when that agent leaves, other agents will wait only that amount of time for the agent to come back before reaping it.

This has the advantageous side effect of causing servers to deregister the node/services/checks for that agent sooner than if the global reconnect_timeout was used.
2020-10-08 15:02:19 -04:00
Daniel Nephin b93577c94f config: add field for enabling streaming RPC endpoint 2020-10-08 12:11:20 -04:00
Freddy 164ce57db2
Support ingress gateways in mesh viz endpoint (#8864)
Co-authored-by: R.B. Boyer <rb@hashicorp.com>
2020-10-08 09:47:09 -06:00
Daniel Nephin 2513f42c68
Merge pull request #8809 from hashicorp/streaming/materialize-view
Add StreamingHealthServices cache-type
2020-10-07 21:26:38 -04:00
Daniel Nephin b103568e98
Merge pull request #8818 from hashicorp/streaming/add-subscribe-service-batch-events
stream: handle batch events as a special case of Event
2020-10-07 21:25:32 -04:00
Daniel Nephin da6400192b
Merge pull request #8768 from hashicorp/streaming/add-subscribe-service
subscribe: add subscribe service for streaming change events
2020-10-07 21:24:03 -04:00
Freddy da91e999f6
Return intention info in svc topology endpoint (#8853) 2020-10-07 18:35:34 -06:00
R.B. Boyer 1b413b0444
connect: support defining intentions using layer 7 criteria (#8839)
Extend Consul’s intentions model to allow for request-based access control enforcement for HTTP-like protocols in addition to the existing connection-based enforcement for unspecified protocols (e.g. tcp).
2020-10-06 17:09:13 -05:00
R.B. Boyer a2a8e9c783
connect: intentions are now managed as a new config entry kind "service-intentions" (#8834)
- Upgrade the ConfigEntry.ListAll RPC to be kind-aware so that older
copies of consul will not see new config entries it doesn't understand
replicate down.

- Add shim conversion code so that the old API/CLI method of interacting
with intentions will continue to work so long as none of these are
edited via config entry endpoints. Almost all of the read-only APIs will
continue to function indefinitely.

- Add new APIs that operate on individual intentions without IDs so that
the UI doesn't need to implement CAS operations.

- Add a new serf feature flag indicating support for
intentions-as-config-entries.

- The old line-item intentions way of interacting with the state store
will transparently flip between the legacy memdb table and the config
entry representations so that readers will never see a hiccup during
migration where the results are incomplete. It uses a piece of system
metadata to control the flip.

- The primary datacenter will begin migrating intentions into config
entries on startup once all servers in the datacenter are on a version
of Consul with the intentions-as-config-entries feature flag. When it is
complete the old state store representations will be cleared. We also
record a piece of system metadata indicating this has occurred. We use
this metadata to skip ALL of this code the next time the leader starts
up.

- The secondary datacenters continue to run the old intentions
replicator until all servers in the secondary DC and primary DC support
intentions-as-config-entries (via serf flag). Once this condition it met
the old intentions replicator ceases.

- The secondary datacenters replicate the new config entries as they are
migrated in the primary. When they detect that the primary has zeroed
it's old state store table it waits until all config entries up to that
point are replicated and then zeroes its own copy of the old state store
table. We also record a piece of system metadata indicating this has
occurred. We use this metadata to skip ALL of this code the next time
the leader starts up.
2020-10-06 13:24:05 -05:00
Daniel Nephin 5972bdc87c streaming: improve godoc for cache-type
And fix a bug where any error that implemented the temporary interface was considered
a temporary error, even when the method would return false.
2020-10-06 13:52:02 -04:00
Daniel Nephin 3fa08beecf submatview: add a test for handling of NewSnapshotToFollow
Also add some godoc
Rename some vars and functions
Fix a data race in the new cache test for entry closing.
2020-10-06 13:22:02 -04:00
Daniel Nephin b27068b72a stream: Return a single event from a subscription.Next
Handle batch events as a single event
2020-10-06 13:18:20 -04:00
Daniel Nephin e3290f5971 Move agent/subscribe -> agent/rpc/subscribe 2020-10-06 12:49:35 -04:00
Daniel Nephin dbb8bd679f subscirbe: extract streamID and logging from Subscribe
By extracting all of the tracing logic the core logic of the Subscribe
endpoint is much easier to read.
2020-10-06 12:49:35 -04:00
Daniel Nephin 9e4ebacb05 subscribe: add integration test for acl token updates 2020-10-06 12:49:35 -04:00
Daniel Nephin d0256a0c07 subscribe: add a stateless subscribe service for the gRPC server
With a Backend that provides access to the necessary dependencies.
2020-10-06 12:49:35 -04:00
Daniel Nephin 364f6589c8
Merge pull request #8799 from hashicorp/streaming/rename-framing-events
stream: remove EndOfEmptySnapshot, add NewSnapshotToFollow
2020-10-06 12:42:58 -04:00
R.B. Boyer 4998a08c56
server: create new memdb table for storing system metadata (#8703)
This adds a new very tiny memdb table and corresponding raft operation
for updating a very small effective map[string]string collection of
"system metadata". This can persistently record a fact about the Consul
state machine itself.

The first use of this feature will come in a later PR.
2020-10-06 10:08:37 -05:00
Daniel Nephin 5a5fd4f0b1
Merge pull request #8802 from hashicorp/dnephin/extract-lib-retry
lib/retry - extract a new package from lib/retry.go
2020-10-05 14:22:37 -04:00
freddygv 413a894a1a Do not evaluate discovery chain for topology upstreams 2020-10-05 10:24:50 -06:00
freddygv cf7b7fcdd6 Single DB txn for ServiceTopology and other PR comments 2020-10-05 10:24:50 -06:00
freddygv 7c26a71b4b Add topology HTTP endpoint 2020-10-05 10:24:50 -06:00
freddygv dbbf6b2e46 Add topology RPC endpoint 2020-10-05 10:24:50 -06:00
freddygv 98c81976f5 Add topology ACL filter 2020-10-05 10:24:50 -06:00
freddygv f906b94351 Add func to combine up+downstream queries 2020-10-05 10:24:50 -06:00
freddygv 5c913ec312 factor in discovery chain when querying up/downstreams 2020-10-05 10:24:50 -06:00
freddygv b012d8374e support querying upstreams/downstreams from registrations 2020-10-05 10:24:50 -06:00
freddygv a86cf88a4a Add method for downstreams from disco chain 2020-10-05 10:24:50 -06:00
Daniel Nephin e54567223b lib/retry: Refactor to reduce the interface surface
Reduce Jitter to one function

Rename NewRetryWaiter

Fix a bug in calculateWait where maxWait was applied before jitter, which would make it
possible to wait longer than maxWait.
2020-10-04 18:12:42 -04:00
Daniel Nephin ca26dfb4a2 lib/retry: extract a new package from lib 2020-10-04 17:43:01 -04:00
Daniel Nephin 1c6be5ac75 stream: full test coverage for EventPublisher.Subscribe 2020-10-02 13:46:24 -04:00
Daniel Nephin a5df5d17b4 stream: refactor to support change in framing events
Removing EndOfEmptySnapshot, add NewSnapshotToFollow
2020-10-02 13:41:31 -04:00
Daniel Nephin 04b51de783
Merge pull request #8769 from hashicorp/streaming/prep-for-subscribe-service
state: use protobuf Topic and and export payload type
2020-10-02 13:30:06 -04:00
R.B. Boyer 9a55c50694
ensure these tests work fine with namespaces in enterprise (#8794) 2020-10-01 09:54:46 -05:00
R.B. Boyer 237a7a0da0
server: ensure that we also shutdown network segment serf instances on server shutdown (#8786)
This really only matters for unit tests, since typically if an agent shuts down its server, it follows that up by exiting the process, which would also clean up all of the networking anyway.
2020-09-30 16:23:43 -05:00
Kyle Havlovitz 2ec94b027e connect: Enable renewing the intermediate cert in the primary DC 2020-09-30 12:31:21 -07:00
freddygv 9fa1b13df9 Resolve conflicts 2020-09-29 08:59:18 -06:00
Daniel Nephin b7ca15e910 stream: move goroutine out of New
This change will make it easier to manage goroutine lifecycle from the caller.

Also expose EventPublisher from state.Store
2020-09-28 18:40:10 -04:00
Daniel Nephin 0fb2a5b992 state: use pbsubscribe.Topic for topic values 2020-09-28 18:40:10 -04:00
Daniel Nephin 7b1534ef05 state: rename and export EventPayload
The subscribe endpoint needs to be able to inspect the payload to filter
events, and convert them into the protobuf types.

Use the protobuf CatalogOp type for the operation field, for now. In the
future if we end up with multiple interfaces we should be able to remove
the protobuf dependency by changing this to an int32 and adding a test
for the mapping between the values.

Make the value of the payload a concrete type instead of interface{}. We
can create other payloads for other event types.
2020-09-28 18:34:30 -04:00
R.B. Boyer 0064f1936e
server: make sure that the various replication loggers use consistent logging (#8745) 2020-09-24 15:49:38 -05:00
Daniel Nephin bad4d3ff7c grpc: redeuce dependencies, unexport, and add godoc
Rename GRPCClient to ClientConnPool. This type appears to be more of a
conn pool than a client. The clients receive the connections from this
pool.

Reduce some dependencies by adjusting the interface baoundaries.

Remove the need to create a second slice of Servers, just to pick one and throw the rest away.

Unexport serverResolver, it is not used outside the package.

Use a RWMutex for ServerResolverBuilder, some locking is read-only.

Add more godoc.
2020-09-24 12:53:10 -04:00
Daniel Nephin 25f47b46e1 grpc: move client conn pool to grpc package 2020-09-24 12:48:12 -04:00
Daniel Nephin f936ca5aea grpc: client conn pool and resolver
Extracted from 936522a13c

Co-authored-by: Paul Banks <banks@banksco.de>
2020-09-24 12:46:22 -04:00
Daniel Nephin c18516ad7d
Merge pull request #8680 from hashicorp/dnephin/replace-consul-opts-with-base-deps
agent: Repalce ConsulOptions with a new struct from agent.BaseDeps
2020-09-24 12:45:54 -04:00
Paul Banks 7d58901ae8
Fix bad int -> string conversions caught by go vet changes in 1.15 (#8739) 2020-09-24 11:14:07 +01:00
Hans Hasselberg d4877f03e7
fix TestLeader_SecondaryCA_IntermediateRenew (#8702)
* fix lessThanHalfTime
* get lock for CAProvider()
* make a var to relate both vars
* rename to getCAProviderWithLock
* move CertificateTimeDriftBuffer to agent/connect/ca
2020-09-18 10:13:29 +02:00
Mike Morris 6b62751921
test: update tags for database service registrations and queries (#8693) 2020-09-16 14:05:01 -04:00
Kyle Havlovitz b1b21139ca Merge branch 'master' into vault-ca-renew-token 2020-09-15 14:39:04 -07:00
Daniel Nephin cdd392d77f agent/consul: pass dependencies directly from agent
In an upcoming change we will need to pass a grpc.ClientConnPool from
BaseDeps into Server. While looking at that change I noticed all of the
existing consulOption fields are already on BaseDeps.

Instead of duplicating the fields, we can create a struct used by
agent/consul, and use that struct in BaseDeps. This allows us to pass
along dependencies without translating them into different
representations.

I also looked at moving all of BaseDeps in agent/consul, however that
created some circular imports. Resolving those cycles wouldn't be too
bad (it was only an error in agent/consul being imported from
cache-types), however this change seems a little better by starting to
introduce some structure to BaseDeps.

This change is also a small step in reducing the scope of Agent.

Also remove some constants that were only used by tests, and move the
relevant comment to where the live configuration is set.

Removed some validation from NewServer and NewClient, as these are not
really runtime errors. They would be code errors, which will cause a
panic anyway, so no reason to handle them specially here.
2020-09-15 17:29:32 -04:00
Daniel Nephin 3aa9bd4c23 agent/consul: make router required 2020-09-15 17:26:26 -04:00
Kyle Havlovitz 7ffef62ed7 Clean up CA shutdown logic and error 2020-09-15 12:28:58 -07:00
freddygv 7fd518ff1d Merge master 2020-09-14 16:17:43 -06:00
Daniel Nephin 20aea3dbc9
Merge pull request #8587 from hashicorp/streaming/add-grpc-server
streaming: add gRPC server for handling connections
2020-09-14 15:24:54 -04:00
freddygv 7b9d1b41d5 Resolve conflicts against master 2020-09-11 18:41:58 -06:00
Kyle Havlovitz 49056fe70f Clean up Vault renew tests and shutdown 2020-09-11 08:41:05 -07:00
freddygv eab90ea9fa Revert EnvoyConfig nesting 2020-09-11 09:21:43 -06:00
Kyle Havlovitz aa97366020 Add a stop function to make sure the renewer is shut down on leader change 2020-09-10 06:12:48 -07:00
Kyle Havlovitz 411b6537ef Add a test for token renewal 2020-09-09 16:36:37 -07:00
Daniel Nephin 2257247095 server: add gRPC server for streaming events
Includes a stats handler and stream interceptor for grpc metrics.

Co-authored-by: Paul Banks <banks@banksco.de>
2020-09-08 12:10:41 -04:00
Hans Hasselberg 436a7032d1
secondaryIntermediateCertRenewalWatch abort on success (#8588)
secondaryIntermediateCertRenewalWatch was using `retryLoopBackoff` to
renew the intermediate certificate. Once it entered the inner loop and
started `retryLoopBackoff` it would never leave that.
`retryLoopBackoffAbortOnSuccess` will return when renewing is
successful, like it was intended originally.
2020-09-04 11:47:16 +02:00
Daniel Nephin e573e64d58 state: handle terminating gateways in service health events 2020-09-03 16:58:05 -04:00
Daniel Nephin 3775392fb5 state: improve comments in catalog_events.go
Co-authored-by: Paul Banks <banks@banksco.de>
2020-09-03 16:58:05 -04:00
Daniel Nephin 417c5c93a8 state: use changeType in serviceChanges
To be a little more explicit, instead of nil implying an indirect change
2020-09-03 16:58:05 -04:00
Daniel Nephin 01424ba146 don't over allocate slice 2020-09-03 16:58:04 -04:00
Daniel Nephin d210242875 state: fix a bug in building service health events
The nodeCheck slice was being used as the first arg in append, which in some cases will modify the array backing the slice. This would lead to service checks for other services in the wrong event.

Also refactor some things to reduce the arguments to functions.
2020-09-03 16:58:04 -04:00
Daniel Nephin 7581305523 state: Remove unused args and return values
Also rename some functions to identify them as constructors for events
2020-09-03 16:58:04 -04:00
Daniel Nephin 27b02d391c state: use an enum for tracking node changes 2020-09-03 16:58:04 -04:00
Daniel Nephin 09329b542d state: serviceHealthSnapshot
refactored to remove unused return value and remove duplication
2020-09-03 16:58:04 -04:00
Daniel Nephin bf523420ee state: Add Change processor and snapshotter for service health
Co-authored-by: Paul Banks <banks@banksco.de>
2020-09-03 16:58:04 -04:00
Daniel Nephin e03e911144 state: fix bug in changeTrackerDB.publish
Creating a new readTxn does not work because it will not see the newly created objects that are about to be committed. Instead use the active write Txn.
2020-09-03 16:58:01 -04:00
Daniel Nephin 5de4d5bbe3 stream: have SnapshotFunc accept a non-pointer SubscribeRequest
The value is not expected to be modified. Passing a value makes that explicit.
2020-09-03 16:54:02 -04:00
freddygv cd4cf5161f Update resolver defaulting 2020-09-03 13:08:44 -06:00
freddygv eaa250cc80 Ensure resolver node with LB isn't considered default 2020-09-03 08:55:57 -06:00
freddygv f81fe6a1a1 Remove LB infix and move injection to xds 2020-09-02 15:13:50 -06:00
Chris Piraino 28f163c2d2
Merge pull request #8603 from hashicorp/feature/usage-metrics
Track node and service counts in the state store and emit them periodically as metrics
2020-09-02 13:23:39 -05:00
R.B. Boyer d0f74cd1e8
connect: fix bug in preventing some namespaced config entry modifications (#8601)
Whenever an upsert/deletion of a config entry happens, within the open
state store transaction we speculatively test compile all discovery
chains that may be affected by the pending modification to verify that
the write would not create an erroneous scenario (such as splitting
traffic to a subset that did not exist).

If a single discovery chain evaluation references two config entries
with the same kind and name in different namespaces then sometimes the
upsert/deletion would be falsely rejected. It does not appear as though
this bug would've let invalid writes through to the state store so the
correction does not require a cleanup phase.
2020-09-02 10:47:19 -05:00
Chris Piraino bcb586bee2 Set metrics reporting interval to 9 seconds
This is below the 10 second interval that lib/telemetry.go implements as
its aggregation interval, ensuring that we always report these metrics.
2020-09-02 10:24:23 -05:00
Chris Piraino a3028cad89 Update godoc string for memdb wrapper functions/structs 2020-09-02 10:24:22 -05:00
Chris Piraino d301145e62 Refactor state store usage to track unique service names
This commit refactors the state store usage code to track unique service
name changes on transaction commit. This means we only need to lookup
usage entries when reading the information, as opposed to iterating over
a large number of service indices.

- Take into account a service instance's name being changed
- Do not iterate through entire list of service instances, we only care
about whether there is 0, 1, or more than 1.
2020-09-02 10:24:21 -05:00
Chris Piraino 086a8ea8eb Use ReadTxn interface in state store helper functions 2020-09-02 10:24:20 -05:00
Chris Piraino 69dbc926ad Add WriteTxn interface and convert more functions to ReadTxn
We add a WriteTxn interface for use in updating the usage memdb table,
with the forward-looking prospect of incrementally converting other
functions to accept interfaces.

As well, we use the ReadTxn in new usage code, and as a side effect
convert a couple of existing functions to use that interface as well.
2020-09-02 10:24:19 -05:00
Chris Piraino 3feae7f77b Report node/service usage metrics from every server
Using the newly provided state store methods, we periodically emit usage
metrics from the servers.

We decided to emit these metrics from all servers, not just the leader,
because that means we do not have to care about leader election flapping
causing metrics turbulence, and it seems reasonable for each server to
emit its own view of the state, even if they should always converge
rapidly.
2020-09-02 10:24:17 -05:00
Chris Piraino 04705e90f9 Add new usage memdb table that tracks usage counts of various elements
We update the usage table on Commit() by using the TrackedChanges() API
of memdb.

Track memdb changes on restore so that usage data can be compiled
2020-09-02 10:24:16 -05:00
freddygv 63f79e5f9b Restructure structs and other PR comments 2020-09-02 09:10:50 -06:00
Matt Keeler 91d680b830
Merge of auto-config and auto-encrypt code (#8523)
auto-encrypt is now handled as a special case of auto-config.

This also is moving all the cert-monitor code into the auto-config package.
2020-08-31 13:12:17 -04:00