1075 Commits

Author SHA1 Message Date
Daniel Nephin
5a8fc428bd Merge pull request #9772 from hashicorp/streamin-fix-bad-cached-snapshot
streaming: fix snapshot cache bug
2021-02-16 20:28:33 +00:00
Chris Piraino
db8cc8624b Log replication warnings when no error suppression is defined (#9320)
* Log replication warnings when no error suppression is defined

* Add changelog file
2021-02-10 23:32:04 +00:00
R.B. Boyer
1b01d6f9f8
connect: connect CA Roots in the primary datacenter should use a SigningKeyID derived from their local intermediate (#9428) (#9733)
1.9.x backport of #9428
2021-02-09 16:55:11 -06:00
R.B. Boyer
fa9b61ba15 server: use the presense of stored federation state data as a sign that we already activated the federation state feature flag (#9519)
This way we only have to wait for the serf barrier to pass once before
we can make use of federation state APIs Without this patch every
restart needs to re-compute the change.
2021-01-28 16:35:19 +00:00
Matt Keeler
ab1e689c4a Upgrade raft-autopilot and wait for autopilot it to stop when revoking leadership (#9644)
Fixes: 9626
2021-01-27 16:15:37 +00:00
Hans Hasselberg
a625d8f11b Add flags to support CA generation for Connect (#9585) 2021-01-27 07:55:24 +00:00
R.B. Boyer
f25a21960e server: initialize mgw-wanfed to use local gateways more on startup (#9528)
Fixes #9342
2021-01-25 23:31:21 +00:00
R.B. Boyer
5fe99cc2bd server: when wan federating via mesh gateways only do heuristic primary DC bypass on the leader (#9366)
Fixes #9341
2021-01-22 16:07:06 +00:00
Freddy
f2cfbde1b0 Update topology mapping Refs on all proxy instance deletions (#9589)
* Insert new upstream/downstream mapping to persist new Refs

* Avoid upserting mapping copy if it's a no-op

* Add test with panic repro

* Avoid deleting up/downstreams from inside memdb iterator

* Avoid deleting gateway mappings from inside memdb iterator

* Add CHANGELOG entry

* Tweak changelog entry

Co-authored-by: Paul Banks <banks@banksco.de>
2021-01-20 15:18:09 +00:00
Daniel Nephin
7f14ce7a7e Merge pull request #9591 from hashicorp/dnephin/state-store-coordinates-iter
state: do not delete from inside an iteration (coordinates)
2021-01-19 23:17:29 +00:00
Matt Keeler
5f3a185cb0 Merge pull request #9570 from hashicorp/bugfix/9498 2021-01-19 21:30:47 +00:00
Chris Piraino
db3400c22d Fix bug in usage metrics when multiple service instances are changed in a single transaction (#9440)
* Fix bug in usage metrics that caused a negative count to occur

There were a couple of instances were usage metrics would do the wrong
thing and result in incorrect counts, causing the count to attempt to
decrement below zero and return an error. The usage metrics did not
account for various places where a single transaction could
delete/update/add multiple service instances at once.

We also remove the error when attempting to decrement below zero, and
instead just make sure we do not accidentally underflow the unsigned
integer. This is a more graceful failure than returning an error and not
allowing a transaction to commit.

* Add changelog
2021-01-12 21:32:29 +00:00
Daniel Nephin
223b85f89e Merge pull request #7583 from hashicorp/dnephin/id-printing
Fix printing of ID types
2021-01-08 00:02:59 +00:00
Daniel Nephin
7292fe7db0 Merge pull request #9213 from hashicorp/dnephin/resolve-tokens-take-2
acl: Remove some unused things and document delegate method
2021-01-06 23:52:17 +00:00
Matt Keeler
3faee062a5 Special case the error returned when we have a Raft leader but are not tracking it in the ServerLookup (#9487)
This can happen when one other node in the cluster such as a client is unable to communicate with the leader server and sees it as failed. When that happens its failing status eventually gets propagated to the other servers in the cluster and eventually this can result in RPCs returning “No cluster leader” error.

That error is misleading and unhelpful for determing the root cause of the issue as its not raft stability but rather and client -> server networking issue. Therefore this commit will add a new error that will be returned in that case to differentiate between the two cases.
2021-01-04 19:05:53 +00:00
R.B. Boyer
85205a63e8 server: deletions of intentions by name using the intention API is now idempotent (#9278)
Restoring a behavior inadvertently changed while fixing #9254
2021-01-04 17:27:50 +00:00
R.B. Boyer
aa03e9979e acl: global tokens created by auth methods now correctly replicate to secondary datacenters (#9351)
Previously the tokens would fail to insert into the secondary's state
store because the AuthMethod field of the ACLToken did not point to a
known auth method from the primary.
2020-12-09 21:27:24 +00:00
Kyle Havlovitz
38bbf32a9c Merge pull request #9318 from hashicorp/ca-update-followup
connect: Fix issue with updating config in secondary
2020-12-02 20:18:26 +00:00
Kyle Havlovitz
ff93919034 Merge pull request #9009 from hashicorp/update-secondary-ca
connect: Fix an issue with updating CA config in a secondary datacenter
2020-11-30 22:50:26 +00:00
R.B. Boyer
7467ffbff3 server: fix panic when deleting a non existent intention (#9254)
* server: fix panic when deleting a non existent intention

* add changelog

* Always return an error when deleting non-existent ixn

Co-authored-by: freddygv <gh@freddygv.xyz>
2020-11-24 18:44:58 +00:00
Kit Patella
727780140e Merge pull request #9261 from hashicorp/telemetry/fix-missing-and-stale-docs-2
Telemetry/fix missing and stale docs
2020-11-23 21:34:59 +00:00
Kit Patella
fe6ef7e414 Merge pull request #9245 from hashicorp/telemetry/fix-missing-and-stale-docs
Telemetry/fix missing and stale docs
2020-11-20 20:55:51 +00:00
Freddy
5137e4501d Require operator:write to get Connect CA config (#9240)
A vulnerability was identified in Consul and Consul Enterprise (“Consul”) such that operators with `operator:read` ACL permissions are able to read the Consul Connect CA configuration when explicitly configured with the `/v1/connect/ca/configuration` endpoint, including the private key. This allows the user to effectively privilege escalate by enabling the ability to mint certificates for any Consul Connect services. This would potentially allow them to masquerade (receive/send traffic) as any service in the mesh.

--

This PR increases the permissions required to read the Connect CA's private key when it was configured via the `/connect/ca/configuration` endpoint. They are now `operator:write`.
2020-11-19 17:15:17 +00:00
Matt Keeler
dfaaa0b73a Refactor to call non-voting servers read replicas (#9191)
Co-authored-by: Kit Patella <kit@jepsen.io>
2020-11-17 15:54:38 +00:00
Kit Patella
82e7363b90 Merge pull request #9198 from hashicorp/mkcp/telemetry/add-all-metric-definitions
Add metric definitions for all metrics known at Consul start
2020-11-17 00:13:51 +00:00
Matt Keeler
e421da3b59 Prevent panic if autopilot health is requested prior to leader establishment finishing. (#9204) 2020-11-16 22:08:44 +00:00
Daniel Nephin
9b904de406 Merge pull request #9114 from hashicorp/dnephin/filtering-in-stream
stream: improve naming of Payload methods
2020-11-16 19:21:20 +00:00
R.B. Boyer
2747b5145a server: intentions CRUD requires connect to be enabled (#9194)
Fixes #9123
2020-11-13 22:19:47 +00:00
R.B. Boyer
fee0c44ab2 server: remove config entry CAS in legacy intention API bridge code (#9151)
Change so line-item intention edits via the API are handled via the state store instead of via CAS operations.

Fixes #9143
2020-11-13 20:42:57 +00:00
R.B. Boyer
a955705e5e server: skip deleted and deleting namespaces when migrating intentions to config entries (#9186) 2020-11-13 19:57:12 +00:00
Mike Morris
0ba0391bdd ci: update to Go 1.15.4 and alpine:3.12 (#9036)
* ci: stop building darwin/386 binaries

Go 1.15 drops support for 32-bit binaries on Darwin https://golang.org/doc/go1.15#darwin

* tls: ConnectionState::NegotiatedProtocolIsMutual is deprecated in Go 1.15, this value is always true

* correct error messages that changed slightly

* Completely regenerate some TLS test data

Co-authored-by: R.B. Boyer <rb@hashicorp.com>
2020-11-13 18:03:37 +00:00
R.B. Boyer
d69640a6e9 server: break up Intention.Apply monolithic method (#9007)
The Intention.Apply RPC is quite large, so this PR attempts to break it down into smaller functions and dissolves the pre-config-entry approach to the breakdown as it only confused things.
2020-11-13 15:16:34 +00:00
R.B. Boyer
f815014432 agent: return the default ACL policy to callers as a header (#9101)
Header is: X-Consul-Default-ACL-Policy=<allow|deny>

This is of particular utility when fetching matching intentions, as the
fallthrough for a request that doesn't match any intentions is to
enforce using the default acl policy.
2020-11-12 16:39:16 +00:00
Matt Keeler
e669899abf Add a paramter in state store methods to indicate whether a resource insertion is from a snapshot restoration (#9156)
The Catalog, Config Entry, KV and Session resources potentially re-validate the input as its coming in. We need to prevent snapshot restoration failures due to missing namespaces or namespaces that are being deleted in enterprise.
2020-11-11 16:22:11 +00:00
Matt Keeler
8539565046 Merge pull request #9103 from hashicorp/feature/autopilot-mod
Switch to using the external autopilot module
2020-11-09 16:30:48 +00:00
Daniel Nephin
82a753285b Merge pull request #9073 from hashicorp/dnephin/backport-streaming-namespaces
streaming: backport namespace changes
2020-11-05 19:19:49 +00:00
Daniel Nephin
f0beecad24 Merge pull request #9061 from hashicorp/dnephin/event-fields
stream: support filtering by namespace
2020-11-05 19:19:02 +00:00
Daniel Nephin
b577dc19bf Merge pull request #9068 from hashicorp/restore-test-signature
restore prior signature of test helper so enterprise compiles
2020-11-02 23:16:21 +00:00
R.B. Boyer
211051d92b state: ensure we unblock intentions queries upon the upgrade to config entries (#9062)
1. do a state store query to list intentions as the agent would do over in `agent/proxycfg` backing `agent/xds`
2. upgrade the database and do a fresh `service-intentions` config entry write
3. the blocking query inside of the agent cache in (1) doesn't notice (2)
2020-10-29 20:29:07 +00:00
Daniel Nephin
3fca80a52e Merge pull request #9025 from hashicorp/dnephin/streaming-options
streaming: Use a no-op event publisher if streaming is disabled
2020-10-29 19:31:08 +00:00
Daniel Nephin
7b9ee25956
Merge pull request #9026 from hashicorp/dnephin/streaming-without-cache-query-param
streaming: rename config and remove requirement for cache=1
2020-10-28 12:33:25 -04:00
Daniel Nephin
477d665309
Merge pull request #8618 from hashicorp/dnephin/remove-txn-readtxn
state: Use ReadTxn everywhere
2020-10-28 12:32:47 -04:00
Daniel Nephin
c398a6b272 state: disable streaming connect topic 2020-10-26 11:49:47 -04:00
R.B. Boyer
58387fef0a
server: config entry replication now correctly uses namespaces in comparisons (#9024)
Previously config entries sharing a kind & name but in different
namespaces could occasionally cause "stuck states" in replication
because the namespace fields were ignored during the differential
comparison phase.

Example:

Two config entries written to the primary:

    kind=A,name=web,namespace=bar
    kind=A,name=web,namespace=foo

Under the covers these both get saved to memdb, so they are sorted by
all 3 components (kind,name,namespace) during natural iteration. This
means that before the replication code does it's own incomplete sort,
the underlying data IS sorted by namespace ascending (bar comes before
foo).

After one pass of replication the primary and secondary datacenters have
the same set of config entries present. If
"kind=A,name=web,namespace=bar" were to be deleted, then things get
weird. Before replication the two sides look like:

primary: [
    kind=A,name=web,namespace=foo
]
secondary: [
    kind=A,name=web,namespace=bar
    kind=A,name=web,namespace=foo
]

The differential comparison phase walks these two lists in sorted order
and first compares "kind=A,name=web,namespace=foo" vs
"kind=A,name=web,namespace=bar" and falsely determines they are the SAME
and are thus cause an update of "kind=A,name=web,namespace=foo". Then it
compares "<nothing>" with "kind=A,name=web,namespace=foo" and falsely
determines that the latter should be DELETED.

During reconciliation the deletes are processed before updates, and so
for a brief moment in the secondary "kind=A,name=web,namespace=foo" is
erroneously deleted and then immediately restored.

Unfortunately after this replication phase the final state is identical
to the initial state, so when it loops around again (rate limited) it
repeats the same set of operations indefinitely.
2020-10-23 13:41:54 -05:00
Daniel Nephin
0f1fb24d19 state: convert the remaining functions to ReadTxn
Required also converting some of the transaction functions to WriteTxn
because TxnRO() called the same helper as TxnRW.

This change allows us to return a memdb.Txn for read-only txn instead of
wrapping them with state.txn.
2020-10-23 14:29:22 -04:00
Daniel Nephin
8bd1a2cd16
Merge pull request #8975 from hashicorp/dnephin/stream-close-on-unsub
stream: close the subscription on Unsubscribe
2020-10-23 12:58:12 -04:00
Freddy
9c04cbc40f
Add HasExact to topology endpoint (#9010) 2020-10-23 10:45:41 -06:00
Daniel Nephin
fb57d9b26a stream: close the subscription on Unsubscribe 2020-10-22 13:39:27 -04:00
Pierre Souchay
9b7ed75552
Consul Service meta wrongly computes and exposes non_voter meta (#8731)
* Consul Service meta wrongly computes and exposes non_voter meta

In Serf Tags, entreprise members being non-voters use the tag
`nonvoter=1`, not `non_voter = false`, so non-voters in members
were wrongly displayed as voter.

Demonstration:

```
consul members -detailed|grep voter
consul20-hk5 10.200.100.110:8301   alive   acls=1,build=1.8.4+ent,dc=hk5,expect=3,ft_fs=1,ft_ns=1,id=xxxxxxxx-5629-08f2-3a79-10a1ab3849d5,nonvoter=1,port=8300,raft_vsn=3,role=consul,segment=<all>,use_tls=1,vsn=2,vsn_max=3,vsn_min=2,wan_join_port=8302
```

* Added changelog

* Added changelog entry
2020-10-09 17:18:24 -04:00
s-christoff
9bb348c6c7
Enhance the output of consul snapshot inspect (#8787) 2020-10-09 14:57:29 -05:00