Commit Graph

2947 Commits

Author SHA1 Message Date
Daniel Nephin 6b36ab744c streaming: double the cache TTL
10 minutes is the default blocking query timeout. Using the same value results in us hitting
the expired cache entry bug frequently. By extending this TTL we at least mitigate the problem.

The underlying bug still needs to be fixed.
2021-02-09 14:36:26 -05:00
Daniel Nephin 9423c887c4 submatview: do not reset retry waiter when materializer is reset
The materializer is often reset when an error is received. By resetting
the retryWaiter we effectively never wait. The retryWaiter should only
be reset when we get an event without error. This is done in
Materializer.updateView().
2021-02-09 13:56:50 -05:00
Daniel Nephin 0fa51e5ba9 api: Use blocking query for health when near is set
Streaming can not be used for these queries because the near query
paramter indicates a specific sort of the results, and that sort
requires data that is not available to the client from the streaming
API.
2021-02-09 13:55:33 -05:00
Pierre Souchay 6f91085869 Use lower case for serviceName computation of cache keys 2021-02-09 19:19:40 +01:00
Matt Keeler 8dbe342ec9
Stop background refresh of cached data for requests that result in ACL not found errors (#9738) 2021-02-09 10:15:53 -05:00
Freddy 82c269a7c5
Avoid potential proxycfg/xDS deadlock using non-blocking send 2021-02-08 16:14:06 -07:00
R.B. Boyer 39e4ae25ac
connect: connect CA Roots in the primary datacenter should use a SigningKeyID derived from their local intermediate (#9428)
This fixes an issue where leaf certificates issued in primary
datacenters using Vault as a Connect CA would be reissued very
frequently (every ~20 seconds) because the logic meant to detect root
rotation was errantly triggering.

The hash of the rootCA was being compared against a hash of the
intermediateCA and always failing. This doesn't apply to the Consul
built-in CA provider because there is no intermediate in use in the
primary DC.

This is reminiscent of #6513
2021-02-08 13:18:51 -06:00
Pierre Souchay 2fe3ab7db0 [Streaming] Properly filters node-meta queries on health
This wil fix https://github.com/hashicorp/consul/issues/9730
2021-02-08 17:53:18 +01:00
freddygv ec5f75776b Update comments on avoiding proxycfg deadlock 2021-02-08 09:45:45 -07:00
R.B. Boyer 43193a35c6
xds: prevent LDS flaps in mesh gateways due to unstable datacenter lists (#9651)
Also fix a similar issue in Terminating Gateways that was masked by an overzealous test.
2021-02-08 10:19:57 -06:00
Mohammad Banikazemi bcadd341eb Correcting the changed function name in comment
Signed-off-by: Mohammad Banikazemi <mbanikazemi@gmail.com>
2021-02-06 20:23:40 -05:00
freddygv 6e443e5536 Retry send after timer fires, in case no updates occur 2021-02-05 18:00:59 -07:00
Daniel Nephin 30332ffb43 state: Use the tableIndex constant 2021-02-05 18:37:45 -05:00
Daniel Nephin 3ecbeda234 state: Document index table
And move the IndexEntry (which is stored in the table) next to the table
schema definition.
2021-02-05 18:37:45 -05:00
R.B. Boyer adff0c05a7
xds: deduplicate mesh gateway listeners in a stable way (#9650)
In a situation where the mesh gateway is configured to bind to multiple
network interfaces, we use a feature called 'tagged addresses'.
Sometimes an address is duplicated across multiple tags such as 'lan'
and 'lan_ipv4'.

There is code to deduplicate these things when creating envoy listeners,
but that code doesn't ensure that the same tag wins every time. If the
winning tag flaps between xDS discovery requests it will cause the
listener to be drained and replaced.
2021-02-05 16:28:07 -06:00
freddygv de0cb1af7f Make xDS labeling consistent with proxycfg 2021-02-05 15:15:52 -07:00
freddygv 95e7641faa Update proxycfg logging, labels were already attached 2021-02-05 15:14:49 -07:00
Daniel Nephin a4690ac7d9
Merge pull request #9719 from hashicorp/oss/state-store-4
state: remove registerSchema
2021-02-05 14:02:38 -05:00
Daniel Nephin 1c4e0cfa2a
Merge pull request #9718 from hashicorp/oss/dnephin/ent-meta-in-state-store-3
state: convert all table name constants to the new prefix pattern
2021-02-05 14:02:07 -05:00
Daniel Nephin 0814f22715
Merge pull request #9665 from hashicorp/dnephin/state-store-indexes-2
state: move config-entries table definition to config_entries_schema.go
2021-02-05 14:01:08 -05:00
Daniel Nephin 912dbb4cb4
Merge pull request #9664 from hashicorp/dnephin/state-store-indexes
state: move ACL schema and index definitions to acl_schema.go
2021-02-05 13:38:31 -05:00
Daniel Nephin 05d5ec4804 state: remove the need for registerSchema
registerSchema creates some indirection which is not necessary in this
case. newDBSchema can call each of the tables.

Enterprise tables can be added from the existing withEnterpriseSchema
shim.
2021-02-05 12:19:56 -05:00
Daniel Nephin 2cbf8b5fd0 state: rename table name constants to use pattern
the 'table' prefix is shorter, and also reads better in queries.
2021-02-05 12:12:19 -05:00
Daniel Nephin 8ac9d54ccc state: rename connect constants 2021-02-05 12:12:19 -05:00
Daniel Nephin 0c34e474c5 state: rename table name constants to new pattern
Using Apps Hungarian Notation for these constants makes the memdb queries more readable.
2021-02-05 12:12:18 -05:00
Pierre Souchay 7a024ed074 Streaming filter tags + case insensitive lookups for Service Names
Will fix:
 * https://github.com/hashicorp/consul/issues/9695
 * https://github.com/hashicorp/consul/issues/9702
2021-02-04 11:00:51 +01:00
Daniel Nephin 2d5b5afec1 state: Remove unnecessary entMeta arg to EnsureConfigEntry 2021-02-03 18:10:38 -05:00
freddygv 5ba14ad41d Add trace logs to proxycfg state runner and xds srv 2021-02-02 12:26:38 -07:00
freddygv 37190c0d0d Avoid potential deadlock using non-blocking send
Deadlock scenario:
    1. Due to scheduling, the state runner sends one snapshot into
    snapCh and then attempts to send a second. The first send succeeds
    because the channel is buffered, but the second blocks.
    2. Separately, Manager.Watch is called by the xDS server after
    getting a discovery request from Envoy. This function acquires the
    manager lock and then blocks on receiving the CurrentSnapshot from
    the state runner.
    3. Separately, there is a Manager goroutine that reads the snapshots
    from the channel in step 1. These reads are done to notify proxy
    watchers, but they require holding the manager lock. This goroutine
    goes to acquire that lock, but can't because it is held by step 2.

Now, the goroutine from step 3 is waiting on the one from step 2 to
release the lock. The goroutine from step 2 won't release the lock until
the goroutine in step 1 advances. But the goroutine in step 1 is waiting
for the one in step 3. Deadlock.

By making this send non-blocking step 1 above can proceed. The coalesce
timer will be reset and a new valid snapshot will be delivered after it
elapses or when one is requested by xDS.
2021-02-02 11:31:14 -07:00
hashicorp-ci 6fa9a6a1d9 auto-updated agent/uiserver/bindata_assetfs.go from commit e0ff7080a 2021-02-02 10:08:48 +00:00
hashicorp-ci 90917400c6 auto-updated agent/uiserver/bindata_assetfs.go from commit 0b7d676dc 2021-02-01 17:55:03 +00:00
hashicorp-ci a33ff40816 auto-updated agent/uiserver/bindata_assetfs.go from commit 3aef5cde2 2021-02-01 17:35:20 +00:00
hashicorp-ci 40c2a3d108 auto-updated agent/uiserver/bindata_assetfs.go from commit 3477b1de7 2021-01-29 16:03:41 +00:00
Daniel Nephin 5b4703f0e4 state: rename config-entries table const to match new pattern 2021-01-28 20:34:34 -05:00
Daniel Nephin cd06b5728c state: move config-entries table to new pattern 2021-01-28 20:34:15 -05:00
Daniel Nephin e8931b868c state: use indexID
this change was already made to enterprise, so backporting it.
2021-01-28 20:30:08 -05:00
Daniel Nephin 1cccdc45c2 state: Move ACL schema indexes to match Ent
and use constants for table and index names.
2021-01-28 20:05:09 -05:00
Daniel Nephin 727a402810
Merge pull request #9302 from hashicorp/dnephin/add-service-3
agent: remove ServiceManager.Start goroutine
2021-01-28 16:59:41 -05:00
Daniel Nephin 1dcafa51a4 config: make config.TestLoad_FullConfig use config.Load
This commit makes a number of changes that should make
TestLoad_FullConfig easier to work with, and make the test more like
real world scenarios.

* use separate files in testdata/ dir to store the config source.
  Separate files are much easier to edit because editors can syntax
  highlight json/hcl, and it makes strings easier to find. Previously
  trying to find strings would match strings used in other tests.
* use the exported config.Load interface instead of internal NewBuilder
  and BuildAndValidate.
* remove the tail config overrides, which are only necessary with
  nonZero works.
2021-01-27 17:51:53 -05:00
Daniel Nephin a0dd45941b config: Unexport Builder and NewBuilder
This type and constructor are implementation details of config loading.
All callers should use config.Load.
2021-01-27 17:41:53 -05:00
Daniel Nephin 32d36d0dd4 config: replace calls to config.NewBuilder with config.Load
This is another incremental change to reduce config loading to a single
small interface. All calls to NewBuilder can be replaced with Load.
2021-01-27 17:34:43 -05:00
Daniel Nephin 97a577502d config: improve the interface of Load
This commit reduces the interface to Load() a bit, in preparation for
unexporting NewBuilder and having everything call Load.

The three arguments are reduced to a single argument by moving the other
two into the options struct.

The three return values are reduced to two by moving the RuntimeConfig
and Warnings into a LoadResult struct.
2021-01-27 17:34:43 -05:00
Daniel Nephin 27df0d9a78
Merge pull request #9252 from hashicorp/dnephin/config-unmethod
config: remove Builder receiver from funcs that dont use it
2021-01-27 17:31:17 -05:00
Matt Keeler f561462064
Upgrade raft-autopilot and wait for autopilot it to stop when revoking leadership (#9644)
Fixes: 9626
2021-01-27 11:14:52 -05:00
hashicorp-ci 49b773233b auto-updated agent/uiserver/bindata_assetfs.go from commit 25f989753 2021-01-27 10:47:58 +00:00
Hans Hasselberg 444cdeb8fb
Add flags to support CA generation for Connect (#9585) 2021-01-27 08:52:15 +01:00
hashicorp-ci 174878eb63 auto-updated agent/uiserver/bindata_assetfs.go from commit 92f0eb3bd 2021-01-26 18:00:09 +00:00
hashicorp-ci 5c7f815469 auto-updated agent/uiserver/bindata_assetfs.go from commit 82a62cd2e 2021-01-26 17:47:18 +00:00
Daniel Nephin f36d77ee4a
Merge pull request #9301 from hashicorp/dnephin/add-service-2
agent: reduce AddService 2
2021-01-26 12:01:34 -05:00
R.B. Boyer c608dc0d60
server: initialize mgw-wanfed to use local gateways more on startup (#9528)
Fixes #9342
2021-01-25 17:30:38 -06:00
Daniel Nephin 8c45f2c1fe agent: use the new lib/mutex for stateLock
Previously the ServiceManager had to run a separate goroutine so that it could block on a channel
send/receive instead of a lock. Using this mutex with TryLock allows us to cancel the lock when
the serviceConfigWatch is stopped.

Without this change removing the ServiceManager.Start goroutine would not be possible because
when AddService is called it acquires the stateLock. While that lock is held, if there are
existing watches for the service, the old watch will be stopped, and the goroutine holding the
lock will attempt to wait for that watcher goroutine to exit.

If the goroutine is handling an update (serviceConfigWatch.handleUpdate) then it can block on
acquiring the stateLock and deadlock the agent.  With this change the context is cancelled as
and the goroutine will exit instead of waiting on the stateLock.
2021-01-25 18:01:47 -05:00
Daniel Nephin 4e42b8f1d4 agent: remove ServiceManager goroutine
The ServiceManager.Start goroutine was used to serialize calls to
agent.addServiceInternal.

All the goroutines which sent events to the channel would block waiting
for a response from that same goroutine, which is effectively the same
as a synchronous call without any channels.

This commit removes the goroutine and channels, and instead calls
addServiceInternal directly. Since all of these goroutines will need to
take the agent.stateLock, the mutex handles the serializing of calls.
2021-01-25 18:01:47 -05:00
Daniel Nephin 45b315060a agent: Minor cosmetic changes in ServiceManager
Also use the non-deprecated func in a test
2021-01-25 18:01:47 -05:00
Daniel Nephin 8b67231e8c agent: update godoc for AddServiceRequest 2021-01-25 18:01:03 -05:00
Daniel Nephin f34703fc63 agent: move checkStateSnapshot
Move the field into the struct for addServiceLocked. Also don't require
setting a default value, so that the callers can leave it as nil if they
don't already have a snapshot.
2021-01-25 18:01:03 -05:00
Daniel Nephin 1495291054 agent: move two fields off of AddServiceRequest 2021-01-25 18:01:03 -05:00
Daniel Nephin 60c6a1c220 agent: Replace two fields on AddServiceRequest with a func field
The two previous fields were mutually exclusive. They can be represented
with a single function which provides the value.
2021-01-25 18:01:03 -05:00
Daniel Nephin d0386ae025 agent: remove serviceRegiration type
Replace with the existing AddServiceRequest struct. These structs are
almost identical. Additionally, the only reason the serviceRegistration
struct existed was to recreate an AddServiceRequest.

By storing and re-using the AddServiceRequest we remove the need to
translate into one type and back to the original type.

We also remove the extra parameters to a function, because those values
are already available from the AddServiceRequest field.

Also a minor optimization to only call tokens.AgentToken() when
necessary. Previous it was being called every time, but the value was
being ignored if the AddServiceRequest had a token.
2021-01-25 18:01:03 -05:00
Daniel Nephin 5f9930a9be agent: remove an a branch in the AddService flow
Handle the decision to use ServiceManager in a single place. Instead of
calling ServiceManager.AddService, then calling back into
addServiceInternal, only call ServiceManager.AddService if we are going
to use it.

This change removes some small duplication and removes a branch from the
AddService flow.
2021-01-25 18:01:03 -05:00
Daniel Nephin 4da0718c57 agent: use fields directly, not temp variables
The temprorary variables make it much harder to trace where and how struct
fields are used. If a field is only used a small number of times than
refer to the field directly.
2021-01-25 17:25:04 -05:00
Daniel Nephin 5b6f806f4f agent: addServiceIternalRequest
Move fields that are only relevant for addServiceInternal onto a new
struct and embed AddServiceRequest.
2021-01-25 17:25:04 -05:00
Daniel Nephin 3d39359bcb agent: move deprecated AddServiceFromSource to a test file
The method is only used in tests, and only exists for legacy calls.

There was one other package which used this method in tests. Export
the AddServiceRequest and a couple of its fields so the new function can
be used in those tests.
2021-01-25 17:25:03 -05:00
Daniel Nephin e44fd1cb92 agent: use a single method for Agent.AddService 2021-01-25 17:25:03 -05:00
Daniel Nephin 6757231b82 agent: rename AddService->AddServiceFromSource
In preparation for extracting a single AddService func that accepts a request struct.
2021-01-25 17:25:01 -05:00
Daniel Nephin 48112e6298
Merge pull request #9420 from hashicorp/dnephin/reduce-duplicate-in-catalog-schema
state: reduce interface for Enterprise schema
2021-01-25 17:04:25 -05:00
Chris Boulton 8a35df81c7
connect: add local_request_timeout_ms to configure local_app http timeouts (#9554) 2021-01-25 13:50:00 -06:00
R.B. Boyer 51e3ca6cbb
server: use the presense of stored federation state data as a sign that we already activated the federation state feature flag (#9519)
This way we only have to wait for the serf barrier to pass once before
we can make use of federation state APIs Without this patch every
restart needs to re-compute the change.
2021-01-25 13:24:32 -06:00
hashicorp-ci 501fc855d3 auto-updated agent/uiserver/bindata_assetfs.go from commit 09d917618 2021-01-25 19:09:57 +00:00
hashicorp-ci 4584771bbf auto-updated agent/uiserver/bindata_assetfs.go from commit 88d5e00f9 2021-01-25 18:52:54 +00:00
hashicorp-ci 8d440c532a auto-updated agent/uiserver/bindata_assetfs.go from commit bb9573832 2021-01-25 18:19:10 +00:00
R.B. Boyer 03790a1f91
server: add OSS stubs supporting validation of source namespaces in service-intentions config entries (#9527) 2021-01-25 11:27:38 -06:00
R.B. Boyer 9ef3f20127
server: when wan federating via mesh gateways only do heuristic primary DC bypass on the leader (#9366)
Fixes #9341
2021-01-22 10:03:24 -06:00
hashicorp-ci 27a5434cb0 auto-updated agent/uiserver/bindata_assetfs.go from commit 02772e46a 2021-01-20 18:46:55 +00:00
John Cowen 02772e46a2
Fix -ui-content-path without regex (#9569)
* Add templating to inject JSON into an application/json script tag

Plus an external script in order to pick it out and inject the values we
need injecting into ember's environment meta tag.

The UI still uses env style naming (CONSUL_*) but we uses the new style
JSON/golang props behind the scenes.

Co-authored-by: Paul Banks <banks@banksco.de>
2021-01-20 18:40:46 +00:00
hashicorp-ci 972fa320f8 auto-updated agent/uiserver/bindata_assetfs.go from commit ffb6680e7 2021-01-20 17:08:19 +00:00
John Cowen aedd338e70
api: Ensure the internal/ui/gateway-service-nodes endpoint responds with an array (#9593)
In some circumstances this endpoint will have no results in it (dues to
ACLs, Namespaces, filtering or missing configuration).

This ensures that the response is at least an empty array (`[]`) rather
than `null`
2021-01-20 16:59:02 +00:00
Matt Keeler 4fa884ee3d
Fix flaky test by marking mock expectations as optional (#9596)
These expectations are optional because in a slow CI environment the deadline to cancell the context might occur before the go routine reaches issuing the RPC. Either way we are successfully ensuring context cancellation is working.
2021-01-20 10:58:27 -05:00
hashicorp-ci daa384b64d auto-updated agent/uiserver/bindata_assetfs.go from commit 30014ff8f 2021-01-20 15:43:19 +00:00
Freddy e50019b092
Update topology mapping Refs on all proxy instance deletions (#9589)
* Insert new upstream/downstream mapping to persist new Refs

* Avoid upserting mapping copy if it's a no-op

* Add test with panic repro

* Avoid deleting up/downstreams from inside memdb iterator

* Avoid deleting gateway mappings from inside memdb iterator

* Add CHANGELOG entry

* Tweak changelog entry

Co-authored-by: Paul Banks <banks@banksco.de>
2021-01-20 15:17:26 +00:00
Daniel Nephin 810424a61e state: do not delete from inside an iteration
Deleting from memdb inside an interation can cause a panic from Iterator.Next. This
case is technically safe (for now) because the iterator is using the root radix tree
not a modified one.

However this could break at any time if someone adds an insert or delete to the coordinates table
before this place in the function.

It also sets a bad example, because generally deletes in an interator are not safe. So this
commit uses the pattern we have in other places to move the deletes out of the iteration.
2021-01-19 17:00:07 -05:00
Matt Keeler a4327305d1
Merge pull request #9570 from hashicorp/bugfix/9498 2021-01-19 16:30:04 -05:00
Matt Keeler d9d4c492ab
Ensure that CA initialization does not block leader election.
After fixing that bug I uncovered a couple more:

Fix an issue where we might try to cross sign a cert when we never had a valid root.
Fix a potential issue where reconfiguring the CA could cause either the Vault or AWS PCA CA providers to delete resources that are still required by the new incarnation of the CA.
2021-01-19 15:27:48 -05:00
hashicorp-ci 57256fa158 auto-updated agent/uiserver/bindata_assetfs.go from commit be694366a 2021-01-19 15:47:02 +00:00
hashicorp-ci 06a25d901f auto-updated agent/uiserver/bindata_assetfs.go from commit 41a4a9f4f 2021-01-19 15:29:55 +00:00
Daniel Nephin a6000e6ad8 state: add a regression test for state store schema
To allow the index to be refactored without accidental changes.

To update the expected value run: 'go test ./agent/consul/state -update'
2021-01-15 18:49:55 -05:00
Daniel Nephin 24312f8c96 state: reduce interface for Enterprise schema
Using withEnterpriseSchema() we can apply any enterprise schema changes
with a single shim, removing the need to duplicate all of the table
definitions.

Also move all the catalog schemas to a new file to shrink catalog.go a bit.
2021-01-15 18:49:55 -05:00
Daniel Nephin cf4fa18af4
Merge pull request #8696 from hashicorp/dnephin/fix-load-limits
agent/consul: make Client/Server config reloading more obvious
2021-01-14 17:40:42 -05:00
Daniel Nephin 964ab23280
Merge pull request #9436 from hashicorp/dnephin/fix-service-health-req-cache-key
structs: fix caching of ServiceSpecificRequest when ingress=true
2021-01-14 17:26:25 -05:00
Daniel Nephin e66af1a559 agent/consuk: Rename RPCRate -> RPCRateLimit
so that the field name is consistent across config structs.
2021-01-14 17:26:00 -05:00
Daniel Nephin 5684223e36 agent/consul: make Client/Server config reloading more obvious
I believe this commit also fixes a bug. Previously RPCMaxConnsPerClient was not being re-read from the RuntimeConfig, so passing it to Server.ReloadConfig was never changing the value.

Also improve the test runtime by not doing a lot of unnecessary work.
2021-01-14 17:21:10 -05:00
Daniel Nephin 8fdc789ded
Merge pull request #9460 from hashicorp/dnephin/fix-data-races
Fix a couple data races in tests
2021-01-14 17:07:01 -05:00
Daniel Nephin 29ce5ec575 structs: fix caching of ServiceSpecificRequest when ingress=true
The field was not being included in the cache info key. This would result in a DNS request for
web.service.consul returning the same result as web.ingress.consul, when those results should
not be the same.
2021-01-14 17:01:40 -05:00
hashicorp-ci 408fee901a auto-updated agent/uiserver/bindata_assetfs.go from commit 1e30503ec 2021-01-13 09:47:00 +00:00
kevinkengne 2e7e78999d
add completeness test for types with CacheInfo method (#9480)
include all fields when fuzzing in tests
split tests by struct type

Ensure the new value for the field is different

fuzzer.Fuzz could produce the same value again in some cases.

Use a custom fuzz function for QueryOptions. That type is an embedded struct in the request types
but only one of the fields is important to include in the cache key.

Move enterpriseMetaField to an oss file so that we can change it in enterprise.
2021-01-12 19:45:46 -05:00
Chris Piraino 0712e03f33
Fix bug in usage metrics when multiple service instances are changed in a single transaction (#9440)
* Fix bug in usage metrics that caused a negative count to occur

There were a couple of instances were usage metrics would do the wrong
thing and result in incorrect counts, causing the count to attempt to
decrement below zero and return an error. The usage metrics did not
account for various places where a single transaction could
delete/update/add multiple service instances at once.

We also remove the error when attempting to decrement below zero, and
instead just make sure we do not accidentally underflow the unsigned
integer. This is a more graceful failure than returning an error and not
allowing a transaction to commit.

* Add changelog
2021-01-12 15:31:47 -06:00
hashicorp-ci 7ff4763c8b auto-updated agent/uiserver/bindata_assetfs.go from commit b86eea4fb 2021-01-12 14:57:52 +00:00
Daniel Nephin e35d5c93c7 config: remove Builder receiver from funcs that dont use it
This change allows us to re-use these functions in other places without the Builder, and makes it
more explicit about which functions can warn/error and which can not.
2021-01-11 17:41:58 -05:00
Daniel Nephin c500182105 config: Use golden for TestRuntimeConfig_Sanitize
A golden file makes the expected value easier to work with. This change also
removes a number of shims for enterprise and replaces them with a single one
for the golden filename.
2021-01-11 14:34:03 -05:00
Pierre Souchay 7c1b69cbeb
Display a warning when rpc.enable_streaming = true is set on a client (#9530)
* Display a warning when rpc.enable_streaming = true is set on a client

This option has no effect when running as an agent

* Added warning when server starts with use_streaming_backend but without rpc.enable_streaming

* Added unit test
2021-01-08 15:23:23 -05:00
Chris Piraino aabdccdfa0
Log replication warnings when no error suppression is defined (#9320)
* Log replication warnings when no error suppression is defined

* Add changelog file
2021-01-08 14:03:06 -06:00