Commit Graph

2629 Commits

Author SHA1 Message Date
Daniel Nephin e54567223b lib/retry: Refactor to reduce the interface surface
Reduce Jitter to one function

Rename NewRetryWaiter

Fix a bug in calculateWait where maxWait was applied before jitter, which would make it
possible to wait longer than maxWait.
2020-10-04 18:12:42 -04:00
Daniel Nephin ca26dfb4a2 lib/retry: extract a new package from lib 2020-10-04 17:43:01 -04:00
Kit Patella f5c51ae13b remove consul.api.http from filtered metric prefixes 2020-10-02 14:16:02 -07:00
Kit Patella 52451cf846
Merge pull request #8271 from coignetp/http-metrics-label
Use method and path as labels for http metrics
2020-10-02 13:41:48 -07:00
hashicorp-ci 90b5f1a838 auto-updated agent/uiserver/bindata_assetfs.go from commit 8b409529a 2020-10-02 19:28:38 +00:00
Daniel Nephin 1c6be5ac75 stream: full test coverage for EventPublisher.Subscribe 2020-10-02 13:46:24 -04:00
Daniel Nephin a5df5d17b4 stream: refactor to support change in framing events
Removing EndOfEmptySnapshot, add NewSnapshotToFollow
2020-10-02 13:41:31 -04:00
Daniel Nephin 04b51de783
Merge pull request #8769 from hashicorp/streaming/prep-for-subscribe-service
state: use protobuf Topic and and export payload type
2020-10-02 13:30:06 -04:00
Paul Banks d0c160130b
Merge pull request #8694 from hashicorp/ui-config-metrics
Add config changes for UI metrics
2020-10-01 17:38:03 +01:00
Paul Banks 3b8125a24d
Update all the references in CI and makefile to the bindata file location 2020-10-01 16:19:10 +01:00
R.B. Boyer 9a55c50694
ensure these tests work fine with namespaces in enterprise (#8794) 2020-10-01 09:54:46 -05:00
R.B. Boyer 9801ef8eb1
agent: enable enable_central_service_config by default (#8746) 2020-10-01 09:19:14 -05:00
Paul Banks 3ff5901be8
Fix ui dir where there is no index tests and lint issue. 2020-10-01 12:26:19 +01:00
Paul Banks e4db845246
Refactor uiserver to separate package, cleaner Reloading 2020-10-01 11:32:25 +01:00
R.B. Boyer 237a7a0da0
server: ensure that we also shutdown network segment serf instances on server shutdown (#8786)
This really only matters for unit tests, since typically if an agent shuts down its server, it follows that up by exiting the process, which would also clean up all of the networking anyway.
2020-09-30 16:23:43 -05:00
Kyle Havlovitz 2ec94b027e connect: Enable renewing the intermediate cert in the primary DC 2020-09-30 12:31:21 -07:00
Paul Banks f6d55e1d25
Fix reload test; address other PR feedback 2020-09-30 18:00:07 +01:00
Paul Banks 54a33efa4b
Fix JSON encoding of metrics options which broke the index but didn't break tests.
Also add tests that do catch that error.
2020-09-30 17:59:19 +01:00
Paul Banks 526bab6164
Add config changes for UI metrics 2020-09-30 17:59:16 +01:00
hashicorp-ci b6406df147 auto-updated agent/bindata_assetfs.go from commit 1a6f3d524 2020-09-30 15:28:06 +00:00
hashicorp-ci 1a6f3d5246 auto-updated agent/bindata_assetfs.go from commit 8e174cae6 2020-09-30 15:23:27 +00:00
hashicorp-ci 8e174cae66 auto-updated agent/bindata_assetfs.go from commit 823d6dadb 2020-09-30 15:17:41 +00:00
Aliaksandr Mianzhynski 74cfba7065 Fix GRPCUseTLS flag HTTP API mapping 2020-09-29 18:29:56 +03:00
freddygv 9fa1b13df9 Resolve conflicts 2020-09-29 08:59:18 -06:00
Daniel Nephin b7ca15e910 stream: move goroutine out of New
This change will make it easier to manage goroutine lifecycle from the caller.

Also expose EventPublisher from state.Store
2020-09-28 18:40:10 -04:00
Daniel Nephin 0fb2a5b992 state: use pbsubscribe.Topic for topic values 2020-09-28 18:40:10 -04:00
Daniel Nephin 7b1534ef05 state: rename and export EventPayload
The subscribe endpoint needs to be able to inspect the payload to filter
events, and convert them into the protobuf types.

Use the protobuf CatalogOp type for the operation field, for now. In the
future if we end up with multiple interfaces we should be able to remove
the protobuf dependency by changing this to an int32 and adding a test
for the mapping between the values.

Make the value of the payload a concrete type instead of interface{}. We
can create other payloads for other event types.
2020-09-28 18:34:30 -04:00
Daniel Nephin 6200325e3b
Merge pull request #8726 from amenzhinsky/grpc-hc-error
Return grpc serving status in health check errors
2020-09-25 13:24:32 -04:00
Hans Hasselberg 98d7ea82bd
fix ent error (#8750) 2020-09-25 10:31:42 -05:00
R.B. Boyer 7eef25daf5
agent: when enable_central_service_config is enabled ensure agent reload doesn't revert check state to critical (#8747)
Likely introduced when #7345 landed.
2020-09-24 16:24:04 -05:00
R.B. Boyer 0064f1936e
server: make sure that the various replication loggers use consistent logging (#8745) 2020-09-24 15:49:38 -05:00
R.B. Boyer 0fb088aac3
agent: make the json/hcl decoding of ConnectProxyConfig fully work with CamelCase and snake_case (#8741)
Fixes #7418
2020-09-24 13:58:52 -05:00
Daniel Nephin f14145e6d9 agent/grpc: always close the conn when dialing fails. 2020-09-24 12:53:14 -04:00
Daniel Nephin e6ffd987a3 agent/grpc: seed the rand for shuffling servers 2020-09-24 12:53:14 -04:00
Daniel Nephin 2294793357 agent/grpc: use router.Manager to handle the rebalance
The router.Manager is already rebalancing servers for other connection pools, so it can call into our resolver to do the same.
This change allows us to remove the serf dependency from resolverBuilder, and remove Datacenter from the config.

Also revert the change to refreshServerRebalanceTimer
2020-09-24 12:53:14 -04:00
Daniel Nephin 2273673500 grpc: restore integration tests for grpc client conn pool
Add a fake rpc Listener
2020-09-24 12:53:14 -04:00
Daniel Nephin 07b4507f1e router: remove grpcServerTracker from managers
It only needs to be refereced from the Router, because there is only 1 instance, and the
Router can call AddServer/RemoveServer like it does on the Manager.
2020-09-24 12:53:14 -04:00
Daniel Nephin bad4d3ff7c grpc: redeuce dependencies, unexport, and add godoc
Rename GRPCClient to ClientConnPool. This type appears to be more of a
conn pool than a client. The clients receive the connections from this
pool.

Reduce some dependencies by adjusting the interface baoundaries.

Remove the need to create a second slice of Servers, just to pick one and throw the rest away.

Unexport serverResolver, it is not used outside the package.

Use a RWMutex for ServerResolverBuilder, some locking is read-only.

Add more godoc.
2020-09-24 12:53:10 -04:00
Daniel Nephin 25f47b46e1 grpc: move client conn pool to grpc package 2020-09-24 12:48:12 -04:00
Daniel Nephin f936ca5aea grpc: client conn pool and resolver
Extracted from 936522a13c

Co-authored-by: Paul Banks <banks@banksco.de>
2020-09-24 12:46:22 -04:00
Daniel Nephin c18516ad7d
Merge pull request #8680 from hashicorp/dnephin/replace-consul-opts-with-base-deps
agent: Repalce ConsulOptions with a new struct from agent.BaseDeps
2020-09-24 12:45:54 -04:00
Paul Banks 7d58901ae8
Fix bad int -> string conversions caught by go vet changes in 1.15 (#8739) 2020-09-24 11:14:07 +01:00
Alexander Mykolaichuk af753ee6a5
added permission denied error message (#8044) 2020-09-22 20:36:07 +02:00
Hans Hasselberg a89ee1a7ca
use service datacenter for dns name (#8704)
* Use args.Datacenter instead of configured datacenter
2020-09-22 20:34:09 +02:00
Aliaksandr Mianzhynski 2c6fd6b796 Return grpc serving status in health check errors 2020-09-22 21:16:58 +03:00
Daniel Nephin 282fbdfa75 api: rename HTTPServer to HTTPHandlers
Resolves a TODO about naming. This type is a set of handlers for an http.Server, it is not
itself a Server. It provides http.Handler functions.
2020-09-18 17:38:23 -04:00
Hans Hasselberg d4877f03e7
fix TestLeader_SecondaryCA_IntermediateRenew (#8702)
* fix lessThanHalfTime
* get lock for CAProvider()
* make a var to relate both vars
* rename to getCAProviderWithLock
* move CertificateTimeDriftBuffer to agent/connect/ca
2020-09-18 10:13:29 +02:00
Daniel Nephin ed6a0ebe4d
Merge pull request #8620 from hashicorp/dnephin/better-impl-of-TestAgent.HTTPAddr
http: fix tests incorrectly using HTTPAddr to get the address of the https server
2020-09-17 11:48:57 -04:00
Mike Morris 6b62751921
test: update tags for database service registrations and queries (#8693) 2020-09-16 14:05:01 -04:00
Kyle Havlovitz 1d22a0bc51
Merge pull request #8560 from hashicorp/vault-ca-renew-token
Automatically renew the token used by the Vault CA provider
2020-09-16 07:30:30 -07:00
Daniel Nephin 3995cc3408
Merge pull request #8685 from pierresouchay/do_not_flood_logs_with_Non-server_in_server-only_area
[BUGFIX] Avoid GetDatacenter* methods to flood Consul servers logs
2020-09-15 17:57:05 -04:00
Kyle Havlovitz b1b21139ca Merge branch 'master' into vault-ca-renew-token 2020-09-15 14:39:04 -07:00
Daniel Nephin cdd392d77f agent/consul: pass dependencies directly from agent
In an upcoming change we will need to pass a grpc.ClientConnPool from
BaseDeps into Server. While looking at that change I noticed all of the
existing consulOption fields are already on BaseDeps.

Instead of duplicating the fields, we can create a struct used by
agent/consul, and use that struct in BaseDeps. This allows us to pass
along dependencies without translating them into different
representations.

I also looked at moving all of BaseDeps in agent/consul, however that
created some circular imports. Resolving those cycles wouldn't be too
bad (it was only an error in agent/consul being imported from
cache-types), however this change seems a little better by starting to
introduce some structure to BaseDeps.

This change is also a small step in reducing the scope of Agent.

Also remove some constants that were only used by tests, and move the
relevant comment to where the live configuration is set.

Removed some validation from NewServer and NewClient, as these are not
really runtime errors. They would be code errors, which will cause a
panic anyway, so no reason to handle them specially here.
2020-09-15 17:29:32 -04:00
Daniel Nephin 3aa9bd4c23 agent/consul: make router required 2020-09-15 17:26:26 -04:00
Daniel Nephin d5edce269e
Merge pull request #8679 from hashicorp/streaming/fix-TestHandler_EmitsStats
streaming: Fix TestHandler_EmitsStats
2020-09-15 17:04:55 -04:00
Kyle Havlovitz 1cd7c43544 Update vault CA for latest api client 2020-09-15 13:33:55 -07:00
Paul Banks 2ae5230851
Update UI Config passing to not use an inline script (#8645)
* Update UI Config passing to not use an inline script

* Update agent/http.go

* Fix incorrect placeholder name
2020-09-15 20:57:37 +01:00
Kyle Havlovitz 7ffef62ed7 Clean up CA shutdown logic and error 2020-09-15 12:28:58 -07:00
Kyle Havlovitz 35bb09f85c
Merge pull request #8646 from hashicorp/common-intermediate-ttl
Move IntermediateCertTTL to common CA config
2020-09-15 12:03:29 -07:00
Pierre Souchay 638dcd3360 [BUGFIX] Avoid GetDatacenter* methods to flood Consul servers logs
When calling `GetDatacentersByDistance()` or `GetDatacentersMap()`, an
incorrect condition was used to diplay log message, thus flooding
Consul's logs.

Example of message:

```
  [WARN] agent.router: Non-server in server-only area: non_server=myClientNode area=lan
```

This message is only valid for WAN areas, filter to avoid creating
hundreds of logs/s on our clusters, each time someone is calling this
method.

Our logs were flooded by such messages when migrating our Consul servers
from 1.7.7 to 1.8.4.

This will issue fix #8663
2020-09-15 11:54:59 +02:00
Daniel Nephin 636f76f6f1 agent/grpc: make TestHandler_EmitsStats predictable
Occasionally this test would flake. The flakes were fixed by:

1. Stopping the service and retrying to check on metrics. This way we
   also include the active_streams going to 0 in the metric calls.

2. Using a reference to the global Metrics. This way when other tests
   have background goroutines that are still shutting down, they won't
   emit metrics to the metric instance with the fake Sink. The stats
   test can patch the local reference to the global, so the existing
   statHandlers will continue to emit to the global, but the stats
   test will send all metrics to the replacement.
2020-09-14 19:05:22 -04:00
Daniel Nephin ee65ee541e grpc: add Datacenter field to testing service response 2020-09-14 19:02:09 -04:00
freddygv 856d5a25ee Fix text type assertion 2020-09-14 16:28:40 -06:00
freddygv 7fd518ff1d Merge master 2020-09-14 16:17:43 -06:00
freddygv 87541ab80a Fix type assertion 2020-09-14 16:12:21 -06:00
Daniel Nephin 20aea3dbc9
Merge pull request #8587 from hashicorp/streaming/add-grpc-server
streaming: add gRPC server for handling connections
2020-09-14 15:24:54 -04:00
freddygv 7b9d1b41d5 Resolve conflicts against master 2020-09-11 18:41:58 -06:00
freddygv 768dbaa68d Add session flag to cookie config 2020-09-11 18:34:03 -06:00
freddygv 9d2a9169fd PR comments 2020-09-11 10:49:26 -06:00
Kyle Havlovitz 49056fe70f Clean up Vault renew tests and shutdown 2020-09-11 08:41:05 -07:00
freddygv eab90ea9fa Revert EnvoyConfig nesting 2020-09-11 09:21:43 -06:00
Kyle Havlovitz f40fb577fe Use mapstructure for decoding vault data 2020-09-10 06:31:04 -07:00
Kyle Havlovitz aa97366020 Add a stop function to make sure the renewer is shut down on leader change 2020-09-10 06:12:48 -07:00
Kyle Havlovitz 2f7210bde2 Move IntermediateCertTTL to common CA config 2020-09-10 00:23:22 -07:00
Kyle Havlovitz 411b6537ef Add a test for token renewal 2020-09-09 16:36:37 -07:00
Daniel Nephin fd42804063 grpc: Add a simple test service for testing the gRPC server 2020-09-08 12:10:43 -04:00
Daniel Nephin 2257247095 server: add gRPC server for streaming events
Includes a stats handler and stream interceptor for grpc metrics.

Co-authored-by: Paul Banks <banks@banksco.de>
2020-09-08 12:10:41 -04:00
Daniel Nephin 0bb9c318b7 http: fix tests incorrectly using HTTPAddr to get the address of the
https server.

In #8234 I changed a few tests to use TestAgent.HTTPAddr() to find the
addr used in the test. Due to the way HTTPAddr() was implemented these
tests were passing, but I think the pass was incidental. HTTPAddr() was
not matching any servers, and was instead returning the last server,
which happened to be the one these tests wanted.

This commit fixes the implementation of HTTPAddr to panic if no match
was found. The tests which require an HTTPS server are changed to use
a new firstAddr() to look up the correct address.
2020-09-04 15:29:17 -04:00
freddygv 403a180430 Set tgw filter router config name to cluster name 2020-09-04 12:45:05 -06:00
Hans Hasselberg 436a7032d1
secondaryIntermediateCertRenewalWatch abort on success (#8588)
secondaryIntermediateCertRenewalWatch was using `retryLoopBackoff` to
renew the intermediate certificate. Once it entered the inner loop and
started `retryLoopBackoff` it would never leave that.
`retryLoopBackoffAbortOnSuccess` will return when renewing is
successful, like it was intended originally.
2020-09-04 11:47:16 +02:00
freddygv 959d9913b8 Add server receiver to routes and log tgw err 2020-09-03 16:19:58 -06:00
Daniel Nephin ed4b51f1ae
Merge pull request #8357 from hashicorp/streaming/add-service-health-events
streaming: add ServiceHealth events
2020-09-03 17:53:56 -04:00
Daniel Nephin 4c9ed41eab
Merge pull request #8554 from hashicorp/dnephin/agent-setup-persisted-tokens
agent: move token persistence from agent into token.Store
2020-09-03 17:29:21 -04:00
Daniel Nephin e573e64d58 state: handle terminating gateways in service health events 2020-09-03 16:58:05 -04:00
Daniel Nephin 3775392fb5 state: improve comments in catalog_events.go
Co-authored-by: Paul Banks <banks@banksco.de>
2020-09-03 16:58:05 -04:00
Daniel Nephin 417c5c93a8 state: use changeType in serviceChanges
To be a little more explicit, instead of nil implying an indirect change
2020-09-03 16:58:05 -04:00
Daniel Nephin 01424ba146 don't over allocate slice 2020-09-03 16:58:04 -04:00
Daniel Nephin d210242875 state: fix a bug in building service health events
The nodeCheck slice was being used as the first arg in append, which in some cases will modify the array backing the slice. This would lead to service checks for other services in the wrong event.

Also refactor some things to reduce the arguments to functions.
2020-09-03 16:58:04 -04:00
Daniel Nephin 7581305523 state: Remove unused args and return values
Also rename some functions to identify them as constructors for events
2020-09-03 16:58:04 -04:00
Daniel Nephin 27b02d391c state: use an enum for tracking node changes 2020-09-03 16:58:04 -04:00
Daniel Nephin 09329b542d state: serviceHealthSnapshot
refactored to remove unused return value and remove duplication
2020-09-03 16:58:04 -04:00
Daniel Nephin bf523420ee state: Add Change processor and snapshotter for service health
Co-authored-by: Paul Banks <banks@banksco.de>
2020-09-03 16:58:04 -04:00
Daniel Nephin e03e911144 state: fix bug in changeTrackerDB.publish
Creating a new readTxn does not work because it will not see the newly created objects that are about to be committed. Instead use the active write Txn.
2020-09-03 16:58:01 -04:00
Daniel Nephin 5de4d5bbe3 stream: have SnapshotFunc accept a non-pointer SubscribeRequest
The value is not expected to be modified. Passing a value makes that explicit.
2020-09-03 16:54:02 -04:00
freddygv cd4cf5161f Update resolver defaulting 2020-09-03 13:08:44 -06:00
freddygv 00f2794bfa Update golden files after default route fix for tgw 2020-09-03 12:35:11 -06:00
Daniel Nephin 6ca45e1a61 agent: add apiServers type for managing HTTP servers
Remove Server field from HTTPServer. The field is no longer used.
2020-09-03 13:40:12 -04:00
freddygv 318aa094fd Fix http assertion in route creation 2020-09-03 10:21:20 -06:00
freddygv 30ba080d25 Add explicit protocol overrides in tgw xds test cases 2020-09-03 08:57:48 -06:00
freddygv eaa250cc80 Ensure resolver node with LB isn't considered default 2020-09-03 08:55:57 -06:00
freddygv ef877449ce Move valid policies to pkg level 2020-09-02 15:49:03 -06:00
freddygv f81fe6a1a1 Remove LB infix and move injection to xds 2020-09-02 15:13:50 -06:00
R.B. Boyer 119e945c3e
connect: all config entries pick up a meta field (#8596)
Fixes #8595
2020-09-02 14:10:25 -05:00
Chris Piraino 28f163c2d2
Merge pull request #8603 from hashicorp/feature/usage-metrics
Track node and service counts in the state store and emit them periodically as metrics
2020-09-02 13:23:39 -05:00
R.B. Boyer d0f74cd1e8
connect: fix bug in preventing some namespaced config entry modifications (#8601)
Whenever an upsert/deletion of a config entry happens, within the open
state store transaction we speculatively test compile all discovery
chains that may be affected by the pending modification to verify that
the write would not create an erroneous scenario (such as splitting
traffic to a subset that did not exist).

If a single discovery chain evaluation references two config entries
with the same kind and name in different namespaces then sometimes the
upsert/deletion would be falsely rejected. It does not appear as though
this bug would've let invalid writes through to the state store so the
correction does not require a cleanup phase.
2020-09-02 10:47:19 -05:00
Chris Piraino bcb586bee2 Set metrics reporting interval to 9 seconds
This is below the 10 second interval that lib/telemetry.go implements as
its aggregation interval, ensuring that we always report these metrics.
2020-09-02 10:24:23 -05:00
Chris Piraino a3028cad89 Update godoc string for memdb wrapper functions/structs 2020-09-02 10:24:22 -05:00
Chris Piraino d301145e62 Refactor state store usage to track unique service names
This commit refactors the state store usage code to track unique service
name changes on transaction commit. This means we only need to lookup
usage entries when reading the information, as opposed to iterating over
a large number of service indices.

- Take into account a service instance's name being changed
- Do not iterate through entire list of service instances, we only care
about whether there is 0, 1, or more than 1.
2020-09-02 10:24:21 -05:00
Chris Piraino 086a8ea8eb Use ReadTxn interface in state store helper functions 2020-09-02 10:24:20 -05:00
Chris Piraino 69dbc926ad Add WriteTxn interface and convert more functions to ReadTxn
We add a WriteTxn interface for use in updating the usage memdb table,
with the forward-looking prospect of incrementally converting other
functions to accept interfaces.

As well, we use the ReadTxn in new usage code, and as a side effect
convert a couple of existing functions to use that interface as well.
2020-09-02 10:24:19 -05:00
Chris Piraino 3feae7f77b Report node/service usage metrics from every server
Using the newly provided state store methods, we periodically emit usage
metrics from the servers.

We decided to emit these metrics from all servers, not just the leader,
because that means we do not have to care about leader election flapping
causing metrics turbulence, and it seems reasonable for each server to
emit its own view of the state, even if they should always converge
rapidly.
2020-09-02 10:24:17 -05:00
Chris Piraino 04705e90f9 Add new usage memdb table that tracks usage counts of various elements
We update the usage table on Commit() by using the TrackedChanges() API
of memdb.

Track memdb changes on restore so that usage data can be compiled
2020-09-02 10:24:16 -05:00
freddygv 63f79e5f9b Restructure structs and other PR comments 2020-09-02 09:10:50 -06:00
Daniel Nephin f1a41318d7 token: OSS support for enterprise tokens 2020-08-31 15:10:15 -04:00
Daniel Nephin 629e4aaa65 config: use token.Config for ACLToken config
Using the target Config struct reduces the amount of copying and
translating of configuration structs.
2020-08-31 15:10:15 -04:00
Daniel Nephin 330be5b740 agent/token: Move token persistence out of agent
And into token.Store. This change isolates any awareness of token
persistence in a single place.

It is a small step in allowing Agent.New to accept its dependencies.
2020-08-31 15:00:34 -04:00
Daniel Nephin a80de898ea fix TestStore_RegularTokens
This test was only passing because t.Parallel was causing every subtest to run with the last value in the iteration,
which sets a value for all tokens. The test started to fail once t.Parallel was removed, but the same failure could
have been produced by adding 'tt := tt' to the t.Run() func.

These tests run in under 10ms, so there is no reason to use t.Parallel.
2020-08-31 14:59:14 -04:00
Matt Keeler 91d680b830
Merge of auto-config and auto-encrypt code (#8523)
auto-encrypt is now handled as a special case of auto-config.

This also is moving all the cert-monitor code into the auto-config package.
2020-08-31 13:12:17 -04:00
freddygv 0236e169bb Add documentation for resolver LB cfg 2020-08-28 14:46:13 -06:00
freddygv 28d0602fc1 Pass LB config to Envoy via xDS 2020-08-28 14:27:40 -06:00
freddygv 2bbbd9e1da Log error as error 2020-08-28 13:11:55 -06:00
freddygv 81115b6eaa Compile down LB policy to disco chain nodes 2020-08-28 13:11:04 -06:00
Daniel Nephin 6956477be5
Merge pull request #8548 from edevil/fix_flake
Fix flaky TestACLResolver_Client/Concurrent-Token-Resolve
2020-08-28 15:10:55 -04:00
Daniel Nephin 72bf350069
Merge pull request #8552 from pierresouchay/reload_cache_throttling_config
Ensure that Cache options are reloaded when `consul reload` is performed
2020-08-28 15:04:42 -04:00
Pierre Souchay d5974b1d17 Added Unit test for cache reloading 2020-08-28 13:03:58 +02:00
freddygv ff56a64b08 Add LB policy to service-resolver 2020-08-27 19:44:02 -06:00
Jack 9e1c6727f9
Add http2 and grpc support to ingress gateways (#8458) 2020-08-27 15:34:08 -06:00
R.B. Boyer 74d5df7c7a
xds: use envoy's rbac filter to handle intentions entirely within envoy (#8569) 2020-08-27 12:20:58 -05:00
R.B. Boyer d1843456d2
agent: ensure that we normalize bootstrapped config entries (#8547) 2020-08-27 11:37:25 -05:00
Pierre Souchay 9a64d3e5fe Also test reload of EntryFetchMaxBurst 2020-08-27 18:14:05 +02:00
Matt Keeler f97cc0445a
Move RPC router from Client/Server and into BaseDeps (#8559)
This will allow it to be a shared component which is needed for AutoConfig
2020-08-27 11:23:52 -04:00
Pierre Souchay 5842a902df Tests that changes in rate limit are taken into account by agent 2020-08-27 16:41:20 +02:00
Pierre Souchay 879d087f65 Added `options.Equals()` and minor fixes indentation fixes 2020-08-27 13:44:45 +02:00
R.B. Boyer fead4fc2a5
agent: expose the list of supported envoy versions on /v1/agent/self (#8545) 2020-08-26 10:04:11 -05:00
Kyle Havlovitz 97f1f341d6 Automatically renew the token used by the Vault CA provider 2020-08-25 10:34:49 -07:00
Pierre Souchay d2be9d38da Ensure that Cache options are reloaded when `consul reload` is performed.
This will apply cache throttling parameters are properly applied:
 * cache.EntryFetchMaxBurst
 * cache.EntryFetchRate

When values are updated, a log is displayed in info.
2020-08-24 23:33:10 +02:00
André Cruz 9a0792139c
Decrease test flakiness
Fix flaky TestACLResolver_Client/Concurrent-Token-Resolve and TestCacheNotifyPolling
2020-08-24 20:30:02 +01:00
André Cruz aa212423e3
testing: Fix govet errors 2020-08-21 18:01:55 +01:00
Daniel Nephin 01745feec0
Merge pull request #8537 from hashicorp/dnephin/fix-panic-on-connect-nil
Fix panic when decoding 'Connect: null'
2020-08-20 18:00:25 -04:00
Daniel Nephin 07ad662131 Fix panic when decoding 'Connect: null'
Surprisingly the json Unmarshal updates the aux pointer to a nil.
2020-08-20 17:52:14 -04:00
Daniel Nephin e16375216d config: use logging.Config in RuntimeConfig
To add structure to RuntimeConfig, and remove the need to translate into a third type.
2020-08-19 13:21:00 -04:00
Daniel Nephin f2373a5575 logging: move init of grpclog
This line initializes global state. Moving it out of the constructor and closer to where logging
is setup helps keep related things together.
2020-08-19 13:21:00 -04:00
Daniel Nephin 33c401a16e logging: Setup accept io.Writer instead of []io.Writer
Also accept a non-pointer Config, since the config is not modified
2020-08-19 13:20:41 -04:00
Daniel Nephin 63bad36de7 testing: disable global metrics sink in tests
This might be better handled by allowing configuration for the InMemSink interval and retail, and disabling
the global. For now this is a smaller change to remove the goroutine leak caused by tests because go-metrics
does not provide any way of shutting down the global goroutine.
2020-08-18 19:04:57 -04:00
Daniel Nephin 5d4df54296 agent: extract dependency creation from New
With this change, Agent.New() accepts many of the dependencies instead
of creating them in New. Accepting fully constructed dependencies from
a constructor makes the type easier to test, and easier to change.

There are still a number of dependencies created in Start() which can
be addressed in a follow up.
2020-08-18 19:04:55 -04:00
Daniel Nephin 51b08c645b
Merge pull request #8514 from hashicorp/dnephin/testing-improvements-1
testing: small improvements to TestSessionCreate and testutil.retry
2020-08-18 18:26:05 -04:00
Daniel Nephin ab2157bbc9
Merge pull request #8528 from hashicorp/dnephin/move-node-name-validation
config: Move some config validation from Agent.Start to config.Builder.Validate
2020-08-18 18:25:41 -04:00
Hans Hasselberg a932aafc91
add primary keys to list keyring (#8522)
During gossip encryption key rotation it would be nice to be able to see if all nodes are using the same key. This PR adds another field to the json response from `GET v1/operator/keyring` which lists the primary keys in use per dc. That way an operator can tell when a key was successfully setup as primary key.

Based on https://github.com/hashicorp/serf/pull/611 to add primary key to list keyring output:

```json
[
  {
    "WAN": true,
    "Datacenter": "dc2",
    "Segment": "",
    "Keys": {
      "0OuM4oC3Os18OblWiBbZUaHA7Hk+tNs/6nhNYtaNduM=": 6,
      "SINm887hKTzmMWeBNKTJReaTLX3mBEJKriDyt88Ad+g=": 6
    },
    "PrimaryKeys": {
      "SINm887hKTzmMWeBNKTJReaTLX3mBEJKriDyt88Ad+g=": 6
    },
    "NumNodes": 6
  },
  {
    "WAN": false,
    "Datacenter": "dc2",
    "Segment": "",
    "Keys": {
      "0OuM4oC3Os18OblWiBbZUaHA7Hk+tNs/6nhNYtaNduM=": 8,
      "SINm887hKTzmMWeBNKTJReaTLX3mBEJKriDyt88Ad+g=": 8
    },
    "PrimaryKeys": {
      "SINm887hKTzmMWeBNKTJReaTLX3mBEJKriDyt88Ad+g=": 8
    },
    "NumNodes": 8
  },
  {
    "WAN": false,
    "Datacenter": "dc1",
    "Segment": "",
    "Keys": {
      "0OuM4oC3Os18OblWiBbZUaHA7Hk+tNs/6nhNYtaNduM=": 3,
      "SINm887hKTzmMWeBNKTJReaTLX3mBEJKriDyt88Ad+g=": 8
    },
    "PrimaryKeys": {
      "SINm887hKTzmMWeBNKTJReaTLX3mBEJKriDyt88Ad+g=": 8
    },
    "NumNodes": 8
  }
]
```

I intentionally did not change the CLI output because I didn't find a good way of displaying this information. There are a couple of options that we could implement later:
* add a flag to show the primary keys
* add a flag to show json output

Fixes #3393.
2020-08-18 09:50:24 +02:00
Daniel Nephin 35f1ecee0b config: Move remote-script-checks warning to config
Previously it was done in Agent.Start, but it can be done much earlier
2020-08-17 17:39:49 -04:00
Daniel Nephin 27b36bfc4e config: move NodeName validation to config validation
Previsouly it was done in Agent.Start, which is much later then it needs to be.

The new 'dns' package was required, because otherwise there would be an
import cycle. In the future we should move more of the dns server into
the dns package.
2020-08-17 17:25:02 -04:00
Daniel Nephin b4015969c9
Merge pull request #8515 from hashicorp/dnephin/unexport-testing-shims
config: unexport fields and resolve TODOs in config.Builder
2020-08-17 16:03:07 -04:00
Daniel Nephin 16217fe9b9 testing: use t.Cleanup in testutil.TempFile
So that it has the same behaviour as TempDir.

Also remove the now unnecessary 'defer os.Remove'
2020-08-14 20:06:01 -04:00
Daniel Nephin d68edcecf4 testing: Remove all the defer os.Removeall
Now that testutil uses t.Cleanup to remove the directory the caller no longer has to manage
the removal
2020-08-14 19:58:53 -04:00
Daniel Nephin 8a4d292c8e config: unexport and resolve TODOs in config.Builder
- unexport testing shims, and document their purpose
- resolve a TODO by moving validation to NewBuilder and storing the one
  field that is used instead of all of Options
- create a slice with the correct size to avoid extra allocations
2020-08-14 19:23:32 -04:00
Daniel Nephin a4b201af36 testing: Improve session_endpoint_test
While working on another change I caused a bunch of these tests to fail.
Unfortunately the failure messages were not super helpful at first.

One problem was that the request and response were created outside of
the retry. This meant that when the second attempt happened, the request
body was empty (because the buffer had been consumed), and so the
request was not actually being retried. This was fixed by moving more of
the request creation into the retry block.

Another problem was that these functions can return errors in two ways, and
are not consistent about which way they use. Some errors are returned to
the response writer, but the tests were not checking those errors, which
was causing a panic later on. This was fixed by adding a check for the
response code.

Also adds some missing t.Helper(), and has assertIndex use checkIndex so
that it is clear these are the same implementation.
2020-08-14 18:55:52 -04:00
Daniel Nephin 070e843113 testutil: Add t.Cleanup to TempDir
TempDir registers a Cleanup so that the directory is always removed. To disable to cleanup, set the TEST_NOCLEANUP
env var.
2020-08-14 13:19:10 -04:00
Daniel Nephin 2b920ad199 testing: fix flaky test TestDNS_NonExistentDC_RPC
I saw this test flake locally, and it was easy to reproduce with -count=10.

The failure was: 'TestAgent.dns: rpc error: error=No known Consul servers'.

Waiting for the agent seems to fix it.
2020-08-13 18:03:04 -04:00
Daniel Nephin 1912c5ad89 testing: wait until monitor has started before shutdown
This commit fixes a test that I saw flake locally while running tests. The test output from the monitor
started immediately after the line the test was looking for.

To fix the problem a channel is closed when the goroutine starts. Shutdown is not called until this channel
is closed, which seems to greatly reduce the chance of a flake.
2020-08-13 17:53:29 -04:00
Daniel Nephin 3a4e62836b testing: Remove TestAgent.Key and change TestAgent.DataDir
TestAgent.Key was only used by 3 tests. Extracting it from the common helper that is used in hundreds of
tests helps keep the shared part small and more focused.

This required a second change (which I was planning on making anyway), which was to change the behaviour of
DataDir. Now in all cases the TestAgent will use the DataDir, and clean it up once the test is complete.
2020-08-13 17:53:24 -04:00
Daniel Nephin b1679508d4 testing: use t.Cleanup in TestAgent for returnPorts 2020-08-13 17:09:37 -04:00
Daniel Nephin 4e8e0de8f0 testing: remove unused fields from TestACLAgent 2020-08-13 17:03:55 -04:00
Daniel Nephin 399c77dfb6 agent: rename vars in newConsulConfig
'base' is a bit misleading, since it is the return value. Renamed to cfg.
2020-08-13 11:58:21 -04:00
Daniel Nephin 7b5b170a0d agent: Move setupKeyring functions to keyring.go
There are a couple reasons for this change:

1. agent.go is way too big. Smaller files makes code eaasier to read
   because tools that show usage also include filename which can give
   a lot more context to someone trying to understand which functions
   call other functions.
2. these two functions call into a large number of functions already in
   keyring.go.
2020-08-13 11:58:21 -04:00
Daniel Nephin 9919e5dfa5 agent: unmethod consulConfig
To allow us to move newConsulConfig out of Agent.
2020-08-13 11:58:21 -04:00
Daniel Nephin 8f596f5551 Fix conflict in merged PRs
One PR renamed the var from config->cfg, and another used the old name config, which caused the
build to fail on master.
2020-08-13 11:28:26 -04:00
Daniel Nephin d677706625 state: remove unused Store method receiver
And use ReadTxn interface where appropriate.
2020-08-13 11:25:22 -04:00
Daniel Nephin 190fcc14a3
Merge pull request #8463 from hashicorp/dnephin/unmethod-make-node-id
agent: convert NodeID methods to functions
2020-08-13 11:18:11 -04:00
Daniel Nephin 912aae8624
Merge pull request #8461 from hashicorp/dnephin/remove-notify-shutdown
agent/consul: Remove NotifyShutdown
2020-08-13 11:16:48 -04:00
Daniel Nephin 5b37efd91b
Merge pull request #8365 from hashicorp/dnephin/fix-service-by-node-meta-flake
state: speed up tests that use watchLimit
2020-08-13 11:16:12 -04:00
Daniel Nephin 37eacf8192 auto-config: reduce awareness of config
This is a small step to allowing Agent to accept its dependencies
instead of creating them in New.

There were two fields in autoconfig.Config that were used exclusively
to load config. These were replaced with a single function, allowing us
to move LoadConfig back to the config package.

Also removed the WithX functions for building a Config. Since these were
simple assignment, it appeared we were not getting much value from them.
2020-08-12 13:23:23 -04:00
Daniel Nephin e07554500e Remove check that hostID is a uuid.
Immediately afterward we hash the ID, so it does not need to be a uuid anymore.
2020-08-12 13:05:10 -04:00
Daniel Nephin 875d8bde42 agent: convert NodeID methods to functions
Making these functions allows us to cleanup how an agent is initialized. They only make use of a config and a logger, so they do not need to be agent methods.

Also cleanup the testing to use t.Run and require.
2020-08-12 13:05:10 -04:00
Daniel Nephin 0738eb8596 Extract nodeID functions to a different file
In preparation for turning them into functions.
To reduce the scope of Agent, and refactor how Agent is created and started.
2020-08-12 13:05:10 -04:00
R.B. Boyer e3cd4a8539
connect: use stronger validation that ingress gateways have compatible protocols defined for their upstreams (#8470)
Fixes #8466

Since Consul 1.8.0 there was a bug in how ingress gateway protocol
compatibility was enforced. At the point in time that an ingress-gateway
config entry was modified the discovery chain for each upstream was
checked to ensure the ingress gateway protocol matched. Unfortunately
future modifications of other config entries were not validated against
existing ingress-gateway definitions, such as:

1. create tcp ingress-gateway pointing to 'api' (ok)
2. create service-defaults for 'api' setting protocol=http (worked, but not ok)
3. create service-splitter or service-router for 'api' (worked, but caused an agent panic)

If you were to do these in a different order, it would fail without a
crash:

1. create service-defaults for 'api' setting protocol=http (ok)
2. create service-splitter or service-router for 'api' (ok)
3. create tcp ingress-gateway pointing to 'api' (fail with message about
   protocol mismatch)

This PR introduces the missing validation. The two new behaviors are:

1. create tcp ingress-gateway pointing to 'api' (ok)
2. (NEW) create service-defaults for 'api' setting protocol=http ("ok" for back compat)
3. (NEW) create service-splitter or service-router for 'api' (fail with
   message about protocol mismatch)

In consideration for any existing users that may be inadvertently be
falling into item (2) above, that is now officiall a valid configuration
to be in. For anyone falling into item (3) above while you cannot use
the API to manufacture that scenario anymore, anyone that has old (now
bad) data will still be able to have the agent use them just enough to
generate a new agent/proxycfg error message rather than a panic.
Unfortunately we just don't have enough information to properly fix the
config entries.
2020-08-12 11:19:20 -05:00
Freddy d72f72dcd5
Notify alias checks when aliased service is [de]registered (#8456) 2020-08-12 09:47:41 -06:00
Daniel Nephin 3d96c5b651
Merge pull request #8469 from hashicorp/dnephin/config-source
config: make Source an interface to avoid the marshal/unmarshal cycle in auto-config
2020-08-12 11:17:15 -04:00
Hans Hasselberg aacf0fd777
Merge pull request #8471 from hashicorp/local_only
thread local-only through the layers
2020-08-12 08:54:51 +02:00
Freddy 875816d0d3
Internal endpoint to query intentions associated with a gateway (#8400) 2020-08-11 17:20:41 -06:00
Kyle Havlovitz 635952681e Fix a state store comment about version 2020-08-11 13:46:12 -07:00
Kyle Havlovitz c39a275666 fsm: Fix snapshot bug with restoring node/service/check indexes 2020-08-11 11:49:52 -07:00
Hans Hasselberg aff02198d7 Refactor keyring ops:
* changes some functions to return data instead of modifying pointer
  arguments
* renames globalRPC() to keyringRPCs() to make its purpose more clear
* restructures KeyringOperation() to make it more understandable
2020-08-11 13:42:03 +02:00
Hans Hasselberg 07261db64d thread local-only through the layers
$ consul keyring -list -local-only
==> Gathering installed encryption keys...

dc1 (LAN):
  aUlAW4ST3+vwseI61so24CoORkyjZofcmHk+j7QPSYQ= [1/1]
2020-08-11 13:41:53 +02:00
Daniel Nephin 4297a8ba07 auto-config: Avoid the marshal/unmarshal cycle in auto-config
Use a LiteralConfig and return a config.Config from translate.
2020-08-10 20:07:52 -04:00
freddygv de0b574a26 Update error handling 2020-08-10 17:48:22 -06:00
Daniel Nephin 38980ebb4c config: Make Source an interface
This will allow us to accept config from auto-config without needing to
go through a serialziation cycle.
2020-08-10 12:46:28 -04:00
Mike Morris ff37af1129
changelog: Update for 1.8.2, 1.7.6, 1.7.5 and 1.6.7 (#8462)
* update bindata_assetfs.go

* Release v1.8.2

* Putting source back into Dev Mode

* changelog: add entries for 1.7.6, 1.7.5 and 1.6.7

Co-authored-by: hashicorp-ci <hashicorp-ci@users.noreply.github.com>
2020-08-07 18:58:09 -04:00
Daniel Nephin 80e99cb3e6 testing: remove unnecessary defers in tests
The data directory is now removed by the test helper that created it.
2020-08-07 17:28:16 -04:00
Daniel Nephin 7dbacf297c testing: Remove NotifyShutdown
NotifyShutdown was only used for testing. Now that t.Cleanup exists, we
can use that instead of attaching cleanup to the Server shutdown.

The Autopilot test which used NotifyShutdown doesn't need this
notification because Shutdown is synchronous. Waiting for the function
to return is equivalent.
2020-08-07 17:14:44 -04:00
Matt Keeler 67dec3b609
Require token replication to be enabled in secondary dcs when ACLs are enabled with AutoConfig (#8451)
AutoConfig will generate local tokens for clients and the ability to use local tokens is gated off of token replication being enabled and being configured with a replication token. Therefore we already have a hard requirement on having token replication enabled, this commit just makes sure to surface that to the operator instead of having to discern what the issue is from RPC errors.
2020-08-07 10:20:27 -04:00
Hans Hasselberg d316cd06c1
auto_config implies connect (#8433) 2020-08-07 12:02:02 +02:00
freddygv c8f5215e9d Fix test build 2020-08-06 11:31:56 -06:00
Hans Hasselberg 51a8e15cf8
Mark its own cluster as healthy when rebalancing. (#8406)
This code started as an optimization to avoid doing an RPC Ping to
itself. But in a single server cluster the rebalancing was led to
believe that there were no healthy servers because foundHealthyServer
was not set. Now this is being set properly.

Fixes #8401 and #8403.
2020-08-06 10:42:09 +02:00
freddygv 15c3cfce5e PR comments and addtl tests 2020-08-05 16:07:11 -06:00
Daniel Nephin ae382805bd
Merge pull request #8404 from hashicorp/dnephin/remove-log-output-field
Use Logger consistently, instead of LogOutput
2020-08-05 14:31:43 -04:00
Daniel Nephin 3b82ad0955 Rename NewClient/NewServer
Now that duplicate constructors have been removed we can use the shorter names for the single constructor.
2020-08-05 14:00:55 -04:00
Daniel Nephin 0420d91cdd Remove LogOutput from Agent
Now that it is no longer used, we can remove this unnecessary field. This is a pre-step in cleanup up RuntimeConfig->Consul.Config, which is a pre-step to adding a gRPCHandler component to Server for streaming.

Removing this field also allows us to remove one of the return values from logging.Setup.
2020-08-05 14:00:44 -04:00
Daniel Nephin 5acf01ceeb Remove LogOutput from Server 2020-08-05 14:00:44 -04:00
Daniel Nephin 0c5428eea8 Remove LogOutput from Client 2020-08-05 14:00:42 -04:00
Daniel Nephin e8ee2cf2f7 Pass a logger to ConnPool and yamux, instead of an io.Writer
Allowing us to remove the LogOutput field from config.
2020-08-05 13:25:08 -04:00
Daniel Nephin ed8210fe4d api: Use a Logger instead of an io.Writer in api.Watch
So that we can pass around only a Logger, not a LogOutput
2020-08-05 13:25:08 -04:00
Daniel Nephin 1e17a0c3e1 config: Remove unused field 2020-08-05 13:25:08 -04:00
Daniel Nephin ba3ace1219 Return nil value on error.
The main bug was fixed in cb050b280c, but the return value of 'result' is still misleading.
Change the return value to nil to make the code more clear.
2020-08-05 13:10:17 -04:00
R.B. Boyer c599a2f5f4
xds: add support for envoy 1.15.0 and drop support for 1.11.x (#8424)
Related changes:

- hard-fail the xDS connection attempt if the envoy version is known to be too old to be supported
- remove the RouterMatchSafeRegex proxy feature since all supported envoy versions have it
- stop using --max-obj-name-len (due to: envoyproxy/envoy#11740)
2020-07-31 15:52:49 -05:00
freddygv 0956624e39 collect GatewayServices from iter in a function 2020-07-31 13:30:40 -06:00
Freddy f1e8addbdf
Avoid panics during shutdown routine (#8412) 2020-07-30 11:11:10 -06:00
freddygv aa6c59dbfc end to end changes to pass gatewayservices to /ui/services/ 2020-07-30 10:21:11 -06:00
Matt Keeler 1a78cf9b4c
Ensure certificates retrieved through the cache get persisted with auto-config (#8409) 2020-07-30 11:37:18 -04:00
freddygv 51255eede7 Support ConnectedWithProxy 2020-07-30 09:32:12 -06:00
Matt Keeler dbb461a5d3
Allow setting verify_incoming* when using auto_encrypt or auto_config (#8394)
Ensure that enabling AutoConfig sets the tls configurator properly

This also refactors the TLS configurator a bit so the naming doesn’t imply only AutoEncrypt as the source of the automatically setup TLS cert info.
2020-07-30 10:15:12 -04:00
Hans Hasselberg 054595b1f8
agent/cache test for cache throttling. (#8396) 2020-07-30 14:41:13 +02:00
Matt Keeler 34034b76f5
Agent Auto Config: Implement Certificate Generation (#8360)
Most of the groundwork was laid in previous PRs between adding the cert-monitor package to extracting the logic of signing certificates out of the connect_ca_endpoint.go code and into a method on the server.

This also refactors the auto-config package a bit to split things out into multiple files.
2020-07-28 15:31:48 -04:00
Matt Keeler be01c4241d
Default Cache rate limiting options in New
Also get rid of the TestCache helper which was where these defaults were happening previously.
2020-07-28 12:34:35 -04:00
Matt Keeler 83d09de230
Fix some broken code in master
There were several PRs that while all passed CI independently, when they all got merged into the same branch caused compilation errors in test code.

The main changes that caused issues where changing agent/cache.Cache.New to require a concrete options struct instead of a pointer. This broke the cert monitor tests and the catalog_list_services_test.go. Another change was made to unembed the http.Server from the agent.HTTPServer struct. That coupled with another change to add a test to ensure cache rate limiting coming from HTTP requests was working as expected caused compilation failures.
2020-07-28 09:50:10 -04:00
Pierre Souchay 505de6dc29
Added ratelimit to handle throtling cache (#8226)
This implements a solution for #7863

It does:

    Add a new config cache.entry_fetch_rate to limit the number of calls/s for a given cache entry, default value = rate.Inf
    Add cache.entry_fetch_max_burst size of rate limit (default value = 2)

The new configuration now supports the following syntax for instance to allow 1 query every 3s:

    command line HCL: -hcl 'cache = { entry_fetch_rate = 0.333}'
    in JSON

{
  "cache": {
    "entry_fetch_rate": 0.333
  }
}
2020-07-27 23:11:11 +02:00
Matt Keeler 5c2c762106
Move connect root retrieval and cert signing logic out of the RPC endpoints (#8364)
The code now lives on the Server type itself. This was done so that all of this could be shared with auto config certificate signing.
2020-07-24 10:00:51 -04:00
Matt Keeler 2ee9fe0a4d
Move generation of the CA Configuration from the agent code into a method on the RuntimeConfig (#8363)
This allows this to be reused elsewhere.
2020-07-23 16:05:28 -04:00
Daniel Nephin 3d115a62fd
Merge pull request #8323 from hashicorp/dnephin/add-event-publisher-2
stream: close subscriptions on shutdown
2020-07-23 13:12:50 -04:00
Matt Keeler 2713c0e682
Refactor the agentpb package (#8362)
First move the whole thing to the top-level proto package name.

Secondly change some things around internally to have sub-packages.
2020-07-23 11:24:20 -04:00
Paul Coignet 1d75a8fb50
Fix tests 2020-07-23 11:04:10 +02:00
Daniel Nephin ed69feca6d stream: close all subs when EventProcessor is shutdown. 2020-07-22 19:04:10 -04:00
Daniel Nephin a99a4103bd stream: fix overallocation in filter
And add tests
2020-07-22 19:04:10 -04:00
Daniel Nephin 9ed61fd160 state: speed up TestStateStore_ServicesByNodeMeta
Make watchLimit a var so that we can patch it in tests and reduce the time spent creating state.
2020-07-22 16:57:06 -04:00
Daniel Nephin 0402dd7ac5 state: Use subtests in TestStateStore_ServicesByNodeMeta
These subtests make it much easier to identify the slow part of the test, but they also help enumerate all the different cases which are being tested.
2020-07-22 16:39:09 -04:00
Daniel Nephin 3570ce6566
Merge pull request #7948 from hashicorp/dnephin/buffer-test-logs
testutil: NewLogBuffer - buffer logs until a test fails
2020-07-21 15:21:52 -04:00
Matt Keeler 3c09482864
Merge pull request #8311 from hashicorp/bugfix/auto-encrypt-token-update 2020-07-21 13:15:27 -04:00
Daniel Nephin a33a7a6fe2
Merge pull request #8344 from hashicorp/dnephin/fix-flakes-in-stream
stream: handle empty event in TestEventSnapshot
2020-07-21 13:14:35 -04:00
Daniel Nephin 51efba2c7d testutil: NewLogBuffer - buffer logs until a test fails
Replaces #7559

Running tests in parallel, with background goroutines, results in test output not being associated with the correct test. `go test` does not make any guarantees about output from goroutines being attributed to the correct test case.

Attaching log output from background goroutines also cause data races.  If the goroutine outlives the test, it will race with the test being marked done. Previously this was noticed as a panic when logging, but with the race detector enabled it is shown as a data race.

The previous solution did not address the problem of correct test attribution because test output could still be hidden when it was associated with a test that did not fail. You would have to look at all of the log output to find the relevant lines. It also made debugging test failures more difficult because each log line was very long.

This commit attempts a new approach. Instead of printing all the logs, only print when a test fails. This should work well when there are a small number of failures, but may not work well when there are many test failures at the same time. In those cases the failures are unlikely a result of a specific test, and the log output is likely less useful.

All of the logs are printed from the test goroutine, so they should be associated with the correct test.

Also removes some test helpers that were not used, or only had a single caller. Packages which expose many functions with similar names can be difficult to use correctly.

Related:
https://github.com/golang/go/issues/38458 (may be fixed in go1.15)
https://github.com/golang/go/issues/38382#issuecomment-612940030
2020-07-21 12:50:40 -04:00
Matt Keeler 12acdd7481
Disable background cache refresh for Connect Leaf Certs
The rationale behind removing them is that all of our own code (xDS, builtin connect proxy) use the cache notification mechanism. This ensures that the blocking fetch behind the scenes is always executing. Therefore the only way you might go to get a certificate and have to wait is when 1) the request has never been made for that cert before or 2) you are using the v1/agent/connect/ca/leaf API for retrieving the cert yourself.

In the first case, the refresh change doesn’t alter the behavior. In the second case, it can be mitigated by using blocking queries with that API which just like normal cache notification mechanism will cause the blocking fetch to be initiated and to get leaf certs as soon as needed.

If you are not using blocking queries, or Envoy/xDS, or the builtin connect proxy but are retrieving the certs yourself then the HTTP endpoint might take a little longer to respond.

This also renames the RefreshTimeout field on the register options to QueryTimeout to more accurately reflect that it is used for any type that supports blocking queries.
2020-07-21 12:19:25 -04:00
Matt Keeler 9da8c51ac5
Fix issue with changing the agent token causing failure to renew the auto-encrypt certificate
The fallback method would still work but it would get into a state where it would let the certificate expire for 10s before getting a new one. And the new one used the less secure RPC endpoint.

This is also a pretty large refactoring of the auto encrypt code. I was going to write some tests around the certificate monitoring but it was going to be impossible to get a TestAgent configured in such a way that I could write a test that ran in less than an hour or two to exercise the functionality.

Moving the certificate monitoring into its own package will allow for dependency injection and in particular mocking the cache types to control how it hands back certificates and how long those certificates should live. This will allow for exercising the main loop more than would be possible with it coupled so tightly with the Agent.
2020-07-21 12:19:25 -04:00
Daniel Nephin 1910e2a246 checks: wait for goroutine to complete
CheckAlias already had a waitGroup, but the Add() call was happening too late, which was causing a race in tests. The add must happen before the goroutine is started.

CheckHTTP did not have a waitGroup, so I added it to match CheckAlias.

It looks like a lot of the implementation could be shared, and may not need all of channel, waitgroup and bool, but I will leave that refactor for another time.
2020-07-20 18:55:39 -04:00
Daniel Nephin 9482961b1c stream: handle empty event in TestEventSnapshot
When the race detector is enabled we see this test fail occasionally. The reordering of execution seems to make it possible for the snapshot splice to happen before any events are published to the topicBuffers.

We can handle this case in the test the same way it is handled by a subscription, by proceeding to the next event.
2020-07-20 18:20:02 -04:00
Daniel Nephin 57e00b0fd5
Merge pull request #8245 from hashicorp/dnephin/use-not-modified-in-cache
agent/cache: Use AllowNotModified in CatalogListServices
2020-07-20 15:30:52 -04:00
Daniel Nephin 29571465e1
Merge pull request #8290 from hashicorp/dnephin/watch-decode
watch: fix script watches with single arg
2020-07-20 14:41:17 -04:00
Paul Coignet a4e39c840b
Add default prefix_filter 2020-07-20 10:39:58 +02:00
Daniel Nephin ecccb30690 state: update calls that are no longer state methods
In a previous commit these methods were changed to functions, so remove the Store paramter.
2020-07-16 15:46:10 -04:00
Daniel Nephin 2008884241 state: un-method funcs that don't use their receiver
This change was mostly automated with the following

First generate a list of functions with:

  git grep -o 'Store) \([^(]\+\)(tx \*txn' ./agent/consul/state | awk '{print $2}' | grep -o '^[^(]\+'

Then the list was curated a bit with trial/error to remove and add funcs
as necessary.

Finally the replacement was done with:

  dir=agent/consul/state
  file=${1-funcnames}

  while read fn; do
    echo "$fn"
    sed -i -e "s/(s \*Store) $fn(/$fn(/" $dir/*.go
    sed -i -e "s/s\.$fn(/$fn(/" $dir/*.go
    sed -i -e "s/s\.store\.$fn(/$fn(/" $dir/*.go
  done < $file
2020-07-16 15:30:39 -04:00
Daniel Nephin 63b153df8c store: convert methods that don't use their receiver to functions
Making these functions allows them to be used without introducing
an artificial dependency on the struct. Many of these will be called
from streaming Event processors, which do not have a store.

This change is being made ahead of the streaming work to get to reduce
the size of the streaming diff.
2020-07-16 15:30:10 -04:00
André 3bc27df844
minor: fix docstring of DNSOnlyPassing (#8318)
In runtime.go it had "duration" but it is actually a boolean.
2020-07-16 09:47:33 -04:00
Daniel Nephin 2ec3760b70 agent/cache: Use AllowNotModifiedResponse in CatalogListServices
Co-authored-by: Pierre Souchay <pierresouchay@users.noreply.github.com>
2020-07-14 18:58:20 -04:00
Daniel Nephin dbb8d14728 agent/cache: Update some docstrings 2020-07-14 18:58:20 -04:00
Daniel Nephin c07acbeb6b stream: Add forceClose and refactor subscription filtering
Move the subscription context to Next. context.Context should generally
never be stored in a struct because it makes that struct only valid
while the context is valid. This is rarely obvious from the caller.
Adds a forceClosed channel in place of the old context, and uses the new
context as a way for the caller to stop the Subscription blocking.

Remove some recursion out of bufferImte.Next. The caller is already looping so we can continue
in that loop instead of recursing. This ensures currentItem is updated immediately (which probably
does not matter in practice), and also removes the chance that we overflow the stack.

NextNoBlock and FollowAfter do not need to handle bufferItem.Err, the caller already
handles it.

Moves filter to a method to simplify Next, and more explicitly separate filtering from looping.

Also improve some godoc

Only unwrap itemBuffer.Err when necessary
2020-07-14 15:57:47 -04:00
Daniel Nephin f19f8e99bb stream: Improve docstrings
Also rename ResumeStrema to EndOfEmptySnapshot to be more consistent with other framing events

Co-authored-by: Paul Banks <banks@banksco.de>
2020-07-14 15:57:47 -04:00
Daniel Nephin fc1c2ae412 stream: change Topic to an interface
Consumers of the package can decide on which type to use for the Topic. In the future we may
use a gRPC type for the topic.
2020-07-14 15:57:47 -04:00
Daniel Nephin 6fa36e3aee state: Move change processing out of EventPublisher
EventPublisher was receiving TopicHandlers, which had a couple of
problems:

- ChangeProcessors were being grouped by Topic, but they completely
  ignored the topic and were performed on every change
- ChangeProcessors required EventPublisher to be aware of database
  changes

By moving ChangeProcesors out of EventPublisher, and having Publish
accept events instead of changes, EventPublisher no longer needs to
be aware of these things.

Handlers is now only SnapshotHandlers, which are still mapped by Topic.

Also allows us to remove the small 'db' package that had only two types.
They can now be unexported types in state.
2020-07-14 15:57:47 -04:00
Daniel Nephin 48c766d2c6 server: Abandom state store to shutdown EventPublisher
So that we don't leak goroutines
2020-07-14 15:57:47 -04:00
Daniel Nephin 889c57fd2d stream: unexport identifiers
Now that EventPublisher is part of stream a lot of the internals can be hidden
2020-07-14 15:57:47 -04:00
Daniel Nephin 4fa0fdc0e0 stream: Move EventPublisher to stream package
The EventPublisher is the central hub of the PubSub system. It is toughly coupled with much of
stream. Some stream internals were exported exclusively for EventPublisher.

The two Subscribe cases (with or without index) were also awkwardly split between two packages. By
moving EventPublisher into stream they are now both in the same package (although still in different files).
2020-07-14 15:57:47 -04:00
Daniel Nephin 489876c86b state: Make handleACLUpdate async once again
So that we keep as much as possible out of the FSM commit hot path.
2020-07-14 15:57:47 -04:00
Daniel Nephin 5f9db94956 state: Use interface for Txn
Also store the index in Changes instead of the Txn.

This change is in preparation for movinng EventPublisher to the stream package, and
making handleACLUpdates async once again.
2020-07-14 15:57:46 -04:00
Daniel Nephin a709ed1ab5 stream.Subscription unexport fields and additiona docstrings 2020-07-14 15:57:46 -04:00