Commit Graph

2325 Commits

Author SHA1 Message Date
Daniel Nephin 0c5428eea8 Remove LogOutput from Client 2020-08-05 14:00:42 -04:00
Daniel Nephin e8ee2cf2f7 Pass a logger to ConnPool and yamux, instead of an io.Writer
Allowing us to remove the LogOutput field from config.
2020-08-05 13:25:08 -04:00
Daniel Nephin ed8210fe4d api: Use a Logger instead of an io.Writer in api.Watch
So that we can pass around only a Logger, not a LogOutput
2020-08-05 13:25:08 -04:00
Daniel Nephin 1e17a0c3e1 config: Remove unused field 2020-08-05 13:25:08 -04:00
Daniel Nephin ba3ace1219 Return nil value on error.
The main bug was fixed in cb050b280c, but the return value of 'result' is still misleading.
Change the return value to nil to make the code more clear.
2020-08-05 13:10:17 -04:00
R.B. Boyer c599a2f5f4
xds: add support for envoy 1.15.0 and drop support for 1.11.x (#8424)
Related changes:

- hard-fail the xDS connection attempt if the envoy version is known to be too old to be supported
- remove the RouterMatchSafeRegex proxy feature since all supported envoy versions have it
- stop using --max-obj-name-len (due to: envoyproxy/envoy#11740)
2020-07-31 15:52:49 -05:00
Freddy f1e8addbdf
Avoid panics during shutdown routine (#8412) 2020-07-30 11:11:10 -06:00
Matt Keeler 1a78cf9b4c
Ensure certificates retrieved through the cache get persisted with auto-config (#8409) 2020-07-30 11:37:18 -04:00
Matt Keeler dbb461a5d3
Allow setting verify_incoming* when using auto_encrypt or auto_config (#8394)
Ensure that enabling AutoConfig sets the tls configurator properly

This also refactors the TLS configurator a bit so the naming doesn’t imply only AutoEncrypt as the source of the automatically setup TLS cert info.
2020-07-30 10:15:12 -04:00
Hans Hasselberg 054595b1f8
agent/cache test for cache throttling. (#8396) 2020-07-30 14:41:13 +02:00
Matt Keeler 34034b76f5
Agent Auto Config: Implement Certificate Generation (#8360)
Most of the groundwork was laid in previous PRs between adding the cert-monitor package to extracting the logic of signing certificates out of the connect_ca_endpoint.go code and into a method on the server.

This also refactors the auto-config package a bit to split things out into multiple files.
2020-07-28 15:31:48 -04:00
Matt Keeler be01c4241d
Default Cache rate limiting options in New
Also get rid of the TestCache helper which was where these defaults were happening previously.
2020-07-28 12:34:35 -04:00
Matt Keeler 83d09de230
Fix some broken code in master
There were several PRs that while all passed CI independently, when they all got merged into the same branch caused compilation errors in test code.

The main changes that caused issues where changing agent/cache.Cache.New to require a concrete options struct instead of a pointer. This broke the cert monitor tests and the catalog_list_services_test.go. Another change was made to unembed the http.Server from the agent.HTTPServer struct. That coupled with another change to add a test to ensure cache rate limiting coming from HTTP requests was working as expected caused compilation failures.
2020-07-28 09:50:10 -04:00
Pierre Souchay 505de6dc29
Added ratelimit to handle throtling cache (#8226)
This implements a solution for #7863

It does:

    Add a new config cache.entry_fetch_rate to limit the number of calls/s for a given cache entry, default value = rate.Inf
    Add cache.entry_fetch_max_burst size of rate limit (default value = 2)

The new configuration now supports the following syntax for instance to allow 1 query every 3s:

    command line HCL: -hcl 'cache = { entry_fetch_rate = 0.333}'
    in JSON

{
  "cache": {
    "entry_fetch_rate": 0.333
  }
}
2020-07-27 23:11:11 +02:00
Matt Keeler 5c2c762106
Move connect root retrieval and cert signing logic out of the RPC endpoints (#8364)
The code now lives on the Server type itself. This was done so that all of this could be shared with auto config certificate signing.
2020-07-24 10:00:51 -04:00
Matt Keeler 2ee9fe0a4d
Move generation of the CA Configuration from the agent code into a method on the RuntimeConfig (#8363)
This allows this to be reused elsewhere.
2020-07-23 16:05:28 -04:00
Daniel Nephin 3d115a62fd
Merge pull request #8323 from hashicorp/dnephin/add-event-publisher-2
stream: close subscriptions on shutdown
2020-07-23 13:12:50 -04:00
Matt Keeler 2713c0e682
Refactor the agentpb package (#8362)
First move the whole thing to the top-level proto package name.

Secondly change some things around internally to have sub-packages.
2020-07-23 11:24:20 -04:00
Daniel Nephin ed69feca6d stream: close all subs when EventProcessor is shutdown. 2020-07-22 19:04:10 -04:00
Daniel Nephin a99a4103bd stream: fix overallocation in filter
And add tests
2020-07-22 19:04:10 -04:00
Daniel Nephin 9ed61fd160 state: speed up TestStateStore_ServicesByNodeMeta
Make watchLimit a var so that we can patch it in tests and reduce the time spent creating state.
2020-07-22 16:57:06 -04:00
Daniel Nephin 0402dd7ac5 state: Use subtests in TestStateStore_ServicesByNodeMeta
These subtests make it much easier to identify the slow part of the test, but they also help enumerate all the different cases which are being tested.
2020-07-22 16:39:09 -04:00
Daniel Nephin 3570ce6566
Merge pull request #7948 from hashicorp/dnephin/buffer-test-logs
testutil: NewLogBuffer - buffer logs until a test fails
2020-07-21 15:21:52 -04:00
Matt Keeler 3c09482864
Merge pull request #8311 from hashicorp/bugfix/auto-encrypt-token-update 2020-07-21 13:15:27 -04:00
Daniel Nephin a33a7a6fe2
Merge pull request #8344 from hashicorp/dnephin/fix-flakes-in-stream
stream: handle empty event in TestEventSnapshot
2020-07-21 13:14:35 -04:00
Daniel Nephin 51efba2c7d testutil: NewLogBuffer - buffer logs until a test fails
Replaces #7559

Running tests in parallel, with background goroutines, results in test output not being associated with the correct test. `go test` does not make any guarantees about output from goroutines being attributed to the correct test case.

Attaching log output from background goroutines also cause data races.  If the goroutine outlives the test, it will race with the test being marked done. Previously this was noticed as a panic when logging, but with the race detector enabled it is shown as a data race.

The previous solution did not address the problem of correct test attribution because test output could still be hidden when it was associated with a test that did not fail. You would have to look at all of the log output to find the relevant lines. It also made debugging test failures more difficult because each log line was very long.

This commit attempts a new approach. Instead of printing all the logs, only print when a test fails. This should work well when there are a small number of failures, but may not work well when there are many test failures at the same time. In those cases the failures are unlikely a result of a specific test, and the log output is likely less useful.

All of the logs are printed from the test goroutine, so they should be associated with the correct test.

Also removes some test helpers that were not used, or only had a single caller. Packages which expose many functions with similar names can be difficult to use correctly.

Related:
https://github.com/golang/go/issues/38458 (may be fixed in go1.15)
https://github.com/golang/go/issues/38382#issuecomment-612940030
2020-07-21 12:50:40 -04:00
Matt Keeler 12acdd7481
Disable background cache refresh for Connect Leaf Certs
The rationale behind removing them is that all of our own code (xDS, builtin connect proxy) use the cache notification mechanism. This ensures that the blocking fetch behind the scenes is always executing. Therefore the only way you might go to get a certificate and have to wait is when 1) the request has never been made for that cert before or 2) you are using the v1/agent/connect/ca/leaf API for retrieving the cert yourself.

In the first case, the refresh change doesn’t alter the behavior. In the second case, it can be mitigated by using blocking queries with that API which just like normal cache notification mechanism will cause the blocking fetch to be initiated and to get leaf certs as soon as needed.

If you are not using blocking queries, or Envoy/xDS, or the builtin connect proxy but are retrieving the certs yourself then the HTTP endpoint might take a little longer to respond.

This also renames the RefreshTimeout field on the register options to QueryTimeout to more accurately reflect that it is used for any type that supports blocking queries.
2020-07-21 12:19:25 -04:00
Matt Keeler 9da8c51ac5
Fix issue with changing the agent token causing failure to renew the auto-encrypt certificate
The fallback method would still work but it would get into a state where it would let the certificate expire for 10s before getting a new one. And the new one used the less secure RPC endpoint.

This is also a pretty large refactoring of the auto encrypt code. I was going to write some tests around the certificate monitoring but it was going to be impossible to get a TestAgent configured in such a way that I could write a test that ran in less than an hour or two to exercise the functionality.

Moving the certificate monitoring into its own package will allow for dependency injection and in particular mocking the cache types to control how it hands back certificates and how long those certificates should live. This will allow for exercising the main loop more than would be possible with it coupled so tightly with the Agent.
2020-07-21 12:19:25 -04:00
Daniel Nephin 1910e2a246 checks: wait for goroutine to complete
CheckAlias already had a waitGroup, but the Add() call was happening too late, which was causing a race in tests. The add must happen before the goroutine is started.

CheckHTTP did not have a waitGroup, so I added it to match CheckAlias.

It looks like a lot of the implementation could be shared, and may not need all of channel, waitgroup and bool, but I will leave that refactor for another time.
2020-07-20 18:55:39 -04:00
Daniel Nephin 9482961b1c stream: handle empty event in TestEventSnapshot
When the race detector is enabled we see this test fail occasionally. The reordering of execution seems to make it possible for the snapshot splice to happen before any events are published to the topicBuffers.

We can handle this case in the test the same way it is handled by a subscription, by proceeding to the next event.
2020-07-20 18:20:02 -04:00
Daniel Nephin 57e00b0fd5
Merge pull request #8245 from hashicorp/dnephin/use-not-modified-in-cache
agent/cache: Use AllowNotModified in CatalogListServices
2020-07-20 15:30:52 -04:00
Daniel Nephin 29571465e1
Merge pull request #8290 from hashicorp/dnephin/watch-decode
watch: fix script watches with single arg
2020-07-20 14:41:17 -04:00
Daniel Nephin ecccb30690 state: update calls that are no longer state methods
In a previous commit these methods were changed to functions, so remove the Store paramter.
2020-07-16 15:46:10 -04:00
Daniel Nephin 2008884241 state: un-method funcs that don't use their receiver
This change was mostly automated with the following

First generate a list of functions with:

  git grep -o 'Store) \([^(]\+\)(tx \*txn' ./agent/consul/state | awk '{print $2}' | grep -o '^[^(]\+'

Then the list was curated a bit with trial/error to remove and add funcs
as necessary.

Finally the replacement was done with:

  dir=agent/consul/state
  file=${1-funcnames}

  while read fn; do
    echo "$fn"
    sed -i -e "s/(s \*Store) $fn(/$fn(/" $dir/*.go
    sed -i -e "s/s\.$fn(/$fn(/" $dir/*.go
    sed -i -e "s/s\.store\.$fn(/$fn(/" $dir/*.go
  done < $file
2020-07-16 15:30:39 -04:00
Daniel Nephin 63b153df8c store: convert methods that don't use their receiver to functions
Making these functions allows them to be used without introducing
an artificial dependency on the struct. Many of these will be called
from streaming Event processors, which do not have a store.

This change is being made ahead of the streaming work to get to reduce
the size of the streaming diff.
2020-07-16 15:30:10 -04:00
André 3bc27df844
minor: fix docstring of DNSOnlyPassing (#8318)
In runtime.go it had "duration" but it is actually a boolean.
2020-07-16 09:47:33 -04:00
Daniel Nephin 2ec3760b70 agent/cache: Use AllowNotModifiedResponse in CatalogListServices
Co-authored-by: Pierre Souchay <pierresouchay@users.noreply.github.com>
2020-07-14 18:58:20 -04:00
Daniel Nephin dbb8d14728 agent/cache: Update some docstrings 2020-07-14 18:58:20 -04:00
Daniel Nephin c07acbeb6b stream: Add forceClose and refactor subscription filtering
Move the subscription context to Next. context.Context should generally
never be stored in a struct because it makes that struct only valid
while the context is valid. This is rarely obvious from the caller.
Adds a forceClosed channel in place of the old context, and uses the new
context as a way for the caller to stop the Subscription blocking.

Remove some recursion out of bufferImte.Next. The caller is already looping so we can continue
in that loop instead of recursing. This ensures currentItem is updated immediately (which probably
does not matter in practice), and also removes the chance that we overflow the stack.

NextNoBlock and FollowAfter do not need to handle bufferItem.Err, the caller already
handles it.

Moves filter to a method to simplify Next, and more explicitly separate filtering from looping.

Also improve some godoc

Only unwrap itemBuffer.Err when necessary
2020-07-14 15:57:47 -04:00
Daniel Nephin f19f8e99bb stream: Improve docstrings
Also rename ResumeStrema to EndOfEmptySnapshot to be more consistent with other framing events

Co-authored-by: Paul Banks <banks@banksco.de>
2020-07-14 15:57:47 -04:00
Daniel Nephin fc1c2ae412 stream: change Topic to an interface
Consumers of the package can decide on which type to use for the Topic. In the future we may
use a gRPC type for the topic.
2020-07-14 15:57:47 -04:00
Daniel Nephin 6fa36e3aee state: Move change processing out of EventPublisher
EventPublisher was receiving TopicHandlers, which had a couple of
problems:

- ChangeProcessors were being grouped by Topic, but they completely
  ignored the topic and were performed on every change
- ChangeProcessors required EventPublisher to be aware of database
  changes

By moving ChangeProcesors out of EventPublisher, and having Publish
accept events instead of changes, EventPublisher no longer needs to
be aware of these things.

Handlers is now only SnapshotHandlers, which are still mapped by Topic.

Also allows us to remove the small 'db' package that had only two types.
They can now be unexported types in state.
2020-07-14 15:57:47 -04:00
Daniel Nephin 48c766d2c6 server: Abandom state store to shutdown EventPublisher
So that we don't leak goroutines
2020-07-14 15:57:47 -04:00
Daniel Nephin 889c57fd2d stream: unexport identifiers
Now that EventPublisher is part of stream a lot of the internals can be hidden
2020-07-14 15:57:47 -04:00
Daniel Nephin 4fa0fdc0e0 stream: Move EventPublisher to stream package
The EventPublisher is the central hub of the PubSub system. It is toughly coupled with much of
stream. Some stream internals were exported exclusively for EventPublisher.

The two Subscribe cases (with or without index) were also awkwardly split between two packages. By
moving EventPublisher into stream they are now both in the same package (although still in different files).
2020-07-14 15:57:47 -04:00
Daniel Nephin 489876c86b state: Make handleACLUpdate async once again
So that we keep as much as possible out of the FSM commit hot path.
2020-07-14 15:57:47 -04:00
Daniel Nephin 5f9db94956 state: Use interface for Txn
Also store the index in Changes instead of the Txn.

This change is in preparation for movinng EventPublisher to the stream package, and
making handleACLUpdates async once again.
2020-07-14 15:57:46 -04:00
Daniel Nephin a709ed1ab5 stream.Subscription unexport fields and additiona docstrings 2020-07-14 15:57:46 -04:00
Daniel Nephin 17b833b4c9 Add a context for stopping EventPublisher goroutine 2020-07-14 15:57:46 -04:00
Daniel Nephin 2c8342f115 EventPublisher: Make Unsubscribe a function on Subscription
It is critical that Unsubscribe be called with the same pointer to a
SubscriptionRequest that was used to create the Subscription. The
docstring made that clear, but it sill allowed a caler to get it wrong by
creating a new SubscriptionRequest.

By hiding this detail from the caller, and only exposing an Unsubscribe
method, it should be impossible to fail to Unsubscribe.

Also update some godoc strings.
2020-07-14 15:57:46 -04:00
Daniel Nephin 1622bb3a45 EventPublisher: handleACL changes synchronously
Use a separate lock for subscriptions.ByToken to allow it to happen synchronously
in the commit flow.
This removes the need to create a new txn for the goroutine, and removes
the need for EventPublisher to contain a reference to DB.
2020-07-14 15:57:46 -04:00
Daniel Nephin effab15131 stream.EventSnapshot: reduce the fields on the struct
Many of the fields are only needed in one place, and by using a closure
they can be removed from the struct. This reduces the scope of the variables
making it esier to see how they are used.
2020-07-14 15:57:45 -04:00
Daniel Nephin a5cf933fe8 stream.EventBuffer: Seed the fuzz test with time.Now()
Otherwise the test will run with exactly the same values each time.

By printing the seed we can attempt to reproduce the test by adding an env var to override the seed
2020-07-14 15:57:45 -04:00
Daniel Nephin bbe7272d8e state: memdb_wrapper.go -> memdb.go
Renaming in a separate commit so that git can merge changes to the file.
2020-07-14 15:57:45 -04:00
Daniel Nephin 2a8a8f7b8d state: publish changes from Commit
Make topicRegistry use functions instead of unbound methods
Use a regular memDB in EventPublisher to remove a reference cycle
Removes the need for EventPublisher to use a store
2020-07-14 15:57:45 -04:00
Daniel Nephin f5ecd5de5f EventPublisher: docstrings and getTopicBuffer
also rename commitCh -> publishCh
2020-07-14 15:57:45 -04:00
Daniel Nephin 555cfe52d9 ProcessChanges: use stream.Event
Also remove secretHash, which was used to hash tokens. We don't expose
these tokens anywhere, so we can use the string itself instead of a
Hash.

Fix acl_events_test.go for storing a structs type.
2020-07-14 15:57:45 -04:00
Daniel Nephin 4e0bc8013b stream: Use local types for Event Topic SubscriptionRequest 2020-07-14 15:57:45 -04:00
Daniel Nephin aacd514dca Rename stream_publisher.go -> event_publisher.go 2020-07-14 15:57:44 -04:00
Daniel Nephin c0b0109e80 Add streaming package with Subscription and Snapshot components.
The remaining files from 7965767de0bd62ab07669b85d6879bd5f815d157

Co-authored-by: Paul Banks <banks@banksco.de>
2020-07-14 15:57:44 -04:00
Matt Keeler 8beca1cb8d
Add ability for notifications when one of the agent tokens is updated (#8301)
Co-authored-by: Chris Piraino <cpiraino@hashicorp.com>
2020-07-14 09:53:55 -04:00
Hans Hasselberg 496fb5fc5b
add support for envoy 1.14.4, 1.13.4, 1.12.6 (#8216) 2020-07-13 15:44:44 -05:00
Chris Piraino 4d857d117f
Set enterprise metadata after resolving the token (#8302)
The token can encode enterprise metadata information, and we must make
sure we set that on the reply so that we can correct filter ACLs.
2020-07-13 13:39:57 -05:00
Daniel Nephin 7b3f26e61d watch: Allow args from different types
Fixes a bug where specifying a slice of args with a single item was being converted to
a string when config was loaded, causing an error.
2020-07-10 17:18:32 -04:00
Freddy e72af87918
Add api mod support for /catalog/gateway-services (#8278) 2020-07-10 13:01:45 -06:00
Daniel Nephin 653c938edc watch: extract makeWatchPlan to facilitate testing
There is a bug in here now that slices in opaque config are unsliced. But to test that bug fix we
need a function that can be easily tested.
2020-07-10 13:33:45 -04:00
R.B. Boyer 1eef096dfe
xds: version sniff envoy and switch regular expressions from 'regex' to 'safe_regex' on newer envoy versions (#8222)
- cut down on extra node metadata transmission
- split the golden file generation to compare all envoy version
2020-07-09 17:04:51 -05:00
Daniel Nephin f22f3d300d
Merge pull request #8231 from hashicorp/dnephin/unembed-HTTPServer-Server
agent/http: un-embed the http.Server
2020-07-09 17:42:33 -04:00
Daniel Nephin df4088291c agent/http: Update TestSetupHTTPServer_HTTP2
To remove the need to store the http.Server. This will allow us to
remove the http.Server field from the HTTPServer struct.
2020-07-09 16:42:19 -04:00
Daniel Nephin d98a4c1317
Merge pull request #8237 from hashicorp/dnephin/remove-acls-enabled-from-delegate
Remove ACLsEnabled from delegate interface
2020-07-09 16:35:43 -04:00
Matt Keeler 4fb535ba48
Pass the Config and TLS Configurator into the AutoConfig constructor
This is instead of having the AutoConfigBackend interface provide functions for retrieving them.

NOTE: the config is not reloadable. For now this is fine as we don’t look at any reloadable fields. If that changes then we should provide a way to make it reloadable.
2020-07-08 12:36:11 -04:00
Matt Keeler f2f32735ce
Rename (*Server).forward to (*Server).ForwardRPC
Also get rid of the preexisting shim in server.go that existed before to have this name just call the unexported one.
2020-07-08 11:05:44 -04:00
Matt Keeler d2e4869c7c
Refactor AutoConfig RPC to not have a direct dependency on the Server type
Instead it has an interface which can be mocked for better unit testing that is deterministic and not prone to flakiness.
2020-07-08 11:05:44 -04:00
Chris Piraino 735337b170
Append port number to ingress host domain (#8190)
A port can be sent in the Host header as defined in the HTTP RFC, so we
take any hosts that we want to match traffic to and also add another
host with the listener port added.

Also fix an issue with envoy integration tests not running the
case-ingress-gateway-tls test.
2020-07-07 10:43:04 -05:00
Daniel Nephin 5247ef4c70 Remove ACLsEnabled from delegate interface
In all cases (oss/ent, client/server) this method was returning a value from config. Since the
value is consistent, it doesn't need to be part of the delegate interface.
2020-07-03 17:00:20 -04:00
Daniel Nephin a7f69b615a
Merge pull request #8215 from hashicorp/dnephin/support-not-modified-response-server
agent/consul: Add support for NotModified to two endpoints
2020-07-03 16:15:31 -04:00
Pierre Souchay 20d1ea7d2d
Upgrade go-connlimit to v0.3.0 / return http 429 on too many connections (#8221)
Fixes #7527

I want to highlight this and explain what I think the implications are and make sure we are aware:

* `HTTPConnStateFunc` closes the connection when it is beyond the limit. `Close` does not block.
* `HTTPConnStateFuncWithDefault429Handler(10 * time.Millisecond)` blocks until the following is done (worst case):
  1) `conn.SetDeadline(10*time.Millisecond)` so that
  2) `conn.Write(429error)` is guaranteed to timeout after 10ms, so that the http 429 can be written and 
  3) `conn.Close` can happen

The implication of this change is that accepting any new connection is worst case delayed by 10ms. But only after a client reached the limit already.
2020-07-03 09:25:07 +02:00
Daniel Nephin a5e45defb1 agent/http: un-embed the HTTPServer
The embedded HTTPServer struct is not used by the large HTTPServer
struct. It is used by tests and the agent. This change is a small first
step in the process of removing that field.

The eventual goal is to reduce the scope of HTTPServer making it easier
to test, and split into separate packages.
2020-07-02 17:21:12 -04:00
Daniel Nephin 5d36f98710 agent/consul: Add support for NotModified to two endpoints
A query made with AllowNotModifiedResponse and a MinIndex, where the
result has the same Index as MinIndex, will return an empty response
with QueryMeta.NotModified set to true.

Co-authored-by: Pierre Souchay <pierresouchay@users.noreply.github.com>
2020-07-02 17:05:46 -04:00
Matt Keeler f8e8f48125
Merge pull request #8211 from hashicorp/bugfix/auto-encrypt-various 2020-07-02 09:49:49 -04:00
Yury Evtikhov 10361dd210 DNS: add IsErrQueryNotFound function for easier error evaluation 2020-07-01 03:41:44 +01:00
Yury Evtikhov 8d18422f19 DNS: fix agent returning SERVFAIL where NXDOMAIN should be returned 2020-07-01 01:51:21 +01:00
Yury Evtikhov 3b4ddaaab5 DNS: add test to verify NXDOMAIN is returned when a non-existent domain is queried over RPC 2020-07-01 01:51:16 +01:00
Matt Keeler 6e7acfa618
Add an AutoEncrypt “integration” test
Also fix a bug where Consul could segfault if TLS was enabled but no client certificate was provided. How no one has reported this as a problem I am not sure.
2020-06-30 15:23:29 -04:00
Matt Keeler 2ddcba00c6
Overwrite agent leaf cert trust domain on the servers 2020-06-30 09:59:08 -04:00
Matt Keeler 19040f1166
Store the Connect CA rate limiter on the server
This fixes a bug where auto_encrypt was operating without utilizing a common rate limiter.
2020-06-30 09:59:07 -04:00
Matt Keeler a5a9560bbd
Initialize the agent leaf cert cache result with a state to prevent unnecessary second certificate signing 2020-06-30 09:59:07 -04:00
Matt Keeler 39b567a55a
Fix auto_encrypt IP/DNS SANs
The initial auto encrypt CSR wasn’t containing the user supplied IP and DNS SANs. This fixes that. Also We were configuring a default :: IP SAN. This should be ::1 instead and was fixed.
2020-06-30 09:59:07 -04:00
Matt Keeler 85fd8c552f
Merge pull request #8193 from hashicorp/feature/auto-config/suppress-config-warnings 2020-06-27 10:06:52 -04:00
R.B. Boyer 462f0f37ed
connect: various changes to make namespaces for intentions work more like for other subsystems (#8194)
Highlights:

- add new endpoint to query for intentions by exact match

- using this endpoint from the CLI instead of the dump+filter approach

- enforcing that OSS can only read/write intentions with a SourceNS or
  DestinationNS field of "default".

- preexisting OSS intentions with now-invalid namespace fields will
  delete those intentions on initial election or for wildcard namespaces
  an attempt will be made to downgrade them to "default" unless one
  exists.

- also allow the '-namespace' CLI arg on all of the intention subcommands

- update lots of docs
2020-06-26 16:59:15 -05:00
Matt Keeler be576c9737
Use the DNS and IP SANs from the auto config stanza when set 2020-06-26 16:01:30 -04:00
Matt Keeler e8b39dd255
Overhaul the auto-config translation
This fixes some issues around spurious warnings about using enterprise configuration in OSS.
2020-06-26 15:25:21 -04:00
Freddy 10d6e9c458
Split up unused key validation for oss/ent (#8189)
Split up unused key validation in config entry decode for oss/ent.

This is needed so that we can return an informative error in OSS if namespaces are provided.
2020-06-25 13:58:29 -06:00
Daniel Nephin a891ee8428
Merge pull request #8176 from hashicorp/dnephin/add-linter-unparam-1
lint: add unparam linter and fix some of the issues
2020-06-25 15:34:48 -04:00
Matt Keeler 7041f69892
Merge pull request #8184 from hashicorp/bugfix/goroutine-leaks 2020-06-25 09:22:19 -04:00
Chris Piraino df48db0abd
Merge pull request #7932 from hashicorp/ingress/internal-ui-endpoint-multiple-ports
Update gateway-services-nodes API endpoint to allow multiple addresses
2020-06-24 17:11:01 -05:00
Chris Piraino f213d3592a remove obsolete comments about test parallelization 2020-06-24 16:36:13 -05:00
Chris Piraino b3db907bdf Update gateway-services-nodes API endpoint to allow multiple addresses
Previously, we were only returning a single ListenerPort for a single
service. However, we actually allow a single service to be serviced over
multiple ports, as well as allow users to define what hostnames they
expect their services to be contacted over. When no hosts are defined,
we return the default ingress domain for any configured DNS domain.

To show this in the UI, we modify the gateway-services-nodes API to
return a GatewayConfig.Addresses field, which is a list of addresses
over which the specific service can be contacted.
2020-06-24 16:35:23 -05:00
Matt Keeler e9835610f3
Add a test for go routine leaks
This is in its own separate package so that it will be a separate test binary that runs thus isolating the go runtime from other tests and allowing accurate go routine leak checking.

This test would ideally use goleak.VerifyTestMain but that will fail 100% of the time due to some architectural things (blocking queries and net/rpc uncancellability).

This test is not comprehensive. We should enable/exercise more features and more cluster configurations. However its a start.
2020-06-24 17:09:50 -04:00
Matt Keeler 29d0cfdd7d
Fix go routine leak in auto encrypt ca roots tracking 2020-06-24 17:09:50 -04:00
Matt Keeler 25a4f3c83b
Allow cancelling blocking queries in response to shutting down. 2020-06-24 17:09:50 -04:00
Daniel Nephin 0279bf6fe5 Update TestAgent_GetCoordinate
The old test case was a very specific regresion test for a case that is no longer possible.
Replaced with a new test that checks the default coordinate is returned.
2020-06-24 13:00:15 -04:00
Daniel Nephin f65e21e6dc Remove unused return values 2020-06-24 13:00:15 -04:00
Daniel Nephin 010a609912 Fix a bunch of unparam lint issues 2020-06-24 13:00:14 -04:00
Matt Keeler 15e7b3940c
Ensure that retryLoopBackoff can be cancelled
We needed to pass a cancellable context into the limiter.Wait instead of context.Background. So I made the func take a context instead of a chan as most places were just passing through a Done chan from a context anyways.

Fix go routine leak in the gateway locator
2020-06-24 12:41:08 -04:00
Matt Keeler e2cfa93f02
Don’t leak metrics go routines in tests (#8182) 2020-06-24 10:15:25 -04:00
gitforbit 808f632346
agent-http: cleanup: return nil instead of err (#8043)
Since err is already checked, it should return `nil`
2020-06-24 14:29:21 +02:00
R.B. Boyer c63c994b04
connect: upgrade github.com/envoyproxy/go-control-plane to v0.9.5 (#8165) 2020-06-23 15:19:56 -05:00
freddygv c791fbc79c Update namespaces subject-verb agreement 2020-06-23 10:57:30 -06:00
freddygv 044d027ff8 Remove break 2020-06-22 19:59:04 -06:00
freddygv 70810b0602 Let users know namespaces are ent only in config entry decode 2020-06-22 19:59:04 -06:00
Pierre Souchay 35d852fd9a
Returns DNS Error NSDOMAIN when DC does not exists (#8103)
This will allow to increase cache value when DC is not valid (aka
return SOA to avoid too many consecutive requests) and will
distinguish DC being temporarily not available from DC not existing.

Implements https://github.com/hashicorp/consul/issues/8102
2020-06-22 09:01:48 -04:00
Matt Keeler 4a5b352c18
Require enabling TLS to enable Auto Config (#8159)
On the servers they must have a certificate.

On the clients they just have to set verify_outgoing to true to attempt TLS connections for RPCs.

Eventually we may relax these restrictions but right now all of the settings we push down (acl tokens, acl related settings, certificates, gossip key) are sensitive and shouldn’t be transmitted over an unencrypted connection. Our guides and docs should recoommend verify_server_hostname on the clients as well.

Another reason to do this is weird things happen when making an insecure RPC when TLS is not enabled. Basically it tries TLS anyways. We should probably fix that to make it clearer what is going on.
2020-06-19 16:38:14 -04:00
Freddy 5baa7b1b04
Always return a gateway cluster (#8158) 2020-06-19 13:31:39 -06:00
Matt Keeler d6e05482ab
Allow cancelling startup when performing auto-config (#8157)
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
2020-06-19 15:16:00 -04:00
Daniel Nephin 4f17350928
Merge pull request #8147 from hashicorp/dnephin/remove-private-ip-2
Remove some dead code from agent/consul/util.go
2020-06-18 15:51:09 -04:00
Matt Keeler b0fcf86140 Change auto config authorizer to allow for future extension
The envisioned changes would allow extra settings to enable dynamically defined auth methods to be used instead of  or in addition to the statically defined one in the configuration.
2020-06-18 15:22:24 -04:00
Daniel Nephin b0ba546a1f Remove bytesToUint64 from agent/consul 2020-06-18 12:45:43 -04:00
Daniel Nephin a00f007c5e Remove unused private IP code from agent/consul 2020-06-18 12:40:38 -04:00
Matt Keeler 3dbbd2d37d
Implement Client Agent Auto Config
There are a couple of things in here.

First, just like auto encrypt, any Cluster.AutoConfig RPC will implicitly use the less secure RPC mechanism.

This drastically modifies how the Consul Agent starts up and moves most of the responsibilities (other than signal handling) from the cli command and into the Agent.
2020-06-17 16:49:46 -04:00
Matt Keeler 8b7d669a27
Allow the Agent its its child Client/Server to share a connection pool
This is needed so that we can make an AutoConfig RPC at the Agent level prior to creating the Client/Server.
2020-06-17 16:19:33 -04:00
Matt Keeler 51c3a605ad
Merge pull request #8035 from hashicorp/feature/auto-config/server-rpc 2020-06-17 16:07:25 -04:00
Chris Piraino 79a862d019
Remove ACLEnforceVersion8 from tests (#8138)
The field had been deprecated for a while and was recently removed,
however a PR which added these tests prior to removal was merged.
2020-06-17 14:58:01 -05:00
Daniel Nephin 692a4a8fc8
Merge pull request #7762 from hashicorp/dnephin/warn-on-unknown-service-file
config: warn if a config file is being skipped because of its file extension
2020-06-17 15:14:40 -04:00
Daniel Nephin be29d6bf75 config: warn when a config file is skipped
All commands which read config (agent, services, and validate) will now
print warnings when one of the config files is skipped because it did
not match an expected format.

Also ensures that config validate prints all warnings.
2020-06-17 13:08:54 -04:00
Daniel Nephin 5afcf5c1bc
Merge pull request #8034 from hashicorp/dnephin/add-linter-staticcheck-4
ci: enable SA4006 staticcheck check and add ineffassign
2020-06-17 12:16:02 -04:00
Matt Keeler 9b01f9423c
Implement the insecure version of the Cluster.AutoConfig RPC endpoint
Right now this is only hooked into the insecure RPC server and requires JWT authorization. If no JWT authorizer is setup in the configuration then we inject a disabled “authorizer” to always report that JWT authorization is disabled.
2020-06-17 11:25:29 -04:00
Pierre Souchay d31691dc87
gossip: Ensure that metadata of Consul Service is updated (#7903)
While upgrading servers to a new version, I saw that metadata of
existing servers are not upgraded, so the version and raft meta
is not up to date in catalog.

The only way to do it was to:
 * update Consul server
 * make it leave the cluster, then metadata is accurate

That's because the optimization to avoid updating catalog does
not take into account metadata, so no update on catalog is performed.
2020-06-17 12:16:13 +02:00
Daniel Nephin d345cd8d30 ci: Add ineffsign linter
And fix an additional ineffective assignment that was not caught by staticcheck
2020-06-16 17:32:50 -04:00
Daniel Nephin a9851e1812
Merge pull request #8070 from hashicorp/dnephin/add-gofmt-simplify
ci: Enable gofmt simplify
2020-06-16 17:18:38 -04:00
Matt Keeler 9f7b22a5eb
Agent Auto Configuration: Configuration Syntax Updates (#8003) 2020-06-16 15:03:22 -04:00
Daniel Nephin 068b43df90 Enable gofmt simplify
Code changes done automatically with 'gofmt -s -w'
2020-06-16 13:21:11 -04:00
Daniel Nephin cb050b280c ci: enable SA4006 staticcheck check
And fix the 'value not used' issues.

Many of these are not bugs, but a few are tests not checking errors, and
one appears to be a missed error in non-test code.
2020-06-16 13:10:11 -04:00
Daniel Nephin f7c84ad802 Rename txnWrapper to txn 2020-06-16 13:06:02 -04:00
Daniel Nephin 32aa3ada35 Rename db 2020-06-16 13:04:31 -04:00
Daniel Nephin deef6fcc32 Handle return value from txn.Commit 2020-06-16 13:04:31 -04:00
Daniel Nephin 59bac0f99d state: Update docstrings for changeTrackerDB and txn
And un-embed memdb.DB to prevent accidental access to underlying
methods.
2020-06-16 13:04:31 -04:00
Paul Banks f6ac08be04 state: track changes so that they may be used to produce change events 2020-06-16 13:04:29 -04:00
Matt Keeler d3881dd754
ACL Node Identities (#7970)
A Node Identity is very similar to a service identity. Its main targeted use is to allow creating tokens for use by Consul agents that will grant the necessary permissions for all the typical agent operations (node registration, coordinate updates, anti-entropy).

Half of this commit is for golden file based tests of the acl token and role cli output. Another big updates was to refactor many of the tests in agent/consul/acl_endpoint_test.go to use the same style of tests and the same helpers. Besides being less boiler plate in the tests it also uses a common way of starting a test server with ACLs that should operate without any warnings regarding deprecated non-uuid master tokens etc.
2020-06-16 12:54:27 -04:00
Daniel Nephin 476b57fe22 config: refactor to consolidate all File->Source loading
Previously the logic for reading ConfigFiles and produces Sources was split
between NewBuilder and Build. This commit moves all of the logic into NewBuilder
so that Build() can operate entirely on Sources.

This change is in preparation for logging warnings when files have an
unsupported extension.

It also reduces the scope of BuilderOpts, and gets us very close to removing
Builder.options.
2020-06-16 12:52:23 -04:00
Daniel Nephin 219790ca49 config: Make ConfigFormat not a pointer
The nil value was never used. We can avoid a bunch of complications by
making the field a string value instead of a pointer.

This change is in preparation for fixing a silent config failure.
2020-06-16 12:52:22 -04:00
Daniel Nephin 77101eee82 config: rename Flags to BuilderOpts
Flags is an overloaded term in this context. It generally is used to
refer to command line flags. This struct, however, is a data object
used as input to the construction.

It happens to be partially populated by command line flags, but
otherwise has very little to do with them.

Renaming this struct should make the actual responsibility of this struct
more obvious, and remove the possibility that it is confused with
command line flags.

This change is in preparation for adding additional fields to
BuilderOpts.
2020-06-16 12:51:19 -04:00
Daniel Nephin 85e0338136 config: remove Args field from Flags
This field was populated for one reason, to test that it was empty.
Of all the callers, only a single one used this functionality. The rest
constructed a `Flags{}` struct which did not set Args.

I think this shows that the logic was in the wrong place. Only the agent
command needs to care about validating the args.

This commit removes the field, and moves the logic to the one caller
that cares.

Also fix some comments.
2020-06-16 12:49:53 -04:00
Daniel Nephin 73cd0b6fac agent/service_manager: remove 'updateCh' field from serviceConfigWatch
Passing the channel to the function which uses it significantly
reduces the scope of the variable, and makes its usage more explicit. It
also moves the initialization of the channel closer to where it is used.

Also includes a couple very small cleanups to remove a local var and
read the error from `ctx.Err()` directly instead of creating a channel
to check for an error.
2020-06-16 12:15:57 -04:00
Daniel Nephin 26291a8482 agent/service_manager: remove 'defaults' field from serviceConfigWatch
This field was always read by the same function that populated the field,
so it does not need to be a field. Passing the value as an argument to
functions makes it more obvious where the value comes from, and also reduces
the scope of the variable significantly.
2020-06-16 12:15:52 -04:00
Daniel Nephin 93235da253 agent/service_manager: Pass ctx around
[The documentation for context](https://golang.org/pkg/context/)
recommends not storing context in a struct field:

> Do not store Contexts inside a struct type; instead, pass a Context
> explicitly to each function that needs it. The Context should be the
> first parameter, typically named ctx...

Sometimes there are good reasons to not follow this recommendation, but
in this case it seems easy enough to follow.

Also moved the ctx argument to be the first in one of the function calls
to follow the same recommendation.
2020-06-16 12:14:00 -04:00
Daniel Nephin 2eac5b8023
Merge pull request #8074 from hashicorp/dnephin/remove-references-to-PatchSliceOfMaps
Update comments that reference PatchSliceOfMaps
2020-06-15 14:33:10 -04:00
Matt Keeler 8837907de4
Make the Agent Cache more Context aware (#8092)
Blocking queries issues will still be uncancellable (that cannot be helped until we get rid of net/rpc). However this makes it so that if calling getWithIndex (like during a cache Notify go routine) we can cancell the outer routine. Previously it would keep issuing more blocking queries until the result state actually changed.
2020-06-15 11:01:25 -04:00
freddygv d97cff0966 Update telemetry for gateway-services endpoint 2020-06-12 14:44:36 -06:00
freddygv cd927eed5e Remove unused method and fixup docs ref 2020-06-12 13:47:43 -06:00