Commit Graph

2708 Commits

Author SHA1 Message Date
freddygv 81115b6eaa Compile down LB policy to disco chain nodes 2020-08-28 13:11:04 -06:00
Daniel Nephin 6956477be5
Merge pull request #8548 from edevil/fix_flake
Fix flaky TestACLResolver_Client/Concurrent-Token-Resolve
2020-08-28 15:10:55 -04:00
Daniel Nephin 72bf350069
Merge pull request #8552 from pierresouchay/reload_cache_throttling_config
Ensure that Cache options are reloaded when `consul reload` is performed
2020-08-28 15:04:42 -04:00
Pierre Souchay d5974b1d17 Added Unit test for cache reloading 2020-08-28 13:03:58 +02:00
freddygv ff56a64b08 Add LB policy to service-resolver 2020-08-27 19:44:02 -06:00
Jack 9e1c6727f9
Add http2 and grpc support to ingress gateways (#8458) 2020-08-27 15:34:08 -06:00
R.B. Boyer 74d5df7c7a
xds: use envoy's rbac filter to handle intentions entirely within envoy (#8569) 2020-08-27 12:20:58 -05:00
R.B. Boyer d1843456d2
agent: ensure that we normalize bootstrapped config entries (#8547) 2020-08-27 11:37:25 -05:00
Pierre Souchay 9a64d3e5fe Also test reload of EntryFetchMaxBurst 2020-08-27 18:14:05 +02:00
Matt Keeler f97cc0445a
Move RPC router from Client/Server and into BaseDeps (#8559)
This will allow it to be a shared component which is needed for AutoConfig
2020-08-27 11:23:52 -04:00
Pierre Souchay 5842a902df Tests that changes in rate limit are taken into account by agent 2020-08-27 16:41:20 +02:00
Pierre Souchay 879d087f65 Added `options.Equals()` and minor fixes indentation fixes 2020-08-27 13:44:45 +02:00
R.B. Boyer fead4fc2a5
agent: expose the list of supported envoy versions on /v1/agent/self (#8545) 2020-08-26 10:04:11 -05:00
Kyle Havlovitz 97f1f341d6 Automatically renew the token used by the Vault CA provider 2020-08-25 10:34:49 -07:00
Pierre Souchay d2be9d38da Ensure that Cache options are reloaded when `consul reload` is performed.
This will apply cache throttling parameters are properly applied:
 * cache.EntryFetchMaxBurst
 * cache.EntryFetchRate

When values are updated, a log is displayed in info.
2020-08-24 23:33:10 +02:00
André Cruz 9a0792139c
Decrease test flakiness
Fix flaky TestACLResolver_Client/Concurrent-Token-Resolve and TestCacheNotifyPolling
2020-08-24 20:30:02 +01:00
André Cruz aa212423e3
testing: Fix govet errors 2020-08-21 18:01:55 +01:00
Daniel Nephin 01745feec0
Merge pull request #8537 from hashicorp/dnephin/fix-panic-on-connect-nil
Fix panic when decoding 'Connect: null'
2020-08-20 18:00:25 -04:00
Daniel Nephin 07ad662131 Fix panic when decoding 'Connect: null'
Surprisingly the json Unmarshal updates the aux pointer to a nil.
2020-08-20 17:52:14 -04:00
Daniel Nephin e16375216d config: use logging.Config in RuntimeConfig
To add structure to RuntimeConfig, and remove the need to translate into a third type.
2020-08-19 13:21:00 -04:00
Daniel Nephin f2373a5575 logging: move init of grpclog
This line initializes global state. Moving it out of the constructor and closer to where logging
is setup helps keep related things together.
2020-08-19 13:21:00 -04:00
Daniel Nephin 33c401a16e logging: Setup accept io.Writer instead of []io.Writer
Also accept a non-pointer Config, since the config is not modified
2020-08-19 13:20:41 -04:00
Daniel Nephin 63bad36de7 testing: disable global metrics sink in tests
This might be better handled by allowing configuration for the InMemSink interval and retail, and disabling
the global. For now this is a smaller change to remove the goroutine leak caused by tests because go-metrics
does not provide any way of shutting down the global goroutine.
2020-08-18 19:04:57 -04:00
Daniel Nephin 5d4df54296 agent: extract dependency creation from New
With this change, Agent.New() accepts many of the dependencies instead
of creating them in New. Accepting fully constructed dependencies from
a constructor makes the type easier to test, and easier to change.

There are still a number of dependencies created in Start() which can
be addressed in a follow up.
2020-08-18 19:04:55 -04:00
Daniel Nephin 51b08c645b
Merge pull request #8514 from hashicorp/dnephin/testing-improvements-1
testing: small improvements to TestSessionCreate and testutil.retry
2020-08-18 18:26:05 -04:00
Daniel Nephin ab2157bbc9
Merge pull request #8528 from hashicorp/dnephin/move-node-name-validation
config: Move some config validation from Agent.Start to config.Builder.Validate
2020-08-18 18:25:41 -04:00
Hans Hasselberg a932aafc91
add primary keys to list keyring (#8522)
During gossip encryption key rotation it would be nice to be able to see if all nodes are using the same key. This PR adds another field to the json response from `GET v1/operator/keyring` which lists the primary keys in use per dc. That way an operator can tell when a key was successfully setup as primary key.

Based on https://github.com/hashicorp/serf/pull/611 to add primary key to list keyring output:

```json
[
  {
    "WAN": true,
    "Datacenter": "dc2",
    "Segment": "",
    "Keys": {
      "0OuM4oC3Os18OblWiBbZUaHA7Hk+tNs/6nhNYtaNduM=": 6,
      "SINm887hKTzmMWeBNKTJReaTLX3mBEJKriDyt88Ad+g=": 6
    },
    "PrimaryKeys": {
      "SINm887hKTzmMWeBNKTJReaTLX3mBEJKriDyt88Ad+g=": 6
    },
    "NumNodes": 6
  },
  {
    "WAN": false,
    "Datacenter": "dc2",
    "Segment": "",
    "Keys": {
      "0OuM4oC3Os18OblWiBbZUaHA7Hk+tNs/6nhNYtaNduM=": 8,
      "SINm887hKTzmMWeBNKTJReaTLX3mBEJKriDyt88Ad+g=": 8
    },
    "PrimaryKeys": {
      "SINm887hKTzmMWeBNKTJReaTLX3mBEJKriDyt88Ad+g=": 8
    },
    "NumNodes": 8
  },
  {
    "WAN": false,
    "Datacenter": "dc1",
    "Segment": "",
    "Keys": {
      "0OuM4oC3Os18OblWiBbZUaHA7Hk+tNs/6nhNYtaNduM=": 3,
      "SINm887hKTzmMWeBNKTJReaTLX3mBEJKriDyt88Ad+g=": 8
    },
    "PrimaryKeys": {
      "SINm887hKTzmMWeBNKTJReaTLX3mBEJKriDyt88Ad+g=": 8
    },
    "NumNodes": 8
  }
]
```

I intentionally did not change the CLI output because I didn't find a good way of displaying this information. There are a couple of options that we could implement later:
* add a flag to show the primary keys
* add a flag to show json output

Fixes #3393.
2020-08-18 09:50:24 +02:00
Daniel Nephin 35f1ecee0b config: Move remote-script-checks warning to config
Previously it was done in Agent.Start, but it can be done much earlier
2020-08-17 17:39:49 -04:00
Daniel Nephin 27b36bfc4e config: move NodeName validation to config validation
Previsouly it was done in Agent.Start, which is much later then it needs to be.

The new 'dns' package was required, because otherwise there would be an
import cycle. In the future we should move more of the dns server into
the dns package.
2020-08-17 17:25:02 -04:00
Daniel Nephin b4015969c9
Merge pull request #8515 from hashicorp/dnephin/unexport-testing-shims
config: unexport fields and resolve TODOs in config.Builder
2020-08-17 16:03:07 -04:00
Daniel Nephin 16217fe9b9 testing: use t.Cleanup in testutil.TempFile
So that it has the same behaviour as TempDir.

Also remove the now unnecessary 'defer os.Remove'
2020-08-14 20:06:01 -04:00
Daniel Nephin d68edcecf4 testing: Remove all the defer os.Removeall
Now that testutil uses t.Cleanup to remove the directory the caller no longer has to manage
the removal
2020-08-14 19:58:53 -04:00
Daniel Nephin 8a4d292c8e config: unexport and resolve TODOs in config.Builder
- unexport testing shims, and document their purpose
- resolve a TODO by moving validation to NewBuilder and storing the one
  field that is used instead of all of Options
- create a slice with the correct size to avoid extra allocations
2020-08-14 19:23:32 -04:00
Daniel Nephin a4b201af36 testing: Improve session_endpoint_test
While working on another change I caused a bunch of these tests to fail.
Unfortunately the failure messages were not super helpful at first.

One problem was that the request and response were created outside of
the retry. This meant that when the second attempt happened, the request
body was empty (because the buffer had been consumed), and so the
request was not actually being retried. This was fixed by moving more of
the request creation into the retry block.

Another problem was that these functions can return errors in two ways, and
are not consistent about which way they use. Some errors are returned to
the response writer, but the tests were not checking those errors, which
was causing a panic later on. This was fixed by adding a check for the
response code.

Also adds some missing t.Helper(), and has assertIndex use checkIndex so
that it is clear these are the same implementation.
2020-08-14 18:55:52 -04:00
Daniel Nephin 070e843113 testutil: Add t.Cleanup to TempDir
TempDir registers a Cleanup so that the directory is always removed. To disable to cleanup, set the TEST_NOCLEANUP
env var.
2020-08-14 13:19:10 -04:00
Daniel Nephin 2b920ad199 testing: fix flaky test TestDNS_NonExistentDC_RPC
I saw this test flake locally, and it was easy to reproduce with -count=10.

The failure was: 'TestAgent.dns: rpc error: error=No known Consul servers'.

Waiting for the agent seems to fix it.
2020-08-13 18:03:04 -04:00
Daniel Nephin 1912c5ad89 testing: wait until monitor has started before shutdown
This commit fixes a test that I saw flake locally while running tests. The test output from the monitor
started immediately after the line the test was looking for.

To fix the problem a channel is closed when the goroutine starts. Shutdown is not called until this channel
is closed, which seems to greatly reduce the chance of a flake.
2020-08-13 17:53:29 -04:00
Daniel Nephin 3a4e62836b testing: Remove TestAgent.Key and change TestAgent.DataDir
TestAgent.Key was only used by 3 tests. Extracting it from the common helper that is used in hundreds of
tests helps keep the shared part small and more focused.

This required a second change (which I was planning on making anyway), which was to change the behaviour of
DataDir. Now in all cases the TestAgent will use the DataDir, and clean it up once the test is complete.
2020-08-13 17:53:24 -04:00
Daniel Nephin b1679508d4 testing: use t.Cleanup in TestAgent for returnPorts 2020-08-13 17:09:37 -04:00
Daniel Nephin 4e8e0de8f0 testing: remove unused fields from TestACLAgent 2020-08-13 17:03:55 -04:00
Daniel Nephin 399c77dfb6 agent: rename vars in newConsulConfig
'base' is a bit misleading, since it is the return value. Renamed to cfg.
2020-08-13 11:58:21 -04:00
Daniel Nephin 7b5b170a0d agent: Move setupKeyring functions to keyring.go
There are a couple reasons for this change:

1. agent.go is way too big. Smaller files makes code eaasier to read
   because tools that show usage also include filename which can give
   a lot more context to someone trying to understand which functions
   call other functions.
2. these two functions call into a large number of functions already in
   keyring.go.
2020-08-13 11:58:21 -04:00
Daniel Nephin 9919e5dfa5 agent: unmethod consulConfig
To allow us to move newConsulConfig out of Agent.
2020-08-13 11:58:21 -04:00
Daniel Nephin 8f596f5551 Fix conflict in merged PRs
One PR renamed the var from config->cfg, and another used the old name config, which caused the
build to fail on master.
2020-08-13 11:28:26 -04:00
Daniel Nephin d677706625 state: remove unused Store method receiver
And use ReadTxn interface where appropriate.
2020-08-13 11:25:22 -04:00
Daniel Nephin 190fcc14a3
Merge pull request #8463 from hashicorp/dnephin/unmethod-make-node-id
agent: convert NodeID methods to functions
2020-08-13 11:18:11 -04:00
Daniel Nephin 912aae8624
Merge pull request #8461 from hashicorp/dnephin/remove-notify-shutdown
agent/consul: Remove NotifyShutdown
2020-08-13 11:16:48 -04:00
Daniel Nephin 5b37efd91b
Merge pull request #8365 from hashicorp/dnephin/fix-service-by-node-meta-flake
state: speed up tests that use watchLimit
2020-08-13 11:16:12 -04:00
Daniel Nephin 37eacf8192 auto-config: reduce awareness of config
This is a small step to allowing Agent to accept its dependencies
instead of creating them in New.

There were two fields in autoconfig.Config that were used exclusively
to load config. These were replaced with a single function, allowing us
to move LoadConfig back to the config package.

Also removed the WithX functions for building a Config. Since these were
simple assignment, it appeared we were not getting much value from them.
2020-08-12 13:23:23 -04:00
Daniel Nephin e07554500e Remove check that hostID is a uuid.
Immediately afterward we hash the ID, so it does not need to be a uuid anymore.
2020-08-12 13:05:10 -04:00
Daniel Nephin 875d8bde42 agent: convert NodeID methods to functions
Making these functions allows us to cleanup how an agent is initialized. They only make use of a config and a logger, so they do not need to be agent methods.

Also cleanup the testing to use t.Run and require.
2020-08-12 13:05:10 -04:00
Daniel Nephin 0738eb8596 Extract nodeID functions to a different file
In preparation for turning them into functions.
To reduce the scope of Agent, and refactor how Agent is created and started.
2020-08-12 13:05:10 -04:00
R.B. Boyer e3cd4a8539
connect: use stronger validation that ingress gateways have compatible protocols defined for their upstreams (#8470)
Fixes #8466

Since Consul 1.8.0 there was a bug in how ingress gateway protocol
compatibility was enforced. At the point in time that an ingress-gateway
config entry was modified the discovery chain for each upstream was
checked to ensure the ingress gateway protocol matched. Unfortunately
future modifications of other config entries were not validated against
existing ingress-gateway definitions, such as:

1. create tcp ingress-gateway pointing to 'api' (ok)
2. create service-defaults for 'api' setting protocol=http (worked, but not ok)
3. create service-splitter or service-router for 'api' (worked, but caused an agent panic)

If you were to do these in a different order, it would fail without a
crash:

1. create service-defaults for 'api' setting protocol=http (ok)
2. create service-splitter or service-router for 'api' (ok)
3. create tcp ingress-gateway pointing to 'api' (fail with message about
   protocol mismatch)

This PR introduces the missing validation. The two new behaviors are:

1. create tcp ingress-gateway pointing to 'api' (ok)
2. (NEW) create service-defaults for 'api' setting protocol=http ("ok" for back compat)
3. (NEW) create service-splitter or service-router for 'api' (fail with
   message about protocol mismatch)

In consideration for any existing users that may be inadvertently be
falling into item (2) above, that is now officiall a valid configuration
to be in. For anyone falling into item (3) above while you cannot use
the API to manufacture that scenario anymore, anyone that has old (now
bad) data will still be able to have the agent use them just enough to
generate a new agent/proxycfg error message rather than a panic.
Unfortunately we just don't have enough information to properly fix the
config entries.
2020-08-12 11:19:20 -05:00
Freddy d72f72dcd5
Notify alias checks when aliased service is [de]registered (#8456) 2020-08-12 09:47:41 -06:00
Daniel Nephin 3d96c5b651
Merge pull request #8469 from hashicorp/dnephin/config-source
config: make Source an interface to avoid the marshal/unmarshal cycle in auto-config
2020-08-12 11:17:15 -04:00
Hans Hasselberg aacf0fd777
Merge pull request #8471 from hashicorp/local_only
thread local-only through the layers
2020-08-12 08:54:51 +02:00
Freddy 875816d0d3
Internal endpoint to query intentions associated with a gateway (#8400) 2020-08-11 17:20:41 -06:00
Kyle Havlovitz 635952681e Fix a state store comment about version 2020-08-11 13:46:12 -07:00
Kyle Havlovitz c39a275666 fsm: Fix snapshot bug with restoring node/service/check indexes 2020-08-11 11:49:52 -07:00
Hans Hasselberg aff02198d7 Refactor keyring ops:
* changes some functions to return data instead of modifying pointer
  arguments
* renames globalRPC() to keyringRPCs() to make its purpose more clear
* restructures KeyringOperation() to make it more understandable
2020-08-11 13:42:03 +02:00
Hans Hasselberg 07261db64d thread local-only through the layers
$ consul keyring -list -local-only
==> Gathering installed encryption keys...

dc1 (LAN):
  aUlAW4ST3+vwseI61so24CoORkyjZofcmHk+j7QPSYQ= [1/1]
2020-08-11 13:41:53 +02:00
Daniel Nephin 4297a8ba07 auto-config: Avoid the marshal/unmarshal cycle in auto-config
Use a LiteralConfig and return a config.Config from translate.
2020-08-10 20:07:52 -04:00
freddygv de0b574a26 Update error handling 2020-08-10 17:48:22 -06:00
Daniel Nephin 38980ebb4c config: Make Source an interface
This will allow us to accept config from auto-config without needing to
go through a serialziation cycle.
2020-08-10 12:46:28 -04:00
Mike Morris ff37af1129
changelog: Update for 1.8.2, 1.7.6, 1.7.5 and 1.6.7 (#8462)
* update bindata_assetfs.go

* Release v1.8.2

* Putting source back into Dev Mode

* changelog: add entries for 1.7.6, 1.7.5 and 1.6.7

Co-authored-by: hashicorp-ci <hashicorp-ci@users.noreply.github.com>
2020-08-07 18:58:09 -04:00
Daniel Nephin 80e99cb3e6 testing: remove unnecessary defers in tests
The data directory is now removed by the test helper that created it.
2020-08-07 17:28:16 -04:00
Daniel Nephin 7dbacf297c testing: Remove NotifyShutdown
NotifyShutdown was only used for testing. Now that t.Cleanup exists, we
can use that instead of attaching cleanup to the Server shutdown.

The Autopilot test which used NotifyShutdown doesn't need this
notification because Shutdown is synchronous. Waiting for the function
to return is equivalent.
2020-08-07 17:14:44 -04:00
Matt Keeler 67dec3b609
Require token replication to be enabled in secondary dcs when ACLs are enabled with AutoConfig (#8451)
AutoConfig will generate local tokens for clients and the ability to use local tokens is gated off of token replication being enabled and being configured with a replication token. Therefore we already have a hard requirement on having token replication enabled, this commit just makes sure to surface that to the operator instead of having to discern what the issue is from RPC errors.
2020-08-07 10:20:27 -04:00
Hans Hasselberg d316cd06c1
auto_config implies connect (#8433) 2020-08-07 12:02:02 +02:00
freddygv c8f5215e9d Fix test build 2020-08-06 11:31:56 -06:00
Hans Hasselberg 51a8e15cf8
Mark its own cluster as healthy when rebalancing. (#8406)
This code started as an optimization to avoid doing an RPC Ping to
itself. But in a single server cluster the rebalancing was led to
believe that there were no healthy servers because foundHealthyServer
was not set. Now this is being set properly.

Fixes #8401 and #8403.
2020-08-06 10:42:09 +02:00
freddygv 15c3cfce5e PR comments and addtl tests 2020-08-05 16:07:11 -06:00
Daniel Nephin ae382805bd
Merge pull request #8404 from hashicorp/dnephin/remove-log-output-field
Use Logger consistently, instead of LogOutput
2020-08-05 14:31:43 -04:00
Daniel Nephin 3b82ad0955 Rename NewClient/NewServer
Now that duplicate constructors have been removed we can use the shorter names for the single constructor.
2020-08-05 14:00:55 -04:00
Daniel Nephin 0420d91cdd Remove LogOutput from Agent
Now that it is no longer used, we can remove this unnecessary field. This is a pre-step in cleanup up RuntimeConfig->Consul.Config, which is a pre-step to adding a gRPCHandler component to Server for streaming.

Removing this field also allows us to remove one of the return values from logging.Setup.
2020-08-05 14:00:44 -04:00
Daniel Nephin 5acf01ceeb Remove LogOutput from Server 2020-08-05 14:00:44 -04:00
Daniel Nephin 0c5428eea8 Remove LogOutput from Client 2020-08-05 14:00:42 -04:00
Daniel Nephin e8ee2cf2f7 Pass a logger to ConnPool and yamux, instead of an io.Writer
Allowing us to remove the LogOutput field from config.
2020-08-05 13:25:08 -04:00
Daniel Nephin ed8210fe4d api: Use a Logger instead of an io.Writer in api.Watch
So that we can pass around only a Logger, not a LogOutput
2020-08-05 13:25:08 -04:00
Daniel Nephin 1e17a0c3e1 config: Remove unused field 2020-08-05 13:25:08 -04:00
Daniel Nephin ba3ace1219 Return nil value on error.
The main bug was fixed in cb050b280c, but the return value of 'result' is still misleading.
Change the return value to nil to make the code more clear.
2020-08-05 13:10:17 -04:00
R.B. Boyer c599a2f5f4
xds: add support for envoy 1.15.0 and drop support for 1.11.x (#8424)
Related changes:

- hard-fail the xDS connection attempt if the envoy version is known to be too old to be supported
- remove the RouterMatchSafeRegex proxy feature since all supported envoy versions have it
- stop using --max-obj-name-len (due to: envoyproxy/envoy#11740)
2020-07-31 15:52:49 -05:00
freddygv 0956624e39 collect GatewayServices from iter in a function 2020-07-31 13:30:40 -06:00
Freddy f1e8addbdf
Avoid panics during shutdown routine (#8412) 2020-07-30 11:11:10 -06:00
freddygv aa6c59dbfc end to end changes to pass gatewayservices to /ui/services/ 2020-07-30 10:21:11 -06:00
Matt Keeler 1a78cf9b4c
Ensure certificates retrieved through the cache get persisted with auto-config (#8409) 2020-07-30 11:37:18 -04:00
freddygv 51255eede7 Support ConnectedWithProxy 2020-07-30 09:32:12 -06:00
Matt Keeler dbb461a5d3
Allow setting verify_incoming* when using auto_encrypt or auto_config (#8394)
Ensure that enabling AutoConfig sets the tls configurator properly

This also refactors the TLS configurator a bit so the naming doesn’t imply only AutoEncrypt as the source of the automatically setup TLS cert info.
2020-07-30 10:15:12 -04:00
Hans Hasselberg 054595b1f8
agent/cache test for cache throttling. (#8396) 2020-07-30 14:41:13 +02:00
Matt Keeler 34034b76f5
Agent Auto Config: Implement Certificate Generation (#8360)
Most of the groundwork was laid in previous PRs between adding the cert-monitor package to extracting the logic of signing certificates out of the connect_ca_endpoint.go code and into a method on the server.

This also refactors the auto-config package a bit to split things out into multiple files.
2020-07-28 15:31:48 -04:00
Matt Keeler be01c4241d
Default Cache rate limiting options in New
Also get rid of the TestCache helper which was where these defaults were happening previously.
2020-07-28 12:34:35 -04:00
Matt Keeler 83d09de230
Fix some broken code in master
There were several PRs that while all passed CI independently, when they all got merged into the same branch caused compilation errors in test code.

The main changes that caused issues where changing agent/cache.Cache.New to require a concrete options struct instead of a pointer. This broke the cert monitor tests and the catalog_list_services_test.go. Another change was made to unembed the http.Server from the agent.HTTPServer struct. That coupled with another change to add a test to ensure cache rate limiting coming from HTTP requests was working as expected caused compilation failures.
2020-07-28 09:50:10 -04:00
Pierre Souchay 505de6dc29
Added ratelimit to handle throtling cache (#8226)
This implements a solution for #7863

It does:

    Add a new config cache.entry_fetch_rate to limit the number of calls/s for a given cache entry, default value = rate.Inf
    Add cache.entry_fetch_max_burst size of rate limit (default value = 2)

The new configuration now supports the following syntax for instance to allow 1 query every 3s:

    command line HCL: -hcl 'cache = { entry_fetch_rate = 0.333}'
    in JSON

{
  "cache": {
    "entry_fetch_rate": 0.333
  }
}
2020-07-27 23:11:11 +02:00
Matt Keeler 5c2c762106
Move connect root retrieval and cert signing logic out of the RPC endpoints (#8364)
The code now lives on the Server type itself. This was done so that all of this could be shared with auto config certificate signing.
2020-07-24 10:00:51 -04:00
Matt Keeler 2ee9fe0a4d
Move generation of the CA Configuration from the agent code into a method on the RuntimeConfig (#8363)
This allows this to be reused elsewhere.
2020-07-23 16:05:28 -04:00
Daniel Nephin 3d115a62fd
Merge pull request #8323 from hashicorp/dnephin/add-event-publisher-2
stream: close subscriptions on shutdown
2020-07-23 13:12:50 -04:00
Matt Keeler 2713c0e682
Refactor the agentpb package (#8362)
First move the whole thing to the top-level proto package name.

Secondly change some things around internally to have sub-packages.
2020-07-23 11:24:20 -04:00
Paul Coignet 1d75a8fb50
Fix tests 2020-07-23 11:04:10 +02:00
Daniel Nephin ed69feca6d stream: close all subs when EventProcessor is shutdown. 2020-07-22 19:04:10 -04:00
Daniel Nephin a99a4103bd stream: fix overallocation in filter
And add tests
2020-07-22 19:04:10 -04:00
Daniel Nephin 9ed61fd160 state: speed up TestStateStore_ServicesByNodeMeta
Make watchLimit a var so that we can patch it in tests and reduce the time spent creating state.
2020-07-22 16:57:06 -04:00
Daniel Nephin 0402dd7ac5 state: Use subtests in TestStateStore_ServicesByNodeMeta
These subtests make it much easier to identify the slow part of the test, but they also help enumerate all the different cases which are being tested.
2020-07-22 16:39:09 -04:00
Daniel Nephin 3570ce6566
Merge pull request #7948 from hashicorp/dnephin/buffer-test-logs
testutil: NewLogBuffer - buffer logs until a test fails
2020-07-21 15:21:52 -04:00
Matt Keeler 3c09482864
Merge pull request #8311 from hashicorp/bugfix/auto-encrypt-token-update 2020-07-21 13:15:27 -04:00
Daniel Nephin a33a7a6fe2
Merge pull request #8344 from hashicorp/dnephin/fix-flakes-in-stream
stream: handle empty event in TestEventSnapshot
2020-07-21 13:14:35 -04:00
Daniel Nephin 51efba2c7d testutil: NewLogBuffer - buffer logs until a test fails
Replaces #7559

Running tests in parallel, with background goroutines, results in test output not being associated with the correct test. `go test` does not make any guarantees about output from goroutines being attributed to the correct test case.

Attaching log output from background goroutines also cause data races.  If the goroutine outlives the test, it will race with the test being marked done. Previously this was noticed as a panic when logging, but with the race detector enabled it is shown as a data race.

The previous solution did not address the problem of correct test attribution because test output could still be hidden when it was associated with a test that did not fail. You would have to look at all of the log output to find the relevant lines. It also made debugging test failures more difficult because each log line was very long.

This commit attempts a new approach. Instead of printing all the logs, only print when a test fails. This should work well when there are a small number of failures, but may not work well when there are many test failures at the same time. In those cases the failures are unlikely a result of a specific test, and the log output is likely less useful.

All of the logs are printed from the test goroutine, so they should be associated with the correct test.

Also removes some test helpers that were not used, or only had a single caller. Packages which expose many functions with similar names can be difficult to use correctly.

Related:
https://github.com/golang/go/issues/38458 (may be fixed in go1.15)
https://github.com/golang/go/issues/38382#issuecomment-612940030
2020-07-21 12:50:40 -04:00
Matt Keeler 12acdd7481
Disable background cache refresh for Connect Leaf Certs
The rationale behind removing them is that all of our own code (xDS, builtin connect proxy) use the cache notification mechanism. This ensures that the blocking fetch behind the scenes is always executing. Therefore the only way you might go to get a certificate and have to wait is when 1) the request has never been made for that cert before or 2) you are using the v1/agent/connect/ca/leaf API for retrieving the cert yourself.

In the first case, the refresh change doesn’t alter the behavior. In the second case, it can be mitigated by using blocking queries with that API which just like normal cache notification mechanism will cause the blocking fetch to be initiated and to get leaf certs as soon as needed.

If you are not using blocking queries, or Envoy/xDS, or the builtin connect proxy but are retrieving the certs yourself then the HTTP endpoint might take a little longer to respond.

This also renames the RefreshTimeout field on the register options to QueryTimeout to more accurately reflect that it is used for any type that supports blocking queries.
2020-07-21 12:19:25 -04:00
Matt Keeler 9da8c51ac5
Fix issue with changing the agent token causing failure to renew the auto-encrypt certificate
The fallback method would still work but it would get into a state where it would let the certificate expire for 10s before getting a new one. And the new one used the less secure RPC endpoint.

This is also a pretty large refactoring of the auto encrypt code. I was going to write some tests around the certificate monitoring but it was going to be impossible to get a TestAgent configured in such a way that I could write a test that ran in less than an hour or two to exercise the functionality.

Moving the certificate monitoring into its own package will allow for dependency injection and in particular mocking the cache types to control how it hands back certificates and how long those certificates should live. This will allow for exercising the main loop more than would be possible with it coupled so tightly with the Agent.
2020-07-21 12:19:25 -04:00
Daniel Nephin 1910e2a246 checks: wait for goroutine to complete
CheckAlias already had a waitGroup, but the Add() call was happening too late, which was causing a race in tests. The add must happen before the goroutine is started.

CheckHTTP did not have a waitGroup, so I added it to match CheckAlias.

It looks like a lot of the implementation could be shared, and may not need all of channel, waitgroup and bool, but I will leave that refactor for another time.
2020-07-20 18:55:39 -04:00
Daniel Nephin 9482961b1c stream: handle empty event in TestEventSnapshot
When the race detector is enabled we see this test fail occasionally. The reordering of execution seems to make it possible for the snapshot splice to happen before any events are published to the topicBuffers.

We can handle this case in the test the same way it is handled by a subscription, by proceeding to the next event.
2020-07-20 18:20:02 -04:00
Daniel Nephin 57e00b0fd5
Merge pull request #8245 from hashicorp/dnephin/use-not-modified-in-cache
agent/cache: Use AllowNotModified in CatalogListServices
2020-07-20 15:30:52 -04:00
Daniel Nephin 29571465e1
Merge pull request #8290 from hashicorp/dnephin/watch-decode
watch: fix script watches with single arg
2020-07-20 14:41:17 -04:00
Paul Coignet a4e39c840b
Add default prefix_filter 2020-07-20 10:39:58 +02:00
Daniel Nephin ecccb30690 state: update calls that are no longer state methods
In a previous commit these methods were changed to functions, so remove the Store paramter.
2020-07-16 15:46:10 -04:00
Daniel Nephin 2008884241 state: un-method funcs that don't use their receiver
This change was mostly automated with the following

First generate a list of functions with:

  git grep -o 'Store) \([^(]\+\)(tx \*txn' ./agent/consul/state | awk '{print $2}' | grep -o '^[^(]\+'

Then the list was curated a bit with trial/error to remove and add funcs
as necessary.

Finally the replacement was done with:

  dir=agent/consul/state
  file=${1-funcnames}

  while read fn; do
    echo "$fn"
    sed -i -e "s/(s \*Store) $fn(/$fn(/" $dir/*.go
    sed -i -e "s/s\.$fn(/$fn(/" $dir/*.go
    sed -i -e "s/s\.store\.$fn(/$fn(/" $dir/*.go
  done < $file
2020-07-16 15:30:39 -04:00
Daniel Nephin 63b153df8c store: convert methods that don't use their receiver to functions
Making these functions allows them to be used without introducing
an artificial dependency on the struct. Many of these will be called
from streaming Event processors, which do not have a store.

This change is being made ahead of the streaming work to get to reduce
the size of the streaming diff.
2020-07-16 15:30:10 -04:00
André 3bc27df844
minor: fix docstring of DNSOnlyPassing (#8318)
In runtime.go it had "duration" but it is actually a boolean.
2020-07-16 09:47:33 -04:00
Daniel Nephin 2ec3760b70 agent/cache: Use AllowNotModifiedResponse in CatalogListServices
Co-authored-by: Pierre Souchay <pierresouchay@users.noreply.github.com>
2020-07-14 18:58:20 -04:00
Daniel Nephin dbb8d14728 agent/cache: Update some docstrings 2020-07-14 18:58:20 -04:00
Daniel Nephin c07acbeb6b stream: Add forceClose and refactor subscription filtering
Move the subscription context to Next. context.Context should generally
never be stored in a struct because it makes that struct only valid
while the context is valid. This is rarely obvious from the caller.
Adds a forceClosed channel in place of the old context, and uses the new
context as a way for the caller to stop the Subscription blocking.

Remove some recursion out of bufferImte.Next. The caller is already looping so we can continue
in that loop instead of recursing. This ensures currentItem is updated immediately (which probably
does not matter in practice), and also removes the chance that we overflow the stack.

NextNoBlock and FollowAfter do not need to handle bufferItem.Err, the caller already
handles it.

Moves filter to a method to simplify Next, and more explicitly separate filtering from looping.

Also improve some godoc

Only unwrap itemBuffer.Err when necessary
2020-07-14 15:57:47 -04:00
Daniel Nephin f19f8e99bb stream: Improve docstrings
Also rename ResumeStrema to EndOfEmptySnapshot to be more consistent with other framing events

Co-authored-by: Paul Banks <banks@banksco.de>
2020-07-14 15:57:47 -04:00
Daniel Nephin fc1c2ae412 stream: change Topic to an interface
Consumers of the package can decide on which type to use for the Topic. In the future we may
use a gRPC type for the topic.
2020-07-14 15:57:47 -04:00
Daniel Nephin 6fa36e3aee state: Move change processing out of EventPublisher
EventPublisher was receiving TopicHandlers, which had a couple of
problems:

- ChangeProcessors were being grouped by Topic, but they completely
  ignored the topic and were performed on every change
- ChangeProcessors required EventPublisher to be aware of database
  changes

By moving ChangeProcesors out of EventPublisher, and having Publish
accept events instead of changes, EventPublisher no longer needs to
be aware of these things.

Handlers is now only SnapshotHandlers, which are still mapped by Topic.

Also allows us to remove the small 'db' package that had only two types.
They can now be unexported types in state.
2020-07-14 15:57:47 -04:00
Daniel Nephin 48c766d2c6 server: Abandom state store to shutdown EventPublisher
So that we don't leak goroutines
2020-07-14 15:57:47 -04:00
Daniel Nephin 889c57fd2d stream: unexport identifiers
Now that EventPublisher is part of stream a lot of the internals can be hidden
2020-07-14 15:57:47 -04:00
Daniel Nephin 4fa0fdc0e0 stream: Move EventPublisher to stream package
The EventPublisher is the central hub of the PubSub system. It is toughly coupled with much of
stream. Some stream internals were exported exclusively for EventPublisher.

The two Subscribe cases (with or without index) were also awkwardly split between two packages. By
moving EventPublisher into stream they are now both in the same package (although still in different files).
2020-07-14 15:57:47 -04:00
Daniel Nephin 489876c86b state: Make handleACLUpdate async once again
So that we keep as much as possible out of the FSM commit hot path.
2020-07-14 15:57:47 -04:00
Daniel Nephin 5f9db94956 state: Use interface for Txn
Also store the index in Changes instead of the Txn.

This change is in preparation for movinng EventPublisher to the stream package, and
making handleACLUpdates async once again.
2020-07-14 15:57:46 -04:00
Daniel Nephin a709ed1ab5 stream.Subscription unexport fields and additiona docstrings 2020-07-14 15:57:46 -04:00
Daniel Nephin 17b833b4c9 Add a context for stopping EventPublisher goroutine 2020-07-14 15:57:46 -04:00
Daniel Nephin 2c8342f115 EventPublisher: Make Unsubscribe a function on Subscription
It is critical that Unsubscribe be called with the same pointer to a
SubscriptionRequest that was used to create the Subscription. The
docstring made that clear, but it sill allowed a caler to get it wrong by
creating a new SubscriptionRequest.

By hiding this detail from the caller, and only exposing an Unsubscribe
method, it should be impossible to fail to Unsubscribe.

Also update some godoc strings.
2020-07-14 15:57:46 -04:00
Daniel Nephin 1622bb3a45 EventPublisher: handleACL changes synchronously
Use a separate lock for subscriptions.ByToken to allow it to happen synchronously
in the commit flow.
This removes the need to create a new txn for the goroutine, and removes
the need for EventPublisher to contain a reference to DB.
2020-07-14 15:57:46 -04:00
Daniel Nephin effab15131 stream.EventSnapshot: reduce the fields on the struct
Many of the fields are only needed in one place, and by using a closure
they can be removed from the struct. This reduces the scope of the variables
making it esier to see how they are used.
2020-07-14 15:57:45 -04:00
Daniel Nephin a5cf933fe8 stream.EventBuffer: Seed the fuzz test with time.Now()
Otherwise the test will run with exactly the same values each time.

By printing the seed we can attempt to reproduce the test by adding an env var to override the seed
2020-07-14 15:57:45 -04:00
Daniel Nephin bbe7272d8e state: memdb_wrapper.go -> memdb.go
Renaming in a separate commit so that git can merge changes to the file.
2020-07-14 15:57:45 -04:00
Daniel Nephin 2a8a8f7b8d state: publish changes from Commit
Make topicRegistry use functions instead of unbound methods
Use a regular memDB in EventPublisher to remove a reference cycle
Removes the need for EventPublisher to use a store
2020-07-14 15:57:45 -04:00
Daniel Nephin f5ecd5de5f EventPublisher: docstrings and getTopicBuffer
also rename commitCh -> publishCh
2020-07-14 15:57:45 -04:00
Daniel Nephin 555cfe52d9 ProcessChanges: use stream.Event
Also remove secretHash, which was used to hash tokens. We don't expose
these tokens anywhere, so we can use the string itself instead of a
Hash.

Fix acl_events_test.go for storing a structs type.
2020-07-14 15:57:45 -04:00
Daniel Nephin 4e0bc8013b stream: Use local types for Event Topic SubscriptionRequest 2020-07-14 15:57:45 -04:00
Daniel Nephin aacd514dca Rename stream_publisher.go -> event_publisher.go 2020-07-14 15:57:44 -04:00
Daniel Nephin c0b0109e80 Add streaming package with Subscription and Snapshot components.
The remaining files from 7965767de0bd62ab07669b85d6879bd5f815d157

Co-authored-by: Paul Banks <banks@banksco.de>
2020-07-14 15:57:44 -04:00
Matt Keeler 8beca1cb8d
Add ability for notifications when one of the agent tokens is updated (#8301)
Co-authored-by: Chris Piraino <cpiraino@hashicorp.com>
2020-07-14 09:53:55 -04:00
Hans Hasselberg 496fb5fc5b
add support for envoy 1.14.4, 1.13.4, 1.12.6 (#8216) 2020-07-13 15:44:44 -05:00
Chris Piraino 4d857d117f
Set enterprise metadata after resolving the token (#8302)
The token can encode enterprise metadata information, and we must make
sure we set that on the reply so that we can correct filter ACLs.
2020-07-13 13:39:57 -05:00
Daniel Nephin 7b3f26e61d watch: Allow args from different types
Fixes a bug where specifying a slice of args with a single item was being converted to
a string when config was loaded, causing an error.
2020-07-10 17:18:32 -04:00
Freddy e72af87918
Add api mod support for /catalog/gateway-services (#8278) 2020-07-10 13:01:45 -06:00
Daniel Nephin 653c938edc watch: extract makeWatchPlan to facilitate testing
There is a bug in here now that slices in opaque config are unsliced. But to test that bug fix we
need a function that can be easily tested.
2020-07-10 13:33:45 -04:00
Paul Coignet b8b939b98a
Keep both metrics 2020-07-10 11:27:22 +02:00
R.B. Boyer 1eef096dfe
xds: version sniff envoy and switch regular expressions from 'regex' to 'safe_regex' on newer envoy versions (#8222)
- cut down on extra node metadata transmission
- split the golden file generation to compare all envoy version
2020-07-09 17:04:51 -05:00
Daniel Nephin f22f3d300d
Merge pull request #8231 from hashicorp/dnephin/unembed-HTTPServer-Server
agent/http: un-embed the http.Server
2020-07-09 17:42:33 -04:00
Daniel Nephin df4088291c agent/http: Update TestSetupHTTPServer_HTTP2
To remove the need to store the http.Server. This will allow us to
remove the http.Server field from the HTTPServer struct.
2020-07-09 16:42:19 -04:00
Daniel Nephin d98a4c1317
Merge pull request #8237 from hashicorp/dnephin/remove-acls-enabled-from-delegate
Remove ACLsEnabled from delegate interface
2020-07-09 16:35:43 -04:00
Paul Coignet cf68577bef
Use method and path as labels 2020-07-09 10:31:27 +02:00
Matt Keeler 4fb535ba48
Pass the Config and TLS Configurator into the AutoConfig constructor
This is instead of having the AutoConfigBackend interface provide functions for retrieving them.

NOTE: the config is not reloadable. For now this is fine as we don’t look at any reloadable fields. If that changes then we should provide a way to make it reloadable.
2020-07-08 12:36:11 -04:00
Matt Keeler f2f32735ce
Rename (*Server).forward to (*Server).ForwardRPC
Also get rid of the preexisting shim in server.go that existed before to have this name just call the unexported one.
2020-07-08 11:05:44 -04:00
Matt Keeler d2e4869c7c
Refactor AutoConfig RPC to not have a direct dependency on the Server type
Instead it has an interface which can be mocked for better unit testing that is deterministic and not prone to flakiness.
2020-07-08 11:05:44 -04:00
Chris Piraino 735337b170
Append port number to ingress host domain (#8190)
A port can be sent in the Host header as defined in the HTTP RFC, so we
take any hosts that we want to match traffic to and also add another
host with the listener port added.

Also fix an issue with envoy integration tests not running the
case-ingress-gateway-tls test.
2020-07-07 10:43:04 -05:00
Daniel Nephin 5247ef4c70 Remove ACLsEnabled from delegate interface
In all cases (oss/ent, client/server) this method was returning a value from config. Since the
value is consistent, it doesn't need to be part of the delegate interface.
2020-07-03 17:00:20 -04:00
Daniel Nephin a7f69b615a
Merge pull request #8215 from hashicorp/dnephin/support-not-modified-response-server
agent/consul: Add support for NotModified to two endpoints
2020-07-03 16:15:31 -04:00
Pierre Souchay 20d1ea7d2d
Upgrade go-connlimit to v0.3.0 / return http 429 on too many connections (#8221)
Fixes #7527

I want to highlight this and explain what I think the implications are and make sure we are aware:

* `HTTPConnStateFunc` closes the connection when it is beyond the limit. `Close` does not block.
* `HTTPConnStateFuncWithDefault429Handler(10 * time.Millisecond)` blocks until the following is done (worst case):
  1) `conn.SetDeadline(10*time.Millisecond)` so that
  2) `conn.Write(429error)` is guaranteed to timeout after 10ms, so that the http 429 can be written and 
  3) `conn.Close` can happen

The implication of this change is that accepting any new connection is worst case delayed by 10ms. But only after a client reached the limit already.
2020-07-03 09:25:07 +02:00
Daniel Nephin a5e45defb1 agent/http: un-embed the HTTPServer
The embedded HTTPServer struct is not used by the large HTTPServer
struct. It is used by tests and the agent. This change is a small first
step in the process of removing that field.

The eventual goal is to reduce the scope of HTTPServer making it easier
to test, and split into separate packages.
2020-07-02 17:21:12 -04:00
Daniel Nephin 5d36f98710 agent/consul: Add support for NotModified to two endpoints
A query made with AllowNotModifiedResponse and a MinIndex, where the
result has the same Index as MinIndex, will return an empty response
with QueryMeta.NotModified set to true.

Co-authored-by: Pierre Souchay <pierresouchay@users.noreply.github.com>
2020-07-02 17:05:46 -04:00
Matt Keeler f8e8f48125
Merge pull request #8211 from hashicorp/bugfix/auto-encrypt-various 2020-07-02 09:49:49 -04:00
Yury Evtikhov 10361dd210 DNS: add IsErrQueryNotFound function for easier error evaluation 2020-07-01 03:41:44 +01:00
Yury Evtikhov 8d18422f19 DNS: fix agent returning SERVFAIL where NXDOMAIN should be returned 2020-07-01 01:51:21 +01:00
Yury Evtikhov 3b4ddaaab5 DNS: add test to verify NXDOMAIN is returned when a non-existent domain is queried over RPC 2020-07-01 01:51:16 +01:00
Matt Keeler 6e7acfa618
Add an AutoEncrypt “integration” test
Also fix a bug where Consul could segfault if TLS was enabled but no client certificate was provided. How no one has reported this as a problem I am not sure.
2020-06-30 15:23:29 -04:00
Matt Keeler 2ddcba00c6
Overwrite agent leaf cert trust domain on the servers 2020-06-30 09:59:08 -04:00
Matt Keeler 19040f1166
Store the Connect CA rate limiter on the server
This fixes a bug where auto_encrypt was operating without utilizing a common rate limiter.
2020-06-30 09:59:07 -04:00
Matt Keeler a5a9560bbd
Initialize the agent leaf cert cache result with a state to prevent unnecessary second certificate signing 2020-06-30 09:59:07 -04:00
Matt Keeler 39b567a55a
Fix auto_encrypt IP/DNS SANs
The initial auto encrypt CSR wasn’t containing the user supplied IP and DNS SANs. This fixes that. Also We were configuring a default :: IP SAN. This should be ::1 instead and was fixed.
2020-06-30 09:59:07 -04:00
Matt Keeler 85fd8c552f
Merge pull request #8193 from hashicorp/feature/auto-config/suppress-config-warnings 2020-06-27 10:06:52 -04:00
R.B. Boyer 462f0f37ed
connect: various changes to make namespaces for intentions work more like for other subsystems (#8194)
Highlights:

- add new endpoint to query for intentions by exact match

- using this endpoint from the CLI instead of the dump+filter approach

- enforcing that OSS can only read/write intentions with a SourceNS or
  DestinationNS field of "default".

- preexisting OSS intentions with now-invalid namespace fields will
  delete those intentions on initial election or for wildcard namespaces
  an attempt will be made to downgrade them to "default" unless one
  exists.

- also allow the '-namespace' CLI arg on all of the intention subcommands

- update lots of docs
2020-06-26 16:59:15 -05:00
Matt Keeler be576c9737
Use the DNS and IP SANs from the auto config stanza when set 2020-06-26 16:01:30 -04:00
Matt Keeler e8b39dd255
Overhaul the auto-config translation
This fixes some issues around spurious warnings about using enterprise configuration in OSS.
2020-06-26 15:25:21 -04:00
Freddy 10d6e9c458
Split up unused key validation for oss/ent (#8189)
Split up unused key validation in config entry decode for oss/ent.

This is needed so that we can return an informative error in OSS if namespaces are provided.
2020-06-25 13:58:29 -06:00
Daniel Nephin a891ee8428
Merge pull request #8176 from hashicorp/dnephin/add-linter-unparam-1
lint: add unparam linter and fix some of the issues
2020-06-25 15:34:48 -04:00
Matt Keeler 7041f69892
Merge pull request #8184 from hashicorp/bugfix/goroutine-leaks 2020-06-25 09:22:19 -04:00
Chris Piraino df48db0abd
Merge pull request #7932 from hashicorp/ingress/internal-ui-endpoint-multiple-ports
Update gateway-services-nodes API endpoint to allow multiple addresses
2020-06-24 17:11:01 -05:00
Chris Piraino f213d3592a remove obsolete comments about test parallelization 2020-06-24 16:36:13 -05:00
Chris Piraino b3db907bdf Update gateway-services-nodes API endpoint to allow multiple addresses
Previously, we were only returning a single ListenerPort for a single
service. However, we actually allow a single service to be serviced over
multiple ports, as well as allow users to define what hostnames they
expect their services to be contacted over. When no hosts are defined,
we return the default ingress domain for any configured DNS domain.

To show this in the UI, we modify the gateway-services-nodes API to
return a GatewayConfig.Addresses field, which is a list of addresses
over which the specific service can be contacted.
2020-06-24 16:35:23 -05:00
Matt Keeler e9835610f3
Add a test for go routine leaks
This is in its own separate package so that it will be a separate test binary that runs thus isolating the go runtime from other tests and allowing accurate go routine leak checking.

This test would ideally use goleak.VerifyTestMain but that will fail 100% of the time due to some architectural things (blocking queries and net/rpc uncancellability).

This test is not comprehensive. We should enable/exercise more features and more cluster configurations. However its a start.
2020-06-24 17:09:50 -04:00
Matt Keeler 29d0cfdd7d
Fix go routine leak in auto encrypt ca roots tracking 2020-06-24 17:09:50 -04:00
Matt Keeler 25a4f3c83b
Allow cancelling blocking queries in response to shutting down. 2020-06-24 17:09:50 -04:00
Daniel Nephin 0279bf6fe5 Update TestAgent_GetCoordinate
The old test case was a very specific regresion test for a case that is no longer possible.
Replaced with a new test that checks the default coordinate is returned.
2020-06-24 13:00:15 -04:00
Daniel Nephin f65e21e6dc Remove unused return values 2020-06-24 13:00:15 -04:00
Daniel Nephin 010a609912 Fix a bunch of unparam lint issues 2020-06-24 13:00:14 -04:00
Matt Keeler 15e7b3940c
Ensure that retryLoopBackoff can be cancelled
We needed to pass a cancellable context into the limiter.Wait instead of context.Background. So I made the func take a context instead of a chan as most places were just passing through a Done chan from a context anyways.

Fix go routine leak in the gateway locator
2020-06-24 12:41:08 -04:00
Matt Keeler e2cfa93f02
Don’t leak metrics go routines in tests (#8182) 2020-06-24 10:15:25 -04:00
gitforbit 808f632346
agent-http: cleanup: return nil instead of err (#8043)
Since err is already checked, it should return `nil`
2020-06-24 14:29:21 +02:00
R.B. Boyer c63c994b04
connect: upgrade github.com/envoyproxy/go-control-plane to v0.9.5 (#8165) 2020-06-23 15:19:56 -05:00
freddygv c791fbc79c Update namespaces subject-verb agreement 2020-06-23 10:57:30 -06:00
freddygv 044d027ff8 Remove break 2020-06-22 19:59:04 -06:00
freddygv 70810b0602 Let users know namespaces are ent only in config entry decode 2020-06-22 19:59:04 -06:00
Pierre Souchay 35d852fd9a
Returns DNS Error NSDOMAIN when DC does not exists (#8103)
This will allow to increase cache value when DC is not valid (aka
return SOA to avoid too many consecutive requests) and will
distinguish DC being temporarily not available from DC not existing.

Implements https://github.com/hashicorp/consul/issues/8102
2020-06-22 09:01:48 -04:00
Matt Keeler 4a5b352c18
Require enabling TLS to enable Auto Config (#8159)
On the servers they must have a certificate.

On the clients they just have to set verify_outgoing to true to attempt TLS connections for RPCs.

Eventually we may relax these restrictions but right now all of the settings we push down (acl tokens, acl related settings, certificates, gossip key) are sensitive and shouldn’t be transmitted over an unencrypted connection. Our guides and docs should recoommend verify_server_hostname on the clients as well.

Another reason to do this is weird things happen when making an insecure RPC when TLS is not enabled. Basically it tries TLS anyways. We should probably fix that to make it clearer what is going on.
2020-06-19 16:38:14 -04:00
Freddy 5baa7b1b04
Always return a gateway cluster (#8158) 2020-06-19 13:31:39 -06:00
Matt Keeler d6e05482ab
Allow cancelling startup when performing auto-config (#8157)
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
2020-06-19 15:16:00 -04:00
Daniel Nephin 4f17350928
Merge pull request #8147 from hashicorp/dnephin/remove-private-ip-2
Remove some dead code from agent/consul/util.go
2020-06-18 15:51:09 -04:00
Matt Keeler b0fcf86140 Change auto config authorizer to allow for future extension
The envisioned changes would allow extra settings to enable dynamically defined auth methods to be used instead of  or in addition to the statically defined one in the configuration.
2020-06-18 15:22:24 -04:00
Daniel Nephin b0ba546a1f Remove bytesToUint64 from agent/consul 2020-06-18 12:45:43 -04:00
Daniel Nephin a00f007c5e Remove unused private IP code from agent/consul 2020-06-18 12:40:38 -04:00
Matt Keeler 3dbbd2d37d
Implement Client Agent Auto Config
There are a couple of things in here.

First, just like auto encrypt, any Cluster.AutoConfig RPC will implicitly use the less secure RPC mechanism.

This drastically modifies how the Consul Agent starts up and moves most of the responsibilities (other than signal handling) from the cli command and into the Agent.
2020-06-17 16:49:46 -04:00
Matt Keeler 8b7d669a27
Allow the Agent its its child Client/Server to share a connection pool
This is needed so that we can make an AutoConfig RPC at the Agent level prior to creating the Client/Server.
2020-06-17 16:19:33 -04:00
Matt Keeler 51c3a605ad
Merge pull request #8035 from hashicorp/feature/auto-config/server-rpc 2020-06-17 16:07:25 -04:00
Chris Piraino 79a862d019
Remove ACLEnforceVersion8 from tests (#8138)
The field had been deprecated for a while and was recently removed,
however a PR which added these tests prior to removal was merged.
2020-06-17 14:58:01 -05:00
Daniel Nephin 692a4a8fc8
Merge pull request #7762 from hashicorp/dnephin/warn-on-unknown-service-file
config: warn if a config file is being skipped because of its file extension
2020-06-17 15:14:40 -04:00
Daniel Nephin be29d6bf75 config: warn when a config file is skipped
All commands which read config (agent, services, and validate) will now
print warnings when one of the config files is skipped because it did
not match an expected format.

Also ensures that config validate prints all warnings.
2020-06-17 13:08:54 -04:00
Daniel Nephin 5afcf5c1bc
Merge pull request #8034 from hashicorp/dnephin/add-linter-staticcheck-4
ci: enable SA4006 staticcheck check and add ineffassign
2020-06-17 12:16:02 -04:00
Matt Keeler 9b01f9423c
Implement the insecure version of the Cluster.AutoConfig RPC endpoint
Right now this is only hooked into the insecure RPC server and requires JWT authorization. If no JWT authorizer is setup in the configuration then we inject a disabled “authorizer” to always report that JWT authorization is disabled.
2020-06-17 11:25:29 -04:00
Pierre Souchay d31691dc87
gossip: Ensure that metadata of Consul Service is updated (#7903)
While upgrading servers to a new version, I saw that metadata of
existing servers are not upgraded, so the version and raft meta
is not up to date in catalog.

The only way to do it was to:
 * update Consul server
 * make it leave the cluster, then metadata is accurate

That's because the optimization to avoid updating catalog does
not take into account metadata, so no update on catalog is performed.
2020-06-17 12:16:13 +02:00
Daniel Nephin d345cd8d30 ci: Add ineffsign linter
And fix an additional ineffective assignment that was not caught by staticcheck
2020-06-16 17:32:50 -04:00
Daniel Nephin a9851e1812
Merge pull request #8070 from hashicorp/dnephin/add-gofmt-simplify
ci: Enable gofmt simplify
2020-06-16 17:18:38 -04:00
Matt Keeler 9f7b22a5eb
Agent Auto Configuration: Configuration Syntax Updates (#8003) 2020-06-16 15:03:22 -04:00
Daniel Nephin 068b43df90 Enable gofmt simplify
Code changes done automatically with 'gofmt -s -w'
2020-06-16 13:21:11 -04:00
Daniel Nephin cb050b280c ci: enable SA4006 staticcheck check
And fix the 'value not used' issues.

Many of these are not bugs, but a few are tests not checking errors, and
one appears to be a missed error in non-test code.
2020-06-16 13:10:11 -04:00
Daniel Nephin f7c84ad802 Rename txnWrapper to txn 2020-06-16 13:06:02 -04:00
Daniel Nephin 32aa3ada35 Rename db 2020-06-16 13:04:31 -04:00
Daniel Nephin deef6fcc32 Handle return value from txn.Commit 2020-06-16 13:04:31 -04:00
Daniel Nephin 59bac0f99d state: Update docstrings for changeTrackerDB and txn
And un-embed memdb.DB to prevent accidental access to underlying
methods.
2020-06-16 13:04:31 -04:00
Paul Banks f6ac08be04 state: track changes so that they may be used to produce change events 2020-06-16 13:04:29 -04:00
Matt Keeler d3881dd754
ACL Node Identities (#7970)
A Node Identity is very similar to a service identity. Its main targeted use is to allow creating tokens for use by Consul agents that will grant the necessary permissions for all the typical agent operations (node registration, coordinate updates, anti-entropy).

Half of this commit is for golden file based tests of the acl token and role cli output. Another big updates was to refactor many of the tests in agent/consul/acl_endpoint_test.go to use the same style of tests and the same helpers. Besides being less boiler plate in the tests it also uses a common way of starting a test server with ACLs that should operate without any warnings regarding deprecated non-uuid master tokens etc.
2020-06-16 12:54:27 -04:00
Daniel Nephin 476b57fe22 config: refactor to consolidate all File->Source loading
Previously the logic for reading ConfigFiles and produces Sources was split
between NewBuilder and Build. This commit moves all of the logic into NewBuilder
so that Build() can operate entirely on Sources.

This change is in preparation for logging warnings when files have an
unsupported extension.

It also reduces the scope of BuilderOpts, and gets us very close to removing
Builder.options.
2020-06-16 12:52:23 -04:00
Daniel Nephin 219790ca49 config: Make ConfigFormat not a pointer
The nil value was never used. We can avoid a bunch of complications by
making the field a string value instead of a pointer.

This change is in preparation for fixing a silent config failure.
2020-06-16 12:52:22 -04:00
Daniel Nephin 77101eee82 config: rename Flags to BuilderOpts
Flags is an overloaded term in this context. It generally is used to
refer to command line flags. This struct, however, is a data object
used as input to the construction.

It happens to be partially populated by command line flags, but
otherwise has very little to do with them.

Renaming this struct should make the actual responsibility of this struct
more obvious, and remove the possibility that it is confused with
command line flags.

This change is in preparation for adding additional fields to
BuilderOpts.
2020-06-16 12:51:19 -04:00
Daniel Nephin 85e0338136 config: remove Args field from Flags
This field was populated for one reason, to test that it was empty.
Of all the callers, only a single one used this functionality. The rest
constructed a `Flags{}` struct which did not set Args.

I think this shows that the logic was in the wrong place. Only the agent
command needs to care about validating the args.

This commit removes the field, and moves the logic to the one caller
that cares.

Also fix some comments.
2020-06-16 12:49:53 -04:00
Daniel Nephin 73cd0b6fac agent/service_manager: remove 'updateCh' field from serviceConfigWatch
Passing the channel to the function which uses it significantly
reduces the scope of the variable, and makes its usage more explicit. It
also moves the initialization of the channel closer to where it is used.

Also includes a couple very small cleanups to remove a local var and
read the error from `ctx.Err()` directly instead of creating a channel
to check for an error.
2020-06-16 12:15:57 -04:00
Daniel Nephin 26291a8482 agent/service_manager: remove 'defaults' field from serviceConfigWatch
This field was always read by the same function that populated the field,
so it does not need to be a field. Passing the value as an argument to
functions makes it more obvious where the value comes from, and also reduces
the scope of the variable significantly.
2020-06-16 12:15:52 -04:00
Daniel Nephin 93235da253 agent/service_manager: Pass ctx around
[The documentation for context](https://golang.org/pkg/context/)
recommends not storing context in a struct field:

> Do not store Contexts inside a struct type; instead, pass a Context
> explicitly to each function that needs it. The Context should be the
> first parameter, typically named ctx...

Sometimes there are good reasons to not follow this recommendation, but
in this case it seems easy enough to follow.

Also moved the ctx argument to be the first in one of the function calls
to follow the same recommendation.
2020-06-16 12:14:00 -04:00
Daniel Nephin 2eac5b8023
Merge pull request #8074 from hashicorp/dnephin/remove-references-to-PatchSliceOfMaps
Update comments that reference PatchSliceOfMaps
2020-06-15 14:33:10 -04:00
Matt Keeler 8837907de4
Make the Agent Cache more Context aware (#8092)
Blocking queries issues will still be uncancellable (that cannot be helped until we get rid of net/rpc). However this makes it so that if calling getWithIndex (like during a cache Notify go routine) we can cancell the outer routine. Previously it would keep issuing more blocking queries until the result state actually changed.
2020-06-15 11:01:25 -04:00
freddygv d97cff0966 Update telemetry for gateway-services endpoint 2020-06-12 14:44:36 -06:00
freddygv cd927eed5e Remove unused method and fixup docs ref 2020-06-12 13:47:43 -06:00
freddygv 0f97b7d63d Fixup stray sid references 2020-06-12 13:47:43 -06:00
freddygv 19e3954603 Move compound service names to use ServiceName type 2020-06-12 13:47:43 -06:00
freddygv e7b52d35d4 Create HTTP endpoint 2020-06-12 13:46:47 -06:00
freddygv 15c74d6943 Move GatewayServices out of Internal 2020-06-12 13:46:47 -06:00
Freddy 166a8b2a58
Only pass one hostname via EDS and prefer healthy ones (#8084)
Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>

Currently when passing hostname clusters to Envoy, we set each service instance registered with Consul as an LbEndpoint for the cluster.

However, Envoy can only handle one per cluster:
[2020-06-04 18:32:34.094][1][warning][config] [source/common/config/grpc_subscription_impl.cc:87] gRPC config for type.googleapis.com/envoy.api.v2.Cluster rejected: Error adding/updating cluster(s) dc2.internal.ddd90499-9b47-91c5-4616-c0cbf0fc358a.consul: LOGICAL_DNS clusters must have a single locality_lb_endpoint and a single lb_endpoint, server.dc2.consul: LOGICAL_DNS clusters must have a single locality_lb_endpoint and a single lb_endpoint

Envoy is currently handling this gracefully by only picking one of the endpoints. However, we should avoid passing multiple to avoid these warning logs.

This PR:

* Ensures we only pass one endpoint, which is tied to one service instance.
* We prefer sending an endpoint which is marked as Healthy by Consul.
* If no endpoints are healthy we emit a warning and skip the cluster.
* If multiple unique hostnames are spread across service instances we emit a warning and let the user know which will be resolved.
2020-06-12 13:46:17 -06:00
Chris Piraino 6fa48c9512
Allow users to set hosts to the wildcard specifier when TLS is disabled (#8083)
This allows easier demoing/testing of ingress gateways, while still
preserving the validation we have for DNSSANs
2020-06-11 10:03:06 -05:00
Chris Piraino 91ab89dd48
Move ingress param to a new endpoint (#8081)
In discussion with team, it was pointed out that query parameters tend
to be filter mechanism, and that semantically the "/v1/health/connect"
endpoint should return "all healthy connect-enabled endpoints (e.g.
could be side car proxies or native instances) for this service so I can
connect with mTLS".

That does not fit an ingress gateway, so we remove the query parameter
and add a new endpoint "/v1/health/ingress" that semantically means
"all the healthy ingress gateway instances that I can connect to
to access this connect-enabled service without mTLS"
2020-06-10 13:07:15 -05:00
Daniel Nephin 8ec029ae6a Update comments that reference PatchSliceOfMaps
To reference decode.HookWeakDecodeFromSlice instead.

Also removes a step from the adding config fields checklist which is
no longer necessary.
2020-06-09 17:43:05 -04:00
Chris Piraino 496e683360
Merge pull request #8064 from hashicorp/ingress/health-query-param
Add API query parameter ?ingress to allow users to find ingress gateways associated to a service
2020-06-09 16:08:28 -05:00
Chris Piraino c1d329c5dd Remove TODO note about ingress API, it is done! 2020-06-09 14:58:30 -05:00
Chris Piraino ca41f80493 Set connect or ingress boolean after checking for query param 2020-06-09 14:45:21 -05:00
Daniel Nephin 08f1ed16b4
Merge pull request #7900 from hashicorp/dnephin/add-linter-staticcheck-2
intentions: fix a bug in Intention.SetHash
2020-06-09 15:40:20 -04:00
Daniel Nephin 62a1125c7b
Merge pull request #8037 from hashicorp/dnephin/add-linter-staticcheck-5
ci: Enabled SA2002 staticcheck check
2020-06-09 15:31:24 -04:00
Hans Hasselberg 242994a016
acl: do not resolve local tokens from remote dcs (#8068) 2020-06-09 21:13:09 +02:00
Kyle Havlovitz 0c8966220f
Merge pull request #8040 from hashicorp/ingress/expose-cli
Ingress expose CLI command
2020-06-09 12:11:23 -07:00
Chris Piraino 3c037d9b96 Add ?ingress query parameter on /v1/health/connect
Refactor boolean query parameter logic from ?passing value to re-use
with ingress
2020-06-09 11:44:31 -05:00
Daniel Nephin c66c533d73
Merge pull request #7964 from hashicorp/dnephin/remove-patch-slice-of-maps-forward-compat
config: Use HookWeakDecodeFromSlice in place of PatchSliceOfMaps
2020-06-08 19:53:04 -04:00
Daniel Nephin 75cbbe2702 config: add HookWeakDecodeFromSlice
Currently opaque config blocks (config entries, and CA provider config) are
modified by PatchSliceOfMaps, making it impossible for these opaque
config sections to contain slices of maps.

In order to fix this problem, any lazy-decoding of these blocks needs to support
weak decoding of []map[string]interface{} to a struct type before
PatchSliceOfMaps is replaces. This is necessary because these config
blobs are persisted, and during an upgrade an older version of Consul
could read one of the new configuration values, which would cause an error.

To support the upgrade path, this commit first introduces the new hooks
for weak decoding of []map[string]interface{} and uses them only in the
lazy-decode paths. That way, in a future release, new style
configuration will be supported by the older version of Consul.

This decode hook has a number of advantages:

1. It no longer panics. It allows mapstructure to report the error
2. It no longer requires the user to declare which fields are slices of
   structs. It can deduce that information from the 'to' value.
3. It will make it possible to preserve opaque configuration, allowing
   for structured opaque config.
2020-06-08 17:05:09 -04:00
Hans Hasselberg 98eea08d3b
Tokens converted from legacy ACLs get their Hash computed (#8047)
* Fixes #5606: Tokens converted from legacy ACLs get their Hash computed

This allows new style token replication to work for legacy tokens as well when they change.

* tests: fix timestamp comparison

Co-authored-by: Matt Keeler <mjkeeler7@gmail.com>
2020-06-08 21:44:06 +02:00
Chris Piraino 1a853fc954
Always require Host header values for http services (#7990)
Previously, we did not require the 'service-name.*' host header value
when on a single http service was exposed. However, this allows a user
to get into a situation where, if they add another service to the
listener, suddenly the previous service's traffic might not be routed
correctly. Thus, we always require the Host header, even if there is
only 1 service.

Also, we add the make the default domain matching more restrictive by
matching "service-name.ingress.*" by default. This lines up better with
the namespace case and more accurately matches the Consul DNS value we
expect people to use in this case.
2020-06-08 13:16:24 -05:00
Hans Hasselberg c7e6c9ebec
http: use default minsize for gzip handler. (#7354)
Fixes #6306
2020-06-08 10:10:08 +02:00
Hans Hasselberg 72f92ae7ca
agent: add option to disable agent cache for HTTP endpoints (#8023)
This allows the operator to disable agent caching for the http endpoint.
It is on by default for backwards compatibility and if disabled will
ignore the url parameter `cached`.
2020-06-08 10:08:12 +02:00
Kyle Havlovitz b874c8ef0c Add connect expose CLI command 2020-06-05 14:54:29 -07:00
Daniel Nephin caa692deea ci: Enabled SA2002 staticcheck check
And handle errors in the main test goroutine
2020-06-05 17:50:11 -04:00
Hans Hasselberg 5281cb74db
Setup intermediate_pki_path on secondary when using vault (#8001)
Make sure to mount vault backend for intermediate_pki_path on secondary
dc.
2020-06-05 21:36:22 +02:00
Daniel Nephin ce6cc094a1 intentions: fix a bug in Intention.SetHash
Found using staticcheck.

binary.Write does not accept int types without a size. The error from binary.Write was ignored, so we never saw this error. Casting the data to uint64 produces a correct hash.

Also deprecate the Default{Addr,Port} fields, and prevent them from being encoded. These fields will always be empty and are not used.
Removing these would break backwards compatibility, so they are left in place for now.

Co-authored-by: Hans Hasselberg <me@hans.io>
2020-06-05 14:51:43 -04:00
R.B. Boyer 9cfa4a3fc9
tests: ensure that the ServiceExists helper function normalizes entmeta (#8025)
This fixes a unit test failure over in enterprise due to https://github.com/hashicorp/consul/pull/7384
2020-06-05 10:41:39 +02:00
R.B. Boyer b88bd6660e
server: don't activate federation state replication or anti-entropy until all servers are running 1.8.0+ (#8014) 2020-06-04 16:05:27 -05:00
Hans Hasselberg dfcf45c6cf
tests: use constructor instead init (#8024) 2020-06-04 22:59:06 +02:00
Pierre Souchay 9813ae512b
checks: when a service does not exists in an alias, consider it failing (#7384)
In current implementation of Consul, check alias cannot determine
if a service exists or not. Because a service without any check
is semantically considered as passing, so when no healthchecks
are found for an agent, the check was considered as passing.

But this make little sense as the current implementation does not
make any difference between:
 * a non-existing service (passing)
 * a service without any check (passing as well)

In order to make it work, we have to ensure that when a check did
not find any healthcheck, the service does indeed exists. If it
does not, lets consider the check as failing.
2020-06-04 14:50:52 +02:00
Hans Hasselberg 0f343332da
Merge pull request #7966 from hashicorp/pool_improvements
Agent connection pool cleanup
2020-06-04 08:56:26 +02:00
Freddy 9ed325ba8b
Enable gateways to resolve hostnames to IPv4 addresses (#7999)
The DNS resolution will be handled by Envoy and defaults to LOGICAL_DNS. This discovery type can be overridden on a per-gateway basis with the envoy_dns_discovery_type Gateway Option.

If a service contains an instance with a hostname as an address we set the Envoy cluster to use DNS as the discovery type rather than EDS. Since both mesh gateways and terminating gateways route to clusters using SNI, whenever there is a mix of hostnames and IP addresses associated with a service we use the hostname + CDS rather than the IPs + EDS.

Note that we detect hostnames by attempting to parse the service instance's address as an IP. If it is not a valid IP we assume it is a hostname.
2020-06-03 15:28:45 -06:00
Matt Keeler 771c613dae
Fix legacy management tokens in unupgraded secondary dcs (#7908)
The ACL.GetPolicy RPC endpoint was supposed to return the “parent” policy and not always the default policy. In the case of legacy management tokens the parent policy was supposed to be “manage”. The result of us not sending this properly was that operations that required specifically a management token such as saving a snapshot would not work in secondary DCs until they were upgraded.
2020-06-03 11:22:22 -04:00
Matt Keeler 0e4c65d422
Fix segfault due to race condition for checking server versions (#7957)
The ACL monitoring routine uses c.routers to check for server version updates. Therefore it needs to be started after initializing the routers.
2020-06-03 10:36:32 -04:00
Daniel Nephin 99eb583ebc
Replace goe/verify.Values with testify/require.Equal (#7993)
* testing: replace most goe/verify.Values with require.Equal

One difference between these two comparisons is that go/verify considers
nil slices/maps to be equal to empty slices/maps, where as testify/require
does not, and does not appear to provide any way to enable that behaviour.

Because of this difference some expected values were changed from empty
slices to nil slices, and some calls to verify.Values were left.

* Remove github.com/pascaldekloe/goe/verify

Reduce the number of assertion packages we use from 2 to 1
2020-06-02 12:41:25 -04:00
Alvin Huang 80c34f0461
Merge pull request #7956 from hashicorp/update-master-to-1.8.0-beta2
Update master to 1.8.0 beta2
2020-06-01 16:52:19 -04:00
R.B. Boyer 833211c14c
acl: allow auth methods created in the primary datacenter to optionally create global tokens (#7899) 2020-06-01 11:44:47 -05:00
R.B. Boyer ffb9c7d6f7
acl: remove the deprecated `acl_enforce_version_8` option (#7991)
Fixes #7292
2020-05-29 16:16:03 -05:00
Jono Sosulska c554ba9e10
Replace whitelist/blacklist terminology with allowlist/denylist (#7971)
* Replace whitelist/blacklist terminology with allowlist/denylist
2020-05-29 14:19:16 -04:00
Hans Hasselberg 1fbc1d4777 pool: remove timeout parameter
Timeout was never used in a meaningful way by callers, which is why it
is now entirely internal to the pool.
2020-05-29 08:21:28 +02:00
Hans Hasselberg ad03f863ff pool: remove useTLS and ForceTLS
In the past TLS usage was enforced with these variables, but these days
this decision is made by TLSConfigurator and there is no reason to keep
using the variables.
2020-05-29 08:21:24 +02:00
Hans Hasselberg c45432014b pool: remove version
The version field has been used to decide which multiplexing to use. It
was introduced in 2457293dce. But this is
6y ago and there is no need for this differentiation anymore.
2020-05-28 23:06:01 +02:00
hashicorp-ci cd617dbfa9 update bindata_assetfs.go 2020-05-28 14:39:37 -04:00
hashicorp-ci b3ca11fb0b update bindata_assetfs.go 2020-05-28 14:39:28 -04:00
Daniel Nephin c88fae0aac ci: Add staticcheck and fix most errors
Three of the checks are temporarily disabled to limit the size of the
diff, and allow us to enable all the other checks in CI.

In a follow up we can fix the issues reported by the other checks one
at a time, and enable them.
2020-05-28 11:59:58 -04:00
Daniel Nephin 4f2bff174d
Merge pull request #7963 from hashicorp/dnephin/replace-lib-translate-keys
Replace lib.TranslateKeys with a mapstructure decode hook
2020-05-27 16:51:26 -04:00
Daniel Nephin 6a2d7d77c0 config: use the new HookTranslateKeys instead of lib.TranslateKeys
With the exception of CA provider config, which will be migrated at some
later time.
2020-05-27 16:24:47 -04:00
Daniel Nephin 8ced4300c8 Add alias struct tags for new decode hook 2020-05-27 16:24:47 -04:00
R.B. Boyer 77f2e54618
create lib/stringslice package (#7934) 2020-05-27 11:47:32 -05:00
R.B. Boyer ddd0a13e27
agent: handle re-bootstrapping in a secondary datacenter when WAN federation via mesh gateways is configured (#7931)
The main fix here is to always union the `primary-gateways` list with
the list of mesh gateways in the primary returned from the replicated
federation states list. This will allow any replicated (incorrect) state
to be supplemented with user-configured (correct) state in the config
file. Eventually the game of random selection whack-a-mole will pick a
winning entry and re-replicate the latest federation states from the
primary. If the user-configured state is actually the incorrect one,
then the same eventual correct selection process will work in that case,
too.

The secondary fix is actually to finish making wanfed-via-mgws actually
work as originally designed. Once a secondary datacenter has replicated
federation states for the primary AND managed to stand up its own local
mesh gateways then all of the RPCs from a secondary to the primary
SHOULD go through two sets of mesh gateways to arrive in the consul
servers in the primary (one hop for the secondary datacenter's mesh
gateway, and one hop through the primary datacenter's mesh gateway).
This was neglected in the initial implementation. While everything
works, ideally we should treat communications that go around the mesh
gateways as just provided for bootstrapping purposes.

Now we heuristically use the success/failure history of the federation
state replicator goroutine loop to determine if our current mesh gateway
route is working as intended. If it is, we try using the local gateways,
and if those don't work we fall back on trying the primary via the union
of the replicated state and the go-discover configuration flags.

This can be improved slightly in the future by possibly initializing the
gateway choice to local on startup if we already have replicated state.
This PR does not address that improvement.

Fixes #7339
2020-05-27 11:31:10 -05:00
Raphaël Rondeau 0d2f178b7b
connect: fix endpoints clusterName when using cluster escape hatch (#7319)
```changelog
* fix(connect): fix endpoints clusterName when using cluster escape hatch
```
2020-05-26 10:57:22 +02:00
Pierre Souchay d6649e42af
Stop all watches before shuting down anything dring shutdown. (#7526)
This will prevent watches from being triggered.

```changelog
* fix(agent):  stop all watches before shuting down
```
2020-05-26 10:01:49 +02:00
R.B. Boyer 1b5023cb69
connect: ensure proxy-defaults protocol is used for upstreams (#7938) 2020-05-21 16:08:39 -05:00
Kyle Havlovitz b14696e32a
Standardize support for Tagged and BindAddresses in Ingress Gateways (#7924)
* Standardize support for Tagged and BindAddresses in Ingress Gateways

This updates the TaggedAddresses and BindAddresses behavior for Ingress
to match Mesh/Terminating gateways. The `consul connect envoy` command
now also allows passing an address without a port for tagged/bind
addresses.

* Update command/connect/envoy/envoy.go

Co-authored-by: Freddy <freddygv@users.noreply.github.com>

* PR comments

* Check to see if address is an actual IP address

* Update agent/xds/listeners.go

Co-authored-by: Freddy <freddygv@users.noreply.github.com>

* fix whitespace

Co-authored-by: Chris Piraino <cpiraino@hashicorp.com>
Co-authored-by: Freddy <freddygv@users.noreply.github.com>
2020-05-21 09:08:12 -05:00
Daniel Nephin 03291943e1
Merge pull request #7933 from hashicorp/dnephin/state-txn-missing-errors
state: fix unhandled error
2020-05-20 17:00:20 -04:00
Daniel Nephin 04bf0f3490
Update agent/consul/state/catalog.go
Co-authored-by: Hans Hasselberg <me@hans.io>
2020-05-20 16:34:14 -04:00
Seth Hoenig 44ee818d46
grpc: use default resolver scheme for grpc dialing (#7617)
Currently checks of type gRPC will emit log messages such as,

    2020/02/12 13:48:22 [INFO] parsed scheme: ""
    2020/02/12 13:48:22 [INFO] scheme "" not registered, fallback to default scheme

Without adding full support for using custom gRPC schemes (maybe that's
right long-term path) we can just supply the default scheme as provided
by the grpc library.

Fixes https://github.com/hashicorp/consul/issues/7274
and https://github.com/hashicorp/nomad/issues/7415
2020-05-20 22:26:26 +02:00
Daniel Nephin 3f607d9ef0 state: use an error to indicate compare failed
Errors are values. We can use the error value to identify the 'comparison failed' case which makes the function easier to use and should make it harder to miss handle the error case
2020-05-20 12:43:33 -04:00
Pierre Souchay 5c7af90154
tests: added unit test to ensure watches are not re-triggered on consul reload (#7449)
This ensures no regression about https://github.com/hashicorp/consul/issues/7318
And ensure that https://github.com/hashicorp/consul/issues/7446 cannot happen anymore
2020-05-20 12:38:29 +02:00
Pierre Souchay e9d176db2a
Allow to restrict servers that can join a given Serf Consul cluster. (#7628)
Based on work done in https://github.com/hashicorp/memberlist/pull/196
this allows to restrict the IP ranges that can join a given Serf cluster
and be a member of the cluster.

Restrictions on IPs can be done separatly using 2 new differents flags
and config options to restrict IPs for LAN and WAN Serf.
2020-05-20 11:31:19 +02:00
Daniel Nephin 1bbea2751f consul/state: refactor tnxService to avoid missed cases
Handling errors at the end of a log switch/case block is somewhat
brittle. This block included a couple cases where errors were ignored,
but it was not obvious the way it was written.

This change moves all error handling into each case block. There is
still potentially one case where err is ignored, which will be handled
in a follow up.
2020-05-19 16:50:14 -04:00
Daniel Nephin 9f27d61bee Remove unused var
The usage was removed in 8e22d80e35,
however it seems there may be a bug here because the cluster name
is not updated when the target changes.
2020-05-19 16:50:14 -04:00
Daniel Nephin 4f738b462b Handle error from template.Execute
Refactored the function to make the problem more obvious, by using a
guard.
2020-05-19 16:50:14 -04:00
Daniel Nephin c662f0f0de Fix a number of problems found by staticcheck
Some of these problems are minor (unused vars), but others are real bugs (ignored errors).

Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>
2020-05-19 16:50:14 -04:00
Daniel Nephin 5c99109dd9 Remove unused var
The usage of this var was removed in b92f895c23.

Found by using staticcheck
2020-05-19 16:50:14 -04:00
Chris Piraino 9d9e23cc44 Add service id context to the proxycfg logger
This is especially useful when multiple proxies are all querying the
same Consul agent.
2020-05-18 09:08:05 -05:00
Chris Piraino 79468793f6 Do not return an error if requested service is not a gateway
This commit converts the previous error into just a Warn-level log
message. By returning an error when the requested service was not a
gateway, we did not appropriately update envoy because the cache Fetch
returned an error and thus did not propagate the update through proxycfg
and xds packages.
2020-05-18 09:08:04 -05:00
Aleksandr Zagaevskiy a75e3d9051
Preserve ModifyIndex for unchanged entry in KVS TXN (#7832) 2020-05-14 13:25:04 -06:00
Pierre Souchay cf55e81c06
tests: fix unstable test `TestAgentAntiEntropy_Checks`. (#7594)
Example of failure: https://circleci.com/gh/hashicorp/consul/153932#tests/containers/2
2020-05-14 09:54:49 +02:00
Kit Patella ad1d4d4d07 http: migrate from instrumentation in s.wrap() to an s.enterpriseHandler() 2020-05-13 15:47:05 -07:00
Matt Keeler acccdbe45c
Fix identity resolution on clients and in secondary dcs (#7862)
Previously this happened to be using the method on the Server/Client that was meant to allow the ACLResolver to locally resolve tokens. On Servers that had tokens (primary or secondary dc + token replication) this function would lookup the token from raft and return the ACLIdentity. On clients this was always a noop. We inadvertently used this function instead of creating a new one when we added logging accessor ids for permission denied RPC requests. 

With this commit, a new method is used for resolving the identity properly via the ACLResolver which may still resolve locally in the case of being on a server with tokens but also supports remote token resolution.
2020-05-13 13:00:08 -04:00
Chris Piraino 7a7760bfd5
Make new gateway tests compatible with enterprise (#7856) 2020-05-12 13:48:20 -05:00
Daniel Nephin 600645b5f9 Add unconvert linter
To find unnecessary type convertions
2020-05-12 13:47:25 -04:00
Drew Bailey c9d0b83277 Value is already an int, remove type cast 2020-05-12 13:13:09 -04:00
Daniel Nephin 9d5ab443a7
Merge pull request #7689 from hashicorp/dnephin/remove-deadcode-1
Remove some dead code
2020-05-12 12:33:59 -04:00
Daniel Nephin 47238a693d
Merge pull request #7819 from hashicorp/dnephin/remove-t.Parallel-1
test: Remove t.Parallel() from agent/structs tests
2020-05-12 12:11:57 -04:00
R.B. Boyer 1efafd7523
acl: add auth method for JWTs (#7846) 2020-05-11 20:59:29 -05:00
Kit Patella 58ee349a83
Merge pull request #7843 from hashicorp/oss-sync/auditing-config
agent/config: Fix tests & include Audit struct as a pointer on Config
2020-05-11 14:23:44 -07:00
Kit Patella 10b3478a4d agent/config: include Audit struct as a pointer on Config, fix tests 2020-05-11 14:13:05 -07:00
Kit Patella b5564751bf
Merge pull request #7841 from hashicorp/oss-sync/auditing-config
OSS sync - Auditing config
2020-05-11 13:44:38 -07:00
Kit Patella f5030957d0 agent/config: add auditing config to OSS and add to enterpriseConfigMap exclusions 2020-05-11 13:27:35 -07:00
Chris Piraino c21052457b
Return early from updateGatewayServices if nothing to update (#7838)
* Return early from updateGatewayServices if nothing to update

Previously, we returned an empty slice of gatewayServices, which caused
us to accidentally delete everything in the memdb table

* PR comment and better formatting
2020-05-11 14:46:48 -05:00
Chris Piraino 4d6751bf16
Fix TestInternal_GatewayServiceDump_Ingress (#7840)
Protocol was added as a field on GatewayServices after
GatewayServiceDump PR branch was created.
2020-05-11 14:46:31 -05:00
R.B. Boyer 7414a3fa53
cli: ensure 'acl auth-method update' doesn't deep merge the Config field (#7839) 2020-05-11 14:21:17 -05:00
Chris Piraino 74c0543ef2 PR comment and better formatting 2020-05-11 14:04:59 -05:00
Chris Piraino fb9ee9d892 Return early from updateGatewayServices if nothing to update
Previously, we returned an empty slice of gatewayServices, which caused
us to accidentally delete everything in the memdb table
2020-05-11 12:38:04 -05:00
Freddy b3ec383d04
Gateway Services Nodes UI Endpoint (#7685)
The endpoint supports queries for both Ingress Gateways and Terminating Gateways. Used to display a gateway's linked services in the UI.
2020-05-11 11:35:17 -06:00
Kyle Havlovitz 136549205c
Merge pull request #7759 from hashicorp/ingress/tls-hosts
Add TLS option for Ingress Gateway listeners
2020-05-11 09:18:43 -07:00
Kyle Havlovitz 8d140ce9af Disallow the blanket wildcard prefix from being used as custom host 2020-05-08 20:24:18 -07:00
Chris Piraino a0e1f57ac2 Remove development log line 2020-05-08 20:24:18 -07:00
Chris Piraino 26f92e74f6 Compute all valid DNSSANs for ingress gateways
For DNSSANs we take into account the following and compute the
appropriate wildcard values:
- source datacenter
- namespaces
- alt domains
2020-05-08 20:23:17 -07:00
Daniel Nephin 5655d7f34e Add outlier_detection check to integration test
Fix decoding of time.Duration types.
2020-05-08 14:56:57 -04:00
Daniel Nephin eaa05d623a xds: Add passive health check config for upstreams 2020-05-08 14:56:57 -04:00
Chris Piraino 429d0cedd2
Restoring config entries updates the gateway-services table (#7811)
- Adds a new validateConfigEntryEnterprise function
- Also fixes some state store tests that were failing in enterprise
2020-05-08 13:24:33 -05:00
Daniel Nephin e60bb9f102 test: Remove t.Parallel() from agent/structs tests
go test will only run tests in parallel within a single package. In this case the package test run time is exactly the same with or without t.Parallel() (~0.7s).

In generally we should avoid t.Parallel() as it causes a number of problems with `go test` not reporting failure messages correctly. I encountered one of these problems, which is what prompted this change.  Since `t.Parallel` is not providing any benefit in this package, this commit removes it.

The change was automated with:

    git grep -l 't.Parallel' | xargs sed -i -e '/t.Parallel/d'
2020-05-08 14:06:10 -04:00
Freddy c32a4f1ece
Fix up enterprise compatibility for gateways (#7813) 2020-05-08 09:44:34 -06:00
Jono Sosulska 9b363e9f23
Fix spelling of deregister (#7804) 2020-05-08 10:03:45 -04:00
Chris Piraino f55e20a2f7
Allow ingress gateways to send empty clusters, routes, and listeners (#7795)
This is useful when updating an config entry with no services, and the
expected behavior is that envoy closes all listeners and clusters.

We also allow empty routes because ingress gateways name route
configurations based on the port of the listener, so it is important we
remove any stale routes. Then, if a new listener with an old port is
added, we will not have to deal with stale routes hanging around routing
to the wrong place.

Endpoints are associated with clusters, and thus by deleting the
clusters we don't have to care about sending empty endpoint responses.
2020-05-07 16:19:25 -05:00
Chris Piraino 0bd5618cb2 Cleanup proxycfg for TLS
- Use correct enterprise metadata for finding config entry
- nil out cancel functions on config snapshot copy
- Look at HostsSet when checking validity
2020-05-07 10:22:57 -05:00
Chris Piraino 5105bf3d67
Require individual services in ingress entry to match protocols (#7774)
We require any non-wildcard services to match the protocol defined in
the listener on write, so that we can maintain a consistent experience
through ingress gateways. This also helps guard against accidental
misconfiguration by a user.

- Update tests that require an updated protocol for ingress gateways
2020-05-06 16:09:24 -05:00
Freddy b069887b2a
Remove timeout and call to Fatal from goroutine (#7797) 2020-05-06 14:33:17 -06:00
Chris Piraino 0c22eacca8 Add TLS field to ingress API structs
- Adds test in api and command/config/write packages
2020-05-06 15:12:02 -05:00
Chris Piraino 30792e933b Add test for adding DNSSAN for ConnectCALeaf cache type 2020-05-06 15:12:02 -05:00
Chris Piraino 0b9ba9660d Validate hosts input in ingress gateway config entry
We can only allow host names that are valid domain names because we put
these hosts into a DNSSAN. In addition, we validate that the wildcard
specifier '*' is only present as the leftmost label to allow for a
wildcard DNSSAN and associated wildcard Host routing in the ingress
gateway proxy.
2020-05-06 15:12:02 -05:00
Kyle Havlovitz f14c54e25e Add TLS option and DNS SAN support to ingress config
xds: Only set TLS context for ingress listener when requested
2020-05-06 15:12:02 -05:00
Chris Piraino 905279f5d1 A proxy-default config entry only exists in the default namespace 2020-05-06 15:06:14 -05:00
Chris Piraino d498a0afc9 Correctly set a namespace label in the required domain for xds routes
If an upstream is not in the default namespace, we expect DNS requests
to be served over "<service-name>.ingress.<namespace>.*"
2020-05-06 15:06:14 -05:00
Chris Piraino 114a18e890 Remove outdated comment 2020-05-06 15:06:14 -05:00
Chris Piraino d8517bd6fd Better document wildcard specifier interactions 2020-05-06 15:06:14 -05:00
Chris Piraino 45e635286a Re-add comment on connect-proxy virtual hosts 2020-05-06 15:06:14 -05:00
Kyle Havlovitz f9672f9bf1 Make sure IngressHosts isn't parsed during JSON decode 2020-05-06 15:06:14 -05:00
Chris Piraino c44f877758 Comment why it is ok to expect upstreams slice to not be empty 2020-05-06 15:06:13 -05:00
Chris Piraino 881760f701 xds: Use only the port number as the configured route name
This removes duplication of protocol from the stats_prefix
2020-05-06 15:06:13 -05:00
Kyle Havlovitz 89e6b16815 Filter wildcard gateway services to match listener protocol
This now requires some type of protocol setting in ingress gateway tests
to ensure the services are not filtered out.

- small refactor to add a max(x, y) function
- Use internal configEntryTxn function and add MaxUint64 to lib
2020-05-06 15:06:13 -05:00
Chris Piraino f40833d094 Allow Hosts field to be set on an ingress config entry
- Validate that this cannot be set on a 'tcp' listener nor on a wildcard
service.
- Add Hosts field to api and test in consul config write CLI
- xds: Configure envoy with user-provided hosts from ingress gateways
2020-05-06 15:06:13 -05:00
Chris Piraino b73a13fc9e Remove service_subset field from ingress config entry
We decided that this was not a useful MVP feature, and just added
unnecessary complexity
2020-05-06 15:06:13 -05:00
Kyle Havlovitz 711d1389aa Support multiple listeners referencing the same service in gateway definitions 2020-05-06 15:06:13 -05:00
Kyle Havlovitz 247f9eaf13 Allow ingress gateways to route traffic based on Host header
This commit adds the necessary changes to allow an ingress gateway to
route traffic from a single defined port to multiple different upstream
services in the Consul mesh.

To do this, we now require all HTTP requests coming into the ingress
gateway to specify a Host header that matches "<service-name>.*" in
order to correctly route traffic to the correct service.

- Differentiate multiple listener's route names by port
- Adds a case in xds for allowing default discovery chains to create a
  route configuration when on an ingress gateway. This allows default
  services to easily use host header routing
- ingress-gateways have a single route config for each listener
  that utilizes domain matching to route to different services.
2020-05-06 15:06:13 -05:00
R.B. Boyer a854e4d9c5
acl: oss plumbing to support auth method namespace rules in enterprise (#7794)
This includes website docs updates.
2020-05-06 13:48:04 -05:00
R.B. Boyer 3242d0816d
test: make the kube auth method test helper use freeport (#7788) 2020-05-05 16:55:21 -05:00
Hans Hasselberg 096a2f2f02 network_segments: stop advertising segment tags 2020-05-05 21:32:05 +02:00
Hans Hasselberg 995a24b8e4 agent: refactor to use a single addrFn 2020-05-05 21:08:10 +02:00
Hans Hasselberg 6994c0d47f agent: rename local/global to src/dst 2020-05-05 21:07:34 +02:00
Chris Piraino 69b44fb942
Construct a default destination if one does not exist for service-router (#7783) 2020-05-05 10:49:50 -05:00
R.B. Boyer 22eb016153
acl: add MaxTokenTTL field to auth methods (#7779)
When set to a non zero value it will limit the ExpirationTime of all
tokens created via the auth method.
2020-05-04 17:02:57 -05:00
R.B. Boyer ca52ba7068
acl: add DisplayName field to auth methods (#7769)
Also add a few missing acl fields in the api.
2020-05-04 15:18:25 -05:00
Hans Hasselberg c4093c87cc
agent: don't let left nodes hold onto their node-id (#7747) 2020-05-04 18:39:08 +02:00
Matt Keeler daec810e34
Merge pull request #7714 from hashicorp/oss-sync/msp-agent-token 2020-05-04 11:33:50 -04:00
Matt Keeler cbe3a70f56
Update enterprise configurations to be in OSS
This will emit warnings about the configs not doing anything but still allow them to be parsed.

This also added the warnings for enterprise fields that we already had in OSS but didn’t change their enforcement behavior. For example, attempting to use a network segment will cause a hard error in OSS.
2020-05-04 10:21:05 -04:00
R.B. Boyer 9533451a63
acl: refactor the authmethod.Validator interface (#7760)
This is a collection of refactors that make upcoming PRs easier to digest.

The main change is the introduction of the authmethod.Identity struct.
In the one and only current auth method (type=kubernetes) all of the
trusted identity attributes are both selectable and projectable, so they
were just passed around as a map[string]string.

When namespaces were added, this was slightly changed so that the
enterprise metadata can also come back from the login operation, so
login now returned two fields.

Now with some upcoming auth methods it won't be true that all identity
attributes will be both selectable and projectable, so rather than
update the login function to return 3 pieces of data it seemed worth it
to wrap those fields up and give them a proper name.
2020-05-01 17:35:28 -05:00
R.B. Boyer 54ba8e3868
acl: change authmethod.Validator to take a logger (#7758) 2020-05-01 15:55:26 -05:00
R.B. Boyer 8927b54121
test: move some test helpers over from enterprise (#7754) 2020-05-01 14:52:15 -05:00
R.B. Boyer b282268408
sdk: extracting testutil.RequireErrorContains from various places it was duplicated (#7753) 2020-05-01 11:56:34 -05:00
Hans Hasselberg 51549bd232
rpc: oss changes for network area connection pooling (#7735) 2020-04-30 22:12:17 +02:00
Freddy 021f0ee36e
Watch fallback channel for gateways that do not exist (#7715)
Also ensure that WatchSets in tests are reset between calls to watchFired. 
Any time a watch fires, subsequent calls to watchFired on the same WatchSet
will also return true even if there were no changes.
2020-04-29 16:52:27 -06:00
Matt Keeler 7a4c73acaf
Updates to allow for using an enterprise specific token as the agents token
This is needed to allow for managed Consul instances to register themselves in the catalog with one of the managed service provider tokens.
2020-04-28 09:44:26 -04:00
Matt Keeler bec3fb7c18
Some boilerplate to allow for ACL Bootstrap disabling configurability 2020-04-28 09:42:46 -04:00
Freddy 137a2c32c6
TLS Origination for Terminating Gateways (#7671) 2020-04-27 16:25:37 -06:00
freddygv 4710410cb5 Remove fallthrough 2020-04-27 12:00:14 -06:00
freddygv d1e6d668c2 Add authz filter when creating filterchain 2020-04-27 11:08:41 -06:00
freddygv 034d7d83d4 Fix snapshot IsEmpty 2020-04-27 11:08:41 -06:00
freddygv 3afe816a94 Clean up dead code, issue addressed by passing ws to serviceGatewayNodes 2020-04-27 11:08:41 -06:00
Freddy 3b1b24c2ce Update agent/proxycfg/state_test.go 2020-04-27 11:08:41 -06:00
freddygv eddd5bd73b PR comments 2020-04-27 11:08:41 -06:00
freddygv 77bb2f1002 Fix internal endpoint test 2020-04-27 11:08:41 -06:00
freddygv d82e7e8c2a Fix listener error handling 2020-04-27 11:08:41 -06:00
freddygv 6abc71f915 Skip filter chain creation if no client cert 2020-04-27 11:08:41 -06:00
freddygv 915db10903 Avoid deleting mappings for services linked to other gateways on dereg 2020-04-27 11:08:41 -06:00
freddygv cd28d4125d Re-fix bug in CheckConnectServiceNodes 2020-04-27 11:08:41 -06:00
freddygv 09a8e5f36d Use golden files for gateway certs and fix listener test flakiness 2020-04-27 11:08:41 -06:00
freddygv 840d27a9d5 Un-nest switch in gateway update handler 2020-04-27 11:08:40 -06:00
freddygv c0e1751878 Allow terminating-gateway to setup listener before servicegroups are known 2020-04-27 11:08:40 -06:00
freddygv 913b13f31f Add subset support 2020-04-27 11:08:40 -06:00
freddygv 9f233dece2 Fix ConnectQueryBlocking test 2020-04-27 11:08:40 -06:00
freddygv 86342e4bca Fix bug in CheckConnectServiceNodes
Previously, if a blocking query called CheckConnectServiceNodes
before the gateway-services memdb table had any entries,
a nil watchCh would be returned when calling serviceTerminatingGatewayNodes.
This means that the blocking query would not fire if a gateway config entry
was added after the watch started.

In cases where the blocking query started on proxy registration,
the proxy could potentially never become aware of an upstream endpoint
if that upstream was going to be represented by a gateway.
2020-04-27 11:08:40 -06:00
freddygv 219c78e586 Add xds cluster/listener/endpoint management 2020-04-27 11:08:40 -06:00
freddygv 24207226ca Add proxycfg state management for terminating-gateways 2020-04-27 11:07:06 -06:00
freddygv c9385129ae Require service:read to read terminating-gateway config 2020-04-27 11:07:06 -06:00
Matt Keeler a1648c61ae
A couple testing helper updates (#7694) 2020-04-27 12:17:38 -04:00
Kit Patella df14a7c694
Merge pull request #7699 from pierresouchay/fix_comment_misplaced
Fixed comment on wrong line
2020-04-24 10:09:58 -07:00
Chris Piraino ecc8a2d6f7 Allow ingress gateways to route through mesh gateways
- Adds integration test for mesh gateways local + remote modes with ingress
- ingress golden files updated for mesh gateway endpoints
2020-04-24 09:31:32 -05:00
Chris Piraino cb9df538d5 Add all the xds ingress tests
This commit copies many of the connect-proxy xds testcases and reuses
for ingress gateways. This allows us to more easily see changes to the
envoy configuration when make updates to ingress gateways.
2020-04-24 09:31:32 -05:00
Chris Piraino 0ca9b606e8 Pull out setupTestVariationConfigEntriesAndSnapshot in proxycfg
This allows us to reuse the same variations for ingress gateway testing
2020-04-24 09:31:32 -05:00
Kyle Havlovitz e7b1ee55de Add http routing support and integration test to ingress gateways 2020-04-24 09:31:32 -05:00
Hans Hasselberg 1194fe441f
auto_encrypt: add validations for auto_encrypt.{tls,allow_tls} (#7704)
Fixes https://github.com/hashicorp/consul/issues/7407.
2020-04-24 15:51:38 +02:00
Pierre Souchay 5e79efc80f Fixed comment on wrong line.
While investigating and fixing an issue on our 1.5.1 branch,
I saw you also/already fixed the bug I found (tags not updated
for existing servers), but comment is misplaced.
2020-04-24 01:15:15 +02:00
Freddy 3956cff60f
Fix check deletion in anti-entropy sync (#7690)
* Incorporate entMeta into service equality check
2020-04-23 10:16:50 -06:00
Daniel Nephin d6e22a77e3 Remove deadcode
This UnmarshalJSON was never called. The decode function is passed a map[string]interface
so it has no way of knowing that this function exists.

Tested by adding a panic to this function and watching the tests pass.

I attempted to use this Unmarshal function by passing in the type, however the tests
showed that it does not work. The test was failing to parse the request.

If the performance of this endpoint is indeed critical we can solve the problem by adding
all the fields to the request struct and handling the normalziation without a custom Unmarshal.
2020-04-22 16:48:28 -04:00
Daniel Nephin ff0d894101 agent: remove deadcode that called lib.TranslateKeys
Move the last remaining function from agent/config.go to the one place
it was called.
2020-04-22 13:41:43 -04:00
Chris Piraino 115d2d5db5
Expect default enterprise metadata in gateway tests (#7664)
This makes it so that both OSS and enterprise tests pass correctly

In the api tests, explicitly set namespace to empty string so that tests
can be shared.
2020-04-20 09:02:35 -05:00
Kit Patella ccece5cd21 http: rename paresTokenResolveProxy to parseTokenWithDefault 2020-04-17 13:35:24 -07:00
Kit Patella e2467f4b2c
Merge pull request #7656 from hashicorp/feature/audit/oss-merge
agent: stub out auditing functionality in OSS
2020-04-17 13:33:06 -07:00
Kit Patella 3b105435b8 agent,config: port enterprise only fields to embedded enterprise structs 2020-04-17 13:27:39 -07:00
Daniel Nephin 67d14d8349
Merge pull request #7641 from hashicorp/dnephin/agent-cache-request-info
agent/cache: reduce function arguments by removing duplicates
2020-04-17 14:10:49 -04:00
Chris Piraino 6ef8ae9965
Fix bug where non-typical services are associated with gateways (#7662)
On every service registration, we check to see if a service should be
assassociated to a wildcard gateway-service. This fixes an issue where
we did not correctly check to see if the service being registered was a
"typical" service or not.
2020-04-17 11:24:34 -05:00
Daniel Nephin 81755c860a agent/cache: remove error return from fetch
A previous change removed the only error, so the return value can be
removed now.
2020-04-17 11:55:01 -04:00
Daniel Nephin 4ef9fc9f27 agent/cache: reduce function arguments by removing duplicates
A few of the unexported functions in agent/cache took a large number of
arguments. These arguments were effectively overrides for values that
were provided in RequestInfo.

By using a struct we can not only reduce the number of arguments, but
also simplify the logic by removing the need for overrides.
2020-04-17 11:35:07 -04:00
Kit Patella 4a86cb12c1 config/runtime: fix an extra field in config sanitize 2020-04-16 16:37:25 -07:00
Daniel Nephin 5fe7043439 agent/cache: Make all cache options RegisterOptions
Previously the SupportsBlocking option was specified by a method on the
type, and all the other options were specified from RegisterOptions.

This change moves RegisterOptions to a method on the type, and moves
SupportsBlocking into the options struct.

Currently there are only 2 cache-types. So all cache-types can implement
this method by embedding a struct with those predefined values. In the
future if a cache type needs to be registered more than once with different
options it can remove the embedded type and implement the method in a way
that allows for paramaterization.
2020-04-16 18:56:34 -04:00
Kit Patella 927f584761 agent: stub out auditing functionality in OSS 2020-04-16 15:07:52 -07:00
Kyle Havlovitz e9e8c0e730
Ingress Gateways for TCP services (#7509)
* Implements a simple, tcp ingress gateway workflow

This adds a new type of gateway for allowing Ingress traffic into Connect from external services.

Co-authored-by: Chris Piraino <cpiraino@hashicorp.com>
2020-04-16 14:00:48 -07:00
Daniel Nephin f46d1b5c94 agent/structs: Remove ServiceID.Init and CheckID.Init
The Init method provided the same functionality as the New constructor.
The constructor is both more widely used, and more idiomatic, so remove
the Init method.

This change is in preparation for fixing printing of these IDs.
2020-04-15 12:09:56 -04:00
sasha ac9b330f6b
add DNSSAN and IPSAN to cache key (#7597) 2020-04-15 10:11:11 -05:00
Matt Keeler 6a78c24d67
Update the Client code to use the common version checking infra… (#7558)
Also reduce the log level of some version checking messages on the server as they can be pretty noisy during upgrades and really are more for debugging purposes.
2020-04-14 11:54:27 -04:00
Matt Keeler da893c36a1
Allow the bootstrap endpoint to be disabled in enterprise. (#7614) 2020-04-14 11:45:39 -04:00
Daniel Nephin 89f41bddfe Remove TTL from cacheEntryExpiry
This should very slightly reduce the amount of memory required to store each item in
the cache.

It will also enable setting different TTLs based on the type of result. For example
we may want to use a shorter TTL when the result indicates the resource does not exist,
as storing these types of records could easily lead to a DOS caused by
OOM.
2020-04-13 13:10:38 -04:00
Daniel Nephin 7246d8b6cb agent/cache: Reduce differences between notify implementations
These two notify functions are very similar. There appear to be just
enough differences that trying to parameterize the differences may not
improve things.

For now, reduce some of the cosmetic differences so that the material
differences are more obvious.
2020-04-13 13:10:38 -04:00
Daniel Nephin 66fbb13976 agent/cache: Inline the refresh function to make recursion more obvious
fetch is already an exceptionally long function, but hiding the
recrusion in a function call likely does not help.
2020-04-13 13:10:38 -04:00
Daniel Nephin faeaed5d0c agent/cache: Make the return values of getEntryLocked more obvious
Use named returned so that the caller has a better idea of what these
bools mean.

Return early to reduce the scope, and make it more obvious what values
are returned in which cases. Also reduces the number of conditional
expressions in each case.
2020-04-13 13:10:38 -04:00
Daniel Nephin e9e45545dd agent/cache: Small formatting improvements to improve readability
Remove Cache.entryKey which called a single function.
Format multiline struct creation one field per line.
2020-04-13 12:34:11 -04:00
Daniel Nephin 329d76fd0e Remove SnapshotRPC passthrough
The caller has access to the delegate, so we do not gain anything by
wrapping the call in Agent.
2020-04-13 12:32:57 -04:00
Daniel Nephin 1f25bf88b8
Merge pull request #7596 from hashicorp/dnephin/agent-cache-type-entry
agent/cache: move typeEntry lookup to the edge
2020-04-13 12:24:07 -04:00
Pierre Souchay 1b4218a068
fix flaky TestReplication_FederationStates test due to race conditions (#7612)
The test had two racy bugs related to memdb references.

The first was when we initially populated data and retained the FederationState objects in a slice. Due to how the `inmemCodec` works these were actually the identical objects passed into memdb.

The second was that the `checkSame` assertion function was reading from memdb and setting the RaftIndexes to zeros to aid in equality checks. This was mutating the contents of memdb which is a no-no.

With this fix, the command:
```
i=0; while /usr/local/bin/go test -count=1 -timeout 30s github.com/hashicorp/consul/agent/consul -run '^(TestReplication_FederationStates)$'; do i=$((i + 1)); printf "$i "; done
```
That used to break on my machine in less than 20 runs is now running 150+ times without any issue.

Might also fix #7575
2020-04-09 15:42:41 -05:00
Pierre Souchay 4a6569a4e3
tests: change default http_max_conns_per_client to 250 to ease tests (#7625)
On recent Mac OS versions, the ulimit defaults to 256 by default, but many
systems (eg: some Linux distributions) often limit this value to 1024.

On validation of configuration, Consul now validates that the number of
allowed files descriptors is bigger than http_max_conns_per_client.

This make some unit tests failing on Mac OS.
Use a less important value in unit test, so tests runs well by default
on Mac OS without need for tuning the OS.
2020-04-09 11:11:42 +02:00
Freddy 9eb1867fbb
Terminating gateway discovery (#7571)
* Enable discovering terminating gateways

* Add TerminatingGatewayServices to state store

* Use GatewayServices RPC endpoint for ingress/terminating
2020-04-08 12:37:24 -06:00
Freddy aae14b3951
Add decode rules for Expose cfg in service-defaults (#7611) 2020-04-07 19:37:47 -06:00
Matt Keeler 0e7d3d93b3
Enable filtering language support for the v1/connect/intentions… (#7593)
* Enable filtering language support for the v1/connect/intentions listing API

* Update website for filtering of Intentions

* Update website/source/api/connect/intentions.html.md
2020-04-07 11:48:44 -04:00
Daniel Nephin 8549cc2d99
Merge pull request #7598 from pierresouchay/preallocation_of_dns_meta
Pre-allocations of DNS meta to avoid several allocations
2020-04-06 14:00:32 -04:00
Pierre Souchay d1d016d61d
[LINT] Close resp.Body to avoid linter complaining (#7600) 2020-04-06 09:11:04 -04:00
Pierre Souchay c9e01ed0a3 Pre-allocations of DNS meta to avoid several allocations 2020-04-05 11:12:41 +02:00
Daniel Nephin c9a87be6ee agent/cache: move typeEntry lookup to the edge
This change moves all the typeEntry lookups to the first step in the exported methods,
and makes unexporter internals accept the typeEntry struct.

This change is primarily intended to make it easier to extract the container of caches
from the Cache type.

It may incidentally reduce locking in fetch, but that was not a goal.
2020-04-03 16:01:56 -04:00
Pierre Souchay 73056fecf8 Fixed unstable test TestForwardSignals()
Sometimes, in the CI, it could receive a SIGURG, producing this line:

  FAIL: TestForwardSignals/signal-interrupt (0.06s)
        util_test.go:286: expected to read line "signal: interrupt" but got "signal: urgent I/O condition"

Only forward the signals we test to avoid this kind of false positive

Example of such unstable errors in CI:
https://circleci.com/gh/hashicorp/consul/153571
2020-04-03 14:23:03 +02:00
Pierre Souchay 09e638a9c6
tests: more tolerance to latency for unstable test `TestCacheNotifyPolling()`. (#7574) 2020-04-03 10:29:38 +02:00
Matt Keeler 8aec09aa8f
Ensure that token clone copies the roles (#7577) 2020-04-02 12:09:35 -04:00
Chris Piraino 584f90bbeb
Fix flapping of mesh gateway connect-service watches (#7575) 2020-04-02 10:12:13 -05:00
Pierre Souchay 2a8bf45e38
agent: show warning when enable_script_checks is enabled without safty net (#7437)
In order to enforce a bit security on Consul agents, add a new method in agent
to highlight possible security issues.

This does not return an error for now, but might in the future.

For now, it detects issues such as:

https://www.hashicorp.com/blog/protecting-consul-from-rce-risk-in-specific-configurations/

This would display this kind of messages:

```
2020-03-11T18:27:49.873+0100 [ERROR] agent: [SECURITY] issue: error="using enable-script-checks without ACLs and without allow_write_http_from is DANGEROUS, use enable-local-script-checks instead see https://www.hashicorp.com/blog/protecting-consul-from-rce-risk-in-specific-configurations/"
```
2020-04-02 09:59:23 +02:00
Andy Lindeman fb0a990e4d
agent: rewrite checks with proxy address, not local service address (#7518)
Exposing checks is supposed to allow a Consul agent bound to a different
IP address (e.g., in a different Kubernetes pod) to access healthchecks
through the proxy while the underlying service binds to localhost. This
is an important security feature that makes sure no external traffic
reaches the service except through the proxy.

However, as far as I can tell, this is subtly broken in the case where
the Consul agent cannot reach the proxy over localhost.

If a proxy is configured with: `{ LocalServiceAddress: "127.0.0.1",
Checks: true }`, as is typical with a sidecar proxy, the Consul checks
are currently rewritten to `127.0.0.1:<random port>`. A Consul agent
that does not share the loopback address cannot reach this address. Just
to make sure I was not misunderstanding, I tried configuring the proxy
with `{ LocalServiceAddress: "<pod ip>", Checks: true }`. In this case,
while the checks are rewritten as expected and the agent can reach the
dynamic port, the proxy can no longer reach its backend because the
traffic is no longer on the loopback interface.

I think rewriting the checks to use `proxy.Address`, the proxy's own
address, is more correct in this case. That is the IP where the proxy
can be reached, both by other proxies and by a Consul agent running on
a different IP. The local service address should continue to use
`127.0.0.1` in most cases.
2020-04-02 09:35:43 +02:00
Andy Lindeman c1cb18c648
proxycfg: support path exposed with non-HTTP2 protocol (#7510)
If a proxied service is a gRPC or HTTP2 service, but a path is exposed
using the HTTP1 or TCP protocol, Envoy should not be configured with
`http2ProtocolOptions` for the cluster backing the path.

A situation where this comes up is a gRPC service whose healthcheck or
metrics route (e.g. for Prometheus) is an HTTP1 service running on
a different port. Previously, if these were exposed either using
`Expose: { Checks: true }` or `Expose: { Paths: ... }`, Envoy would
still be configured to communicate with the path over HTTP2, which would
not work properly.
2020-04-02 09:35:04 +02:00
Pierre Souchay be1c5c4b48
config: validate system limits against limits.http_max_conns_per_client (#7434)
I spent some time today on my local Mac to figure out why Consul 1.6.3+
was not accepting limits.http_max_conns_per_client.

This adds an explicit check on number of file descriptors to be sure
it might work (this is no guarantee as if many clients are reaching
the agent, it might consume even more file descriptors)

Anyway, many users are fighting with RLIMIT_NOFILE, having a clear
message would allow them to figure out what to fix.

Example of message (reload or start):

```
2020-03-11T16:38:37.062+0100 [ERROR] agent: Error starting agent: error="system allows a max of 512 file descriptors, but limits.http_max_conns_per_client: 8192 needs at least 8212"
```
2020-04-02 09:22:17 +02:00
Shaker Islam ac309d55f4
docs: document exported functions in agent.go (closes #7101) (#7366)
and fix one linter error
2020-04-01 22:52:23 +02:00
Pierre Souchay 08acd6a03e [FIX BUILD] fix build due to merge of #7562
Due to merge #7562, upstream does not compile anymore.

Error is:

ERRO Running error: gofmt: analysis skipped: errors in package: [/Users/p.souchay/go/src/github.com/hashicorp/consul/agent/config_endpoint_test.go:188:33: too many arguments]
2020-04-01 18:29:45 +02:00
Daniel Nephin 0d8edc3e27
Merge pull request #7562 from hashicorp/dnephin/remove-tname-from-name
testing: Remove old default value from NewTestAgent() calls
2020-04-01 11:48:45 -04:00
Daniel Nephin 190fc3c732
Merge pull request #7533 from hashicorp/dnephin/xds-server-1
agent/xds: small cleanup
2020-04-01 11:24:50 -04:00
Emre Savcı 2083b7b04d
agent: add len, cap while initializing arrays 2020-04-01 10:54:51 +02:00
Daniel Nephin e759daafdd Rename NewTestAgentWithFields to StartTestAgent
This function now only starts the agent.

Using:

git grep -l 'StartTestAgent(t, true,' | \
        xargs sed -i -e 's/StartTestAgent(t, true,/StartTestAgent(t,/g'
2020-03-31 17:14:55 -04:00
Daniel Nephin f9f6b14533 Convert the remaining calls to NewTestAgentWithFields
After removing the t.Name() parameter with sed, convert the last few tests which
use a custom name to call NewTestAgentWithFields instead.
2020-03-31 17:14:55 -04:00
Daniel Nephin 0c409d460f
Merge pull request #7470 from hashicorp/dnephin/dns-unused-params
dns: Remove a few unused function parameters
2020-03-31 16:56:19 -04:00