Commit Graph

11200 Commits

Author SHA1 Message Date
Daniel Nephin 89f41bddfe Remove TTL from cacheEntryExpiry
This should very slightly reduce the amount of memory required to store each item in
the cache.

It will also enable setting different TTLs based on the type of result. For example
we may want to use a shorter TTL when the result indicates the resource does not exist,
as storing these types of records could easily lead to a DOS caused by
OOM.
2020-04-13 13:10:38 -04:00
Daniel Nephin 7246d8b6cb agent/cache: Reduce differences between notify implementations
These two notify functions are very similar. There appear to be just
enough differences that trying to parameterize the differences may not
improve things.

For now, reduce some of the cosmetic differences so that the material
differences are more obvious.
2020-04-13 13:10:38 -04:00
Daniel Nephin 66fbb13976 agent/cache: Inline the refresh function to make recursion more obvious
fetch is already an exceptionally long function, but hiding the
recrusion in a function call likely does not help.
2020-04-13 13:10:38 -04:00
Daniel Nephin faeaed5d0c agent/cache: Make the return values of getEntryLocked more obvious
Use named returned so that the caller has a better idea of what these
bools mean.

Return early to reduce the scope, and make it more obvious what values
are returned in which cases. Also reduces the number of conditional
expressions in each case.
2020-04-13 13:10:38 -04:00
Daniel Nephin e9e45545dd agent/cache: Small formatting improvements to improve readability
Remove Cache.entryKey which called a single function.
Format multiline struct creation one field per line.
2020-04-13 12:34:11 -04:00
Daniel Nephin 329d76fd0e Remove SnapshotRPC passthrough
The caller has access to the delegate, so we do not gain anything by
wrapping the call in Agent.
2020-04-13 12:32:57 -04:00
Daniel Nephin 6b860c926f
Merge pull request #7608 from hashicorp/dnephin/grpc-default-scheme
command/envoy: enable TLS when CONSUL_HTTP_ADDR=https://...
2020-04-13 12:30:26 -04:00
Daniel Nephin 1f25bf88b8
Merge pull request #7596 from hashicorp/dnephin/agent-cache-type-entry
agent/cache: move typeEntry lookup to the edge
2020-04-13 12:24:07 -04:00
Matt Keeler 4468002f37
Update CHANGELOG.md 2020-04-13 11:21:18 -04:00
Pierre Souchay 1b4218a068
fix flaky TestReplication_FederationStates test due to race conditions (#7612)
The test had two racy bugs related to memdb references.

The first was when we initially populated data and retained the FederationState objects in a slice. Due to how the `inmemCodec` works these were actually the identical objects passed into memdb.

The second was that the `checkSame` assertion function was reading from memdb and setting the RaftIndexes to zeros to aid in equality checks. This was mutating the contents of memdb which is a no-no.

With this fix, the command:
```
i=0; while /usr/local/bin/go test -count=1 -timeout 30s github.com/hashicorp/consul/agent/consul -run '^(TestReplication_FederationStates)$'; do i=$((i + 1)); printf "$i "; done
```
That used to break on my machine in less than 20 runs is now running 150+ times without any issue.

Might also fix #7575
2020-04-09 15:42:41 -05:00
Andrea Scarpino bf601842c2
docs: document consulPrefix properly (#7603) 2020-04-09 22:02:23 +02:00
danielehc 0fef193278
Adding API version for example call (#7626) 2020-04-09 21:25:22 +02:00
Hans Hasselberg 66415be90e
connect: support envoy 1.14.1 (#7624) 2020-04-09 20:58:22 +02:00
Pierre Souchay 4a6569a4e3
tests: change default http_max_conns_per_client to 250 to ease tests (#7625)
On recent Mac OS versions, the ulimit defaults to 256 by default, but many
systems (eg: some Linux distributions) often limit this value to 1024.

On validation of configuration, Consul now validates that the number of
allowed files descriptors is bigger than http_max_conns_per_client.

This make some unit tests failing on Mac OS.
Use a less important value in unit test, so tests runs well by default
on Mac OS without need for tuning the OS.
2020-04-09 11:11:42 +02:00
Blake Covarrubias 7f03949424 docs: Fix broken link to Nomad Consul Connect guide 2020-04-08 14:59:36 -07:00
Freddy 9eb1867fbb
Terminating gateway discovery (#7571)
* Enable discovering terminating gateways

* Add TerminatingGatewayServices to state store

* Use GatewayServices RPC endpoint for ingress/terminating
2020-04-08 12:37:24 -06:00
Freddy aae14b3951
Add decode rules for Expose cfg in service-defaults (#7611) 2020-04-07 19:37:47 -06:00
Daniel Nephin 8b6861518f Fix CONSUL_HTTP_ADDR=https not enabling TLS
Use the config instead of attempting to reparse the env var.
2020-04-07 18:16:53 -04:00
Daniel Nephin 0888c6575b Step 3: fix a bug in api.NewClient and fix the tests
The api client should never rever to HTTP if the user explicitly
requested TLS. This change broke some tests because the tests always use
an non-TLS http server, but some tests explicitly enable TLS.
2020-04-07 18:02:56 -04:00
Iryna Shustava 74bd138bae
docs: Add Helm docs for auto-encrypt and external servers (#7595)
* docs: Add Helm docs for auto-encrypt and external servers
2020-04-07 14:41:16 -07:00
Luke Kysow c03e314c16
Merge pull request #7586 from hashicorp/helm-docs
Document bootstrapACLs deprecation
2020-04-07 14:02:12 -07:00
Daniel Nephin 1a8ffec6a7 Step 2: extract the grpc address logic and a new type
The new grpcAddress function contains all of the logic to translate the
command line options into the values used in the template.

The new type has two advantages.

1. It introduces a logical grouping of values in the BootstrapTplArgs
   struct which is exceptionally large. This grouping makes the struct
   easier to understand because each set of nested values can be seen
   as a single entity.
2. It gives us a reasonable return value for this new function.
2020-04-07 16:36:51 -04:00
Daniel Nephin 830b4a15f6 Step 1: move all the grpcAddr logic into the same spot
There is no reason a reader should have to jump around to find this value. It is only
used in 1 place
2020-04-07 15:53:12 -04:00
Alvin Huang 1bb4e1ccd2
filter out non go branches from the 'go-tests' workflow (#7606) 2020-04-07 15:39:23 -04:00
Matt Keeler 32daa2b27c
Update CHANGELOG.md 2020-04-07 11:49:36 -04:00
Matt Keeler 0e7d3d93b3
Enable filtering language support for the v1/connect/intentions… (#7593)
* Enable filtering language support for the v1/connect/intentions listing API

* Update website for filtering of Intentions

* Update website/source/api/connect/intentions.html.md
2020-04-07 11:48:44 -04:00
Daniel Nephin 8549cc2d99
Merge pull request #7598 from pierresouchay/preallocation_of_dns_meta
Pre-allocations of DNS meta to avoid several allocations
2020-04-06 14:00:32 -04:00
Luke Kysow df9d88831d
Update website/source/docs/platform/k8s/helm.html.md
Co-Authored-By: Iryna Shustava <ishustava@users.noreply.github.com>
2020-04-06 09:16:49 -07:00
Pierre Souchay d1d016d61d
[LINT] Close resp.Body to avoid linter complaining (#7600) 2020-04-06 09:11:04 -04:00
Pierre Souchay c9e01ed0a3 Pre-allocations of DNS meta to avoid several allocations 2020-04-05 11:12:41 +02:00
Jono Sosulska 93509690be
Change style to match "join" singular (#7569)
* Change style to match "join" singular

- Replaced "(Consul) cluster" with  "Consul Datacenter"
- Removed "ing" so the feature fits "Consul Auto-join", and that the tense is correct.

Co-authored-by: danielehc <40759828+danielehc@users.noreply.github.com>
2020-04-03 16:04:07 -04:00
Daniel Nephin c9a87be6ee agent/cache: move typeEntry lookup to the edge
This change moves all the typeEntry lookups to the first step in the exported methods,
and makes unexporter internals accept the typeEntry struct.

This change is primarily intended to make it easier to extract the container of caches
from the Cache type.

It may incidentally reduce locking in fetch, but that was not a goal.
2020-04-03 16:01:56 -04:00
David Yu b51ad875c3
[docs] Built-in Proxies not meant for production (#7579)
* [docs] Built-in Proxies not meant for production

* Adding link to Envoy for Connect

* Update website/source/docs/connect/proxies/built-in.md

Co-Authored-By: Blake Covarrubias <blake@covarrubi.as>

* Revising note

* Update website/source/docs/connect/proxies/built-in.md

period

Co-Authored-By: Hans Hasselberg <me@hans.io>

Co-authored-by: Blake Covarrubias <blake@covarrubi.as>
Co-authored-by: Hans Hasselberg <me@hans.io>
2020-04-03 11:52:05 -07:00
Daniel Nephin b9cb1e93c3
Merge pull request #7589 from pierresouchay/fix_unstable_test_TestForwardSignals
Fixed unstable test TestForwardSignals()
2020-04-03 12:43:28 -04:00
Pierre Souchay 73056fecf8 Fixed unstable test TestForwardSignals()
Sometimes, in the CI, it could receive a SIGURG, producing this line:

  FAIL: TestForwardSignals/signal-interrupt (0.06s)
        util_test.go:286: expected to read line "signal: interrupt" but got "signal: urgent I/O condition"

Only forward the signals we test to avoid this kind of false positive

Example of such unstable errors in CI:
https://circleci.com/gh/hashicorp/consul/153571
2020-04-03 14:23:03 +02:00
Pierre Souchay 09e638a9c6
tests: more tolerance to latency for unstable test `TestCacheNotifyPolling()`. (#7574) 2020-04-03 10:29:38 +02:00
Luke Kysow 08df582e20
Document bootstrapACLs deprecation 2020-04-02 16:58:55 -07:00
Freddy b61214ef24
Fix regression with gateway registration and update docs (#7582) 2020-04-02 12:52:11 -06:00
Matt Keeler 8aec09aa8f
Ensure that token clone copies the roles (#7577) 2020-04-02 12:09:35 -04:00
Chris Piraino bcb7a89fda
Update CHANGELOG.md 2020-04-02 10:14:17 -05:00
Chris Piraino 584f90bbeb
Fix flapping of mesh gateway connect-service watches (#7575) 2020-04-02 10:12:13 -05:00
Pierre Souchay 2a8bf45e38
agent: show warning when enable_script_checks is enabled without safty net (#7437)
In order to enforce a bit security on Consul agents, add a new method in agent
to highlight possible security issues.

This does not return an error for now, but might in the future.

For now, it detects issues such as:

https://www.hashicorp.com/blog/protecting-consul-from-rce-risk-in-specific-configurations/

This would display this kind of messages:

```
2020-03-11T18:27:49.873+0100 [ERROR] agent: [SECURITY] issue: error="using enable-script-checks without ACLs and without allow_write_http_from is DANGEROUS, use enable-local-script-checks instead see https://www.hashicorp.com/blog/protecting-consul-from-rce-risk-in-specific-configurations/"
```
2020-04-02 09:59:23 +02:00
Andy Lindeman fb0a990e4d
agent: rewrite checks with proxy address, not local service address (#7518)
Exposing checks is supposed to allow a Consul agent bound to a different
IP address (e.g., in a different Kubernetes pod) to access healthchecks
through the proxy while the underlying service binds to localhost. This
is an important security feature that makes sure no external traffic
reaches the service except through the proxy.

However, as far as I can tell, this is subtly broken in the case where
the Consul agent cannot reach the proxy over localhost.

If a proxy is configured with: `{ LocalServiceAddress: "127.0.0.1",
Checks: true }`, as is typical with a sidecar proxy, the Consul checks
are currently rewritten to `127.0.0.1:<random port>`. A Consul agent
that does not share the loopback address cannot reach this address. Just
to make sure I was not misunderstanding, I tried configuring the proxy
with `{ LocalServiceAddress: "<pod ip>", Checks: true }`. In this case,
while the checks are rewritten as expected and the agent can reach the
dynamic port, the proxy can no longer reach its backend because the
traffic is no longer on the loopback interface.

I think rewriting the checks to use `proxy.Address`, the proxy's own
address, is more correct in this case. That is the IP where the proxy
can be reached, both by other proxies and by a Consul agent running on
a different IP. The local service address should continue to use
`127.0.0.1` in most cases.
2020-04-02 09:35:43 +02:00
Andy Lindeman c1cb18c648
proxycfg: support path exposed with non-HTTP2 protocol (#7510)
If a proxied service is a gRPC or HTTP2 service, but a path is exposed
using the HTTP1 or TCP protocol, Envoy should not be configured with
`http2ProtocolOptions` for the cluster backing the path.

A situation where this comes up is a gRPC service whose healthcheck or
metrics route (e.g. for Prometheus) is an HTTP1 service running on
a different port. Previously, if these were exposed either using
`Expose: { Checks: true }` or `Expose: { Paths: ... }`, Envoy would
still be configured to communicate with the path over HTTP2, which would
not work properly.
2020-04-02 09:35:04 +02:00
Pierre Souchay be1c5c4b48
config: validate system limits against limits.http_max_conns_per_client (#7434)
I spent some time today on my local Mac to figure out why Consul 1.6.3+
was not accepting limits.http_max_conns_per_client.

This adds an explicit check on number of file descriptors to be sure
it might work (this is no guarantee as if many clients are reaching
the agent, it might consume even more file descriptors)

Anyway, many users are fighting with RLIMIT_NOFILE, having a clear
message would allow them to figure out what to fix.

Example of message (reload or start):

```
2020-03-11T16:38:37.062+0100 [ERROR] agent: Error starting agent: error="system allows a max of 512 file descriptors, but limits.http_max_conns_per_client: 8192 needs at least 8212"
```
2020-04-02 09:22:17 +02:00
Daniel Nephin a6cede401a
Merge pull request #7572 from hashicorp/dnephin/ci-fix-go-mod-download
ci: Fix working_directory for go mod download
2020-04-01 18:57:21 -04:00
Daniel Nephin 625ce9baf2 ci: Fix working_directory for go mod download
The previous PR which added these was accidentally performing the download
in the root directory. For the api, and sdk directories it should be in done
in the same directory that will be used to run tests. Otherwise the
wrong dependencies will be downloaded which may add unnecessary time to
the CI run.
2020-04-01 17:02:23 -04:00
Shaker Islam ac309d55f4
docs: document exported functions in agent.go (closes #7101) (#7366)
and fix one linter error
2020-04-01 22:52:23 +02:00
R.B. Boyer c2f4063103 update changelog 2020-04-01 13:16:01 -05:00
Pierre Souchay 5703aadb59
[BUGFIX] Fix race condition in freeport (#7567)
This removes a race condition in reset since pendingPorts can be set to nil in reset()

If ticker is hit at wrong time, it would crash the unit test.

We ensure in reset to avoid this race condition by cancelling the goroutine using
killTicker chan.

We also properly clean up eveything, so garbage collector can work as expected.

To reproduce existing bug:
`while go test -timeout 30s github.com/hashicorp/consul/sdk/freeport -run '^(Test.*)$'; do go clean -testcache; done`

Will crash after a few 10s runs on my machine.

Error could be seen in unit tests sometimes:

[INFO] freeport: resetting the freeport package state
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x1125536]

goroutine 25 [running]:
container/list.(*List).Len(...)
	/usr/local/Cellar/go/1.14/libexec/src/container/list/list.go:66
github.com/hashicorp/consul/sdk/freeport.checkFreedPortsOnce()
	/Users/p.souchay/go/src/github.com/hashicorp/consul/sdk/freeport/freeport.go:157 +0x86
github.com/hashicorp/consul/sdk/freeport.checkFreedPorts()
	/Users/p.souchay/go/src/github.com/hashicorp/consul/sdk/freeport/freeport.go:147 +0x71
created by github.com/hashicorp/consul/sdk/freeport.initialize
	/Users/p.souchay/go/src/github.com/hashicorp/consul/sdk/freeport/freeport.go:113 +0x2cf
FAIL	github.com/hashicorp/consul/sdk/freeport	1.607s
2020-04-01 13:14:33 -05:00