Commit Graph

1492 Commits

Author SHA1 Message Date
Todd Radel a18b6d5ab9 connect: store signingKeyId instead of authorityKeyId (#6005) 2019-06-27 16:47:22 +02:00
Aestek 81f8092a42 acl: allow service deregistration with node write permission (#5217)
With ACLs enabled if an agent is wiped and restarted without a leave
it can no longer deregister the services it had previously registered
because it no longer has the tokens the services were registered with.
To remedy that we allow service deregistration from tokens with node
write permission.
2019-06-27 14:24:34 +02:00
Akshay Ganeshen 98a35fbe69 dns: support alt domains for dns resolution (#5940)
this adds an option for an alt domain to be used with dns while migrating to a new consul domain.
2019-06-27 12:00:37 +02:00
Pierre Souchay 4eb73973b6 agent: added metadata information about servers into consul service description (#5455)
This allows have information about servers from HTTP APIs without
using the command line.
2019-06-26 23:46:47 +02:00
Sarah Christoff d3d92d76f3
ui: modify content path (#5950)
* Add ui-content-path flag

* tests complete, regex validator on string, index.html updated

* cleaning up debugging stuff

* ui: Enable ember environment configuration to be set via the go binary at runtime (#5934)

* ui: Only inject {{.ContentPath}} if we are makeing a prod build...

...otherwise we just use the current rootURL

This gets injected into a <base /> node which solves the assets path
problem but not the ember problem

* ui: Pull out the <base href=""> value and inject it into ember env

See previous commit:

The <base href=""> value is 'sometimes' injected from go at index
serve time. We pass this value down to ember by overwriting the ember
config that is injected via a <meta> tag. This has to be done before
ember bootup.

Sometimes (during testing and development, basically not production)
this is injected with the already existing value, in which case this
essentially changes nothing.

The code here is slightly abstracted away from our specific usage to
make it easier for anyone else to use, and also make sure we can cope
with using this same method to pass variables down from the CLI through
to ember in the future.

* ui: We can't use <base /> move everything to javascript (#5941)

Unfortuantely we can't seem to be able to use <base> and rootURL
together as URL paths will get doubled up (`ui/ui/`).

This moves all the things that we need to interpolate with .ContentPath
to the `startup` javascript so we can conditionally print out
`{{.ContentPath}}` in lots of places (now we can't use base)

* fixed when we serve index.html

* ui: For writing a ContentPath, we also need to cope with testing... (#5945)

...and potentially more environments

Testing has more additional things in a separate index.html in `tests/`

This make the entire thing a little saner and uses just javascriopt
template literals instead of a pseudo handbrake synatx for our
templating of these files.

Intead of just templating the entire file this way, we still only
template `{{content-for 'head'}}` and `{{content-for 'body'}}`
in this way to ensure we support other plugins/addons

* build: Loosen up the regex for retrieving the CONSUL_VERSION (#5946)

* build: Loosen up the regex for retrieving the CONSUL_VERSION

1. Previously the `sed` replacement was searching for the CONSUL_VERSION
comment at the start of a line, it no longer does this to allow for
indentation.
2. Both `grep` and `sed` where looking for the omment at the end of the
line. We've removed this restriction here. We don't need to remove it
right now, but if we ever put the comment followed by something here the
searching would break.
3. Added `xargs` for trimming the resulting version string. We aren't
using this already in the rest of the scripts, but we are pretty sure
this is available on most systems.

* ui: Fix erroneous variable, and also force an ember cache clean on build

1. We referenced a variable incorrectly here, this fixes that.
2. We also made sure that every `make` target clears ember's `tmp` cache
to ensure that its not using any caches that have since been edited
everytime we call a `make` target.

* added docs, fixed encoding

* fixed go fmt

* Update agent/config/config.go

Co-Authored-By: R.B. Boyer <public@richardboyer.net>

* Completed Suggestions

* run gofmt on http.go

* fix testsanitize

* fix fullconfig/hcl by setting correct 'want'

* ran gofmt on agent/config/runtime_test.go

* Update website/source/docs/agent/options.html.md

Co-Authored-By: Hans Hasselberg <me@hans.io>

* Update website/source/docs/agent/options.html.md

Co-Authored-By: kaitlincarter-hc <43049322+kaitlincarter-hc@users.noreply.github.com>

* remove contentpath from redirectFS struct
2019-06-26 11:43:30 -05:00
Pierre Souchay 0e907f5aa8 Support for maximum size for Output of checks (#5233)
* Support for maximum size for Output of checks

This PR allows users to limit the size of output produced by checks at the agent 
and check level.

When set at the agent level, it will limit the output for all checks monitored
by the agent.

When set at the check level, it can override the agent max for a specific check but
only if it is lower than the agent max.

Default value is 4k, and input must be at least 1.
2019-06-26 09:43:25 -06:00
Hans Hasselberg f13fe4b304
agent: transfer leadership when establishLeadership fails (#5247) 2019-06-19 14:50:48 +02:00
Aestek 24a0f2bba2 ae: use stale requests when performing full sync (#5873)
Read requests performed during anti antropy full sync currently target
the leader only. This generates a non-negligible load on the leader when
the DC is large enough and can be offloaded to the followers following
the "eventually consistent" policy for the agent state.
We switch the AE read calls to use stale requests with a small (2s)
MaxStaleDuration value and make sure we do not read too fast after a
write.
2019-06-17 18:05:47 +02:00
Matt Keeler 2557d7a6cc
Fix CAS operations on Services (#5971)
* Fix CAS operations on services

* Update agent/consul/state/catalog_test.go

Co-Authored-By: R.B. Boyer <public@richardboyer.net>
2019-06-17 10:41:04 -04:00
Paul Banks acfcc7daf4
Add rate limiting to RPCs sent within a server instance too (#5927) 2019-06-13 04:26:27 -05:00
Paul Banks ffcfdf29fc
Upgrade xDS (go-control-plane) API to support Envoy 1.10. (#5872)
* Upgrade xDS (go-control-plane) API to support Envoy 1.10.

This includes backwards compatibility shim to work around the ext_authz package rename in 1.10.

It also adds integration test support in CI for 1.10.0.

* Fix go vet complaints

* go mod vendor

* Update Envoy version info in docs

* Update website/source/docs/connect/proxies/envoy.md
2019-06-07 07:10:43 -05:00
Pierre Souchay 4a4c63bda0 Ensure Consul is IPv6 compliant (#5468) 2019-06-04 10:02:38 -04:00
Matt Keeler 2ba6c3ac00
Update links to envoy docs on xDS protocol (#5871) 2019-06-03 11:03:05 -05:00
R.B. Boyer 40336fd353
agent: fix several data races and bugs related to node-local alias checks (#5876)
The observed bug was that a full restart of a consul datacenter (servers
and clients) in conjunction with a restart of a connect-flavored
application with bring-your-own-service-registration logic would very
frequently cause the envoy sidecar service check to never reflect the
aliased service.

Over the course of investigation several bugs and unfortunate
interactions were corrected:

(1)

local.CheckState objects were only shallow copied, but the key piece of
data that gets read and updated is one of the things not copied (the
underlying Check with a Status field). When the stock code was run with
the race detector enabled this highly-relevant-to-the-test-scenario field
was found to be racy.

Changes:

 a) update the existing Clone method to include the Check field
 b) copy-on-write when those fields need to change rather than
    incrementally updating them in place.

This made the observed behavior occur slightly less often.

(2)

If anything about how the runLocal method for node-local alias check
logic was ever flawed, there was no fallback option. Those checks are
purely edge-triggered and failure to properly notice a single edge
transition would leave the alias check incorrect until the next flap of
the aliased check.

The change was to introduce a fallback timer to act as a control loop to
double check the alias check matches the aliased check every minute
(borrowing the duration from the non-local alias check logic body).

This made the observed behavior eventually go away when it did occur.

(3)

Originally I thought there were two main actions involved in the data race:

A. The act of adding the original check (from disk recovery) and its
   first health evaluation.

B. The act of the HTTP API requests coming in and resetting the local
   state when re-registering the same services and checks.

It took awhile for me to realize that there's a third action at work:

C. The goroutines associated with the original check and the later
   checks.

The actual sequence of actions that was causing the bad behavior was
that the API actions result in the original check to be removed and
re-added _without waiting for the original goroutine to terminate_. This
means for brief windows of time during check definition edits there are
two goroutines that can be sending updates for the alias check status.

In extremely unlikely scenarios the original goroutine sees the aliased
check start up in `critical` before being removed but does not get the
notification about the nearly immediate update of that check to
`passing`.

This is interlaced wit the new goroutine coming up, initializing its
base case to `passing` from the current state and then listening for new
notifications of edge triggers.

If the original goroutine "finishes" its update, it then commits one
more write into the local state of `critical` and exits leaving the
alias check no longer reflecting the underlying check.

The correction here is to enforce that the old goroutines must terminate
before spawning the new one for alias checks.
2019-05-24 13:36:56 -05:00
Freddy 6b31482333
Increase reliability of TestResetSessionTimerLocked_Renew 2019-05-24 13:54:51 -04:00
Pierre Souchay e892981418 agent: Improve startup message to avoid confusing users when no error occurs (#5896)
* Improve startup message to avoid confusing users when no error occurs

Several times, some users not very familiar with Consul get confused
by error message at startup:

  `[INFO] agent: (LAN) joined: 1 Err: <nil>`

Having `Err: <nil>` seems weird to many users, I propose to have the
following instead:

* Success: `[INFO] agent: (LAN) joined: 1`
* Error:   `[WARN] agent: (LAN) couldn't join: %d Err: ERROR`
2019-05-24 16:50:18 +02:00
Freddy 17e74985b0
Run TestServer_Expect on its own (#5890) 2019-05-23 19:52:33 -04:00
Freddy 6c19cacd42
Flaky test: ACLReplication_Tokens (#5891)
* Exclude non-go workflows while testing

* Wait for s2 global-management policy

* Revert "Exclude non-go workflows while testing"

This reverts commit 47a83cbe9f.
2019-05-23 19:52:02 -04:00
Freddy d4ea163b0b
Add retries to StatsFetcherTest (#5892) 2019-05-23 19:51:31 -04:00
Jack Pearkes 40cec98468
Release v1.5.1 2019-05-22 20:19:12 +00:00
freddygv 40b809bce3 Wait for s2 global-management policy 2019-05-21 17:58:37 -06:00
Freddy e9259ca97a
Change log line used for verification 2019-05-21 17:07:06 -06:00
Freddy d1c315fad9
Stop running TestLeader_ChangeServerID in parallel 2019-05-21 15:28:08 -06:00
Sarah Christoff 32b5992d0f Add retries around `obj` 2019-05-21 13:36:52 -05:00
Sarah Christoff 73d73e0e20 Add retries to all `obj` 2019-05-21 13:31:37 -05:00
Sarah Christoff 2a018e5e0a
Update agent/coordinate_endpoint_test.go
Co-Authored-By: Freddy <freddygv@users.noreply.github.com>
2019-05-17 14:32:50 -05:00
Sarah Christoff b96d9b01bd Update type assertion logic
Logic updated to evaluate with a boolean after the type assertion.
This allows us to check if the type assertion succeeded and be
more clear with errors.
2019-05-17 13:32:36 -05:00
Kyle Havlovitz 31bb9d67df
Set the dead node reclaim timer at 30s 2019-05-15 11:59:33 -07:00
Kyle Havlovitz 29eb83c9c2
Merge branch 'master' into change-node-id 2019-05-15 10:51:04 -07:00
Jack Pearkes 34eff659dc
Release v1.5.0 2019-05-08 18:34:08 +00:00
Matt Keeler dbc48ea3f7 Fixes race condition in Agent Cache (#5796)
* Fix race condition during a cache get

Check the entry we pulled out of the cache while holding the lock had Fetching set.
If it did then we should use the existing Waiter instead of calling fetch. The reason
this is better than just calling fetch is that fetch re-gets the entry out of the
entries map and the previous fetch may have finished. Therefore this prevents
erroneously starting a new fetch because we just missed the last update.

* Fix race condition fully

The first commit still allowed for the following scenario:

• No entry existing when checked in getWithIndex while holding the read lock
• Then by time we had reached fetch it had been created and finished.

* always use ok when returning

* comment mentioning the reading from entries.

* use cacheHit consistently
2019-05-07 11:15:49 +01:00
Matt Keeler dbf0a0f6c0
Copy the proxy config instead of direct assignment (#5786)
This prevents modifying the data in the state store which is supposed to be immutable.
2019-05-06 12:09:59 -04:00
R.B. Boyer 20eefeea11
acl: a role binding rule for a role that does not exist should be ignored (#5778)
I wrote the docs under this assumption but completely forgot to actually
enforce it.
2019-05-03 14:22:44 -05:00
R.B. Boyer b4371bcccd
acl: enforce that you cannot persist tokens and roles with missing links except during replication (#5779) 2019-05-02 15:02:21 -05:00
Matt Keeler 42d32db817
Fix ConfigEntryResponse binary marshaller and ensure we watch the chan in ConfigEntry.Get even when no entry exists. (#5773) 2019-05-02 15:25:29 -04:00
Paul Banks 6a58527cd8
Fix previous accidental master push 🤦 (#5771)
* Fix previous accidental master push 🤦

* Fix ACL test
2019-05-02 15:49:37 +01:00
Paul Banks 6c81f9da0d
Fix panic in Resolving service config when proxy-defaults isn't defined yet (#5769) 2019-05-02 14:12:21 +01:00
Paul Banks 8f5b16ebaf
Fix uint8 conversion issues for service config response maps. 2019-05-02 14:11:33 +01:00
Paul Banks 0cfb6051ea Add integration test for central config; fix central config WIP (#5752)
* Add integration test for central config; fix central config WIP

* Add integration test for central config; fix central config WIP

* Set proxy protocol correctly and begin adding upstream support

* Add upstreams to service config cache key and start new notify watcher if they change.

This doesn't update the tests to pass though.

* Fix some merging logic get things working manually with a hack (TODO fix properly)

* Simplification to not allow enabling sidecars centrally - it makes no sense without upstreams anyway

* Test compile again and obvious ones pass. Lots of failures locally not debugged yet but may be flakes. Pushing up to see what CI does

* Fix up service manageer and API test failures

* Remove the enable command since it no longer makes much sense without being able to turn on sidecar proxies centrally

* Remove version.go hack - will make integration test fail until release

* Remove unused code from commands and upstream merge

* Re-bump version to 1.5.0
2019-05-01 16:39:31 -07:00
Matt Keeler 69f902608c
Update to use a consulent build tag instead of just ent (#5759) 2019-05-01 11:11:27 -04:00
Matt Keeler 3145bf5230 Centralized Config CLI (#5731)
* Add HTTP endpoints for config entry management

* Finish implementing decoding in the HTTP Config entry apply endpoint

* Add CAS operation to the config entry apply endpoint

Also use this for the bootstrapping and move the config entry decoding function into the structs package.

* First pass at the API client for the config entries

* Fixup some of the ConfigEntry APIs

Return a singular response object instead of a list for the ConfigEntry.Get RPC. This gets plumbed through the HTTP API as well.

Dont return QueryMeta in the JSON response for the config entry listing HTTP API. Instead just return a list of config entries.

* Minor API client fixes

* Attempt at some ConfigEntry api client tests

These don’t currently work due to weak typing in JSON

* Get some of the api client tests passing

* Implement reflectwalk magic to correct JSON encoding a ProxyConfigEntry

Also added a test for the HTTP endpoint that exposes the problem. However, since the test doesn’t actually do the JSON encode/decode its still failing.

* Move MapWalk magic into a binary marshaller instead of JSON.

* Add a MapWalk test

* Get rid of unused func

* Get rid of unused imports

* Fixup some tests now that the decoding from msgpack coerces things into json compat types

* Stub out most of the central config cli

Fully implement the config read command.

* Basic config delete command implementation

* Implement config write command

* Implement config list subcommand

Not entirely sure about the output here. Its basically the read output indented with a line specifying the kind/name of each type which is also duplicated in the indented output.

* Update command usage

* Update some help usage formatting

* Add the connect enable helper cli command

* Update list command output

* Rename the config entry API client methods.

* Use renamed apis

* Implement config write tests

Stub the others with the noTabs tests.

* Change list output format

Now just simply output 1 line per named config

* Add config read tests

* Add invalid args write test.

* Add config delete tests

* Add config list tests

* Add connect enable tests

* Update some CLI commands to use CAS ops

This also modifies the HTTP API for a write op to return a boolean indicating whether the value was written or not.

* Fix up the HTTP API CAS tests as I realized they weren’t testing what they should.

* Update config entry rpc tests to properly test CAS

* Fix up a few more tests

* Fix some tests that using ConfigEntries.Apply

* Update config_write_test.go

* Get rid of unused import
2019-04-30 16:27:16 -07:00
Matt Keeler f665695b6b
Ensure ServiceName is populated correctly for agent service checks
Also update some snapshot agent docs

* Enforce correct permissions when registering a check

Previously we had attempted to enforce service:write for a check associated with a service instead of node:write on the agent but due to how we decoded the health check from the request it would never do it properly. This commit fixes that.

* Update website/source/docs/commands/snapshot/agent.html.markdown.erb

Co-Authored-By: mkeeler <mkeeler@users.noreply.github.com>
2019-04-30 19:00:57 -04:00
Matt Keeler d0f410cd84
Make a few config entry endpoints return 404s and allow for snake_case and lowercase key names. (#5748) 2019-04-30 18:19:19 -04:00
Freddy 44e3dd79ff
go fmt runtime_test.go 2019-04-30 13:28:02 -06:00
Freddy d19eb36085
Restrict config file extensions read 2019-04-30 12:43:32 -06:00
Matt Keeler 4daa1585b0
ACL Token ID Initialization (#5307) 2019-04-30 11:45:36 -04:00
Paul Banks a12810664f
Modify ConfigEntry bootstrapping syntax more generic (#5744)
* Modify ConfigEntry bootstrapping syntax to be generic and compatible with other CLI config syntax. Refs #5743

* Fix gofmt issues.
2019-04-30 15:13:59 +01:00
Kyle Havlovitz aba54cec55 Add HTTP endpoints for config entry management (#5718) 2019-04-29 18:08:09 -04:00
Paul Banks 421ecd32fc
Connect: allow configuring Envoy for L7 Observability (#5558)
* Add support for HTTP proxy listeners

* Add customizable bootstrap configuration options

* Debug logging for xDS AuthZ

* Add Envoy Integration test suite with basic test coverage

* Add envoy command tests to cover new cases

* Add tracing integration test

* Add gRPC support WIP

* Merged changes from master Docker. get CI integration to work with same Dockerfile now

* Make docker build optional for integration

* Enable integration tests again!

* http2 and grpc integration tests and fixes

* Fix up command config tests

* Store all container logs as artifacts in circle on fail

* Add retries to outer part of stats measurements as we keep missing them in CI

* Only dump logs on failing cases

* Fix typos from code review

* Review tidying and make tests pass again

* Add debug logs to exec test.

* Fix legit test failure caused by upstream rename in envoy config

* Attempt to reduce cases of bad TLS handshake in CI integration tests

* bring up the right service

* Add prometheus integration test

* Add test for denied AuthZ both HTTP and TCP

* Try ANSI term for Circle
2019-04-29 17:27:57 +01:00
R.B. Boyer c6722fc43d
Merge pull request #5617 from hashicorp/f-acl-ux
Secure ACL Introduction for Kubernetes
2019-04-26 15:34:26 -05:00