265 Commits

Author SHA1 Message Date
Matt Keeler 18b29c45c4
New ACLs (#4791)
This PR is almost a complete rewrite of the ACL system within Consul. It brings the features more in line with other HashiCorp products. There is still quite a bit left to do here, but most of it relates to docs, testing, and finishing the last few commands in the CLI. I will update the PR description and check off the todos as I finish them over the next few days/week.
Description

At a high level, this PR splits ACL tokens from policies and separates the concept of authorization from identity. Much of the PR simply supports CRUD operations on ACLTokens and ACLPolicies, which are not particularly interesting in and of themselves. The bigger conceptual changes are in how tokens get resolved, how backwards compatibility is handled, and the separation of policy from identity, which could pave the way for alternative identity providers.

On the surface, and with a new cluster, the ACL system will look very similar to Nomad's. Both have tokens and policies. Both have local tokens. The ACL management APIs for both are very similar. I even ripped off Nomad's ACL bootstrap resetting procedure. There are a few key differences though.

    Nomad requires token and policy replication, whereas Consul only requires policy replication, with token replication being opt-in. In Consul, however, local tokens only work when token replication is enabled.
    All policies in Nomad are globally applicable. In Consul all policies are stored and replicated globally but can be scoped to a subset of the datacenters. This allows for more granular access management.
    Unlike Nomad, Consul has legacy baggage in the form of the original ACL system. The ramifications of this are:
        A server running the new system must still support other clients using the legacy system.
        A client running the new system must be able to use the legacy RPCs when the servers in its datacenter are running the legacy system.
        When the primary ACL DC's servers are running in legacy mode, they need to act as a gate that keeps everything else in the entire multi-DC cluster running in legacy mode.

So not only does this PR implement the new ACL system, it also has a legacy mode built in for when the cluster isn't ready for new ACLs. Detecting that new ACLs can be used is automatic and requires no configuration on the part of administrators. This process is detailed more in the "Transitioning from Legacy to New ACL Mode" section below.
2018-10-19 12:04:07 -04:00
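The datacenter scoping of policies described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the `Policy` type, its `Datacenters` field, and the rule that an empty list means "valid in every datacenter" are assumptions made for the example.

```go
package main

import "fmt"

// Policy is a hypothetical, simplified stand-in for an ACL policy.
// Assumption for this sketch: an empty Datacenters list means the
// policy is valid in every datacenter.
type Policy struct {
	Name        string
	Datacenters []string
}

// AppliesIn reports whether the policy is in scope for datacenter dc.
func (p Policy) AppliesIn(dc string) bool {
	if len(p.Datacenters) == 0 {
		return true
	}
	for _, d := range p.Datacenters {
		if d == dc {
			return true
		}
	}
	return false
}

func main() {
	global := Policy{Name: "read-only"}
	scoped := Policy{Name: "dc1-ops", Datacenters: []string{"dc1"}}

	fmt.Println(global.AppliesIn("dc2")) // unscoped policy: applies anywhere
	fmt.Println(scoped.AppliesIn("dc1")) // scoped policy inside its scope
	fmt.Println(scoped.AppliesIn("dc2")) // scoped policy outside its scope
}
```

The policy itself would still be stored and replicated globally; only its applicability is narrowed at evaluation time.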
Jack Pearkes 8c684db488 New command: consul debug (#4754)
* agent/debug: add package for debugging, host info

* api: add v1/agent/host endpoint

* agent: add v1/agent/host endpoint

* command/debug: implementation of static capture

* command/debug: tests and only configured targets

* agent/debug: add basic test for host metrics

* command/debug: add methods for dynamic data capture

* api: add debug/pprof endpoints

* command/debug: add pprof

* command/debug: timing, wg, logs to disk

* vendor: add gopsutil/disk

* command/debug: add a usage section

* website: add docs for consul debug

* agent/host: require operator:read

* api/host: improve docs and no retry timing

* command/debug: fail on extra arguments

* command/debug: fixup file permissions to 0644

* command/debug: remove server flags

* command/debug: improve clarity of usage section

* api/debug: add Trace for profiling, fix profile

* command/debug: capture profile and trace at the same time

* command/debug: add index document

* command/debug: use "clusters" in place of members

* command/debug: remove address in output

* command/debug: improve comment on metrics sleep

* command/debug: clarify usage

* agent: always register pprof handlers and protect

This will allow us to avoid a restart of a target agent
for profiling by always registering the pprof handlers.

Given this is a potentially sensitive path, it is protected
with an operator:read ACL and enable_debug being
set to true on the target agent. enable_debug still requires
a restart.

If ACLs are disabled, enable_debug is sufficient.

* command/debug: use trace.out instead of .prof

More in line with golang docs.

* agent: fix comment wording

* agent: wrap table driven tests in t.Run()
2018-10-19 08:41:03 -07:00
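The access rule for the pprof handlers described in the commit above (enable_debug must be set, and operator:read is additionally required when ACLs are enabled) can be written as a small decision function. The function name and its boolean parameters are hypothetical; this is a sketch of the rule as stated in the message, not the agent's actual code.

```go
package main

import "fmt"

// pprofAllowed is a hypothetical helper that encodes the rule from the
// commit message: enable_debug is always required, and operator:read is
// additionally required whenever ACLs are enabled.
func pprofAllowed(enableDebug, aclsEnabled, hasOperatorRead bool) bool {
	if !enableDebug {
		return false // handlers are registered, but debugging is switched off
	}
	if !aclsEnabled {
		return true // with ACLs disabled, enable_debug alone is sufficient
	}
	return hasOperatorRead
}

func main() {
	fmt.Println(pprofAllowed(true, false, false)) // ACLs off, debug on
	fmt.Println(pprofAllowed(true, true, false))  // ACLs on, token lacks operator:read
	fmt.Println(pprofAllowed(false, true, true))  // debug off, even with operator:read
}
```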
Paul Banks 1909a95118 xDS Server Implementation (#4731)
* Vendor updates for gRPC and xDS server

* xDS server implementation for serving Envoy as a Connect proxy

* Address initial review comments

* consistent envoy package aliases; typos fixed; override TLS and authz for custom listeners

* Moar Typos

* Moar typos
2018-10-10 16:55:34 +01:00
Mitchell Hashimoto 3237047e72
vendor: update mapstructure to v1.1.0
We require this change to support struct to struct decoding.
2018-09-30 19:15:40 -07:00
Matt Keeler d1e52e5292
Update Raft Vendoring (#4539)
Pulls in a fix for a potential memory leak regarding consistent reads that invoke VerifyLeader.
2018-09-06 15:07:42 -04:00
Mitchell Hashimoto bbb13598bf
vendor k8s client lib 2018-09-05 14:59:02 -07:00
Mitchell Hashimoto 66e31f02f7
Update go-discover vendor 2018-09-05 13:31:10 -07:00
Shubheksha fc3997f266 replace old fork of text package (#4501) 2018-08-14 12:23:18 -07:00
Paul Banks 9ce10769ce Update Serf and memberlist (#4511)
This includes fixes that improve gossip scalability on very large (> 10k node) clusters.

The Serf changes:
 - take snapshot disk IO out of the critical path for handling messages hashicorp/serf#524
 - make snapshot compaction much less aggressive - the old fixed threshold caused snapshots to be constantly compacted (synchronously with request handling) on clusters larger than about 2000 nodes! hashicorp/serf#525

Memberlist changes:
 - prioritize handling alive messages over suspect/dead to improve stability, and handle the queue in LIFO order to avoid acting on info that's already stale in the queue by the time we handle it. hashicorp/memberlist#159
 - limit the number of concurrent pushPull requests being handled at once to 128. In one test scenario with 10s of thousands of servers we saw channel and lock blocking cause over 3000 pushPulls at once which ballooned the memory of the server because each push pull contained a de-serialised list of all known 10k+ nodes and their tags for a total of about 60 million objects and 7GB of memory stuck. While the rest of the fixes here should prevent the same root cause from blocking in the same way, this prevents any other bug or source of contention from allowing pushPull messages to stack up and eat resources. hashicorp/memberlist#158
2018-08-09 13:16:13 -04:00
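The cap on concurrent pushPull requests mentioned in the commit above is an instance of a general pattern: a counting semaphore that bounds in-flight work. The sketch below uses a buffered channel for that purpose. The `limiter` type is hypothetical, and rejecting excess requests (rather than making them wait) is an assumption of this example, not a statement about how memberlist implements its limit.

```go
package main

import "fmt"

// limiter caps the number of concurrently handled requests by using a
// buffered channel as a counting semaphore. Hypothetical type for this sketch.
type limiter struct {
	slots chan struct{}
}

func newLimiter(max int) *limiter {
	return &limiter{slots: make(chan struct{}, max)}
}

// tryAcquire takes a slot without blocking. It returns false when the
// limit is already reached, so the caller can drop the request instead
// of letting work pile up in memory.
func (l *limiter) tryAcquire() bool {
	select {
	case l.slots <- struct{}{}:
		return true
	default:
		return false
	}
}

// release frees a slot taken by a successful tryAcquire.
func (l *limiter) release() {
	<-l.slots
}

func main() {
	l := newLimiter(128)

	accepted := 0
	for i := 0; i < 200; i++ { // simulate 200 requests arriving at once
		if l.tryAcquire() {
			accepted++
		}
	}
	fmt.Println("accepted:", accepted)

	l.release() // one handler finishes
	fmt.Println("slot free again:", l.tryAcquire())
}
```

Bounding the number of handlers this way limits worst-case memory regardless of what caused the backlog, which matches the defensive intent stated in the message.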
Siva Prasad f4a1c381a5 Vendoring update for go-discover. (#4412)
* New Providers added and updated vendoring for go-discover

* Vendor.json formatted using make vendorfmt

* Docs/Agent/auto-join: Added documentation for the new providers introduced in this PR

* Updated the golang.org/x/sys/unix in the vendor directory

* Agent: TestGoDiscoverRegistration updated to reflect the addition of new providers

* Deleted terraform.tfstate from vendor.

* Deleted terraform.tfstate.backup

Deleted terraform state file artifacts from unknown runs.

* Updated x/sys/windows vendor for Windows binary compilation
2018-07-25 16:21:04 -07:00
Matt Keeler cf92110abd Vendor golang.org/x/sys/windows/svc 2018-07-12 11:29:57 -04:00
mkeeler 6813a99081 Merge remote-tracking branch 'connect/f-connect' 2018-06-25 19:42:51 +00:00
Matt Keeler 98e98fa815 Remove build tags from vendored vault file to allow for this to merge properly into enterprise 2018-06-25 12:26:10 -07:00
Matt Keeler 01f82717b4 Vendor the vault api 2018-06-25 12:26:10 -07:00
Paul Banks 2f8c1d2059 Remove go-diff vendor as assert.JSONEq output is way better for our case 2018-06-25 12:25:39 -07:00
Leo Zhang 7f6d727aa5
Fix invalid vendor.json syntax for go-discover 2018-06-15 02:02:12 -07:00
Kyle Havlovitz 28e6f800d8
Add missing vendor dep github.com/stretchr/objx 2018-06-14 09:42:13 -07:00
Matt Keeler 701b2842a6 Remove bogus second yamux vendoring 2018-06-04 16:28:33 -04:00
Matt Keeler 2786ec979e Update yamux vendoring
Pulls in logging fixes.
2018-06-04 16:02:50 -04:00
Jack Pearkes aa1c993806
Merge pull request #4013 from sethvargo/sethvargo/user_agent
Add a helper for generating Consul's user-agent string
2018-06-01 09:13:38 -07:00
Matt Keeler 27fe219918
Merge pull request #4131 from pierresouchay/enable_full_dns_compression
Enable full dns compression
2018-06-01 10:42:03 -04:00
Seth Vargo 3dc2cf793c
Update vendor for go-discover 2018-05-25 15:52:05 -04:00
Wim 1e0a2e25d0 Add github.com/coredns/coredns/plugin/pkg/dnsutil files 2018-05-21 22:25:16 +02:00
Wim 71dd83c62a Add github.com/coredns/coredns/plugin/pkg/dnsutil to vendor.json 2018-05-21 22:18:19 +02:00
Pierre Souchay 4853733098 Bump DNS lib to 1.0.7 with 14bits Len() fix 2018-05-16 10:52:51 +02:00
Matt Keeler efa9a564a3 Fix vendoring of two missed libs 2018-05-11 11:31:42 -04:00
Matt Keeler b79db64ecf Update prometheus indirect deps 2018-05-11 11:18:15 -04:00
Matt Keeler 52370a5b07 Update the various deps of miekg/dns in our vendor.json 2018-05-11 10:52:05 -04:00
Matt Keeler 3152fc2944 Pull in miekg/dns deps on the golang crypto ed25519 packages 2018-05-11 10:31:27 -04:00
Kyle Havlovitz bd42da760b
vendor: pull in latest version of go-discover 2018-05-10 15:40:16 -07:00
Preetha Appan fff532cf84
Update serf to pick up clean leave fix 2018-05-04 15:51:55 -05:00
Paul Banks 4de68fcb4b
Merge pull request #4016 from pierresouchay/support_for_prometheus
Support for prometheus for metrics endpoint
2018-04-24 16:14:43 +01:00
Mitchell Hashimoto 3de62e0db3
vendor: add hashstructure and mock 2018-04-19 08:10:05 -07:00
Pierre Souchay 04cb007bed Added dependency github.com/prometheus/client_golang/prometheus/promhttp 2018-04-06 08:54:37 +02:00
Pierre Souchay 5ce3c1587c Bump github.com/armon/go-metrics to allow having prometheus support 2018-04-05 18:21:32 +02:00
Yoann 0f6e05d4c1 Add support for compression in http api
The need has been spotted in issue https://github.com/hashicorp/consul/issues/3687.
Using "NYTimes/gziphandler", the http api responses can now be compressed if required.
The Go API client requests a compressed response when possible and handles decompressing it.
This change affects only the HTTP API (not the UI, for instance).
2018-04-03 22:33:13 +02:00
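Opt-in response compression of the kind described in the commit above can be sketched with the Go standard library alone. This example deliberately does not use NYTimes/gziphandler; `withGzip`, `demoHandler`, `fetch`, and the `/v1/example` path are hypothetical names for illustration, and the wrapper omits details a production handler needs (such as adjusting Content-Length and Content-Type).

```go
package main

import (
	"compress/gzip"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// gzipResponseWriter routes the response body through a gzip.Writer.
type gzipResponseWriter struct {
	http.ResponseWriter
	zw *gzip.Writer
}

func (g gzipResponseWriter) Write(b []byte) (int, error) { return g.zw.Write(b) }

// withGzip compresses the response only when the client advertises gzip
// support via Accept-Encoding; other clients get the plain response.
func withGzip(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !strings.Contains(r.Header.Get("Accept-Encoding"), "gzip") {
			next.ServeHTTP(w, r)
			return
		}
		w.Header().Set("Content-Encoding", "gzip")
		zw := gzip.NewWriter(w)
		defer zw.Close() // flushes the gzip trailer after the handler returns
		next.ServeHTTP(gzipResponseWriter{ResponseWriter: w, zw: zw}, r)
	})
}

// demoHandler is a hypothetical API endpoint returning a small JSON body.
func demoHandler() http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, `{"ok":true}`)
	})
}

// fetch performs an in-memory GET and returns the Content-Encoding header
// together with the decoded body, mimicking a client that transparently
// decompresses gzip responses.
func fetch(h http.Handler, acceptGzip bool) (string, string) {
	req := httptest.NewRequest(http.MethodGet, "/v1/example", nil)
	if acceptGzip {
		req.Header.Set("Accept-Encoding", "gzip")
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)

	enc := rec.Header().Get("Content-Encoding")
	var body io.Reader = rec.Body
	if enc == "gzip" {
		zr, err := gzip.NewReader(rec.Body)
		if err != nil {
			panic(err)
		}
		body = zr
	}
	data, err := io.ReadAll(body)
	if err != nil {
		panic(err)
	}
	return enc, string(data)
}

func main() {
	h := withGzip(demoHandler())
	for _, accept := range []bool{true, false} {
		enc, body := fetch(h, accept)
		fmt.Printf("accept gzip=%v encoding=%q body=%s\n", accept, enc, body)
	}
}
```

Keeping compression conditional on the request header means existing clients that never send `Accept-Encoding: gzip` see no change in behaviour.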
Paul Banks ebbd11edbb
Actually add the `require` vendored files I intended to add in 0d5600ff60
Note that the vendor.json is already correct but the actual files were never checked in so report as missing:

```
$ govendor list | grep testify
 v  github.com/stretchr/testify/assert
  m github.com/stretchr/testify/require
```
2018-03-29 17:05:11 +01:00
Preetha Appan 5702319c71
vendorfmt 2018-03-28 10:25:49 -05:00
Pierre Souchay 9dc7194321
Bump version of miekg/dns to 1.0.4
See https://github.com/hashicorp/consul/issues/3977

While trying to further improve #3948 (that pull request is still valid, since we are not using compression to compute the result anyway), I saw strange behaviour in the dns library.
Basically, msg.Len() and len(msg.Pack()) disagree on the message length.

Thus, the calculated size of the DNS response is wrong, because Consul relies on msg.Len() instead of the result of Pack().

This is linked to miekg/dns#453 and a fix has been provided with miekg/dns#454

Would it be possible to upgrade miekg/dns to a more recent version?

Consul might for instance upgrade to a post 1.0 release such as https://github.com/miekg/dns/releases/tag/v1.0.4
2018-03-28 10:23:57 -05:00
Paul Banks 0d5600ff60
Add vendored `testify/require` subpackage; upgrade `assert` to match. (#3986) 2018-03-27 15:19:15 +01:00
Preetha Appan 7091595595
Update yamux to pick up performance improvements 2018-03-26 08:56:40 -05:00
Mitchell Hashimoto 8217564c48
agent/consul/fsm: begin using testify/assert 2018-03-06 09:48:15 -08:00
Alvin Huang 85c9cfea05 remove old pkgs and put deps of missing packages in vendor.json 2018-02-23 17:08:24 -05:00
James Phillips 3724e49ddf
Fixes a panic on TCP-based DNS lookups.
This came in via the monkey patch in #3861.

Fixes #3877
2018-02-08 17:57:41 -08:00
Preetha dfd484c090
Fix panic in azure go discover provider (#3876) 2018-02-08 16:46:33 -06:00
Preetha b1c487f286
Patch dns vendor code for picking up a TCP DOS attack bugfix (#3861) 2018-02-05 17:27:45 -06:00
James Phillips e748c63fff
Merge pull request #3855 from hashicorp/pr-3782-slackpad
Adds support for gRPC health checks.
2018-02-02 17:57:27 -08:00
James Phillips fb31d0ec6b
Updates hashicorp/go-discover to pull in support for Azure Virtual Machine Scale Sets. 2018-01-19 16:24:08 -08:00
James Phillips 5800474f02
Updates Serf to pick up fix for spammy zero RTT log messages.
Fixes #3789.
2018-01-19 14:47:12 -08:00
Dmytro Kostiuchenko 1a10b08e82 Add gRPC health-check #3073 2018-01-04 16:42:30 -05:00