Commit Graph

1279 Commits

Author SHA1 Message Date
Kyle Havlovitz 4a73a59d70
Merge pull request #4917 from hashicorp/replication-token-cleanup
Use acl replication_token for connect
2018-11-12 09:12:54 -08:00
Kyle Havlovitz 972177071d update non-voting server test to fix enterprise diff 2018-11-09 12:50:24 -08:00
Kyle Havlovitz 643bd13aed oss: do a proper check-and-set on the CA roots/config fsm operation 2018-11-09 12:36:23 -08:00
R.B. Boyer 2afc2a3c3b
acl: fixes ACL replication for legacy tokens without AccessorIDs (#4885) 2018-11-07 07:59:44 -08:00
Kyle Havlovitz e8dd89359a
agent: fix formatting 2018-11-07 02:16:03 -08:00
R.B. Boyer 9211d2701d
fix comment typos (#4890) 2018-11-02 12:00:39 -05:00
Kyle Havlovitz 8337e3d8c0
Merge pull request #4872 from hashicorp/node-snapshot-fix
Node ID/datacenter snapshot fix
2018-10-31 15:51:07 -07:00
Matt Keeler db2cf01406 Adds documentation for the new ACL APIs (#4851)
* Update the ACL API docs

* Add a CreateTime to the anon token

Also require acl:read permissions at least to perform rule translation. Don’t want someone DoSing the system with an open endpoint that actually does a bit of work.

* Fix one place where I was referring to id instead of AccessorID

* Add godocs for the API package additions.

* Minor updates: removed some extra commas and updated the acl intro paragraph

* minor tweaks

* Updated the language to be clearer

* Updated the language to be clearer for policy page

* I was also confused by that! Your updates are much clearer.

Co-Authored-By: kaitlincarter-hc <43049322+kaitlincarter-hc@users.noreply.github.com>

* Sounds much better.

Co-Authored-By: kaitlincarter-hc <43049322+kaitlincarter-hc@users.noreply.github.com>

* Updated sidebar layout and deprecated warning
2018-10-31 15:11:51 -07:00
Matt Keeler f9cf0eb36e Remaining ACL Unit Tests (#4852)
* Add leader token upgrade test and fix various ACL enablement bugs

* Update the leader ACL initialization tests.

* Add a StateStore ACL tests for ACLTokenSet and ACLTokenGetBy* functions

* Advertise the agents acl support status with the agent/self endpoint.

* Make batch token upsert CAS’able to prevent consistency issues with token auto-upgrade

* Finish up the ACL state store token tests

* Finish the ACL state store unit tests

Also rename some things to make them more consistent.

* Do as much ACL replication testing as I can.
2018-10-31 13:00:46 -07:00
Kyle Havlovitz bd6d0e598f fsm: update snapshot/restore test to include ID and datacenter 2018-10-30 15:53:14 -07:00
Kyle Havlovitz 6483356329 fsm: add missing ID/datacenter to persistNodes 2018-10-30 15:52:54 -07:00
Matt Keeler 790cf90ee5
Fix the NonVoter Bootstrap test (#4786) 2018-10-24 10:23:50 -04:00
Kyle Havlovitz 819566f6b7 fsm: add Intention operations to transactions for internal use 2018-10-19 10:02:28 -07:00
Matt Keeler 34b53e7099 A few misc fixes found by go vet 2018-10-19 12:28:36 -04:00
Matt Keeler 18b29c45c4
New ACLs (#4791)
This PR is almost a complete rewrite of the ACL system within Consul. It brings the features more in line with other HashiCorp products. Obviously there is quite a bit left to do here but most of it is related docs, testing and finishing the last few commands in the CLI. I will update the PR description and check off the todos as I finish them over the next few days/week.
Description

At a high level this PR is mainly to split ACL tokens from Policies and to split the concepts of Authorization from Identities. A lot of this PR is mostly just to support CRUD operations on ACLTokens and ACLPolicies. These in and of themselves are not particularly interesting. The bigger conceptual changes are in how tokens get resolved, how backwards compatibility is handled and the separation of policy from identity which could lead the way to allowing for alternative identity providers.

On the surface and with a new cluster the ACL system will look very similar to that of Nomads. Both have tokens and policies. Both have local tokens. The ACL management APIs for both are very similar. I even ripped off Nomad's ACL bootstrap resetting procedure. There are a few key differences though.

    Nomad requires token and policy replication where Consul only requires policy replication with token replication being opt-in. In Consul local tokens only work with token replication being enabled though.
    All policies in Nomad are globally applicable. In Consul all policies are stored and replicated globally but can be scoped to a subset of the datacenters. This allows for more granular access management.
    Unlike Nomad, Consul has legacy baggage in the form of the original ACL system. The ramifications of this are:
        A server running the new system must still support other clients using the legacy system.
        A client running the new system must be able to use the legacy RPCs when the servers in its datacenter are running the legacy system.
        The primary ACL DC's servers running in legacy mode needs to be a gate that keeps everything else in the entire multi-DC cluster running in legacy mode.

So not only does this PR implement the new ACL system but has a legacy mode built in for when the cluster isn't ready for new ACLs. Also detecting that new ACLs can be used is automatic and requires no configuration on the part of administrators. This process is detailed more in the "Transitioning from Legacy to New ACL Mode" section below.
2018-10-19 12:04:07 -04:00
Pierre Souchay fab55bee2b dns: implements prefix lookups for DNS TTL (#4605)
This will fix https://github.com/hashicorp/consul/issues/4509 and allow forinstance lb-* to match services lb-001 or lb-service-007.
2018-10-19 08:41:04 -07:00
Kyle Havlovitz c617326470 re-add Connect multi-dc config changes
This reverts commit 8bcfbaffb6.
2018-10-19 08:41:03 -07:00
Jack Pearkes 8bcfbaffb6 Revert "Connect multi-dc config" (#4784) 2018-10-11 17:32:45 +01:00
Aestek 25f04fbd21 [Security] Add finer control over script checks (#4715)
* Add -enable-local-script-checks options

These options allow for a finer control over when script checks are enabled by
giving the option to only allow them when they are declared from the local
file system.

* Add documentation for the new option

* Nitpick doc wording
2018-10-11 13:22:11 +01:00
Rebecca Zanzig 34e5516834 Support multiple tags for health and catalog http api endpoints (#4717)
* Support multiple tags for health and catalog api endpoints

Fixes #1781.

Adds a `ServiceTags` field to the ServiceSpecificRequest to support
multiple tags, updates the filter logic in the catalog store, and
propagates these change through to the health and catalog endpoints.

Note: Leaves `ServiceTag` in the struct, since it is being used as
part of the DNS lookup, which in turn uses the health check.

* Update the api package to support multiple tags

Includes additional tests.

* Update new tests to use the `require` library

* Update HealthConnect check after a bad merge
2018-10-11 12:50:05 +01:00
Pierre Souchay 51b33ef015 [Performance On Large clusters] Reduce updates on large services (#4720)
* [Performance On Large clusters] Checks do update services/nodes only when really modified to avoid too many updates on very large clusters

In a large cluster, when having a few thousands of nodes, the anti-entropy
mechanism performs lots of changes (several per seconds) while
there is no real change. This patch wants to improve this in order
to increase Consul scalability when using many blocking requests on
health for instance.

* [Performance for large clusters] Only updates index of service if service is really modified

* [Performance for large clusters] Only updates index of nodes if node is really modified

* Added comments / ensure IsSame() has clear semantics

* Avoid having modified boolean, return nil directly if stutures are Same

* Fixed unstable unit tests TestLeader_ChangeServerID

* Rewrite TestNode_IsSame() for better readability as suggested by @banks

* Rename ServiceNode.IsSame() into IsSameService() + added unit tests

* Do not duplicate TestStructs_ServiceNode_Conversions() and increase test coverage of IsSameService

* Clearer documentation in IsSameService

* Take into account ServiceProxy into ServiceNode.IsSameService()

* Fixed IsSameService() with all new structures
2018-10-11 12:42:39 +01:00
Pierre Souchay 251156eb68 Added SOA configuration for DNS settings. (#4714)
This will allow to fine TUNE SOA settings sent by Consul in DNS responses,
for instance to be able to control negative ttl.

Will fix: https://github.com/hashicorp/consul/issues/4713

# Example

Override all settings:

* min_ttl: 0 => 60s
* retry: 600 (10m) => 300s (5 minutes),
* expire: 86400 (24h) => 43200 (12h)
* refresh: 3600 (1h) => 1800 (30 minutes)

```
consul agent -dev -hcl 'dns_config={soa={min_ttl=60,retry=300,expire=43200,refresh=1800}}'
```

Result:
```
dig +multiline @localhost -p 8600 service.consul

; <<>> DiG 9.12.1 <<>> +multiline @localhost -p 8600 service.consul
; (2 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 36557
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;service.consul.		IN A

;; AUTHORITY SECTION:
consul.			0 IN SOA ns.consul. hostmaster.consul. (
				1537959133 ; serial
				1800       ; refresh (30 minutes)
				300        ; retry (5 minutes)
				43200      ; expire (12 hours)
				60         ; minimum (1 minute)
				)

;; Query time: 4 msec
;; SERVER: 127.0.0.1#8600(127.0.0.1)
;; WHEN: Wed Sep 26 12:52:13 CEST 2018
;; MSG SIZE  rcvd: 93
```
2018-10-10 15:50:56 -04:00
Kyle Havlovitz e4349c5710 connect/ca: more OSS split for multi-dc 2018-10-10 12:17:59 -07:00
Kyle Havlovitz 0da4f2b2e8 connect/ca: split CA initialization logic between oss/enterprise 2018-10-10 12:17:59 -07:00
Kyle Havlovitz 56dc426227 agent: add primary_datacenter and connect replication config options 2018-10-10 12:17:59 -07:00
Kyle Havlovitz 98d95cfa80 connect: add ExternalTrustDomain to CARoot fields 2018-10-10 12:16:47 -07:00
Kyle Havlovitz 46c829b879 docs: deprecate acl_datacenter and replace it with primary_datacenter 2018-10-10 12:16:47 -07:00
Paul Banks b83bbf248c Add Proxy Upstreams to Service Definition (#4639)
* Refactor Service Definition ProxyDestination.

This includes:
 - Refactoring all internal structs used
 - Updated tests for both deprecated and new input for:
   - Agent Services endpoint response
   - Agent Service endpoint response
   - Agent Register endpoint
     - Unmanaged deprecated field
     - Unmanaged new fields
     - Managed deprecated upstreams
     - Managed new
   - Catalog Register
     - Unmanaged deprecated field
     - Unmanaged new fields
     - Managed deprecated upstreams
     - Managed new
   - Catalog Services endpoint response
   - Catalog Node endpoint response
   - Catalog Service endpoint response
 - Updated API tests for all of the above too (both deprecated and new forms of register)

TODO:
 - config package changes for on-disk service definitions
 - proxy config endpoint
 - built-in proxy support for new fields

* Agent proxy config endpoint updated with upstreams

* Config file changes for upstreams.

* Add upstream opaque config and update all tests to ensure it works everywhere.

* Built in proxy working with new Upstreams config

* Command fixes and deprecations

* Fix key translation, upstream type defaults and a spate of other subtele bugs found with ned to end test scripts...

TODO: tests still failing on one case that needs a fix. I think it's key translation for upstreams nested in Managed proxy struct.

* Fix translated keys in API registration.
≈

* Fixes from docs
 - omit some empty undocumented fields in API
 - Bring back ServiceProxyDestination in Catalog responses to not break backwards compat - this was removed assuming it was only used internally.

* Documentation updates for Upstreams in service definition

* Fixes for tests broken by many refactors.

* Enable travis on f-connect branch in this branch too.

* Add consistent Deprecation comments to ProxyDestination uses

* Update version number on deprecation notices, and correct upstream datacenter field with explanation in docs
2018-10-10 16:55:34 +01:00
Alex Dadgar 43d0f96c42 do not bootstrap with non voters 2018-09-19 17:41:36 -07:00
Kyle Havlovitz d515d25856
Merge pull request #4644 from hashicorp/ca-refactor
connect/ca: rework initialization/root generation in providers
2018-09-13 13:08:34 -07:00
Paul Banks 74f2a80a42
Fix CA pruning when CA config uses string durations. (#4669)
* Fix CA pruning when CA config uses string durations.

The tl;dr here is:

 - Configuring LeafCertTTL with a string like "72h" is how we do it by default and should be supported
 - Most of our tests managed to escape this by defining them as time.Duration directly
 - Out actual default value is a string
 - Since this is stored in a map[string]interface{} config, when it is written to Raft it goes through a msgpack encode/decode cycle (even though it's written from server not over RPC).
 - msgpack decode leaves the string as a `[]uint8`
 - Some of our parsers required string and failed
 - So after 1 hour, a default configured server would throw an error about pruning old CAs
 - If a new CA was configured that set LeafCertTTL as a time.Duration, things might be OK after that, but if a new CA was just configured from config file, intialization would cause same issue but always fail still so would never prune the old CA.
 - Mostly this is just a janky error that got passed tests due to many levels of complicated encoding/decoding.

tl;dr of the tl;dr: Yay for type safety. Map[string]interface{} combined with msgpack always goes wrong but we somehow get bitten every time in a new way :D

We already fixed this once! The main CA config had the same problem so @kyhavlov already wrote the mapstructure DecodeHook that fixes it. It wasn't used in several places it needed to be and one of those is notw in `structs` which caused a dependency cycle so I've moved them.

This adds a whole new test thta explicitly tests the case that broke here. It also adds tests that would have failed in other places before (Consul and Vaul provider parsing functions). I'm not sure if they would ever be affected as it is now as we've not seen things broken with them but it seems better to explicitly test that and support it to not be bitten a third time!

* Typo fix

* Fix bad Uint8 usage
2018-09-13 15:43:00 +01:00
Pierre Souchay 1a906ef34e Fix more unstable tests in agent and command 2018-09-12 14:49:27 +01:00
Kyle Havlovitz c112a72880
connect/ca: some cleanup and reorganizing of the new methods 2018-09-11 16:43:04 -07:00
Pierre Souchay 22500f242e Fix unstable tests in agent, api, and command/watch 2018-09-10 16:58:53 +01:00
Pierre Souchay eddcf228ea Implementation of Weights Data structures (#4468)
* Implementation of Weights Data structures

Adding this datastructure will allow us to resolve the
issues #1088 and #4198

This new structure defaults to values:
```
   { Passing: 1, Warning: 0 }
```

Which means, use weight of 0 for a Service in Warning State
while use Weight 1 for a Healthy Service.
Thus it remains compatible with previous Consul versions.

* Implemented weights for DNS SRV Records

* DNS properly support agents with weight support while server does not (backwards compatibility)

* Use Warning value of Weights of 1 by default

When using DNS interface with only_passing = false, all nodes
with non-Critical healthcheck used to have a weight value of 1.
While having weight.Warning = 0 as default value, this is probably
a bad idea as it breaks ascending compatibility.

Thus, we put a default value of 1 to be consistent with existing behaviour.

* Added documentation for new weight field in service description

* Better documentation about weights as suggested by @banks

* Return weight = 1 for unknown Check states as suggested by @banks

* Fixed typo (of -> or) in error message as requested by @mkeeler

* Fixed unstable unit test TestRetryJoin

* Fixed unstable tests

* Fixed wrong Fatalf format in `testrpc/wait.go`

* Added notes regarding DNS SRV lookup limitations regarding number of instances

* Documentation fixes and clarification regarding SRV records with weights as requested by @banks

* Rephrase docs
2018-09-07 15:30:47 +01:00
Kyle Havlovitz 546bdf8663
connect/ca: add Configure/GenerateRoot to provider interface 2018-09-06 19:18:59 -07:00
Pierre Souchay 9a2ae6e8eb Fixed more flaky tests in ./agent/consul (#4617) 2018-09-04 14:02:47 +01:00
Freddy d7a404f2ee
Bugfix: Use "%#v" when formatting structs (#4600) 2018-08-28 12:37:34 -04:00
Pierre Souchay b898131723 [BUGFIX] Avoid returning empty data on startup of a non-leader server (#4554)
Ensure that DB is properly initialized when performing stale queries

Addresses:
- https://github.com/hashicorp/consul-replicate/issues/82
- https://github.com/hashicorp/consul/issues/3975
- https://github.com/hashicorp/consul-template/issues/1131
2018-08-23 12:06:39 -04:00
Kyle Havlovitz e5e1f867e5
Merge branch 'master' into ca-snapshot-fix 2018-08-16 13:00:54 -07:00
Kyle Havlovitz f186edc42c
fsm: add connect service config to snapshot/restore test 2018-08-16 12:58:54 -07:00
nickmy9729 beddf03b26 Added code to allow snapshot inclusion of NodeMeta (#4527) 2018-08-16 15:33:35 -04:00
Kyle Havlovitz b51d76f469
fsm: add missing CA config to snapshot/restore logic 2018-08-16 11:58:50 -07:00
Kyle Havlovitz 4b35d877ca
autopilot: don't follow the normal server removal rules for nonvoters 2018-08-14 14:24:51 -07:00
Kyle Havlovitz ea14482376
Fix stats fetcher healthcheck RPCs not being independent 2018-08-14 14:23:52 -07:00
Pierre Souchay 0d6de257a2 Display more information about check being not properly added when it fails (#4405)
* Display more information about check being not properly added when it fails

It follows an incident where we add lots of error messages:

  [WARN] consul.fsm: EnsureRegistration failed: failed inserting check: Missing service registration

That seems related to Consul failing to restart on respective agents.

Having Node information as well as service information would help diagnose the issue.

* Renamed ensureCheckIfNodeMatches() as requested by @banks
2018-08-14 17:45:33 +01:00
Pierre Souchay ef3b81ab13 Allow to rename nodes with IDs, will fix #3974 and #4413 (#4415)
* Allow to rename nodes with IDs, will fix #3974 and #4413

This change allow to rename any well behaving recent agent with an
ID to be renamed safely, ie: without taking the name of another one
with case insensitive comparison.

Deprecated behaviour warning
----------------------------

Due to asceding compatibility, it is still possible however to
"take" the name of another name by not providing any ID.

Note that when not providing any ID, it is possible to have 2 nodes
having similar names with case differences, ie: myNode and mynode
which might lead to DB corruption on Consul server side and
lead to server not properly restarting.

See #3983 and #4399 for Context about this change.

Disabling registration of nodes without IDs as specified in #4414
should probably be the way to go eventually.

* Removed the case-insensitive search when adding a node within the else
block since it breaks the test TestAgentAntiEntropy_Services

While the else case is probably legit, it will be fixed with #4414 in
a later release.

* Added again the test in the else to avoid duplicated names, but
enforce this test only for nodes having IDs.

Thus most tests without any ID will work, and allows us fixing

* Added more tests regarding request with/without IDs.

`TestStateStore_EnsureNode` now test registration and renaming with IDs

`TestStateStore_EnsureNodeDeprecated` tests registration without IDs
and tests removing an ID from a node as well as updated a node
without its ID (deprecated behaviour kept for backwards compatibility)

* Do not allow renaming in case of conflict, including when other node has no ID

* Fixed function GetNodeID that was not working due to wrong type when searching node from its ID

Thus, all tests about renaming were not working properly.

Added the full test cas that allowed me to detect it.

* Better error messages, more tests when nodeID is not a valid UUID in GetNodeID()

* Added separate TestStateStore_GetNodeID to test GetNodeID.

More complete test coverage for GetNodeID

* Added new unit test `TestStateStore_ensureNoNodeWithSimilarNameTxn`

Also fixed comments to be clearer after remarks from @banks

* Fixed error message in unit test to match test case

* Use uuid.ParseUUID to parse Node.ID as requested by @mkeeler
2018-08-10 11:30:45 -04:00
Siva Prasad c88900aaa9
PR to fix TestAgent_IndexChurn and TestPreparedQuery_Wrapper. (#4512)
* Fixes TestAgent_IndexChurn

* Fixes TestPreparedQuery_Wrapper

* Increased sleep in agent_test for IndexChurn to 500ms

* Made the comment about joinWAN operation much less of a cliffhanger
2018-08-09 12:40:07 -04:00
Armon Dadgar 4f1fd34e9e consul: Update buffer sizes 2018-08-08 10:26:58 -07:00
Siva Prasad 288d350a73
Revert "CA initialization while boostrapping and TestLeader_ChangeServerID fix." (#4497)
* Revert "BUGFIX: Unit test relying on WaitForLeader() did not work due to wrong test (#4472)"

This reverts commit cec5d72396.

* Revert "CA initialization while boostrapping and TestLeader_ChangeServerID fix. (#4493)"

This reverts commit 589b589b53.
2018-08-07 08:29:48 -04:00
Pierre Souchay cec5d72396 BUGFIX: Unit test relying on WaitForLeader() did not work due to wrong test (#4472)
- Improve resilience of testrpc.WaitForLeader()

- Add additionall retry to CI

- Increase "go test" timeout to 8m

- Add wait for cluster leader to several tests in the agent package

- Add retry to some tests in the api and command packages
2018-08-06 19:46:09 -04:00
Siva Prasad 589b589b53
CA initialization while boostrapping and TestLeader_ChangeServerID fix. (#4493)
* connect: fix an issue with Consul CA bootstrapping being interrupted

* streamline change server id test
2018-08-06 16:15:24 -04:00
Kyle Havlovitz fa0d8aff33
fix inconsistency in TestConnectCAConfig_GetSet 2018-07-26 07:46:47 -07:00
Kyle Havlovitz ed87949385
Merge pull request #4400 from hashicorp/leaf-cert-ttl
Add configurable leaf cert TTL to Connect CA
2018-07-25 17:53:25 -07:00
Paul Banks 8cbeb29e73
Fixes #4421: General solution to stop blocking queries with index 0 (#4437)
* Fix theoretical cache collision bug if/when we use more cache types with same result type

* Generalized fix for blocking query handling when state store methods return zero index

* Refactor test retry to only affect CI

* Undo make file merge

* Add hint to error message returned to end-user requests if Connect is not enabled when they try to request cert

* Explicit error for Roots endpoint if connect is disabled

* Fix tests that were asserting old behaviour
2018-07-25 20:26:27 +01:00
Kyle Havlovitz ce10de036e
connect/ca: check LeafCertTTL when rotating expired roots 2018-07-20 16:04:04 -07:00
Kyle Havlovitz d6ca015a42
connect/ca: add configurable leaf cert TTL 2018-07-16 13:33:37 -07:00
Matt Keeler 63d5c069fc
Merge pull request #4379 from hashicorp/persist-intermediates
connect: persist intermediate CAs on leader change
2018-07-12 12:09:13 -04:00
Matt Keeler 0e83059d1f
Revert "Allow changing Node names since Node now have IDs" 2018-07-12 11:19:21 -04:00
Matt Keeler 91150cca59 Fixup formatting 2018-07-12 10:14:26 -04:00
Matt Keeler 3807e04de9 Revert PR 4294 - Catalog Register: Generate UUID for services registered without one
UUID auto-generation here causes trouble in a few cases. The biggest being older
nodes reregistering will fail when the UUIDs are different and the names match

This reverts commit 0f70034082.
This reverts commit d1a8f9cb3f.
This reverts commit cf69ec42a4.
2018-07-12 10:06:50 -04:00
Kyle Havlovitz f95c6807e7
connect: use reflect.DeepEqual instead for test 2018-07-11 13:10:58 -07:00
Matt Keeler 98ead2a8f8
Merge pull request #3983 from pierresouchay/node_renaming
Allow changing Node names since Node now have IDs
2018-07-11 16:03:02 -04:00
Kyle Havlovitz 4e5fb6bc19
connect: add provider state to snapshots 2018-07-11 11:34:49 -07:00
Kyle Havlovitz 462ace4867
connect: update leader initializeCA comment 2018-07-11 10:00:42 -07:00
Kyle Havlovitz 1d3f4b5099
connect: persist intermediate CAs on leader change 2018-07-11 09:44:30 -07:00
Pierre Souchay fecae3de21 When renaming a node, ensure the name is not taken by another node.
Since DNS is case insensitive and DB as issues when similar names with different
cases are added, check for unicity based on case insensitivity.

Following another big incident we had in our cluster, we also validate
that adding/renaming a not does not conflicts with case insensitive
matches.

We had the following error once:

 - one node called: mymachine.MYDC.mydomain was shut off
 - another node (different ID) was added with name: mymachine.mydc.mydomain before
   72 hours

When restarting the consul server of domain, the consul server restarted failed
to start since it detected an issue in RAFT database because
mymachine.MYDC.mydomain and mymachine.mydc.mydomain had the same names.

Checking at registration time with case insensitivity should definitly fix
those issues and avoid Consul DB corruption.
2018-07-11 14:42:54 +02:00
Matt Keeler d19c7d8882
Merge pull request #4303 from pierresouchay/non_blocking_acl
Only send one single ACL cache refresh across network when TTL is over
2018-07-10 08:57:33 -04:00
MagnumOpus21 300330e24b Agent/Proxy: Formatting and test cases fix 2018-07-09 12:46:10 -04:00
Kyle Havlovitz 401b206a2e
Store the time CARoot is rotated out instead of when to prune 2018-07-06 16:05:25 -07:00
Kyle Havlovitz 1492243e0a
connect/ca: add logic for pruning old stale RootCA entries 2018-07-02 10:35:05 -07:00
Pierre Souchay bd023f352e Updated swith case to use same branch for async-cache and extend-cache 2018-07-02 17:39:34 +02:00
Pierre Souchay 1e7665c0d5 Updated documentation and adding more test case for async-cache 2018-07-01 23:50:30 +02:00
Pierre Souchay abde81a3e7 Added async-cache with similar behaviour as extend-cache but asynchronously 2018-07-01 23:50:30 +02:00
Pierre Souchay 9406ca1c95 Only send one single ACL cache refresh across network when TTL is over
It will allow the following:

 * when connectivity is limited (saturated linnks between DCs), only one
   single request to refresh an ACL will be sent to ACL master DC instead
   of statcking ACL refresh queries
 * when extend-cache is used for ACL, do not wait for result, but refresh
   the ACL asynchronously, so no delay is not impacting slave DC
 * When extend-cache is not used, keep the existing blocking mechanism,
   but only send a single refresh request.

This will fix https://github.com/hashicorp/consul/issues/3524
2018-07-01 23:50:30 +02:00
Matt Keeler 22b7b688a3
Move starting enterprise functionality 2018-06-29 17:38:29 -04:00
Matt Keeler 0f70034082 Move default uuid test into the consul package 2018-06-27 09:21:58 -04:00
Matt Keeler d1a8f9cb3f go fmt changes 2018-06-27 09:07:22 -04:00
Matt Keeler cf69ec42a4 Make sure to generate UUIDs when services are registered without one
This makes the behavior line up with the docs and expected behavior
2018-06-26 17:04:08 -04:00
mkeeler 6813a99081 Merge remote-tracking branch 'connect/f-connect' 2018-06-25 19:42:51 +00:00
Kyle Havlovitz 3baa67cdef connect/ca: pull the cluster ID from config during a rotation 2018-06-25 12:25:42 -07:00
Kyle Havlovitz b4ef7bb64d connect/ca: leave blank root key/cert out of the default config (unnecessary) 2018-06-25 12:25:42 -07:00
Kyle Havlovitz 050da22473 connect/ca: undo the interface changes and use sign-self-issued in Vault 2018-06-25 12:25:42 -07:00
Kyle Havlovitz bc997688e3 connect/ca: update Consul provider to use new cross-sign CSR method 2018-06-25 12:25:41 -07:00
Kyle Havlovitz 226a59215d connect/ca: fix vault provider URI SANs and test 2018-06-25 12:25:41 -07:00
Kyle Havlovitz 1a8ac686b2 connect/ca: add the Vault CA provider 2018-06-25 12:25:41 -07:00
Paul Banks e33bfe249e Note leadership issues in comments 2018-06-25 12:25:41 -07:00
Paul Banks e514570dfa Actually return Intermediate certificates bundled with a leaf! 2018-06-25 12:25:40 -07:00
Paul Banks 2e223ea2b7 Fix hot loop in cache for RPC returning zero index. 2018-06-25 12:25:37 -07:00
Paul Banks 05a8097c5d Fix misc test failures (some from other PRs) 2018-06-25 12:25:13 -07:00
Paul Banks 382ce8f98a Only set precedence on write path 2018-06-25 12:25:13 -07:00
Paul Banks 4a54f8f7e3 Fix some tests failures caused by the sorting change and some cuased by previous UpdatePrecedence() change 2018-06-25 12:25:13 -07:00
Paul Banks bf7a62e0e0 Sort intention list by precedence 2018-06-25 12:25:13 -07:00
Kyle Havlovitz edbeeeb23c agent: update accepted CA config fields and defaults 2018-06-25 12:25:09 -07:00
Mitchell Hashimoto 028aa78e83 agent/consul: set precedence value on struct itself 2018-06-25 12:24:16 -07:00
Mitchell Hashimoto daf46c9cfa agent/consul: support a Connect option on prepared query request 2018-06-25 12:24:12 -07:00
Mitchell Hashimoto 440b1b2d97 agent/consul: prepared query supports "Connect" field 2018-06-25 12:24:11 -07:00
Mitchell Hashimoto 1830c6b308 agent: switch ConnectNative to an embedded struct 2018-06-25 12:24:10 -07:00
Mitchell Hashimoto eb3fcb39b3 agent/consul/state: support querying by Connect native 2018-06-25 12:24:08 -07:00
Mitchell Hashimoto d6a823ad0d agent/consul: support catalog registration with Connect native 2018-06-25 12:24:07 -07:00
Matt Keeler af910bda39
Merge pull request #4216 from hashicorp/rpc-limiting
Make RPC limits reloadable
2018-06-20 09:05:28 -04:00
Mitchell Hashimoto 1906fe1c0d
agent: address feedback 2018-06-14 09:42:20 -07:00
Mitchell Hashimoto 0accfc1628
agent: rename test to check 2018-06-14 09:42:18 -07:00
Mitchell Hashimoto 2a29679e9d
agent/consul: forward request if necessary 2018-06-14 09:42:17 -07:00
Mitchell Hashimoto 54ac5adb08
agent: comments to point to differing logic 2018-06-14 09:42:17 -07:00
Mitchell Hashimoto d68462fca6
agent/consul: implement Intention.Test endpoint 2018-06-14 09:42:17 -07:00
Paul Banks f4b8e8c96d
Add default CA config back - I didn't add it and causes nil panics 2018-06-14 09:42:17 -07:00
Paul Banks 1228a5839a
Ooops remove the CA stuff from actual server defaults and make it test server only 2018-06-14 09:42:16 -07:00
Paul Banks 4aeab3897c
Fixed many tests after rebase. Some still failing and seem unrelated to any connect changes. 2018-06-14 09:42:16 -07:00
Paul Banks b4803eca59
Generate CSR using real trust-domain 2018-06-14 09:42:16 -07:00
Paul Banks 622a475eb1
Add CSR signing verification of service ACL, trust domain and datacenter. 2018-06-14 09:42:16 -07:00
Paul Banks c1f2025d96
Return TrustDomain from CARoots RPC 2018-06-14 09:42:15 -07:00
Kyle Havlovitz e00088e8ee
Rename some of the CA structs/files 2018-06-14 09:42:15 -07:00
Kyle Havlovitz 6e9f1f8acb
Add more metadata to structs.CARoot 2018-06-14 09:42:15 -07:00
Kyle Havlovitz 627aa80d5a
Use provider state table for a global serial index 2018-06-14 09:42:15 -07:00
Kyle Havlovitz de72834b8c
Move connect CA provider to separate package 2018-06-14 09:42:15 -07:00
Mitchell Hashimoto bc605a1576
agent/consul: change provider wait from goto to a loop 2018-06-14 09:42:14 -07:00
Mitchell Hashimoto c8b65217c3
agent/consul: check nil on getCAProvider result 2018-06-14 09:42:14 -07:00
Mitchell Hashimoto 9b3495dddb
agent/consul: retry reading provider a few times 2018-06-14 09:42:14 -07:00
Paul Banks 90c574ebaa
Wire up agent leaf endpoint to cache framework to support blocking. 2018-06-14 09:42:07 -07:00
Kyle Havlovitz a4d18f0eaa
Fill out connect CA rpc endpoint tests 2018-06-14 09:42:06 -07:00
Kyle Havlovitz cce7f1cca1
Add tests for the built in CA's state store table 2018-06-14 09:42:06 -07:00
Kyle Havlovitz 15fbc2fd97
Add more tests for built-in provider 2018-06-14 09:42:06 -07:00
Kyle Havlovitz edcfdb37af
Fix some inconsistencies around the CA provider code 2018-06-14 09:42:06 -07:00
Kyle Havlovitz daa8dd1779
Add CA config to connect section of agent config 2018-06-14 09:42:05 -07:00
Kyle Havlovitz 32d1eae28b
Move ConsulCAProviderConfig into structs package 2018-06-14 09:42:04 -07:00
Kyle Havlovitz 315b8bf594
Simplify the CAProvider.Sign method 2018-06-14 09:42:04 -07:00
Kyle Havlovitz c6e1b72ccb
Simplify the CA provider interface by moving some logic out 2018-06-14 09:42:04 -07:00
Kyle Havlovitz a325388939
Clarify some comments and names around CA bootstrapping 2018-06-14 09:42:04 -07:00
Kyle Havlovitz 33418afd3c
Add cross-signing mechanism to root rotation 2018-06-14 09:42:00 -07:00
Kyle Havlovitz d83fbfc766
Add the root rotation mechanism to the CA config endpoint 2018-06-14 09:41:59 -07:00
Kyle Havlovitz f9d92d795e
Have the built in CA store its state in raft 2018-06-14 09:41:59 -07:00
Kyle Havlovitz 30c1973e8b
Fix the testing endpoint's root set op 2018-06-14 09:41:59 -07:00
Kyle Havlovitz ab737ef0f8
Hook the CA RPC endpoint into the provider interface 2018-06-14 09:41:59 -07:00
Kyle Havlovitz 1f6501895f
Add CA bootstrapping on establishing leadership 2018-06-14 09:41:59 -07:00
Kyle Havlovitz 682f105c7c
Add the bootstrap config for the CA 2018-06-14 09:41:59 -07:00
Kyle Havlovitz 1787f88618
Add CA config set to fsm operations 2018-06-14 09:41:58 -07:00
Kyle Havlovitz 6b3416e480
Add the Connect CA config to the state store 2018-06-14 09:41:58 -07:00
Paul Banks 730da74369
Fix various test failures and vet warnings.
Intention de-duplication in previously merged PR actualy failed some tests that were not caught be me or CI. I ran the test files for state changes but they happened not to trigger this case so I made sure they did first and then fixed. That fixed some upstream intention endpoint tests that I'd not run as part of testing the previous fix.
2018-06-14 09:41:58 -07:00
Paul Banks 88541bba17
Add tests all the way up through the endpoints to ensure duplicate src/destination is supported and so ultimately deny/allow nesting works.
Also adds a sanity check test for `api.Agent().ConnectAuthorize()` and a fix for a trivial bug in it.
2018-06-14 09:41:57 -07:00
Paul Banks ed9f07c361
Allow duplicate source or destination, but enforce uniqueness across all four. 2018-06-14 09:41:57 -07:00
Mitchell Hashimoto 845f7cd8ad
agent/consul/state: ensure exactly one active CA exists when setting 2018-06-14 09:41:54 -07:00
Mitchell Hashimoto 17ca8ad083
agent/connect: rename SpiffeID to CertURI 2018-06-14 09:41:53 -07:00
Mitchell Hashimoto 0cbcb07d61
agent/connect: use proper keyusage fields for CA and leaf 2018-06-14 09:41:53 -07:00
Mitchell Hashimoto a54d1af421
agent/consul: encode issued cert serial number as hex encoded 2018-06-14 09:41:53 -07:00
Mitchell Hashimoto 63d674d07d
agent: /v1/connect/ca/configuration PUT for setting configuration 2018-06-14 09:41:52 -07:00
Mitchell Hashimoto 1c3dbc83ff
agent/consul/fsm,state: snapshot/restore for CA roots 2018-06-14 09:41:52 -07:00
Mitchell Hashimoto 90f423fd02
agent/consul/fsm,state: tests for CA root related changes 2018-06-14 09:41:52 -07:00
Mitchell Hashimoto 1c72639d60
agent/consul: set more fields on the issued cert 2018-06-14 09:41:52 -07:00
Mitchell Hashimoto c2588262b7
agent: /v1/connect/ca/leaf/:service_id 2018-06-14 09:41:52 -07:00
Mitchell Hashimoto e40afd6a73
agent/consul: CAS operations for setting the CA root 2018-06-14 09:41:51 -07:00
Mitchell Hashimoto 578db06600
agent/consul: tests for CA endpoints 2018-06-14 09:41:51 -07:00
Mitchell Hashimoto 891cd22ad9
agent/consul: key the public key of the CSR, verify in test 2018-06-14 09:41:51 -07:00
Mitchell Hashimoto d768d5e9a7
agent/consul: test for ConnectCA.Sign 2018-06-14 09:41:51 -07:00
Mitchell Hashimoto f4ec28bfe3
agent/consul: basic sign endpoint not tested yet 2018-06-14 09:41:51 -07:00
Mitchell Hashimoto 5a950190f3
agent/consul: RPC endpoints to list roots 2018-06-14 09:41:50 -07:00
Mitchell Hashimoto 130098b7b5
agent/consul/state: CARoot structs and initial state store 2018-06-14 09:41:49 -07:00
Mitchell Hashimoto 4d852e62a3
agent: address PR feedback 2018-06-14 09:41:49 -07:00
Mitchell Hashimoto 6313bc5615
agent: clarified a number of comments per PR feedback 2018-06-14 09:41:49 -07:00
Mitchell Hashimoto 353953fcd2
agent/consul: Health.ServiceNodes ACL check for Connect 2018-06-14 09:41:49 -07:00
Mitchell Hashimoto b6c0cb7115
agent/consul: Catalog endpoint ACL requirements for Connect proxies 2018-06-14 09:41:49 -07:00
Mitchell Hashimoto 2feef5f7a3
agent/consul: require name for proxies 2018-06-14 09:41:48 -07:00
Mitchell Hashimoto 44ec8d94d2
agent: clean up connect/non-connect duplication by using shared methods 2018-06-14 09:41:48 -07:00
Mitchell Hashimoto 7d79f9c46f
agent/consul: implement Health.ServiceNodes for Connect, DNS works 2018-06-14 09:41:47 -07:00
Mitchell Hashimoto e01914a025
agent/consul: Catalog.ServiceNodes supports Connect filtering 2018-06-14 09:41:47 -07:00
Mitchell Hashimoto 2062e37270
agent/consul/state: ConnectServiceNodes 2018-06-14 09:41:47 -07:00
Mitchell Hashimoto 7ed26e2c64
agent/consul: enforce ACL on ProxyDestination 2018-06-14 09:41:47 -07:00
Mitchell Hashimoto 0c0c0a58e7
agent/consul: proxy registration and tests 2018-06-14 09:41:46 -07:00
Mitchell Hashimoto 4d4a8443e8
agent: test /v1/catalog/node/:node to list connect proxies 2018-06-14 09:41:46 -07:00
Mitchell Hashimoto 6e257ea51c
agent: /v1/catalog/service/:service works with proxies 2018-06-14 09:41:46 -07:00
Mitchell Hashimoto 63e4a35827
agent/consul/state: convert proxy test to testify/assert 2018-06-14 09:41:46 -07:00
Mitchell Hashimoto 21c6fc623a
agent/consul/state: service registration with proxy works 2018-06-14 09:41:46 -07:00
Mitchell Hashimoto a621afe72c
agent/consul: convert intention ACLs to testify/assert 2018-06-14 09:41:46 -07:00
Mitchell Hashimoto 9dc8aa0fb3
agent/consul,structs: add tests for ACL filter and prefix for intentions 2018-06-14 09:41:45 -07:00
Mitchell Hashimoto 5ac649af7f
agent/consul: Intention.Match ACLs 2018-06-14 09:41:45 -07:00
Mitchell Hashimoto 4d87601bf4
agent/consul: Intention.Get ACLs 2018-06-14 09:41:45 -07:00
Mitchell Hashimoto 9bbbb73734
agent/consul: Intention.Apply ACL on rename 2018-06-14 09:41:45 -07:00
Mitchell Hashimoto 01b644e213
agent/consul: tests for ACLs on Intention.Apply update/delete 2018-06-14 09:41:45 -07:00
Mitchell Hashimoto a67ff1c0dc
agent/consul: Basic ACL on Intention.Apply 2018-06-14 09:41:44 -07:00
Mitchell Hashimoto 0719ff6905
agent: convert all intention tests to testify/assert 2018-06-14 09:41:44 -07:00
Mitchell Hashimoto 454ef7d106
agent/consul/fsm,state: snapshot/restore for intentions 2018-06-14 09:41:44 -07:00
Mitchell Hashimoto 80d068aaa4
agent: use UTC time for intention times, move empty list check to
agent/consul
2018-06-14 09:41:43 -07:00
Mitchell Hashimoto 370b2599a1
agent/consul/fsm: switch tests to use structs.TestIntention 2018-06-14 09:41:43 -07:00
Mitchell Hashimoto 97e2a73145
agent/consul/state: need to set Meta for intentions for tests 2018-06-14 09:41:43 -07:00
Mitchell Hashimoto ad42f42a17
agent/consul/state: remove TODO 2018-06-14 09:41:43 -07:00
Mitchell Hashimoto 70858598e4
agent: use testing intention to get valid intentions 2018-06-14 09:41:43 -07:00
Mitchell Hashimoto ab4ea3efb4
agent/consul: set default intention SourceType, validate it 2018-06-14 09:41:43 -07:00
Mitchell Hashimoto d92993f75b
agent/structs: Intention validation 2018-06-14 09:41:42 -07:00
Mitchell Hashimoto 82a50245e0
agent/consul: support intention description, meta is non-nil 2018-06-14 09:41:42 -07:00
Mitchell Hashimoto c12690b837
agent/consul/fsm: add tests for intention requests 2018-06-14 09:41:42 -07:00
Mitchell Hashimoto a9743f4f15
agent,agent/consul: set default namespaces 2018-06-14 09:41:42 -07:00
Mitchell Hashimoto 10c370c0fb
agent/consul: set CreatedAt, UpdatedAt on intentions 2018-06-14 09:41:42 -07:00
Mitchell Hashimoto 93de03fe8b
agent/consul: RPC endpoint for Intention.Match 2018-06-14 09:41:42 -07:00
Mitchell Hashimoto f93edadbbe
agent/consul/state: IntentionMatch for performing match resolution 2018-06-14 09:41:41 -07:00
Mitchell Hashimoto fb02e53536
agent/consul: test that Apply works to delete an intention 2018-06-14 09:41:41 -07:00
Mitchell Hashimoto 4417f37ede
agent/consul/state,fsm: support for deleting intentions 2018-06-14 09:41:41 -07:00
Mitchell Hashimoto 1b44c1befa
agent/consul: creating intention must not have ID set 2018-06-14 09:41:40 -07:00
Mitchell Hashimoto 771b1737e3
agent/consul: support updating intentions 2018-06-14 09:41:40 -07:00
Mitchell Hashimoto 0d96cdc0a5
agent: GET /v1/connect/intentions/:id 2018-06-14 09:41:40 -07:00
Mitchell Hashimoto e8c4156f07
agent/consul: Intention.Get endpoint 2018-06-14 09:41:40 -07:00
Mitchell Hashimoto 9e307e178e
agent/consul: Intention.Apply, FSM methods, very little validation 2018-06-14 09:41:39 -07:00
Mitchell Hashimoto 212a272989
agent/consul: start Intention RPC endpoints, starting with List 2018-06-14 09:41:39 -07:00
Mitchell Hashimoto 9639bfb1be
agent/consul/state: list intentions 2018-06-14 09:41:39 -07:00
Mitchell Hashimoto cc8a6f7f15
agent/consul/state: initial work on intentions memdb table 2018-06-14 09:41:39 -07:00
Guido Iaquinti f7fe6c2a87 Attach server.Name label to client.rpc.failed 2018-06-13 14:56:14 +01:00
Guido Iaquinti 3d230dee80 Attach server.ID label to client.rpc.failed 2018-06-13 14:53:44 +01:00
Guido Iaquinti e85e63c18c Client: add metric for failed RPC calls to server 2018-06-13 12:35:45 +01:00
Matt Keeler 0df7cd22aa Add a Client ReloadConfig test 2018-06-11 16:23:51 -04:00
Matt Keeler 08e26d10b8 Merge branch 'master' of github.com:hashicorp/consul into rpc-limiting
# Conflicts:
#	agent/agent.go
#	agent/consul/client.go
2018-06-11 16:11:36 -04:00
Matt Keeler 65746b2f8f Apply the limits to the clients rpcLimiter 2018-06-11 15:51:17 -04:00
Matt Keeler b6e9abe926 Allow for easy enterprise/oss coexistence
Uses struct/interface embedding with the embedded structs/interfaces being empty for oss. Also methods on the server/client types are defaulted to do nothing for OSS
2018-05-24 10:36:42 -04:00
Wim 5c04864b28 Add support for reverse lookup of services 2018-05-19 19:39:02 +02:00
Preetha Appan ca67094619
Change default raft threshold config values and add a section to upgrade notes 2018-05-11 10:45:41 -05:00
Preetha Appan d721da7b67
Also make snapshot interval configurable 2018-05-11 10:43:24 -05:00
Preetha Appan 66f31cd25a
Make raft snapshot commit threshold configurable 2018-05-11 10:43:24 -05:00
Jack Pearkes 291e8b83ae
Merge pull request #4097 from hashicorp/remove-deprecated
Remove deprecated check/service fields and metric names
2018-05-10 15:45:49 -07:00
Kyle Havlovitz ba3971d2c1
Remove deprecated metric names 2018-05-08 16:23:15 -07:00
Paul Banks b7fa3358d1
Merge pull request #3970 from pierresouchay/node_health_should_change_service_index
[BUGFIX] When a node level check is removed, ensure all services of node are notified
2018-05-08 16:44:50 +01:00
Pierre Souchay c152cb7bdf Added Missing Service Meta synchronization and field 2018-04-21 17:34:29 +02:00
Pierre Souchay 89ab642928 Allow renaming nodes when ID is unchanged 2018-04-18 15:39:38 +02:00
Kyle Havlovitz af4be34a2a
Update make static-assets goal and run format 2018-04-13 09:57:25 -07:00
Matt Keeler d926679278
Merge pull request #4023 from hashicorp/f-near-ip
Add near=_ip support for prepared queries
2018-04-12 12:10:48 -04:00
Matt Keeler 136efeb3be GH-3798: A couple more PR updates
Test HTTP/DNS source IP without header/extra EDNS data.
Add WARN log for when prepared query with near=_ip is executed without specifying the source ip
2018-04-12 10:10:37 -04:00
Matt Keeler cec8d5145b GH-3798: A few more PR updates 2018-04-11 20:32:35 -04:00
Matt Keeler d065d3a6db GH-3798: Updates for PR
Allow DNS peer IP as the source IP.
Break early when the right node was found for executing the preapred query.
Update docs
2018-04-11 17:02:04 -04:00
Matt Keeler 45a537def9 GH-3798: Add near=_ip support for prepared queries 2018-04-10 14:50:50 -04:00
Paul Banks 0d8993e338
Allow ignoring checks by ID when defining a PreparedQuery. Fixes #3727. 2018-04-10 14:04:16 +01:00
Preetha Appan c7581d68c6
Renames agent API layer for service metadata to "meta" for consistency 2018-03-28 09:04:50 -05:00
Preetha daa61c5803
Merge pull request #3881 from pierresouchay/service_metadata
Feature Request: Support key-value attributes for services
2018-03-27 16:33:57 -05:00
Pierre Souchay 980189a33f Added validation of ServiceMeta in Catalog
Fixed Error Message when ServiceMeta is not valid

Added Unit test for adding a Service with badly formatted ServiceMeta
2018-03-27 22:22:42 +02:00
Preetha Appan 226cb2e95c
fix typo and remove comment 2018-03-27 14:28:05 -05:00
Preetha Appan 010a459365
Remove unnecessary nil checks 2018-03-27 10:59:42 -05:00
Preetha Appan 6c0bb5a810
Fix test and remove unused method 2018-03-27 09:44:41 -05:00
Preetha Appan d77ab91123
Allows disabling WAN federation by setting serf WAN port to -1 2018-03-26 14:21:06 -05:00
Pierre Souchay a9868ae956 Added support for renaming nodes when their IP does not change 2018-03-26 16:44:13 +02:00
Pierre Souchay 18baff80ae Merge remote-tracking branch 'origin/master' into node_health_should_change_service_index 2018-03-22 13:07:11 +01:00
Pierre Souchay 5fb1b18073 More test cases 2018-03-22 12:41:06 +01:00
Pierre Souchay 39a7b5c20d Added new test regarding checks index 2018-03-22 12:20:25 +01:00
Pierre Souchay dd9efb755a Fixed minor typo in comments
Might fix unstable travis build
2018-03-22 10:30:10 +01:00
Josh Soref 94835a2715 Spelling (#3958)
* spelling: another

* spelling: autopilot

* spelling: beginning

* spelling: circonus

* spelling: default

* spelling: definition

* spelling: distance

* spelling: encountered

* spelling: enterprise

* spelling: expands

* spelling: exits

* spelling: formatting

* spelling: health

* spelling: hierarchy

* spelling: imposed

* spelling: independence

* spelling: inspect

* spelling: last

* spelling: latest

* spelling: client

* spelling: message

* spelling: minimum

* spelling: notify

* spelling: nonexistent

* spelling: operator

* spelling: payload

* spelling: preceded

* spelling: prepared

* spelling: programmatically

* spelling: required

* spelling: reconcile

* spelling: responses

* spelling: request

* spelling: response

* spelling: results

* spelling: retrieve

* spelling: service

* spelling: significantly

* spelling: specifies

* spelling: supported

* spelling: synchronization

* spelling: synchronous

* spelling: themselves

* spelling: unexpected

* spelling: validations

* spelling: value
2018-03-19 16:56:00 +00:00
Pierre Souchay b6914617d9 Fixed typo in comments 2018-03-19 17:12:08 +01:00
Pierre Souchay 5e974843f1 Refactoring to have clearer code without weird bool 2018-03-19 16:12:54 +01:00
Pierre Souchay a44b9e84b1 [BUGFIX] When a node level check is removed, ensure all services of node are notified
Bugfix for https://github.com/hashicorp/consul/pull/3899

When a node level check is removed (example: maintenance),
some watchers on services might have to recompute their state.

If those nodes are performing blocking queries, they have to be notified.
While their state was updated when node-level state did change or was added
this was not the case when the check was removed. This fixes it.
2018-03-19 14:14:03 +01:00
Devin Canterberry a61abcd931
🐛 Formatting changes only; add missing trailing commas 2018-03-15 10:19:46 -07:00
Mitchell Hashimoto 8217564c48
agent/consul/fsm: begin using testify/assert 2018-03-06 09:48:15 -08:00
Paul Banks 9a47449c6d
Merge pull request #3899 from pierresouchay/fix_blocking_queries_index
Services Indexes modified per service instead of using a global Index
2018-03-02 16:24:43 +00:00
Pierre Souchay 360dc1dd8d Simplified error handling for maxIndexForService
* added unit tests to ensure service index is properly garbage collected
* added Upgrade from Version 1.0.6 to higher section in documentation
2018-03-01 14:09:36 +01:00
Preetha Appan 80791d5b21
Remove extra newline 2018-02-21 13:21:47 -06:00
Preetha Appan 907b97b7f2
Unit test that calls revokeLeadership twice to make sure its idempotent 2018-02-21 12:48:53 -06:00
Preetha Appan f59abcc394
Make sure revokeLeadership is called if establishLeadership errors 2018-02-21 12:33:22 -06:00
Alex Dadgar 18bf9647d5 Test autopilots start/stop idempotency 2018-02-21 10:19:30 -08:00
Alex Dadgar 33c5afdb31 Improve autopilot shutdown to be idempotent 2018-02-20 15:51:59 -08:00
Pierre Souchay a8d3745104 Fixed comments for function maxIndexForService 2018-02-20 23:57:28 +01:00
Pierre Souchay 09351ba9a6 [Revert] Only update services if tags are different
This patch did give some better results, but break watches on
the services of a node.

It is possible to apply the same optimization for nodes than
to services (one index per instance), but it would complicate
further the patch.

Let's do it in another PR.
2018-02-20 23:34:42 +01:00
Pierre Souchay 60454b570a Only update services if tags are different 2018-02-20 23:08:04 +01:00
Pierre Souchay a05d38737c Enable Raft index optimization per service name on health endpoint
Had to fix unit test in order to check properly indexes.
2018-02-20 01:35:50 +01:00
Pierre Souchay 4f10fae3c3 Get only first service to test whether we have to cleanup index of a service 2018-02-19 22:44:49 +01:00
Pierre Souchay bac8fb046f Fixed comment about raftIndex + use test.Helper() 2018-02-19 19:30:25 +01:00
Pierre Souchay 73127ef407 Services Indexes modified per service instead of using a global Index
This patch improves the watches for services on large cluster:
each service has now its own index, such watches on a specific service
are not modified by changes in the global catalog.

It should improve a lot the performance of tools such as consul-template
or libraries performing watches on very large clusters with many
services/watches.
2018-02-19 18:29:22 +01:00
Veselkov Konstantin 7de57ba4de remove golint warnings 2018-01-28 22:40:13 +04:00
Kyle Havlovitz bfeb09983b
Reset clusterHealth when autopilot starts 2018-01-23 12:52:28 -08:00
Kyle Havlovitz 17805e4634
Move autopilot health loop into leader operations 2018-01-23 11:17:41 -08:00
James Phillips da6a4635b0
Fixes a `go fmt` cleanup. 2017-12-20 13:43:38 -08:00
Kyle Havlovitz 11a0c9cc58
Fix vet error 2017-12-18 18:04:42 -08:00
Kyle Havlovitz 77dc52f430
Move autopilot initializing to oss file 2017-12-18 18:02:44 -08:00
Kyle Havlovitz 039e7f1880
Move autopilot setup to a separate file 2017-12-18 16:55:51 -08:00
Kyle Havlovitz d08ab9fd19
Make some final tweaks to autopilot package 2017-12-18 12:26:47 -08:00
Kyle Havlovitz a86d11ec0a
Merge pull request #3737 from hashicorp/autopilot-refactor
Move autopilot to a standalone package
2017-12-15 14:09:40 -08:00
James Phillips 06f980061e
Merge pull request #3728 from weiwei04/fix_globalRPC_goroutine_leak
fix globalRPC goroutine leak
2017-12-14 17:54:19 -08:00
Kyle Havlovitz 324c2ecb53
Expose IsPotentialVoter for advanced autopilot logic 2017-12-13 17:53:51 -08:00
Kyle Havlovitz 12bf61c851
Merge branch 'master' into autopilot-refactor 2017-12-13 11:54:32 -08:00
Kyle Havlovitz d6b266c045
A few last autopilot adjustments 2017-12-13 11:19:17 -08:00
Kyle Havlovitz 2310687c1d
More autopilot reorganizing 2017-12-13 10:57:37 -08:00
James Phillips 46742a5041
Adds TODOs referencing #3744. 2017-12-13 10:52:06 -08:00
Kyle Havlovitz b92f895c23
More refactoring to make autopilot consul-agnostic 2017-12-12 17:46:28 -08:00
Kyle Havlovitz de28555671
Move autopilot to a standalone package 2017-12-11 16:45:33 -08:00
James Phillips d12e81860f
Moves Serf helper into lib to fix import cycle in consul-enterprise. 2017-12-07 16:57:58 -08:00
James Phillips 5065f3d82e
Turns of intent queue warnings and enables dynamic queue sizing. 2017-12-07 16:27:06 -08:00
Wei Wei cc9648c957 fix globalRPC goroutine leak
Signed-off-by: Wei Wei <weiwei.inf@gmail.com>
2017-12-05 11:53:30 +08:00
James Phillips 3e46544085
Creates a registration mechanism for snapshot and restore. 2017-11-29 18:36:53 -08:00
James Phillips f53f521072
Begins split out of snapshots from the main FSM class. 2017-11-29 18:36:53 -08:00
James Phillips c8e763667f
Creates a registration mechanism for FSM commands. 2017-11-29 18:36:53 -08:00
James Phillips 78292662d7
Moves the FSM into its own package.
This will help make it clearer what happens when we add some registration
plumbing for the different operations and snapshots.
2017-11-29 18:36:53 -08:00
James Phillips e810697e06
Resolves an FSM snapshot TODO.
This adds checks for sink write calls before we continue the refactor, which
will resolve the other TODO comment we deleted as part of this change.
2017-11-29 18:36:53 -08:00
James Phillips aa61159b74
Creates a registration mechanism for schemas.
This also splits out the registration into the table-specific source
files.
2017-11-29 18:36:52 -08:00
James Phillips 93ff33b1be
Creates a registration mechanism for RPC endpoints. 2017-11-29 18:36:52 -08:00
James Phillips 8bf1f57737
Renames stubs to be more consistent. 2017-11-29 18:36:52 -08:00
James Phillips 8abd2050fa
Sheds monotonic time info so tombstone GC bins work properly. 2017-11-29 10:34:24 -08:00
James Phillips de57a9ef51
Gives back the lock before writing to the expire channel.
The lock isn't needed after we clean up the expire bin, and as seen
in #3700 we can get into a deadlock waiting to place the expire index
into the channel while holding this lock.

Fixes #3700
2017-11-19 16:24:16 -08:00
James Phillips f19ba41144
Moves the LAN event handler after the router is created.
Fixes #3680
2017-11-10 12:26:48 -08:00
James Phillips 17737ee030
Revert "Adds a small sleep to make sure we are in the next GC bucket." 2017-11-08 22:18:37 -08:00
James Phillips 24475048e2
Adds a sleep to make sure we are in the next GC bucket, ups time.
Fixes #3670
2017-11-08 22:02:40 -08:00
James Phillips c57884fffe
Skips the tombstone GC test in Travis for now.
Related to #3670
2017-11-08 20:14:20 -08:00
James Phillips f6b7dcbcf6
Removes bogus getPort() in favor of freeport. 2017-11-08 19:55:50 -08:00
James Phillips 7b966e2d26
Tightens timing up and reorders GC test to be less flaky. 2017-11-08 15:09:29 -08:00
James Phillips 7c6ab5e783
Doubles the GC timing. 2017-11-08 15:01:11 -08:00
James Phillips 8de7c77482
Opens up test timing a little more. 2017-11-08 14:01:19 -08:00
James Phillips c46612f691
Shifts off a gran boundary to help make test less flaky. 2017-11-08 13:57:17 -08:00
James Phillips f31856c1b7
Opens up the tombstone GC test timing. 2017-11-08 13:43:39 -08:00
Kyle Havlovitz d3dd2b1402
Move check definition to a sub-struct 2017-11-01 14:54:46 -07:00
Kyle Havlovitz dbab3cd5f6
Merge branch 'master' into esm-changes 2017-11-01 11:37:48 -07:00
Kyle Havlovitz c4375d5a47
Merge pull request #3622 from hashicorp/coordinate-node-endpoint
agent: add /v1/coordianate/node/:node endpoint
2017-11-01 11:35:50 -07:00
Kyle Havlovitz b0536a96cc
Fill out the tests around coordinate/node functionality 2017-10-31 15:36:44 -07:00
Kyle Havlovitz 1e3b0d441b
Factor out registerNodes function 2017-10-31 13:34:49 -07:00
James Phillips 6bf55d16a2
Relaxes Autopilot promotion logic. (#3623)
* Relaxes Autopilot promotion logic.

When we defaulted the Raft protocol version to 3 in #3477 we made
the numPeers() routine more strict to only count voters (this is
more conservative and more correct). This had the side effect of
breaking rolling updates because it's at odds with the Autopilot
non-voter promotion logic.

That logic used to wait to only promote to maintain an odd quorum
of servers. During a rolling update (add one new server, wait, and
then kill an old server) the dead server cleanup would still count
the old server as a peer, which is conservative and the right thing
to do, and no longer count the non-voter. This would wait to promote,
so you could get into a stalemate. It is safer to promote early than
remove early, so by promoting as soon as possible we have chosen
that as the solution here.

Fixes #3611

* Gets rid of unnecessary extra not-a-voter check.
2017-10-31 15:16:56 -05:00
Kyle Havlovitz 2392545adc
Merge branch 'coordinate-node-endpoint' of github.com:hashicorp/consul into esm-changes 2017-10-26 19:20:24 -07:00
Kyle Havlovitz 5589eadcf5
Added Coordinate.Node rpc endpoint and client api method 2017-10-26 19:16:40 -07:00
Kyle Havlovitz a7c42a6c2a
Expose SkipNodeUpdate field and some health check info in the http api 2017-10-25 19:37:30 +02:00
Frank Schroeder c94751ad43 test: replace porter tool with freeport lib
This patch removes the porter tool which hands out free ports from a
given range with a library which does the same thing. The challenge for
acquiring free ports in concurrent go test runs is that go packages are
tested concurrently and run in separate processes. There has to be some
inter-process synchronization in preventing processes allocating the
same ports.

freeport allocates blocks of ports from a range expected to be not in
heavy use and implements a system-wide mutex by binding to the first
port of that block for the lifetime of the application. Ports are then
provided sequentially from that block and are tested on localhost before
being returned as available.
2017-10-21 22:01:09 +02:00
Ryan Slade 85e4aea9d1 Replace time.Now().Sub(x) with time.Since(x) 2017-10-17 20:38:24 +02:00
James Phillips 575d70aaa7
Cleans up some drift between the OSS and Enterprise trees. 2017-10-11 15:53:07 -07:00
James Phillips bb12368eac Makes RPC handling more robust when rolling servers. (#3561)
* Adds client-side retry for no leader errors.

This paves over the case where the client was connected to the leader
when it loses leadership.

* Adds a configurable server RPC drain time and a fail-fast path for RPCs.

When a server leaves it gets removed from the Raft configuration, so it will
never know who the new leader server ends up being. Without this we'd be
doomed to wait out the RPC hold timeout and then fail. This makes things fail
a little quicker while a sever is draining, and since we added a client retry
AND since the server doing this has already shut down and left the Serf LAN,
clients should retry against some other server.

* Makes the RPC hold timeout configurable.

* Reorders struct members.

* Sets the RPC hold timeout default for test servers.

* Bumps the leave drain time up to 5 seconds.

* Robustifies retries with a simpler client-side RPC hold.

* Reverts untended delete.
2017-10-10 15:19:50 -07:00
James Phillips 4dab70cb93 Fixes handling of stop channel and failed barrier attempts. (#3546)
* Fixes handling of stop channel and failed barrier attempts.

There were two issues here. First, we needed to not exit when there
was a timeout trying to write the barrier, because Raft might not
step down, so we'd be left as the leader but having run all the step
down actions.

Second, we didn't close over the stopCh correctly, so it was possible
to nil that out and have the leaderLoop never exit. We close over it
properly AND sequence the nil-ing of it AFTER the leaderLoop exits for
good measure, so the code is more robust.

Fixes #3545

* Cleans up based on code review feedback.

* Tweaks comments.

* Renames variables and removes comments.
2017-10-06 07:54:49 -07:00
Kyle Havlovitz c728564994
Update metric names and add a legacy config flag 2017-10-04 16:43:27 -07:00
Preetha Appan 8dcd7e700c Remove extra newline 2017-10-03 15:19:31 -05:00
Preetha Appan 26accb3b8a Only allow 'list' policies within 'key' policy definitions. Consolidated two similar tests into one and fixed alignment. 2017-10-03 15:15:56 -05:00
Preetha Appan 51a04ec87d Introduces new 'list' permission that applies to KV store recursive reads, and enforced only when opted in. 2017-10-02 17:10:21 -05:00
James Phillips 0190c4a081
Gets rid of flaky clause in stats fetcher unit test.
Given how the rutine is coded we can still get data so this wasn't
a reliable thing to check.
2017-09-26 20:53:06 -07:00
preetapan 4d9fc638b4 Issue 3452 (#3500)
* Make sure that id and address are set in member created during reaping of catalog nodes that have been removed from serf

* Get address from node table in the state store rather than from service address

* Fix incorrect lookup by checkname instead of node name

* Make sure that serverlookup is called with the right address format, added unit test.

* Address code review comments

* Tweaks style stuff.
2017-09-26 20:49:41 -07:00
James Phillips 5fa2322e0b
Cleans up some edge cases in TestSnapshot_Forward_Leader.
These could cause the tests to hang.
2017-09-26 14:07:28 -07:00
Preetha Appan 3c4a108769 Move Raft protocol version for list peers end point to server side, fix unit tests. This fixes #3449 2017-09-26 09:35:39 -05:00
James Phillips 45646ac3f4 Bumps default Raft protocol to version 3. (#3477)
* Changes default Raft protocol to 3.

* Changes numPeers() to report only voters.

This should have been there before, but it's more obvious that this
is incorrect now that we default the Raft protocol to 3, which puts
new servers in a read-only state while Autopilot waits for them to
become healthy.

* Fixes TestLeader_RollRaftServer.

* Fixes TestOperator_RaftRemovePeerByAddress.

* Fixes TestServer_*.

Relaxed the check for a given number of voter peers and instead do
a thorough check that all servers see each other in their Raft
configurations.

* Fixes TestACL_*.

These now just check for Raft replication to be set up, and don't
care about the number of voter peers.

* Fixes TestOperator_Raft_ListPeers.

* Fixes TestAutopilot_CleanupDeadServerPeriodic.

* Fixes TestCatalog_ListNodes_ConsistentRead_Fail.

* Fixes TestLeader_ChangeServerID and adjusts the conn pool to throw away
sockets when it sees io.EOF.

* Changes version to 1.0.0 in the options doc.

* Makes metrics test more deterministic with autopilot metrics possible.
2017-09-25 15:27:04 -07:00
Preetha Appan d7e27e67c1 Introduce Code Policy validation via sentinel, with a noop implementation 2017-09-25 13:44:55 -05:00
Frank Schröder 12216583a1 New config parser, HCL support, multiple bind addrs (#3480)
* new config parser for agent

This patch implements a new config parser for the consul agent which
makes the following changes to the previous implementation:

 * add HCL support
 * all configuration fragments in tests and for default config are
   expressed as HCL fragments
 * HCL fragments can be provided on the command line so that they
   can eventually replace the command line flags.
 * HCL/JSON fragments are parsed into a temporary Config structure
   which can be merged using reflection (all values are pointers).
   The existing merge logic of overwrite for values and append
   for slices has been preserved.
 * A single builder process generates a typed runtime configuration
   for the agent.

The new implementation is more strict and fails in the builder process
if no valid runtime configuration can be generated. Therefore,
additional validations in other parts of the code should be removed.

The builder also pre-computes all required network addresses so that no
address/port magic should be required where the configuration is used
and should therefore be removed.

* Upgrade github.com/hashicorp/hcl to support int64

* improve error messages

* fix directory permission test

* Fix rtt test

* Fix ForceLeave test

* Skip performance test for now until we know what to do

* Update github.com/hashicorp/memberlist to update log prefix

* Make memberlist use the default logger

* improve config error handling

* do not fail on non-existing data-dir

* experiment with non-uniform timeouts to get a handle on stalled leader elections

* Run tests for packages separately to eliminate the spurious port conflicts

* refactor private address detection and unify approach for ipv4 and ipv6.

Fixes #2825

* do not allow unix sockets for DNS

* improve bind and advertise addr error handling

* go through builder using test coverage

* minimal update to the docs

* more coverage tests fixed

* more tests

* fix makefile

* cleanup

* fix port conflicts with external port server 'porter'

* stop test server on error

* do not run api test that change global ENV concurrently with the other tests

* Run remaining api tests concurrently

* no need for retry with the port number service

* monkey patch race condition in go-sockaddr until we understand why that fails

* monkey patch hcl decoder race condidtion until we understand why that fails

* monkey patch spurious errors in strings.EqualFold from here

* add test for hcl decoder race condition. Run with go test -parallel 128

* Increase timeout again

* cleanup

* don't log port allocations by default

* use base command arg parsing to format help output properly

* handle -dc deprecation case in Build

* switch autopilot.max_trailing_logs to int

* remove duplicate test case

* remove unused methods

* remove comments about flag/config value inconsistencies

* switch got and want around since the error message was misleading.

* Removes a stray debug log.

* Removes a stray newline in imports.

* Fixes TestACL_Version8.

* Runs go fmt.

* Adds a default case for unknown address types.

* Reoders and reformats some imports.

* Adds some comments and fixes typos.

* Reorders imports.

* add unix socket support for dns later

* drop all deprecated flags and arguments

* fix wrong field name

* remove stray node-id file

* drop unnecessary patch section in test

* drop duplicate test

* add test for LeaveOnTerm and SkipLeaveOnInt in client mode

* drop "bla" and add clarifying comment for the test

* split up tests to support enterprise/non-enterprise tests

* drop raft multiplier and derive values during build phase

* sanitize runtime config reflectively and add test

* detect invalid config fields

* fix tests with invalid config fields

* use different values for wan sanitiziation test

* drop recursor in favor of recursors

* allow dns_config.udp_answer_limit to be zero

* make sure tests run on machines with multiple ips

* Fix failing tests in a few more places by providing a bind address in the test

* Gets rid of skipped TestAgent_CheckPerformanceSettings and adds case for builder.

* Add porter to server_test.go to make tests there less flaky

* go fmt
2017-09-25 11:40:42 -07:00
James Phillips d84c0b1a01
Robustifies check in TestCatalog_ListNodes_ConsistentRead_Fail test.
Fixes #3469
2017-09-13 21:22:53 -07:00
James Phillips 828be5771a
Revert "Manages segments list via a pointer."
This reverts commit c277a42504.
2017-09-07 16:37:11 -07:00
James Phillips c277a42504
Manages segments list via a pointer. 2017-09-07 16:21:07 -07:00
James Phillips 96a89a3381
Cleans up formatting. 2017-09-07 12:26:58 -07:00
James Phillips 00605c0214
Shows the segment name in the keyring API and command output. 2017-09-07 12:17:39 -07:00
James Phillips 88a150cee1
Moves reconcile loop into segment stub. 2017-09-06 18:01:53 -07:00
James Phillips 5c03cb571d
Takes the skip out of the client check.
Without this the merge delegate won't check the segment for non-servers
a little below here.
2017-09-06 17:05:40 -07:00
James Phillips 3418c7ff93 Merge pull request #3447 from hashicorp/issue-3070
Skips unique node ID check for old versions of Consul.
2017-09-06 13:24:15 -07:00
James Phillips 520060e138
Fixes incorrect comment. 2017-09-06 13:23:19 -07:00
James Phillips 084679ab65
Pulls down some code for the check loop. 2017-09-06 13:07:42 -07:00
James Phillips 3535652595
Uses the Raft configuration for the self-add skip check. 2017-09-06 13:05:51 -07:00
Preetha Appan 5f2e1c9b07 Change member join reconcile step to process joining itself, to handle node IP address changes correctly when number of servers < 3 2017-09-06 13:53:01 -05:00
James Phillips 1333fa57a1
Skips unique node ID check for old versions of Consul.
Fixes #3070.
2017-09-05 22:57:29 -07:00
James Phillips 1a117ba0a8
Makes the all segments query explict, and the default for `consul members`. 2017-09-05 12:22:20 -07:00
James Phillips 9258506dab Adds simple rate limiting for client agent RPC calls to Consul servers. (#3440)
* Added rate limiting for agent RPC calls.
* Initializes the rate limiter based on the config.
* Adds the rate limiter into the snapshot RPC path.
* Adds unit tests for the RPC rate limiter.
* Groups the RPC limit parameters under "limits" in the config.
* Adds some documentation about the RPC limiter.
* Sends a 429 response when the rate limiter kicks in.
* Adds docs for new telemetry.
* Makes snapshot telemetry look like RPC telemetry and cleans up comments.
2017-09-01 15:02:50 -07:00
Kyle Havlovitz 220db48aa7 Merge pull request #3431 from hashicorp/network-segments-oss 2017-09-01 10:24:58 -07:00
Kyle Havlovitz 0e33e2ecab
Pass listeners into setupSegments 2017-08-31 17:56:43 -07:00
Kyle Havlovitz 62102a537e
Organize segments for a cleaner split between enterprise and OSS 2017-08-31 17:39:46 -07:00
Kyle Havlovitz 7e565d7338
Fix some inconsistencies with segment logic and comments 2017-08-30 17:43:46 -07:00
Preetha Appan 2386214655 Wire server provider for raft layer only on protocol version 3 and above, and update changelog 2017-08-30 14:36:47 -05:00
Kyle Havlovitz 14b027a3c2
Add segment addr field to tags for LAN flood joiner 2017-08-30 11:58:29 -07:00
Kyle Havlovitz d129767657
Add agent.segment interpolation to prepared queries 2017-08-30 11:58:29 -07:00
Kyle Havlovitz 2ada0439d4
Add rpc_listener option to segment config 2017-08-30 11:58:29 -07:00
James Phillips b1a15e0c3d
Adds open source side of network segments (feature is Enterprise-only). 2017-08-30 11:58:29 -07:00
Preetha Appan a231eea0e7 More cleanup from code review 2017-08-30 12:31:36 -05:00
Preetha Appan c6ee9bfa69 Remove copy pasted duplicate line, update documentation. 2017-08-30 10:02:10 -05:00
Preetha Appan 0f4e24f72c Consolidate server lookup into one place and replace usages of localConsuls. 2017-08-30 09:30:33 -05:00
Preetha Appan e639154abd Remove stray commented line 2017-08-30 09:30:33 -05:00
Preetha Appan 00836a6aab Remove server address tracking logic from manager/router and maintain it as part of lan event listener instead. Used sync.Map to track this, and added unit tests 2017-08-30 09:30:33 -05:00
Preetha Appan 830aca958a ServerAddressProvider interface also returns an error now 2017-08-30 09:30:33 -05:00
Preetha Appan c68fce89b5 Use config struct to create NetworkTransport layer when setting up raft 2017-08-30 09:30:33 -05:00
Preetha Appan 393ce1581b Implement AddressProvider and wire that up to raft transport layer to support server nodes changing their IP addresses in containerized environments 2017-08-30 09:30:33 -05:00
Frank Schroeder 831d84c940 build: make tests independent of build tags
When the metadata server is scanning the agents for potential servers
it is parsing the version number which the agent provided when it
joined. This version number has to conform to a certain format, i.e.
'n.n.n'. Without this version number properly set some tests fail with
error messages that disguise the root cause.

The default version number is currently set to 'unknown' in
version/version.go which does not parse and triggers the tests to fail.
The work around is to use a build tag 'consul' which will use the
version number set in version_base.go instead which has the correct
format and is set to the current release version.

In addition, some parts of the code also require the version number to
be of a certain value. Setting it to '0.0.0' for example makes some
tests pass and others fail since they don't pass the semantic check.

When using go build/install/test one has to remember to use '-tags
consul' or tests will fail with non-obvious error messages.

Using build tags makes the build process more complex and error prone
since it prevents the use of the plain go toolchain and - at least in
its current form - introduces subtle build and test issues. We should
try to eliminate build tags for anything else but platform specific
code.

This patch removes all references to specific version numbers in the
code and tests and sets the default version to '9.9.9' which is
syntactically correct and passes the semantic check. This solves the
issue of running go build/install/test without tags for the OSS build.
2017-08-30 13:40:18 +02:00
Frank Schröder a3934c263c acl: consolidate error handling (#3401)
The error handling of the ACL code relies on the presence of certain
magic error messages. Since the error values are sent via RPC between
older and newer consul agents we cannot just replace the magic values
with typed errors and switch to type checks since this would break
compatibility with older clients.

Therefore, this patch moves all magic ACL error messages into the acl
package and provides default error values and helper functions which
determine the type of error.
2017-08-23 16:52:48 +02:00
Frank Schroeder 16c58da27d agent: drop unused code
This code from http://github.com/hashicorp/consul/pull/3353 is no longer
required.
2017-08-22 00:02:46 +02:00
James Phillips e8a83bb463 Revert "Return 403 rather than a 404 when acls cause all results to be filter…" 2017-08-09 15:06:57 -07:00
James Phillips 02a87df044 Revert "Ensure that we return a permission denied only if the list of keys/en…" 2017-08-09 15:06:20 -07:00
Preetha Appan 42fb49c00b Added unit test case to kvs_endpointtest 2017-08-09 15:50:22 -05:00
Preetha Appan 3276891142 Ensure that we return a permission denied only if the list of keys/entries prior to filtering by ACL is non empty 2017-08-09 15:32:18 -05:00
Frank Schroeder 7cff50a4df
agent: move agent/consul/agent to agent/metadata 2017-08-09 14:36:52 +02:00
Frank Schroeder c395599cea
agent: move agent/consul/servers to agent/router 2017-08-09 14:36:37 +02:00
Frank Schroeder 1acff3533e
agent: move agent/consul/structs to agent/structs 2017-08-09 14:32:12 +02:00
Kyle Havlovitz cf02e3bc22 Merge pull request #3369 from hashicorp/metrics-enhancements
Add support for labels/filters from go-metrics
2017-08-08 13:55:30 -07:00
Kyle Havlovitz d5634fe2a8
Add support for labels/filters from go-metrics 2017-08-08 01:45:10 -07:00
Preetha Appan 37f75a393e
Use sanitized version of node name of server in NS record, and start with "server" rather than "ns" 2017-08-07 11:11:55 +02:00
Preetha Appan 794d1afe44
Removed a copy pasted irrelevant comment, and other code review feedback 2017-08-07 11:11:54 +02:00
Preetha Appan f9db387097
Add NS records and A records for each server. Constructs ns host names using the advertise address of the server. 2017-08-07 11:11:54 +02:00
James Phillips 4bee2e49f5 Adds secure introduction for the ACL replication token. (#3357)
Adds secure introduction for the ACL replication token, as well as a separate enable config for ACL replication.
2017-08-03 15:39:31 -07:00
James Phillips c0a5ad7903 Adds a new /v1/acl/bootstrap API (#3349) 2017-08-02 17:05:18 -07:00
Preetha Appan 4076c0d741 Return nil instead of empty list when returning a PermissionDenied error, updated unit test 2017-07-31 17:23:20 -05:00
Preetha Appan 6336014a86 Return 403 rather than a 404 when acls cause all results to be filtered out. This fixes #2637 2017-07-31 13:50:29 -05:00
James Phillips 10b660d77a Adds missing autopilot snapshot test and avoids snapshotting nil. (#3333) 2017-07-28 15:48:42 -07:00
James Phillips 6250cd70f5 Adds option to prepared queries to remove empty tags. (#3330) 2017-07-26 22:46:43 -07:00
James Phillips 496b0bcf07 Adds support for agent-side ACL token management via API instead of config files. (#3324)
* Adds token store and removes all runtime use of config for ACL tokens.
* Adds a new API for changing agent tokens on the fly.
2017-07-26 11:03:43 -07:00
Preetha Appan b94617b281 Add extra test case for deleting entire tree with empty prefix 2017-07-26 09:42:07 -05:00
Preetha Appan 4498814843 Don't insert tombstone for empty prefix delete. Other minor unit test fixes 2017-07-25 21:54:11 -05:00
Preetha Appan fee418d378 Removed redundant comments and unit test 2017-07-25 20:39:33 -05:00
Preetha Appan b772c477c2 Removed redundant call to reap tombstone from unit test 2017-07-25 19:39:05 -05:00
Preetha Appan ae443e21d6 Improved unit test per code review 2017-07-25 19:17:40 -05:00
Preetha Appan 36acf8d6a4 Use new DeletePrefixMethod for implementing KVSDeleteTree operation. This makes deletes on sub trees larger than one million nodes about 100 times faster. Added unit tests. 2017-07-25 17:21:18 -05:00
Frank Schroeder 0047b7d3f0
fix spelling in filenames
Fixes #3301
2017-07-19 13:16:38 +02:00
Kyle Havlovitz 19eae3d14b
Add UpgradeVersionTag to autopilot config 2017-07-18 13:35:41 -07:00
James Phillips 1791d99a10 Adds new config to make script checks opt-in, updates documentation. (#3284) 2017-07-17 11:20:35 -07:00
Kyle Havlovitz 78c3a86405
Add TLS setting to router areas 2017-07-14 17:38:08 -07:00
James Phillips 0881e46111 Cleans up version 8 ACLs in the agent and the docs. (#3248)
* Moves magic check and service constants into shared structs package.

* Removes the "consul" service from local state.

Since this service is added by the leader, it doesn't really make sense to
also keep it in local state (which requires special ACLs to configure), and
requires a bunch of special cases in the local state logic. This requires
fewer special cases and makes ACL bootstrapping cleaner.

* Makes coordinate update ACL log message a warning, similar to other AE warnings.

* Adds much more detailed examples for bootstrapping ACLs.

This can hopefully replace https://gist.github.com/slackpad/d89ce0e1cc0802c3c4f2d84932fa3234.
2017-07-13 22:33:47 -07:00
Frank Schroeder 1781fd311f address review comments 2017-07-07 09:22:34 +02:00
Frank Schroeder e4b40acc7e agent: remove unused code 2017-07-07 09:22:34 +02:00
Frank Schroeder 8c792ad57d agent: make TestClient_RPC_ConsulServerPing more robust 2017-07-07 09:22:34 +02:00
James Phillips a855d31f84 Adds a comment about flood joining. 2017-07-07 09:22:34 +02:00
James Phillips 5b5217528a Simplifies Serf dynamic port selection code.
This isn't racy, it's just a little dirty. The listen will happen and a port
will be selected and injected into the config once the Serf instance is
created, so we don't need the retry loop here.
2017-07-07 09:22:34 +02:00
James Phillips d8db4bc086 test: Changes WAN/LAN join confirmer to use port number vs. address.
This fixes TestServer_JoinSeparateLanAndWanAddresses which sets bogus
advertise addresses as part of the test. Port numbers uniquely identify
members since everything is running on localhost.
2017-07-07 09:22:34 +02:00
Frank Schroeder d92f70f313 test: make joinLAN/WAN reliable
only return if the members can see each other
2017-07-07 09:22:34 +02:00
Frank Schroeder 112bc19cd5 rpc: make TestServer_JoinSeparateLanAndWanAddresses more robust 2017-07-07 09:22:34 +02:00
Frank Schroeder ffd45f5da5 rpc: make TestClient_SnapshotRPC_TLS more robust 2017-07-07 09:22:34 +02:00
Frank Schroeder 2159d499e3 rpc: try shutting down leader first to avoid hang in TestLeader_LeftServer 2017-07-07 09:22:34 +02:00
Frank Schroeder f12fac278e rpc: fix logging and try quicker timing of TestServer_JoinSeparateLanAndWanAddresses 2017-07-07 09:22:34 +02:00
Frank Schroeder bae4b1d045 rpc: less agressive raft timeouts
Allowing more time for raft to consolidate should
drop the number of leader elections.
2017-07-07 09:22:34 +02:00
Frank Schroeder 457b98a099 rpc: run agent/consul tests in parallel 2017-07-07 09:22:34 +02:00
Frank Schroeder 13eeeb720d rpc: refactor sessionTimers and fix racy tests
The sessionTimers map was secured by a lock which wasn't used
properly in the tests. This lead to data races and failing tests
when accessing the length or the members of the map.

This patch adds a separate SessionTimers struct which is safe
for concurrent use and which ecapsulates the behavior of the
sessionTimers map.
2017-07-07 09:22:34 +02:00
Frank Schroeder 05f756853e rpc: fix TestServer_Leave
wait for the leader election.
2017-07-07 09:22:34 +02:00
Frank Schroeder 583959392b rpc: fix TestSession_Renew
make the timing less tight
2017-07-07 09:22:34 +02:00
Frank Schroeder ff2c29c0be rpc: fix TestReadyForConsistentRead
timing was too tight. Standardized name.
2017-07-07 09:22:34 +02:00
Frank Schroeder fcab525053 rpc: fix for 'no leader' in TLS tests
Ensure both servers know about each other before looking
for a leader.
2017-07-07 09:22:34 +02:00
Frank Schroeder b2a71fd8b0 rpc: fix TestServer_JoinWAN_Flood
The second server in the first data center should not be
in bootstrap mode.
2017-07-07 09:22:34 +02:00
Frank Schroeder 8369b6cb9d rpc: provide unique node names for server and client 2017-07-07 09:22:34 +02:00
Frank Schroeder 534977239b rpc: prefix log output with test name 2017-07-07 09:22:34 +02:00
Frank Schroeder c8ef588d8d rpc: discover serf wan port before starting serf lan
When using dynamic ports for the serf clusters then
the actual bind port of the serf WAN cluster needs to
be discovered before the serf LAN cluster is started
since the serf LAN cluster announces the port of the WAN
cluster.
2017-07-07 09:22:34 +02:00
Frank Schroeder 53eab7e970 rpc: bind rpc test server to port 0 2017-07-07 09:22:34 +02:00
Frank Schroeder e9e2c599db rpc: refactor: unify test server setup 2017-07-07 09:22:34 +02:00
Frank Schroeder c803146550 rpc: fix typos 2017-07-07 09:22:34 +02:00
Frank Schroeder a0368e3827 agent: refactor: log to stderr during tests 2017-07-07 09:22:34 +02:00
Preetha Appan f549c06764 Rename to raftNotifyCh, fix typo 2017-07-06 09:10:36 -05:00
Preetha Appan f2171a6720 Fixes deadlock between barrier write and leader notify channel read . Fixes #3230 2017-07-05 17:09:18 -05:00
James Phillips e4b11682bc Fixes broken HTTP header and method for health checks. (#3178)
* Fixes broken HTTP header and method for health checks.
* Adds a fuzz utility and test to make sure copy is complete.
2017-06-23 01:15:48 -07:00
Frank Schroeder 2b41f2e3a3 agent: make the RPC endpoint overwrite mechanism more transparent
This patch hides the RPC handler overwrite mechanism from the
rest of the code so that it works in all cases and that there
is no cooperation required from the tested code, i.e. we can
drop a.getEndpoint().
2017-06-21 05:42:39 +02:00
Frank Schroeder c49a15d0f3 agent: move structs into consul/structs pkg
* CheckDefinition
 * ServiceDefinition
 * CheckType
2017-06-21 05:42:39 +02:00
Frank Schroeder 4273fb8444 agent: move NotifyGroup into the agent pkg 2017-06-21 05:42:39 +02:00
Frank Schroeder 82a132da60 agent: move conn pool for muxed connections into separate pkg 2017-06-21 05:42:39 +02:00
Frank Schroeder 80971c8a85 agent: move the SnapshotReplyFn out of the way
When splitting up the consul package into server and client
the SnapshotReplyFn needs to be in a separate package to avoid
a circular dependency.
2017-06-21 05:42:39 +02:00
Frank Schroeder 04b9392b00 agent: use the delegate interface for local state 2017-06-21 05:42:39 +02:00
Preetha Appan f658231ab9 Minor fixes per code review 2017-06-20 19:43:07 -05:00
Preetha Appan b3b2e9dcb4 Added unit test to verify consistentRead method behavior 2017-06-16 11:58:12 -05:00
Preetha Appan 44f5086873 Code review feedback, fixed major logic bug 2017-06-16 10:49:54 -05:00
Preetha Appan 72af7b9bc4 Redo bug fix for stale reads on server startup, leveraging RPCHOldtimeout instead of maxQueryTime, plus tests 2017-06-15 22:41:30 -05:00
Frank Schroeder 1c75cf1af5 pkg refactor
command/agent/*                  -> agent/*
    command/consul/*                 -> agent/consul/*
    command/agent/command{,_test}.go -> command/agent{,_test}.go
    command/base/command.go          -> command/base.go
    command/base/*                   -> command/*
    commands.go                      -> command/commands.go

The script which did the refactor is:

(
	cd $GOPATH/src/github.com/hashicorp/consul
	git mv command/agent/command.go command/agent.go
	git mv command/agent/command_test.go command/agent_test.go
	git mv command/agent/flag_slice_value{,_test}.go command/
	git mv command/agent .
	git mv command/base/command.go command/base.go
	git mv command/base/config_util{,_test}.go command/
	git mv commands.go command/
	git mv consul agent
	rmdir command/base/

	gsed -i -e 's|package agent|package command|' command/agent{,_test}.go
	gsed -i -e 's|package agent|package command|' command/flag_slice_value{,_test}.go
	gsed -i -e 's|package base|package command|' command/base.go command/config_util{,_test}.go
	gsed -i -e 's|package main|package command|' command/commands.go

	gsed -i -e 's|base.Command|BaseCommand|' command/commands.go
	gsed -i -e 's|agent.Command|AgentCommand|' command/commands.go
	gsed -i -e 's|\tCommand:|\tBaseCommand:|' command/commands.go
	gsed -i -e 's|base\.||' command/commands.go
	gsed -i -e 's|command\.||' command/commands.go

	gsed -i -e 's|command|c|' main.go
	gsed -i -e 's|range Commands|range command.Commands|' main.go
	gsed -i -e 's|Commands: Commands|Commands: command.Commands|' main.go

	gsed -i -e 's|base\.BoolValue|BoolValue|' command/operator_autopilot_set.go
	gsed -i -e 's|base\.DurationValue|DurationValue|' command/operator_autopilot_set.go
	gsed -i -e 's|base\.StringValue|StringValue|' command/operator_autopilot_set.go
	gsed -i -e 's|base\.UintValue|UintValue|' command/operator_autopilot_set.go

	gsed -i -e 's|\bCommand\b|BaseCommand|' command/base.go
	gsed -i -e 's|BaseCommand Options|Command Options|' command/base.go
	gsed -i -e 's|base.Command|BaseCommand|' command/*.go
	gsed -i -e 's|c\.Command|c.BaseCommand|g' command/*.go
	gsed -i -e 's|\tCommand:|\tBaseCommand:|' command/*_test.go
	gsed -i -e 's|base\.||' command/*_test.go

	gsed -i -e 's|\bCommand\b|AgentCommand|' command/agent{,_test}.go
	gsed -i -e 's|cmd.AgentCommand|cmd.BaseCommand|' command/agent.go

	gsed -i -e 's|cli.AgentCommand = new(Command)|cli.Command = new(AgentCommand)|' command/agent_test.go
	gsed -i -e 's|exec.AgentCommand|exec.Command|' command/agent_test.go
	gsed -i -e 's|exec.BaseCommand|exec.Command|' command/agent_test.go
	gsed -i -e 's|NewTestAgent|agent.NewTestAgent|' command/agent_test.go
	gsed -i -e 's|= TestConfig|= agent.TestConfig|' command/agent_test.go
	gsed -i -e 's|: RetryJoin|: agent.RetryJoin|' command/agent_test.go

	gsed -i -e 's|\.\./\.\./|../|' command/config_util_test.go

	gsed -i -e 's|\bverifyUniqueListeners|VerifyUniqueListeners|' agent/config{,_test}.go command/agent.go
	gsed -i -e 's|\bserfLANKeyring\b|SerfLANKeyring|g' agent/{agent,keyring,testagent}.go command/agent.go
	gsed -i -e 's|\bserfWANKeyring\b|SerfWANKeyring|g' agent/{agent,keyring,testagent}.go command/agent.go
	gsed -i -e 's|\bNewAgent\b|agent.New|g' command/agent{,_test}.go
	gsed -i -e 's|\bNewAgent|New|' agent/{acl_test,agent,testagent}.go

	gsed -i -e 's|\bAgent\b|agent.&|g' command/agent{,_test}.go
	gsed -i -e 's|\bBool\b|agent.&|g' command/agent{,_test}.go
	gsed -i -e 's|\bConfig\b|agent.&|g' command/agent{,_test}.go
	gsed -i -e 's|\bDefaultConfig\b|agent.&|g' command/agent{,_test}.go
	gsed -i -e 's|\bDevConfig\b|agent.&|g' command/agent{,_test}.go
	gsed -i -e 's|\bMergeConfig\b|agent.&|g' command/agent{,_test}.go
	gsed -i -e 's|\bReadConfigPaths\b|agent.&|g' command/agent{,_test}.go
	gsed -i -e 's|\bParseMetaPair\b|agent.&|g' command/agent{,_test}.go
	gsed -i -e 's|\bSerfLANKeyring\b|agent.&|g' command/agent{,_test}.go
	gsed -i -e 's|\bSerfWANKeyring\b|agent.&|g' command/agent{,_test}.go

	gsed -i -e 's|circonus\.agent|circonus|g' command/agent{,_test}.go
	gsed -i -e 's|logger\.agent|logger|g' command/agent{,_test}.go
	gsed -i -e 's|metrics\.agent|metrics|g' command/agent{,_test}.go
	gsed -i -e 's|// agent.Agent|// agent|' command/agent{,_test}.go
	gsed -i -e 's|a\.agent\.Config|a.Config|' command/agent{,_test}.go

	gsed -i -e 's|agent\.AppendSliceValue|AppendSliceValue|' command/{configtest,validate}.go

	gsed -i -e 's|consul/consul|agent/consul|' GNUmakefile

	gsed -i -e 's|\.\./test|../../test|' agent/consul/server_test.go

	# fix imports
	f=$(grep -rl 'github.com/hashicorp/consul/command/agent' * | grep '\.go')
	gsed -i -e 's|github.com/hashicorp/consul/command/agent|github.com/hashicorp/consul/agent|' $f
	goimports -w $f

	f=$(grep -rl 'github.com/hashicorp/consul/consul' * | grep '\.go')
	gsed -i -e 's|github.com/hashicorp/consul/consul|github.com/hashicorp/consul/agent/consul|' $f
	goimports -w $f

	goimports -w command/*.go main.go
)
2017-06-10 18:52:45 +02:00