Commit Graph

1265 Commits

Author SHA1 Message Date
Kyle Havlovitz b333f3ea04
Only count healthy voters for FailureTolerance 2017-03-16 12:19:16 -07:00
Kyle Havlovitz 07288a20a1
Tweak last_contact health logic for leader 2017-03-15 19:57:54 -07:00
Kyle Havlovitz 5353221666
Reorganized cluster health check loop and logic 2017-03-15 18:27:17 -07:00
Kyle Havlovitz 66ef4a006b
Add tests for servers changing address/ID 2017-03-15 16:50:42 -07:00
Kyle Havlovitz 51b11cd344
Fix an issue with changing server IDs and add a few UX enhancements around autopilot features 2017-03-15 16:09:55 -07:00
Kyle Havlovitz c936fe38da
Add autopilot guide to the docs 2017-03-10 14:55:18 -08:00
Kyle Havlovitz e119240fdf Merge pull request #2788 from hashicorp/f-autopilot-2
Autopilot server health monitoring
2017-03-10 12:29:45 -08:00
Kyle Havlovitz 7608a3c15f
Use defers for WaitGroup and Ticker stop 2017-03-10 12:29:03 -08:00
Kyle Havlovitz 9b4497de09
Cleaned up and reorganized some autopilot-related code 2017-03-09 18:21:40 -08:00
James Phillips 9244360af9
Adds token to deregister request when reconciling.
Fixes #2792.
2017-03-09 09:25:42 -08:00
Kyle Havlovitz ab0e412db4
Add AutopilotPolicy interface and BasicAutopilot 2017-03-08 12:26:58 -08:00
Kyle Havlovitz c3d638e2c5
Move RaftStats to Status endpoint 2017-03-07 13:58:06 -08:00
Kyle Havlovitz 2eefe3ca5b
Add autopilot server health tracking
This adds two goroutines to perform autopilot tasks on the leader - one
to monitor the health of servers and another to periodically clean up
dead servers with a limit on removal count. Also adds a new http endpoint,
`/v1/operator/autopilot/health`, for querying this information through an
operator RPC endpoint.
2017-03-06 16:00:10 -08:00
Kyle Havlovitz 92c8b9c3a0
Rename DeadServerCleanup and make wording adjustments 2017-02-28 14:45:21 -08:00
Kyle Havlovitz 5429e8ce66
Add cli docs and minor test/comment tweaks 2017-02-24 16:55:44 -08:00
Kyle Havlovitz 2ea36d7bd4
Merge branch 'master' into f-autopilot 2017-02-24 15:55:18 -08:00
Kyle Havlovitz c2e7f45002
Add CAS capability to autopilot config endpoint 2017-02-24 13:08:49 -08:00
James Phillips f5fe659497 Reserves an RPC selector byte for Consul Enterprise. 2017-02-24 09:54:33 -08:00
Kyle Havlovitz 81c7a0299e
Add state store table and endpoints for autopilot 2017-02-23 20:32:13 -08:00
Kyle Havlovitz 950a9d2212
Move raft_protocol out of autopilot config 2017-02-23 13:08:40 -08:00
Kyle Havlovitz b20fd222f6
Add raft version 2/3 compatibility 2017-02-22 12:53:32 -08:00
Kyle Havlovitz 985d232522
Add configurable cleanup of dead servers when a new server joins 2017-02-17 10:49:16 -08:00
Sean Chittenden b94aac3a2c
Round the node lookup prefix down to the nearest modulo two size before
performing the lookup.

Hat tip: @dadgar
2017-02-02 12:13:58 -08:00
Sean Chittenden fe49c0a0ab
Reduce the size of the UUID Lookup Length restriction from `8` to `2`.
I'm torn on this.  It's useful from a UX perspective for an operator to
be able to type in something that's short.  At the same time, by
enforcing an `8` character length, we reduced the probability of a user
depending on the behavior and having it suddenly stop working in the
future when a duplicate prefix is injected into the environment.
2017-02-02 12:12:18 -08:00
Kyle Havlovitz 5b8049075e Merge pull request #2704 from hashicorp/f-relay-query-responses
Add relay-factor arg to keyring operations
2017-02-02 12:15:19 -05:00
Kyle Havlovitz 5d888f5303
Added -relay-factor param to keyring operations 2017-02-01 21:53:29 -05:00
James Phillips 15e2e64c00
Adds a test for node UUID or name lookups. 2017-02-01 16:41:44 -08:00
Sean Chittenden b764606844 Merge pull request #2702 from hashicorp/f-dns-nodeid
DNS lookup by Consul node ID
2017-02-01 16:23:18 -08:00
Sean Chittenden c16f33402a
Treat a uuid prefix lookup error as a soft error, as if a node name
lookup returned nil.

Add a TODO to note where a future point of logging should occur once a
logger is present and a few additional comments to explain the program
flow.
2017-02-01 16:09:25 -08:00
Sean Chittenden 62527c1698
Treat a uuid prefix lookup error as a soft error, as if a node name lookup returned nil.
Add a TODO to note where a future point of logging should occur once a
logger is present.
2017-02-01 15:51:25 -08:00
Sean Chittenden 295ca817f3
Run a test of `NodeServices()` with a NodeID as an argument. 2017-02-01 15:41:10 -08:00
Sean Chittenden 854f2b2bfc
Whoops. Return an empty set in the event that there are multiple matches. 2017-02-01 15:18:00 -08:00
Sean Chittenden e86cefe640
Rename `nodeName` to `nodeNameOrID`. 2017-02-01 14:59:24 -08:00
Sean Chittenden 13fb395010
Toggle `AllowMissing` to false to accommodate old clients without Node IDs. 2017-02-01 14:58:34 -08:00
Sean Chittenden f3f3f73e6d
Enable looking up consul nodes by their node ID.
Assuming the following output from a consul agent:

```
==> Consul agent running!
           Version: 'v0.7.3-43-gc5e140c-dev (c5e140c+CHANGES)'
           Node ID: '40e4a748-2192-161a-0510-9bf59fe950b5'
         Node name: 'myhost'
```

it is now possible to lookup nodes by their Node Name or Node ID, or a
prefix match of the Node ID, with the following caveats re: the prefix
match:

1) first eight digits of the Node ID are a required minimum (eight was
   chosen as an arbitrary number)
2) the length of the Node ID must be an even number or no result will be
   returned.

```
% dig @127.0.0.1 -p 8600 myhost.node.dc1.consul.
myhost.node.dc1.consul.	0	IN	A	127.0.0.1
% dig @127.0.0.1 -p 8600 40e4a748-2192-161a-0510-9bf59fe950b5.node.dc1.consul.
40e4a748-2192-161a-0510-9bf59fe950b5.node.dc1.consul. 0	IN A 127.0.0.1
% dig @127.0.0.1 -p 8600 40e4a748.node.dc1.consul.
40e4a748.node.dc1.consul. 0	IN	A	127.0.0.1
% dig @127.0.0.1 -p 8600 40e4a74821.node.dc1.consul.
40e4a74821.node.dc1.consul. 0	IN	A	127.0.0.1
% dig @127.0.0.1 -p 8600 40e4a748-21.node.dc1.consul.
40e4a748-21.node.dc1.consul. 0	IN	A	127.0.0.1
```
2017-02-01 14:46:25 -08:00
Kyle Havlovitz 07ba3ddb6e
Add TLSMinVersion to config options 2017-02-01 16:20:33 -05:00
Sean Chittenden c5e140c79d
Small premature optimization in `isUUID()`.
If the length isn't `36`, return `false` immediately before firing up
the regexp engine.
2017-02-01 11:00:06 -08:00
James Phillips f11817dcdf
Tweaks leader test now that we have new wait timing. 2017-01-25 22:12:22 -08:00
James Phillips 0c93ff1d13
Keeps the old state store state if a restore fails. 2017-01-25 19:42:34 -08:00
James Phillips c370d4ff29
Bails out of blocking queries when a state restore occurs. 2017-01-25 19:00:32 -08:00
James Phillips b787aa17ab
Tweaks a few comments. 2017-01-25 09:58:23 -08:00
James Phillips 75f2aa8588
Pass state store pointer into the blocking query work function.
Previously the blocking functions all closed over the state store from
their first query, with would not have worked properly when a restore
occurred. This makes sure they get a frest state store pointer each time,
and that pointer is synchronized with the abandon watch.
2017-01-25 09:58:23 -08:00
James Phillips d97c3c6c18
Guts all the old blocking query code. 2017-01-25 09:58:23 -08:00
James Phillips 7da2f513dc
Cuts KVS endpoints over to new fine-grained watch plumbing. 2017-01-25 09:58:22 -08:00
James Phillips 68e90d0f24
Adds a facility to notify when restores occur. 2017-01-25 09:58:22 -08:00
James Phillips 1d39ddbd4b
Adds fine-grained watches to session endpoints. 2017-01-25 09:58:22 -08:00
James Phillips 8b7977ccb3
Adds fine-grained watches to prepared query endpoints. 2017-01-25 09:58:22 -08:00
James Phillips dfcffe097c
Adds fine-grained watches to internal endpoints. 2017-01-25 09:58:22 -08:00
James Phillips 3675e5ceba
Adds fine-grained watches to coordinate endpoints. 2017-01-25 09:58:22 -08:00
James Phillips ec90404df0
Adds fine-grained watch support to ACL endpoints. 2017-01-25 09:58:22 -08:00
James Phillips dcb55c766b
Adds fine-grained watches to health endpoints. 2017-01-25 09:58:22 -08:00
James Phillips e4b88324b3
Fixes a race condition when updating the state store during a snapshot restore. 2017-01-25 09:58:22 -08:00
James Phillips b7b42d718a
Adds fine-grained watches to catalog endpoints. 2017-01-25 09:58:22 -08:00
James Phillips f2d9da270d
Adds diff check for node and service parts of register requests.
We always did an update before which caused excessive watch churn, even
with our new fine-grained queries. This does a diff any only updates the
node and service records if something actually changed.
2017-01-25 09:58:22 -08:00
James Phillips fefed7c803
Don't do any watch tracking for non-blocking queries. 2017-01-25 09:58:22 -08:00
James Phillips 05735e39a5
Removes some incorrect comments.
We can't actually return a fine-grained index from these tables unless
support is added for tombstones. Otherwise, the index could slip backwards
as things are deleted.
2017-01-25 09:58:22 -08:00
James Phillips b21625a6af
Adds new variant of blocking query wrapper with WatchSet support. 2017-01-25 09:58:22 -08:00
Kyle Havlovitz bbfd25b530
Fix test import 2017-01-23 21:34:01 -05:00
Kyle Havlovitz a55968f009
Merge branch 'master' into f-prepared-query-nodemeta 2017-01-23 20:17:48 -05:00
Kyle Havlovitz 3f3d7f9891
Add tests for node meta in prepared queries and update docs 2017-01-23 19:17:30 -05:00
James Phillips acbc2289a3 Merge pull request #2661 from hashicorp/f-node-id
Adds basic support for node IDs.
2017-01-18 16:02:38 -08:00
James Phillips bcf770f811
Uses clean replies each time so they are safe to receive map changes. 2017-01-18 15:40:19 -08:00
James Phillips 32be684fdf
Fixes a startup ordering issue between Raft and Serf.
This fixes #2663 and fixes #1899. It's not super related to this PR,
but the startup time changes that this PR brings made this a lot worse
so I was able to track it down.
2017-01-18 15:06:15 -08:00
James Phillips 6ca0173907
Adds catalog support for node IDs. 2017-01-18 14:26:42 -08:00
Kyle Havlovitz 4e8c0fca63
First pass at adding node meta filter to prepared queries 2017-01-18 16:23:33 -05:00
James Phillips bd605e330c
Adds basic support for node IDs. 2017-01-17 22:47:59 -08:00
Kyle Havlovitz f48f105949
Minor formatting tweaks as a follow-up to #2654 2017-01-17 19:20:29 -05:00
Kyle Havlovitz 9e696220a8
Add support for multiple metadata filters to remaining endpoints
Enabled multiple meta filters for /v1/catalog/nodes and /v1/catalog/services
2017-01-13 20:49:13 -05:00
Kyle Havlovitz 5acd69b4fc
Add node metadata filtering to remaining health/catalog endpoints 2017-01-13 20:08:43 -05:00
James Phillips 1330499707
Breaks up the state store into several files. 2017-01-13 11:47:16 -08:00
Kyle Havlovitz 8ef366ec49
Fix inconsistency in TestStateStore_ServicesByNodeMeta 2017-01-12 19:46:58 -05:00
Kyle Havlovitz 7f91cd12f4
Add -node-meta to agent command line options 2017-01-11 16:09:04 -05:00
Kyle Havlovitz c9e430c396
Validate metadata config earlier and handle multiple filters 2017-01-11 15:12:03 -05:00
Kyle Havlovitz 03273e4ed2
Fix formatting 2017-01-09 13:49:33 -08:00
Kyle Havlovitz 84504a20fc
Add meta key validations and more tests 2017-01-09 11:21:49 -08:00
Kyle Havlovitz e44bcb9716
Add tests for node metadata functionality 2017-01-05 17:21:56 -08:00
Kyle Havlovitz 52d6fd831e
Add support for setting node metadata fields 2017-01-05 14:10:26 -08:00
James Phillips ca7a243b70
Adds ACL management support to the agent. 2016-12-14 07:07:41 -08:00
nobody 5beda4d5bc fix fmt.Errorf error
missing argument for Errorf("%q"): format reads arg 1, have only 0 args
2016-12-14 18:57:12 +08:00
James Phillips 9853df3b8f Merge pull request #2592 from hashicorp/acl-complete-node-session
Adds complete ACL coverage for nodes and sessions.
2016-12-13 13:55:07 -08:00
James Phillips 8b67991ef7
Adds complete ACL coverage for /v1/session endpoints. 2016-12-12 21:59:22 -08:00
James Phillips 99e810f9c7
Adds complete ACL coverage for /v1/internal/ui/node endpoints. 2016-12-12 18:22:10 -08:00
James Phillips 2404af6f94
Adds complete ACL support for /v1/query/<query id or name>/execute.
This was already supported by previous changes to the ACL filter, so
we just added a test to show it working.
2016-12-12 17:28:06 -08:00
James Phillips 9c785c7022
Fixes implementation of node ACLs for /v1/catalog/node/<node>.
This would return a "permission denied" error, but this changes it to
return the same response as a node that doesn't exist (as was originally
intended and written in the code comments).
2016-12-12 16:53:31 -08:00
James Phillips 35475a66df
Adds full ACL coverage for /v1/health endpoints. 2016-12-12 16:28:52 -08:00
Kyle Havlovitz 13aa0ba11b Merge pull request #2591 from hashicorp/snapshot-interval
Change raft snapshot interval to 5 seconds
2016-12-12 19:15:10 -05:00
James Phillips bcf1ffad99
Adds complete ACL coverage for /v1/coordinate/nodes and Coordinate.Update RPC. 2016-12-12 14:52:27 -08:00
James Phillips 0139bbb963
Adds support for a new "acl_agent_token" which is used for internal
catalog operations.
2016-12-12 14:52:27 -08:00
James Phillips 0ed6b1bb18
Bans anonymous queries that aren't tied to a session.
This gets us coverage of PQ creation under the existing service
policy or the soon-to-be-added session policy.
2016-12-12 14:52:27 -08:00
Kyle Havlovitz c49ea15449
Change raft snapshot interval to 5 seconds 2016-12-12 13:31:42 -05:00
James Phillips 2ace618bf9
Adds complete ACL coverage for /v1/catalog/service/<service>. 2016-12-12 08:34:15 -08:00
James Phillips 8038f21684
Adds complete ACL coverage for /v1/catalog/nodes. 2016-12-10 16:49:19 -08:00
James Phillips 9d6567ff08
Adds complete ACL coverage for /v1/catalog/node/<node>. 2016-12-10 16:49:19 -08:00
James Phillips 9b7564490c
Adds complete ACL coverage for /v1/catalog/deregister.
This included some state store helpers to make this more efficient.
2016-12-09 21:04:44 -08:00
James Phillips 800c67c58a
Adds complete ACL coverage for /v1/catalog/register. 2016-12-09 21:04:37 -08:00
James Phillips 66b437ca33
Removes the exception for the "consul" service in the catalog. 2016-12-07 17:58:23 -08:00
Kyle Havlovitz c0dd5b65b6 Add retry with backoff to initial bootstrap checks (#2561) 2016-12-01 17:05:02 -05:00
Seth Vargo 4179aacf11
Add an API method for determining the best status
Given a list of HealthChecks, this determines the "best" status for the
collective group. This is useful for nodes and services, which may have
multiple checks associated with them.
2016-11-29 18:41:46 -05:00
Kyle Havlovitz dd3368c19e Add keyring http endpoints 2016-11-22 20:10:43 -05:00
Kyle Havlovitz 8c157e0acd Retry with backoff on session invalidation failure (#2475) 2016-11-04 21:53:22 -07:00