269 Commits

Author SHA1 Message Date
Kyle Havlovitz
2eefe3ca5b
Add autopilot server health tracking
This adds two goroutines to perform autopilot tasks on the leader - one
to monitor the health of servers and another to periodically clean up
dead servers with a limit on removal count. Also adds a new http endpoint,
`/v1/operator/autopilot/health`, for querying this information through an
operator RPC endpoint.
2017-03-06 16:00:10 -08:00
Kyle Havlovitz
ab6c49ab4c Merge pull request #2771 from hashicorp/f-autopilot
Autopilot dead server cleanup, config, and raft version compatibility
2017-02-28 15:04:16 -08:00
Kyle Havlovitz
92c8b9c3a0
Rename DeadServerCleanup and make wording adjustments 2017-02-28 14:45:21 -08:00
Kyle Havlovitz
9221aed856
Remove the RPC client interface and update docs 2017-02-28 13:41:09 -08:00
Kyle Havlovitz
81c7a0299e
Add state store table and endpoints for autopilot 2017-02-23 20:32:13 -08:00
Kyle Havlovitz
b20fd222f6
Add raft version 2/3 compatibility 2017-02-22 12:53:32 -08:00
Kyle Havlovitz
a533e255ab Merge pull request #2699 from hashicorp/f-tls-min-version
Add TLSMinVersion to config options
2017-02-01 16:31:53 -05:00
Kyle Havlovitz
07ba3ddb6e
Add TLSMinVersion to config options 2017-02-01 16:20:33 -05:00
Sean Chittenden
fd2ae702c9
Re-cherry-pick 71d807f607589f2eb4fea4e83e3876d122c8afc0 and e2320d69b6b155d8223758415aabafc60a0e9d3b. 2017-02-01 10:27:04 -08:00
James Phillips
91f3555dd8 Revert "Adds gopsutil in the loop when trying to make the node ID." 2017-01-31 19:13:49 -08:00
James Phillips
e2320d69b6
Gets rid of a goto. 2017-01-31 19:02:25 -08:00
James Phillips
71d807f607
Adds gopsutil into node ID process and attempts to use host ID, if availabile. 2017-01-31 08:51:33 -08:00
Kyle Havlovitz
a55968f009
Merge branch 'master' into f-prepared-query-nodemeta 2017-01-23 20:17:48 -05:00
Kyle Havlovitz
3f3d7f9891
Add tests for node meta in prepared queries and update docs 2017-01-23 19:17:30 -05:00
James Phillips
bd605e330c
Adds basic support for node IDs. 2017-01-17 22:47:59 -08:00
Kyle Havlovitz
7f91cd12f4
Add -node-meta to agent command line options 2017-01-11 16:09:04 -05:00
Kyle Havlovitz
c9e430c396
Validate metadata config earlier and handle multiple filters 2017-01-11 15:12:03 -05:00
Kyle Havlovitz
03273e4ed2
Fix formatting 2017-01-09 13:49:33 -08:00
Kyle Havlovitz
84504a20fc
Add meta key validations and more tests 2017-01-09 11:21:49 -08:00
Kyle Havlovitz
e44bcb9716
Add tests for node metadata functionality 2017-01-05 17:21:56 -08:00
Kyle Havlovitz
52d6fd831e
Add support for setting node metadata fields 2017-01-05 14:10:26 -08:00
James Phillips
ca7a243b70
Adds ACL management support to the agent. 2016-12-14 07:07:41 -08:00
James Phillips
bcf1ffad99
Adds complete ACL coverage for /v1/coordinate/nodes and Coordinate.Update RPC. 2016-12-12 14:52:27 -08:00
James Phillips
0139bbb963
Adds support for a new "acl_agent_token" which is used for internal
catalog operations.
2016-12-12 14:52:27 -08:00
James Phillips
66b437ca33
Removes the exception for the "consul" service in the catalog. 2016-12-07 17:58:23 -08:00
Sean Chittenden
830125a8b3
Run all known addresses through go-sockaddr/template.
The following is now possible:

```
$ consul agent -dev -client="{{GetPrivateIP}}" -bind='{{GetInterfaceIP "en0"}}'
```
2016-12-02 16:35:38 +11:00
Kyle Havlovitz
bd69c6d871 Add reload/leave http endpoints (#2516) 2016-11-30 13:29:42 -05:00
Seth Vargo
4179aacf11
Add an API method for determining the best status
Given a list of HealthChecks, this determines the "best" status for the
collective group. This is useful for nodes and services, which may have
multiple checks associated with them.
2016-11-29 18:41:46 -05:00
Kyle Havlovitz
338e36cc5d Add logWriter to agent Create() method 2016-11-28 18:36:26 -05:00
Kyle Havlovitz
124f907063 Add monitor http endpoint 2016-11-28 18:36:26 -05:00
Kyle Havlovitz
92ce2c9e39 Use uuids in persist temp files to avoid race (#2494) 2016-11-09 15:22:53 -08:00
Kyle Havlovitz
b1760b223e Improve logging when deregistering a nonexistent service (#2492)
Log a warning instead of a success message when attempting to deregister a nonexistent service. In Consul 0.8 this can be changed to giving an error outright, but for now we can keep the idempotent delete behavior.
2016-11-09 16:56:54 -05:00
Kyle Havlovitz
83d2f36b54 Merge pull request #2480 from hashicorp/b-atomic-writes
Atomic writes for persisting service/check state
2016-11-07 15:36:35 -05:00
Kyle Havlovitz
7a3e0f8275
Add a note about not calling sync for persistCheckState 2016-11-07 15:24:31 -05:00
Kyle Havlovitz
e30b289c6f
Call fsync() for saving check/service state 2016-11-07 13:51:03 -05:00
Kyle McCullough
73b281a27c Add setting to skip ssl certificate verification for HTTP checks (#1984)
* http check: add setting to skip ssl certificate verification

* update http check documentation

* fix typo in documentation

* Add TLSSkipVerify to agent api
2016-11-03 13:17:30 -07:00
James Phillips
233a3a101b Supports WAN and LAN Serf Bind Addresses. (#2468)
* * adding cli config and config file support for specifying the serf wan and lan bind addresses
* updating documentation for serf wan and lan options
Fixes #2007

* Cleans up some small things from #2380.

* Uses the bind default for the agent test for Serf WAN and LAN.
2016-11-03 12:58:58 -07:00
James Phillips
c01a3871c9 Adds support for snapshots and restores. (#2396)
* Updates Raft library to get new snapshot/restore API.

* Basic backup and restore working, but need some cleanup.

* Breaks out a snapshot module and adds a SHA256 integrity check.

* Adds snapshot ACL and fills in some missing comments.

* Require a consistent read for snapshots.

* Make sure snapshot works if ACLs aren't enabled.

* Adds a bit of package documentation.

* Returns an empty response from restore to avoid EOF errors.

* Adds API client support for snapshots.

* Makes internal file names match on-disk file snapshots.

* Adds DC and token coverage for snapshot API test.

* Adds missing documentation.

* Adds a unit test for the snapshot client endpoint.

* Moves the connection pool out of the client for easier testing.

* Fixes an incidental issue in the prepared query unit test.

I realized I had two servers in bootstrap mode so this wasn't a good setup.

* Adds a half close to the TCP stream and fixes panic on error.

* Adds client and endpoint tests for snapshots.

* Moves the pool back into the snapshot RPC client.

* Adds a TLS test and fixes half-closes for TLS connections.

* Tweaks some comments.

* Adds a low-level snapshot test.

This is independent of Consul so we can pull this out into a library
later if we want to.

* Cleans up snapshot and archive and completes archive tests.

* Sends a clear error for snapshot operations in dev mode.

Snapshots require the Raft snapshots to be readable, which isn't supported
in dev mode. Send a clear error instead of a deep-down Raft one.

* Adds docs for the snapshot endpoint.

* Adds a stale mode and index feedback for snapshot saves.

This gives folks a way to extract data even if the cluster has no
leader.

* Changes the internal format of a snapshot from zip to tgz.

* Pulls in Raft fix to cancel inflight before a restore.

* Pulls in new Raft restore interface.

* Adds metadata to snapshot saves and a verify function.

* Adds basic save and restore snapshot CLI commands.

* Gets rid of tarball extensions and adds restore message.

* Fixes an incidental bad link in the KV docs.

* Adds documentation for the snapshot CLI commands.

* Scuttle any request body when a snapshot is saved.

* Fixes archive unit test error message check.

* Allows for nil output writers in snapshot RPC handlers.

* Renames hash list Decode to DecodeAndVerify.

* Closes the client connection for snapshot ops.

* Lowers timeout for restore ops.

* Updates Raft vendor to get new Restore signature and integrates with Consul.

* Bounces the leader's internal state when we do a restore.
2016-10-25 19:20:24 -07:00
James Phillips
03ae813bc7 Merge pull request #2389 from hashicorp/jbs-2019
Lower Service tag DNS warning to DEBUG for #2019
2016-10-24 17:05:02 -07:00
Brian Shumate
74a8fbef06
Lower Service tag DNS warning to DEBUG for #2019 2016-10-05 08:45:01 -04:00
Adam Wolfe Gordon
5ac5a8ccfc agent: Stop reaping child processes (resolves #1988)
The consul docker image now uses dumb-init to reap child processes, so
there's no need to reap them ourselves.
2016-10-04 09:36:41 -06:00
James Phillips
57db4bcce6
Adds performance tuning capability for Raft, detuned defaults, and supplemental docs. 2016-08-24 21:58:37 -07:00
James Phillips
4c7a0ed3b0
Merge branch 'master' into f-deregister-critical 2016-08-16 12:53:21 -07:00
James Phillips
ba60afd5d8
Cleans up based on code review feedback. 2016-08-16 12:52:30 -07:00
James Phillips
fbdd021ab9
Adds an "lan" tagged address so we have a way to get them all.
If we didn't have this, then there would be no way to know the LAN
address if address translation was turned on.
2016-08-16 10:49:03 -07:00
James Phillips
4a3d7db24f
Adds ability to deregister a service based on critical check state longer than a timeout. 2016-08-16 01:00:26 -07:00
James Phillips
d336bdd7b0
Adds basic ACL replication plumbing. 2016-08-03 21:24:04 -07:00
Cameron Davison
d138752249
atomic write service state and checks files, fixes #1221 2016-08-03 10:34:12 -05:00
Sean Chittenden
112f3fd468
Give log reviewers a hint as to which check is failing 2016-06-20 15:25:21 -07:00
Sean Chittenden
63adcbd5ef
Revert "Move structs.CheckID to a new top-level package, types."
This reverts commit 2bbd52e3b44ff1b60939a8400264d534662d6d51.
2016-06-07 16:59:02 -04:00