Commit Graph

462 Commits

Author SHA1 Message Date
James Phillips 533f65b7a6
Merge pull request #3845 from 42wim/tagfix
Fix service tags not added to health check. Part two
2018-02-05 16:18:00 -08:00
James Phillips e748c63fff
Merge pull request #3855 from hashicorp/pr-3782-slackpad
Adds support for gRPC health checks.
2018-02-02 17:57:27 -08:00
James Phillips 5f31c8d8d3
Changes "TLS" to "GRPCUseTLS" since it only applies to GRPC checks. 2018-02-02 17:29:34 -08:00
Wim ce771f1fb3 Fix service tags not added to health check. Part two 2018-01-29 20:32:44 +01:00
Veselkov Konstantin 5f38e1148a fix refactoring 2018-01-28 22:53:30 +04:00
Veselkov Konstantin 7de57ba4de remove golint warnings 2018-01-28 22:40:13 +04:00
Kyle Havlovitz 68ae92cb8c
Don't remove the files, just log an error 2018-01-19 14:25:51 -08:00
Kyle Havlovitz 4e325a6b8f
Add graceful handling of malformed persisted service/check files.
Previously a change was made to make the file writing atomic,
but that wasn't enough to cover something like an OS crash so we
needed something here to handle the situation more gracefully.

Fixes #1221.
2018-01-19 14:07:36 -08:00
Dmytro Kostiuchenko 1a10b08e82 Add gRPC health-check #3073 2018-01-04 16:42:30 -05:00
James Phillips f491a55e47
Merge pull request #3642 from yfouquet/master
[Fix] Service tags not added to health checks
2017-12-14 13:59:39 -08:00
James Phillips 2892f91d0b
Copies the autopilot settings from the runtime config.
Fixes #3730
2017-12-13 10:32:05 -08:00
Yoann Fouquet 986148cfe5 [Fix] Service tags not added to health checks
Since commit 9685bdcd0b, service tags are added to the health checks.
Otherwise, when adding a service, tags are not added to its check.

In updateSyncState, we compare the checks of the local agent with the checks of the catalog.
It appears that the service tags are different (missing in one case), and so the check is synchronized.
That increase the ModifyIndex periodically when nothing changes.

Fixed it by adding serviceTags to the check.

Note that the issue appeared in version 0.8.2.
Looks related to #3259.
2017-12-12 13:39:37 +01:00
James Phillips 93f68555d0
Adds enable_agent_tls_for_checks configuration option which allows (#3661)
HTTP health checks for services requiring 2-way TLS to be checked
using the agent's credentials.
2017-11-07 18:22:09 -08:00
James Phillips 4a2cafe525
Adds HTTP/2 support to Consul's HTTPS server. (#3657)
* Refactors the HTTP listen path to create servers in the same spot.

* Adds HTTP/2 support to Consul's HTTPS server.

* Vendors Go HTTP/2 library and associated deps.
2017-11-07 15:06:59 -08:00
Kyle Havlovitz dbab3cd5f6
Merge branch 'master' into esm-changes 2017-11-01 11:37:48 -07:00
Frank Schroeder 164ec3ec39
docker: stop previous check on replace 2017-10-26 12:03:07 +02:00
Kyle Havlovitz ce4e8c46fa
Add deregister critical service field and refactor duration parsing 2017-10-25 19:17:41 -07:00
Frank Schroeder 8f145559d8
Decouple the code that executes checks from the agent 2017-10-25 11:18:07 +02:00
Frank Schroeder 3231385089
ae: fix typo in constructor name 2017-10-23 10:56:05 +02:00
Frank Schroeder de57b16d99
local state: address review comments
* move non-blocking notification mechanism into ae.Trigger
* move Pause/Resume into separate type
2017-10-23 10:56:04 +02:00
Frank Schroeder b803bf3091
local state: tests compile 2017-10-23 10:56:03 +02:00
Frank Schroeder 0a9ac9749e
local state: replace multi-map state with structs
The state of the service and health check records was spread out over
multiple maps guarded by a single lock. Access to the maps has to happen
in a coordinated effort and the tests often violated this which made
them brittle and racy.

This patch replaces the multiple maps with a single one for both checks
and services to make the code less fragile.

This is also necessary since moving the local state into its own package
creates circular dependencies for the tests. To avoid this the tests can
no longer access internal data structures which they should not be doing
in the first place.

The tests still don't compile but this is a ncessary step in that
direction.
2017-10-23 10:56:03 +02:00
Frank Schroeder 6027a9e2a5
local state: move to separate package
This patch moves the local state to a separate package to further
decouple it from the agent code.

The code compiles but the tests do not yet.
2017-10-23 10:56:03 +02:00
Frank Schroeder c00bbdb5e4
agent: simplify some loops 2017-10-23 10:56:03 +02:00
Frank Schroeder 94ef1041a1
agent: cleanup StateSyncer
This patch cleans up the state syncer code by renaming fields, adding
helpers and documentation.
2017-10-23 10:56:03 +02:00
Frank Schroeder 29e18c7494
agent: decouple anti-entropy from local state
The anti-entropy code manages background synchronizations of the local
state on a regular basis or on demand when either the state has changed
or a new consul server has been added.

This patch moves the anti-entropy code into its own package and
decouples it from the local state code since they are performing
two different functions.

To simplify code-review this revision does not make any optimizations,
renames or refactorings. This will happen in subsequent commits.
2017-10-23 10:56:03 +02:00
Frank Schroeder 58b0e153f9
Revert "agent: decouple anti-entropy from local state"
This reverts commit a842dc9c2b.
2017-10-23 10:08:35 +02:00
Frank Schroeder b4e7d0b974
Revert "agent: cleanup StateSyncer"
This reverts commit b7136e100b.
2017-10-23 10:08:35 +02:00
Frank Schroeder 91569a7ceb
Revert "agent: simplify some loops"
This reverts commit b5dbad910c.
2017-10-23 10:08:34 +02:00
Frank Schroeder 67a0689f71
Revert "local state: move to separate package"
This reverts commit d447e823c6.
2017-10-23 10:08:34 +02:00
Frank Schroeder 623e07760a
Revert "local state: replace multi-map state with structs"
This reverts commit ccbae7da5b.
2017-10-23 10:08:34 +02:00
Frank Schroeder 9ed4b2d631
Revert "local state: tests compile"
This reverts commit 1af52bf7be.
2017-10-23 10:08:34 +02:00
Frank Schroeder 46641e44d9
Revert "local state: address review comments"
This reverts commit 1d315075b1.
2017-10-23 10:08:33 +02:00
Frank Schroeder cab3b17292
Revert "ae: fix typo in constructor name"
This reverts commit e88f49e2cc.
2017-10-23 10:08:32 +02:00
Frank Schroeder e88f49e2cc ae: fix typo in constructor name 2017-10-23 08:03:18 +02:00
Frank Schroeder 1d315075b1 local state: address review comments
* move non-blocking notification mechanism into ae.Trigger
* move Pause/Resume into separate type
2017-10-23 08:03:18 +02:00
Frank Schroeder 1af52bf7be local state: tests compile 2017-10-23 08:03:18 +02:00
Frank Schroeder ccbae7da5b local state: replace multi-map state with structs
The state of the service and health check records was spread out over
multiple maps guarded by a single lock. Access to the maps has to happen
in a coordinated effort and the tests often violated this which made
them brittle and racy.

This patch replaces the multiple maps with a single one for both checks
and services to make the code less fragile.

This is also necessary since moving the local state into its own package
creates circular dependencies for the tests. To avoid this the tests can
no longer access internal data structures which they should not be doing
in the first place.

The tests still don't compile but this is a ncessary step in that
direction.
2017-10-23 08:03:18 +02:00
Frank Schroeder d447e823c6 local state: move to separate package
This patch moves the local state to a separate package to further
decouple it from the agent code.

The code compiles but the tests do not yet.
2017-10-23 08:03:18 +02:00
Frank Schroeder b5dbad910c agent: simplify some loops 2017-10-23 08:03:18 +02:00
Frank Schroeder b7136e100b agent: cleanup StateSyncer
This patch cleans up the state syncer code by renaming fields, adding
helpers and documentation.
2017-10-23 08:03:18 +02:00
Frank Schroeder a842dc9c2b agent: decouple anti-entropy from local state
The anti-entropy code manages background synchronizations of the local
state on a regular basis or on demand when either the state has changed
or a new consul server has been added.

This patch moves the anti-entropy code into its own package and
decouples it from the local state code since they are performing
two different functions.

To simplify code-review this revision does not make any optimizations,
renames or refactorings. This will happen in subsequent commits.
2017-10-23 08:03:18 +02:00
Hadar Greinsmark 7e1a860978 Implement HTTP Watch handler (#3413)
Implement HTTP Watch handler
2017-10-21 20:39:09 -05:00
Frank Schröder 94f58199b1 agent: add option to discard health output (#3562)
* agent: add option to discard health output

In high volatile environments consul will have checks with "noisy"
output which changes every time even though the status does not change.
Since the output is stored in the raft log every health check update
unblocks a blocking call on health checks since the raft index has
changed even though the status of the health checks may not have changed
at all. By discarding the output of the health checks the users can
choose a different tradeoff. Less visibility on why a check failed in
exchange for a reduced change rate on the raft log.

* agent: discard output also when adding a check

* agent: add test for discard check output

* agent: update docs

* go vet

* Adds discard_check_output to reloadable config table.

* Updates the change log.
2017-10-10 17:04:52 -07:00
preetapan 77c972f594 Fixes agent error handling when check definition is invalid. Distingu… (#3560)
* Fixes agent error handling when check definition is invalid. Distinguishes between empty checks vs invalid checks

* Made CheckTypes return Checks from service definition struct rather than a new copy, and other changes from code review. This also errors when json payload contains empty structs

* Simplify and improve validate method, and make sure that CheckTypes always returns a new copy of validated check definitions

* Tweaks some small style things and error messages.

* Updates the change log.
2017-10-10 16:54:06 -07:00
James Phillips bb12368eac Makes RPC handling more robust when rolling servers. (#3561)
* Adds client-side retry for no leader errors.

This paves over the case where the client was connected to the leader
when it loses leadership.

* Adds a configurable server RPC drain time and a fail-fast path for RPCs.

When a server leaves it gets removed from the Raft configuration, so it will
never know who the new leader server ends up being. Without this we'd be
doomed to wait out the RPC hold timeout and then fail. This makes things fail
a little quicker while a sever is draining, and since we added a client retry
AND since the server doing this has already shut down and left the Serf LAN,
clients should retry against some other server.

* Makes the RPC hold timeout configurable.

* Reorders struct members.

* Sets the RPC hold timeout default for test servers.

* Bumps the leave drain time up to 5 seconds.

* Robustifies retries with a simpler client-side RPC hold.

* Reverts untended delete.
2017-10-10 15:19:50 -07:00
James Phillips 3bc6df5f0e
Adds script warning and fixes Docker args recognition. 2017-10-04 21:41:27 -07:00
Kyle Havlovitz 198ed6076d Clean up subprocess handling and make shell use optional (#3509)
* Clean up handling of subprocesses and make using a shell optional

* Update docs for subprocess changes

* Fix tests for new subprocess behavior

* More cleanup of subprocesses

* Minor adjustments and cleanup for subprocess logic

* Makes the watch handler reload test use the new path.

* Adds check tests for new args path, and updates existing tests to use new path.

* Adds support for script args in Docker checks.

* Fixes the sanitize unit test.

* Adds panic for unknown watch type, and reverts back to Run().

* Adds shell option back to consul lock command.

* Adds shell option back to consul exec command.

* Adds shell back into consul watch command.

* Refactors signal forwarding and makes Windows-friendly.

* Adds a clarifying comment.

* Changes error wording to a warning.

* Scopes signals to interrupt and kill.

This avoids us trying to send SIGCHILD to the dead process.

* Adds an error for shell=false for consul exec.

* Adds notes about the deprecated script and handler fields.

* De-nests an if statement.
2017-10-04 16:48:00 -07:00
Preetha Appan 51a04ec87d Introduces new 'list' permission that applies to KV store recursive reads, and enforced only when opted in. 2017-10-02 17:10:21 -05:00
Frank Schroeder 1944218492 use ports from derived addresses 2017-09-29 20:26:43 +02:00
Kyle Havlovitz bfa70a10ca Fix watch error when http & https are disabled (#3493)
Remove an error in watch reloading that happens when http and https
are both disabled, and use an https address for running watches if
no http addresses are present.

Fixes #3425.
2017-09-26 13:47:27 -07:00
Frank Schröder 12216583a1 New config parser, HCL support, multiple bind addrs (#3480)
* new config parser for agent

This patch implements a new config parser for the consul agent which
makes the following changes to the previous implementation:

 * add HCL support
 * all configuration fragments in tests and for default config are
   expressed as HCL fragments
 * HCL fragments can be provided on the command line so that they
   can eventually replace the command line flags.
 * HCL/JSON fragments are parsed into a temporary Config structure
   which can be merged using reflection (all values are pointers).
   The existing merge logic of overwrite for values and append
   for slices has been preserved.
 * A single builder process generates a typed runtime configuration
   for the agent.

The new implementation is more strict and fails in the builder process
if no valid runtime configuration can be generated. Therefore,
additional validations in other parts of the code should be removed.

The builder also pre-computes all required network addresses so that no
address/port magic should be required where the configuration is used
and should therefore be removed.

* Upgrade github.com/hashicorp/hcl to support int64

* improve error messages

* fix directory permission test

* Fix rtt test

* Fix ForceLeave test

* Skip performance test for now until we know what to do

* Update github.com/hashicorp/memberlist to update log prefix

* Make memberlist use the default logger

* improve config error handling

* do not fail on non-existing data-dir

* experiment with non-uniform timeouts to get a handle on stalled leader elections

* Run tests for packages separately to eliminate the spurious port conflicts

* refactor private address detection and unify approach for ipv4 and ipv6.

Fixes #2825

* do not allow unix sockets for DNS

* improve bind and advertise addr error handling

* go through builder using test coverage

* minimal update to the docs

* more coverage tests fixed

* more tests

* fix makefile

* cleanup

* fix port conflicts with external port server 'porter'

* stop test server on error

* do not run api test that change global ENV concurrently with the other tests

* Run remaining api tests concurrently

* no need for retry with the port number service

* monkey patch race condition in go-sockaddr until we understand why that fails

* monkey patch hcl decoder race condidtion until we understand why that fails

* monkey patch spurious errors in strings.EqualFold from here

* add test for hcl decoder race condition. Run with go test -parallel 128

* Increase timeout again

* cleanup

* don't log port allocations by default

* use base command arg parsing to format help output properly

* handle -dc deprecation case in Build

* switch autopilot.max_trailing_logs to int

* remove duplicate test case

* remove unused methods

* remove comments about flag/config value inconsistencies

* switch got and want around since the error message was misleading.

* Removes a stray debug log.

* Removes a stray newline in imports.

* Fixes TestACL_Version8.

* Runs go fmt.

* Adds a default case for unknown address types.

* Reoders and reformats some imports.

* Adds some comments and fixes typos.

* Reorders imports.

* add unix socket support for dns later

* drop all deprecated flags and arguments

* fix wrong field name

* remove stray node-id file

* drop unnecessary patch section in test

* drop duplicate test

* add test for LeaveOnTerm and SkipLeaveOnInt in client mode

* drop "bla" and add clarifying comment for the test

* split up tests to support enterprise/non-enterprise tests

* drop raft multiplier and derive values during build phase

* sanitize runtime config reflectively and add test

* detect invalid config fields

* fix tests with invalid config fields

* use different values for wan sanitiziation test

* drop recursor in favor of recursors

* allow dns_config.udp_answer_limit to be zero

* make sure tests run on machines with multiple ips

* Fix failing tests in a few more places by providing a bind address in the test

* Gets rid of skipped TestAgent_CheckPerformanceSettings and adds case for builder.

* Add porter to server_test.go to make tests there less flaky

* go fmt
2017-09-25 11:40:42 -07:00
James Phillips 828be5771a
Revert "Manages segments list via a pointer."
This reverts commit c277a42504.
2017-09-07 16:37:11 -07:00
James Phillips c277a42504
Manages segments list via a pointer. 2017-09-07 16:21:07 -07:00
James Phillips aa5ef4a098
Populates the segment keyrings based on the LAN keyring. 2017-09-07 12:17:20 -07:00
James Phillips 1a117ba0a8
Makes the all segments query explict, and the default for `consul members`. 2017-09-05 12:22:20 -07:00
James Phillips 9258506dab Adds simple rate limiting for client agent RPC calls to Consul servers. (#3440)
* Added rate limiting for agent RPC calls.
* Initializes the rate limiter based on the config.
* Adds the rate limiter into the snapshot RPC path.
* Adds unit tests for the RPC rate limiter.
* Groups the RPC limit parameters under "limits" in the config.
* Adds some documentation about the RPC limiter.
* Sends a 429 response when the rate limiter kicks in.
* Adds docs for new telemetry.
* Makes snapshot telemetry look like RPC telemetry and cleans up comments.
2017-09-01 15:02:50 -07:00
Kyle Havlovitz 7e565d7338
Fix some inconsistencies with segment logic and comments 2017-08-30 17:43:46 -07:00
Kyle Havlovitz 16aaf27208
Default bind/advertise for segments to BindAddr/AdvertiseAddr 2017-08-30 12:51:10 -07:00
Kyle Havlovitz 2ada0439d4
Add rpc_listener option to segment config 2017-08-30 11:58:29 -07:00
James Phillips b1a15e0c3d
Adds open source side of network segments (feature is Enterprise-only). 2017-08-30 11:58:29 -07:00
Frank Schroeder 14ab5c7641 agent: support go-discover retry-join for wan 2017-08-23 21:23:34 +02:00
Frank Schröder a3934c263c acl: consolidate error handling (#3401)
The error handling of the ACL code relies on the presence of certain
magic error messages. Since the error values are sent via RPC between
older and newer consul agents we cannot just replace the magic values
with typed errors and switch to type checks since this would break
compatibility with older clients.

Therefore, this patch moves all magic ACL error messages into the acl
package and provides default error values and helper functions which
determine the type of error.
2017-08-23 16:52:48 +02:00
Frank Schroeder 16c58da27d agent: drop unused code
This code from http://github.com/hashicorp/consul/pull/3353 is no longer
required.
2017-08-22 00:02:46 +02:00
James Phillips f51d56c80c
Switches to using a read lock for the agent's RPC dispatcher.
This prevents RPC calls from getting serialized in this spot.

Fixes #3376
2017-08-09 18:51:55 -07:00
Frank Schroeder 1acff3533e
agent: move agent/consul/structs to agent/structs 2017-08-09 14:32:12 +02:00
Kyle Havlovitz cf02e3bc22 Merge pull request #3369 from hashicorp/metrics-enhancements
Add support for labels/filters from go-metrics
2017-08-08 13:55:30 -07:00
Kyle Havlovitz 0428e9fe9e
Update docs for metrics endpoint 2017-08-08 12:33:30 -07:00
Kyle Havlovitz d5634fe2a8
Add support for labels/filters from go-metrics 2017-08-08 01:45:10 -07:00
Preetha Appan 824fc4ee20
Unify regex used to identify invalid dns characters 2017-08-07 11:11:55 +02:00
Preetha Appan 37f75a393e
Use sanitized version of node name of server in NS record, and start with "server" rather than "ns" 2017-08-07 11:11:55 +02:00
Preetha Appan f9db387097
Add NS records and A records for each server. Constructs ns host names using the advertise address of the server. 2017-08-07 11:11:54 +02:00
James Phillips 4bee2e49f5 Adds secure introduction for the ACL replication token. (#3357)
Adds secure introduction for the ACL replication token, as well as a separate enable config for ACL replication.
2017-08-03 15:39:31 -07:00
James Phillips c0a5ad7903 Adds a new /v1/acl/bootstrap API (#3349) 2017-08-02 17:05:18 -07:00
Preetha Appan aa98aeb4b1 Moved handling advertise address to readConfig and out of the agent's constructor, plus unit test fixes 2017-07-27 22:06:31 -05:00
Preetha Appan 25acd1534a Move go-socketaddr template parsing into config package to make it happen before creating a new agent. Also removed redundant parsetemplate calls from agent.go. 2017-07-27 16:17:35 -05:00
James Phillips 496b0bcf07 Adds support for agent-side ACL token management via API instead of config files. (#3324)
* Adds token store and removes all runtime use of config for ACL tokens.
* Adds a new API for changing agent tokens on the fly.
2017-07-26 11:03:43 -07:00
James Phillips c413a9161e Removes an unnecessary close. 2017-07-24 21:41:18 -07:00
Preetha Appan f8b633c69e Removed redundant logging 2017-07-24 21:07:48 -05:00
Preetha Appan c26fd66edd Clean up temporary files on write errors, and ignore any temporary service files on load with a warning. This fixes #3207 2017-07-24 12:42:51 -05:00
James Phillips 1774fdc237
Tweaks the error when scripts are disabled.
This will hopefully help people self-serve if they upgrade without accounting
for this.
2017-07-19 22:15:04 -07:00
Frank Schroeder 83577e0daa agent: make docker client work on windows 2017-07-19 12:03:59 +02:00
preetapan fb43953894 Merge pull request #3296 from hashicorp/ensure_registration_race
Fix race condition between removing a service and adding a check for …
2017-07-18 18:36:47 -05:00
Preetha Appan e50f0e6722 Clean up any watch monitors associated with a failed AddCheck 2017-07-18 16:54:20 -05:00
Preetha Appan 6a257f242e Removed unit test, added clarifying comment and returned a friendlier error message similar to the one in agent's AddService method
Fixes #3297
2017-07-18 16:15:47 -05:00
Kyle Havlovitz 19eae3d14b
Add UpgradeVersionTag to autopilot config 2017-07-18 13:35:41 -07:00
Frank Schroeder 0d9b53730f agent: stop docker checks on shutdown 2017-07-18 20:59:24 +02:00
Frank Schroeder 60540c2417 agent: stop and remove docker checks
Note that there is no test since the correct way to solve (and test)
this is to replace the different maps with a single one or to hide
that functionality behind a separate data structure. This will be
addressed in #3294.

Fixes #3265
2017-07-18 20:59:24 +02:00
Frank Schroeder 2123700056
agent: replace docker check
This patch replaces the Docker client which is used
for health checks with a simplified version tailored
for that purpose.

See #3254
See #3257
Fixes #3270
2017-07-18 20:24:38 +02:00
James Phillips fff0f9698f Prevents disabling gossip keyring file from disabling gossip encryption. (#3278) 2017-07-17 12:48:45 -07:00
James Phillips 1791d99a10 Adds new config to make script checks opt-in, updates documentation. (#3284) 2017-07-17 11:20:35 -07:00
James Phillips 0881e46111 Cleans up version 8 ACLs in the agent and the docs. (#3248)
* Moves magic check and service constants into shared structs package.

* Removes the "consul" service from local state.

Since this service is added by the leader, it doesn't really make sense to
also keep it in local state (which requires special ACLs to configure), and
requires a bunch of special cases in the local state logic. This requires
fewer special cases and makes ACL bootstrapping cleaner.

* Makes coordinate update ACL log message a warning, similar to other AE warnings.

* Adds much more detailed examples for bootstrapping ACLs.

This can hopefully replace https://gist.github.com/slackpad/d89ce0e1cc0802c3c4f2d84932fa3234.
2017-07-13 22:33:47 -07:00
Frank Schroeder 7381a05d8d agent: do not modify agent config after NewAgent 2017-07-07 09:22:34 +02:00
Frank Schroeder 7f7c0ad65e agent: clone partial consul config
The agent configuration for the consul server is a partial configuration
which needs to be cloned to avoid data races.

This is a stop-gap measure before moving the configuration into
a separate package.
2017-07-07 09:22:34 +02:00
Frank Schroeder 0763788b82 agent: fix data race between consul server and local state 2017-07-07 09:22:34 +02:00
Preetha Appan 07db760d53 Fix missing formatting directive causing go vet to fail 2017-06-27 16:32:38 -05:00
James Phillips 4a3604a3ee
Removes some useless comments. 2017-06-25 10:32:35 -07:00
James Phillips 6977e40077 Fixes watch tracking during reloads and fixes address issue. (#3189)
This patch fixes watch registration through the config file and a broken log line when the watch registration fails. It also plumbs all the watch loading through a common function and tweaks the
unit test to create the watch before the reload.
2017-06-24 12:52:41 -07:00
James Phillips 380c8b957d Changes host-based node IDs from opt-out to opt-in. (#3187) 2017-06-24 09:36:53 -07:00
Frank Schröder 31a310f551 agent: notify systemd after JoinLAN (#2121)
This patch adds support for notifying systemd via the
NOTIFY_SOCKET by sending 'READY=1' to the socket after
a successful JoinLAN.

Fixes #2121
2017-06-21 06:43:55 +02:00
Frank Schroeder ea5b0f2c7c agent: fix 'consul leave' shutdown race (#2880)
When the agent is triggered to shutdown via an external 'consul leave'
command delivered via the HTTP API then the client expects to receive a
response when the agent is down. This creates a race on when to shutdown
the agent itself like the RPC server, the checks and the state and the
external endpoints like DNS and HTTP.

This patch splits the shutdown process into two parts:

 * shutdown the agent
 * shutdown the endpoints (http and dns)

They can be executed multiple times, concurrently and in any order but
should be executed first agent, then endpoints to provide consistent
behavior across all use cases. Both calls have to be executed for a
proper shutdown.

This could be partially hidden in a single function but would introduce
some magic that happens behind the scenes which one has to know of but
isn't obvious.

Fixes #2880
2017-06-21 05:52:51 +02:00
Frank Schroeder c4fc581e07 agent: make registerEndpoint private
This is only used for testing.
2017-06-21 05:42:39 +02:00
Frank Schroeder 2b41f2e3a3 agent: make the RPC endpoint overwrite mechanism more transparent
This patch hides the RPC handler overwrite mechanism from the
rest of the code so that it works in all cases and that there
is no cooperation required from the tested code, i.e. we can
drop a.getEndpoint().
2017-06-21 05:42:39 +02:00
Frank Schroeder c49a15d0f3 agent: move structs into consul/structs pkg
* CheckDefinition
 * ServiceDefinition
 * CheckType
2017-06-21 05:42:39 +02:00
Frank Schroeder 4273fb8444 agent: move NotifyGroup into the agent pkg 2017-06-21 05:42:39 +02:00
Frank Schroeder 80971c8a85 agent: move the SnapshotReplyFn out of the way
When splitting up the consul package into server and client
the SnapshotReplyFn needs to be in a separate package to avoid
a circular dependency.
2017-06-21 05:42:39 +02:00
Frank Schroeder 04b9392b00 agent: use the delegate interface for local state 2017-06-21 05:42:39 +02:00
Frank Schroeder d77d2be13e agent: rename clientServer interface to delegate 2017-06-21 05:42:39 +02:00
Frank Schroeder b083ce17c7
Revert "agent: fix 'consul leave' shutdown race (#2880)"
This reverts commit 90c83a32b5.
2017-06-19 21:34:08 +02:00
Frank Schroeder 90c83a32b5 agent: fix 'consul leave' shutdown race (#2880)
When the agent is triggered to shutdown via an external 'consul leave'
command delivered via the HTTP API then the client expects to receive a
response when the agent is down. This creates a race on when to shutdown
the agent itself like the RPC server, the checks and the state and the
external endpoints like DNS and HTTP. Ideally, the external endpoints
should be shutdown before the internal state but if the goal is to
respond reliably that the agent is down then this is not possible.

This patch splits the agent shutdown into two parts implemented in a
single method to keep it simple and unambiguos for the caller. The first
stage shuts down the internal state, checks, RPC server, ...
synchronously and then triggers the shutdown of the external endpoints
asychronously. This way the caller is guaranteed that the internal state
services are down when Shutdown returns and there remains enough time to
send a response.

Fixes #2880
2017-06-19 21:24:26 +02:00
Kyle Havlovitz 5d99ee80ca Add an option to disable keyring file (#3145)
Also disables keyring file in dev mode.
2017-06-15 15:24:04 -07:00
Frank Schroeder 1c75cf1af5 pkg refactor
command/agent/*                  -> agent/*
    command/consul/*                 -> agent/consul/*
    command/agent/command{,_test}.go -> command/agent{,_test}.go
    command/base/command.go          -> command/base.go
    command/base/*                   -> command/*
    commands.go                      -> command/commands.go

The script which did the refactor is:

(
	cd $GOPATH/src/github.com/hashicorp/consul
	git mv command/agent/command.go command/agent.go
	git mv command/agent/command_test.go command/agent_test.go
	git mv command/agent/flag_slice_value{,_test}.go command/
	git mv command/agent .
	git mv command/base/command.go command/base.go
	git mv command/base/config_util{,_test}.go command/
	git mv commands.go command/
	git mv consul agent
	rmdir command/base/

	gsed -i -e 's|package agent|package command|' command/agent{,_test}.go
	gsed -i -e 's|package agent|package command|' command/flag_slice_value{,_test}.go
	gsed -i -e 's|package base|package command|' command/base.go command/config_util{,_test}.go
	gsed -i -e 's|package main|package command|' command/commands.go

	gsed -i -e 's|base.Command|BaseCommand|' command/commands.go
	gsed -i -e 's|agent.Command|AgentCommand|' command/commands.go
	gsed -i -e 's|\tCommand:|\tBaseCommand:|' command/commands.go
	gsed -i -e 's|base\.||' command/commands.go
	gsed -i -e 's|command\.||' command/commands.go

	gsed -i -e 's|command|c|' main.go
	gsed -i -e 's|range Commands|range command.Commands|' main.go
	gsed -i -e 's|Commands: Commands|Commands: command.Commands|' main.go

	gsed -i -e 's|base\.BoolValue|BoolValue|' command/operator_autopilot_set.go
	gsed -i -e 's|base\.DurationValue|DurationValue|' command/operator_autopilot_set.go
	gsed -i -e 's|base\.StringValue|StringValue|' command/operator_autopilot_set.go
	gsed -i -e 's|base\.UintValue|UintValue|' command/operator_autopilot_set.go

	gsed -i -e 's|\bCommand\b|BaseCommand|' command/base.go
	gsed -i -e 's|BaseCommand Options|Command Options|' command/base.go
	gsed -i -e 's|base.Command|BaseCommand|' command/*.go
	gsed -i -e 's|c\.Command|c.BaseCommand|g' command/*.go
	gsed -i -e 's|\tCommand:|\tBaseCommand:|' command/*_test.go
	gsed -i -e 's|base\.||' command/*_test.go

	gsed -i -e 's|\bCommand\b|AgentCommand|' command/agent{,_test}.go
	gsed -i -e 's|cmd.AgentCommand|cmd.BaseCommand|' command/agent.go

	gsed -i -e 's|cli.AgentCommand = new(Command)|cli.Command = new(AgentCommand)|' command/agent_test.go
	gsed -i -e 's|exec.AgentCommand|exec.Command|' command/agent_test.go
	gsed -i -e 's|exec.BaseCommand|exec.Command|' command/agent_test.go
	gsed -i -e 's|NewTestAgent|agent.NewTestAgent|' command/agent_test.go
	gsed -i -e 's|= TestConfig|= agent.TestConfig|' command/agent_test.go
	gsed -i -e 's|: RetryJoin|: agent.RetryJoin|' command/agent_test.go

	gsed -i -e 's|\.\./\.\./|../|' command/config_util_test.go

	gsed -i -e 's|\bverifyUniqueListeners|VerifyUniqueListeners|' agent/config{,_test}.go command/agent.go
	gsed -i -e 's|\bserfLANKeyring\b|SerfLANKeyring|g' agent/{agent,keyring,testagent}.go command/agent.go
	gsed -i -e 's|\bserfWANKeyring\b|SerfWANKeyring|g' agent/{agent,keyring,testagent}.go command/agent.go
	gsed -i -e 's|\bNewAgent\b|agent.New|g' command/agent{,_test}.go
	gsed -i -e 's|\bNewAgent|New|' agent/{acl_test,agent,testagent}.go

	gsed -i -e 's|\bAgent\b|agent.&|g' command/agent{,_test}.go
	gsed -i -e 's|\bBool\b|agent.&|g' command/agent{,_test}.go
	gsed -i -e 's|\bConfig\b|agent.&|g' command/agent{,_test}.go
	gsed -i -e 's|\bDefaultConfig\b|agent.&|g' command/agent{,_test}.go
	gsed -i -e 's|\bDevConfig\b|agent.&|g' command/agent{,_test}.go
	gsed -i -e 's|\bMergeConfig\b|agent.&|g' command/agent{,_test}.go
	gsed -i -e 's|\bReadConfigPaths\b|agent.&|g' command/agent{,_test}.go
	gsed -i -e 's|\bParseMetaPair\b|agent.&|g' command/agent{,_test}.go
	gsed -i -e 's|\bSerfLANKeyring\b|agent.&|g' command/agent{,_test}.go
	gsed -i -e 's|\bSerfWANKeyring\b|agent.&|g' command/agent{,_test}.go

	gsed -i -e 's|circonus\.agent|circonus|g' command/agent{,_test}.go
	gsed -i -e 's|logger\.agent|logger|g' command/agent{,_test}.go
	gsed -i -e 's|metrics\.agent|metrics|g' command/agent{,_test}.go
	gsed -i -e 's|// agent.Agent|// agent|' command/agent{,_test}.go
	gsed -i -e 's|a\.agent\.Config|a.Config|' command/agent{,_test}.go

	gsed -i -e 's|agent\.AppendSliceValue|AppendSliceValue|' command/{configtest,validate}.go

	gsed -i -e 's|consul/consul|agent/consul|' GNUmakefile

	gsed -i -e 's|\.\./test|../../test|' agent/consul/server_test.go

	# fix imports
	f=$(grep -rl 'github.com/hashicorp/consul/command/agent' * | grep '\.go')
	gsed -i -e 's|github.com/hashicorp/consul/command/agent|github.com/hashicorp/consul/agent|' $f
	goimports -w $f

	f=$(grep -rl 'github.com/hashicorp/consul/consul' * | grep '\.go')
	gsed -i -e 's|github.com/hashicorp/consul/consul|github.com/hashicorp/consul/agent/consul|' $f
	goimports -w $f

	goimports -w command/*.go main.go
)
2017-06-10 18:52:45 +02:00