consul

Commit Graph

Author	SHA1	Message	Date
Wojciech Bederski	b014c0f91b	make Pause()/Resume()/isPaused() behave more like a semaphore see: https://github.com/hashicorp/consul/issues/1173 #1173 Reasoning: somewhere during consul development Pause()/Resume() and PauseSync()/ResumeSync() were added to protect larger changes to agent's localState. A few of the places that it tries to protect are: - (a Agent) AddService(...) # part of the method - (c Command) handleReload(...) # almost the whole method - (l *localState) antiEntropy(...) # isPaused() prevents syncChanges() The main problem is that in the middle of handleReload(...)'s critical section it indirectly (via loadServices()) calls AddService(...). AddService() in turn calls Pause() to protect itself against syncChanges(). At the end of AddService() a deferred call to Resume() is made. With the current implementation, this releases the isPaused() "lock" in the middle of handleReload(), allowing antiEntropy to kick in while the configuration reload is still in progress. Specifically, almost all services and probably all checks are unloaded when syncChanges() is allowed to run. This in turn can cause massive service/check de-/re-registration, and since checks are by default registered in the critical state, a majority of services on a node can be marked as failing. It's made worse with automation, often calling `consul reload` in close proximity on many nodes in the cluster. This change basically turns Pause()/Resume() into P()/V() of a garden-variety semaphore, allowing Pause() to be called multiple times and releasing isPaused() only after all matching/deferred Resume() calls are made as well. TODO/NOTE: as with many semaphore implementations, it might be reasonable to panic() if l.paused ever becomes negative.	2015-09-11 18:28:06 +02:00
Wojciech Bederski	24bc17eaa1	failing test showing that nested Pause()/Resume() release too early see: #1173 / https://github.com/hashicorp/consul/issues/1173	2015-09-11 17:52:57 +02:00
James Phillips	27c59f7b30	Adds missing token to maint unit test.	2015-09-10 14:53:00 -07:00
Ryan Uber	1908c16f53	Merge pull request #1230 from hashicorp/f-maintfix Respect tokens in maintenance mode	2015-09-10 12:30:07 -07:00
Ryan Uber	039938a7e0	agent: testing node/service maintenance using tokens	2015-09-10 12:08:08 -07:00
Ryan Uber	125d7fd4ee	agent: thread tokens through for maintenance mode	2015-09-10 11:43:59 -07:00
Wim	0bc4d9322e	Allow AAAA queries for nodeLookup	2015-09-08 16:54:36 +02:00
Ryan Breen	446c640c17	Merge pull request #1217 from 42wim/fix-rfc2308-part3 No NXDOMAIN when the answer is empty	2015-09-04 10:42:38 -04:00
Armon Dadgar	5cb6ab625e	Merge pull request #1214 from zendesk/fix_lock_race_2 lock.go: fix another race condition	2015-09-02 16:04:55 -07:00
Wim	2701bb5cc2	No NXDOMAIN when the answer is empty	2015-09-02 16:12:22 +02:00
Ryan Breen	80d26f9156	Merge pull request #1167 from railsguru/master Add -http-port option to change the HTTP API port	2015-09-02 01:15:55 -04:00
Armon Dadgar	52a8a95af9	agent: Always enable the UI endpoints	2015-09-01 18:28:32 -07:00
Michael S. Fischer	43ab372a18	lock.go: fix another race condition The previous fix to `consul lock` (commit `6875e8d`) didn't completely eliminate the race that could occur if the lock was acquired around the same time SIGTERM was received: It was still possible for Run() to spawn the process via startChild() after killChild() had released the shared mutex. Now, when SIGTERM is received, we acquire a mutex that prevents spawning a new process and never release it. We've tested this fix pretty thoroughly and believe it completely resolves the issue.	2015-09-01 14:27:23 -07:00
Wim	4a1dc90cba	Limit the DNS responses after getting the NodeRecords	2015-09-01 23:23:05 +02:00
Ryan Breen	f41b79eff2	Merge pull request #1195 from 42wim/fix-rfc2308-part2 Return SOA/NXDOMAIN when the answer is empty	2015-09-01 17:08:31 -04:00
Wim	369982270d	Return SOA/not found when the answer is empty	2015-09-01 22:28:12 +02:00
James Phillips	26ce9d16be	Merge pull request #1200 from ryotarai/lock-pass-stdin command/lock: Pass stdin to child process when -pass-stdin passed.	2015-08-31 21:14:45 -07:00
Ryan Uber	11e4cfd72b	agent: reload SCADA client if endpoint changes	2015-08-27 13:29:07 -07:00
Ryan Uber	c468acf222	command: atlas endpoint can be passed	2015-08-27 11:11:05 -07:00
Ryan Uber	1cc2429364	agent: atlas_endpoint is configurable	2015-08-27 11:08:01 -07:00
Ryota Arai	33a6cde7dd	command/lock: Pass stdin to child process when -pass-stdin passed.	2015-08-26 16:27:21 +09:00
Ryan Uber	5ad8bfbd41	agent: log a message when making a new scada connection	2015-08-25 21:03:16 -07:00
Ryan Uber	4b715a7d2c	agent: don't reload scada client if there is no config change	2015-08-25 20:43:57 -07:00
Ryan Uber	ed70720d55	agent: testing scada client creation in command	2015-08-25 20:22:22 -07:00
Ryan Uber	52a7206ff3	agent: test scada HTTP server creation	2015-08-25 18:51:04 -07:00
Ryan Uber	eb8974160f	agent: clean up scada connection manager	2015-08-25 18:27:07 -07:00
Ryan Uber	87c1e4fcd3	agent: document the scada http creation func	2015-08-25 17:19:11 -07:00
Ryan Uber	2e6ccded2c	agent: scada client and HTTP server are tracked separately	2015-08-25 16:59:53 -07:00
Andy Lo-A-Foe	85321301e1	Remove duplicate code	2015-08-20 20:46:20 +02:00
Andy Lo-A-Foe	3e046d3efc	Use Ports.HTTP directly	2015-08-20 20:27:20 +02:00
Andy Lo-A-Foe	4e2c3373bc	Add documentation for http-port option	2015-08-20 20:19:35 +02:00
Ryan Uber	134db62937	Merge pull request #1166 from hashicorp/f-dns-log Log network address of DNS clients	2015-08-13 18:32:32 -07:00
Ryan Uber	05216d3cc4	agent: log network address of DNS clients	2015-08-11 10:33:27 -07:00
Andy Lo-A-Foe	7b5da2a240	Add -http-port option to change the HTTP API port This is useful when pushing consul to PaaS like Cloud Foundry making the HTTP API easily routable.	2015-08-11 14:14:21 +02:00
Armon Dadgar	066e772536	Merge pull request #1158 from mfischer-zd/fix_1155 lock.go: fix race condition	2015-08-05 14:56:13 -07:00
Michael S. Fischer	6875e8d6b4	lock.go: fix race condition Fix a race condition between startChild() and killChild() that could lead to an orphaned managed process. Fixes #1155	2015-08-05 09:06:51 -07:00
J.R. Garcia	4cb6f3e943	Remove trailing slash from lock Lock command will remove trailing slash from path (as it is invalid). Fixes #1136.	2015-07-30 12:14:17 -05:00
Ryan Breen	018fd69aa2	Merge pull request #1143 from hashicorp/GH-1142 Check NXDOMAIN after filtering nodes	2015-07-29 18:56:08 -04:00
Ryan Breen	0a7dc85076	Test for GH-1142.	2015-07-29 18:21:16 -04:00
Armon Dadgar	0363d4b54b	Merge pull request #1137 from 42wim/fix-1124 Recurse when PTR answer is empty	2015-07-29 14:39:04 -07:00
Ryan Breen	42648438a0	Check NXDOMAIN after filtering nodes Move the check for NXDOMAIN below the service health filter.	2015-07-29 17:16:48 -04:00
Ryan Uber	93c9c87f7a	Merge pull request #1141 from hashicorp/f-travis Try moving to newer Travis-CI infrastructure	2015-07-28 10:42:56 -07:00
Ryan Uber	40f3e3fae7	travis-ci: skip syslog tests for container-based travis infra	2015-07-28 09:58:43 -07:00
Wim	5647a37ffe	Recurse when PTR answer is empty	2015-07-27 23:22:36 +02:00
Armon Dadgar	4a9b91f2a2	Merge pull request #1130 from pdf/check_socket Add Socket check type	2015-07-27 14:21:24 -07:00
Ryan Uber	a6317f2fb2	Merge pull request #1090 from hashicorp/f-keyring-acl Keyring ACLs	2015-07-24 10:23:18 -07:00
Peter Fern	b023904298	Add TCP check type Adds the ability to simply check whether a TCP socket accepts connections to determine if it is healthy. This is a light-weight - though less comprehensive than scripting - method of checking network service health. The check parameter `tcp` should be set to the `address:port` combination for the service to be tested. Supports both IPv6 and IPv4; in the case of a hostname that resolves to both, connections will be attempted via both protocol versions, with the first successful connection returning a successful check result. Example check: ```json { "check": { "id": "ssh", "name": "SSH (TCP)", "tcp": "example.com:22", "interval": "10s" } } ```	2015-07-24 14:06:05 +10:00
Ryan Uber	7aa8539c10	agent: disable ACLs for RPC client tests	2015-07-23 17:09:33 -07:00
Armon Dadgar	981c62ccba	command/lock: Check for shutdown during lock acquisition. Fixes #800	2015-07-22 16:07:44 -07:00
Benjamin Abbott-Scott	f877b9ecc4	Return every time lock acquisition fails	2015-07-22 10:44:47 -07:00
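
Commit b014c0f91b describes Pause()/Resume() as a counting semaphore, so that a nested Pause()/Resume() pair does not release the pause held by an outer caller. The Go sketch below illustrates only that counting behaviour. It is not the code from the commit: `pauseCounter`, `addService`, and `reload` are hypothetical stand-ins for localState, AddService(), and handleReload(), and the panic on a negative counter is the optional idea from the commit's TODO, not something the commit states it implements.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// pauseCounter is a hypothetical stand-in for the agent's localState.
// Only the pause counter is modelled here.
type pauseCounter struct {
	paused int32
}

// Pause increments the counter (the semaphore's P operation).
func (p *pauseCounter) Pause() {
	atomic.AddInt32(&p.paused, 1)
}

// Resume decrements the counter (the semaphore's V operation).
// Panicking on a negative value follows the TODO in the commit message
// and is an assumption of this sketch.
func (p *pauseCounter) Resume() {
	if atomic.AddInt32(&p.paused, -1) < 0 {
		panic("Resume() called without a matching Pause()")
	}
}

// isPaused reports true while at least one Pause() is still outstanding.
func (p *pauseCounter) isPaused() bool {
	return atomic.LoadInt32(&p.paused) > 0
}

// addService mimics the inner critical section: Pause, then deferred Resume.
func addService(p *pauseCounter) {
	p.Pause()
	defer p.Resume()
	// ... registration work would happen here ...
}

// reload mimics the outer critical section. It returns whether the state
// is still paused right after the nested addService call has returned.
func reload(p *pauseCounter) bool {
	p.Pause()
	defer p.Resume()
	addService(p)
	return p.isPaused()
}

func main() {
	p := &pauseCounter{}
	fmt.Println("paused after nested call:", reload(p))
	fmt.Println("paused after reload:", p.isPaused())
}
```

With a boolean flag instead of a counter, the nested Resume() would clear the flag while the outer section is still running, which is the early release the commit message describes.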


782 Commits