2759 Commits

Author SHA1 Message Date
Wojciech Bederski
b014c0f91b make Pause()/Resume()/isPaused() behave more like a semaphore
see: https://github.com/hashicorp/consul/issues/1173 #1173

Reasoning: somewhere during consul development Pause()/Resume() and
PauseSync()/ResumeSync() were added to protect larger changes to
agent's localState.  A few of the places that it tries to protect are:

- (a *Agent) AddService(...)      # part of the method
- (c *Command) handleReload(...)  # almost the whole method
- (l *localState) antiEntropy(...)# isPaused() prevents syncChanges()

The main problem is that, in the middle of handleReload(...)'s
critical section, it indirectly (via loadServices()) calls AddService(...).
AddService() in turn calls Pause() to protect itself against
syncChanges(). At the end of AddService() a deferred call to Resume() is
made.

With the current implementation, this releases
the isPaused() "lock" in the middle of handleReload(), allowing antiEntropy
to kick in while the configuration reload is still in progress.
Specifically, almost all services and probably all checks are unloaded
when syncChanges() is allowed to run.

This in turn can cause massive service/check de-/re-registration,
and since checks are by default registered in the critical state,
a majority of services on a node can be marked as failing.
It's made worse by automation, which often calls `consul reload` in close
proximity on many nodes in the cluster.

This change basically turns Pause()/Resume() into P()/V() of
a garden-variety semaphore, allowing Pause() to be called multiple times
and releasing isPaused() only after all matching/deferred Resume() calls
have been made as well.

TODO/NOTE: as with many semaphore implementations, it might be reasonable
to panic() if l.paused ever becomes negative.
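
A minimal sketch of the counting behaviour described above, not the actual
patch: the pauseCounter type and its int32 field are hypothetical stand-ins
for localState and l.paused, and the panic on a negative count is the
optional idea from the TODO, assumed here for illustration.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// pauseCounter is a hypothetical stand-in for the agent's localState.
// paused counts outstanding Pause() calls instead of storing a boolean.
type pauseCounter struct {
	paused int32
}

// Pause increments the counter; it may be called multiple times (nested).
func (p *pauseCounter) Pause() {
	atomic.AddInt32(&p.paused, 1)
}

// Resume decrements the counter. Panicking on a negative value is an
// assumption taken from the TODO above, not confirmed behaviour.
func (p *pauseCounter) Resume() {
	if atomic.AddInt32(&p.paused, -1) < 0 {
		panic("Resume() called without a matching Pause()")
	}
}

// isPaused reports true until every Pause() has been matched by a Resume().
func (p *pauseCounter) isPaused() bool {
	return atomic.LoadInt32(&p.paused) > 0
}

func main() {
	var p pauseCounter
	p.Pause()                 // outer caller, e.g. a reload
	p.Pause()                 // nested caller, e.g. adding a service
	p.Resume()                // nested caller finishes
	fmt.Println(p.isPaused()) // still paused: the outer Pause() is outstanding
	p.Resume()                // outer caller finishes
	fmt.Println(p.isPaused()) // no longer paused
}
```

With a plain boolean flag, the first Resume() in main would already clear
the paused state, which is the early release this commit describes.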
2015-09-11 18:28:06 +02:00
Wojciech Bederski
24bc17eaa1 failing test showing that nested Pause()/Resume() release too early
see: #1173 / https://github.com/hashicorp/consul/issues/1173
2015-09-11 17:52:57 +02:00
James Phillips
f5d7397a2a Merge pull request #1233 from hashicorp/b-maint-test
Adds missing token to maint unit test.
2015-09-10 15:07:44 -07:00
James Phillips
27c59f7b30 Adds missing token to maint unit test. 2015-09-10 14:53:00 -07:00
Ryan Uber
897f8a3ed5 Update CHANGELOG.md 2015-09-10 12:31:13 -07:00
Ryan Uber
1908c16f53 Merge pull request #1230 from hashicorp/f-maintfix
Respect tokens in maintenance mode
2015-09-10 12:30:07 -07:00
Ryan Uber
039938a7e0 agent: testing node/service maintenance using tokens 2015-09-10 12:08:08 -07:00
Ryan Uber
125d7fd4ee agent: thread tokens through for maintenance mode 2015-09-10 11:43:59 -07:00
Ryan Breen
34f98f7bdf Merge pull request #1222 from 42wim/node-aaaa-queries
Allow AAAA queries for nodeLookup
2015-09-08 11:01:49 -04:00
Wim
0bc4d9322e Allow AAAA queries for nodeLookup 2015-09-08 16:54:36 +02:00
Ryan Breen
446c640c17 Merge pull request #1217 from 42wim/fix-rfc2308-part3
No NXDOMAIN when the answer is empty
2015-09-04 10:42:38 -04:00
Ryan Breen
9d244051fc Merge pull request #1218 from hashicorp/b-typo
Fixes a typo in the telemetry docs.
2015-09-03 10:08:39 -04:00
James Phillips
1387aba91b Fixes a typo in the telemetry docs. 2015-09-02 21:37:31 -07:00
Armon Dadgar
5cb6ab625e Merge pull request #1214 from zendesk/fix_lock_race_2
lock.go: fix another race condition
2015-09-02 16:04:55 -07:00
James Phillips
69fc672938 Merge pull request #1216 from hashicorp/sethvargo/update_middleman
Update Middleman
2015-09-02 09:11:10 -07:00
Seth Vargo
5da996067e Update Middleman 2015-09-02 10:14:06 -04:00
Wim
2701bb5cc2 No NXDOMAIN when the answer is empty 2015-09-02 16:12:22 +02:00
Ryan Breen
80d26f9156 Merge pull request #1167 from railsguru/master
Add -http-port option to change the HTTP API port
2015-09-02 01:15:55 -04:00
Ryan Uber
4e664da433 Merge pull request #1215 from hashicorp/f-ui-endpoint
agent: Always enable the UI endpoints
2015-09-01 21:31:47 -07:00
Andy Lo-A-Foe
bb5422af14 Position it alphabetically 2015-09-02 06:28:55 +02:00
Andy Lo-A-Foe
00b906774b Update agent options section on the website 2015-09-02 05:36:09 +02:00
Armon Dadgar
52a8a95af9 agent: Always enable the UI endpoints 2015-09-01 18:28:32 -07:00
Ryan Breen
1e5aa54ca3 Merge pull request #1194 from 42wim/fix-maxServiceResponses
Limit the DNS responses after getting the NodeRecords (fixes 0 A/AAAA responses)
2015-09-01 17:41:39 -04:00
Michael S. Fischer
43ab372a18 lock.go: fix another race condition
The previous fix to `consul lock` (commit 6875e8d) didn't completely
eliminate the race that could occur if the lock was acquired around the
same time SIGTERM was received:  It was still possible for
Run() to spawn the process via startChild() after killChild() had
released the shared mutex.

Now, when SIGTERM is received, we acquire a mutex that prevents
spawning a new process and never release it.

We've tested this fix pretty thoroughly and believe it completely
resolves the issue.
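
A rough sketch of the idea, not the code in lock.go: the childGuard type
and its shutdown/trySpawn methods are hypothetical names. The sketch uses
sync.Mutex.TryLock (Go 1.18+) so the example returns instead of blocking;
the commit itself only says the mutex is acquired and never released.

```go
package main

import (
	"fmt"
	"sync"
)

// childGuard is a hypothetical helper: holding mu means "spawning is
// in progress or permanently forbidden".
type childGuard struct {
	mu sync.Mutex
}

// shutdown models the SIGTERM path: take the mutex and never release it,
// so no child can be spawned afterwards.
func (g *childGuard) shutdown() {
	g.mu.Lock()
}

// trySpawn runs spawn only if the mutex is free and reports whether it ran.
// TryLock is an assumption made to keep this example non-blocking.
func (g *childGuard) trySpawn(spawn func()) bool {
	if !g.mu.TryLock() {
		return false
	}
	defer g.mu.Unlock()
	spawn()
	return true
}

func main() {
	var g childGuard
	started := 0

	fmt.Println(g.trySpawn(func() { started++ })) // spawn allowed before shutdown
	g.shutdown()                                  // SIGTERM received
	fmt.Println(g.trySpawn(func() { started++ })) // spawn refused after shutdown
	fmt.Println(started)                          // only the first spawn ran
}
```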
2015-09-01 14:27:23 -07:00
Wim
4a1dc90cba Limit the DNS responses after getting the NodeRecords 2015-09-01 23:23:05 +02:00
Ryan Breen
f41b79eff2 Merge pull request #1195 from 42wim/fix-rfc2308-part2
Return SOA/NXDOMAIN when the answer is empty
2015-09-01 17:08:31 -04:00
Ryan Breen
ae128ef30f Merge pull request #1211 from kikitux/master
add consul-do to community tools
2015-09-01 16:57:35 -04:00
Alvaro Miranda
54c9fd8403 Update downloads_tools.html.erb 2015-09-02 08:50:57 +12:00
Wim
369982270d Return SOA/not found when the answer is empty 2015-09-01 22:28:12 +02:00
Ryan Breen
f3d6fef82b Merge pull request #1213 from mainframe/nodefabric-patch-1
Adding NodeFabric reference to Community Tools
2015-09-01 16:17:39 -04:00
Andres Toomsalu
3e46d8a7fe Adding NodeFabric reference to Community Tools 2015-09-01 23:09:34 +03:00
Alvaro Miranda
13b9ff6330 add consul-do to community tools
adding consul-do: do something based on leadership status

https://github.com/zeroXten/consul-do

From README.md

Useful for running cronjobs in HA mode.

Run something like this on two or more servers:

* * * * * /usr/bin/consul-do JOB-1 $(/bin/hostname) && /path/to/job1
*/10 * * * * /usr/bin/consul-do JOB-2 $(/bin/hostname) && /path/to/job2
Only one of the servers will be elected leader and will therefore run the job. Should the leader fail, a follower will take over.
2015-09-02 00:52:20 +12:00
James Phillips
26ce9d16be Merge pull request #1200 from ryotarai/lock-pass-stdin
command/lock: Pass stdin to child process when -pass-stdin passed.
2015-08-31 21:14:45 -07:00
James Phillips
4746c0f1e5 Removes incorrect protocol version in change log. 2015-08-31 21:11:50 -07:00
Ryota Arai
b2755d026e website: description of -pass-stdin option 2015-09-01 11:00:26 +09:00
James Phillips
3a3e12a4ea Merge pull request #1209 from hashicorp/f-downgrade
Bumps protocol version back down as we've made memberlist smarter.
2015-08-31 14:52:27 -07:00
James Phillips
af7d2cb596 Bumps protocol version back down as we've made memberlist smarter. 2015-08-31 11:16:34 -07:00
Ryan Breen
8e8526de8f Cleanup for guides/forwarding.html 2015-08-30 12:01:49 -04:00
Ryan Breen
4b57e74bf8 Merge pull request #1204 from tamsky/docs/forwarding-dnsmasq-example
add a dnsmasq example, explain the utility of 'recursors'
2015-08-30 01:17:21 -04:00
Marc Tamsky
0db9346ecc Explain 'recursors' behavior with an example. 2015-08-28 18:27:26 -07:00
Marc Tamsky
b71a51e277 add dnsmasq example, add pointer to 'recursors' 2015-08-28 18:10:37 -07:00
Ryan Uber
4adc0b5c66 website: document precedence of Atlas endpoint inputs 2015-08-27 17:54:56 -07:00
Ryan Uber
6ac89c28a8 Update CHANGELOG.md 2015-08-27 13:35:24 -07:00
Ryan Uber
11e4cfd72b agent: reload SCADA client if endpoint changes 2015-08-27 13:29:07 -07:00
Ryan Uber
3b3fb7f8c9 Merge pull request #1201 from hashicorp/f-atlas-endpoint
Configurable Atlas endpoint
2015-08-27 12:05:36 -07:00
Ryan Uber
b8e82eee1c website: document atlas endpoint config 2015-08-27 11:31:29 -07:00
Ryan Uber
692e9078cb website: rebundle 2015-08-27 11:29:47 -07:00
Ryan Uber
c468acf222 command: atlas endpoint can be passed 2015-08-27 11:11:05 -07:00
Ryan Uber
1cc2429364 agent: atlas_endpoint is configurable 2015-08-27 11:08:01 -07:00
Ryan Uber
b0fcb6c234 Merge pull request #1199 from hashicorp/f-scada-reload
SCADA client is reload-able
2015-08-26 11:46:50 -07:00