Commit Graph

2834 Commits

Author SHA1 Message Date
James Phillips 0b05dbeb21 Merge pull request #1235 from wuub/master
fix conflict between handleReload and antiEntropy critical sections
2015-09-17 07:28:39 -07:00
Wojciech Bederski c4537ed26f panic when unbalanced localState.Resume() is detected 2015-09-17 11:32:08 +02:00
Ryan Breen 1282a6ff95 Merge pull request #1242 from dwijnand/fix-typos
Fix a bunch of typos.
2015-09-15 08:41:22 -04:00
Dale Wijnand 5a28ebcaa3 Fix a bunch of typos. 2015-09-15 13:22:08 +01:00
James Phillips e029702e39 Merge pull request #1240 from hashicorp/sethvargo/link_warnings
Fix link warnings
2015-09-14 10:51:59 -07:00
Seth Vargo 7b4dcad487 Fix link warnings 2015-09-14 18:48:51 +01:00
R.B. Boyer 8b072467eb Correct the Session.Renew{,Periodic} to handle session expiration better 2015-09-14 08:52:32 -05:00
James Phillips 2f9ebdb135 Merge pull request #1187 from sfncook/enable_tag_drift_03
Enable tag drift 03
2015-09-11 15:35:32 -07:00
Shawn Cook 1f330add02 Doc changes in response to review. 2015-09-11 15:26:30 -07:00
Shawn Cook 598526eba2 Docs - add verbiage to anti-entropy page. 2015-09-11 14:27:54 -07:00
Ryan Breen 0722c1521a Merge pull request #1236 from scalp42/typos
remove various typos
2015-09-11 15:47:32 -04:00
Anthony Scalisi 10e028d599 remove various typos 2015-09-11 12:29:54 -07:00
Shawn Cook 4caf049c4c Update documentation for service definition 2015-09-11 09:32:54 -07:00
Ryan Breen e347f20790 Merge pull request #1234 from jovandeginste/quote-variable
Add quotes to locations in case pwd contains spaces
2015-09-11 12:31:47 -04:00
Wojciech Bederski b014c0f91b make Pause()/Resume()/isPaused() behave more like a semaphore
see: https://github.com/hashicorp/consul/issues/1173 #1173

Reasoning: somewhere during consul development Pause()/Resume() and
PauseSync()/ResumeSync() were added to protect larger changes to
agent's localState.  A few of the places that it tries to protect are:

- (a *Agent) AddService(...)      # part of the method
- (c *Command) handleReload(...)  # almost the whole method
- (l *localState) antiEntropy(...)# isPaused() prevents syncChanges()

The main problem is that in the middle of handleReload(...)'s
critical section it indirectly (via loadServices()) calls AddService(...).
AddService() in turn calls Pause() to protect itself against
syncChanges(). At the end of AddService() a deferred call to Resume() is
made.

With the current implementation, this releases the
isPaused() "lock" in the middle of handleReload(), allowing antiEntropy
to kick in while the configuration reload is still in progress.
Specifically, almost all services and probably all checks are unloaded
when syncChanges() is allowed to run.

This in turn can cause massive service/check de-/re-registration,
and since checks are by default registered in the critical state,
a majority of services on a node can be marked as failing.
It's made worse by automation, which often calls `consul reload` in close
proximity on many nodes in the cluster.

This change basically turns Pause()/Resume() into the P()/V() of
a garden-variety semaphore, allowing Pause() to be called multiple times
and releasing isPaused() only after all matching/deferred Resume() calls
have been made as well.

TODO/NOTE: as with many semaphore implementations, it might be reasonable
to panic() if l.paused ever becomes negative.
2015-09-11 18:28:06 +02:00
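The counting behaviour described in commit b014c0f91b can be sketched roughly as follows. This is a minimal illustration, not the actual consul code: the `pauser` type, its `paused` field layout, and the `addService` helper are hypothetical stand-ins for localState and AddService(). It assumes an atomic counter, and it includes the panic on a negative count that the commit's TODO only suggests.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// pauser is a hypothetical stand-in for the agent's localState.
// paused counts outstanding Pause() calls, like a semaphore counter.
type pauser struct {
	paused int32
}

// Pause increments the counter; it may be called multiple times (nested).
func (p *pauser) Pause() {
	atomic.AddInt32(&p.paused, 1)
}

// Resume decrements the counter and panics if calls are unbalanced.
func (p *pauser) Resume() {
	if atomic.AddInt32(&p.paused, -1) < 0 {
		panic("unbalanced Resume() detected")
	}
}

// isPaused reports true until every Pause() has a matching Resume().
func (p *pauser) isPaused() bool {
	return atomic.LoadInt32(&p.paused) > 0
}

// addService mimics a nested critical section: it pauses, defers the
// matching Resume(), and reports whether syncing was paused meanwhile.
func addService(p *pauser) bool {
	p.Pause()
	defer p.Resume()
	return p.isPaused()
}

func main() {
	p := &pauser{}
	p.Pause()                 // outer critical section, e.g. a reload
	addService(p)             // nested Pause()/Resume() pair
	fmt.Println(p.isPaused()) // the outer Pause() is still in effect
	p.Resume()
	fmt.Println(p.isPaused()) // all pauses released
}
```

With a plain boolean flag, the deferred Resume() inside the nested call would clear the pause while the outer section is still running; with a counter, the pause only ends at the outermost Resume().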
Jo Vandeginste a626ae1892 Add quotes to locations in case pwd contains spaces 2015-09-11 18:19:22 +02:00
Wojciech Bederski 24bc17eaa1 failing test showing that nested Pause()/Resume() release too early
see: #1173 / https://github.com/hashicorp/consul/issues/1173
2015-09-11 17:52:57 +02:00
Shawn Cook 66fd8fb2a0 Rename EnableTagOverride and update formatting 2015-09-11 08:35:29 -07:00
Shawn Cook d7ce0b3c6b Remove debug lines 2015-09-11 08:32:59 -07:00
James Phillips f5d7397a2a Merge pull request #1233 from hashicorp/b-maint-test
Adds missing token to maint unit test.
2015-09-10 15:07:44 -07:00
Shawn Cook 0b3faf6e4a Merge remote-tracking branch 'hashicorp/master' into enable_tag_drift_03 2015-09-10 14:55:30 -07:00
James Phillips 27c59f7b30 Adds missing token to maint unit test. 2015-09-10 14:53:00 -07:00
Shawn Cook 35f276f25d Add test cases TestAgentAntiEntropy_EnableTagDrift 2015-09-10 14:08:16 -07:00
Ryan Uber 897f8a3ed5 Update CHANGELOG.md 2015-09-10 12:31:13 -07:00
Ryan Uber 1908c16f53 Merge pull request #1230 from hashicorp/f-maintfix
Respect tokens in maintenance mode
2015-09-10 12:30:07 -07:00
Ryan Uber 039938a7e0 agent: testing node/service maintenance using tokens 2015-09-10 12:08:08 -07:00
Ryan Uber 125d7fd4ee agent: thread tokens through for maintenance mode 2015-09-10 11:43:59 -07:00
Ryan Breen 34f98f7bdf Merge pull request #1222 from 42wim/node-aaaa-queries
Allow AAAA queries for nodeLookup
2015-09-08 11:01:49 -04:00
Wim 0bc4d9322e Allow AAAA queries for nodeLookup 2015-09-08 16:54:36 +02:00
Ryan Breen 446c640c17 Merge pull request #1217 from 42wim/fix-rfc2308-part3
No NXDOMAIN when the answer is empty
2015-09-04 10:42:38 -04:00
Ryan Breen 9d244051fc Merge pull request #1218 from hashicorp/b-typo
Fixes a typo in the telemetry docs.
2015-09-03 10:08:39 -04:00
James Phillips 1387aba91b Fixes a typo in the telemetry docs. 2015-09-02 21:37:31 -07:00
Armon Dadgar 5cb6ab625e Merge pull request #1214 from zendesk/fix_lock_race_2
lock.go: fix another race condition
2015-09-02 16:04:55 -07:00
James Phillips 69fc672938 Merge pull request #1216 from hashicorp/sethvargo/update_middleman
Update Middleman
2015-09-02 09:11:10 -07:00
Seth Vargo 5da996067e Update Middleman 2015-09-02 10:14:06 -04:00
Wim 2701bb5cc2 No NXDOMAIN when the answer is empty 2015-09-02 16:12:22 +02:00
Ryan Breen 80d26f9156 Merge pull request #1167 from railsguru/master
Add -http-port option to change the HTTP API port
2015-09-02 01:15:55 -04:00
Ryan Uber 4e664da433 Merge pull request #1215 from hashicorp/f-ui-endpoint
agent: Always enable the UI endpoints
2015-09-01 21:31:47 -07:00
Andy Lo-A-Foe bb5422af14 Position it alphabetically 2015-09-02 06:28:55 +02:00
Andy Lo-A-Foe 00b906774b Update agent options section on the website 2015-09-02 05:36:09 +02:00
Armon Dadgar 52a8a95af9 agent: Always enable the UI endpoints 2015-09-01 18:28:32 -07:00
Ryan Breen 1e5aa54ca3 Merge pull request #1194 from 42wim/fix-maxServiceResponses
Limit the DNS responses after getting the NodeRecords (fixes 0 A/AAAA responses)
2015-09-01 17:41:39 -04:00
Michael S. Fischer 43ab372a18 lock.go: fix another race condition
The previous fix to `consul lock` (commit 6875e8d) didn't completely
eliminate the race that could occur if the lock was acquired around the
same time SIGTERM was received: it was still possible for
Run() to spawn the process via startChild() after killChild() had
released the shared mutex.

Now, when SIGTERM is received, we acquire a mutex that prevents
spawning a new process and never release it.

We've tested this fix pretty thoroughly and believe it completely
resolves the issue.
2015-09-01 14:27:23 -07:00
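One way to picture the approach in commit 43ab372a18 is the sketch below. It is an assumption-laden illustration rather than the code in lock.go: the `runner` type, the `spawnLock` field, the `spawned` counter, and the use of `sync.Mutex.TryLock` (Go 1.18+) so that the spawner gives up instead of blocking are all hypothetical choices made for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// runner is a hypothetical stand-in for the `consul lock` command state.
type runner struct {
	spawnLock sync.Mutex // held forever once shutdown has begun
	spawned   int        // number of child processes "started"
}

// startChild spawns a child only if shutdown has not taken spawnLock.
// It returns false when spawning is no longer allowed.
func (r *runner) startChild() bool {
	if !r.spawnLock.TryLock() {
		return false // shutdown owns the lock; never spawn again
	}
	defer r.spawnLock.Unlock()
	r.spawned++ // placeholder for actually starting the process
	return true
}

// shutdown models the SIGTERM handler: it takes spawnLock and
// deliberately never releases it, so no later spawn can succeed.
func (r *runner) shutdown() {
	r.spawnLock.Lock()
	// Killing any running child would happen here; no Unlock() follows.
}

func main() {
	r := &runner{}
	fmt.Println(r.startChild()) // spawning is allowed before shutdown
	r.shutdown()
	fmt.Println(r.startChild()) // spawning is refused after shutdown
}
```

Because shutdown() blocks until any in-progress startChild() has finished, a child is either fully started before the kill step or never started at all.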
Wim 4a1dc90cba Limit the DNS responses after getting the NodeRecords 2015-09-01 23:23:05 +02:00
Ryan Breen f41b79eff2 Merge pull request #1195 from 42wim/fix-rfc2308-part2
Return SOA/NXDOMAIN when the answer is empty
2015-09-01 17:08:31 -04:00
Ryan Breen ae128ef30f Merge pull request #1211 from kikitux/master
add consul-do to community tools
2015-09-01 16:57:35 -04:00
Alvaro Miranda 54c9fd8403 Update downloads_tools.html.erb 2015-09-02 08:50:57 +12:00
Wim 369982270d Return SOA/not found when the answer is empty 2015-09-01 22:28:12 +02:00
Ryan Breen f3d6fef82b Merge pull request #1213 from mainframe/nodefabric-patch-1
Adding NodeFabric reference to Community Tools
2015-09-01 16:17:39 -04:00
Andres Toomsalu 3e46d8a7fe Adding NodeFabric reference to Community Tools 2015-09-01 23:09:34 +03:00