689 Commits

Author SHA1 Message Date
Wojciech Bederski
b014c0f91b make Pause()/Resume()/isPaused() behave more like a semaphore
see: https://github.com/hashicorp/consul/issues/1173 #1173

Reasoning: somewhere during consul development Pause()/Resume() and
PauseSync()/ResumeSync() were added to protect larger changes to
agent's localState.  A few of the places that it tries to protect are:

- (a *Agent) AddService(...)      # part of the method
- (c *Command) handleReload(...)  # almost the whole method
- (l *localState) antiEntropy(...)# isPaused() prevents syncChanges()

The main problem is, that in the middle of handleReload(...)'s
critical section it indirectly (loadServices()) calls  AddService(...).
AddService() in turn calls Pause() to protect itself against
syncChanges(). At the end of AddService() a defered call to Resume() is
made.

With the current implementation, this releases
isPaused() "lock" in the middle of handleReload() allowing antiEntropy
to kick in while configuration reload is still in progress.
Specifically almost all services and probably all check are unloaded
when syncChanges() is allowed to run.

This in turn can causes massive service/check de-/re-registration,
and since checks are by default registered in the critical state,
a majority of services on a node can be marked as failing.
It's made worse with automation, often calling `consul reload` in close
proximity on many nodes in the cluster.

This change basically turns Pause()/Resume() into P()/V() of
a garden-variety semaphore. Allowing Pause() to be called multiple times,
and releasing isPaused() only after all matching/defered Resumes() are
called as well.

TODO/NOTE: as with many semaphore implementations, it might be reasonable
to panic() if l.paused ever becomes negative.
2015-09-11 18:28:06 +02:00
Wojciech Bederski
24bc17eaa1 failing test showing that nested Pause()/Resume() release too early
see: #1173 / https://github.com/hashicorp/consul/issues/1173
2015-09-11 17:52:57 +02:00
Ryan Uber
1908c16f53 Merge pull request #1230 from hashicorp/f-maintfix
Respect tokens in maintenance mode
2015-09-10 12:30:07 -07:00
Ryan Uber
039938a7e0 agent: testing node/service maintenance using tokens 2015-09-10 12:08:08 -07:00
Ryan Uber
125d7fd4ee agent: thread tokens through for maintenance mode 2015-09-10 11:43:59 -07:00
Wim
0bc4d9322e Allow AAAA queries for nodeLookup 2015-09-08 16:54:36 +02:00
Wim
2701bb5cc2 No NXDOMAIN when the answer is empty 2015-09-02 16:12:22 +02:00
Ryan Breen
80d26f9156 Merge pull request #1167 from railsguru/master
Add -http-port option to change the HTTP API port
2015-09-02 01:15:55 -04:00
Armon Dadgar
52a8a95af9 agent: Always enable the UI endpoints 2015-09-01 18:28:32 -07:00
Wim
4a1dc90cba Limit the DNS responses after getting the NodeRecords 2015-09-01 23:23:05 +02:00
Ryan Breen
f41b79eff2 Merge pull request #1195 from 42wim/fix-rfc2308-part2
Return SOA/NXDOMAIN when the answer is empty
2015-09-01 17:08:31 -04:00
Wim
369982270d Return SOA/not found when the answer is empty 2015-09-01 22:28:12 +02:00
Ryan Uber
11e4cfd72b agent: reload SCADA client if endpoint changes 2015-08-27 13:29:07 -07:00
Ryan Uber
c468acf222 command: atlas endpoint can be passed 2015-08-27 11:11:05 -07:00
Ryan Uber
1cc2429364 agent: atlas_endpoint is configurable 2015-08-27 11:08:01 -07:00
Ryan Uber
5ad8bfbd41 agent: log a message when making a new scada connection 2015-08-25 21:03:16 -07:00
Ryan Uber
4b715a7d2c agent: don't reload scada client if there is no config change 2015-08-25 20:43:57 -07:00
Ryan Uber
ed70720d55 agent: testing scada client creation in command 2015-08-25 20:22:22 -07:00
Ryan Uber
52a7206ff3 agent: test scada HTTP server creation 2015-08-25 18:51:04 -07:00
Ryan Uber
eb8974160f agent: clean up scada connection manager 2015-08-25 18:27:07 -07:00
Ryan Uber
87c1e4fcd3 agent: document the scada http creation func 2015-08-25 17:19:11 -07:00
Ryan Uber
2e6ccded2c agent: scada client and HTTP server are tracked separately 2015-08-25 16:59:53 -07:00
Andy Lo-A-Foe
85321301e1 Remove duplicate code 2015-08-20 20:46:20 +02:00
Andy Lo-A-Foe
3e046d3efc Use Ports.HTTP directly 2015-08-20 20:27:20 +02:00
Andy Lo-A-Foe
4e2c3373bc Add documentation for http-port option 2015-08-20 20:19:35 +02:00
Ryan Uber
05216d3cc4 agent: log network address of DNS clients 2015-08-11 10:33:27 -07:00
Andy Lo-A-Foe
7b5da2a240 Add -http-port option to change the HTTP API port
This is useful when pushing consul to PaaS like
Cloud Foundry making the HTTP API easily routable.
2015-08-11 14:14:21 +02:00
Ryan Breen
018fd69aa2 Merge pull request #1143 from hashicorp/GH-1142
Check NXDOMAIN after filtering nodes
2015-07-29 18:56:08 -04:00
Ryan Breen
0a7dc85076 Test for GH-1142. 2015-07-29 18:21:16 -04:00
Armon Dadgar
0363d4b54b Merge pull request #1137 from 42wim/fix-1124
Recurse when PTR answer is empty
2015-07-29 14:39:04 -07:00
Ryan Breen
42648438a0 Check NXDOMAIN after filtering nodes
Move the check for NXDOMAIN below the service health filter.
2015-07-29 17:16:48 -04:00
Ryan Uber
93c9c87f7a Merge pull request #1141 from hashicorp/f-travis
Try moving to newer Travis-CI infrastructure
2015-07-28 10:42:56 -07:00
Ryan Uber
40f3e3fae7 travis-ci: skip syslog tests for container-based travis infra 2015-07-28 09:58:43 -07:00
Wim
5647a37ffe Recurse when PTR answer is empty 2015-07-27 23:22:36 +02:00
Armon Dadgar
4a9b91f2a2 Merge pull request #1130 from pdf/check_socket
Add Socket check type
2015-07-27 14:21:24 -07:00
Ryan Uber
a6317f2fb2 Merge pull request #1090 from hashicorp/f-keyring-acl
Keyring ACLs
2015-07-24 10:23:18 -07:00
Peter Fern
b023904298 Add TCP check type
Adds the ability to simply check whether a TCP socket accepts
connections to determine if it is healthy.  This is a light-weight -
though less comprehensive than scripting - method of checking network
service health.

The check parameter `tcp` should be set to the `address:port`
combination for the service to be tested.  Supports both IPv6 and IPv4,
in the case of a hostname that resolves to both, connections will be
attempted via both protocol versions, with the first successful
connection returning a successful check result.

Example check:

```json
{
  "check": {
    "id": "ssh",
    "name": "SSH (TCP)",
    "tcp": "example.com:22",
    "interval": "10s"
  }
}
```
2015-07-24 14:06:05 +10:00
Ryan Uber
7aa8539c10 agent: disable ACLs for RPC client tests 2015-07-23 17:09:33 -07:00
Ryan Uber
1bbdf3b03b agent: vet fixes 2015-07-14 11:42:51 -07:00
Ryan Uber
5682b715c4 Merge pull request #995 from 42wim/rfc2308-soa-ttl
Send SOA with negative responses (RFC2308)
2015-07-13 08:49:25 -07:00
Ryan Uber
79ac4f3512 agent: testing keyring ACLs 2015-07-07 15:14:06 -06:00
Ryan Uber
5c65bc7df2 agent: write-level keyring ACLs work 2015-07-07 10:36:51 -06:00
Ryan Uber
bffc0861cc agent: read-level keyring ACLs work 2015-07-07 10:30:34 -06:00
Ryan Uber
e37b5ecb69 Merge pull request #1046 from hashicorp/f-event-acl
Event ACLs
2015-07-02 07:02:07 -07:00
Ryan Uber
d0348d1291 Merge pull request #1004 from i0rek/advertise_addrs
Add advertise_addrs.
2015-06-23 12:32:07 -07:00
Hans Hasselberg
267e0caf81 Implement advertise_addrs for SerfLan, SerfWan and RPC.
Fixes #550.
This will make it possible to configure the advertised adresses for
SerfLan, SerfWan and RPC. It will enable multiple consul clients on a
single host which is very useful in a container environment.

This option might override advertise_addr and advertise_addr_wan
depending on the configuration.

It will be configureable with advertise_addrs. Example:

{
  "advertise_addrs": {
    "serf_lan": "10.0.120.91:4424",
    "serf_wan": "201.20.10.61:4423",
    "rpc": "10.20.10.61:4424"
  }
}
2015-06-23 21:23:45 +02:00
Ryan Uber
5bde81bcdc agent: avoid masking errors when ACLs deny a request 2015-06-18 18:13:29 -07:00
Ryan Uber
beb27fb3ef agent: testing user event endpoint ACLs 2015-06-18 18:13:29 -07:00
Ryan Uber
6e084f6897 consul: always fire events from server nodes 2015-06-18 18:13:29 -07:00
Ryan Uber
6f309c355f agent: enforce event policy during event fire 2015-06-18 18:13:29 -07:00