3692 Commits

Author SHA1 Message Date
Sean Chittenden bc62de541c Update comments to reflect reality 2016-03-23 22:10:50 -07:00
Sean Chittenden d2d55f4bb0 Remove additional cruft from ServerManager's channels
No longer needed code.
2016-03-23 22:10:50 -07:00
Sean Chittenden d13e3c18c9 Emulate a TryLock using atomic.CompareAndSwap
Prevent possible queueing behind serverConfigLock in the event that a server fails on a busy host.
2016-03-23 22:10:50 -07:00
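
The TryLock emulation named in commit d13e3c18c9 can be sketched as follows. This is a minimal illustration, not the code from that commit: the `tryLock` type and its method names are hypothetical, and only `atomic.CompareAndSwapInt32` and `atomic.StoreInt32` come from the Go standard library.

```go
package main

import "sync/atomic"

// tryLock is a hypothetical non-blocking guard built on a compare-and-swap flag.
type tryLock struct {
	held int32 // 0 = free, 1 = held
}

// TryAcquire returns true only if this caller flipped the flag from 0 to 1.
// A caller that loses the race gets false immediately instead of queueing.
func (l *tryLock) TryAcquire() bool {
	return atomic.CompareAndSwapInt32(&l.held, 0, 1)
}

// Release marks the guard as free again.
func (l *tryLock) Release() {
	atomic.StoreInt32(&l.held, 0)
}
```

A caller that gets `false` from `TryAcquire` can skip the work because another goroutine is already doing it, which is what avoids queueing behind a mutex.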
Sean Chittenden 295af01680 Make use of interfaces
Use an interface instead of serf.Serf as arg to NewServerManager.  Bonus points for improved testability.

Pointed out by: @slackpad
2016-03-23 22:10:50 -07:00
Sean Chittenden fdbb142c3f Simplify error handling
Rely on Serf for liveness.  In the event of a failure, simply cycle the server to the end of the list.  If the server is unhealthy, Serf will reap the dead server.

Additional simplifications:

*) Only rebalance servers based on timers, not when a new server is re-added to the cluster.
*) Back out the failure count in server_details.ServerDetails
2016-03-23 22:10:50 -07:00
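
A minimal sketch of the "cycle the server to the end of the list" step described in commit fdbb142c3f, assuming servers are represented as plain strings. `cycleFailedServer` is a hypothetical name, not Consul's API. It returns a new slice rather than mutating its input, so the caller's list is left untouched.

```go
package main

// cycleFailedServer returns a copy of servers with the first entry
// (assumed here to be the server that just failed) moved to the end.
// The input slice is never modified.
func cycleFailedServer(servers []string) []string {
	out := make([]string, 0, len(servers))
	if len(servers) == 0 {
		return out
	}
	out = append(out, servers[1:]...)
	return append(out, servers[0])
}
```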
Sean Chittenden c2c73bfeab Unbreak client tests by reverting to original test
Debugging code crept into the actual test and hung out for much longer than it should have.
2016-03-23 22:10:50 -07:00
Sean Chittenden 0c87463b7e Introduce asynchronous management of consul server lists
Instead of blocking the RPC call path and performing a potentially expensive calculation (including a call to `c.LANMembers()`), introduce a channel to request a rebalance.  Some events don't force a reshuffle; instead, they extend the duration of the current rebalance window because the environment thrashed enough to redistribute a client's load.
2016-03-23 22:10:50 -07:00
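
One way to request a rebalance over a channel without blocking the caller, as commit 0c87463b7e describes, is sketched below. The `manager` type, its field, and its method names are hypothetical and are not taken from the Consul source. The sketch assumes a buffered channel of size one so that repeated requests coalesce.

```go
package main

// manager is a hypothetical stand-in for a server manager.
type manager struct {
	rebalanceCh chan struct{} // holds at most one pending request
}

func newManager() *manager {
	return &manager{rebalanceCh: make(chan struct{}, 1)}
}

// RequestRebalance never blocks. It reports false when a request is
// already pending, in which case the new request is coalesced with it.
func (m *manager) RequestRebalance() bool {
	select {
	case m.rebalanceCh <- struct{}{}:
		return true
	default:
		return false
	}
}

// run performs rebalances in the background until stop is closed,
// keeping the potentially expensive work off the caller's path.
func (m *manager) run(stop <-chan struct{}, rebalance func()) {
	for {
		select {
		case <-m.rebalanceCh:
			rebalance()
		case <-stop:
			return
		}
	}
}
```

The caller on the RPC path only pays for a non-blocking channel send, and the goroutine running `run` does the actual work.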
Sean Chittenden bad6cb8897 Comment nits 2016-03-23 22:10:50 -07:00
Sean Chittenden bf8c860663 Update Serf to include `serf.NumNodes()` 2016-03-23 22:10:50 -07:00
Sean Chittenden 74bcbc63f8 Use saveServerConfig vs atomic.Value.Store(config) 2016-03-23 22:10:50 -07:00
Sean Chittenden 7f55931d02 Commit a handful of refactoring && copy/paste-o fixes 2016-03-23 22:10:50 -07:00
Sean Chittenden e53704b032 Mutate copies of serverCfg.servers, not original
Removing any ambiguity re: ownership of the mutated server lists is a win for maintenance and debugging.
2016-03-23 22:10:50 -07:00
Sean Chittenden e6c27325d9 rebalanceTimer may be nil during initialization
When first starting the server manager, it's possible that the rebalanceTimer in serverConfig will be nil, test accordingly.
2016-03-23 22:10:50 -07:00
Sean Chittenden a7091b0837 Properly retain a pointer to the rebalanceTimer 2016-03-23 22:10:50 -07:00
Sean Chittenden 00ff8e5307 Cosmetic and various other wordsmithing cleanups 2016-03-23 22:10:50 -07:00
Sean Chittenden b4db49a62e Document the various functions and their locking 2016-03-23 22:10:50 -07:00
Sean Chittenden 9eb6481d73 Use config convenience method to get config
'cause ELETTHECOMPILERSDOTHEWORK.  I don't need that cluttering up the subconscious with more complexity.
2016-03-23 22:10:50 -07:00
Sean Chittenden 579e536f58 Move consul.serverConfig out of the consul package
Relocated to its own package, server_manager.  This now greatly simplifies the RPC() call path and appropriately hides the locking behind the package boundary.  More work needs to be done here.
2016-03-23 22:10:50 -07:00
Sean Chittenden c7c551dbe0 Rename serverConfigMtx to serverConfigLock
Pointed out by: @slackpad
2016-03-23 22:10:50 -07:00
Sean Chittenden e48b910f87 Refactor out the management of Consul servers
Move the management of c.consulServers (fka c.consuls) into consul/server_manager.go.

This commit brings in a background task that proactively manages the server list and:

*) reshuffles the list
*) manages the timer out of the RPC() path
*) uses atomics to detect a server has failed

This is a WIP; more testing work needs to be completed.
2016-03-23 22:10:50 -07:00
Sean Chittenden 01b637114c Move consul.serverConfig out of the consul package
Relocated to its own package, server_manager.  This now greatly simplifies the RPC() call path and appropriately hides the locking behind the package boundary.  More work needs to be done here.
2016-03-23 22:10:50 -07:00
Sean Chittenden 117c65dc55 Rename serverConfigMtx to serverConfigLock
Pointed out by: @slackpad
2016-03-23 22:10:32 -07:00
Sean Chittenden 0eac826573 Refactor out the management of Consul servers
Move the management of c.consulServers (fka c.consuls) into consul/server_manager.go.

This commit brings in a background task that proactively manages the server list and:

*) reshuffles the list
*) manages the timer out of the RPC() path
*) uses atomics to detect a server has failed

This is a WIP; more testing work needs to be completed.
2016-03-23 22:09:46 -07:00
Sean Chittenden b9e5588620 Move consul.serverConfig out of the consul package
Relocated to its own package, server_manager.  This now greatly simplifies the RPC() call path and appropriately hides the locking behind the package boundary.  More work needs to be done here.
2016-03-23 22:05:29 -07:00
Sean Chittenden a482eaef70 Rename serverConfigMtx to serverConfigLock
Pointed out by: @slackpad
2016-03-23 22:05:05 -07:00
Sean Chittenden 9b8767aa67 Refactor out the management of Consul servers
Move the management of c.consulServers (fka c.consuls) into consul/server_manager.go.

This commit brings in a background task that proactively manages the server list and:

*) reshuffles the list
*) manages the timer out of the RPC() path
*) uses atomics to detect a server has failed

This is a WIP; more testing work needs to be completed.
2016-03-23 22:03:20 -07:00
Sean Chittenden 075d1b628f Commit miss re: consuls variable rename 2016-03-23 16:24:29 -07:00
Sean Chittenden 2ca4cc58ce Move consul.serverConfig out of the consul package
Relocated to its own package, server_manager.  This now greatly simplifies the RPC() call path and appropriately hides the locking behind the package boundary.  More work needs to be done here.
2016-03-23 16:16:22 -07:00
Sean Chittenden 0925b26250 Refactor consul.serverParts into server_details.ServerDetails
This may be short-lived, but it also seems like this is going to lead us down a path where ServerDetails is going to evolve into a more powerful package that will encapsulate more behavior behind a coherent API.
2016-03-23 16:15:47 -07:00
Sean Chittenden 5be956c310 Rename serverConfigMtx to serverConfigLock
Pointed out by: @slackpad
2016-03-23 16:15:47 -07:00
Sean Chittenden b1e392405c Handle the case where there are no healthy servers
Pointed out by: @slackpad
2016-03-23 16:15:47 -07:00
Sean Chittenden d4ca349e21 Refactor out the management of Consul servers
Move the management of c.consulServers (fka c.consuls) into consul/server_manager.go.

This commit brings in a background task that proactively manages the server list and:

*) reshuffles the list
*) manages the timer out of the RPC() path
*) uses atomics to detect a server has failed

This is a WIP; more testing work needs to be completed.
2016-03-23 16:15:47 -07:00
Sean Chittenden 7b308d8d7e Add a flag to denote that a server is disabled
A server is not normally disabled, but in the event of an RPC error, we want to mark a server as down to allow for fast failover to a different server.  This value must be an int in order to support atomic operations.

Additionally, this is the preliminary work required to bring up a server in a disabled state.  RPC health checks in the future could mark the server as alive, thereby creating an organic "slow start" feature for Consul.
2016-03-23 16:14:59 -07:00
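
The integer-backed disabled flag from commit 7b308d8d7e could look roughly like the sketch below. The struct and method names are hypothetical. The flag is an `int32` because the `sync/atomic` functions used here operate on fixed-width integers, not on `bool`.

```go
package main

import "sync/atomic"

// serverDetails is a hypothetical stand-in for a per-server record.
type serverDetails struct {
	Name     string
	disabled int32 // 0 = enabled, 1 = disabled; accessed only via sync/atomic
}

// MarkDisabled flags the server as down, e.g. after an RPC error.
func (s *serverDetails) MarkDisabled() {
	atomic.StoreInt32(&s.disabled, 1)
}

// MarkEnabled clears the flag, e.g. after a successful health check.
func (s *serverDetails) MarkEnabled() {
	atomic.StoreInt32(&s.disabled, 0)
}

// IsDisabled reads the flag without taking a lock.
func (s *serverDetails) IsDisabled() bool {
	return atomic.LoadInt32(&s.disabled) == 1
}
```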
Sean Chittenden 6af781d9d5 Rename `lastServer` to `preferredServer`
Expanding the domain of lastServer beyond RPC() changes the meaning of this variable.  Rename accordingly to match the intent coming in a subsequent commit: a background thread will be in charge of rotating preferredServer.
2016-03-23 16:14:59 -07:00
Sean Chittenden fb0bfcc3cf Introduce GOTEST_FLAGS to conditionally add -v to go test
Trivial change that makes it possible for developers to set an environment variable and change the output of `go test` to be detailed (i.e. `GOTEST_FLAGS=-v`).
2016-03-23 16:14:11 -07:00
Sean Chittenden f6ffbf4e96 Warn if serf events have queued up past 80% of the limit
It is theoretically possible that the number of queued serf events can back up.  If this happens, emit a warning message if there are more than 200 events in queue.

Most notably, this can happen if `c.consulServerLock` is held for an "extended period of time".  The probability of anyone ever seeing this log message is hopefully low to nonexistent, but if it happens, the warning message indicating a large number of serf events fired while a lock was held is likely to be helpful (vs serf mysteriously blocking when attempting to add an event to a channel).
2016-03-23 16:14:11 -07:00
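
The 80% check from commit f6ffbf4e96 can be illustrated with Go's built-in `len` and `cap` on a buffered channel. This is a sketch only: `queueNearFull` is a hypothetical helper, and the `string` element type stands in for whatever event type the real channel carries.

```go
package main

// queueNearFull reports whether more than 80% of the channel's buffer
// is currently occupied. Integer arithmetic avoids floating-point rounding:
// len/cap > 4/5 is equivalent to len*5 > cap*4.
func queueNearFull(ch chan string) bool {
	return len(ch)*5 > cap(ch)*4
}
```

The value returned by `len` on a channel is only a snapshot under concurrent use, which is acceptable for a warning log but not for synchronization.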
Sean Chittenden 54016f5276 Commit miss re: consuls variable rename 2016-03-23 16:13:49 -07:00
Sean Chittenden f9aa968bf4 Remove lastRPCTime
This mechanism isn't going to provide much value in the future.  Preemptively reduce the complexity of future work.
2016-03-23 16:13:49 -07:00
Sean Chittenden cc86eb0a1a Rename c.consuls to c.consulServers
Prep for breaking out maintenance of consuls into a new goroutine.
2016-03-23 16:10:27 -07:00
Sean Chittenden a92cda7bcd Fix whitespace alignment in a comment 2016-03-23 16:00:39 -07:00
Sean Chittenden 146c5b0a59 Use `rand.Int31n()` to get power of two optimization
In cases where i+1 is a power of two, skip one modulo operation.
2016-03-23 16:00:39 -07:00
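
For context on commit 146c5b0a59, a shuffle that draws its random index from `Int31n` might look like the sketch below. `shuffleServers` and its seed parameter are hypothetical and exist only to make the example self-contained. The power-of-two shortcut lives inside the standard library's `Int31n`, not in this code.

```go
package main

import "math/rand"

// shuffleServers performs an in-place Fisher-Yates shuffle.
// Each step picks j uniformly from [0, i] via Int31n(i+1).
func shuffleServers(servers []string, seed int64) {
	r := rand.New(rand.NewSource(seed))
	for i := len(servers) - 1; i > 0; i-- {
		j := r.Int31n(int32(i + 1))
		servers[i], servers[j] = servers[j], servers[i]
	}
}
```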
James Phillips 77ad084229 Fixes JSON in wildcard query example. 2016-03-23 14:33:20 -07:00
James Phillips d3da5efdb2 Merge pull request #1865 from hashicorp/f-upgrade-boltdb
Updates BoltDB to v1.2.0 release.
2016-03-22 08:53:04 -07:00
James Phillips 351778eabb Updates BoltDB to v1.2.0 release. 2016-03-22 08:36:46 -07:00
James Phillips 77eb95ddd8 Merge pull request #1861 from hashicorp/b-flaky-test
Widens coordinate update sleeps in unit tests.
2016-03-21 18:24:05 -07:00
James Phillips cd7b3d4b49 Widens coordinate update sleeps in unit tests. 2016-03-21 18:23:11 -07:00
James Phillips 520d3eb375 Merge pull request #1854 from talonx/master
Added help text for -dev option #1804
2016-03-21 18:03:34 -07:00
James Phillips 90df1182ec Merge pull request #1860 from hashicorp/b-flaky-sort
Gets rid of flaky sort check.
2016-03-21 17:31:17 -07:00
James Phillips 8a9ad3811b Gets rid of flaky sort check.
If we get a coordinate then this test will fail, so we only check the
first item in the list, which is deterministic.
2016-03-21 17:30:05 -07:00
James Phillips 8211c34924 Merge pull request #1859 from hashicorp/b-flaky-coord-tests
Increases timeouts for coordinate tests.
2016-03-21 17:10:01 -07:00