Commit Graph

1195 Commits

Author SHA1 Message Date
Ryan Uber 0c2ad07fa9 consul: use source parameter for near prepared queries 2016-06-30 12:11:20 -07:00
Ryan Uber d567d6a6d8 consul: send origin node + dc when executing prepared queries 2016-06-21 15:34:26 -07:00
Ryan Uber 03fea4b091 consul: test baked-in distance sort 2016-06-21 12:54:18 -07:00
Ryan Uber 4c1afb1bc6 consul: use the Near field instead of PreferLocal 2016-06-21 12:39:40 -07:00
James Phillips 09cfda47ed Merge pull request #2127 from hashicorp/b-remote-consuls-locking
Ensure locking of `Server`'s `remoteConsuls`.
2016-06-21 10:00:04 -07:00
James Phillips aa1bb5a012 Merge pull request #2131 from hashicorp/b-misc-microoptimizations
Misc micro optimizations
2016-06-21 09:59:01 -07:00
Sean Chittenden 3c197bad30
Ensure locking of `Server`'s `remoteConsuls`. 2016-06-20 22:59:49 -07:00
Sean Chittenden b3df5d3a87
Misc comment improvements 2016-06-20 15:29:38 -07:00
Sean Chittenden 65edc0a374
Initialize a non-empty number of Consul Datacenters. No functional change. 2016-06-20 15:26:59 -07:00
Sean Chittenden 223f605b1e
Prefer rand.Int31n() over rand.Int31(). 2016-06-20 15:26:27 -07:00
Sean Chittenden c3e54b79fd
Fix deadlock in Consul RTT.
- consul/rtt.go:388: s.getDatacentersByDistance().  Acquires RLock()
- consul/rtt.go:341: sortDatacentersByDistance() RLock still held.
- consul/rtt.go:282: getDatacenterDistance() RLock still held.
- consul/rtt.go:268: getNodesForDatacenter(). Attempts to reacquire RLock(), hangs indefinitely.
2016-06-20 14:59:54 -07:00
Ryan Uber 100a46727f consul: test raw PreferLocal functionality 2016-06-20 14:53:13 -07:00
Ryan Uber 3797e6544c consul: support PreferLocal in PQ's 2016-06-20 14:24:40 -07:00
Sean Chittenden e9a2f5b40c
Chase casting types.CheckID to a string into the state_store.
It turns out the indexer can only use strings as arguments when
creating a query.  Cast `types.CheckID` to a `string` before calling
into `memdb`.

Ideally the indexer would be smart enough to do this at compile-time,
but I need to look into how to do this without reflection and the
runtime package.  For the time being statically cast `types.CheckID`
to a `string` at the call sites.
2016-06-07 16:59:02 -04:00
Sean Chittenden 63adcbd5ef
Revert "Move `structs.CheckID` to a new top-level package, `types`."
This reverts commit 2bbd52e3b44ff1b60939a8400264d534662d6d51.
2016-06-07 16:59:02 -04:00
Sean Chittenden cbb945e76a
Move `structs.CheckID` to a new top-level package, `types`.
Per discussion w/ @slackpad, move this type to its own top-level package
2016-06-07 16:59:02 -04:00
Sean Chittenden f5ab25163e
Move `structs.CheckID` to a new top-level package, `types`.
Per discussion w/ @slackpad, move this type to its own top-level package
2016-06-07 16:59:02 -04:00
Sean Chittenden ddbe64a8c8
Float a type balloon. Some strings are square pegs in round holes.
This experiment was brought about because of variable naming
confusion where name and checkIDs were interchanged.  Gave CheckID
an Qualified Type Name and chased downstream changes.
2016-06-07 16:59:02 -04:00
James Phillips 0f5aabcbbd Merge pull request #2028 from hashicorp/f-atomic-kv
Adds support for atomic transactions spanning multiple KV entries.
2016-05-15 13:46:05 -07:00
Sean Chittenden 02b5b53c92
Remove unused peers variable from setupRaft(). 2016-05-15 06:40:46 -07:00
James Phillips 778b975e7a Adds a get-tree verb to KV transaction operations. 2016-05-13 16:57:39 -07:00
James Phillips 4bbaf1cd15 Switches GETs to a filtering model for ACLs. 2016-05-13 15:58:55 -07:00
James Phillips fbfb90a694 Removes null results for deletes, and preps for more than one result from an operation. 2016-05-13 01:47:55 -07:00
James Phillips a37bf9de56 Adds a read-only optimized path for transactions. 2016-05-13 00:34:05 -07:00
James Phillips 9443c6ba71 Adds a comment for the txnKVS() function. 2016-05-12 16:11:26 -07:00
James Phillips 04d99cd702 Makes get fail a transaction if the key doesn't exist. 2016-05-11 14:18:31 -07:00
James Phillips 4882a9fe43 De-nests the KV output structure (removes DirEnt member). 2016-05-11 13:48:03 -07:00
James Phillips 960b9d6fb6 Switches to "KV" instead of "KV" for the KV operations. 2016-05-11 10:58:27 -07:00
James Phillips 38d0f6676f Refactors TxnRequest/TxnResponse into a form that will allow non-KV ops.
This isn't needed/used yet, but it's a good hook to get in there so we
can add more atomic operations in the future. The Go API hides this detail
so that feels like a KV-specific API. The implications on the REST API are
pretty minimal.
2016-05-11 01:39:10 -07:00
James Phillips 69f58ad04a Moves txn code into a new endpoint, not specific to KV. 2016-05-10 21:58:02 -07:00
James Phillips 23545f97fe Fixes some go vet findings in a unit test. 2016-05-10 20:01:52 -07:00
Sean Chittenden fd63a81706
Remove stray type definition
Noticed while working on Nomad Client's server selection code.
2016-05-10 18:56:28 -07:00
James Phillips fcb0c20867 Adds internal endpoint read ACL support and full unit tests. 2016-05-10 11:23:47 -07:00
James Phillips 2f51926852 Adds an empty get test case. 2016-05-09 22:18:26 -07:00
James Phillips e491245062 Performs basic plumbing of KVS transactions through all the layers. 2016-05-09 22:15:49 -07:00
James Phillips b7ae973642 Adds state store support for atomic KVS ops. 2016-05-05 15:46:59 -07:00
James Phillips 6c2aeb25ab Splits existing KVS operations into *Txn helpers for later reuse. 2016-05-04 14:20:11 -07:00
James Phillips 9edca28203 Moves KVS-related state store code out into its own set of files. 2016-05-02 16:21:04 -07:00
Sean Chittenden 6cf21fbbe9 Add the list of Raft peers to Consul's Stats
```
% consul info
[snip]
raft:
[snip]
	raft_peers = 127.0.0.1:8300
[snip]
```

Poached from: Nomad Project
2016-04-28 15:08:48 -07:00
James Phillips 8c8b146f77 Merge pull request #1884 from mtchavez/1541-data-dir-perms
command: Data directory permission error message
2016-04-12 22:06:49 -07:00
James Phillips 0e5aa5d808 Merge pull request #1895 from shoenig/fixtypo
doc: fix trivial typo s/NewFSMPath/NewFSM/
2016-04-12 21:53:24 -07:00
James Phillips ed86e5cc72 Adds a clone method to HealthCheck and uses that in local.go. 2016-04-11 00:05:39 -07:00
Chavez 473908f636 Add description to rpc test client pool member failure message 2016-04-01 19:17:38 -07:00
Seth Hoenig c3637e6ed7 doc: fix trivial typo s/NewFSMPath/NewFSM/ 2016-03-29 20:52:17 -05:00
Sean Chittenden 7cdc056ca5 Rename server_details package to agent 2016-03-29 17:39:19 -07:00
Sean Chittenden 80f9f54bb3 Add a quick package doc for the servers package 2016-03-29 16:22:53 -07:00
Sean Chittenden 5501feecbe Rename serverConfig to serverList
serverList is a vastly more accurate name.  Chase accordingly.  No functional change other than types and APIs.
2016-03-29 16:17:16 -07:00
Sean Chittenden 6185f931d8 Gratuitous rename 1/2
Reduce cognative load and perform an overdue rename.  No functional change.

Rename the `server_manager` package to `servers`.  Rename the `ServerManager` package to `Manager`.  In `client`, rename `serverMgr` to `servers`.
2016-03-29 16:12:00 -07:00
Sean Chittenden 3666c87d2f Remove two unused constants 2016-03-29 11:11:41 -07:00
Sean Chittenden 299d058d54 Remove useless comment residual from decomposing functions 2016-03-29 10:53:00 -07:00
Sean Chittenden ec25ad84a6 EDYSLEXICMOMENT 2016-03-29 10:50:10 -07:00
Sean Chittenden b684b15d94 Refactor out recocileServerList anon function
Add testing to reconcileServerList and test various server sizes.

Test that a percentage of nodes fail their Ping (50% in testing atm)
2016-03-29 02:45:38 -07:00
Sean Chittenden c56f039884 Teach fauxConnPool to fail a pct of the time
50% failure rate seems legit as a starting point w/ 100 servers.
2016-03-28 14:53:29 -07:00
Sean Chittenden 43aed6376a Call NotifyFailedServers to rotate the server list 2016-03-28 14:12:41 -07:00
Sean Chittenden 6648d31579 Add log line re: server manager backing off and sleeping
This is useful in situations where the RPC rotate duration is greater than 1µs.  WTB exponential backoff of logging so we don't spam forever.
2016-03-28 14:04:04 -07:00
Sean Chittenden 544a263a62 Remove old debugging lines of questionable future value 2016-03-28 14:02:53 -07:00
Sean Chittenden e5b5078415 Shuffle in place
Don't create a copy and save the copy, not necessary any more.
2016-03-28 14:02:27 -07:00
Sean Chittenden a81eb95acc Nuke unnecessary comment
See above function comments for details
2016-03-28 13:57:36 -07:00
Sean Chittenden e237904d80 Move FIXME comment to the right call site 2016-03-28 13:49:55 -07:00
Sean Chittenden 3dfd9e02b7 Rename the ConnPoolPinger interface to Pinger 2016-03-28 13:46:01 -07:00
Sean Chittenden c9afc16d96 Return error from PingConsulServer
In order to report why a Ping failed, change the signature of PingConsulServers to include an error message.
2016-03-28 13:38:58 -07:00
Sean Chittenden 28dc6451d9 Change the definition of the ServerDetails struct key
Use only the serf Name for now.  Leaving the plumbing for now.
2016-03-28 12:53:19 -07:00
Sean Chittenden 62c38ecb39 Correct the comment to match reality 2016-03-28 12:32:30 -07:00
Sean Chittenden 8ac35d84f0 Rename serverCfg to sc for consistency 2016-03-28 12:06:26 -07:00
Sean Chittenden 664ab238fd Add a quick length check
Verify that AddServer behaved as expected
2016-03-28 11:38:12 -07:00
Sean Chittenden 4589c36308 Switch the order of ServerDetails.String()
It's more natrual to have the network first.  I think I flipped the order accidentally.
2016-03-28 11:37:25 -07:00
Sean Chittenden e8a2e56370 Move rebalance log statement from INFO to DEBUG 2016-03-27 01:32:04 -07:00
Sean Chittenden 060dd49a95 Chase the API bump re: refreshServerRebalanceTimer
If it works in prod, why shouldn't it work in the tests?
2016-03-27 00:04:52 -07:00
Sean Chittenden 3d00208351 Move initialization of the rebalanceTimer to New() 2016-03-27 00:03:48 -07:00
Sean Chittenden 5652c60971 Add a test for ConnPool.PingConsulServer
Spin up 5x servers, join and ping each server
2016-03-26 23:52:06 -07:00
Sean Chittenden efc1113406 Expose ServerManager.ResetRebalanceTimer
Move the rebalance timer from ServerManager.Start's stack to struct ServerManager.  This makes it possible to shuffle during tests without actually waiting >120s.
2016-03-26 23:41:01 -07:00
Sean Chittenden d275cac1a3 Logging improvements
Comment out noisly loggers for the time being.

Improve the final logging statement to be useful and hint what the next active server for the client is going to be.
2016-03-26 22:41:08 -07:00
Sean Chittenden 35ff2eb71a Standardize the log message based on the package
This log statement used to belong in the consul package but has since moved to the server manager package.
2016-03-26 22:29:00 -07:00
Sean Chittenden e327630523 Reduce the error level from Fatal when unit testing 2016-03-26 22:07:09 -07:00
Sean Chittenden 92c2e8e668 Start server rebalance task after init'ing Serf
Now that there is no longer an event loop driven directly by Serf, start the ServerManager task after Serf has been setup.  When testing and adjusting timers and timeouts to unreasonably low values, it's possible to tickle a race condition where Serf's NumNodes() would fail because Serf had not been initialized.
2016-03-26 22:04:41 -07:00
Sean Chittenden ee95c55d88 Catch up to a few renames 2016-03-26 19:32:11 -07:00
Sean Chittenden dc291e4027 Use empty string for addr in ServerDetails.String() 2016-03-26 19:30:04 -07:00
Sean Chittenden 970938c2dd Guard against a nil ServerDetails.Addr
It's not clear how or why this would ever be nil, but some of the unit tests produce a nil addr.  Be defensive.
2016-03-26 19:29:31 -07:00
Sean Chittenden ca5950a538 Proactively ping server before rotation
Before shuffling the server list, proactively ping the next server in the list to establish the connection and verify the remote endpoint is healthy.
2016-03-26 19:28:13 -07:00
Sean Chittenden 18270bbd04 Factor out the shuffle server 2016-03-26 19:19:04 -07:00
Sean Chittenden 90d626b5d9 Revise comments re: cycleServer
Improve the comments to discuss what happens presently.  Add a note to consider possibly calling to TestConsulServer proactively.
2016-03-26 18:53:13 -07:00
Sean Chittenden cf59a860e7 Comment why the interface is needed: cyclic import 2016-03-26 18:38:35 -07:00
Sean Chittenden 3ecd72f3b5 Add a struct key type for server_details 2016-03-26 17:58:12 -07:00
Sean Chittenden 5893bd5a35 Add additional checks 2016-03-25 14:40:46 -07:00
Sean Chittenden d18e4d7455 Delete the right tag
"role" != "consul"
2016-03-25 14:31:48 -07:00
Sean Chittenden b1194e83cb Don't pass in sm, server manager is already in scope
Go closures are implicitly capturing lambdas.
2016-03-25 14:10:09 -07:00
Sean Chittenden a3a0eeeadd Trim residual complexity from server join notifications
Now that serf node join events are decoupled from rebalancing activities completely, remove the complixity of draining the channel and ensuring only one go routine was rebalancing the server list.

Now that we're no longer initializing a notification channel, we can remove the config load/save from `Start()`
2016-03-25 14:06:35 -07:00
Sean Chittenden a71fbe57e3 Only log in FindServers
In FindServer this is a useful warning hinting why its call failed. RPC returns error and leaves it to the higher level caller to do whatever it wants. As an operator, I'd have the detail necessary to know why the RPC call(s) failed.
2016-03-25 13:58:50 -07:00
Sean Chittenden 58246fcc0b Initialize the rebalancce to clientRPCMinReuseDuration
In an earlier version there was a channel to notify when a new server was added, however this has long since been removed.  Just default to the sane value of 2min before the first rebalance calc takes place.

Pointed out by: slackpad
2016-03-25 13:46:18 -07:00
Sean Chittenden d9251e30c8 Use range vs for
Returning a new array vs mutating an array in place so we can use range now.
2016-03-25 13:08:08 -07:00
Sean Chittenden 9c18bb5f1c Comment updates 2016-03-25 13:06:59 -07:00
Sean Chittenden 0b3f6932df Only rotate server list with more than one server
Fantastic observation by slackpad.  This was left over from when there was a boolean for health in the server struct (vs current strategy where we use server position in the list and rely on serf to cleanup the stale members).

Pointed out by: slackpad
2016-03-25 12:54:36 -07:00
Sean Chittenden 24eb274860 Relocate saveServerConfig next to getServerConfig
Requested by: slackpad
2016-03-25 12:41:22 -07:00
Sean Chittenden b00db393e7 Clarify that ConsulClusterInfo is an interface over serf
An interface was used to break a cyclic import dependency.
2016-03-25 12:38:40 -07:00
Sean Chittenden b8bbdb9e7a Reword comment after moving code into new packages 2016-03-25 12:34:46 -07:00
Sean Chittenden 68183c4378 Change initialReblaanaceTimeout to a time.Duration
Pointed out by: @slackpad
2016-03-25 12:34:12 -07:00
Sean Chittenden 3433feb93b Negative check: test an invalid condition 2016-03-25 12:22:33 -07:00
Sean Chittenden 0f3ad9c120 Test to make sure bootstrap is missing 2016-03-25 12:20:12 -07:00
Sean Chittenden 7a2d30d1cf Be more Go idiomatic w/ variable names: s/valid/ok/g
Cargo culting is bad, m'kay?

Pointy Hat: sean-
2016-03-25 12:14:24 -07:00
Sean Chittenden 9daccb8b41 Fix stale comment
Pointed out by: @slackpad
2016-03-25 12:00:40 -07:00
Sean Chittenden db72041063 Add a comment for Client serverMgr 2016-03-25 11:59:27 -07:00
Sean Chittenden 828606232e Correct a bogus goimport rewrite for tests 2016-03-23 22:35:49 -07:00
Sean Chittenden da872fee63 Test ServerManager.refreshServerRebalanceTimer
Change the signature so it returns a value so that this can be tested externally with mock data.  See the sample table in TestServerManagerInternal_refreshServerRebalanceTimer() for the rate at which it will back off.  This function is mostly used to not cripple large clusters in the event of a partition.
2016-03-23 22:10:50 -07:00
Sean Chittenden a63d5ab963 Add a handful more unit tests to the public interface 2016-03-23 22:10:50 -07:00
Sean Chittenden 8e3c83a258 Rename GetNumServers to NumServers()
Matches the style of the rest of the repo
2016-03-23 22:10:50 -07:00
Sean Chittenden 18f7befba9 Rename NewServerManger to just New
Follow go style recommendations now that this has been refactored out of the consul package and doesn't need the qualifier in the name.
2016-03-23 22:10:50 -07:00
Sean Chittenden e932e9a435 Rename FindHealthyServer() to FindServer()
There is no guarantee the server coming back is healthy.  It's apt to be healthy by virtue of its place in the server list, but it's not guaranteed.
2016-03-23 22:10:50 -07:00
Sean Chittenden 49a5a1ab84 cycleServer is a pure function, save the result 2016-03-23 22:10:50 -07:00
Sean Chittenden 94f79d2c3d Missed unit test cruft 2016-03-23 22:10:50 -07:00
Sean Chittenden bc62de541c Update comments to reflect reality 2016-03-23 22:10:50 -07:00
Sean Chittenden d2d55f4bb0 Remove additional cruft from ServerManager's channels
No longer needed code.
2016-03-23 22:10:50 -07:00
Sean Chittenden d13e3c18c9 Emulate a TryLock using atomic.CompareAndSwap
Prevent possible queueing behind serverConfigLock in the event that a server fails on a busy host.
2016-03-23 22:10:50 -07:00
Sean Chittenden 295af01680 Make use of interfaces
Use an interface instead of serf.Serf as arg to NewServerManager.  Bonus points for improved testability.

Pointed out by: @slackpad
2016-03-23 22:10:50 -07:00
Sean Chittenden fdbb142c3f Simplify error handling
Rely on Serf for liveliness.  In the event of a failure, simply cycle the server to the end of the list.  If the server is unhealthy, Serf will reap the dead server.

Additional simplifications:

*) Only rebalance servers based on timers, not when a new server is readded to the cluster.
*) Back out the failure count in server_details.ServerDetails
2016-03-23 22:10:50 -07:00
Sean Chittenden c2c73bfeab Unbreak client tests by reverting to original test
Debugging code crept into the actual test and hung out for much longer than it should have.
2016-03-23 22:10:50 -07:00
Sean Chittenden 0c87463b7e Introduce asynchronous management of consul server lists
Instead of blocking the RPC call path and performing a potentially expensive calculation (including a call to `c.LANMembers()`), introduce a channel to request a rebalance.  Some events don't force a reshuffle, instead the extend the duration of the current rebalance window because the environment thrashed enough to redistribute a client's load.
2016-03-23 22:10:50 -07:00
Sean Chittenden bad6cb8897 Comment nits 2016-03-23 22:10:50 -07:00
Sean Chittenden 74bcbc63f8 Use saveServerConfig vs atomic.Value.Store(config) 2016-03-23 22:10:50 -07:00
Sean Chittenden 7f55931d02 Commit a handful of refactoring && copy/paste-o fixes 2016-03-23 22:10:50 -07:00
Sean Chittenden e53704b032 Mutate copies of serverCfg.servers, not original
Removing any ambiguity re: ownership of the mutated server lists is a win for maintenance and debugging.
2016-03-23 22:10:50 -07:00
Sean Chittenden e6c27325d9 rebalanceTimer may be nil during initialization
When first starting the server manager, it's possible that the rebalanceTimer in serverConfig will be nil, test accordingly.
2016-03-23 22:10:50 -07:00
Sean Chittenden a7091b0837 Properly retain a pointer to the rebalanceTimer 2016-03-23 22:10:50 -07:00
Sean Chittenden 00ff8e5307 Cosmetic and various other wordsmithing cleanups 2016-03-23 22:10:50 -07:00
Sean Chittenden b4db49a62e Document the various functions and their locking 2016-03-23 22:10:50 -07:00
Sean Chittenden 9eb6481d73 Use config convenience method to get config
'cause ELETTHECOMPILERSDOTHEWORK.  I don't need that cluttering up the subconscious with more complexity.
2016-03-23 22:10:50 -07:00
Sean Chittenden 579e536f58 Move consul.serverConfig out of the consul package
Relocated to its own package, server_manager.  This now greatly simplifies the RPC() call path and appropriately hides the locking behind the package boundary.  More work is needed to be done here
2016-03-23 22:10:50 -07:00
Sean Chittenden c7c551dbe0 Rename serverConfigMtx to serverConfigLock
Pointed out by: @slackpad
2016-03-23 22:10:50 -07:00
Sean Chittenden e48b910f87 Refactor out the management of Consul servers
Move the management of c.consulServers (fka c.consuls) into consul/server_manager.go.

This commit brings in a background task that proactively manages the server list and:

*) reshuffles the list
*) manages the timer out of the RPC() path
*) uses atomics to detect a server has failed

This is a WIP, more work in testing needs to be completed.
2016-03-23 22:10:50 -07:00
Sean Chittenden 01b637114c Move consul.serverConfig out of the consul package
Relocated to its own package, server_manager.  This now greatly simplifies the RPC() call path and appropriately hides the locking behind the package boundary.  More work is needed to be done here
2016-03-23 22:10:50 -07:00
Sean Chittenden 117c65dc55 Rename serverConfigMtx to serverConfigLock
Pointed out by: @slackpad
2016-03-23 22:10:32 -07:00
Sean Chittenden 0eac826573 Refactor out the management of Consul servers
Move the management of c.consulServers (fka c.consuls) into consul/server_manager.go.

This commit brings in a background task that proactively manages the server list and:

*) reshuffles the list
*) manages the timer out of the RPC() path
*) uses atomics to detect a server has failed

This is a WIP, more work in testing needs to be completed.
2016-03-23 22:09:46 -07:00
Sean Chittenden b9e5588620 Move consul.serverConfig out of the consul package
Relocated to its own package, server_manager.  This now greatly simplifies the RPC() call path and appropriately hides the locking behind the package boundary.  More work is needed to be done here
2016-03-23 22:05:29 -07:00
Sean Chittenden a482eaef70 Rename serverConfigMtx to serverConfigLock
Pointed out by: @slackpad
2016-03-23 22:05:05 -07:00
Sean Chittenden 9b8767aa67 Refactor out the management of Consul servers
Move the management of c.consulServers (fka c.consuls) into consul/server_manager.go.

This commit brings in a background task that proactively manages the server list and:

*) reshuffles the list
*) manages the timer out of the RPC() path
*) uses atomics to detect a server has failed

This is a WIP, more work in testing needs to be completed.
2016-03-23 22:03:20 -07:00
Sean Chittenden 075d1b628f Commit miss re: consuls variable rename 2016-03-23 16:24:29 -07:00
Sean Chittenden 2ca4cc58ce Move consul.serverConfig out of the consul package
Relocated to its own package, server_manager.  This now greatly simplifies the RPC() call path and appropriately hides the locking behind the package boundary.  More work is needed to be done here
2016-03-23 16:16:22 -07:00
Sean Chittenden 0925b26250 Refactor consul.serverParts into server_details.ServerDetails
This may be short-lived, but it also seems like this is going to lead us down a path where ServerDetails is going to evolve into a more powerful package that will encapsulate more behavior behind a coherent API.
2016-03-23 16:15:47 -07:00
Sean Chittenden 5be956c310 Rename serverConfigMtx to serverConfigLock
Pointed out by: @slackpad
2016-03-23 16:15:47 -07:00
Sean Chittenden b1e392405c Handle the case where there are no healthy servers
Pointed out by: @slackpad
2016-03-23 16:15:47 -07:00
Sean Chittenden d4ca349e21 Refactor out the management of Consul servers
Move the management of c.consulServers (fka c.consuls) into consul/server_manager.go.

This commit brings in a background task that proactively manages the server list and:

*) reshuffles the list
*) manages the timer out of the RPC() path
*) uses atomics to detect a server has failed

This is a WIP, more work in testing needs to be completed.
2016-03-23 16:15:47 -07:00
Sean Chittenden 7b308d8d7e Add a flag to denote that a server is disabled
A server is not normally disabled, but in the event of an RPC error, we want to mark a server as down to allow for fast failover to a different server.  This value must be an int in order to support atomic operations.

Additionally, this is the preliminary work required to bring up a server in a disabled state.  RPC health checks in the future could mark the server as alive, thereby creating an organic "slow start" feature for Consul.
2016-03-23 16:14:59 -07:00
Sean Chittenden 6af781d9d5 Rename `lastServer` to `preferredServer`
Expanding the domain of lastServer beyond RPC() changes the meaning of this variable.  Rename accordingly to match the intent coming in a subsequent commit: a background thread will be in charge of rotating preferredServer.
2016-03-23 16:14:59 -07:00
Sean Chittenden f6ffbf4e96 Warn if serf events have queued up past 80% of the limit
It is theoretically possible that the number of queued serf events can back up.  If this happens, emit a warning message if there are more than 200 events in queue.

Most notably, this can happen if `c.consulServerLock` is held for an "extended period of time".  The probability of anyone ever seeing this log message is hopefully low to nonexistent, but if it happens, the warning message indicating a large number of serf events fired while a lock was held is likely to be helpful (vs serf mysteriously blocking when attempting to add an event to a channel).
2016-03-23 16:14:11 -07:00
Sean Chittenden 54016f5276 Commit miss re: consuls variable rename 2016-03-23 16:13:49 -07:00
Sean Chittenden f9aa968bf4 Remove lastRPCTime
This mechanism isn't going to provide much value in the future.  Preemptively reduce the complexity of future work.
2016-03-23 16:13:49 -07:00
Sean Chittenden cc86eb0a1a Rename c.consuls to c.consulServers
Prep for breaking out maintenance of consuls into a new goroutine.
2016-03-23 16:10:27 -07:00
Sean Chittenden 146c5b0a59 Use `rand.Int31n()` to get power of two optimization
In cases where i+1 is a power of two, skip one modulo operation.
2016-03-23 16:00:39 -07:00
James Phillips 8a9ad3811b Gets rid of flaky sort check.
If we get a coordinate then this test will fail, so we only check the
first item in the list, which is deterministic.
2016-03-21 17:30:05 -07:00
James Phillips fffa8d5061 Increases timeouts for coordinate tests.
We take the interval and add the random stagger to it, so 2X is cutting it
too close and the unit tests are often flaky.
2016-03-21 16:44:35 -07:00
James Phillips b6cd4318d6 Merge pull request #1851 from hashicorp/f-ipv6-bind
Allow [::] as a bind address (binds to first public IPv6 address)
2016-03-19 16:16:19 -07:00
James Phillips 99eb629c8f Adds more specific checks for ipv6 addresses. 2016-03-19 16:14:45 -07:00
James Phillips 3c883951d7 Removes leader from members and changes name since it's an address. 2016-03-18 17:07:11 -07:00
Sergey Romanov 93c8b496e5 #735 add information about leader to consul members 2016-03-18 17:05:40 -07:00
Wim b5d45322b4 Allow [::] as a bind address (binds to first public IPv6 address) 2016-03-18 23:59:44 +01:00
Calvin Leung Huang 4bd5523276 Obfuscate token for lookupACL error 2016-03-15 17:16:25 -04:00
James Phillips cb9c908a99 Hardens the match interoplator against negative arguments. 2016-03-07 13:32:32 -08:00
James Phillips 1cad6b9e0f Adds a comment about the embedded struct. 2016-03-07 10:45:39 -08:00
James Phillips eb7004f2b8 Renames "debug" endpoint and structures to "explain". 2016-03-07 10:45:39 -08:00
James Phillips d7288e3a5e Adds a prepared query debug endpoint. 2016-03-07 10:45:39 -08:00
James Phillips c7ee82c67f Applies prefix ACL to a catch-all template as a special case. 2016-03-07 10:45:39 -08:00
James Phillips 79eccf2c66 Adds a test for the custom prepared query template indexer. 2016-03-07 10:45:39 -08:00
James Phillips 897ab0d5c7 Adds core query template tests to the state store. 2016-03-07 10:45:39 -08:00
James Phillips 328d138466 Adds in basic query template lookups and vendors newly-updated memdb as well as improved iradix tree. 2016-03-07 10:45:39 -08:00
James Phillips 07514214e1 Adds tests for the low-level template functions. 2016-03-07 10:45:39 -08:00
James Phillips e3827923b8 Adds tests for the string visitor. 2016-03-07 10:45:39 -08:00
James Phillips 799339acb5 Factors rendering down into the resolve function. 2016-03-07 10:45:39 -08:00
James Phillips 6ed64e7f05 Splits walk functions out from the rest of the template code. 2016-03-07 10:45:39 -08:00
James Phillips 998b691878 Integrates templates into state store and endpoint (sans tests). 2016-03-07 10:45:39 -08:00
James Phillips c816f79bf8 Wraps the prepared query to also store the compiled template. 2016-03-07 10:45:39 -08:00
James Phillips 331f1f5b8b Adds basic query template compiler and renderer. 2016-03-07 10:45:39 -08:00
Mike Cowgill 5435055c5f one line schema change to not allow missing for sessions Table node index, Fixes #1774 2016-03-02 21:19:53 -08:00
James Phillips 90898dff98 Adds missing token redact in the GET path. 2016-02-26 15:59:00 -08:00
James Phillips 213026b033 Merge pull request #1757 from hashicorp/f-revert-1667
Reverts server connection rebalancing changes from #1667
2016-02-24 18:07:13 -08:00
James Phillips 7d392118d2 Adds a check for users re-submitting the redacted token. 2016-02-24 17:35:26 -08:00
James Phillips 483898abe5 Renames "prepared_query" ACL policy to "query". 2016-02-24 17:02:06 -08:00
James Phillips 87ceb2f3de Changes to more idiomatic "ok" pattern for prefix getter. 2016-02-24 16:26:43 -08:00
James Phillips e283f9512e Renames a unit test. 2016-02-24 16:17:20 -08:00
James Phillips ff25d033a6 Revert "Merge pull request #1667 from hashicorp/b-redistribute-clients"
This reverts commit 8f30dea420, reversing
changes made to eb27a02956.
2016-02-24 15:38:03 -08:00
James Phillips 899dcfe053 Completes switch of prepared_query ACLs to govern query names. 2016-02-24 01:26:16 -08:00
James Phillips 67de77482e Creates new "prepared-query" ACL type and new token capture behavior.
Prior to this change, prepared queries had the following behavior for
ACLs, which will need to change to support templates:

1. A management token, or a token with read access to the service being
   queried needed to be provided in order to create a prepared query.

2. The token used to create the prepared query was stored with the query
   in the state store and used to execute the query.

3. A management token, or the token used to create the query needed to be
   supplied to perform and CRUD operations on an existing prepared query.

This was pretty subtle and complicated behavior, and won't work for
templates since the service name is computed at execution time. To solve
this, we introduce a new "prepared-query" ACL type, where the prefix
applies to the query name for static prepared query types and to the
prefix for template prepared query types.

With this change, the new behavior is:

1. A management token, or a token with "prepared-query" write access to
   the query name or (soon) the given template prefix is required to do
   any CRUD operations on a prepared query, or to list prepared queries
   (the list is filtered by this ACL).

2. You will no longer need a management token to list prepared queries,
   but you will only be able to see prepared queries that you have access
   to (you get an empty list instead of permission denied).

3. When listing or getting a query, because it was easy to capture
   management tokens given the past behavior, this will always blank out
   the "Token" field (replacing the contents as <hidden>) for all tokens
   unless a management token is supplied. Going forward, we should
   discourage people from binding tokens for execution unless strictly
   necessary.

4. No token will be captured by default when a prepared query is created.
   If the user wishes to supply an execution token then can pass it in via
   the "Token" field in the prepared query definition. Otherwise, this
   field will default to empty.

5. At execution time, we will use the captured token if it exists with the
   prepared query definition, otherwise we will use the token that's passed
   in with the request, just like we do for other RPCs (or you can use the
   agent's configured token for DNS).

6. Prepared queries with no name (accessible only by ID) will not require
   ACLs to create or modify (execution time will depend on the service ACL
   configuration). Our argument here is that these are designed to be
   ephemeral and the IDs are as good as an ACL. Management tokens will be
   able to list all of these.

These changes enable templates, but also enable delegation of authority to
manage the prepared query namespace.
2016-02-23 17:12:43 -08:00
James Phillips 778a26efaf Adds a test for node registration and tagged addresses. 2016-02-07 13:15:22 -08:00
James Phillips 4be2ab1a75 Moves tagged wan address to be managed by anti-entropy, not serf. 2016-02-07 13:12:42 -08:00
James Phillips 33462ebea9 Adds an FSM persist and restore test for tagged addresses. 2016-02-07 11:36:39 -08:00
James Phillips c60a526fde Sets up config for more address tags down the road, renames struct members. 2016-02-07 10:37:34 -08:00
Evan Gilman de8fd561d0 Use a map for additional node addresses 2016-02-06 23:01:45 -08:00
Evan Gilman 069a28b3c0 Use idiomatic name for wan_addr serf tag 2016-02-06 23:01:45 -08:00
James Phillips ed8a71efd7 Store WanAddress during Service/Check sync 2016-02-06 23:01:45 -08:00
Evan Gilman 9300a13643 Store WanAddress during node registration 2016-02-06 23:01:45 -08:00
Evan Gilman 90aafbbdb6 Store WanAddress on Node 2016-02-06 23:01:45 -08:00
Sean Chittenden 1f725e2d05 Use the server's address in debug logging, not the c.lastServer, which may be nil 2016-02-02 15:51:28 -08:00
Sean Chittenden 1c9d74a337 Remove unnecessary check, test was moved further up in scope 2016-02-02 11:13:58 -08:00
Sean Chittenden 1005b91c87 Use panic instead of returning a sentinel UUID values in unit tests 2016-02-01 23:15:19 -08:00
Sean Chittenden c7e58734ed Continually rebalance client connections
Introduce a low-level background connection expiration mechanism wherein connections will be recycled periodically based on the size and health of the cluster.

For the vast majority of consul users, this will mean an average connection age of 150s.  For 10K node clusters it will take ~3min for clusters to rebalance their connections.  In the pathological case for a 100K cluster where 99K clients are in the minority talking to 1x server it will take ~26min to rebalance all connections.

It's possibe for clients recovering from a parititon to become fixated on a single server until the server or agent is restarted.  This is of particular interest to long-running environments with stable agents, where `allow_stale` is true, and partitions occur periodically.
2016-01-30 17:13:50 -08:00
Sean Chittenden 0c83b1b692 Use rand.Int31n() vs unconditionally using modulus 2016-01-30 15:47:58 -08:00
Sean Chittenden 7fb0045bbe Merge branch 'f-consul-lib' of ssh://github.com/hashicorp/consul into b-redistribute-clients 2016-01-30 15:40:54 -08:00
Sean Chittenden c4f7b4a13e Rename clientRPCCache to clientRPCConnMaxIdle, change value
Increase the max idle time for agents talking to servers from 30s to 127s in order to allow for the reuse of connections that are being initiated by cron.

127s was chosen as the first prime above 120s (arbitrarily chose to use a prime) with the intent of reusing connections who are used by once-a-minute cron(8) jobs *and* who use a 60s jitter window (e.g. in vixie cron job execution can drift by up to 59s per job, or 119s for a once-a-minute cron job).
2016-01-30 15:27:46 -08:00
Sean Chittenden b391b075bd Reuse the results from gettimeofday(2)...
Inside of a single RPC call, reuse time.Now().
2016-01-30 14:39:17 -08:00
Sean Chittenden 7af6a94edb Factor out duplicate functions into a lib package
Consolidate code duplication and tests into a single lib package.  Most of these functions were from various **/util.go functions that couldn't be imported due to cyclic imports.  The consul/lib package is intended to be a terminal node in an import DAG and a place to stash various consul-only helper functions.  Pulled in hashicorp/go-uuid instead of consolidating UUID access.
2016-01-29 16:57:45 -08:00
James Phillips 01da5a2248 Prevents watches from being orphaned when KVS blocking queries loop. 2016-01-20 07:18:47 -08:00
James Phillips 94d3f881fe Merge pull request #948 from hashicorp/iface-down-fix
Don't try to bind on address from inactive interface
2016-01-14 17:00:54 -08:00