1064 Commits

Author SHA1 Message Date
Kyle Havlovitz
039e7f1880
Move autopilot setup to a separate file 2017-12-18 16:55:51 -08:00
Kyle Havlovitz
d08ab9fd19
Make some final tweaks to autopilot package 2017-12-18 12:26:47 -08:00
Kyle Havlovitz
a86d11ec0a
Merge pull request #3737 from hashicorp/autopilot-refactor
Move autopilot to a standalone package
2017-12-15 14:09:40 -08:00
James Phillips
06f980061e
Merge pull request #3728 from weiwei04/fix_globalRPC_goroutine_leak
fix globalRPC goroutine leak
2017-12-14 17:54:19 -08:00
Kyle Havlovitz
324c2ecb53
Expose IsPotentialVoter for advanced autopilot logic 2017-12-13 17:53:51 -08:00
Kyle Havlovitz
12bf61c851
Merge branch 'master' into autopilot-refactor 2017-12-13 11:54:32 -08:00
Kyle Havlovitz
d6b266c045
A few last autopilot adjustments 2017-12-13 11:19:17 -08:00
Kyle Havlovitz
2310687c1d
More autopilot reorganizing 2017-12-13 10:57:37 -08:00
James Phillips
46742a5041
Adds TODOs referencing #3744. 2017-12-13 10:52:06 -08:00
Kyle Havlovitz
b92f895c23
More refactoring to make autopilot consul-agnostic 2017-12-12 17:46:28 -08:00
Kyle Havlovitz
de28555671
Move autopilot to a standalone package 2017-12-11 16:45:33 -08:00
James Phillips
d12e81860f
Moves Serf helper into lib to fix import cycle in consul-enterprise. 2017-12-07 16:57:58 -08:00
James Phillips
5065f3d82e
Turns of intent queue warnings and enables dynamic queue sizing. 2017-12-07 16:27:06 -08:00
Wei Wei
cc9648c957 fix globalRPC goroutine leak
Signed-off-by: Wei Wei <weiwei.inf@gmail.com>
2017-12-05 11:53:30 +08:00
James Phillips
3e46544085
Creates a registration mechanism for snapshot and restore. 2017-11-29 18:36:53 -08:00
James Phillips
f53f521072
Begins split out of snapshots from the main FSM class. 2017-11-29 18:36:53 -08:00
James Phillips
c8e763667f
Creates a registration mechanism for FSM commands. 2017-11-29 18:36:53 -08:00
James Phillips
78292662d7
Moves the FSM into its own package.
This will help make it clearer what happens when we add some registration
plumbing for the different operations and snapshots.
2017-11-29 18:36:53 -08:00
James Phillips
e810697e06
Resolves an FSM snapshot TODO.
This adds checks for sink write calls before we continue the refactor, which
will resolve the other TODO comment we deleted as part of this change.
2017-11-29 18:36:53 -08:00
James Phillips
aa61159b74
Creates a registration mechanism for schemas.
This also splits out the registration into the table-specific source
files.
2017-11-29 18:36:52 -08:00
James Phillips
93ff33b1be
Creates a registration mechanism for RPC endpoints. 2017-11-29 18:36:52 -08:00
James Phillips
8bf1f57737
Renames stubs to be more consistent. 2017-11-29 18:36:52 -08:00
James Phillips
8abd2050fa
Sheds monotonic time info so tombstone GC bins work properly. 2017-11-29 10:34:24 -08:00
James Phillips
de57a9ef51
Gives back the lock before writing to the expire channel.
The lock isn't needed after we clean up the expire bin, and as seen
in #3700 we can get into a deadlock waiting to place the expire index
into the channel while holding this lock.

Fixes #3700
2017-11-19 16:24:16 -08:00
James Phillips
f19ba41144
Moves the LAN event handler after the router is created.
Fixes #3680
2017-11-10 12:26:48 -08:00
James Phillips
17737ee030
Revert "Adds a small sleep to make sure we are in the next GC bucket." 2017-11-08 22:18:37 -08:00
James Phillips
24475048e2
Adds a sleep to make sure we are in the next GC bucket, ups time.
Fixes #3670
2017-11-08 22:02:40 -08:00
James Phillips
c57884fffe
Skips the tombstone GC test in Travis for now.
Related to #3670
2017-11-08 20:14:20 -08:00
James Phillips
f6b7dcbcf6
Removes bogus getPort() in favor of freeport. 2017-11-08 19:55:50 -08:00
James Phillips
7b966e2d26
Tightens timing up and reorders GC test to be less flaky. 2017-11-08 15:09:29 -08:00
James Phillips
7c6ab5e783
Doubles the GC timing. 2017-11-08 15:01:11 -08:00
James Phillips
8de7c77482
Opens up test timing a little more. 2017-11-08 14:01:19 -08:00
James Phillips
c46612f691
Shifts off a gran boundary to help make test less flaky. 2017-11-08 13:57:17 -08:00
James Phillips
f31856c1b7
Opens up the tombstone GC test timing. 2017-11-08 13:43:39 -08:00
Kyle Havlovitz
d3dd2b1402
Move check definition to a sub-struct 2017-11-01 14:54:46 -07:00
Kyle Havlovitz
dbab3cd5f6
Merge branch 'master' into esm-changes 2017-11-01 11:37:48 -07:00
Kyle Havlovitz
c4375d5a47
Merge pull request #3622 from hashicorp/coordinate-node-endpoint
agent: add /v1/coordianate/node/:node endpoint
2017-11-01 11:35:50 -07:00
Kyle Havlovitz
b0536a96cc
Fill out the tests around coordinate/node functionality 2017-10-31 15:36:44 -07:00
Kyle Havlovitz
1e3b0d441b
Factor out registerNodes function 2017-10-31 13:34:49 -07:00
James Phillips
6bf55d16a2
Relaxes Autopilot promotion logic. (#3623)
* Relaxes Autopilot promotion logic.

When we defaulted the Raft protocol version to 3 in #3477 we made
the numPeers() routine more strict to only count voters (this is
more conservative and more correct). This had the side effect of
breaking rolling updates because it's at odds with the Autopilot
non-voter promotion logic.

That logic used to wait to only promote to maintain an odd quorum
of servers. During a rolling update (add one new server, wait, and
then kill an old server) the dead server cleanup would still count
the old server as a peer, which is conservative and the right thing
to do, and no longer count the non-voter. This would wait to promote,
so you could get into a stalemate. It is safer to promote early than
remove early, so by promoting as soon as possible we have chosen
that as the solution here.

Fixes #3611

* Gets rid of unnecessary extra not-a-voter check.
2017-10-31 15:16:56 -05:00
Kyle Havlovitz
2392545adc
Merge branch 'coordinate-node-endpoint' of github.com:hashicorp/consul into esm-changes 2017-10-26 19:20:24 -07:00
Kyle Havlovitz
5589eadcf5
Added Coordinate.Node rpc endpoint and client api method 2017-10-26 19:16:40 -07:00
Kyle Havlovitz
a7c42a6c2a
Expose SkipNodeUpdate field and some health check info in the http api 2017-10-25 19:37:30 +02:00
Frank Schroeder
c94751ad43 test: replace porter tool with freeport lib
This patch removes the porter tool which hands out free ports from a
given range with a library which does the same thing. The challenge for
acquiring free ports in concurrent go test runs is that go packages are
tested concurrently and run in separate processes. There has to be some
inter-process synchronization in preventing processes allocating the
same ports.

freeport allocates blocks of ports from a range expected to be not in
heavy use and implements a system-wide mutex by binding to the first
port of that block for the lifetime of the application. Ports are then
provided sequentially from that block and are tested on localhost before
being returned as available.
2017-10-21 22:01:09 +02:00
Ryan Slade
85e4aea9d1 Replace time.Now().Sub(x) with time.Since(x) 2017-10-17 20:38:24 +02:00
James Phillips
575d70aaa7
Cleans up some drift between the OSS and Enterprise trees. 2017-10-11 15:53:07 -07:00
James Phillips
bb12368eac Makes RPC handling more robust when rolling servers. (#3561)
* Adds client-side retry for no leader errors.

This paves over the case where the client was connected to the leader
when it loses leadership.

* Adds a configurable server RPC drain time and a fail-fast path for RPCs.

When a server leaves it gets removed from the Raft configuration, so it will
never know who the new leader server ends up being. Without this we'd be
doomed to wait out the RPC hold timeout and then fail. This makes things fail
a little quicker while a sever is draining, and since we added a client retry
AND since the server doing this has already shut down and left the Serf LAN,
clients should retry against some other server.

* Makes the RPC hold timeout configurable.

* Reorders struct members.

* Sets the RPC hold timeout default for test servers.

* Bumps the leave drain time up to 5 seconds.

* Robustifies retries with a simpler client-side RPC hold.

* Reverts untended delete.
2017-10-10 15:19:50 -07:00
James Phillips
4dab70cb93 Fixes handling of stop channel and failed barrier attempts. (#3546)
* Fixes handling of stop channel and failed barrier attempts.

There were two issues here. First, we needed to not exit when there
was a timeout trying to write the barrier, because Raft might not
step down, so we'd be left as the leader but having run all the step
down actions.

Second, we didn't close over the stopCh correctly, so it was possible
to nil that out and have the leaderLoop never exit. We close over it
properly AND sequence the nil-ing of it AFTER the leaderLoop exits for
good measure, so the code is more robust.

Fixes #3545

* Cleans up based on code review feedback.

* Tweaks comments.

* Renames variables and removes comments.
2017-10-06 07:54:49 -07:00
Kyle Havlovitz
c728564994
Update metric names and add a legacy config flag 2017-10-04 16:43:27 -07:00
Preetha Appan
8dcd7e700c Remove extra newline 2017-10-03 15:19:31 -05:00