Commit Graph

396 Commits

Author SHA1 Message Date
Pierre Souchay 1a906ef34e Fix more unstable tests in agent and command 2018-09-12 14:49:27 +01:00
Pierre Souchay 22500f242e Fix unstable tests in agent, api, and command/watch 2018-09-10 16:58:53 +01:00
Pierre Souchay eddcf228ea Implementation of Weights Data structures (#4468)
* Implementation of Weights Data structures

Adding this datastructure will allow us to resolve the
issues #1088 and #4198

This new structure defaults to values:
```
   { Passing: 1, Warning: 0 }
```

Which means, use weight of 0 for a Service in Warning State
while use Weight 1 for a Healthy Service.
Thus it remains compatible with previous Consul versions.

* Implemented weights for DNS SRV Records

* DNS properly support agents with weight support while server does not (backwards compatibility)

* Use Warning value of Weights of 1 by default

When using DNS interface with only_passing = false, all nodes
with non-Critical healthcheck used to have a weight value of 1.
While having weight.Warning = 0 as default value, this is probably
a bad idea as it breaks ascending compatibility.

Thus, we put a default value of 1 to be consistent with existing behaviour.

* Added documentation for new weight field in service description

* Better documentation about weights as suggested by @banks

* Return weight = 1 for unknown Check states as suggested by @banks

* Fixed typo (of -> or) in error message as requested by @mkeeler

* Fixed unstable unit test TestRetryJoin

* Fixed unstable tests

* Fixed wrong Fatalf format in `testrpc/wait.go`

* Added notes regarding DNS SRV lookup limitations regarding number of instances

* Documentation fixes and clarification regarding SRV records with weights as requested by @banks

* Rephrase docs
2018-09-07 15:30:47 +01:00
Pierre Souchay 9a2ae6e8eb Fixed more flaky tests in ./agent/consul (#4617) 2018-09-04 14:02:47 +01:00
Freddy d7a404f2ee
Bugfix: Use "%#v" when formatting structs (#4600) 2018-08-28 12:37:34 -04:00
Pierre Souchay b898131723 [BUGFIX] Avoid returning empty data on startup of a non-leader server (#4554)
Ensure that DB is properly initialized when performing stale queries

Addresses:
- https://github.com/hashicorp/consul-replicate/issues/82
- https://github.com/hashicorp/consul/issues/3975
- https://github.com/hashicorp/consul-template/issues/1131
2018-08-23 12:06:39 -04:00
Kyle Havlovitz e5e1f867e5
Merge branch 'master' into ca-snapshot-fix 2018-08-16 13:00:54 -07:00
Kyle Havlovitz f186edc42c
fsm: add connect service config to snapshot/restore test 2018-08-16 12:58:54 -07:00
nickmy9729 beddf03b26 Added code to allow snapshot inclusion of NodeMeta (#4527) 2018-08-16 15:33:35 -04:00
Kyle Havlovitz b51d76f469
fsm: add missing CA config to snapshot/restore logic 2018-08-16 11:58:50 -07:00
Kyle Havlovitz 4b35d877ca
autopilot: don't follow the normal server removal rules for nonvoters 2018-08-14 14:24:51 -07:00
Kyle Havlovitz ea14482376
Fix stats fetcher healthcheck RPCs not being independent 2018-08-14 14:23:52 -07:00
Pierre Souchay 0d6de257a2 Display more information about check being not properly added when it fails (#4405)
* Display more information about check being not properly added when it fails

It follows an incident where we add lots of error messages:

  [WARN] consul.fsm: EnsureRegistration failed: failed inserting check: Missing service registration

That seems related to Consul failing to restart on respective agents.

Having Node information as well as service information would help diagnose the issue.

* Renamed ensureCheckIfNodeMatches() as requested by @banks
2018-08-14 17:45:33 +01:00
Pierre Souchay ef3b81ab13 Allow to rename nodes with IDs, will fix #3974 and #4413 (#4415)
* Allow to rename nodes with IDs, will fix #3974 and #4413

This change allow to rename any well behaving recent agent with an
ID to be renamed safely, ie: without taking the name of another one
with case insensitive comparison.

Deprecated behaviour warning
----------------------------

Due to asceding compatibility, it is still possible however to
"take" the name of another name by not providing any ID.

Note that when not providing any ID, it is possible to have 2 nodes
having similar names with case differences, ie: myNode and mynode
which might lead to DB corruption on Consul server side and
lead to server not properly restarting.

See #3983 and #4399 for Context about this change.

Disabling registration of nodes without IDs as specified in #4414
should probably be the way to go eventually.

* Removed the case-insensitive search when adding a node within the else
block since it breaks the test TestAgentAntiEntropy_Services

While the else case is probably legit, it will be fixed with #4414 in
a later release.

* Added again the test in the else to avoid duplicated names, but
enforce this test only for nodes having IDs.

Thus most tests without any ID will work, and allows us fixing

* Added more tests regarding request with/without IDs.

`TestStateStore_EnsureNode` now test registration and renaming with IDs

`TestStateStore_EnsureNodeDeprecated` tests registration without IDs
and tests removing an ID from a node as well as updated a node
without its ID (deprecated behaviour kept for backwards compatibility)

* Do not allow renaming in case of conflict, including when other node has no ID

* Fixed function GetNodeID that was not working due to wrong type when searching node from its ID

Thus, all tests about renaming were not working properly.

Added the full test cas that allowed me to detect it.

* Better error messages, more tests when nodeID is not a valid UUID in GetNodeID()

* Added separate TestStateStore_GetNodeID to test GetNodeID.

More complete test coverage for GetNodeID

* Added new unit test `TestStateStore_ensureNoNodeWithSimilarNameTxn`

Also fixed comments to be clearer after remarks from @banks

* Fixed error message in unit test to match test case

* Use uuid.ParseUUID to parse Node.ID as requested by @mkeeler
2018-08-10 11:30:45 -04:00
Siva Prasad c88900aaa9
PR to fix TestAgent_IndexChurn and TestPreparedQuery_Wrapper. (#4512)
* Fixes TestAgent_IndexChurn

* Fixes TestPreparedQuery_Wrapper

* Increased sleep in agent_test for IndexChurn to 500ms

* Made the comment about joinWAN operation much less of a cliffhanger
2018-08-09 12:40:07 -04:00
Armon Dadgar 4f1fd34e9e consul: Update buffer sizes 2018-08-08 10:26:58 -07:00
Siva Prasad 288d350a73
Revert "CA initialization while boostrapping and TestLeader_ChangeServerID fix." (#4497)
* Revert "BUGFIX: Unit test relying on WaitForLeader() did not work due to wrong test (#4472)"

This reverts commit cec5d72396.

* Revert "CA initialization while boostrapping and TestLeader_ChangeServerID fix. (#4493)"

This reverts commit 589b589b53.
2018-08-07 08:29:48 -04:00
Pierre Souchay cec5d72396 BUGFIX: Unit test relying on WaitForLeader() did not work due to wrong test (#4472)
- Improve resilience of testrpc.WaitForLeader()

- Add additionall retry to CI

- Increase "go test" timeout to 8m

- Add wait for cluster leader to several tests in the agent package

- Add retry to some tests in the api and command packages
2018-08-06 19:46:09 -04:00
Siva Prasad 589b589b53
CA initialization while boostrapping and TestLeader_ChangeServerID fix. (#4493)
* connect: fix an issue with Consul CA bootstrapping being interrupted

* streamline change server id test
2018-08-06 16:15:24 -04:00
Kyle Havlovitz fa0d8aff33
fix inconsistency in TestConnectCAConfig_GetSet 2018-07-26 07:46:47 -07:00
Kyle Havlovitz ed87949385
Merge pull request #4400 from hashicorp/leaf-cert-ttl
Add configurable leaf cert TTL to Connect CA
2018-07-25 17:53:25 -07:00
Paul Banks 8cbeb29e73
Fixes #4421: General solution to stop blocking queries with index 0 (#4437)
* Fix theoretical cache collision bug if/when we use more cache types with same result type

* Generalized fix for blocking query handling when state store methods return zero index

* Refactor test retry to only affect CI

* Undo make file merge

* Add hint to error message returned to end-user requests if Connect is not enabled when they try to request cert

* Explicit error for Roots endpoint if connect is disabled

* Fix tests that were asserting old behaviour
2018-07-25 20:26:27 +01:00
Kyle Havlovitz ce10de036e
connect/ca: check LeafCertTTL when rotating expired roots 2018-07-20 16:04:04 -07:00
Kyle Havlovitz d6ca015a42
connect/ca: add configurable leaf cert TTL 2018-07-16 13:33:37 -07:00
Matt Keeler 63d5c069fc
Merge pull request #4379 from hashicorp/persist-intermediates
connect: persist intermediate CAs on leader change
2018-07-12 12:09:13 -04:00
Matt Keeler 0e83059d1f
Revert "Allow changing Node names since Node now have IDs" 2018-07-12 11:19:21 -04:00
Matt Keeler 91150cca59 Fixup formatting 2018-07-12 10:14:26 -04:00
Matt Keeler 3807e04de9 Revert PR 4294 - Catalog Register: Generate UUID for services registered without one
UUID auto-generation here causes trouble in a few cases. The biggest being older
nodes reregistering will fail when the UUIDs are different and the names match

This reverts commit 0f70034082.
This reverts commit d1a8f9cb3f.
This reverts commit cf69ec42a4.
2018-07-12 10:06:50 -04:00
Kyle Havlovitz f95c6807e7
connect: use reflect.DeepEqual instead for test 2018-07-11 13:10:58 -07:00
Matt Keeler 98ead2a8f8
Merge pull request #3983 from pierresouchay/node_renaming
Allow changing Node names since Node now have IDs
2018-07-11 16:03:02 -04:00
Kyle Havlovitz 4e5fb6bc19
connect: add provider state to snapshots 2018-07-11 11:34:49 -07:00
Kyle Havlovitz 462ace4867
connect: update leader initializeCA comment 2018-07-11 10:00:42 -07:00
Kyle Havlovitz 1d3f4b5099
connect: persist intermediate CAs on leader change 2018-07-11 09:44:30 -07:00
Pierre Souchay fecae3de21 When renaming a node, ensure the name is not taken by another node.
Since DNS is case insensitive and DB as issues when similar names with different
cases are added, check for unicity based on case insensitivity.

Following another big incident we had in our cluster, we also validate
that adding/renaming a not does not conflicts with case insensitive
matches.

We had the following error once:

 - one node called: mymachine.MYDC.mydomain was shut off
 - another node (different ID) was added with name: mymachine.mydc.mydomain before
   72 hours

When restarting the consul server of domain, the consul server restarted failed
to start since it detected an issue in RAFT database because
mymachine.MYDC.mydomain and mymachine.mydc.mydomain had the same names.

Checking at registration time with case insensitivity should definitly fix
those issues and avoid Consul DB corruption.
2018-07-11 14:42:54 +02:00
Matt Keeler d19c7d8882
Merge pull request #4303 from pierresouchay/non_blocking_acl
Only send one single ACL cache refresh across network when TTL is over
2018-07-10 08:57:33 -04:00
MagnumOpus21 300330e24b Agent/Proxy: Formatting and test cases fix 2018-07-09 12:46:10 -04:00
Kyle Havlovitz 401b206a2e
Store the time CARoot is rotated out instead of when to prune 2018-07-06 16:05:25 -07:00
Kyle Havlovitz 1492243e0a
connect/ca: add logic for pruning old stale RootCA entries 2018-07-02 10:35:05 -07:00
Pierre Souchay bd023f352e Updated swith case to use same branch for async-cache and extend-cache 2018-07-02 17:39:34 +02:00
Pierre Souchay 1e7665c0d5 Updated documentation and adding more test case for async-cache 2018-07-01 23:50:30 +02:00
Pierre Souchay abde81a3e7 Added async-cache with similar behaviour as extend-cache but asynchronously 2018-07-01 23:50:30 +02:00
Pierre Souchay 9406ca1c95 Only send one single ACL cache refresh across network when TTL is over
It will allow the following:

 * when connectivity is limited (saturated linnks between DCs), only one
   single request to refresh an ACL will be sent to ACL master DC instead
   of statcking ACL refresh queries
 * when extend-cache is used for ACL, do not wait for result, but refresh
   the ACL asynchronously, so no delay is not impacting slave DC
 * When extend-cache is not used, keep the existing blocking mechanism,
   but only send a single refresh request.

This will fix https://github.com/hashicorp/consul/issues/3524
2018-07-01 23:50:30 +02:00
Matt Keeler 22b7b688a3
Move starting enterprise functionality 2018-06-29 17:38:29 -04:00
Matt Keeler 0f70034082 Move default uuid test into the consul package 2018-06-27 09:21:58 -04:00
Matt Keeler d1a8f9cb3f go fmt changes 2018-06-27 09:07:22 -04:00
Matt Keeler cf69ec42a4 Make sure to generate UUIDs when services are registered without one
This makes the behavior line up with the docs and expected behavior
2018-06-26 17:04:08 -04:00
mkeeler 6813a99081 Merge remote-tracking branch 'connect/f-connect' 2018-06-25 19:42:51 +00:00
Kyle Havlovitz 3baa67cdef connect/ca: pull the cluster ID from config during a rotation 2018-06-25 12:25:42 -07:00
Kyle Havlovitz b4ef7bb64d connect/ca: leave blank root key/cert out of the default config (unnecessary) 2018-06-25 12:25:42 -07:00
Kyle Havlovitz 050da22473 connect/ca: undo the interface changes and use sign-self-issued in Vault 2018-06-25 12:25:42 -07:00