Commit Graph

9585 Commits

Author SHA1 Message Date
Kyle Havlovitz b51d76f469
fsm: add missing CA config to snapshot/restore logic 2018-08-16 11:58:50 -07:00
Kyle Havlovitz fa8990c5a2
Merge pull request #4528 from hashicorp/autopilot-fixes
Fix inconsistency caused by the autopilot StatsFetcher
2018-08-15 11:35:52 -07:00
Siva Prasad 771e0bafd2
Addresses the flakiness of CatalogNodes (#4530) 2018-08-15 11:16:05 -04:00
sandstrom 14f19f75a6 Clarify port usage for agents (#4510) 2018-08-14 16:10:01 -07:00
Kyle Havlovitz 4b35d877ca
autopilot: don't follow the normal server removal rules for nonvoters 2018-08-14 14:24:51 -07:00
Kyle Havlovitz ea14482376
Fix stats fetcher healthcheck RPCs not being independent 2018-08-14 14:23:52 -07:00
Shubheksha fc3997f266 replace old fork of text package (#4501) 2018-08-14 12:23:18 -07:00
Pierre Souchay 0d6de257a2 Display more information about check being not properly added when it fails (#4405)
* Display more information about check being not properly added when it fails

It follows an incident where we add lots of error messages:

  [WARN] consul.fsm: EnsureRegistration failed: failed inserting check: Missing service registration

That seems related to Consul failing to restart on respective agents.

Having Node information as well as service information would help diagnose the issue.

* Renamed ensureCheckIfNodeMatches() as requested by @banks
2018-08-14 17:45:33 +01:00
Freddy 6d43d24edb
Improve reliability of tests with TestAgent (#4525)
- Add WaitForTestAgent to tests flaky due to missing serfHealth registration

- Fix bug in retries calling Fatalf with *testing.T

- Convert TestLockCommand_ChildExitCode to table driven test
2018-08-14 12:08:33 -04:00
Paul Banks e34acd275f
Update intentions.html.md 2018-08-14 15:09:45 +01:00
Freddy e305443db4
Address flakiness in command/exec tests (#4517)
* Add fn to wait for TestAgent node and check registration

* Add waits for TestAgent and retries before timeouts in exec_test
2018-08-10 15:04:07 -04:00
Geoffrey Grosenbach a03512496f
Consul Production Deployment Guide
Renames guide to "Production Deployment"
Adds link in sidebar menu.
Implements edits suggested by Consul engineering team.
2018-08-10 11:51:05 -07:00
Matt Keeler daacbaa520
Update CHANGELOG.md 2018-08-10 11:33:22 -04:00
Pierre Souchay ef3b81ab13 Allow to rename nodes with IDs, will fix #3974 and #4413 (#4415)
* Allow to rename nodes with IDs, will fix #3974 and #4413

This change allow to rename any well behaving recent agent with an
ID to be renamed safely, ie: without taking the name of another one
with case insensitive comparison.

Deprecated behaviour warning
----------------------------

Due to asceding compatibility, it is still possible however to
"take" the name of another name by not providing any ID.

Note that when not providing any ID, it is possible to have 2 nodes
having similar names with case differences, ie: myNode and mynode
which might lead to DB corruption on Consul server side and
lead to server not properly restarting.

See #3983 and #4399 for Context about this change.

Disabling registration of nodes without IDs as specified in #4414
should probably be the way to go eventually.

* Removed the case-insensitive search when adding a node within the else
block since it breaks the test TestAgentAntiEntropy_Services

While the else case is probably legit, it will be fixed with #4414 in
a later release.

* Added again the test in the else to avoid duplicated names, but
enforce this test only for nodes having IDs.

Thus most tests without any ID will work, and allows us fixing

* Added more tests regarding request with/without IDs.

`TestStateStore_EnsureNode` now test registration and renaming with IDs

`TestStateStore_EnsureNodeDeprecated` tests registration without IDs
and tests removing an ID from a node as well as updated a node
without its ID (deprecated behaviour kept for backwards compatibility)

* Do not allow renaming in case of conflict, including when other node has no ID

* Fixed function GetNodeID that was not working due to wrong type when searching node from its ID

Thus, all tests about renaming were not working properly.

Added the full test cas that allowed me to detect it.

* Better error messages, more tests when nodeID is not a valid UUID in GetNodeID()

* Added separate TestStateStore_GetNodeID to test GetNodeID.

More complete test coverage for GetNodeID

* Added new unit test `TestStateStore_ensureNoNodeWithSimilarNameTxn`

Also fixed comments to be clearer after remarks from @banks

* Fixed error message in unit test to match test case

* Use uuid.ParseUUID to parse Node.ID as requested by @mkeeler
2018-08-10 11:30:45 -04:00
Paul Banks 9ce10769ce Update Serf and memberlist (#4511)
This includes fixes that improve gossip scalability on very large (> 10k node) clusters.

The Serf changes:
 - take snapshot disk IO out of the critical path for handling messages hashicorp/serf#524
 - make snapshot compaction much less aggressive - the old fixed threshold caused snapshots to be constantly compacted (synchronously with request handling) on clusters larger than about 2000 nodes! hashicorp/serf#525

Memberlist changes:
 - prioritize handling alive messages over suspect/dead to improve stability, and handle queue in LIFO order to avoid acting on info that 's already stale in the queue by the time we handle it. hashicorp/memberlist#159
 - limit the number of concurrent pushPull requests being handled at once to 128. In one test scenario with 10s of thousands of servers we saw channel and lock blocking cause over 3000 pushPulls at once which ballooned the memory of the server because each push pull contained a de-serialised list of all known 10k+ nodes and their tags for a total of about 60 million objects and 7GB of memory stuck. While the rest of the fixes here should prevent the same root cause from blocking in the same way, this prevents any other bug or source of contention from allowing pushPull messages to stack up and eat resources. hashicorp/memberlist#158
2018-08-09 13:16:13 -04:00
Siva Prasad c88900aaa9
PR to fix TestAgent_IndexChurn and TestPreparedQuery_Wrapper. (#4512)
* Fixes TestAgent_IndexChurn

* Fixes TestPreparedQuery_Wrapper

* Increased sleep in agent_test for IndexChurn to 500ms

* Made the comment about joinWAN operation much less of a cliffhanger
2018-08-09 12:40:07 -04:00
Armon Dadgar 8a1f42e190
Merge pull request #4505 from hashicorp/f-channel-size
consul: Update buffer sizes
2018-08-08 12:26:23 -07:00
Freddy e21f554923
Improve flaky connect/proxy Listener tests (#4498)
Improve flaky connect/proxy Listener tests

- Add sleep to TestEchoConn to allow for Read/Write to finish before fetching data in reportStats

- Account for flakiness around interval for Gauge

- Improve debug output when dumping metrics
2018-08-08 14:56:03 -04:00
Armon Dadgar 4f1fd34e9e consul: Update buffer sizes 2018-08-08 10:26:58 -07:00
Geoffrey Grosenbach 3e6313bebb
Merge pull request #4485 from hashicorp/doc-remove-atlas
Remove all mention of Atlas, even in deprecated changelogs
2018-08-07 10:45:50 -07:00
Geoffrey Grosenbach 78134fa368
Merge pull request #4318 from hashicorp/doc-discovery-code-snippet
Improve styling of discovery snippet
2018-08-07 10:44:58 -07:00
Siva Prasad 288d350a73
Revert "CA initialization while boostrapping and TestLeader_ChangeServerID fix." (#4497)
* Revert "BUGFIX: Unit test relying on WaitForLeader() did not work due to wrong test (#4472)"

This reverts commit cec5d72396.

* Revert "CA initialization while boostrapping and TestLeader_ChangeServerID fix. (#4493)"

This reverts commit 589b589b53.
2018-08-07 08:29:48 -04:00
Pierre Souchay cec5d72396 BUGFIX: Unit test relying on WaitForLeader() did not work due to wrong test (#4472)
- Improve resilience of testrpc.WaitForLeader()

- Add additionall retry to CI

- Increase "go test" timeout to 8m

- Add wait for cluster leader to several tests in the agent package

- Add retry to some tests in the api and command packages
2018-08-06 19:46:09 -04:00
Siva Prasad 589b589b53
CA initialization while boostrapping and TestLeader_ChangeServerID fix. (#4493)
* connect: fix an issue with Consul CA bootstrapping being interrupted

* streamline change server id test
2018-08-06 16:15:24 -04:00
Geoffrey Grosenbach 5d2855ecda Remove all mention of Atlas, even in deprecated changelogs 2018-08-03 10:51:18 -07:00
Freddy 0a25e11aa8
Merge pull request #4484 from freddygv/travis-in-containers
Move travis build env back to containers
2018-08-02 15:51:36 -04:00
Paul Banks 3df6700f95
Update Changelog 2018-08-02 18:57:00 +01:00
Jack Pearkes 625bbb0137
Clarification for serf_wan documentation (#4459)
* updates docs for agent options

trying to add a little more clarity to suggestion that folks should use
port 8302 for both LAN and WAN comms

* website: clarify language for serf wan port behavior
2018-08-02 10:25:25 -07:00
freddygv cb9a28dfda Move travis build env back to containers 2018-08-02 10:30:28 -04:00
Siva Prasad 865068a358
DNS : Fixes recursors answering the DNS query to properly return the correct response. (#4461)
* Fixes the DNS recursor properly resolving the requests

* Added a test case for the recursor bug

* Refactored code && added a test case for all failing recursors

* Inner indentation moved into else if check
2018-08-02 10:12:52 -04:00
Paul Banks 71dd3b408a
Fixes memory leak when blocking on /event/list (#4482) 2018-08-02 14:54:48 +01:00
Mitchell Hashimoto ec755b4ba7
Merge pull request #4464 from hashicorp/je.htmlfix
A couple more docs fixes
2018-07-31 06:57:04 -07:00
Matt Keeler e5b5995adf Putting source back into Dev Mode 2018-07-30 13:54:29 -04:00
John Cowen 80a307cb9f
UI: Add conditional enterprise logo (#4432)
Adds additional 'enterprise' text underneath the 'startup' logo if the
ui is built with a CONSUL_BINARY_TYPE environment variable that doesn't
equal `oss`.
2018-07-30 17:59:43 +01:00
John Cowen fde41467cf
ui: Adds bottom breathing space on the bottom of forms (#4433) 2018-07-30 17:58:13 +01:00
John Cowen 12811c0844
UI - Refactor Adapter.handleResponse (#4398)
* Add some tests to check the correct GET API endpoints are called

* Refactor adapters

1. Add integration tests for `urlFor...` and majority `handleResponse` methods
2. Refactor out `handleResponse` a little more into single/batch/boolean
methods
3. Move setting of the `Datacenter` property into the `handleResponse`
method, basically the same place that the uid is being set using the dc
parsed form the URL
4. Add some Errors for if you don't pass ids to certain `urlFor` methods
2018-07-30 17:55:44 +01:00
John Cowen 45d85542a9
UI - Non-prod CSS sourcemaps (#4418) 2018-07-30 17:53:14 +01:00
mkeeler e716d1b5f8
Release v1.2.2 2018-07-30 16:01:13 +00:00
mkeeler 15fcbb6ed5 Bump website version to 1.2.2 2018-07-30 15:59:17 +00:00
Matt Keeler 7f8b70d0df
Update CHANGELOG.md 2018-07-30 11:57:53 -04:00
Matt Keeler f5c11e1c70
Update CHANGELOG.md
Update for fixing #4441
2018-07-30 09:14:07 -04:00
Matt Keeler 870a6ad6a8
Handle resolving proxy tokens when parsing HTTP requests (#4453)
Fixes: #4441

This fixes the issue with Connect Managed Proxies + ACLs being broken.

The underlying problem was that the token parsed for most http endpoints was sent untouched to the servers via the RPC request. These changes make it so that at the HTTP endpoint when parsing the token we additionally attempt to convert potential proxy tokens into regular tokens before sending to the RPC endpoint. Proxy tokens are only valid on the agent with the managed proxy so the resolution has to happen before it gets forwarded anywhere.
2018-07-30 09:11:51 -04:00
Geoffrey Grosenbach 827fbac3ab Copy-and-paste Go client example (#4448)
* Copy-and-paste Go client example

Includes Go source that runs without modification, as well as simple
instructions for compiling, running, and viewing the output in the
Consul UI.

* Remove unnecessary flags from development server example

This is a bare minimum Go example needed to store keys and values in
Consul. The `-ui` and `-server` flags aren't needed when running with
`-dev`.
2018-07-30 12:48:19 +01:00
Jeff Escalante 0f12370cfb a couple more corrections 2018-07-27 19:39:44 -04:00
John Cowen 0672cef1dc ui: Changelog additions (#4458) 2018-07-27 11:12:58 -07:00
Jeff Escalante 30d27d8356 fix a couple html errors (#4456) 2018-07-26 16:30:24 -07:00
Christie Koehler 2710ae4159 docs: Update links to ttl health check endpoints. (#4208)
* docs: Update links to ttl health check endpoints.

* remove absolute URLs
2018-07-26 16:14:44 -07:00
Matt Keeler 8cdba9611d
Update CHANGELOG.md 2018-07-26 11:42:33 -04:00
Matt Keeler 0e0227792b
Gossip tuneables (#4444)
Expose a few gossip tuneables for both lan and wan interfaces

gossip_nodes
gossip_interval
probe_timeout
probe_interval
retransmit_mult
suspicion_mult
2018-07-26 11:39:49 -04:00
Matt Keeler c45e19b274
Merge pull request #4445 from hashicorp/bugfix/build-cross-compile
Fix cross compiling with make
2018-07-26 11:23:47 -04:00