Consul is a distributed, highly available, and data center aware solution to connect and configure applications across dynamic, distributed infrastructure. https://www.consul.io
Go to file
R.B. Boyer 40336fd353
agent: fix several data races and bugs related to node-local alias checks (#5876)
The observed bug was that a full restart of a consul datacenter (servers
and clients) in conjunction with a restart of a connect-flavored
application with bring-your-own-service-registration logic would very
frequently cause the envoy sidecar service check to never reflect the
aliased service.

Over the course of investigation several bugs and unfortunate
interactions were corrected:

(1)

local.CheckState objects were only shallow copied, but the key piece of
data that gets read and updated is one of the things not copied (the
underlying Check with a Status field). When the stock code was run with
the race detector enabled this highly-relevant-to-the-test-scenario field
was found to be racy.

Changes:

 a) update the existing Clone method to include the Check field
 b) copy-on-write when those fields need to change rather than
    incrementally updating them in place.

This made the observed behavior occur slightly less often.

(2)

If anything about how the runLocal method for node-local alias check
logic was ever flawed, there was no fallback option. Those checks are
purely edge-triggered and failure to properly notice a single edge
transition would leave the alias check incorrect until the next flap of
the aliased check.

The change was to introduce a fallback timer to act as a control loop to
double check the alias check matches the aliased check every minute
(borrowing the duration from the non-local alias check logic body).

This made the observed behavior eventually go away when it did occur.

(3)

Originally I thought there were two main actions involved in the data race:

A. The act of adding the original check (from disk recovery) and its
   first health evaluation.

B. The act of the HTTP API requests coming in and resetting the local
   state when re-registering the same services and checks.

It took awhile for me to realize that there's a third action at work:

C. The goroutines associated with the original check and the later
   checks.

The actual sequence of actions that was causing the bad behavior was
that the API actions result in the original check to be removed and
re-added _without waiting for the original goroutine to terminate_. This
means for brief windows of time during check definition edits there are
two goroutines that can be sending updates for the alias check status.

In extremely unlikely scenarios the original goroutine sees the aliased
check start up in `critical` before being removed but does not get the
notification about the nearly immediate update of that check to
`passing`.

This is interlaced wit the new goroutine coming up, initializing its
base case to `passing` from the current state and then listening for new
notifications of edge triggers.

If the original goroutine "finishes" its update, it then commits one
more write into the local state of `critical` and exits leaving the
alias check no longer reflecting the underlying check.

The correction here is to enforce that the old goroutines must terminate
before spawning the new one for alias checks.
2019-05-24 13:36:56 -05:00
.circleci Revert "Exclude non-go workflows while testing" 2019-05-21 19:17:39 -06:00
.github remove remaining references to govendor and vendorfmt (#5587) 2019-04-01 09:55:48 -05:00
acl Fix to prevent allowing recursive KV deletions when we shouldn’t 2019-05-22 20:13:30 +00:00
agent agent: fix several data races and bugs related to node-local alias checks (#5876) 2019-05-24 13:36:56 -05:00
api api: go mod tidy 2019-05-08 13:26:07 -05:00
bench Gets benchmarks running again and does a rough pass for 0.7.1. 2016-11-29 13:02:26 -08:00
build-support ui: fix 'make ui' again (#5751) 2019-04-30 12:23:24 -05:00
command Do not trigger update check when in dev mode 2019-05-07 09:15:34 -06:00
connect Move the watch package into the api module (#5664) 2019-04-26 12:33:01 -04:00
demo demo: Added udp port forwarding 2018-05-30 13:56:56 +09:00
ipaddr New config parser, HCL support, multiple bind addrs (#3480) 2017-09-25 11:40:42 -07:00
lib Allow MapWalk to handle []interface{} elements that are []uint8 (#5800) 2019-05-07 11:40:48 -04:00
logger Enforce log level filter for log files 2019-04-11 10:04:28 -06:00
sdk Centralized Config CLI (#5731) 2019-04-30 16:27:16 -07:00
sentinel Update to use a consulent build tag instead of just ent (#5759) 2019-05-01 11:11:27 -04:00
service_os Changes made : 2018-06-28 21:18:14 -04:00
snapshot Move internal/ to sdk/ (#5568) 2019-03-27 08:54:56 -04:00
terraform terraform: remove modules in repo (#5085) 2019-04-04 16:31:43 -07:00
test Envoy integration test improvements (#5797) 2019-05-21 14:17:41 +01:00
testrpc Move internal/ to sdk/ (#5568) 2019-03-27 08:54:56 -04:00
tlsutil agent: enable reloading of tls config (#5419) 2019-03-13 10:29:06 +01:00
types Removes remoteConsuls in favor of the new router. 2017-03-16 16:42:19 -07:00
ui-v2 ui: Adds tick whilst editing the link template in the Settings area (#5820) 2019-05-17 12:33:12 +01:00
vendor vendor: update memberlist 2019-05-15 11:10:40 -07:00
version Putting source back into Dev Mode 2019-05-23 12:03:07 -07:00
website Release v1.5.1 2019-05-22 20:19:12 +00:00
.dockerignore Update the scripting 2018-06-14 21:42:47 -04:00
.gitignore Remove old UI, option to use it, and its build processes 2019-04-12 09:02:27 -06:00
.travis.yml remove reference to deleted branch 2019-04-26 15:39:57 -05:00
CHANGELOG.md Update CHANGELOG.md 2019-05-24 16:51:44 +02:00
GNUmakefile Connect: allow configuring Envoy for L7 Observability (#5558) 2019-04-29 17:27:57 +01:00
INTERNALS.md docs: correct link to top level agent package (#4750) 2018-10-04 09:15:55 -05:00
LICENSE Initial commit 2013-11-04 14:15:27 -08:00
NOTICE.md add copyright notice file 2018-07-09 10:58:26 -07:00
README.md Contribution guide (#4704) 2018-10-05 09:06:40 -07:00
Vagrantfile Adds a basic Linux Vagrant setup, stolen from Nomad. 2017-10-06 08:10:12 -07:00
go.mod vendor: update memberlist 2019-05-15 11:10:40 -07:00
go.sum vendor: update memberlist 2019-05-15 11:10:40 -07:00
main.go Added Side Effect import for Windows Service 2018-06-18 14:55:11 -04:00
main_test.go Adding basic CLI infrastructure 2013-12-19 11:22:08 -08:00

README.md

Consul Build Status Join the chat at https://gitter.im/hashicorp-consul/Lobby

Consul is a tool for service discovery and configuration. Consul is distributed, highly available, and extremely scalable.

Consul provides several key features:

  • Service Discovery - Consul makes it simple for services to register themselves and to discover other services via a DNS or HTTP interface. External services such as SaaS providers can be registered as well.

  • Health Checking - Health Checking enables Consul to quickly alert operators about any issues in a cluster. The integration with service discovery prevents routing traffic to unhealthy hosts and enables service level circuit breakers.

  • Key/Value Storage - A flexible key/value store enables storing dynamic configuration, feature flagging, coordination, leader election and more. The simple HTTP API makes it easy to use anywhere.

  • Multi-Datacenter - Consul is built to be datacenter aware, and can support any number of regions without complex configuration.

  • Service Segmentation - Consul Connect enables secure service-to-service communication with automatic TLS encryption and identity-based authorization.

Consul runs on Linux, Mac OS X, FreeBSD, Solaris, and Windows. A commercial version called Consul Enterprise is also available.

Please note: We take Consul's security and our users' trust very seriously. If you believe you have found a security issue in Consul, please responsibly disclose by contacting us at security@hashicorp.com.

Quick Start

An extensive quick start is viewable on the Consul website:

https://www.consul.io/intro/getting-started/install.html

Documentation

Full, comprehensive documentation is viewable on the Consul website:

https://www.consul.io/docs

Contributing

Thank you for your interest in contributing! Please refer to CONTRIBUTING.md for guidance.