consul/website/source/docs/internals/gossip.html.markdown
2016-09-14 10:09:23 -07:00

layout: docs
page_title: Gossip Protocol
sidebar_current: docs-internals-gossip
description: Consul uses a gossip protocol to manage membership and broadcast messages to the cluster. All of this is provided through the use of the Serf library. The gossip protocol used by Serf is based on SWIM: Scalable Weakly-consistent Infection-style Process Group Membership Protocol, with a few minor adaptations.

Gossip Protocol

Consul uses a gossip protocol to manage membership and broadcast messages to the cluster. All of this is provided through the use of the Serf library. The gossip protocol used by Serf is based on "SWIM: Scalable Weakly-consistent Infection-style Process Group Membership Protocol", with a few minor adaptations. Serf's own documentation describes its protocol in more detail.
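To give a feel for what "infection-style" dissemination means, here is a minimal simulation in Go. It is only a sketch under simplifying assumptions: the function name `spreadRounds`, the fixed fanout, and the round-based model are all invented for illustration and do not reflect how Serf actually schedules or piggybacks its messages.

```go
package main

import (
	"fmt"
	"math/rand"
)

// spreadRounds simulates infection-style dissemination among n nodes
// (n >= 1, fanout >= 1). Each round, every node that held the message at
// the start of the round forwards it to `fanout` peers chosen uniformly at
// random. It returns the number of rounds until every node has the message.
func spreadRounds(n, fanout int, seed int64) int {
	rng := rand.New(rand.NewSource(seed)) // seeded so runs are repeatable
	informed := make([]bool, n)
	informed[0] = true // node 0 originates the broadcast
	count, rounds := 1, 0
	for count < n {
		rounds++
		senders := count // nodes informed this round start sending next round
		for s := 0; s < senders; s++ {
			for k := 0; k < fanout; k++ {
				peer := rng.Intn(n)
				if !informed[peer] {
					informed[peer] = true
					count++
				}
			}
		}
	}
	return rounds
}

func main() {
	for _, n := range []int{10, 100, 1000} {
		fmt.Printf("nodes=%d rounds=%d\n", n, spreadRounds(n, 3, 1))
	}
}
```

In this toy model the number of informed nodes can grow by at most a factor of `fanout + 1` per round, so the round count grows roughly with the logarithm of the cluster size rather than with the cluster size itself.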

~> Advanced Topic! This page covers technical details of the internals of Consul. You don't need to know these details to effectively operate and use Consul. These details are documented here for those who wish to learn about them without having to go spelunking through the source code.

Gossip in Consul

Consul makes use of two different gossip pools, referred to as the LAN pool and the WAN pool. Each datacenter that Consul operates in has a LAN gossip pool containing all members of the datacenter, both clients and servers. The LAN pool is used for a few purposes. First, membership information allows clients to automatically discover servers, reducing the amount of configuration needed. Second, distributed failure detection allows the work of failure detection to be shared by the entire cluster instead of concentrated on a few servers. Lastly, the gossip pool allows for reliable and fast event broadcasts for events like leader election.
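The first of these purposes, server discovery from membership information, can be sketched roughly as follows. The `Member` type, its fields, the role labels, and the addresses are hypothetical and chosen only for this example; they are not Consul's or Serf's actual data structures.

```go
package main

import "fmt"

// Member is a hypothetical, simplified entry in the LAN pool's member list.
type Member struct {
	Name  string
	Addr  string
	Role  string // "server" or "client" (labels assumed for this sketch)
	Alive bool   // result of the pool's failure detection
}

// discoverServers returns the addresses of all members that are servers
// and are currently considered alive, in member-list order.
func discoverServers(members []Member) []string {
	var addrs []string
	for _, m := range members {
		if m.Role == "server" && m.Alive {
			addrs = append(addrs, m.Addr)
		}
	}
	return addrs
}

func main() {
	// Made-up member list, as a client might see it after joining the pool.
	members := []Member{
		{Name: "server-1", Addr: "10.0.0.1", Role: "server", Alive: true},
		{Name: "server-2", Addr: "10.0.0.2", Role: "server", Alive: false},
		{Name: "client-1", Addr: "10.0.0.10", Role: "client", Alive: true},
		{Name: "server-3", Addr: "10.0.0.3", Role: "server", Alive: true},
	}
	fmt.Println(discoverServers(members))
}
```

The point of the sketch is that a client needs no static list of servers: once it has joined the pool, the member list alone tells it which servers exist and which of them are reachable.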

The WAN pool is globally unique, as all servers should participate in the WAN pool regardless of datacenter. Membership information provided by the WAN pool allows servers to perform cross-datacenter requests. The integrated failure detection allows Consul to gracefully handle the loss of connectivity to an entire datacenter, or to just a single server in a remote datacenter.
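As a rough illustration of how WAN membership and failure detection could combine for a cross-datacenter request, consider the sketch below. The `WANServer` type, the `pickRemoteServer` function, and the datacenter names are assumptions made for this example and are not taken from Consul's source.

```go
package main

import (
	"errors"
	"fmt"
)

// WANServer is a hypothetical, simplified entry in the WAN pool's member list.
type WANServer struct {
	Name       string
	Datacenter string
	Addr       string
	Alive      bool // result of the pool's failure detection
}

var errNoServer = errors.New("no reachable server in datacenter")

// pickRemoteServer returns the address of the first alive server in the
// requested datacenter, or an error if none is known or all have failed.
func pickRemoteServer(pool []WANServer, dc string) (string, error) {
	for _, s := range pool {
		if s.Datacenter == dc && s.Alive {
			return s.Addr, nil
		}
	}
	return "", errNoServer
}

func main() {
	pool := []WANServer{
		{Name: "a1", Datacenter: "dc1", Addr: "10.1.0.1", Alive: true},
		{Name: "b1", Datacenter: "dc2", Addr: "10.2.0.1", Alive: false},
		{Name: "b2", Datacenter: "dc2", Addr: "10.2.0.2", Alive: true},
		{Name: "c1", Datacenter: "dc3", Addr: "10.3.0.1", Alive: false},
	}
	for _, dc := range []string{"dc2", "dc3"} {
		addr, err := pickRemoteServer(pool, dc)
		fmt.Printf("dc=%s addr=%q err=%v\n", dc, addr, err)
	}
}
```

Here a single failed server in `dc2` is skipped in favor of a healthy one, while `dc3`, where every known server has failed, yields an error that the caller can handle instead of waiting on an unreachable datacenter.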

All of these features are provided by Serf, which Consul uses as an embedded library. From a user's perspective this is not important, since Consul should mask the abstraction. As a developer, however, it can be useful to understand how the library is used.

Lifeguard Enhancements

Consul's gossip protocol, shared with HashiCorp's Serf, forms the basis of its powerful distributed failure detector. Unfortunately, the algorithm on which it is based assumes that the local node is healthy, in the sense that soft real-time processing of packets is possible. In Consul versions prior to 0.7, a severely degraded node can therefore sometimes falsely accuse others of having failed, causing occasional flaps in the serfHealth check status for one or more healthy nodes. Consul 0.7 adds two Serf Lifeguard enhancements to this algorithm to improve its behavior in the presence of degraded nodes.
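One way to picture the kind of fix this problem calls for is a node that keeps a score of its own health and becomes slower to accuse others as that score worsens. The Go sketch below illustrates that idea only. The `awareness` type, its methods, the maximum of 8, and the 500 ms base timeout are assumptions for the example, not a description of Serf's implementation or its defaults.

```go
package main

import (
	"fmt"
	"time"
)

// awareness is a hypothetical local health score: 0 means the node believes
// it is healthy; higher values mean it suspects that it is itself degraded.
type awareness struct {
	max   int // score is clamped to the range [0, max-1]
	score int
}

// applyDelta adjusts the score, for example +1 when a probe this node sent
// went unacknowledged and -1 when a probe succeeded, and clamps the result.
func (a *awareness) applyDelta(delta int) {
	a.score += delta
	if a.score < 0 {
		a.score = 0
	}
	if a.score > a.max-1 {
		a.score = a.max - 1
	}
}

// scaleTimeout stretches a base timeout in proportion to the score, so a
// node that doubts its own health waits longer before accusing a peer.
func (a *awareness) scaleTimeout(base time.Duration) time.Duration {
	return base * time.Duration(a.score+1)
}

func main() {
	a := &awareness{max: 8}
	base := 500 * time.Millisecond

	fmt.Println("healthy:", a.scaleTimeout(base))
	a.applyDelta(2) // two probes went unacknowledged
	fmt.Println("degraded:", a.scaleTimeout(base))
	a.applyDelta(-1) // one probe succeeded again
	fmt.Println("recovering:", a.scaleTimeout(base))
}
```

With a healthy score the timeout is unchanged, so well-behaved nodes detect failures as quickly as before; only a node that has evidence of its own slowness gives its peers extra time.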

Please see Serf's gossip protocol guide for more details.