diff --git a/website/source/docs/internals/anti-entropy.html.markdown b/website/source/docs/internals/anti-entropy.html.markdown
new file mode 100644
index 0000000000..8f9e9469a1
--- /dev/null
+++ b/website/source/docs/internals/anti-entropy.html.markdown
@@ -0,0 +1,134 @@
+---
+layout: "docs"
+page_title: "Anti-Entropy"
+sidebar_current: "docs-internals-anti-entropy"
+description: >
+  This section details the process and use of anti-entropy in Consul.
+---
+
+# Anti-Entropy
+
+Consul uses an anti-entropy mechanism to keep the service and health
+information held by each agent in sync with the cluster-wide catalog. This page
+details how services and checks are registered, how the catalog is populated,
+and how health status information is updated as it changes.
+
+~> **Advanced Topic!** This page covers technical details of
+the internals of Consul. You don't need to know these details to effectively
+operate and use Consul. These details are documented here for those who wish
+to learn about them without having to go spelunking through the source code.
+
+### Components
+
+It is important to first understand the moving pieces involved in services and
+health checks: the [agent](#agent) and the [catalog](#catalog). These are
+described conceptually below to make anti-entropy easier to understand.
+
+
+#### Agent
+
+Each Consul agent maintains its own set of service and check registrations as
+well as health information. The agents are responsible for executing their own
+health checks and updating their local state.
+
+Services and checks within the context of an agent have a rich set of
+configuration options available. This is because the agent is responsible for
+generating information about its services and their health through the use of
+[health checks](/docs/agent/checks.html).
+
+
+#### Catalog
+
+Consul's service discovery is backed by a service catalog. This catalog is
+formed by aggregating information submitted by the agents.
+The catalog maintains the high-level view of the cluster, including which
+services are available, which nodes run those services, health information,
+and more. The catalog is used to expose this information via the various
+interfaces Consul provides, including DNS and HTTP.
+
+Services and checks within the context of the catalog have a much more limited
+set of fields when compared with the agent. This is because the catalog is only
+responsible for recording and returning information *about* services, nodes, and
+health.
+
+The catalog is maintained only by server nodes. This is because the catalog is
+replicated via the [Raft log](/docs/internals/consensus.html) to provide a
+consolidated and consistent view of the cluster.
+
+
+### Anti-Entropy
+
+As discussed above, Consul keeps a clear separation between the global service
+catalog and each agent's local state. The two are reconciled by an anti-entropy
+mechanism.
+
+Anti-entropy is the process of synchronizing an agent's local state with the
+catalog. For example, when a user registers a new service or check with the
+agent, the agent in turn notifies the catalog that the new service or check
+exists. Similarly, when a check is deleted from the agent, it is removed from
+the catalog as well.
+
+Anti-entropy is also used to update availability information. As agents run
+their health checks, a check's status may change; the new status is then
+synced to the catalog. Using this information, the catalog can answer queries
+about its nodes and services based on their availability.
+
+During this synchronization, the catalog is also checked for correctness. If
+any services or checks exist in the catalog that the agent is not aware of, they
+are automatically removed so that the catalog reflects the proper set of
+services and health information for that agent.
+Consul treats the state of the agent as authoritative: if the agent and catalog
+views differ, the agent's local view is always used.
+
+### Periodic Synchronization
+
+In addition to running when changes to the agent occur, anti-entropy is also a
+long-running process that periodically wakes up to sync service and check
+status to the catalog. This ensures that the catalog closely matches the agent's
+true state. It also allows Consul to re-populate the service catalog even in
+the case of complete data loss.
+
+The amount of time between periodic anti-entropy runs varies with cluster size
+to avoid saturating the servers with sync requests. The table below describes
+the periodic sync times and how they change as the Consul cluster grows.
+
+| Cluster Size | Periodic Sync Interval |
+| ------------ | ---------------------- |
+| 1 - 128      | 1 minute               |
+| 129 - 256    | 2 minutes              |
+| 257 - 512    | 3 minutes              |
+| 513 - 1024   | 4 minutes              |
+| ...          | ...                    |
+
+The intervals above are approximate. Each Consul agent chooses a randomly
+staggered start time within the interval window so that agents do not all sync
+at the same moment (a thundering herd).
+
+### Best-Effort Sync
+
+Anti-entropy can fail in a number of cases, including misconfiguration of the
+agent or its operating environment, I/O problems (full disk, filesystem
+permissions, etc.), and networking problems (the agent cannot communicate with a
+server). Because of this, the agent syncs on a best-effort basis.
+
+If an error is encountered during an anti-entropy run, the error is logged and
+the agent continues to run. Because anti-entropy runs periodically, a later run
+automatically recovers from these kinds of transient failures.
diff --git a/website/source/layouts/docs.erb b/website/source/layouts/docs.erb
index 6285a36160..28c1cbb132 100644
--- a/website/source/layouts/docs.erb
+++ b/website/source/layouts/docs.erb
@@ -46,6 +46,10 @@ ACLs + > + Anti-Entropy + + > Security Model