layout | page_title | sidebar_current | description |
---|---|---|---|
docs | Anti-Entropy | docs-internals-anti-entropy | This section details the process and use of anti-entropy in Consul. |
Anti-Entropy
Consul uses anti-entropy to keep the service and health information held by its agents in sync with the catalog. This page details how services and checks are registered, how the catalog is populated, and how health status information is updated as it changes.
~> Advanced Topic! This page covers technical details of the internals of Consul. You don't need to know these details to effectively operate and use Consul. These details are documented here for those who wish to learn about them without having to go spelunking through the source code.
Components
It is important to first understand the moving pieces involved in services and health checks: the agent and the catalog. These are described conceptually below to make anti-entropy easier to understand.
Agent
Each Consul agent maintains its own set of service and check registrations, as well as health information. The agents are responsible for executing their own health checks and updating their local state.
Services and checks within the context of an agent have a rich set of configuration options available. This is because the agent is responsible for generating information about its services and their health through the use of health checks.
Catalog
Consul's service discovery is backed by a service catalog. This catalog is formed by aggregating information submitted by the agents. The catalog maintains the high-level view of the cluster, including which services are available, which nodes run those services, health information, and more. The catalog is used to expose this information via the various interfaces Consul provides, including DNS and HTTP.
Services and checks within the context of the catalog have a much more limited set of fields when compared with the agent. This is because the catalog is only responsible for recording and returning information about services, nodes, and health.
The catalog is maintained only by server nodes. This is because the catalog is replicated via the Raft log to provide a consolidated and consistent view of the cluster.
Anti-Entropy
Now that we have covered the functions of the agent and the catalog, we need a mechanism to perform synchronization between the two. In Consul, this mechanism is known as anti-entropy.
Anti-entropy is a synchronization of the local agent state and the catalog. For example, when a user registers a new service or check with the agent, the agent in turn notifies the catalog that this new service or check exists. Similarly, when a service or check is deleted from the agent, it is consequently removed from the catalog as well.
Anti-entropy is also used to update availability information. As agents run their health checks, the status of a check may change, in which case the new status is synced to the catalog. Using this information, the catalog can respond intelligently to queries about its nodes and services based on their availability.
During this synchronization, the catalog is also checked for correctness. If any services or checks exist in the catalog that the agent is not aware of, they will be automatically removed to make the catalog reflect the proper set of services and health information for that agent.
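The reconciliation described above can be sketched as a comparison between two maps. This is a minimal illustration in Python, not Consul's implementation: `SyncPlan`, `plan_sync`, and the dictionary shapes are hypothetical names chosen for the example, and the only assumption carried over from the text is that the agent is authoritative for its own services and checks.

```python
from dataclasses import dataclass, field


@dataclass
class SyncPlan:
    """Operations needed to make the catalog match the agent (hypothetical type)."""
    register: list = field(default_factory=list)    # new or changed entries to push
    deregister: list = field(default_factory=list)  # catalog entries unknown to the agent


def plan_sync(local: dict, catalog: dict) -> SyncPlan:
    """Compare agent state with this node's catalog entries.

    Both arguments map a service or check ID to its current definition
    or status. The agent's view is treated as authoritative.
    """
    plan = SyncPlan()
    for entry_id, value in local.items():
        # Missing from the catalog, or present with stale data: push it.
        if entry_id not in catalog or catalog[entry_id] != value:
            plan.register.append(entry_id)
    for entry_id in catalog:
        # Present in the catalog but unknown to the agent: remove it.
        if entry_id not in local:
            plan.deregister.append(entry_id)
    return plan


# Example: one check changed status, one stale entry lingers in the catalog.
local = {"web": "passing", "db": "critical"}
catalog = {"web": "passing", "db": "passing", "old-cache": "passing"}
plan = plan_sync(local, catalog)
print(plan.register)    # ['db']
print(plan.deregister)  # ['old-cache']
```

Computing a plan first and applying it afterwards keeps the comparison logic free of network calls, which makes it easy to test in isolation.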
Periodic Synchronization
In addition to running when changes to the agent occur, anti-entropy is also a long-running process which periodically wakes up to sync service and check status to the catalog. This ensures that the catalog closely matches the agent's true state.
The amount of time between periodic anti-entropy runs will vary based on cluster size to avoid saturation. The table below describes the periodic sync times and how they change as the Consul cluster grows.
Cluster Size | Periodic Sync Interval |
---|---|
1 - 128 | 15 seconds |
129 - 256 | 30 seconds |
257 - 512 | 45 seconds |
513 - 1024 | 1 minute |
... | ... |
The intervals above are approximate. Each Consul agent will choose a randomly staggered start time within the interval window to avoid a thundering herd.
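The rows in the table are consistent with an interval that grows by one base interval each time the cluster size doubles past 128 nodes. The sketch below, in Python, works under that assumption. The function names and constants are illustrative and not taken from Consul's source, and any value beyond the 1024-node row is an extrapolation that the table does not confirm. The stagger is modeled as a uniformly random delay within the interval, which is one simple way to realize the behavior described above.

```python
import random

BASE_INTERVAL_SECONDS = 15   # interval for the smallest clusters in the table
SCALE_THRESHOLD = 128        # largest cluster size that uses the base interval


def sync_interval(cluster_size: int) -> int:
    """Return the approximate periodic sync interval in seconds."""
    factor = 1
    threshold = SCALE_THRESHOLD
    # Each doubling of the cluster beyond the threshold adds one base interval.
    while cluster_size > threshold:
        threshold *= 2
        factor += 1
    return BASE_INTERVAL_SECONDS * factor


def staggered_start(interval: float, rng: random.Random) -> float:
    """Pick a random delay in [0, interval] before the first sync (assumed uniform)."""
    return rng.uniform(0, interval)


for size in (100, 129, 512, 1024):
    print(size, sync_interval(size))

# Each agent would draw its own delay so that syncs spread across the window.
print(staggered_start(sync_interval(300), random.Random()))
```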
Best-Effort Sync
Anti-entropy can fail for a number of reasons, including misconfiguration of the agent or its operating environment, I/O problems (full disk, filesystem permissions, etc.), and networking problems (the agent cannot communicate with a server). Because of this, the agent attempts to sync in a best-effort fashion.
If an error is encountered during an anti-entropy run, the error is logged and the agent continues to run. The resulting mismatch between the agent and the catalog may then be reconciled by a later periodic anti-entropy run.
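The log-and-continue behavior can be illustrated with a small wrapper. This is a sketch in Python rather than Consul's code: `run_sync_best_effort` and `flaky_sync` are hypothetical names, and the simulated failure stands in for any of the error cases listed above.

```python
import logging

logger = logging.getLogger("anti_entropy_sketch")


def run_sync_best_effort(sync_once) -> bool:
    """Run one sync attempt; log and swallow any failure.

    Returns True on success and False on failure, so the caller can
    simply wait for the next periodic run instead of stopping.
    """
    try:
        sync_once()
        return True
    except Exception as exc:  # best effort: any failure is non-fatal here
        logger.error("anti-entropy sync failed: %s", exc)
        return False


attempts = []


def flaky_sync():
    """Hypothetical sync that fails on its first call only."""
    attempts.append(1)
    if len(attempts) == 1:
        raise ConnectionError("agent cannot reach server")


# The first run fails and is logged; the next periodic run succeeds.
results = [run_sync_best_effort(flaky_sync) for _ in range(2)]
print(results)  # [False, True]
```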