consul/website/content/docs/intro/vs/nagios.mdx

44 lines
2.1 KiB
Plaintext

---
layout: docs
page_title: 'Consul vs. Nagios'
description: >-
Nagios is a tool built for monitoring. It is used to quickly
notify operators when an issue occurs.
---
# Consul vs. Nagios
Nagios is a tool built for monitoring. It is used to quickly notify
operators when an issue occurs.
Nagios uses a group of central servers that are configured to perform
checks on remote hosts. This design makes it difficult to scale Nagios,
as large fleets quickly reach the limit of vertical scaling, and Nagios
does not easily scale horizontally. Nagios is also notoriously
difficult to use with modern DevOps and configuration management tools,
as local configurations must be updated when remote servers are added
or removed.
Consul provides the same health checking abilities as Nagios,
is friendly to modern DevOps, and avoids the inherent scaling issues.
Consul runs all checks locally, avoiding placing a burden on central servers.
The status of checks is maintained by the Consul servers, which are fault
tolerant and have no single point of failure. Lastly, Consul can scale to
vastly more checks because it relies on edge-triggered updates. This means
that an update is only triggered when a check transitions from "passing"
to "failing" or vice versa.
In a large fleet, the majority of checks are passing, and even the minority
that are failing are persistent. By capturing changes only, Consul reduces
the amount of networking and compute resources used by the health checks,
allowing the system to be much more scalable.
An astute reader may notice that if a Consul agent dies, then no edge triggered
updates will occur. From the perspective of other nodes, all checks will appear
to be in a steady state. However, Consul guards against this as well. The
[gossip protocol](/docs/architecture/gossip) used between clients and servers
integrates a distributed failure detector. This means that if a Consul agent fails,
the failure will be detected, and thus all checks being run by that node can be
assumed failed. This failure detector distributes the work among the entire cluster
while, most importantly, enabling the edge triggered architecture to work.