---
layout: docs
page_title: 'Consul vs. Nagios'
description: >-
  Nagios is a tool built for monitoring. It is used to quickly
  notify operators when an issue occurs.
---

# Consul vs. Nagios

Nagios is a tool built for monitoring. It is used to quickly notify
operators when an issue occurs.

Nagios uses a group of central servers that are configured to perform
checks on remote hosts. This design makes it difficult to scale Nagios,
as large fleets quickly reach the limit of vertical scaling, and Nagios
does not easily scale horizontally. Nagios is also notoriously
difficult to use with modern DevOps and configuration management tools,
as local configurations must be updated when remote servers are added
or removed.

Consul provides the same health checking abilities as Nagios,
is friendly to modern DevOps, and avoids the inherent scaling issues.
Consul runs all checks locally, which avoids placing a burden on central servers.
The status of checks is maintained by the Consul servers, which are fault
tolerant and have no single point of failure. Lastly, Consul can scale to
vastly more checks because it relies on edge-triggered updates. This means
that an update is only triggered when a check transitions from "passing"
to "failing" or vice versa.

In a large fleet, the majority of checks are passing, and even the minority
that are failing tend to stay failing. By capturing changes only, Consul reduces
the amount of networking and compute resources used by the health checks,
allowing the system to be much more scalable.
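
The edge-triggered idea can be illustrated with a minimal Python sketch. This is
not Consul's implementation: `EdgeTriggeredReporter`, `observe`, and the `sent`
list are hypothetical names, and the list merely stands in for updates that an
agent would send to the servers. The point is only that repeated identical
results produce no traffic.

```python
from dataclasses import dataclass, field


@dataclass
class EdgeTriggeredReporter:
    """Hypothetical agent-side reporter: forwards a result only when the status changes."""

    last_status: dict = field(default_factory=dict)  # check name -> last reported status
    sent: list = field(default_factory=list)         # stand-in for updates sent to servers

    def observe(self, check: str, status: str) -> bool:
        """Record a locally run check result; return True if an update was sent."""
        if self.last_status.get(check) == status:
            return False  # steady state: nothing is sent
        self.last_status[check] = status
        self.sent.append((check, status))
        return True


reporter = EdgeTriggeredReporter()
results = ["passing", "passing", "failing", "failing", "failing", "passing"]
for status in results:
    reporter.observe("web", status)

# Six check runs, but only the three transitions were reported.
print(reporter.sent)
```

In this sketch the first observation counts as a transition, since the reporter
has no prior status for the check.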

An astute reader may notice that if a Consul agent dies, then no edge-triggered
updates will occur. From the perspective of other nodes, all checks will appear
to be in a steady state. However, Consul guards against this as well. The
[gossip protocol](/docs/internals/gossip) used between clients and servers
integrates a distributed failure detector. This means that if a Consul agent fails,
the failure will be detected, and thus all checks being run by that node can be
assumed failed. This failure detector distributes the work among the entire cluster
while, most importantly, enabling the edge-triggered architecture to work.
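
The consequence of a detected agent failure can be sketched in Python as well.
This does not model the gossip protocol itself; it assumes some membership layer
has already declared a node dead. `Catalog`, `update`, and `on_node_failed` are
hypothetical names chosen for illustration, not Consul APIs.

```python
from collections import defaultdict


class Catalog:
    """Hypothetical server-side view of check status, keyed by node."""

    def __init__(self):
        self.checks = defaultdict(dict)  # node -> {check name -> status}

    def update(self, node, check, status):
        """Apply an edge-triggered update sent by a live agent."""
        self.checks[node][check] = status

    def on_node_failed(self, node):
        """Handle a node that the failure detector has declared dead.

        The agent can no longer send updates, so its last-known statuses
        are stale; every check it was running is treated as failing.
        """
        for check in self.checks[node]:
            self.checks[node][check] = "failing"


catalog = Catalog()
catalog.update("node-a", "web", "passing")
catalog.update("node-a", "disk", "passing")
catalog.update("node-b", "web", "passing")

# Assumption: in a real cluster this signal would come from the gossip layer.
catalog.on_node_failed("node-a")
print(dict(catalog.checks))
```

Without the failure signal, the checks on `node-a` would keep their last
reported "passing" status indefinitely, which is the gap described above.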