Ansible role for SMART disk metrics https://github.com/prometheus-community/smartctl_exporter

Latest commit: 76aba3c459 "service: restart when binary version changes", Signed-off-by: Jakub Sokołowski <jakub@status.im>, 2024-05-28 10:12:30 +02:00

Description

This role configures the smartctl_exporter tool to export SMART metrics for the drives on the host.

Configuration

The defaults are sane, but a basic config could include:

smart_metrics_log_level: 'debug'
smart_metrics_log_format: 'logfmt'
smart_metrics_refresh_interval: '60s'
smart_metrics_listen_port: 9633
smart_metrics_listen_address: '0.0.0.0'
smart_metrics_telemetry_path: '/metrics'

Optionally, you can specify a list of devices to monitor:

smart_metrics_smartctl_devices: ['/dev/sda']

Or use include and exclude rules, written as regular expressions:

smart_metrics_smartctl_devices_exclude: 'vd.*'
smart_metrics_smartctl_devices_include: 'sd.*'
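
To preview which devices a pattern would select before deploying it, a rough sketch like the one below may help. It assumes the pattern is applied as an unanchored regular expression to names such as sda; that is an assumption, not something this README confirms. The helpers match_devices and sample_devices are hypothetical and not part of the role. grep -E uses POSIX extended syntax, which agrees with the exporter's regex engine for simple patterns like these but may differ for more complex ones.

```shell
#!/bin/sh
# Hypothetical helper (not part of the role): read device names from stdin,
# one per line, and print those that the given extended regular expression
# matches anywhere in the name. Assumes unanchored matching.
match_devices() {
    pattern="$1"
    # `|| true` keeps the exit status at 0 when nothing matches.
    grep -E -- "$pattern" || true
}

# Made-up sample names for illustration; on a real host something like
# `lsblk -dno NAME` could supply the list instead.
sample_devices() {
    printf '%s\n' sda sdb vda nvme0n1
}

sample_devices | match_devices 'sd.*'   # prints sda and sdb
sample_devices | match_devices 'vd.*'   # prints vda
```

The sketch only shows what each pattern matches on its own. It does not model how the role or the exporter combines the include and exclude options.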

Usage

You can just query the /metrics endpoint:

 > curl -s localhost:9633/metrics | grep smartctl_device_media_errors
# HELP smartctl_device_media_errors Contains the number of occurrences where the controller detected an unrecovered data integrity error. Errors such as uncorrectable ECC, CRC checksum failure, or LBA tag mismatch are included in this field
# TYPE smartctl_device_media_errors counter
smartctl_device_media_errors{device="sda"} 0
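
For a quick health check, the same output can be filtered for devices that report a non-zero error count. The following is a minimal sketch, assuming each sample line ends with its value (no trailing timestamp) and carries a device label as in the output above. The function devices_with_media_errors and the sample values are hypothetical and only for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of the role): read Prometheus text-format
# metrics on stdin and print the device label of every
# smartctl_device_media_errors sample whose value is greater than zero.
devices_with_media_errors() {
    awk '
        /^smartctl_device_media_errors[{]/ {
            value = $NF   # assumes the value is the last field on the line
            if (value + 0 > 0 && match($0, /device="[^"]*"/)) {
                # RSTART/RLENGTH span device="..."; skip the 8-character
                # prefix and drop the closing quote.
                print substr($0, RSTART + 8, RLENGTH - 9)
            }
        }
    '
}

# Made-up sample input; the nvme0 line is invented for illustration.
sample_metrics() {
    cat <<'EOF'
# TYPE smartctl_device_media_errors counter
smartctl_device_media_errors{device="sda"} 0
smartctl_device_media_errors{device="nvme0"} 3
EOF
}

sample_metrics | devices_with_media_errors   # prints nvme0
```

Against a live exporter, the function can be fed from the same command shown above: curl -s localhost:9633/metrics | devices_with_media_errors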