Ansible role for SMART disk metrics https://github.com/prometheus-community/smartctl_exporter
Jakub Sokołowski 76aba3c459
service: restart when binary version changes
Signed-off-by: Jakub Sokołowski <jakub@status.im>
2024-05-28 10:12:30 +02:00
defaults service: upgrade from 0.9.1 to 0.12.0 2024-05-28 10:08:54 +02:00
handlers add consuls service defintion, use handlers 2023-03-17 16:22:48 +01:00
meta add Ansible metadata 2023-03-17 16:23:21 +01:00
tasks service: restart when binary version changes 2024-05-28 10:12:30 +02:00
templates consul: allow disabling consul healthcheck 2023-05-15 19:49:12 +02:00
README.md add basic README file 2023-03-17 16:12:57 +01:00


Description

This role configures the smartctl_exporter tool to export SMART metrics for the hard drives on the host.

Configuration

The defaults are sane, but a basic config could include:

smart_metrics_log_level: 'debug'
smart_metrics_log_format: 'logfmt'
smart_metrics_refresh_interval: '60s'
smart_metrics_listen_port: 9633
smart_metrics_listen_address: '0.0.0.0'
smart_metrics_telemetry_path: '/metrics'

Optionally, you can specify a list of devices to monitor:

smart_metrics_smartctl_devices: ['/dev/sda']

Or use include and exclude patterns matched against device names:

smart_metrics_smartctl_devices_exclude: 'vd.*'
smart_metrics_smartctl_devices_include: 'sd.*'

Usage

You can query the /metrics endpoint directly:

 > curl -s localhost:9633/metrics | grep smartctl_device_media_errors
# HELP smartctl_device_media_errors Contains the number of occurrences where the controller detected an unrecovered data integrity error. Errors such as uncorrectable ECC, CRC checksum failure, or LBA tag mismatch are included in this field
# TYPE smartctl_device_media_errors counter
smartctl_device_media_errors{device="sda"} 0
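
The output above could also feed a simple scripted check. The following is a minimal sketch, not part of this role: `media_error_devices` is a hypothetical helper name, and it assumes each sample line ends with the metric value (no trailing timestamp), as in the example output.

```shell
# Hypothetical helper: print "<device> <count>" for every device whose
# smartctl_device_media_errors value is greater than zero.
# Reads Prometheus text-format metrics on stdin.
media_error_devices() {
  awk '
    /^#/ { next }   # skip HELP and TYPE comment lines
    /^smartctl_device_media_errors[{]/ && $NF + 0 > 0 {
      if (match($0, /device="[^"]*"/)) {
        # drop the leading device=" (8 characters) and the closing quote
        print substr($0, RSTART + 8, RLENGTH - 9), $NF
      }
    }
  '
}
```

Assuming the exporter listens on the port shown above, it could be used as `curl -s localhost:9633/metrics | media_error_devices`. It prints nothing when every device reports zero media errors.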