Jakub Sokołowski a0f4274691
config: move generate main file from variable
This way we can provide it all in `group_vars` for given fleet.

Part of:
https://github.com/status-im/infra-hq/issues/178

Signed-off-by: Jakub Sokołowski <jakub@status.im>
2025-02-06 23:12:54 +01:00
2020-12-07 12:38:42 +01:00

Description

This role configures AlertManager to notify people when thresholds defined by rules in the Prometheus master instance are breached.

Configuration

The bare minimum should be:

alertmanager_domain: 'alerts.example.org'
alertmanager_config:
  global:
    smtp_from:          'alerts@example.org'
    smtp_smarthost:     'smtp.mail.example.org'
    smtp_auth_username: 'secret-smtp-user'
    smtp_auth_password: 'secret-smtp-pass'
    smtp_require_tls:   true

  receivers:
    - name: 'admin-email'
      email_configs:
        - to: 'admin@example.org'
          send_resolved: true

To use VictorOps you will need to create an alert-manager routing rule:

alertmanager_config:
  receivers:
    - name: 'victorops-alerts-critical'
      victorops_configs:
        - message_type:    'CRITICAL'
          routing_key:     'alert-manager'
          monitoring_tool: 'Prometheus'
          entity_display_name: >-
            {% raw %}
            {{ .CommonLabels.datacenter }}.{{ .GroupLabels.fleet }} ({{ .GroupLabels.alertname }})
            {% endraw %}

There is also an optional OAuth Proxy configuration:

alertmanager_oauth_id: '123qwe123qwe'
alertmanager_oauth_secret: '123qwe123qwe123qwe123qwe'
alertmanager_oauth_cookie_secret: '123qwe'
alertmanager_oauth_gh_org: 'my-gh-org'

Management

You can manage existing alerts using amtool on any of the hosts running AlertManager:

 > amtool alert
Alertname        Starts At                Summary
Test_Alert       2018-07-06 18:30:18 UTC  This is a testing alert!
 > amtool silence
ID                                    Matchers                Starts At                Ends At                  Updated At               Created By  Comment  
9635b573-5177-4601-a3b0-ac6a25d0a4ef  alertname=InstanceDown  2018-07-06 12:37:04 UTC  2018-07-06 14:36:05 UTC  2018-07-06 12:37:04 UTC  jakubgs     test

Details

AlertManager runs in a cluster to achieve high availability. The peers connect via WireGuard VPN. The service listens on :9093, and the Prometheus instance connects to that port via the VPN to inform it of threshold breaches.
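
To illustrate the Prometheus side of that connection, here is a minimal sketch of the alerting block in prometheus.yml. It is not part of this role, and the peer addresses are hypothetical WireGuard IPs; substitute the VPN addresses of your own AlertManager hosts. Listing every cluster peer lets Prometheus keep delivering alerts if one of them is down.

```yaml
# prometheus.yml fragment (managed outside this role)
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            # Hypothetical WireGuard addresses of the AlertManager peers.
            - '10.10.0.11:9093'
            - '10.10.0.12:9093'
```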

The main configuration resides in templates/alertmanager.yml.j2. It configures all the receivers of alerts generated by the Prometheus master instance.

There are three main sections:

  • global - Configures general auth-related options for SMTP and Slack receivers.
  • receivers - Defines destinations of alerts which can be used in the route section.
  • route - Defines rules based on which alerts are directed to defined receivers.
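
As a rough sketch of how the route section ties the other two together, the fragment below sends everything to the admin-email receiver by default and forwards critical alerts to the victorops-alerts-critical receiver. It assumes that route is passed under alertmanager_config in the same way as global and receivers, and that your alert rules set a severity label; neither is confirmed by this role, so adjust as needed.

```yaml
alertmanager_config:
  route:
    # Default receiver for alerts that match no child route.
    receiver: 'admin-email'
    # Labels listed here are what templates see as .GroupLabels.
    group_by: ['alertname', 'fleet']
    routes:
      # Assumes rules set a 'severity' label. 'match' is the older
      # syntax; newer AlertManager releases prefer 'matchers'.
      - match:
          severity: 'critical'
        receiver: 'victorops-alerts-critical'
```

Grouping by alertname and fleet matters for the VictorOps example: entity_display_name reads .GroupLabels.fleet and .GroupLabels.alertname, which are only populated for labels named in group_by.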

For more details see: https://prometheus.io/docs/alerting/configuration/
