Prometheus Alertmanager: Routing and Silencing 2025

Alertmanager concepts

Alert lifecycle:
1. Prometheus evaluates the alerting rule
2. The rule fires → Prometheus sends the alert to Alertmanager
3. Route: which receiver gets which alert?
4. Grouping: alerts that share the route's group_by labels are combined into one notification
5. Inhibit: if a "root cause" alert is firing, dependent alerts are suppressed
6. Silence: planned maintenance = no notifications
7. Receiver: e-mail / Slack / PagerDuty
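
The grouping step can be illustrated with a minimal Python sketch. It is not Alertmanager's implementation; it only shows the idea that alerts with identical values for the group_by labels end up in the same notification. The function name group_alerts and the sample alerts are hypothetical.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Bucket alerts by the values of the group_by labels.

    A missing label is treated as an empty string, so alerts without
    the label land in the same bucket.
    """
    groups = defaultdict(list)
    for alert in alerts:
        key = tuple(alert["labels"].get(name, "") for name in group_by)
        groups[key].append(alert)
    return dict(groups)


# Hypothetical sample alerts: two instances of the same problem, one unrelated alert.
alerts = [
    {"labels": {"alertname": "HighLatency", "cluster": "eu1", "service": "api", "instance": "a:9100"}},
    {"labels": {"alertname": "HighLatency", "cluster": "eu1", "service": "api", "instance": "b:9100"}},
    {"labels": {"alertname": "DiskFull", "cluster": "eu1", "service": "db", "instance": "c:9100"}},
]

groups = group_alerts(alerts, ["alertname", "cluster", "service"])
for key, members in groups.items():
    print(key, "->", len(members), "alert(s)")
```

Because instance is not part of group_by, the two HighLatency alerts share one group and would arrive as a single notification.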

Complete alertmanager.yml

# alertmanager.yml
global:
  smtp_smarthost: 'mail.firma.de:587'
  smtp_from: 'alertmanager@example.com'           # placeholder address
  smtp_auth_username: 'alertmanager@example.com'  # placeholder address
  smtp_auth_password: 'passwort'
  slack_api_url: 'https://hooks.slack.com/services/...'

templates:
  - '/etc/alertmanager/templates/*.tmpl'

route:
  # Default: everything goes to e-mail
  receiver: 'email-ops'
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h

  routes:
    # Critical alerts: page via PagerDuty immediately
    - matchers:
        - severity="critical"
      receiver: 'pagerduty-ops'
      group_wait: 0s
      continue: false

    # Database alerts: DBA team Slack channel
    - matchers:
        - team="database"
      receiver: 'slack-dba'
      group_interval: 10m

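    # At night: critical alerts only.
    # mute_time_intervals suppresses notifications during the listed intervals;
    # outside them, non-critical alerts are delivered by e-mail as usual.
    - matchers:
        - severity!="critical"
      mute_time_intervals:
        - nights
      receiver: 'email-ops'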

receivers:
  - name: 'email-ops'
    email_configs:
      - to: 'ops@example.com'  # placeholder address
        send_resolved: true

  - name: 'slack-dba'
    slack_configs:
      - channel: '#alerts-database'
        title: '{{ template "slack.title" . }}'
        text: '{{ template "slack.text" . }}'
        send_resolved: true

  - name: 'pagerduty-ops'
    pagerduty_configs:
      - service_key: 'PAGERDUTY_SERVICE_KEY'

  - name: 'blackhole'  # no integrations configured: anything routed here is discarded

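time_intervals:
  - name: nights
    time_intervals:
      # A time range may not cross midnight, so the night is split in two.
      # Times are interpreted as UTC unless a location is set.
      - times:
          - start_time: '22:00'
            end_time: '24:00'
          - start_time: '00:00'
            end_time: '07:00'
        weekdays: ['monday:friday']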

inhibit_rules:
  # If a node is down, suppress warning-level alerts for the same instance
  - source_matchers:
      - alertname="NodeDown"
    target_matchers:
      - severity="warning"
    equal: ['instance']
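
How the route tree picks a receiver can be sketched in Python. This is a simplified model, not Alertmanager's code: it covers only the = and != matchers, ignores time intervals and grouping, and uses a reduced tree with just the critical and database routes. The names matches, resolve and tree are hypothetical. The rule it illustrates: child routes are checked in order, the first match wins unless continue is set, and an alert that matches no child falls back to the parent's receiver.

```python
import operator

# Only the two equality operators are modelled; regex matchers are omitted.
OPS = {"=": operator.eq, "!=": operator.ne}


def matches(route, labels):
    """A route matches when all of its matchers hold (no matchers = match all)."""
    return all(
        OPS[op](labels.get(name, ""), value)
        for name, op, value in route.get("matchers", [])
    )


def resolve(route, labels):
    """Return the receivers an alert with these labels would be sent to."""
    if not matches(route, labels):
        return []
    receivers = []
    for child in route.get("routes", []):
        found = resolve(child, labels)
        receivers.extend(found)
        # Stop at the first matching child unless it sets continue: true.
        if found and not child.get("continue", False):
            break
    # No child matched: this route's own receiver handles the alert.
    return receivers or [route["receiver"]]


tree = {
    "receiver": "email-ops",
    "routes": [
        {"matchers": [("severity", "=", "critical")], "receiver": "pagerduty-ops"},
        {"matchers": [("team", "=", "database")], "receiver": "slack-dba"},
    ],
}

print(resolve(tree, {"severity": "critical", "team": "database"}))
print(resolve(tree, {"severity": "warning", "team": "database"}))
print(resolve(tree, {"severity": "warning"}))
```

In this model a critical database alert reaches only PagerDuty, because the first route matches and does not continue. Setting continue to true on that route would also deliver it to the DBA Slack channel.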

Silences via API

# Create a silence for a maintenance window
curl -X POST http://alertmanager:9093/api/v2/silences \
    -H "Content-Type: application/json" \
    -d '{
        "matchers": [
            {"name": "instance", "value": "prod-db-01:9100", "isRegex": false}
        ],
        "startsAt": "2025-06-15T22:00:00Z",
        "endsAt": "2025-06-16T02:00:00Z",
        "createdBy": "admin",
        "comment": "Maintenance window: DB migration"
    }'

# List silences
curl -s http://alertmanager:9093/api/v2/silences | jq .

# Alertmanager status
curl -s http://alertmanager:9093/api/v2/status | jq .
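
For scripted maintenance windows, the same request body can be built in Python so that the end time is computed rather than typed by hand. This is a minimal sketch using only the standard library; the helper names build_silence and post_silence are hypothetical, and the host alertmanager:9093 is the one assumed in the curl examples above. post_silence is defined but not called here, since it needs a reachable Alertmanager.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # UTC timestamps, as in the curl example


def build_silence(instance, start, hours, author, comment):
    """Build the JSON body for POST /api/v2/silences."""
    end = start + timedelta(hours=hours)
    return {
        "matchers": [{"name": "instance", "value": instance, "isRegex": False}],
        "startsAt": start.astimezone(timezone.utc).strftime(TIME_FORMAT),
        "endsAt": end.astimezone(timezone.utc).strftime(TIME_FORMAT),
        "createdBy": author,
        "comment": comment,
    }


def post_silence(base_url, payload):
    """Send the silence to Alertmanager and return the decoded JSON response."""
    request = urllib.request.Request(
        f"{base_url}/api/v2/silences",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


start = datetime(2025, 6, 15, 22, 0, tzinfo=timezone.utc)
payload = build_silence(
    "prod-db-01:9100", start, 4, "admin", "Maintenance window: DB migration"
)
print(json.dumps(payload, indent=2))
# To actually create the silence:
# post_silence("http://alertmanager:9093", payload)
```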

Custom alert templates

{{/* /etc/alertmanager/templates/custom.tmpl */}}
{{ define "slack.title" -}}
[{{ .Status | toUpper }}] {{ .GroupLabels.alertname }}
{{- end }}

{{ define "slack.text" -}}
{{ range .Alerts }}
*Host:* {{ .Labels.instance }}
*Severity:* {{ .Labels.severity }}
*Description:* {{ .Annotations.description }}
{{ if .Labels.runbook }}*Runbook:* {{ .Labels.runbook }}{{ end }}
{{ end }}
{{- end }}
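
To get a feel for what these two templates produce, the following Python sketch approximates them for a hypothetical notification. It is not a Go template renderer, and the exact whitespace of the real output may differ. The dictionary keys (status, groupLabels, alerts) are assumptions chosen to mirror the template fields .Status, .GroupLabels and .Alerts.

```python
def slack_title(data):
    """Approximates the "slack.title" template: upper-cased status plus alert name."""
    return f"[{data['status'].upper()}] {data['groupLabels']['alertname']}"


def slack_text(data):
    """Approximates the "slack.text" template: one block of lines per alert."""
    lines = []
    for alert in data["alerts"]:
        labels = alert["labels"]
        annotations = alert["annotations"]
        lines.append(f"*Host:* {labels.get('instance', '')}")
        lines.append(f"*Severity:* {labels.get('severity', '')}")
        lines.append(f"*Description:* {annotations.get('description', '')}")
        # The runbook line is only emitted when the label is present.
        if labels.get("runbook"):
            lines.append(f"*Runbook:* {labels['runbook']}")
    return "\n".join(lines)


# Hypothetical notification data for a single firing alert.
sample = {
    "status": "firing",
    "groupLabels": {"alertname": "NodeDown"},
    "alerts": [
        {
            "labels": {"instance": "prod-db-01:9100", "severity": "critical"},
            "annotations": {"description": "Node exporter is unreachable"},
        }
    ],
}

print(slack_title(sample))
print(slack_text(sample))
```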

FAQ

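What is the difference between inhibit and silence?
Inhibit is automatic: while alert A is firing, notifications for alert B are suppressed, as defined by inhibit_rules in the configuration. A silence is created manually (via the UI, amtool, or the API) for a limited time window, for example planned maintenance.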
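
The inhibit logic can be sketched in a few lines of Python. This is a simplified model of one rule with equality matchers only, not Alertmanager's implementation; in particular, the handling of alerts that match both sides of a rule is reduced to "an alert does not inhibit itself". The function name is_inhibited and the sample alerts are hypothetical.

```python
def is_inhibited(target, firing, rule):
    """True if a firing source alert suppresses the target under this rule."""
    # The target side of the rule must match first.
    if not all(target.get(name) == value for name, value in rule["target"].items()):
        return False
    for source in firing:
        if source is target:
            continue  # simplification: an alert never inhibits itself
        source_matches = all(
            source.get(name) == value for name, value in rule["source"].items()
        )
        # Missing and empty labels count as equal.
        labels_equal = all(
            source.get(name, "") == target.get(name, "") for name in rule["equal"]
        )
        if source_matches and labels_equal:
            return True
    return False


# Mirrors the rule shape: NodeDown suppresses warnings on the same instance.
rule = {
    "source": {"alertname": "NodeDown"},
    "target": {"severity": "warning"},
    "equal": ["instance"],
}

node_down = {"alertname": "NodeDown", "severity": "critical", "instance": "prod-db-01:9100"}
warn_same = {"alertname": "HighLoad", "severity": "warning", "instance": "prod-db-01:9100"}
warn_other = {"alertname": "HighLoad", "severity": "warning", "instance": "prod-web-01:9100"}
firing = [node_down, warn_same, warn_other]

for alert in firing:
    print(alert["alertname"], alert["instance"], "inhibited:", is_inhibited(alert, firing, rule))
```

Only the warning on the instance that is down gets suppressed; the warning on the other instance and the NodeDown alert itself are still delivered.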

Conclusion

A cleanly configured Alertmanager prevents alert fatigue: the right person is notified at the right time with the right context.
