Alertmanager Concepts
Alert lifecycle:
1. Prometheus evaluates the alert rule
2. The condition holds for the rule's "for" duration → the alert fires and is sent to Alertmanager
3. Route: who gets which alert?
4. Grouping: alerts that share the route's group_by labels are bundled into one notification
5. Inhibit: if a "root cause" alert is firing, suppress the dependent alerts
6. Silence: planned maintenance = no alerts
7. Receiver: email / Slack / PagerDuty
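The grouping step can be illustrated with a small Python sketch. This is not Alertmanager's implementation, only a model of the idea: alerts whose values for the group_by labels are identical land in the same group and produce one notification. The sample alerts and the function name group_alerts are made up for illustration.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Bucket alerts by the values of the group_by labels.

    A missing label is treated like an empty value.
    """
    groups = defaultdict(list)
    for labels in alerts:
        key = tuple(labels.get(name, "") for name in group_by)
        groups[key].append(labels)
    return dict(groups)


# Hypothetical alerts, represented only by their label sets.
alerts = [
    {"alertname": "HighLatency", "cluster": "prod", "service": "api", "instance": "api-01"},
    {"alertname": "HighLatency", "cluster": "prod", "service": "api", "instance": "api-02"},
    {"alertname": "DiskFull", "cluster": "prod", "service": "db", "instance": "db-01"},
]

groups = group_alerts(alerts, ["alertname", "cluster", "service"])
for key, members in groups.items():
    print(key, "->", [a["instance"] for a in members])
```

The two HighLatency alerts differ only in their instance label, which is not part of group_by, so they would be delivered together in a single notification.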
Complete alertmanager.yml
# alertmanager.yml
global:
  smtp_smarthost: 'mail.firma.de:587'
  smtp_from: '[email protected]'
  smtp_auth_username: '[email protected]'
  smtp_auth_password: 'passwort'  # placeholder
  slack_api_url: 'https://hooks.slack.com/services/...'

templates:
  - '/etc/alertmanager/templates/*.tmpl'

route:
  # Default: everything goes to email
  receiver: 'email-ops'
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  routes:
    # Critical alerts: PagerDuty immediately
    - matchers:
        - severity="critical"
      receiver: 'pagerduty-ops'
      group_wait: 0s
      continue: false
    # Database alerts: DBA team Slack
    - matchers:
        - team="database"
      receiver: 'slack-dba'
      group_interval: 10m
    # At night: critical only. Everything else goes to email,
    # but the route sends nothing during the 'nights' interval.
    # (active_time_intervals would not work here: a route that is
    # inactive still matches and ends routing, so the alert would
    # be dropped during the day as well.)
    - matchers:
        - severity!="critical"
      receiver: 'email-ops'
      mute_time_intervals:
        - nights

receivers:
  - name: 'email-ops'
    email_configs:
      - to: '[email protected]'
        send_resolved: true
  - name: 'slack-dba'
    slack_configs:
      - channel: '#alerts-database'
        title: '{{ template "slack.title" . }}'
        text: '{{ template "slack.text" . }}'
        send_resolved: true
  - name: 'pagerduty-ops'
    pagerduty_configs:
      - service_key: 'PAGERDUTY_SERVICE_KEY'

time_intervals:
  - name: nights
    time_intervals:
      # A time range cannot cross midnight (start must be before end),
      # so the night is split into two ranges. Times are UTC unless
      # a location is set.
      - times:
          - start_time: '22:00'
            end_time: '24:00'
          - start_time: '00:00'
            end_time: '07:00'
        weekdays: ['monday:friday']

inhibit_rules:
  # If a node is down, suppress the individual service alerts
  - source_matchers:
      - alertname="NodeDown"
    target_matchers:
      - severity="warning"
    equal: ['instance']
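How the route tree picks a receiver can be modelled in a few lines of Python. The sketch below is a deliberate simplification, not Alertmanager's code: it handles only one level of child routes and only equality matchers, and the names matches and pick_receivers are invented for this example. It mirrors the first two child routes of the configuration above.

```python
def matches(route, labels):
    """A route matches when every one of its equality matchers holds."""
    return all(labels.get(k) == v for k, v in route.get("match", {}).items())


def pick_receivers(root, labels):
    """Return the receivers an alert is routed to.

    Child routes are checked in order. The first match wins unless that
    route sets continue=True, in which case later siblings are checked too.
    If no child matches, the root receiver is used.
    """
    receivers = []
    for child in root.get("routes", []):
        if matches(child, labels):
            receivers.append(child["receiver"])
            if not child.get("continue", False):
                break
    return receivers or [root["receiver"]]


# Simplified model of the route tree (assumption: equality matchers only).
route = {
    "receiver": "email-ops",
    "routes": [
        {"match": {"severity": "critical"}, "receiver": "pagerduty-ops", "continue": False},
        {"match": {"team": "database"}, "receiver": "slack-dba"},
    ],
}

print(pick_receivers(route, {"severity": "critical", "team": "database"}))  # ['pagerduty-ops']
print(pick_receivers(route, {"severity": "warning", "team": "database"}))   # ['slack-dba']
print(pick_receivers(route, {"severity": "warning", "team": "web"}))        # ['email-ops']
```

Because the critical route does not continue, a critical database alert reaches only PagerDuty in this model. Setting continue to true on that route would let the alert also match the database route.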
Silences via the API
# Create a silence for a maintenance window
curl -X POST http://alertmanager:9093/api/v2/silences -H "Content-Type: application/json" -d '{
  "matchers": [
    {"name": "instance", "value": "prod-db-01:9100", "isRegex": false}
  ],
  "startsAt": "2025-06-15T22:00:00Z",
  "endsAt": "2025-06-16T02:00:00Z",
  "createdBy": "admin",
  "comment": "Maintenance window: DB migration"
}'
# List silences
curl -s http://alertmanager:9093/api/v2/silences | jq .
# Alertmanager status
curl -s http://alertmanager:9093/api/v2/status | jq .
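For recurring maintenance it can be handy to compute the timestamps instead of typing them by hand. The following Python sketch builds the same JSON body as the curl call above from a start time and a duration. The helper names build_silence and post_silence are hypothetical, and the base URL is simply the one used in the curl examples. post_silence is defined but not called here, so the snippet runs without a reachable Alertmanager.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def build_silence(instance, hours, created_by, comment, now=None):
    """Build a silence payload that starts at `now` (UTC) and lasts `hours`."""
    start = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return {
        "matchers": [{"name": "instance", "value": instance, "isRegex": False}],
        "startsAt": start.strftime(TIME_FORMAT),
        "endsAt": (start + timedelta(hours=hours)).strftime(TIME_FORMAT),
        "createdBy": created_by,
        "comment": comment,
    }


def post_silence(base_url, payload):
    """POST the payload to /api/v2/silences and return the decoded JSON reply."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/api/v2/silences",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


payload = build_silence(
    "prod-db-01:9100",
    hours=4,
    created_by="admin",
    comment="Maintenance window: DB migration",
    now=datetime(2025, 6, 15, 22, 0, tzinfo=timezone.utc),
)
print(json.dumps(payload, indent=2))
# To actually create the silence:
# post_silence("http://alertmanager:9093", payload)
```

With the fixed start time used here, the payload carries the same startsAt and endsAt values as the curl example.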
Custom alert templates
{{/* /etc/alertmanager/templates/custom.tmpl */}}
{{ define "slack.title" -}}
[{{ .Status | toUpper }}] {{ .GroupLabels.alertname }}
{{- end }}
{{ define "slack.text" -}}
{{ range .Alerts }}
*Host:* {{ .Labels.instance }}
*Severity:* {{ .Labels.severity }}
*Description:* {{ .Annotations.description }}
{{ if .Labels.runbook }}*Runbook:* {{ .Labels.runbook }}{{ end }}
{{ end }}
{{- end }}
FAQ
What is the difference between inhibit and silence?
Inhibit: automatic and rule-based (while alert A is firing, alert B is suppressed). Silence: manual and time-limited (for planned maintenance).
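The difference can be made concrete with a short Python sketch. It models the two checks in a simplified way and is not how Alertmanager is implemented: matchers are plain equality checks, and the function names is_inhibited and is_silenced as well as the sample alerts are assumptions for illustration.

```python
from datetime import datetime, timezone


def label_match(labels, matchers):
    """True if every matcher label has exactly the expected value."""
    return all(labels.get(k) == v for k, v in matchers.items())


def is_inhibited(target, firing, rule):
    """Inhibit: automatic. The target is suppressed while another firing alert
    matches the source side and agrees on all `equal` labels."""
    if not label_match(target, rule["target"]):
        return False
    return any(
        source is not target
        and label_match(source, rule["source"])
        and all(source.get(l, "") == target.get(l, "") for l in rule["equal"])
        for source in firing
    )


def is_silenced(alert, silence, now):
    """Silence: manual. The alert is suppressed only inside the time window
    that an operator defined."""
    return silence["startsAt"] <= now < silence["endsAt"] and label_match(alert, silence["matchers"])


rule = {"source": {"alertname": "NodeDown"}, "target": {"severity": "warning"}, "equal": ["instance"]}
node_down = {"alertname": "NodeDown", "severity": "critical", "instance": "prod-db-01:9100"}
disk_warn = {"alertname": "DiskFillingUp", "severity": "warning", "instance": "prod-db-01:9100"}
other_warn = {"alertname": "DiskFillingUp", "severity": "warning", "instance": "prod-web-01:9100"}
firing = [node_down, disk_warn, other_warn]

silence = {
    "matchers": {"instance": "prod-db-01:9100"},
    "startsAt": datetime(2025, 6, 15, 22, 0, tzinfo=timezone.utc),
    "endsAt": datetime(2025, 6, 16, 2, 0, tzinfo=timezone.utc),
}

print(is_inhibited(disk_warn, firing, rule))   # True: NodeDown fires on the same instance
print(is_inhibited(other_warn, firing, rule))  # False: different instance
print(is_silenced(node_down, silence, datetime(2025, 6, 15, 23, 0, tzinfo=timezone.utc)))  # True
print(is_silenced(node_down, silence, datetime(2025, 6, 16, 3, 0, tzinfo=timezone.utc)))   # False
```

The inhibition ends by itself as soon as NodeDown stops firing, whereas the silence ends at its endsAt time regardless of which alerts are active.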
Conclusion
A cleanly configured Alertmanager prevents alert fatigue: the right person, at the right time, with the right context.