Stack overview
Prometheus: metrics database (pull-based)
Node Exporter: Linux host metrics (CPU, RAM, disk)
Alertmanager: notifications (email, Slack)
Grafana: dashboards and visualization
cAdvisor: Docker container metrics
Installation with Docker Compose
# docker-compose.yml
services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - ./rules:/etc/prometheus/rules
      - prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.retention.time=30d'
      - '--web.enable-lifecycle'
    ports:
      - "9090:9090"
    restart: unless-stopped

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    volumes:
      - grafana-data:/var/lib/grafana
    environment:
      # Placeholder: replace with a strong password of your own
      GF_SECURITY_ADMIN_PASSWORD: sicheres-passwort
      # Quoted so that YAML passes a string, not a boolean
      GF_USERS_ALLOW_SIGN_UP: "false"
    ports:
      - "3000:3000"
    restart: unless-stopped

  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.rootfs=/rootfs'
      - '--path.sysfs=/host/sys'
    ports:
      - "9100:9100"
    restart: unless-stopped

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    container_name: cadvisor
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker:/var/lib/docker:ro
    ports:
      - "8080:8080"
    restart: unless-stopped

  alertmanager:
    image: prom/alertmanager:latest
    container_name: alertmanager
    volumes:
      - ./alertmanager.yml:/etc/alertmanager/alertmanager.yml
    ports:
      - "9093:9093"
    restart: unless-stopped

volumes:
  prometheus-data:
  grafana-data:
prometheus.yml
# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']

rule_files:
  - 'rules/*.yml'

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node'
    static_configs:
      - targets: ['node-exporter:9100']
        labels:
          server: 'web-server-01'
      - targets: ['192.168.1.101:9100']
        labels:
          server: 'db-server-01'

  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']

  - job_name: 'nginx'
    static_configs:
      - targets: ['192.168.1.10:9113']  # nginx-exporter
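As the list of servers grows, editing prometheus.yml for every new host gets tedious. One possible alternative is file-based service discovery, sketched below. The directory name targets/ and the file name node-servers.yml are assumptions for illustration, and the directory would also have to be mounted into the Prometheus container (for example as ./targets:/etc/prometheus/targets).

```yaml
# prometheus.yml (excerpt): 'node' job using file-based discovery instead of static_configs
scrape_configs:
  - job_name: 'node'
    file_sd_configs:
      - files:
          - 'targets/node-*.yml'   # assumed path, relative to the config file
        refresh_interval: 1m       # how often the files are re-read
---
# targets/node-servers.yml (hypothetical file name), same targets and labels as above
- targets: ['node-exporter:9100']
  labels:
    server: 'web-server-01'
- targets: ['192.168.1.101:9100']
  labels:
    server: 'db-server-01'
```

The two YAML documents are separate files; they are shown together only for brevity. Prometheus picks up changes to the target files without a restart or a config reload.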
Alert rules
# rules/server-alerts.yml
groups:
  - name: server
    rules:
      - alert: HighCPU
        expr: 100 - (avg by(instance)(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU: {{ $value | humanize }}%"

      - alert: LowDiskSpace
        expr: (node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100 < 10
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"

      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 90
        for: 5m
        labels:
          severity: warning
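The rules above only fire while a host is still delivering metrics. A rule on the built-in up metric, which Prometheus sets to 0 when a scrape fails, could cover the case where a target disappears entirely. The following is a minimal sketch; the file name, group name, and the 2-minute threshold are assumptions, not part of the original setup.

```yaml
# rules/availability-alerts.yml (hypothetical file name, matched by 'rules/*.yml')
groups:
  - name: availability
    rules:
      - alert: InstanceDown
        expr: up == 0          # 'up' is 0 whenever the last scrape of a target failed
        for: 2m                # assumed grace period to ride out short restarts
        labels:
          severity: critical
        annotations:
          summary: "Target {{ $labels.instance }} is down"
          description: "Job {{ $labels.job }} could not be scraped for 2 minutes."
```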
Alertmanager email
# alertmanager.yml
global:
  smtp_smarthost: 'mail.firma.de:587'
  smtp_from: '[email protected]'
  smtp_auth_username: '[email protected]'
  smtp_auth_password: 'passwort'

route:
  group_by: ['alertname', 'instance']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'email'

receivers:
  - name: 'email'
    email_configs:
      - to: '[email protected]'
        subject: '[Alert] {{ .GroupLabels.alertname }}'
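The stack overview also lists Slack as a notification channel. A Slack receiver could be added roughly as follows. This is a sketch under assumptions: the webhook URL is a placeholder, the channel name #alerts is made up, and the matchers syntax requires a reasonably recent Alertmanager release. With this routing, critical alerts go to Slack instead of email, while everything else still reaches the default email receiver.

```yaml
# alertmanager.yml (excerpt): route critical alerts to Slack
route:
  receiver: 'email'                  # default receiver, unchanged
  group_by: ['alertname', 'instance']
  routes:
    - matchers:
        - 'severity="critical"'      # matches the severity label set in the alert rules
      receiver: 'slack'

receivers:
  # ... the existing 'email' receiver stays as it is ...
  - name: 'slack'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/REPLACE/WITH/YOURS'  # placeholder webhook
        channel: '#alerts'           # assumed channel name
        send_resolved: true          # also post when the alert clears
        title: '[{{ .Status | toUpper }}] {{ .GroupLabels.alertname }}'
        text: '{{ range .Alerts }}{{ .Annotations.summary }}{{ "\n" }}{{ end }}'
```

If critical alerts should reach both channels, one option is a single receiver that contains both an email_configs and a slack_configs entry.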
Grafana Dashboards
Grafana → Dashboards → Import:
ID 1860 – Node Exporter Full (one of the most popular dashboards)
ID 893 – Docker and system monitoring
ID 3662 – Prometheus 2.0 Overview
Custom dashboard:
+ → Visualization → Time Series
PromQL query: node_cpu_seconds_total (a raw counter; wrap it in rate() to plot actual usage)
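Imported dashboards need a Prometheus data source in Grafana. Instead of adding it by hand in the UI, it could be provisioned from a file, as in the sketch below. The local path ./grafana/provisioning is an assumption; it would have to be mounted into the grafana container at /etc/grafana/provisioning.

```yaml
# grafana/provisioning/datasources/prometheus.yml (assumed local path)
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy                  # Grafana's backend queries Prometheus, not the browser
    url: http://prometheus:9090    # service name from docker-compose.yml
    isDefault: true
```

Because both containers share the Compose network, Grafana can reach Prometheus by its service name rather than by localhost.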
FAQ
How long does Prometheus store metrics?
The default is 15 days. Setting --storage.tsdb.retention.time=30d, as in the Compose file above, extends this to 30 days. For long-term storage, consider Thanos or VictoriaMetrics.
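Retention can also be capped by disk usage rather than only by age. The excerpt below sketches this with the --storage.tsdb.retention.size flag; the 10GB value is an arbitrary assumption and should be sized to the volume actually available.

```yaml
# docker-compose.yml (excerpt): time-based and size-based retention combined
services:
  prometheus:
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.retention.time=30d'
      - '--storage.tsdb.retention.size=10GB'   # assumed cap, adjust to your disk
      - '--web.enable-lifecycle'
```

When both flags are set, Prometheus removes old data as soon as either limit is reached.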
Conclusion
The Prometheus + Grafana stack is a de facto industry standard for server monitoring: powerful, free, and backed by thousands of ready-made dashboards.