How to Setup Alertmanager
Alertmanager is a critical component of the Prometheus monitoring ecosystem, designed to handle alerts sent by Prometheus servers and route them to the appropriate notification channels. Whether you're managing cloud infrastructure, on-premise servers, or microservices architectures, effective alerting is non-negotiable for maintaining system reliability and minimizing downtime. Alertmanager doesn't just send notifications: it deduplicates, silences, and aggregates alerts, ensuring that your team is alerted only when necessary and with the right context.
Many organizations struggle with alert fatigue: too many notifications, often redundant or low-priority, leading to missed critical incidents. Alertmanager solves this by providing intelligent alert routing based on labels, grouping rules, and inhibition policies. When properly configured, it transforms chaotic alert streams into actionable, prioritized events delivered via email, Slack, PagerDuty, Microsoft Teams, or custom webhooks.
This guide walks you through every step required to set up Alertmanager from scratch, including configuration, integration with Prometheus, testing alerts, and implementing best practices. By the end, you'll have a production-ready alerting system that reduces noise, improves response times, and enhances operational resilience.
Step-by-Step Guide
Prerequisites
Before beginning the setup, ensure you have the following:
- A Linux or Unix-based system (Ubuntu 20.04/22.04, CentOS 7/8, or similar)
- Prometheus server installed and running (version 2.0 or higher)
- Basic familiarity with YAML configuration files
- Access to a terminal with sudo privileges
- A notification endpoint (e.g., email server, Slack webhook, PagerDuty integration)
If Prometheus is not yet installed, download it from the official Prometheus downloads page and follow the installation instructions for your platform.
Step 1: Download and Install Alertmanager
Alertmanager is distributed as a standalone binary. Visit the Alertmanager GitHub releases page and select the latest stable version compatible with your system architecture (typically amd64 for most servers).
For Ubuntu/Debian systems, use the following commands:
wget https://github.com/prometheus/alertmanager/releases/download/v0.26.0/alertmanager-0.26.0.linux-amd64.tar.gz
tar xvfz alertmanager-0.26.0.linux-amd64.tar.gz
cd alertmanager-0.26.0.linux-amd64
Move the binaries to a system-wide location so they are on your PATH:
sudo mv alertmanager /usr/local/bin/
sudo mv amtool /usr/local/bin/
Verify the installation:
alertmanager --version
You should see output similar to:
alertmanager, version 0.26.0 (branch: HEAD, revision: 99826564436502876069525311115054386a4671)
build user: root@e4694909242c
build date: 20230821-12:28:01
go version: go1.20.7
platform: linux/amd64
Step 2: Create Alertmanager Configuration File
The core of Alertmanager's behavior is defined in its configuration file, typically named alertmanager.yml. Create this file in a dedicated directory:
sudo mkdir -p /etc/alertmanager
sudo nano /etc/alertmanager/alertmanager.yml
Below is a minimal but functional configuration template:
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.gmail.com:587'
  smtp_from: 'your-email@gmail.com'
  smtp_auth_username: 'your-email@gmail.com'
  smtp_auth_password: 'your-app-password'
  smtp_hello: 'localhost'
  smtp_require_tls: true

route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 3h
  receiver: 'email-notifications'

receivers:
  - name: 'email-notifications'
    email_configs:
      - to: 'ops-team@yourcompany.com'
        send_resolved: true

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'cluster', 'service']
Let's break down each section:
- global: Defines default settings for all alerts, including SMTP credentials for email delivery, timeout durations, and TLS requirements.
- route: Determines how alerts are grouped and routed. The group_by field ensures alerts with matching labels (e.g., same alert name, cluster, and service) are bundled together. group_wait delays the initial notification to allow grouping; group_interval sets the minimum time between notifications when new alerts join an existing group; repeat_interval defines how often a still-firing, unchanged group is re-notified.
- receivers: Defines where alerts are sent. In this example, email is configured. You can add multiple receivers for different teams or channels.
- inhibit_rules: Prevents low-severity alerts from triggering if a higher-severity alert already exists for the same context. For example, if a critical service outage alert fires, all related warning alerts (e.g., high CPU) are suppressed.
Note: If using Gmail, generate an App Password instead of your account password. Enable 2FA on your Google account and generate the app password under Security → 2-Step Verification → App passwords.
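To make the grouping behavior concrete, here is a simplified Python sketch (illustrative only, not Alertmanager's actual implementation): alerts whose labels agree on every group_by key land in the same notification group.

```python
# Simplified sketch of Alertmanager's group_by logic (illustrative only).
from collections import defaultdict

GROUP_BY = ("alertname", "cluster", "service")  # mirrors the route's group_by

def group_key(labels: dict) -> tuple:
    """Build the grouping key from the labels named in group_by."""
    return tuple(labels.get(k, "") for k in GROUP_BY)

def group_alerts(alerts: list) -> dict:
    """Bundle alerts that share the same group key into one group."""
    groups = defaultdict(list)
    for alert in alerts:
        groups[group_key(alert)].append(alert)
    return dict(groups)

alerts = [
    {"alertname": "HighCPU", "cluster": "prod", "service": "api", "instance": "a1"},
    {"alertname": "HighCPU", "cluster": "prod", "service": "api", "instance": "a2"},
    {"alertname": "DiskFull", "cluster": "prod", "service": "db", "instance": "d1"},
]
# The two HighCPU alerts collapse into one group (one notification);
# DiskFull forms its own group.
```

This is why consistent labeling matters: two alerts that should be one incident only merge if their group_by labels actually match.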
Step 3: Configure Prometheus to Send Alerts to Alertmanager
Prometheus must be configured to forward alerts to Alertmanager. Edit your Prometheus configuration file (usually /etc/prometheus/prometheus.yml):
sudo nano /etc/prometheus/prometheus.yml
Add or update the alerting section:
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093
Ensure the port (9093) matches Alertmanager's default listener port. If Alertmanager is running on a different host, replace localhost with the server's IP or hostname.
Also, verify that alerting rules are defined in your Prometheus configuration. Create a rules file if needed:
sudo mkdir -p /etc/prometheus/rules
sudo nano /etc/prometheus/rules/alerts.rules
Add a sample alert rule:
groups:
  - name: example
    rules:
      - alert: HighRequestLatency
        expr: job:request_latency_seconds:mean5m{job="myapp"} > 0.5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High request latency detected"
          description: "Job {{ $labels.job }} has a 5-minute average request latency above 0.5 seconds."
Then, include the rules file in your Prometheus configuration:
rule_files:
  - "/etc/prometheus/rules/alerts.rules"
Restart Prometheus to apply changes:
sudo systemctl restart prometheus
Step 4: Create a Systemd Service for Alertmanager
To ensure Alertmanager starts automatically on boot and restarts on failure, create a systemd service file:
sudo nano /etc/systemd/system/alertmanager.service
Paste the following:
[Unit]
Description=Alertmanager
Wants=network-online.target
After=network-online.target
[Service]
Type=simple
User=prometheus
Group=prometheus
ExecStart=/usr/local/bin/alertmanager \
--config.file=/etc/alertmanager/alertmanager.yml \
--storage.path=/var/lib/alertmanager \
--web.listen-address=:9093 \
--web.route-prefix=/
Restart=always
[Install]
WantedBy=multi-user.target
Create the service user and data directory (skip the useradd if the prometheus user already exists from your Prometheus installation):
sudo useradd --no-create-home --shell /bin/false prometheus
sudo mkdir -p /var/lib/alertmanager
sudo chown prometheus:prometheus /var/lib/alertmanager
Reload systemd and start Alertmanager:
sudo systemctl daemon-reload
sudo systemctl start alertmanager
sudo systemctl enable alertmanager
Verify the service status:
sudo systemctl status alertmanager
You should see active (running) with no errors.
Step 5: Access the Alertmanager Web UI
Alertmanager includes a built-in web interface that provides real-time visibility into active alerts, silences, and inhibition rules. By default, it listens on port 9093.
Open your browser and navigate to:
http://your-server-ip:9093
You'll see a dashboard with tabs for:
- Alerts: Lists all active alerts, grouped by labels.
- Silences: View and create temporary alert suppressions.
- Status: Shows configuration health, version, and cluster status (if running in HA mode).
Use this UI to test your configuration. You can manually trigger an alert via the Prometheus UI or wait for the configured rule to fire. Once triggered, you should see the alert appear in the Alertmanager UI and receive the configured notification (e.g., email).
Step 6: Test Alert Routing
To confirm everything is working, force a test alert using the amtool CLI utility:
amtool alert add alertname=TestAlert severity=critical instance=test-server \
  --annotation=summary="Test Alert" \
  --annotation=description="This is a test alert from amtool" \
  --alertmanager.url=http://localhost:9093
Check the Alertmanager UI. The alert should appear immediately. Then, check your email or configured notification channel. You should receive a notification with the summary and description.
amtool has no delete command; an alert added this way simply expires on its own after the configured resolve_timeout (or at an explicit end time if you pass one), at which point Alertmanager treats it as resolved.
Verify that a resolved notification is sent if send_resolved: true is configured in your receiver.
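amtool talks to the same HTTP API you can call directly: POST a JSON array of alerts to /api/v2/alerts. The sketch below builds such a request body in Python; the endpoint path and field names follow Alertmanager's v2 API, while the helper function and label values are just illustrative.

```python
# Sketch: build the JSON body for POST /api/v2/alerts on Alertmanager.
# The endpoint and field names follow the v2 API; make_alert is our own helper.
import json
from datetime import datetime, timedelta, timezone

def make_alert(labels: dict, annotations: dict, minutes: int = 5) -> dict:
    now = datetime.now(timezone.utc)
    return {
        "labels": labels,
        "annotations": annotations,
        "startsAt": now.isoformat(),
        # Setting endsAt makes the test alert auto-resolve after `minutes`.
        "endsAt": (now + timedelta(minutes=minutes)).isoformat(),
    }

body = json.dumps([make_alert(
    {"alertname": "TestAlert", "severity": "critical", "instance": "test-server"},
    {"summary": "Test Alert", "description": "Sent via the HTTP API"},
)])
# POST this body to http://localhost:9093/api/v2/alerts with
# Content-Type: application/json (e.g., via curl -XPOST -d "$body" ...).
```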
Step 7: Secure Alertmanager with Reverse Proxy (Optional but Recommended)
Exposing Alertmanager directly on port 9093 is not secure for production. Use a reverse proxy like Nginx to add TLS encryption and authentication.
Install Nginx:
sudo apt update
sudo apt install nginx -y
Obtain an SSL certificate using Let's Encrypt (Certbot):
sudo apt install certbot python3-certbot-nginx -y
sudo certbot --nginx -d alertmanager.yourdomain.com
Configure Nginx to proxy requests to Alertmanager:
sudo nano /etc/nginx/sites-available/alertmanager
Add the following. The auth_basic lines provide the authentication promised above; they assume a password file created with sudo htpasswd -c /etc/nginx/.htpasswd admin (htpasswd ships in the apache2-utils package):
server {
    listen 443 ssl;
    server_name alertmanager.yourdomain.com;

    ssl_certificate /etc/letsencrypt/live/alertmanager.yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/alertmanager.yourdomain.com/privkey.pem;

    location / {
        auth_basic "Alertmanager";
        auth_basic_user_file /etc/nginx/.htpasswd;

        proxy_pass http://localhost:9093;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
Enable the site:
sudo ln -s /etc/nginx/sites-available/alertmanager /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl restart nginx
Now access Alertmanager securely at https://alertmanager.yourdomain.com.
Best Practices
1. Use Meaningful Labels and Annotations
Alerts are only as useful as the metadata they carry. Always define clear, consistent labels such as severity, service, cluster, and environment. Use annotations for human-readable details like summary and description, which appear in notifications.
Example:
labels:
  severity: critical
  service: database
  cluster: prod-us-east
annotations:
  summary: "Database cluster prod-us-east is unreachable"
  description: "All nodes in cluster prod-us-east are down. Check replication status."
These labels enable intelligent grouping and routing in Alertmanager.
2. Implement Alert Inhibition Rules
Alert fatigue is one of the biggest causes of operational failure. Use inhibition rules to prevent redundant alerts. For example, if a node down alert fires, suppress all related high CPU, disk full, or network latency alerts from that node. This reduces noise and helps teams focus on root causes.
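The mechanics of inhibition are simple: a target alert is suppressed when some active source alert matches source_match and agrees with the target on every label listed under equal. A simplified Python sketch (mirroring the inhibit rule from Step 2, not Alertmanager's actual code):

```python
# Simplified sketch of Alertmanager inhibition (illustrative only).
# A warning is suppressed when a critical alert with the same
# alertname/cluster/service is active.

RULE = {
    "source_match": {"severity": "critical"},
    "target_match": {"severity": "warning"},
    "equal": ["alertname", "cluster", "service"],
}

def matches(alert: dict, matcher: dict) -> bool:
    """True if the alert carries every label/value pair in the matcher."""
    return all(alert.get(k) == v for k, v in matcher.items())

def is_inhibited(target: dict, active: list) -> bool:
    """True if `target` should be suppressed given the active alerts."""
    if not matches(target, RULE["target_match"]):
        return False
    return any(
        matches(src, RULE["source_match"])
        and all(src.get(k) == target.get(k) for k in RULE["equal"])
        for src in active
    )
```

Note that the equal labels do the heavy lifting: if the source and target do not share them, no suppression happens, which is why consistent labeling across related alerts is essential.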
3. Group Alerts by Logical Context
Grouping alerts by job, instance, or service ensures that a single incident doesn't trigger 50 separate notifications. For instance, if a Kubernetes pod restarts, group all related container alerts into one notification instead of flooding the team with individual container alerts.
4. Set Appropriate Timeouts
Adjust group_wait, group_interval, and repeat_interval based on your SLAs. For critical systems, a group_wait of 10 to 30 seconds is acceptable. For non-critical alerts, extend it to 2 to 5 minutes to allow for automatic recovery.
Never set repeat_interval too low, or you will recreate the very noise you are trying to eliminate. Re-sending a still-firing critical alert every hour is usually enough; for warnings, every few hours or once a day is better.
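The three timers interact in a specific way: group_wait delays only the very first notification of a new group, group_interval gates notifications when the group changes (new alerts join), and repeat_interval gates re-sends of an unchanged group. A small sketch of that decision (simplified; real Alertmanager evaluates this per flush of the group):

```python
# Sketch of how group_wait / group_interval / repeat_interval interact
# for a single alert group (illustrative simplification).
from typing import Optional

GROUP_WAIT = 30          # seconds before a brand-new group's first notification
GROUP_INTERVAL = 300     # minimum gap when the group has changed (new alerts)
REPEAT_INTERVAL = 10800  # re-send interval for an unchanged, still-firing group

def should_notify(last_sent: Optional[int], changed: bool,
                  now: int, group_started: int) -> bool:
    """Decide whether this flush of the group should produce a notification."""
    if last_sent is None:
        return now - group_started >= GROUP_WAIT       # first notification
    if changed:
        return now - last_sent >= GROUP_INTERVAL       # group gained alerts
    return now - last_sent >= REPEAT_INTERVAL          # plain reminder
```

This is why a long repeat_interval never delays news about a new alert joining the group: that path is governed by group_interval instead.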
5. Use Multiple Receivers for Escalation
Implement tiered alerting. For example:
- First tier: On-call engineer via Slack
- Second tier: Manager via email after 15 minutes
- Third tier: PagerDuty if unresolved after 1 hour
Alertmanager's routing tree can send different severities to different receivers, but it has no built-in notion of "escalate after N minutes": a group_wait on a sub-route only delays that route's first notification. For true timed escalation, let a service like PagerDuty own the escalation policy, and use the routing tree to pick the right entry point:
route:
  receiver: 'slack-notifications'
  routes:
    - receiver: 'email-notifications'
      match_re:
        severity: warning
    - receiver: 'pagerduty-notifications'
      match_re:
        severity: critical
6. Enable Alert Resolution Notifications
Always set send_resolved: true in your receivers. Knowing when an alert has been resolved is as important as knowing when it fired. It provides closure and helps with post-mortem analysis.
7. Avoid Over-Monitoring
Not every metric needs an alert. Focus on business-impacting indicators: service availability, error rates, latency percentiles, and resource exhaustion. Avoid alerting on metrics that self-correct within seconds (e.g., brief CPU spikes).
8. Test and Simulate Alerts Regularly
Run monthly alerting drills. Use amtool to simulate critical alerts and verify notification delivery, routing, and resolution. Document what worked and what didn't.
9. Secure Your Configuration Files
Never commit secrets like SMTP passwords or webhook URLs to version control. Use environment variables or secrets managers like HashiCorp Vault or Kubernetes Secrets.
Note that Alertmanager does not expand environment variables inside alertmanager.yml, so references like ${SMTP_PASSWORD} in the config file will not work. Instead, keep the secret in a tightly permissioned file and point the config at it (supported since Alertmanager v0.25):
sudo sh -c 'echo "your_app_password_here" > /etc/alertmanager/smtp_password'
sudo chown prometheus:prometheus /etc/alertmanager/smtp_password
sudo chmod 600 /etc/alertmanager/smtp_password
Then reference it in alertmanager.yml:
smtp_auth_password_file: '/etc/alertmanager/smtp_password'
Alternatively, render alertmanager.yml from a template at deploy time (e.g., with envsubst or your configuration management tool) and keep the rendered file out of version control.
10. Monitor Alertmanager Itself
Alertmanager exposes metrics at /metrics. Set up a Prometheus job to scrape Alertmanager's metrics:
- job_name: 'alertmanager'
  static_configs:
    - targets: ['localhost:9093']
Then create an alert to notify you if Alertmanager is down:
- alert: AlertmanagerDown
  expr: up{job="alertmanager"} == 0
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "Alertmanager is down"
    description: "Alertmanager has been unreachable for 5 minutes."
Tools and Resources
Official Documentation
- Alertmanager Documentation: the authoritative source for configuration options and features.
- Prometheus Alerting Rules: learn how to define effective alert conditions.
Configuration Validators
- amtool check-config: validate your YAML configuration before restarting:
amtool check-config /etc/alertmanager/alertmanager.yml
- YAML Lint Online tool to validate YAML syntax and indentation.
Notification Integrations
- Slack: Use Incoming Webhooks. Generate a webhook URL from your Slack app settings.
- PagerDuty: Use the Alertmanager PagerDuty integration via webhook endpoint provided by PagerDuty.
- Microsoft Teams: Use a Connector Webhook URL from your Teams channel.
- Discord: Use a Webhook URL from your Discord server settings.
- Webhooks: Send alerts to custom apps via HTTP POST. Useful for internal ticketing systems or custom scripts.
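For the custom-webhook route, Alertmanager POSTs a JSON document containing version, status, groupLabels, commonAnnotations, and an alerts array. A minimal parser sketch (the summarize helper is our own; the field names follow the documented webhook payload, version "4"):

```python
# Sketch: parse the JSON body Alertmanager POSTs to a webhook receiver.
# Field names follow the documented webhook payload (version "4").
import json

def summarize(payload: str) -> str:
    """Turn a webhook payload into a one-line summary for a ticket or chat."""
    msg = json.loads(payload)
    firing = [a for a in msg["alerts"] if a["status"] == "firing"]
    name = msg["groupLabels"].get("alertname", "unknown")
    return f"[{msg['status']}] {name}: {len(firing)} firing alert(s)"

# Example payload shaped like what Alertmanager would send:
sample = json.dumps({
    "version": "4",
    "status": "firing",
    "groupLabels": {"alertname": "HighRequestLatency"},
    "commonAnnotations": {"summary": "High request latency detected"},
    "alerts": [
        {"status": "firing", "labels": {"instance": "a1"}},
        {"status": "resolved", "labels": {"instance": "a2"}},
    ],
})
```

A real receiver would sit behind an HTTP endpoint and feed this summary into your ticketing system or paging script.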
Community Templates
- Official Alertmanager Examples Real-world configs for various use cases.
- CloudAlchemy Ansible Playbooks Automate Alertmanager and Prometheus deployment.
- Prometheus Operator (Kubernetes) Declarative Alertmanager configuration in Kubernetes environments.
Monitoring Dashboards
- Alertmanager Dashboard (Grafana) Visualize alert volume, resolution times, and receiver performance.
- Prometheus Alerting Dashboard Track alert rule health and firing rates.
Debugging Tools
- amtool alert query: list all active alerts from the CLI.
- amtool silence list: view active silences.
- Prometheus UI → Alerts tab: see which rules are firing and their labels.
- Alertmanager UI → Status → Config: view the configuration Alertmanager actually loaded.
Real Examples
Example 1: Kubernetes Cluster Alerting
Scenario: You're managing a production Kubernetes cluster and want to be notified if any node becomes unready or if etcd is unhealthy.
Prometheus Rule (k8s-alerts.rules):
groups:
  - name: kubernetes
    rules:
      - alert: KubernetesNodeNotReady
        expr: kube_node_status_condition{condition="Ready",status="true"} == 0
        for: 10m
        labels:
          severity: critical
          team: platform
        annotations:
          summary: "Kubernetes node {{ $labels.node }} is not ready"
          description: "Node {{ $labels.node }} has been in NotReady state for more than 10 minutes."
      - alert: EtcdMembersDown
        expr: count(up{job="etcd"} == 1) < 2
        for: 5m
        labels:
          severity: critical
          team: platform
        annotations:
          summary: "Etcd cluster has fewer than 2 healthy members"
          description: "etcd cluster health is compromised. Risk of data loss or split-brain."
Alertmanager Configuration:
route:
  group_by: ['alertname', 'team']
  group_wait: 15s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'slack-platform'

receivers:
  - name: 'slack-platform'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/XXX/YYY/ZZZ'
        channel: '#platform-alerts'
        send_resolved: true
        title: '{{ .CommonLabels.alertname }}'
        text: |
          *Summary:* {{ .CommonAnnotations.summary }}
          *Description:* {{ .CommonAnnotations.description }}
          *Labels:* {{ .CommonLabels }}
  - name: 'email-platform'
    email_configs:
      - to: 'platform-team@company.com'
        send_resolved: true
        headers:
          Subject: "[CRITICAL] {{ .CommonLabels.alertname }}"

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'team']
Outcome: Platform team receives a single grouped Slack message for all node issues. If etcd goes critical, no redundant high CPU or low disk alerts from affected nodes appear.
Example 2: Web Application Monitoring
Scenario: A customer-facing web application experiences high error rates. You want to alert only if the 5-minute error rate exceeds 5%, and route notifications differently during and outside business hours.
Prometheus Rule:
groups:
  - name: webapp
    rules:
      - alert: HighErrorRate
        expr: sum(rate(http_requests_total{code=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
        for: 10m
        labels:
          severity: warning
          service: webapp
        annotations:
          summary: "Web application error rate exceeds 5%"
          description: "Error rate is {{ $value | humanizePercentage }}. Check application logs and deployment status."
Alertmanager Configuration with Time-Based Routing (time_intervals and active_time_intervals require Alertmanager v0.24 or later):
time_intervals:
  - name: business-hours
    time_intervals:
      - weekdays: ['monday:friday']
        times:
          - start_time: '08:00'
            end_time: '18:00'

route:
  group_by: ['alertname', 'service']
  group_wait: 30s
  group_interval: 10m
  repeat_interval: 4h
  receiver: 'slack-outside-hours'
  routes:
    - receiver: 'email-during-business'
      active_time_intervals:
        - business-hours
receivers:
  - name: 'email-during-business'
    email_configs:
      - to: 'dev-team@company.com'
        send_resolved: true
  - name: 'slack-outside-hours'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/XXX/YYY/ZZZ'
        channel: '#oncall'
        send_resolved: true
Outcome: During business hours, developers receive email alerts. After hours, alerts are routed to the on-call engineer via Slack, reducing noise for the team.
Example 3: Multi-Tenant Alerting with Inhibition
Scenario: You manage multiple environments (dev, staging, prod). In production, you want an active node down alert to suppress related low disk space warnings, while keeping those warnings visible in dev.
Alertmanager Inhibit Rule:
inhibit_rules:
  - source_match:
      severity: 'critical'
      environment: 'prod'
    target_match:
      severity: 'warning'
      environment: 'prod'
    equal: ['instance']
Result: In production, if a node crashes (critical), all related disk usage > 85% warnings are automatically suppressed. In dev, warnings remain active to help developers identify issues early.
FAQs
Q1: Can Alertmanager work without Prometheus?
Mostly no. Alertmanager does not generate alerts itself; it only receives, deduplicates, routes, and notifies. Prometheus is the standard and most tightly integrated source, though other systems (e.g., Grafana, VictoriaMetrics) can also push alerts to Alertmanager through its HTTP API.
Q2: How do I silence an alert temporarily?
Use the Alertmanager web UI. Click Silence on an active alert, set the duration (e.g., 1 hour), and optionally add a reason. The silence will suppress matching alerts until it expires. You can also use amtool silence add from the CLI.
Q3: What happens if Alertmanager crashes?
Prometheus keeps evaluating alerting rules and retries delivering firing alerts, so a brief Alertmanager outage usually only delays notifications. For resilience, run Alertmanager in high availability (HA) mode: start two or more instances with the --cluster.peer flag pointing at each other, and they will gossip silences and notification state between themselves (no external store such as Consul or etcd is required). List every instance under the alertmanagers: section in Prometheus.
Q4: Can I use Alertmanager with Docker or Kubernetes?
Yes. Alertmanager is commonly deployed as a Docker container or via the Prometheus Operator in Kubernetes. Use Helm charts like prometheus-community/kube-prometheus-stack for automated deployment.
Q5: Why am I not receiving email alerts?
Common causes:
- Incorrect SMTP credentials or port
- Firewall blocking outbound SMTP (port 587 or 465)
- Missing smtp_require_tls: true for Gmail
- Using your account password instead of an app password with Gmail
- Alert not firing due to a misconfigured Prometheus rule
Check the Alertmanager logs: journalctl -u alertmanager -f
Q6: How do I add a new notification channel like Microsoft Teams?
Add a new receiver in alertmanager.yml. Note that Teams expects its own message card format, so the generic webhook receiver is usually paired with a bridge such as prometheus-msteams; Alertmanager v0.26+ also ships a native msteams_configs receiver. A generic webhook receiver looks like:
- name: 'teams-notifications'
  webhook_configs:
    - url: 'https://outlook.office.com/webhook/your-webhook-id'
      send_resolved: true
Then update the route to send matching alerts to this receiver.
Q7: Whats the difference between grouping and inhibition?
Grouping bundles multiple similar alerts into one notification to reduce noise. Inhibition prevents lower-severity alerts from triggering if a higher-severity alert already exists for the same context.
Q8: Can Alertmanager send alerts to SMS or phone calls?
Yes, indirectly. Integrate with services like PagerDuty, Opsgenie, or Twilio via webhook. Alertmanager sends the alert to the service, which then triggers SMS or voice calls.
Q9: How often should I review my alerting rules?
Review alerting rules quarterly. Remove outdated rules, adjust thresholds based on historical data, and ensure annotations remain accurate. Alert fatigue often stems from stale or overly sensitive rules.
Q10: Is Alertmanager suitable for small teams?
Absolutely. Even small teams benefit from clean, grouped, and resolved notifications. Start with email or Slack, and scale to PagerDuty as your infrastructure grows.
Conclusion
Setting up Alertmanager is not just a technical task: it is a strategic decision that directly impacts your system's reliability and your team's ability to respond effectively to incidents. A well-configured Alertmanager transforms raw metrics into intelligent, actionable alerts, reducing noise, preventing alert fatigue, and ensuring that the right people are notified at the right time.
In this guide, you've learned how to install Alertmanager, configure it to work seamlessly with Prometheus, define intelligent routing rules, integrate with modern notification platforms, and implement best practices that scale from small deployments to enterprise environments. You've seen real-world examples that demonstrate how to tailor alerting to different scenarios, from Kubernetes clusters to web applications, and you now understand how to troubleshoot common issues.
Remember: alerting is not a "set it and forget it" process. Regularly review your rules, test your notifications, and refine your routing based on incident response patterns. The goal is not to alert on everything, but to alert on the right things, at the right time, with the right context.
With Alertmanager properly configured, you're no longer just monitoring systems: you're building resilience into your operations. And in today's world of distributed systems and high-availability expectations, that's not just an advantage; it's a necessity.