How to Set Up Alertmanager

Nov 6, 2025 - 10:32

Alertmanager is a critical component of the Prometheus monitoring ecosystem, designed to handle alerts sent by Prometheus servers and route them to the appropriate notification channels. Whether you're managing cloud infrastructure, on-premise servers, or microservices architectures, effective alerting is non-negotiable for maintaining system reliability and minimizing downtime. Alertmanager doesn't just send notifications: it deduplicates, silences, and aggregates alerts, ensuring that your team is alerted only when necessary and with the right context.

Many organizations struggle with alert fatigue: receiving too many notifications, often redundant or low-priority, which leads to missed critical incidents. Alertmanager solves this by providing intelligent alert routing based on labels, grouping rules, and inhibition policies. When properly configured, it transforms chaotic alert streams into actionable, prioritized events delivered via email, Slack, PagerDuty, Microsoft Teams, or custom webhooks.

This guide walks you through every step required to set up Alertmanager from scratch, including configuration, integration with Prometheus, testing alerts, and implementing best practices. By the end, you'll have a production-ready alerting system that reduces noise, improves response times, and enhances operational resilience.

Step-by-Step Guide

Prerequisites

Before beginning the setup, ensure you have the following:

  • A Linux or Unix-based system (Ubuntu 20.04/22.04, CentOS 7/8, or similar)
  • Prometheus server installed and running (version 2.0 or higher)
  • Basic familiarity with YAML configuration files
  • Access to a terminal with sudo privileges
  • A notification endpoint (e.g., email server, Slack webhook, PagerDuty integration)

If Prometheus is not yet installed, download it from the official Prometheus downloads page and follow the installation instructions for your platform.

Step 1: Download and Install Alertmanager

Alertmanager is distributed as a standalone binary. Visit the Alertmanager GitHub releases page and select the latest stable version compatible with your system architecture (typically amd64 for most servers).

For Ubuntu/Debian systems, use the following commands:

wget https://github.com/prometheus/alertmanager/releases/download/v0.26.0/alertmanager-0.26.0.linux-amd64.tar.gz

tar xvfz alertmanager-0.26.0.linux-amd64.tar.gz

cd alertmanager-0.26.0.linux-amd64

Move the alertmanager and amtool binaries to a system-wide location so they are available on your PATH:

sudo mv alertmanager /usr/local/bin/

sudo mv amtool /usr/local/bin/

Verify the installation:

alertmanager --version

You should see output similar to:

alertmanager, version 0.26.0 (branch: HEAD, revision: 99826564436502876069525311115054386a4671)

build user: root@e4694909242c

build date: 20230821-12:28:01

go version: go1.20.7

platform: linux/amd64

Step 2: Create Alertmanager Configuration File

The core of Alertmanager's behavior is defined in its configuration file, typically named alertmanager.yml. Create this file in a dedicated directory:

sudo mkdir -p /etc/alertmanager

sudo nano /etc/alertmanager/alertmanager.yml

Below is a minimal but functional configuration template:

global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.gmail.com:587'
  smtp_from: 'your-email@gmail.com'
  smtp_auth_username: 'your-email@gmail.com'
  smtp_auth_password: 'your-app-password'
  smtp_hello: 'localhost'
  smtp_require_tls: true

route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 3h
  receiver: 'email-notifications'

receivers:
  - name: 'email-notifications'
    email_configs:
      - to: 'ops-team@yourcompany.com'
        send_resolved: true

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'cluster', 'service']

Let's break down each section:

  • global: Defines default settings for all alerts, including SMTP credentials for email delivery, timeout durations, and TLS requirements.
  • route: Determines how alerts are grouped and routed. The group_by field ensures alerts with matching labels (e.g., same alert name, cluster, and service) are bundled together. group_wait delays the initial notification to allow grouping; group_interval sets the minimum time between notifications about changes to the same group; repeat_interval defines how long to wait before re-sending a notification for an alert that is still firing.
  • receivers: Defines where alerts are sent. In this example, email is configured. You can add multiple receivers for different teams or channels.
  • inhibit_rules: Prevents low-severity alerts from triggering if a higher-severity alert already exists for the same context. For example, if a critical service outage alert fires, all related warning alerts (e.g., high CPU) are suppressed.
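The interplay of group_wait, group_interval, and repeat_interval is easier to see with a small simulation. The following Python sketch is purely illustrative (it is not Alertmanager's implementation); it models when notifications for one continuously firing group would be sent:

```python
from datetime import timedelta

def notification_times(group_wait, group_interval, repeat_interval, horizon):
    """Approximate when one continuously firing alert group is notified.

    The first notification goes out after group_wait; afterwards the group
    is flushed every group_interval, but a repeat notification is only sent
    once repeat_interval has elapsed since the last one.
    """
    times = []
    t = group_wait                  # initial delay to allow grouping
    last_sent = None
    while t <= horizon:
        if last_sent is None or t - last_sent >= repeat_interval:
            times.append(t)
            last_sent = t
        t += group_interval         # next group flush
    return times

# With this guide's values (30s / 5m / 3h), only the initial
# notification is sent within the first hour.
sent = notification_times(
    group_wait=timedelta(seconds=30),
    group_interval=timedelta(minutes=5),
    repeat_interval=timedelta(hours=3),
    horizon=timedelta(hours=1),
)
print(sent)
```

Shortening repeat_interval to 10 minutes in the same call would produce a notification roughly every other flush, which is why aggressive repeat settings quickly turn into noise.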

Note: If using Gmail, generate an App Password instead of your account password. Enable 2FA on your Google account and generate the app password under Security → 2-Step Verification → App passwords.

Step 3: Configure Prometheus to Send Alerts to Alertmanager

Prometheus must be configured to forward alerts to Alertmanager. Edit your Prometheus configuration file (usually /etc/prometheus/prometheus.yml):

sudo nano /etc/prometheus/prometheus.yml

Add or update the alerting section:

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093

Ensure the port (9093) matches Alertmanager's default listener port. If Alertmanager is running on a different host, replace localhost with that server's IP address or hostname.

Also, verify that alerting rules are defined in your Prometheus configuration. Create a rules file if needed:

sudo mkdir -p /etc/prometheus/rules

sudo nano /etc/prometheus/rules/alerts.rules

Add a sample alert rule:

groups:
  - name: example
    rules:
      - alert: HighRequestLatency
        expr: job:request_latency_seconds:mean5m{job="myapp"} > 0.5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High request latency detected"
          description: "Job {{ $labels.job }} has a 5-minute average request latency above 0.5 seconds."

Then, include the rules file in your Prometheus configuration:

rule_files:
  - "/etc/prometheus/rules/alerts.rules"

Restart Prometheus to apply changes:

sudo systemctl restart prometheus

Step 4: Create a Systemd Service for Alertmanager

To ensure Alertmanager starts automatically on boot and restarts on failure, create a systemd service file:

sudo nano /etc/systemd/system/alertmanager.service

Paste the following:

[Unit]
Description=Alertmanager
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
User=prometheus
Group=prometheus
ExecStart=/usr/local/bin/alertmanager \
  --config.file=/etc/alertmanager/alertmanager.yml \
  --storage.path=/var/lib/alertmanager \
  --web.listen-address=:9093 \
  --web.route-prefix=/
Restart=always

[Install]
WantedBy=multi-user.target

Create the user and data directory:

sudo useradd --no-create-home --shell /bin/false prometheus

sudo mkdir -p /var/lib/alertmanager

sudo chown prometheus:prometheus /var/lib/alertmanager

Reload systemd and start Alertmanager:

sudo systemctl daemon-reload

sudo systemctl start alertmanager

sudo systemctl enable alertmanager

Verify the service status:

sudo systemctl status alertmanager

You should see active (running) with no errors.

Step 5: Access the Alertmanager Web UI

Alertmanager includes a built-in web interface that provides real-time visibility into active alerts, silences, and inhibition rules. By default, it listens on port 9093.

Open your browser and navigate to:

http://your-server-ip:9093

You'll see a dashboard with tabs for:

  • Alerts: Lists all active alerts, grouped by labels.
  • Silences: View and create temporary alert suppressions.
  • Status: Shows configuration health, version, and cluster status (if running in HA mode).

Use this UI to test your configuration. You can manually trigger an alert via the Prometheus UI or wait for the configured rule to fire. Once triggered, you should see the alert appear in the Alertmanager UI and receive the configured notification (e.g., email).

Step 6: Test Alert Routing

To confirm everything is working, force a test alert using the amtool CLI utility:

amtool alert add alertname="TestAlert" severity="critical" instance="test-server" \
  --annotation=summary="Test Alert" \
  --annotation=description="This is a test alert from amtool" \
  --alertmanager.url=http://localhost:9093

Check the Alertmanager UI. The alert should appear immediately. Then, check your email or configured notification channel. You should receive a notification with the summary and description.

amtool has no delete subcommand; an alert injected this way clears automatically once its end time passes (by default, after the global resolve_timeout). While you wait, you can suppress it with a silence from the UI.

Verify that a resolved notification is sent if send_resolved: true is configured in your receiver.
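amtool is a thin client over Alertmanager's HTTP API, so you can post the same kind of test alert with nothing but the Python standard library. The sketch below builds a payload for the real POST /api/v2/alerts endpoint; the URL and label values are assumptions to adapt to your environment:

```python
import json
from datetime import datetime, timedelta, timezone

def build_test_alert(labels, annotations, minutes_active=5):
    """Build the JSON body for POST /api/v2/alerts (a list of alerts)."""
    now = datetime.now(timezone.utc)
    return [{
        "labels": labels,                # identity used for grouping/routing
        "annotations": annotations,      # human-readable context
        "startsAt": now.isoformat(),
        "endsAt": (now + timedelta(minutes=minutes_active)).isoformat(),
    }]

payload = build_test_alert(
    labels={"alertname": "TestAlert", "severity": "critical", "instance": "test-server"},
    annotations={"summary": "Test Alert", "description": "Posted via the v2 API"},
)
print(json.dumps(payload, indent=2))

# To actually send it against a local Alertmanager, uncomment:
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:9093/api/v2/alerts",
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
#     method="POST",
# )
# urllib.request.urlopen(req)
```

Because endsAt is set five minutes out, the alert also demonstrates automatic resolution without any manual cleanup.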

Step 7: Secure Alertmanager with Reverse Proxy (Optional but Recommended)

Exposing Alertmanager directly on port 9093 is not secure for production. Use a reverse proxy like Nginx to add TLS encryption and authentication.

Install Nginx:

sudo apt update

sudo apt install nginx -y

Obtain an SSL certificate using Let's Encrypt (Certbot):

sudo apt install certbot python3-certbot-nginx -y

sudo certbot --nginx -d alertmanager.yourdomain.com

Configure Nginx to proxy requests to Alertmanager:

sudo nano /etc/nginx/sites-available/alertmanager

Add:

server {
    listen 443 ssl;
    server_name alertmanager.yourdomain.com;

    ssl_certificate /etc/letsencrypt/live/alertmanager.yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/alertmanager.yourdomain.com/privkey.pem;

    location / {
        # Optional basic auth; create the htpasswd file first (e.g. with apache2-utils):
        # auth_basic "Alertmanager";
        # auth_basic_user_file /etc/nginx/.htpasswd;

        proxy_pass http://localhost:9093;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

Enable the site:

sudo ln -s /etc/nginx/sites-available/alertmanager /etc/nginx/sites-enabled/

sudo nginx -t

sudo systemctl restart nginx

Now access Alertmanager securely at https://alertmanager.yourdomain.com.

Best Practices

1. Use Meaningful Labels and Annotations

Alerts are only as useful as the metadata they carry. Always define clear, consistent labels such as severity, service, cluster, and environment. Use annotations for human-readable details like summary and description, which appear in notifications.

Example:

labels:
  severity: critical
  service: database
  cluster: prod-us-east
annotations:
  summary: "Database cluster prod-us-east is unreachable"
  description: "All nodes in cluster prod-us-east are down. Check replication status."

These labels enable intelligent grouping and routing in Alertmanager.

2. Implement Alert Inhibition Rules

Alert fatigue is one of the biggest causes of operational failure. Use inhibition rules to prevent redundant alerts. For example, if a node down alert fires, suppress all related high CPU, disk full, or network latency alerts from that node. This reduces noise and helps teams focus on root causes.

3. Group Alerts by Logical Context

Grouping alerts by job, instance, or service ensures that a single incident doesn't trigger 50 separate notifications. For instance, if a Kubernetes pod restarts, group all related container alerts under one group instead of flooding the team with individual container alerts.
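Conceptually, grouping is a dictionary keyed by the group_by label values. This toy Python sketch (an illustration, not Alertmanager code) shows five pod alerts collapsing into a single notification group:

```python
from collections import defaultdict

def group_alerts(alerts, group_by):
    """Bucket alerts by the values of the group_by labels, like route.group_by."""
    groups = defaultdict(list)
    for alert in alerts:
        key = tuple(alert["labels"].get(lbl, "") for lbl in group_by)
        groups[key].append(alert)
    return dict(groups)

# Five restarting pods from the same service...
alerts = [
    {"labels": {"alertname": "PodRestart", "service": "webapp", "pod": f"webapp-{i}"}}
    for i in range(5)
]
groups = group_alerts(alerts, group_by=["alertname", "service"])
print(len(groups))  # ...become one group, hence one notification
```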

4. Set Appropriate Timeouts

Adjust group_wait, group_interval, and repeat_interval based on your SLAs. For critical systems, a group_wait of 10-30 seconds is acceptable. For non-critical alerts, extend it to 2-5 minutes to allow for automatic recovery.

Never set repeat_interval too low. A 5-minute repeat for a critical alert is often sufficient; hourly repeats are better for warnings.

5. Use Multiple Receivers for Escalation

Implement tiered alerting. For example:

  • First tier: On-call engineer via Slack
  • Second tier: Manager via email after 15 minutes
  • Third tier: PagerDuty if unresolved after 1 hour

Use Alertmanager's routing tree to achieve this:

route:
  receiver: 'slack-notifications'
  routes:
    - receiver: 'email-notifications'
      group_wait: 15m
      match_re:
        severity: warning
    - receiver: 'pagerduty-notifications'
      group_wait: 1h
      match_re:
        severity: critical

6. Enable Alert Resolution Notifications

Always set send_resolved: true in your receivers. Knowing when an alert has been resolved is as important as knowing when it fired. It provides closure and helps with post-mortem analysis.

7. Avoid Over-Monitoring

Not every metric needs an alert. Focus on business-impacting indicators: service availability, error rates, latency percentiles, and resource exhaustion. Avoid alerting on metrics that self-correct within seconds (e.g., brief CPU spikes).

8. Test and Simulate Alerts Regularly

Run monthly alerting drills. Use amtool to simulate critical alerts and verify notification delivery, routing, and resolution. Document what worked and what didn't.

9. Secure Your Configuration Files

Never commit secrets like SMTP passwords or webhook URLs to version control. Use environment variables or secrets managers like HashiCorp Vault or Kubernetes Secrets.

Modify your systemd service to use environment variables:

EnvironmentFile=-/etc/alertmanager/env
ExecStart=/usr/local/bin/alertmanager \
  --config.file=/etc/alertmanager/alertmanager.yml \
  --storage.path=/var/lib/alertmanager \
  --web.listen-address=:9093

Create /etc/alertmanager/env:

SMTP_PASSWORD=your_app_password_here

SMTP_USERNAME=your-email@gmail.com

Note that Alertmanager does not expand environment variables inside alertmanager.yml itself, so references like ${SMTP_PASSWORD} will not be substituted. Either render the configuration file from a template at startup (for example with envsubst), or keep the secret out of the YAML entirely by pointing at a file readable only by the service user:

smtp_auth_password_file: /etc/alertmanager/smtp_password

10. Monitor Alertmanager Itself

Alertmanager exposes metrics at /metrics. Set up a Prometheus job to scrape Alertmanager's metrics:

- job_name: 'alertmanager'
  static_configs:
    - targets: ['localhost:9093']

Then create an alert to notify you if Alertmanager is down:

- alert: AlertmanagerDown
  expr: up{job="alertmanager"} == 0
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "Alertmanager is down"
    description: "Alertmanager has been unreachable for 5 minutes."

Tools and Resources

Configuration Validators

amtool check-config /etc/alertmanager/alertmanager.yml

  • YAML Lint: an online tool to validate YAML syntax and indentation.

Notification Integrations

  • Slack: Use Incoming Webhooks. Generate a webhook URL from your Slack app settings.
  • PagerDuty: Use the Alertmanager PagerDuty integration via webhook endpoint provided by PagerDuty.
  • Microsoft Teams: Use a Connector Webhook URL from your Teams channel.
  • Discord: Use a Webhook URL from your Discord server settings.
  • Webhooks: Send alerts to custom apps via HTTP POST. Useful for internal ticketing systems or custom scripts.
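For the custom-webhook route, Alertmanager POSTs a JSON document (payload version 4) describing the group and its alerts. The parser below is a minimal illustrative sketch; the field names follow the documented webhook payload, while the sample values are invented:

```python
import json

def summarize_webhook(body: str) -> str:
    """Render a one-line summary from an Alertmanager webhook payload."""
    payload = json.loads(body)
    alerts = payload.get("alerts", [])
    names = sorted({a["labels"].get("alertname", "?") for a in alerts})
    return f'{payload.get("status", "unknown")}: {len(alerts)} alert(s) [{", ".join(names)}]'

# A trimmed example in the shape Alertmanager sends:
sample = json.dumps({
    "version": "4",
    "status": "firing",
    "receiver": "webhook-notifications",
    "groupLabels": {"alertname": "HighRequestLatency"},
    "alerts": [
        {"status": "firing",
         "labels": {"alertname": "HighRequestLatency", "severity": "warning"},
         "annotations": {"summary": "High request latency detected"}},
    ],
})
print(summarize_webhook(sample))  # firing: 1 alert(s) [HighRequestLatency]
```

A real receiver would wrap this in an HTTP handler and return 200 quickly, deferring slow work (ticket creation, paging) to a queue.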

Debugging Tools

  • amtool alert query: list all active alerts from the CLI.
  • amtool silence query: view active silences.
  • Prometheus UI → Alerts tab: see which rules are firing and their labels.
  • Alertmanager UI → Status: view the loaded configuration as Alertmanager parsed it.

Real Examples

Example 1: Kubernetes Cluster Alerting

Scenario: You're managing a production Kubernetes cluster and want to be notified if any node becomes unready or if etcd is unhealthy.

Prometheus Rule (k8s-alerts.rules):

- alert: KubernetesNodeNotReady
  expr: kube_node_status_condition{condition="Ready",status="true"} == 0
  for: 10m
  labels:
    severity: critical
    team: platform
  annotations:
    summary: "Kubernetes node {{ $labels.node }} is not ready"
    description: "Node {{ $labels.node }} has been in NotReady state for more than 10 minutes."

- alert: EtcdMembersDown
  expr: etcdserver_members{status="alive"} < 2
  for: 5m
  labels:
    severity: critical
    team: platform
  annotations:
    summary: "Etcd cluster has less than 2 healthy members"
    description: "etcd cluster health is compromised. Risk of data loss or split-brain."

Alertmanager Configuration:

route:
  group_by: ['alertname', 'team']
  group_wait: 15s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'slack-platform'

receivers:
  - name: 'slack-platform'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/XXX/YYY/ZZZ'
        channel: '#platform-alerts'
        send_resolved: true
        title: '{{ .CommonLabels.alertname }}'
        text: |
          *Summary:* {{ .CommonAnnotations.summary }}
          *Description:* {{ .CommonAnnotations.description }}
          *Labels:* {{ .CommonLabels }}
  - name: 'email-platform'
    email_configs:
      - to: 'platform-team@company.com'
        send_resolved: true
        headers:
          Subject: "[CRITICAL] {{ .CommonLabels.alertname }}"

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'team']

Outcome: Platform team receives a single grouped Slack message for all node issues. If etcd goes critical, no redundant high CPU or low disk alerts from affected nodes appear.

Example 2: Web Application Monitoring

Scenario: A customer-facing web application experiences high error rates. You want to alert only if the 5-minute error rate exceeds 5% and only during business hours.

Prometheus Rule:

- alert: HighErrorRate
  expr: sum(rate(http_requests_total{code=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
  for: 10m
  labels:
    severity: warning
    service: webapp
  annotations:
    summary: "Web application error rate exceeds 5%"
    description: "Error rate is {{ printf \"%.2f\" $value }}%. Check application logs and deployment status."

Alertmanager Configuration with Time-Based Routing:

time_intervals:
  - name: out-of-hours
    time_intervals:
      - times:
          - start_time: '18:00'
            end_time: '24:00'
          - start_time: '00:00'
            end_time: '08:00'

route:
  group_by: ['alertname', 'service']
  group_wait: 30s
  group_interval: 10m
  repeat_interval: 4h
  receiver: 'email-during-business'
  routes:
    - receiver: 'slack-outside-hours'
      active_time_intervals:
        - out-of-hours
      group_wait: 1m
      group_interval: 15m
      repeat_interval: 12h

receivers:
  - name: 'email-during-business'
    email_configs:
      - to: 'dev-team@company.com'
        send_resolved: true
  - name: 'slack-outside-hours'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/XXX/YYY/ZZZ'
        channel: '#oncall'
        send_resolved: true

Outcome: During business hours, developers receive email alerts. After hours, alerts are routed to the on-call engineer via Slack, reducing noise for the team.

Example 3: Multi-Tenant Alerting with Inhibition

Scenario: You manage multiple environments (dev, staging, prod). You want to suppress low disk space alerts in dev if a node down alert is active.

Alertmanager Inhibit Rule:

inhibit_rules:
  - source_match:
      severity: 'critical'
      environment: 'prod'
    target_match:
      severity: 'warning'
      environment: 'prod'
    equal: ['alertname', 'instance']

Result: In production, if a node crashes (critical), all related disk usage > 85% warnings are automatically suppressed. In dev, warnings remain active to help developers identify issues early.
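Inhibition can be modeled as a predicate: a target alert is suppressed when some firing source alert matches source_match and agrees with it on every label listed in equal. The Python below mirrors this example's rule for illustration only (it is not Alertmanager's implementation):

```python
def is_inhibited(target, sources, source_match, target_match, equal):
    """Return True if `target` should be suppressed by any alert in `sources`."""
    if not all(target["labels"].get(k) == v for k, v in target_match.items()):
        return False  # the rule does not apply to this target at all
    return any(
        all(src["labels"].get(k) == v for k, v in source_match.items())
        and all(src["labels"].get(lbl) == target["labels"].get(lbl) for lbl in equal)
        for src in sources
    )

rule = {
    "source_match": {"severity": "critical", "environment": "prod"},
    "target_match": {"severity": "warning", "environment": "prod"},
    "equal": ["alertname", "instance"],
}
critical = {"labels": {"alertname": "DiskUsageHigh", "severity": "critical",
                       "environment": "prod", "instance": "node-1"}}
prod_warning = {"labels": {"alertname": "DiskUsageHigh", "severity": "warning",
                           "environment": "prod", "instance": "node-1"}}
dev_warning = {"labels": {"alertname": "DiskUsageHigh", "severity": "warning",
                          "environment": "dev", "instance": "node-1"}}

print(is_inhibited(prod_warning, [critical], **rule))  # True: suppressed in prod
print(is_inhibited(dev_warning, [critical], **rule))   # False: dev stays active
```

Note that because equal includes alertname, a source alert only silences targets carrying the same alert name, which is why this rule pairs critical and warning severities of the same alerts.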

FAQs

Q1: Can Alertmanager work without Prometheus?

No, Alertmanager is designed specifically to receive alerts from Prometheus. It does not generate alerts itself. Other systems (e.g., Grafana, VictoriaMetrics) can send alerts to Alertmanager via webhooks, but Prometheus is the standard and most integrated source.

Q2: How do I silence an alert temporarily?

Use the Alertmanager web UI. Click Silence on an active alert, set the duration (e.g., 1 hour), and optionally add a reason. The silence will suppress matching alerts until it expires. You can also use amtool silence add from the CLI.

Q3: What happens if Alertmanager crashes?

Prometheus keeps evaluating rules and retries sending while Alertmanager is unreachable, but alerts that fire and resolve during the outage can be lost. To avoid a single point of failure, run Alertmanager in high availability (HA) mode: multiple instances that replicate silences and notification state to each other over a built-in gossip protocol (configured with the --cluster.peer flags), with Prometheus pointed at all instances.

Q4: Can I use Alertmanager with Docker or Kubernetes?

Yes. Alertmanager is commonly deployed as a Docker container or via the Prometheus Operator in Kubernetes. Use Helm charts like prometheus-community/kube-prometheus-stack for automated deployment.

Q5: Why am I not receiving email alerts?

Common causes:

  • Incorrect SMTP credentials or port
  • Firewall blocking outbound SMTP (port 587 or 465)
  • Missing smtp_require_tls: true for Gmail
  • Using account password instead of app password with Gmail
  • Alert not firing due to misconfigured Prometheus rule

Check the Alertmanager logs: journalctl -u alertmanager -f

Q6: How do I add a new notification channel like Microsoft Teams?

Add a new receiver in alertmanager.yml:

- name: 'teams-notifications'
  webhook_configs:
    - url: 'https://outlook.office.com/webhook/your-webhook-id'
      send_resolved: true

Then update the route to send matching alerts to this receiver.

Q7: What's the difference between grouping and inhibition?

Grouping bundles multiple similar alerts into one notification to reduce noise. Inhibition prevents lower-severity alerts from triggering if a higher-severity alert already exists for the same context.

Q8: Can Alertmanager send alerts to SMS or phone calls?

Yes, indirectly. Integrate with services like PagerDuty, Opsgenie, or Twilio via webhook. Alertmanager sends the alert to the service, which then triggers SMS or voice calls.

Q9: How often should I review my alerting rules?

Review alerting rules quarterly. Remove outdated rules, adjust thresholds based on historical data, and ensure annotations remain accurate. Alert fatigue often stems from stale or overly sensitive rules.

Q10: Is Alertmanager suitable for small teams?

Absolutely. Even small teams benefit from clean, grouped, and resolved notifications. Start with email or Slack, and scale to PagerDuty as your infrastructure grows.

Conclusion

Setting up Alertmanager is not just a technical task: it's a strategic decision that directly impacts your system's reliability and your team's ability to respond effectively to incidents. A well-configured Alertmanager transforms raw metrics into intelligent, actionable alerts, reducing noise, preventing alert fatigue, and ensuring that the right people are notified at the right time.

In this guide, you've learned how to install Alertmanager, configure it to work seamlessly with Prometheus, define intelligent routing rules, integrate with modern notification platforms, and implement best practices that scale from small deployments to enterprise environments. You've seen real-world examples that demonstrate how to tailor alerting to different scenarios, from Kubernetes clusters to web applications, and you now understand how to troubleshoot common issues.

Remember: Alerting is not a "set it and forget it" process. Regularly review your rules, test your notifications, and refine your routing based on incident response patterns. The goal is not to alert on everything, but to alert on the right things, at the right time, with the right context.

With Alertmanager properly configured, you're no longer just monitoring systems; you're building resilience into your operations. And in today's world of distributed systems and high-availability expectations, that's not just an advantage, it's a necessity.