How to Send Alerts With Grafana

Nov 6, 2025 - 10:32

Grafana is one of the most widely adopted open-source platforms for monitoring and observability. Originally designed for visualizing time-series data, Grafana has evolved into a comprehensive observability stack that supports alerting across a vast array of data sources, including Prometheus, Loki, InfluxDB, MySQL, PostgreSQL, and more. The ability to send alerts when metrics cross predefined thresholds is critical for maintaining system reliability, reducing mean time to resolution (MTTR), and proactively preventing outages. Sending alerts with Grafana empowers DevOps teams, SREs, and infrastructure engineers to respond swiftly to anomalies, performance degradation, or service failures before they impact end users.

Unlike traditional monitoring tools that require complex configurations or proprietary integrations, Grafana offers a unified, intuitive interface for defining alert rules, managing notification channels, and testing alert logic, all within a single dashboard. Whether you're monitoring a small application stack or a large-scale cloud-native environment, Grafana's alerting system scales elegantly and integrates seamlessly with modern communication tools like Slack, Microsoft Teams, PagerDuty, Email, and Webhooks.

This guide provides a comprehensive, step-by-step walkthrough on how to send alerts with Grafana. You'll learn how to configure alert rules, define conditions, set up notification channels, test alerts, and follow industry best practices to ensure your alerts are actionable, reliable, and noise-free. Real-world examples and essential tools are included to help you implement a robust alerting strategy that enhances system resilience and operational efficiency.

Step-by-Step Guide

Prerequisites

Before configuring alerts in Grafana, ensure the following prerequisites are met:

  • Grafana server is installed and running (version 8.0 or higher recommended)
  • A supported data source is configured (e.g., Prometheus, InfluxDB, Loki, etc.)
  • You have administrative or editor permissions in the Grafana instance
  • Network connectivity to your notification endpoints (Slack, email server, webhook URL, etc.)

For this guide, we'll use Prometheus as the primary data source, as it is the most commonly paired system with Grafana for alerting. However, the steps are broadly applicable to other time-series or log-based data sources.

Step 1: Access the Alerting Section

Log in to your Grafana instance. In the left-hand navigation panel, click on the Alerting icon (a bell symbol). This opens the Alerting dashboard, where you can view all existing alerts, create new ones, and manage notification channels.

If you're using Grafana 9.0 or later, you'll notice the Alerting section has been reorganized under the unified alerting system, with pages for Alert Rules and Contact Points (the successor to notification channels). These are the two core components you'll need to configure for successful alerting.

Step 2: Create a Notification Channel

An alert rule defines when an alert triggers, but a notification channel determines where the alert is sent. Without a properly configured channel, your alert will fire, but no one will know about it.

To create a notification channel:

  1. In the Alerting menu, click on Notification channels.
  2. Click the Add channel button.
  3. Select the type of notification you want to use. Common options include:
    • Email
    • Slack
    • Microsoft Teams
    • PagerDuty
    • Webhook
    • Opsgenie
    • VictorOps

For this example, we'll configure a Slack notification channel.

Configuring Slack

Before configuring Grafana, ensure you have a Slack webhook URL:

  1. Go to your Slack workspace and navigate to App Directory.
  2. Search for Incoming Webhooks and install it.
  3. Click Add New Webhook to Workspace.
  4. Select the channel where you want alerts to be posted (e.g., #alerts).

  5. Click Allow. Slack will generate a unique webhook URL.
  6. Copy the webhook URL.

Back in Grafana:

  1. In the notification channel form, select Slack.
  2. Paste the webhook URL into the Webhook URL field.
  3. Optionally, set a Name for the channel (e.g., Slack Alerts - Production).
  4. Under Message, you can customize the alert message using Grafana's template variables. For example:

    {{ .Title }}
    {{ .Description }}
    Status: {{ .Status }}
    Triggered at: {{ .EvalTime }}
    Value: {{ .Value }}

  5. Click Test to send a sample alert. If successful, you'll see a confirmation message in Slack and a green checkmark in Grafana.
  6. Click Save.
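
Under the hood, a Slack incoming webhook is just an HTTP POST of a JSON payload to the webhook URL. The sketch below shows that mechanism in Python; the message fields are illustrative, not Grafana's exact payload schema:

```python
import json
import urllib.request

def build_slack_payload(title, status, value):
    """Build a minimal Slack incoming-webhook payload for an alert message."""
    return {"text": f"*{title}*\nStatus: {status}\nValue: {value}"}

def post_to_slack(webhook_url, payload):
    """POST the payload to a Slack incoming webhook; returns the HTTP status."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

# Build (but do not send) a sample alert message.
payload = build_slack_payload("High CPU Usage", "firing", "87%")
print(payload["text"])
```

Sending this payload with post_to_slack against your webhook URL should produce the same kind of message Grafana's Test button generates.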

Step 3: Create an Alert Rule

Now that your notification channel is set up, create an alert rule that triggers based on a metric threshold.

From the Alerting dashboard, click New alert rule.

Define the Alert Rule Basics

Fill in the following fields:

  • Name: Give your alert a clear, descriptive name. Example: High CPU Usage on Web Servers
  • Namespace: (Optional) Group alerts into logical categories for easier management.
  • Condition: This is where you define the metric and threshold.

Select Your Data Source

In the Data source dropdown, choose the data source you want to monitor (e.g., Prometheus).

Write the Query

Use the query editor to write a PromQL (Prometheus Query Language) expression. For example, to monitor CPU usage above 80% for more than 5 minutes:

100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80

This query calculates the percentage of CPU time not spent in idle mode across all instances. If the result exceeds 80%, the condition becomes true.
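
The arithmetic in that expression can be checked with plain Python, assuming you already have each instance's idle-CPU rate (a fraction between 0 and 1) from the inner rate() call:

```python
def cpu_busy_percent(idle_fraction):
    """Convert an idle-CPU rate (0.0-1.0) into busy percentage, mirroring:
    100 - avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100"""
    return 100 - idle_fraction * 100

# Hypothetical instances reporting 10% and 35% idle time over the window.
for instance, idle in {"web-01": 0.10, "web-02": 0.35}.items():
    busy = cpu_busy_percent(idle)
    print(f"{instance}: {busy:.0f}% busy, breaches 80% threshold: {busy > 80}")
```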

Click Apply to preview the data. You should see a graph showing the metric over time. Ensure the values are realistic and the trend matches your expectations.

Set Alert Conditions

Under Condition, select:

  • When: avg() of the time series
  • is above
  • Value: 80

Then, under For, set the duration to 5m. This ensures the alert only triggers if the condition persists for five consecutive minutes, reducing false positives from transient spikes.
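
The effect of the For duration can be sketched as a simple debounce: fire only when every evaluation in the window breaches the threshold. This is a simplification of Grafana's actual pending/firing state machine:

```python
def should_fire(samples, threshold, pending_evals):
    """Fire only if the last `pending_evals` samples all exceed the threshold."""
    if len(samples) < pending_evals:
        return False
    return all(s > threshold for s in samples[-pending_evals:])

# A transient spike stays quiet; sustained load fires.
spike     = [50, 95, 50, 50, 50]
sustained = [50, 85, 90, 92, 88]
print(should_fire(spike, 80, 3))      # transient spike: no alert
print(should_fire(sustained, 80, 3))  # sustained breach: alert
```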

Configure Alert Rules

Scroll down to the Alert rules section:

  • Group by: Leave as default unless you want to group alerts by specific labels (e.g., instance, job).
  • Repeat interval: Set this to 1h or 2h to avoid alert fatigue. This determines how often Grafana will re-send the alert if it remains firing.
  • Resolve condition: Automatically resolve the alert when the condition returns to normal. This is enabled by default.

Assign Notification Channel

Under Notifications, click Add notification and select the Slack channel you created earlier.

You can add multiple notification channels; for example, send critical alerts to PagerDuty and informational alerts to Slack.

Save the Alert Rule

Click Save. Your alert rule is now active. Grafana will begin evaluating it at the rule's evaluation interval (one minute by default in recent versions). If the condition is met, an alert will trigger and notify your channel.

Step 4: Test the Alert

To verify your alert works:

  1. Simulate a high CPU load on one of your monitored servers, for example using the stress utility (install it via your package manager):
    stress --cpu 4 --timeout 300

  2. Wait for 5 minutes to allow the alert to trigger.
  3. Check your Slack channel. You should receive a formatted message with the alert title, value, and timestamp.
  4. Stop the stress test and wait for the alert to resolve automatically.
  5. Go back to the Alerting dashboard. You should see the alert status change from Firing to Resolved.

Step 5: Enable Alerting in Dashboard Panels (Optional)

You can also create alerts directly from a dashboard panel:

  1. Open a dashboard containing a time-series graph.
  2. Click the panel title and select Edit.
  3. Scroll down to the Alert tab.
  4. Click Create alert.
  5. Follow the same steps as above to define the condition, data source, and notification channel.

This method is ideal for quick, panel-specific alerts. However, for complex or reusable alert logic, creating rules in the Alerting section is recommended.

Step 6: Manage and Review Alerts

After creating alerts, regularly review their status:

  • Use the Alerting > Alert rules page to see active, firing, and resolved alerts.
  • Click on any alert to view its history, evaluation logs, and trigger times.
  • Use the Alerting > Notification channels page to test or edit delivery methods.
  • Enable Alert history in Grafana settings to retain alert records for compliance or audit purposes.

Best Practices

Creating alerts is only half the battle. Poorly designed alerts can lead to alert fatigue, false positives, and missed incidents. Follow these best practices to ensure your alerting system is effective, reliable, and maintainable.

1. Define Clear, Actionable Alerts

Every alert should answer two questions: What is wrong? and What should I do about it? Avoid vague alerts like "System is unhealthy". Instead, use specific language:

  • Vague: High Resource Usage
  • Specific: CPU Usage > 90% on web-01 for 5 minutes. Action: restart service or scale up

Include context in the alert message using template variables. For example:

    Alert: {{ .Title }}
    Instance: {{ .Labels.instance }}
    Value: {{ .Value }} (Threshold: 80%)
    Link: {{ .PanelURL }}

This gives responders immediate context and a direct link to the dashboard for investigation.

2. Use Firing Duration to Reduce Noise

Always set a For duration (e.g., 5m, 10m) to avoid alerting on transient spikes. A 30-second CPU spike due to a background job is normal; alerting on it creates unnecessary noise. A 5-minute sustained high usage, however, likely indicates a real problem.

3. Tier Your Alerts by Severity

Not all alerts require the same response. Use labels to categorize alerts by severity:

  • P0 (Critical): Service outage, data loss, payment system failure → Notify via PagerDuty, SMS, phone call
  • P1 (High): Performance degradation, high error rate → Notify via Slack + Email
  • P2 (Medium): Disk space low, non-critical service down → Notify via Email
  • P3 (Low): Unused resource, informational → Log only, no notification

In Grafana, use labels like severity=p0 in your alert rules and route them to different notification channels based on those labels.
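
Severity-based routing boils down to a lookup from label values to channels. A minimal sketch (the channel names and defaults here are hypothetical):

```python
# Hypothetical mapping from severity label to notification channels.
ROUTES = {
    "p0": ["pagerduty", "sms"],
    "p1": ["slack", "email"],
    "p2": ["email"],
    "p3": [],  # log only, no notification
}

def route_alert(labels):
    """Return the channels an alert should go to, based on its severity label.
    Unlabeled alerts fall back to the p2 (email) route."""
    return ROUTES.get(labels.get("severity", "p2"), ["email"])

print(route_alert({"severity": "p0", "instance": "web-01"}))
```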

4. Avoid Alerting on Derived Metrics Without Context

Don't alert on ratios or percentages without understanding the underlying data. For example, alerting on Error Rate > 1% might seem sensible, but if your total requests are only 10 per minute, that's just one error. Context matters. Combine metrics:

sum(rate(http_requests_total{status_code=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.01

and

sum(rate(http_requests_total[5m])) > 100

This ensures you only alert when the error rate is high AND traffic volume is significant enough to matter.
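
The combined condition is easy to verify with plain numbers; the helper below mirrors the two PromQL expressions (rates are per second, as rate() produces):

```python
def should_alert(error_rps, total_rps, max_ratio=0.01, min_traffic=100):
    """Alert only when the error ratio is high AND traffic is significant,
    mirroring the two combined PromQL conditions."""
    if total_rps <= min_traffic:
        return False  # too little traffic for the ratio to be meaningful
    return error_rps / total_rps > max_ratio

print(should_alert(error_rps=0.2, total_rps=10))   # 2% errors but tiny traffic
print(should_alert(error_rps=5.0, total_rps=200))  # 2.5% errors, real traffic
```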

5. Test Alerts Regularly

Alerts can break silently. Test them monthly using synthetic load or chaos engineering tools. Use tools like Locust, k6, or Prometheus Blackbox Exporter to simulate failures and verify alert delivery.

6. Use Alert Annotations for Runbooks

Grafana allows you to add annotations to alerts: extra metadata that doesn't trigger notifications but is visible in the alert details. Use this to link to runbooks, dashboards, or documentation:

  • runbook_url: https://internal-docs.example.com/runbooks/web-server-cpu
  • dashboard_id: 42

This reduces mean time to diagnosis (MTTD) and ensures on-call personnel have all the information they need.

7. Monitor Alerting System Health

Set up an alert to notify you if Grafana's alerting engine fails. For example (the exact metric name varies by Grafana version):

sum(rate(grafana_alerting_rule_evaluation_failures_total[5m])) > 0

This ensures your alerting system itself remains operational.

8. Regularly Review and Retire Alerts

Alerts decay over time. Services are decommissioned, thresholds become outdated, and teams change. Schedule quarterly reviews to:

  • Remove alerts for decommissioned services
  • Update thresholds based on new baselines
  • Consolidate redundant alerts

Use Grafana's alert history to identify alerts that never fire, or fire too often without action. These are candidates for deletion or tuning.

Tools and Resources

Enhance your Grafana alerting strategy with these complementary tools and resources.

1. Prometheus Exporters

Exporters collect metrics from systems and expose them to Prometheus. Essential exporters for alerting include:

  • node_exporter: Monitors host-level metrics (CPU, memory, disk, network)
  • blackbox_exporter: Tests HTTP, TCP, ICMP endpoints for availability
  • postgres_exporter: Monitors PostgreSQL health and query performance
  • nginx_exporter: Tracks Nginx request rates, errors, and latency

Install these on your monitored hosts and configure Prometheus to scrape them.

2. Grafana Loki for Log-Based Alerts

Alert on log patterns using Loki, Grafana's log aggregation system. For example:

sum(rate({job="app"} |= "ERROR" [5m])) > 5

This triggers an alert when ERROR lines are being logged at more than 5 per second, averaged over the last 5 minutes (LogQL's rate() is per-second, like its Prometheus counterpart). Combine with alert rules to detect application failures before they impact users.
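
The same computation over raw log lines, as a Python sketch (count matching lines, divide by the window to get a per-second rate, as LogQL's rate() does):

```python
def error_rate_per_second(log_lines, window_seconds=300):
    """Per-second rate of lines containing 'ERROR' over the window,
    mirroring: sum(rate({job="app"} |= "ERROR" [5m]))"""
    errors = sum(1 for line in log_lines if "ERROR" in line)
    return errors / window_seconds

# Synthetic 5-minute window: two ERROR lines per request cycle.
logs = ["GET /health 200", "ERROR db timeout", "ERROR retry failed"] * 600
rate = error_rate_per_second(logs)
print(f"{rate:.1f} errors/sec, alert: {rate > 5}")
```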

3. Alertmanager (for Advanced Routing)

If you're using Prometheus with Alertmanager, you can leverage its advanced routing, inhibition, and grouping features. Grafana can integrate with Alertmanager as a data source, allowing you to manage alerts centrally while still benefiting from Alertmanager's powerful routing logic.

4. Grafana OnCall

Grafana Labs offers Grafana OnCall, a purpose-built on-call management platform that integrates natively with Grafana alerting. It supports escalation policies, scheduling, alert deduplication, and mobile push notifications. Ideal for teams serious about reducing alert fatigue and improving incident response.

5. Terraform for Infrastructure-as-Code Alerting

Manage alert rules and notification channels as code using the Grafana Terraform Provider. This ensures consistency across environments and enables version control.

Example Terraform snippet (illustrative; exact resource and attribute names depend on the provider version):

    resource "grafana_alert_rule" "high_cpu" {
      name      = "High CPU Usage on Web Servers"
      condition = "A"

      data {
        ref_id         = "A"
        query          = "100 - (avg by (instance) (rate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100) > 80"
        datasource_uid = "Prometheus"
      }

      for = "5m"

      annotations = {
        runbook_url = "https://docs.example.com/runbooks/cpu-alert"
      }

      labels = {
        severity = "p1"
      }

      notification {
        uid = grafana_notification_channel.slack.uid
      }
    }

6. Community Dashboards and Alert Rules

Explore the Grafana Dashboard Library (grafana.com/grafana/dashboards) for pre-built alerting dashboards. Many include alert rules you can import and customize.

Popular dashboards:

  • Node Exporter Full (ID: 1860)
  • PostgreSQL Exporter (ID: 13571)
  • Kubernetes / API Server (ID: 3119)

Download and import these dashboards, then enable their embedded alert rules with one click.

7. Alerting Best Practice Templates

Use these template structures for consistent alert naming and formatting:

  • Name: [Service] [Metric] Exceeds Threshold on [Host]
  • Description: [What happened] + [Impact] + [Action Required]
  • Severity: p0/p1/p2/p3
  • Runbook: URL to documented response procedure

Real Examples

Here are three real-world alerting scenarios with exact configurations.

Example 1: HTTP 5xx Error Rate Spike

Goal: Alert when the error rate for web requests exceeds 1% for 5 minutes, but only if total requests exceed 100 per minute.

Query (Prometheus):

sum(rate(http_requests_total{status_code=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.01

and

sum(rate(http_requests_total[5m])) > 100

Condition: When value is above 0.01, for 5m

Name: High HTTP 5xx Error Rate on API Gateway

Annotations:

  • runbook_url: https://docs.example.com/runbooks/http-5xx
  • dashboard_id: 101

Severity: p1

Notification: Slack + Email

Example 2: Disk Space Below 10%

Goal: Alert when any server's disk usage exceeds 90% (i.e., free space falls below 10%).

Query:

100 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 > 90

Condition: Above 90, for 10m

Name: Low Disk Space on Server

Annotations:

  • runbook_url: https://docs.example.com/runbooks/disk-space
  • action: Clean logs or expand volume

Severity: p2

Notification: Email only
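
You can sanity-check the disk arithmetic locally with Python's standard library before wiring up the exporter. Note that shutil reports free bytes, while node_filesystem_avail_bytes excludes root-reserved blocks, so the numbers are close but not identical:

```python
import shutil

def disk_used_percent(path="/"):
    """Percentage of the filesystem used, mirroring:
    100 - (avail / size) * 100 from the PromQL expression."""
    usage = shutil.disk_usage(path)
    return 100 - (usage.free / usage.total) * 100

pct = disk_used_percent("/")
print(f"root filesystem {pct:.1f}% used, alert: {pct > 90}")
```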

Example 3: PostgreSQL Connection Pool Exhaustion

Goal: Alert when more than 80% of PostgreSQL connections are in use.

Query:

pg_stat_activity_count{state="active"} / pg_settings_value{name="max_connections"} > 0.8

Condition: Above 0.8, for 5m

Name: PostgreSQL Connection Pool Exhausted

Annotations:

  • runbook_url: https://docs.example.com/runbooks/postgres-connections
  • impact: Applications may timeout or fail to connect

Severity: p0

Notification: PagerDuty + Slack
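
The utilization check itself is a simple ratio; this sketch assumes you already have the two numbers the exporter exposes:

```python
def pool_utilization(active_connections, max_connections):
    """Fraction of the PostgreSQL connection pool in use, mirroring:
    pg_stat_activity_count{state="active"} / pg_settings_value{name="max_connections"}"""
    return active_connections / max_connections

util = pool_utilization(active_connections=85, max_connections=100)
print(f"pool {util:.0%} used, alert: {util > 0.8}")
```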

FAQs

Can Grafana send alerts without Prometheus?

Yes. Grafana supports alerting from multiple data sources, including InfluxDB, Loki, MySQL, PostgreSQL, Elasticsearch, and more. Each data source has its own query language (e.g., InfluxQL, SQL, Lucene), but the alerting configuration process remains the same.

Why isn't my alert firing even though the metric exceeds the threshold?

Common causes:

  • The For duration hasn't elapsed yet
  • The data source is not returning data (check scrape targets)
  • The query syntax is incorrect
  • The alert evaluation interval is too long for timely detection (shorten it if you need faster alerts)
  • Notification channel is misconfigured or unreachable

Check the alert rule's Evaluation History tab for details on why it didn't trigger.

Can I silence alerts temporarily?

Yes. In the Alerting > Alert rules page, click the three dots next to an alert and select Silence. You can silence for a duration (e.g., 1 hour) or until manually resumed. This is useful during maintenance windows.

How often does Grafana evaluate alert rules?

By default, recent Grafana versions evaluate each alert rule every minute, and the interval is configurable per rule. Global evaluation settings can be adjusted in the Grafana configuration file (grafana.ini) under the [alerting] section.

Can I use Grafana alerts with Kubernetes?

Yes. Deploy Grafana and Prometheus as Helm charts in your Kubernetes cluster. Use the Prometheus Operator to auto-discover services and scrape metrics. Grafana alert rules can be defined via Kubernetes Custom Resource Definitions (CRDs) using the Grafana Operator or Terraform.

Is there a limit to the number of alert rules I can create?

Grafana does not impose a hard limit. However, performance may degrade if you create thousands of rules. For large-scale deployments, consider using Prometheus Alertmanager alongside Grafana for better scalability.

How do I prevent alert storms during outages?

Use alert grouping and inhibition:

  • Group alerts by service or instance to avoid hundreds of duplicate alerts
  • Use labels to suppress lower-priority alerts when a higher-priority one is firing (e.g., if a whole data center is down, dont alert on individual server failures)
  • Set a longer repeat interval (e.g., 1h) to reduce notification volume
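
Grouping can be sketched as collapsing alerts that share a label set into a single notification, a simplification of what Alertmanager's group_by does:

```python
from collections import defaultdict

def group_alerts(alerts, group_by=("service",)):
    """Collapse alerts into one notification per group-by label set."""
    groups = defaultdict(list)
    for alert in alerts:
        key = tuple(alert.get(label, "") for label in group_by)
        groups[key].append(alert)
    return groups

alerts = [
    {"service": "api", "instance": "web-01"},
    {"service": "api", "instance": "web-02"},
    {"service": "db", "instance": "db-01"},
]
# Three firing alerts collapse into two notifications.
for key, members in group_alerts(alerts).items():
    print(key, "->", len(members), "alerts, 1 notification")
```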

Conclusion

Sending alerts with Grafana is not just a technical task; it's a strategic practice that directly impacts system reliability, team productivity, and user experience. By following the steps outlined in this guide, from configuring notification channels to writing precise alert conditions and applying industry best practices, you can transform Grafana from a visualization tool into a proactive observability engine.

Effective alerting is about clarity, context, and actionability. Avoid noise. Prioritize severity. Document responses. Test relentlessly. And always ask: Will this alert help someone fix a problem, or just wake them up at 3 a.m.?

As infrastructure grows more distributed and complex, the ability to detect and respond to anomalies quickly becomes a competitive advantage. Grafana's alerting system, when implemented thoughtfully, empowers teams to shift from reactive firefighting to proactive prevention.

Start small. Build one alert. Test it. Refine it. Then expand. Over time, your alerting strategy will evolve into a robust, self-documenting system that keeps your services running smoothly, even when no one is watching.