How to Set Up Prometheus
Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud in 2012 and now maintained by the Cloud Native Computing Foundation (CNCF). It has become one of the most widely adopted monitoring solutions in modern cloud-native environments, particularly in Kubernetes clusters, microservices architectures, and DevOps pipelines. Unlike traditional monitoring tools that mix push- and pull-based models inconsistently, Prometheus uses a pull-based model with a powerful query language (PromQL), a time-series database, and flexible alerting mechanisms, all designed for reliability, scalability, and real-time observability.
Setting up Prometheus correctly is essential for gaining deep insights into system performance, application health, and infrastructure metrics. Whether you're monitoring a single server, a containerized application, or a large-scale distributed system, Prometheus provides the tools to collect, store, visualize, and alert on metrics with precision. This guide walks you through every step of setting up Prometheus, from installation and configuration to integration with exporters, visualization with Grafana, and implementing best practices for production-grade monitoring.
By the end of this tutorial, you'll have a fully functional Prometheus instance capable of scraping metrics from multiple targets, triggering alerts based on custom thresholds, and delivering actionable insights through dashboards. You'll also understand how to maintain, scale, and secure your monitoring stack for long-term reliability.
Step-by-Step Guide
Prerequisites
Before beginning the setup process, ensure your environment meets the following minimum requirements:
- A Linux-based system (Ubuntu 20.04/22.04, CentOS 7/8, or Debian 11 recommended)
- At least 2 GB of RAM (4 GB recommended for production)
- At least 20 GB of available disk space (depending on retention period and metric volume)
- Root or sudo privileges
- Basic familiarity with the command line and YAML configuration
- Network access to the targets you intend to monitor (firewall rules permitting traffic on port 9090 and exporter ports)
If you're monitoring applications running in containers or Kubernetes, ensure Docker or Podman is installed, and if using Kubernetes, have kubectl configured with cluster access.
Step 1: Download and Install Prometheus
Prometheus is distributed as a standalone binary. Downloading and installing it manually gives you full control over configuration and versioning.
First, navigate to the official Prometheus releases page and identify the latest stable version. As of this writing, the latest version is 2.51.x. Use wget to download the binary:
wget https://github.com/prometheus/prometheus/releases/download/v2.51.2/prometheus-2.51.2.linux-amd64.tar.gz
Extract the archive:
tar xvfz prometheus-2.51.2.linux-amd64.tar.gz
Move the extracted files to a standard location:
sudo mv prometheus-2.51.2.linux-amd64 /opt/prometheus
cd /opt/prometheus
Verify the installation by checking the version:
./prometheus --version
You should see output similar to:
prometheus, version 2.51.2 (branch: HEAD, revision: 1234567890abcdef)
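Before extracting, it is good practice to verify the tarball's integrity. The steps below assume the release assets include a sha256sums.txt file (Prometheus releases publish one alongside the binaries):

```shell
# Fetch the checksum file published with the release and verify the
# tarball against it; a successful check prints "<file>: OK".
wget https://github.com/prometheus/prometheus/releases/download/v2.51.2/sha256sums.txt
sha256sum -c sha256sums.txt --ignore-missing
```

The --ignore-missing flag skips checksum entries for release files you did not download.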
Step 2: Create a Prometheus User and Directory Structure
For security and organization, create a dedicated system user and directory structure to run Prometheus:
sudo useradd --no-create-home --shell /bin/false prometheus
Create directories for configuration, rules, and data storage:
sudo mkdir /etc/prometheus
sudo mkdir /var/lib/prometheus
sudo mkdir /etc/prometheus/rules
sudo mkdir /etc/prometheus/alerts
Copy the configuration file and binaries to their appropriate locations:
sudo cp /opt/prometheus/prometheus /usr/local/bin/
sudo chown prometheus:prometheus /usr/local/bin/prometheus
sudo cp /opt/prometheus/promtool /usr/local/bin/
sudo chown prometheus:prometheus /usr/local/bin/promtool
sudo cp /opt/prometheus/prometheus.yml /etc/prometheus/
sudo cp -r /opt/prometheus/consoles /etc/prometheus/
sudo cp -r /opt/prometheus/console_libraries /etc/prometheus/
sudo chown -R prometheus:prometheus /etc/prometheus /var/lib/prometheus
sudo chmod 755 /usr/local/bin/prometheus
sudo chmod 755 /usr/local/bin/promtool
Step 3: Configure Prometheus
The core configuration file for Prometheus is prometheus.yml. This YAML file defines scrape targets, job configurations, alerting rules, and global settings.
Open the configuration file:
sudo nano /etc/prometheus/prometheus.yml
Replace the default content with the following minimal but functional configuration:
global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093

rule_files:
  - "/etc/prometheus/rules/*.rules"
  - "/etc/prometheus/alerts/*.yml"

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node_exporter'
    static_configs:
      - targets: ['localhost:9100']
Let's break this down:
- scrape_interval: How often Prometheus pulls metrics from targets (15 seconds is standard).
- evaluation_interval: How often alerting and recording rules are evaluated.
- alerting: Points Prometheus to Alertmanager for alert routing (configured later).
- rule_files: Specifies where custom alert and recording rules are stored.
- scrape_configs: Defines the targets to monitor. The first job scrapes Prometheus itself; the second scrapes the Node Exporter (explained next).
Save and exit the file.
Step 4: Install and Configure Node Exporter
To monitor system-level metrics such as CPU, memory, disk I/O, and network usage, Prometheus needs an exporter. The Node Exporter is the most commonly used exporter for Linux systems.
Download the Node Exporter binary:
wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
Extract and move the binary:
tar xvfz node_exporter-1.7.0.linux-amd64.tar.gz
sudo mv node_exporter-1.7.0.linux-amd64/node_exporter /usr/local/bin/
sudo chown prometheus:prometheus /usr/local/bin/node_exporter
Create a systemd service file for Node Exporter:
sudo nano /etc/systemd/system/node_exporter.service
Add the following content:
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/node_exporter
[Install]
WantedBy=multi-user.target
Reload systemd and start the service:
sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter
sudo systemctl status node_exporter
Verify Node Exporter is running on port 9100:
curl http://localhost:9100/metrics
You should see a long list of system metrics in plain text format.
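Rather than scrolling through the full output, you can spot-check it from the command line. This assumes Node Exporter is listening on localhost:9100 as configured above:

```shell
# Show a few CPU counters (one series per core and mode):
curl -s http://localhost:9100/metrics | grep '^node_cpu_seconds_total' | head -5

# Count how many metric families are exposed; each one is preceded
# by a '# HELP' line in the exposition format:
curl -s http://localhost:9100/metrics | grep -c '^# HELP'
```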
Step 5: Configure Prometheus as a Systemd Service
To ensure Prometheus starts automatically on boot and runs in the background, create a systemd service file:
sudo nano /etc/systemd/system/prometheus.service
Add the following content:
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus/ \
  --web.console.templates=/etc/prometheus/consoles \
  --web.console.libraries=/etc/prometheus/console_libraries \
  --web.listen-address=0.0.0.0:9090 \
  --web.enable-admin-api \
  --web.enable-lifecycle \
  --storage.tsdb.retention.time=15d \
  --enable-feature=remote-write-receiver
Restart=always
[Install]
WantedBy=multi-user.target
Important flags explained:
- --config.file: Path to your configuration file.
- --storage.tsdb.path: Where time-series data is stored.
- --web.listen-address: Listen on all interfaces (0.0.0.0) on port 9090.
- --web.enable-admin-api: Enables administrative APIs (use cautiously in production).
- --web.enable-lifecycle: Allows reloading config via HTTP POST.
- --storage.tsdb.retention.time: How long to retain data (15 days is a good default).
- --enable-feature=remote-write-receiver: Enables receiving remote writes (useful for HA setups).
Reload systemd and start Prometheus:
sudo systemctl daemon-reload
sudo systemctl enable prometheus
sudo systemctl start prometheus
sudo systemctl status prometheus
Step 6: Access the Prometheus Web Interface
Once Prometheus is running, access the web UI by opening your browser and navigating to:
http://your-server-ip:9090
You should see the Prometheus homepage with a search bar and navigation menu. Click on Status > Targets to verify that both the Prometheus job and the Node Exporter job are showing as UP.
If either shows DOWN, check:
- Firewall settings (ensure ports 9090 and 9100 are open)
- Service status: sudo systemctl status prometheus and sudo systemctl status node_exporter
- Configuration syntax: promtool check config /etc/prometheus/prometheus.yml
Step 7: Install and Configure Alertmanager (Optional but Recommended)
Alertmanager handles alerts sent by Prometheus and routes them to notification channels like email, Slack, PagerDuty, or Microsoft Teams.
Download Alertmanager:
wget https://github.com/prometheus/alertmanager/releases/download/v0.27.0/alertmanager-0.27.0.linux-amd64.tar.gz
Extract and move:
tar xvfz alertmanager-0.27.0.linux-amd64.tar.gz
sudo mv alertmanager-0.27.0.linux-amd64/alertmanager /usr/local/bin/
sudo mv alertmanager-0.27.0.linux-amd64/amtool /usr/local/bin/
sudo chown prometheus:prometheus /usr/local/bin/alertmanager
sudo chown prometheus:prometheus /usr/local/bin/amtool
Create a configuration file:
sudo nano /etc/prometheus/alertmanager.yml
Add a basic configuration:
global:
  resolve_timeout: 5m

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'email-notifications'

receivers:
  - name: 'email-notifications'
    email_configs:
      - to: 'alerts@example.com'
        from: 'prometheus@example.com'
        smarthost: 'smtp.example.com:587'
        auth_username: 'prometheus@example.com'
        auth_password: 'your-smtp-password'
        html: '{{ template "email.default.html" . }}'
        headers:
          subject: '[Prometheus Alert] {{ .CommonLabels.alertname }}'

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
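If you use Slack rather than email, a receiver of the same shape works; only the delivery block changes. A minimal sketch (the webhook URL and channel name are placeholders you would replace with your own):

```yaml
receivers:
  - name: 'slack-notifications'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/T000/B000/XXXX'
        channel: '#alerts'
        send_resolved: true
        title: '{{ .CommonLabels.alertname }}'
        text: '{{ .CommonAnnotations.description }}'
```

Point the route's receiver field at 'slack-notifications' to use it.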
Create a data directory for Alertmanager (it persists silence and notification state there), then a systemd service:
sudo mkdir /var/lib/alertmanager
sudo chown prometheus:prometheus /var/lib/alertmanager
sudo nano /etc/systemd/system/alertmanager.service
Add:
[Unit]
Description=Alertmanager
Wants=network-online.target
After=network-online.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/alertmanager \
  --config.file=/etc/prometheus/alertmanager.yml \
  --storage.path=/var/lib/alertmanager/ \
  --web.listen-address=0.0.0.0:9093
Restart=always
[Install]
WantedBy=multi-user.target
Reload and start:
sudo systemctl daemon-reload
sudo systemctl enable alertmanager
sudo systemctl start alertmanager
sudo systemctl status alertmanager
Update your Prometheus configuration to point to Alertmanager:
In /etc/prometheus/prometheus.yml, ensure the alerting section points to localhost:9093 (as shown earlier). Then reload Prometheus:
curl -X POST http://localhost:9090/-/reload
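It is worth validating both configuration files before reloading; promtool ships in the Prometheus tarball and amtool in the Alertmanager tarball, both copied to /usr/local/bin in the steps above. Chaining with && ensures the reload only fires if validation passes:

```shell
# Validate Prometheus and Alertmanager configs, then reload Prometheus.
# Each step runs only if the previous one succeeded.
promtool check config /etc/prometheus/prometheus.yml \
  && amtool check-config /etc/prometheus/alertmanager.yml \
  && curl -X POST http://localhost:9090/-/reload
```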
Step 8: Set Up Grafana for Visualization
While Prometheus provides a basic UI, Grafana is the industry standard for creating rich, customizable dashboards.
Install Grafana:
sudo apt-get install -y apt-transport-https software-properties-common wget
sudo mkdir -p /etc/apt/keyrings/
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee -a /etc/apt/sources.list.d/grafana.list
sudo apt-get update
sudo apt-get install -y grafana
Start and enable Grafana:
sudo systemctl daemon-reload
sudo systemctl enable grafana-server
sudo systemctl start grafana-server
Access Grafana at http://your-server-ip:3000. Default login: admin/admin (change password immediately).
Add Prometheus as a data source:
- Click Configuration > Data Sources > Add data source
- Select Prometheus
- Set URL to http://localhost:9090
- Click Save & Test
Import a pre-built dashboard:
- Click Create > Import
- Enter dashboard ID 1860 (Node Exporter Full) and click Load
- Select Prometheus as the data source
- Click Import
You now have a live dashboard showing CPU, memory, disk, and network usage metrics from your server.
Best Practices
Use Meaningful Job Names and Labels
Always use descriptive job names in your prometheus.yml file. Instead of job_name: 'app', use job_name: 'web-api-production'. Labels should be consistent across services to enable powerful grouping and filtering in PromQL queries.
Example:
- job_name: 'web-api-production'
  static_configs:
    - targets: ['10.0.1.10:9101']
      labels:
        environment: 'production'
        service: 'web-api'
        team: 'backend'
Implement Proper Retention Policies
By default, Prometheus retains data for 15 days. For production systems with high metric volume, adjust retention based on storage capacity and compliance needs:
- Short-term: 7–14 days (development/testing)
- Medium-term: 30–60 days (production monitoring)
- Long-term: Use remote storage (Thanos, Cortex, Mimir) for years of data
Set retention in your Prometheus config:
--storage.tsdb.retention.time=60d
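Before picking a retention window, it helps to estimate the disk it implies. A common rule of thumb is disk ≈ retention seconds × ingested samples per second × bytes per sample, where Prometheus averages roughly 1–2 bytes per sample after compression. The numbers below (100,000 series, 15s interval, 60-day retention) are illustrative:

```shell
# Back-of-the-envelope TSDB sizing; adjust the inputs to your setup.
awk 'BEGIN {
  series           = 100000       # active time series
  interval         = 15           # scrape_interval in seconds
  retention        = 60 * 86400   # 60 days, in seconds
  bytes_per_sample = 2            # conservative planning figure
  samples_per_sec  = series / interval
  gb = retention * samples_per_sec * bytes_per_sample / (1024^3)
  printf "Estimated TSDB size: %.1f GiB\n", gb   # prints 64.4 GiB for these inputs
}'
```

Leave comfortable headroom on top of the estimate: compactions and WAL replay need temporary space.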
Separate Alerting and Recording Rules
Keep alerting rules (conditions that trigger notifications) separate from recording rules (precomputed expressions to improve query performance). Store them in dedicated directories:
- /etc/prometheus/alerts/ for alerting rules
- /etc/prometheus/rules/ for recording rules
Example recording rule (/etc/prometheus/rules/cpu_usage.rules):
groups:
  - name: cpu_usage
    rules:
      - record: instance:cpu_usage:avg5m
        expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))
Example alerting rule (/etc/prometheus/alerts/high_cpu_alert.rules):
groups:
  - name: high_cpu_alert
    rules:
      - alert: HighCPUUsage
        expr: instance:cpu_usage:avg5m > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage has been above 80% for 5 minutes."
Enable Remote Write for Scalability
For high-availability or long-term storage needs, configure Prometheus to send metrics to remote storage like Thanos, Cortex, or Mimir. This decouples storage from the Prometheus server, enabling horizontal scaling and data federation.
remote_write:
  - url: "http://thanos-receive.example.com/api/v1/receive"
    queue_config:
      max_samples_per_send: 1000
      min_backoff: 30ms
      max_backoff: 100ms
Secure Your Prometheus Instance
By default, Prometheus exposes its web interface and APIs without authentication. In production, secure it using:
- Reverse proxy with TLS: Use Nginx or Caddy to terminate HTTPS and add basic auth.
- Network restrictions: Allow access only from internal networks or monitoring VLANs.
- Disable admin API: Remove --web.enable-admin-api unless absolutely necessary.
- Use OAuth2 or SAML: Integrate with enterprise identity providers via proxy.
Example Nginx config for basic auth:
server {
    listen 80;
    server_name prometheus.example.com;

    auth_basic "Prometheus Admin";
    auth_basic_user_file /etc/nginx/.htpasswd;

    location / {
        proxy_pass http://localhost:9090;
        proxy_http_version 1.1;
    }
}
Note that Nginx must listen on a different port (80 here) than Prometheus itself. For the proxy to be the only way in, also change Prometheus's --web.listen-address to 127.0.0.1:9090 so port 9090 is not reachable directly.
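The auth_basic_user_file above must exist before Nginx can enforce the login. On Debian/Ubuntu, the htpasswd utility comes from the apache2-utils package:

```shell
# Create the credentials file referenced by auth_basic_user_file;
# -c creates the file, and htpasswd prompts for the password.
sudo apt-get install -y apache2-utils
sudo htpasswd -c /etc/nginx/.htpasswd admin

# Validate the Nginx config, then apply it.
sudo nginx -t && sudo systemctl reload nginx
```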
Monitor Prometheus Itself
Prometheus should monitor its own health. Use the built-in prometheus_build_info and scrape_duration_seconds metrics to detect scraping failures, memory leaks, or slow scrapes.
Set up alerts for:
- Prometheus target down (itself)
- Scrape duration exceeding threshold
- TSDB head chunks growing too large
- Rule evaluation failures
Use Labels Consistently Across Services
Standardize labels like environment, region, service, and team across all exporters and applications. This enables cross-service queries like:
sum(rate(http_requests_total{environment="production"}[5m])) by (service)
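With consistent labels in place, the same query shape answers many different questions. Two further illustrative examples (the metric names are assumptions, standard instrumentation-library conventions rather than metrics from this guide):

```promql
# p95 request latency per region in production, from a histogram metric:
histogram_quantile(0.95,
  sum(rate(http_request_duration_seconds_bucket{environment="production"}[5m])) by (le, region))

# Error ratio per team, 5xx responses over all responses:
sum(rate(http_requests_total{code=~"5.."}[5m])) by (team)
  / sum(rate(http_requests_total[5m])) by (team)
```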
Regularly Audit and Clean Up Unused Metrics
Over time, unused or noisy metrics can bloat your TSDB. Use the Prometheus UI's TSDB status page (Status > TSDB Status) to identify high-cardinality or rarely queried metrics. Use metric_relabel_configs to drop them at scrape time:
metric_relabel_configs:
  - source_labels: [__name__]
    regex: 'old_metric_.*'
    action: drop
Tools and Resources
Official Prometheus Tools
- Promtool: Command-line utility for validating configuration files, testing rules, and querying metrics. Use promtool check config prometheus.yml to validate syntax before restarting.
- Prometheus Web UI: Built-in interface for querying metrics and viewing targets. Useful for quick debugging.
- Alertmanager: Handles alert deduplication, grouping, and routing. Integrates with Slack, PagerDuty, Email, and more.
Exporters
Exporters are essential for exposing metrics from third-party systems. Key exporters include:
- Node Exporter: System-level metrics (CPU, memory, disk, network)
- Blackbox Exporter: HTTP, TCP, ICMP probe monitoring (for uptime checks)
- MySQL Exporter: Database performance metrics
- Redis Exporter: Redis instance metrics
- PostgreSQL Exporter: Query performance and connection stats
- Pushgateway: For batch jobs and ephemeral tasks that cannot be scraped
- App Exporters: Custom exporters for Java (Micrometer), Python (Prometheus Client), Go (Prometheus Client Library)
Visualization
- Grafana: The de facto standard for dashboarding. Offers hundreds of community-built dashboards.
- PromLens: A visual PromQL editor with autocomplete and query explanation.
- VictoriaMetrics: A high-performance, scalable Prometheus-compatible time-series database.
Remote Storage
- Thanos: Adds long-term storage, global querying, and high availability to Prometheus.
- Cortex: Multi-tenant, horizontally scalable Prometheus-compatible backend.
- Mimir: Grafana Labs' next-generation Prometheus backend with advanced features like sharding and compression.
Learning Resources
- Prometheus Official Documentation
- PromQL Query Language Guide
- Grafana + Prometheus Integration
- Prometheus GitHub Repository
- Instrumentation Best Practices
- Prometheus YouTube Channel
Community and Support
Join the Prometheus community for real-time help:
- Slack: the prometheus channel on CNCF Slack
- Forum: https://discuss.prometheus.io
- GitHub Issues: Report bugs or request features
Real Examples
Example 1: Monitoring a Web Application with cURL and Custom Metrics
Suppose you have a simple web API that returns a JSON status. You want to monitor its response time and success rate.
Create a custom script (web_monitor.sh) to expose metrics:
#!/bin/bash
# Probe the API every 10 seconds and write the result, in Prometheus
# exposition format, to a file that a web server can serve.
OUT_DIR=./metrics_out
mkdir -p "$OUT_DIR"
while true; do
  start=$(date +%s.%N)
  response=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:8080/health)
  end=$(date +%s.%N)
  duration=$(echo "$end - $start" | bc -l)
  {
    echo "# HELP web_api_response_time_seconds Time taken to respond to health check"
    echo "# TYPE web_api_response_time_seconds gauge"
    echo "web_api_response_time_seconds{status=\"$response\"} $duration"
  } > "$OUT_DIR/metrics"
  sleep 10
done
Run the script in the background, then serve its output directory on port 9101 (Prometheus requests the /metrics path by default, which maps to the metrics file):
./web_monitor.sh &
cd metrics_out && python3 -m http.server 9101
Then add to Prometheus config:
- job_name: 'web-api-custom'
  static_configs:
    - targets: ['localhost:9101']
Now you can query the gauge directly, or smooth it over a window (avg_over_time suits a gauge; rate() is only meaningful on counters):
avg_over_time(web_api_response_time_seconds[5m])
Example 2: Alerting on High HTTP Error Rates
Assume you're monitoring a web server with a metric http_requests_total{code="500"}.
Create an alert rule:
groups:
  - name: web_errors
    rules:
      - alert: High5xxErrors
        expr: rate(http_requests_total{code=~"5.."}[5m]) > 0.1
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "High 5xx errors detected on {{ $labels.instance }}"
          description: "HTTP 5xx error rate has exceeded 0.1 per second for 10 minutes."
This fires when the 5xx rate, averaged over a 5-minute window, stays above 0.1 errors per second (one error every 10 seconds) for 10 consecutive minutes.
Example 3: Monitoring Kubernetes with kube-state-metrics
In a Kubernetes cluster, install kube-state-metrics:
kubectl apply -f https://github.com/kubernetes/kube-state-metrics/releases/download/v2.12.0/kube-state-metrics.yaml
Add to Prometheus config:
- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
      action: replace
      target_label: __address__
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2
    - action: labelmap
      regex: __meta_kubernetes_pod_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      action: replace
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_pod_name]
      action: replace
      target_label: kubernetes_pod_name
Now you can monitor pod restarts, resource requests, and container statuses directly in Prometheus.
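For the relabeling above to pick up a pod, the pod must opt in via the annotations the keep rule matches on. A minimal sketch (the name, image, port, and path values are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-api
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"
    prometheus.io/path: "/metrics"
spec:
  containers:
    - name: web-api
      image: example/web-api:latest
      ports:
        - containerPort: 8080
```

Pods without the prometheus.io/scrape: "true" annotation are dropped by the keep rule and never scraped.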
FAQs
What is Prometheus used for?
Prometheus is used for monitoring and alerting on time-series metrics from systems, applications, and services. It excels at collecting metrics like CPU usage, request rates, error counts, and latency, enabling teams to detect anomalies, troubleshoot performance issues, and ensure system reliability.
Can Prometheus monitor Windows servers?
Yes. Use the Windows Exporter (https://github.com/prometheus-community/windows_exporter) to collect metrics such as disk usage, network interfaces, and service states on Windows systems.
Does Prometheus support log monitoring?
No. Prometheus is designed for metrics, not logs. For log aggregation, use Loki (also by Grafana Labs) or the ELK stack. Prometheus and Loki are often used together for full observability.
How much disk space does Prometheus need?
It depends on the number of metrics and retention period. A typical server with 1000 time series and 15-day retention uses ~5–10 GB. High-cardinality metrics (e.g., per-request IDs) can consume hundreds of GBs quickly. Use remote storage for large-scale deployments.
Is Prometheus suitable for production?
Yes. Prometheus is used in production by many major organizations across the industry. However, for high availability and long-term storage, pair it with Thanos, Cortex, or Mimir.
How do I update Prometheus?
Download the new binary, stop the service, replace the executable, and restart. Always test new versions in staging first. Use version control for your config files to roll back if needed.
Can Prometheus scrape metrics over HTTPS?
Yes. Configure TLS in the scrape config:
scrape_configs:
  - job_name: 'secure-app'
    scheme: https
    tls_config:
      ca_file: /etc/prometheus/ca.crt
      cert_file: /etc/prometheus/cert.crt
      key_file: /etc/prometheus/key.key
    static_configs:
      - targets: ['app.example.com:443']
What is the difference between Prometheus and Zabbix?
Prometheus is pull-based, cloud-native, and designed for dynamic environments like Kubernetes. Zabbix predates the cloud-native era: it uses agent-based collection (with both push and pull modes), is traditionally deployed on static infrastructure, and carries a heavier GUI. Prometheus scales more easily and integrates better with modern DevOps toolchains.
How do I backup Prometheus data?
Back up the /var/lib/prometheus directory. Because it's a live time-series database, copying it while Prometheus is running can produce an inconsistent copy, so either stop Prometheus before backing up, or, if --web.enable-admin-api is set, take a consistent snapshot via the TSDB admin API and archive the snapshot directory instead.
Why are my targets showing as DOWN?
Common causes:
- Network connectivity issues
- Firewall blocking port
- Exporter not running
- Incorrect target URL or port
- Authentication required but not configured
Check the Prometheus UI under Status > Targets for detailed error messages.
Conclusion
Setting up Prometheus is a foundational skill for modern DevOps and SRE teams. From installing the binary and configuring scrape targets to integrating with exporters, Alertmanager, and Grafana, this guide has provided a comprehensive, production-ready roadmap for deploying Prometheus successfully.
Prometheus is not just a tool; it's a philosophy of observability: collect meaningful metrics, alert on what matters, and visualize trends to drive informed decisions. When paired with best practices like consistent labeling, proper retention policies, and remote storage, Prometheus becomes a powerful engine for system reliability.
Remember: Monitoring is not a one-time setup. It's an ongoing discipline. Regularly review your alerts, prune unused metrics, and refine your dashboards as your infrastructure evolves. The goal is not to collect every possible metric, but to understand the health of your systems at a glance and act before users are impacted.
With Prometheus, you now have the tools to build a resilient, transparent, and proactive monitoring culture. Start small, validate your setup, and scale gradually. The insights you gain will transform how you operate and maintain your systems, today and into the future.