How to Index Logs Into Elasticsearch

Nov 6, 2025 - 10:47

Indexing logs into Elasticsearch is a foundational practice for modern observability, security monitoring, and operational analytics. As systems grow in complexity, spanning microservices, cloud infrastructure, containers, and distributed applications, centralized log management becomes not just beneficial but essential. Elasticsearch, part of the Elastic Stack (formerly the ELK Stack), provides a powerful, scalable, real-time search and analytics engine capable of ingesting, indexing, and visualizing massive volumes of log data from diverse sources. This tutorial provides a comprehensive, step-by-step guide to indexing logs into Elasticsearch, covering configuration, optimization, tooling, and real-world implementation. Whether you're managing a small application or a large-scale enterprise environment, properly indexed logs mean faster troubleshooting, improved system reliability, and actionable insights.

Log data contains critical information about system behavior, application errors, user activity, security events, and performance metrics. Without proper indexing, this data remains unsearchable and unusable. Elasticsearch transforms raw, unstructured log entries into structured, queryable documents with rich metadata, enabling powerful filtering, aggregation, and visualization through Kibana or other frontends. This guide walks you through the entire lifecycle, from log collection to Elasticsearch ingestion, with best practices that ensure efficiency, scalability, and maintainability.

Step-by-Step Guide

1. Understand Your Log Sources

Before you begin indexing, identify all sources generating logs. Common sources include:

  • Application logs (e.g., Node.js, Python, Java, .NET)
  • System logs (e.g., systemd, syslog, Windows Event Log)
  • Web servers (e.g., Nginx, Apache access and error logs)
  • Container platforms (e.g., Docker, Kubernetes)
  • Cloud services (e.g., AWS CloudWatch, Azure Monitor, GCP Logging)
  • Network devices (e.g., firewalls, routers)

Each source may produce logs in different formats: plain text, JSON, CSV, or proprietary formats. Understanding the structure and schema of each log type is critical. For example, Nginx access logs typically follow a space-delimited format, while application logs from modern frameworks often emit structured JSON. If logs are unstructured, you'll need to parse them during ingestion.
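The difference is easy to see in code. Below is a rough Python sketch showing why structured JSON is trivial to ingest while plain text needs a pattern; the regex here is a simplified stand-in for a full Nginx combined-format parser, not a production grok pattern:

```python
import json
import re

# Simplified pattern for an Nginx-style access-log line (illustrative only;
# a real deployment would use Filebeat's nginx module or a full grok pattern).
NGINX_RE = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse_line(line: str) -> dict:
    """Return a structured document from either a JSON or an Nginx log line."""
    line = line.strip()
    if line.startswith("{"):      # structured JSON log: trivial to parse
        return json.loads(line)
    m = NGINX_RE.match(line)      # unstructured text: needs a pattern
    if not m:
        raise ValueError(f"unparseable line: {line!r}")
    return m.groupdict()

doc = parse_line('192.0.2.1 - alice [01/Jun/2024:12:00:00 +0000] '
                 '"GET /index.html HTTP/1.1" 200 512')
```

Structured sources skip the fragile pattern-matching step entirely, which is why JSON logging is recommended later in this guide.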

2. Choose a Log Shipper

A log shipper is responsible for collecting logs from sources and forwarding them to Elasticsearch. The most widely used shippers are:

  • Filebeat: Lightweight, agent-based, ideal for file-based logs (e.g., .log files on disk). Built by Elastic, it integrates seamlessly with Elasticsearch and Logstash.
  • Fluent Bit: Open-source, low-resource, supports multiple inputs and outputs. Excellent for Kubernetes and containerized environments.
  • Logstash: Feature-rich, server-side processor. Can parse, filter, and enrich logs but requires more memory and CPU.
  • Vector: High-performance, Rust-based agent with rich transformation capabilities and low latency.

For most use cases, Filebeat is the recommended starting point due to its simplicity, reliability, and official support from Elastic. It's designed specifically for tailing log files and sending them to Elasticsearch or Logstash.

3. Install and Configure Filebeat

Install Filebeat on each host or container where logs are generated. On Ubuntu/Debian:

wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

echo "deb https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-8.x.list

sudo apt update

sudo apt install filebeat

On CentOS/RHEL:

sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch

sudo tee /etc/yum.repos.d/elastic-8.x.repo <<EOF
[elastic-8.x]
name=Elastic repository for 8.x packages
baseurl=https://artifacts.elastic.co/packages/8.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
EOF

sudo yum install filebeat

After installation, configure Filebeat by editing /etc/filebeat/filebeat.yml. Here's a minimal configuration for collecting Nginx access logs:

filebeat.inputs:
- type: filestream
  enabled: true
  paths:
    - /var/log/nginx/access.log

output.elasticsearch:
  hosts: ["http://your-elasticsearch-host:9200"]
  username: "filebeat_internal"
  password: "your-secure-password"
  index: "nginx-access-%{+yyyy.MM.dd}"

# With a custom index name, disable Filebeat's default ILM setup so the
# index option above is honored, and point the template at the new pattern:
setup.ilm.enabled: false
setup.template.name: "nginx-access"
setup.template.pattern: "nginx-access-*"
setup.template.enabled: true
setup.template.overwrite: false

Key configuration notes:

  • type: filestream: Replaces the deprecated log input in Filebeat 7.14+. It's more efficient and supports multiline events.
  • paths: Use glob patterns (e.g., /var/log/nginx/*.log) to match multiple files.
  • output.elasticsearch: Specify the Elasticsearch host(s). Use HTTPS and authentication in production.
  • index: Use date-based naming (nginx-access-%{+yyyy.MM.dd}) for time-series indexing and easier retention policies.
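The %{+yyyy.MM.dd} token in that index name is a Joda-style date format that maps directly onto a strftime pattern. A small Python sketch (the daily_index helper is hypothetical, for illustration only):

```python
from datetime import datetime, timezone

def daily_index(prefix: str, when: datetime) -> str:
    # Filebeat's %{+yyyy.MM.dd} corresponds to strftime's %Y.%m.%d:
    # zero-padded year.month.day.
    return f"{prefix}-{when.strftime('%Y.%m.%d')}"

name = daily_index("nginx-access", datetime(2024, 6, 1, tzinfo=timezone.utc))
```

One index per day means retention becomes a matter of deleting whole indices rather than individual documents.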

4. Enable and Start Filebeat

Enable the configuration and start the service:

sudo filebeat modules enable system nginx

sudo filebeat setup

sudo systemctl enable filebeat

sudo systemctl start filebeat

The filebeat setup command does several things:

  • Loads the default index template into Elasticsearch
  • Creates Kibana dashboards (if Kibana is configured)
  • Configures index lifecycle management (ILM) policies

If you're using a custom template or don't want to load default dashboards, skip filebeat setup and upload templates manually using the Elasticsearch API.
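A manual template upload is just an HTTP PUT against the cluster. A minimal Python sketch using only the standard library; the host, template name, and body below are placeholders, not your real cluster:

```python
import json
import urllib.request

def put_index_template(host: str, name: str, template: dict) -> urllib.request.Request:
    """Build a PUT _index_template request; the caller sends it with urlopen()."""
    return urllib.request.Request(
        url=f"{host}/_index_template/{name}",
        data=json.dumps(template).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

req = put_index_template(
    "http://localhost:9200",    # assumed local cluster
    "nginx-access",
    {"index_patterns": ["nginx-access-*"],
     "template": {"settings": {"number_of_shards": 1}}},
)
# urllib.request.urlopen(req) would apply it; add auth headers in production.
```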

5. Configure Elasticsearch for Log Indexing

Elasticsearch must be configured to handle high-volume log ingestion efficiently. Key settings include:

Cluster Settings

Adjust these in elasticsearch.yml:

cluster.name: logging-cluster
node.name: node-01
network.host: 0.0.0.0
http.port: 9200
discovery.seed_hosts: ["192.168.1.10", "192.168.1.11"]
cluster.initial_master_nodes: ["node-01"]

For production, use a multi-node cluster with dedicated master, data, and coordinating nodes.

Index Settings

Create a custom index template to optimize for logs. Use the Elasticsearch Index Template API:

PUT _index_template/log_template
{
  "index_patterns": ["app-logs-*", "nginx-*", "system-*"],
  "template": {
    "settings": {
      "number_of_shards": 3,
      "number_of_replicas": 1,
      "refresh_interval": "30s",
      "index.lifecycle.name": "log_policy",
      "index.lifecycle.rollover_alias": "app-logs"
    },
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" },
        "message": {
          "type": "text",
          "fields": {
            "keyword": { "type": "keyword", "ignore_above": 256 }
          }
        },
        "host.name": { "type": "keyword" },
        "log.level": { "type": "keyword" },
        "service.name": { "type": "keyword" }
      }
    }
  },
  "priority": 500,
  "version": 1
}

Important settings:

  • number_of_shards: Start with 3 to 5 shards per index. Too many shards increase overhead; too few limit scalability.
  • refresh_interval: Increase from default 1s to 30s for high-throughput logging to reduce indexing load.
  • index.lifecycle.name: Enables Index Lifecycle Management (ILM) for automated rollover and deletion.

6. Set Up Index Lifecycle Management (ILM)

ILM automates the management of time-series log indices. It helps prevent storage bloat and ensures cost-effective retention.

Create an ILM policy:

PUT _ilm/policy/log_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_size": "50GB", "max_age": "7d" }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "forcemerge": { "max_num_segments": 1 },
          "shrink": { "number_of_shards": 1 }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": { "freeze": {} }
      },
      "delete": {
        "min_age": "90d",
        "actions": { "delete": {} }
      }
    }
  }
}

Then create an index with the alias:

PUT app-logs-000001
{
  "aliases": {
    "app-logs": { "is_write_index": true }
  },
  "settings": {
    "index.lifecycle.name": "log_policy",
    "index.lifecycle.rollover_alias": "app-logs"
  }
}

Filebeat will automatically use this alias when writing new logs. When the index reaches 50GB or 7 days old, Elasticsearch rolls over to a new index (app-logs-000002, etc.), and the old one moves to the warm phase.
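The rollover naming convention is purely numeric, which is easy to sketch in a few lines of Python (next_rollover_index is an illustrative helper, not part of any Elasticsearch client):

```python
import re

def next_rollover_index(current: str) -> str:
    """Compute the index name ILM rolls over to (app-logs-000001 -> -000002)."""
    prefix, num = re.fullmatch(r"(.+)-(\d{6})", current).groups()
    return f"{prefix}-{int(num) + 1:06d}"
```

Because writes go through the alias, producers like Filebeat never need to know which numbered index is current.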

7. Ingest and Parse Logs (Optional: Use Logstash)

If your logs require complex parsing, enrichment, or transformation, use Logstash. For example, parsing unstructured Apache logs into structured fields:

input {
  beats {
    port => 5044
  }
}

filter {
  if [fileset][module] == "apache" {
    if [fileset][name] == "access" {
      grok {
        match => { "message" => "%{IPORHOST:client.ip} - %{DATA:client.user} \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:method} %{NOTSPACE:request}(?: HTTP/%{NUMBER:http.version})?|%{DATA:raw_request})\" %{NUMBER:response.code} (?:%{NUMBER:response.bytes}|-)" }
      }
      date {
        match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
        target => "@timestamp"
      }
      geoip {
        source => "client.ip"
      }
    }
  }
}

output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]
    index => "%{[fileset][module]}-%{+yyyy.MM.dd}"
    user => "logstash_writer"
    password => "secure-password"
  }
}

Logstash is resource-intensive, so use it only when necessary. For JSON logs, Filebeat's built-in JSON parsing is often sufficient:

filebeat.inputs:
- type: filestream
  paths:
    - /var/log/myapp/*.json
  # The filestream input replaces the log input's json.* options with the
  # ndjson parser; target: "" places decoded keys at the event root.
  parsers:
    - ndjson:
        target: ""
        add_error_key: true
        message_key: log

This automatically flattens JSON fields into Elasticsearch document properties.
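The flattening behavior can be illustrated with a short Python sketch (decode_event is a toy stand-in for Filebeat's decoder, not its actual implementation):

```python
import json

def decode_event(raw: str, keys_under_root: bool) -> dict:
    """Mimic Filebeat's JSON decoding: merge parsed keys into the event root,
    or nest them under a 'json' key (illustrative only)."""
    event = {"agent": "filebeat"}   # metadata the shipper adds alongside the log
    parsed = json.loads(raw)
    if keys_under_root:
        event.update(parsed)        # fields become top-level document properties
    else:
        event["json"] = parsed      # fields stay nested under "json"
    return event

flat = decode_event('{"log.level": "error", "msg": "boom"}', keys_under_root=True)
```

Top-level fields are what make queries like log.level:error work directly in Kibana without any extra parsing stage.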

8. Verify Indexing

After configuration, verify logs are being indexed:

GET _cat/indices?v

You should see indices like nginx-access-2024.06.01 or app-logs-000001 with a green health status.

Search for recent logs:

GET app-logs-*/_search
{
  "query": { "match_all": {} },
  "size": 5
}

Check the number of documents indexed:

GET app-logs-*/_count

If no documents appear, check Filebeat logs at /var/log/filebeat/filebeat and Elasticsearch logs at /var/log/elasticsearch/ for errors.

Best Practices

1. Use Structured Logging (JSON)

Always prefer structured logging over plain text. Applications should emit logs in JSON format with consistent keys:

{
  "@timestamp": "2024-06-01T12:34:56Z",
  "log.level": "error",
  "service.name": "payment-service",
  "message": "Failed to process payment: insufficient funds",
  "user.id": "usr_789",
  "transaction.id": "txn_123",
  "duration.ms": 234
}

Structured logs enable precise querying, filtering, and aggregation. They eliminate the need for complex grok patterns and reduce parsing errors.
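If your framework doesn't emit JSON natively, a formatter is usually only a few lines. A minimal Python sketch using the standard logging module (the hard-coded service name is an assumption; real setups would pull it from configuration):

```python
import json
import logging
from datetime import datetime, timezone

class JsonFormatter(logging.Formatter):
    """Emit each log record as one JSON object per line (a minimal sketch)."""
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "@timestamp": datetime.now(timezone.utc).isoformat(),
            "log.level": record.levelname.lower(),
            "service.name": "payment-service",   # assumed constant
            "message": record.getMessage(),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("payments")
logger.addHandler(handler)
logger.error("Failed to process payment: insufficient funds")
```

Each line the application writes is then directly ingestible by Filebeat's JSON parsing with no grok patterns at all.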

2. Avoid High Cardinality Fields

Cardinality refers to the number of unique values in a field. High-cardinality fields (e.g., user IDs, session IDs, request URLs) can severely impact Elasticsearch performance and memory usage.

Best practices:

  • Use keyword type only for fields you need to aggregate on.
  • For long or variable text (e.g., URLs), use text for full-text search and keyword for exact matches.
  • Avoid indexing entire stack traces unless necessary. Instead, extract error codes or message summaries.

3. Optimize Index Settings for Write Throughput

For high-volume log ingestion:

  • Set refresh_interval to 30s or 60s.
  • Disable _source only if you don't need to retrieve original documents (not recommended for logs).
  • Use index.codec: best_compression to reduce disk usage.
  • Use SSD storage for data nodes.

4. Implement Index Lifecycle Management (ILM)

Never manually delete indices. Use ILM to automate rollover and deletion based on size or age. This prevents storage exhaustion and ensures compliance with data retention policies.

5. Secure Your Stack

Enable TLS/SSL between Filebeat and Elasticsearch:

output.elasticsearch:
  hosts: ["https://elasticsearch:9200"]
  ssl.certificate_authorities: ["/etc/pki/tls/certs/ca.crt"]
  username: "filebeat"
  password: "secret"

Use role-based access control (RBAC) in Elasticsearch. Create dedicated users for each shipper with minimal privileges:

PUT /_security/role/filebeat_writer
{
  "cluster": ["monitor"],
  "indices": [
    {
      "names": ["app-logs-*", "nginx-*"],
      "privileges": ["write", "create_index"]
    }
  ]
}

6. Monitor Shipper and Cluster Health

Use Filebeat's built-in monitoring or Prometheus + Grafana to track:

  • Events sent vs. events received
  • Backlog size
  • Connection errors
  • Elasticsearch indexing rate and latency

Set up alerts for:

  • Filebeat stopped
  • Elasticsearch cluster red status
  • Indexing errors exceeding threshold

7. Avoid Over-Indexing

Not every log line needs to be indexed. Filter out noisy or irrelevant logs (e.g., health checks, debug messages) using Filebeat or Logstash filters:

if "GET /health" in [message] {
  drop {}
}

This reduces storage costs and improves query performance.
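The same idea works in any shipper, or in application code before logs are even written: decide per line whether it is worth shipping. A Python sketch (the noise markers below are assumptions; tune them to your own traffic):

```python
def should_index(message: str) -> bool:
    """Drop noisy lines before they reach Elasticsearch (a plain-predicate
    version of the drop filter above)."""
    noisy = ("GET /health", "GET /ready", "DEBUG")   # assumed noise markers
    return not any(marker in message for marker in noisy)

kept = [m for m in [
    '10.0.0.5 "GET /health HTTP/1.1" 200',
    '10.0.0.9 "POST /api/orders HTTP/1.1" 500',
] if should_index(m)]
```

Dropping at the edge is cheapest: a filtered line costs no network transfer, no indexing CPU, and no disk.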

Tools and Resources

Core Tools

  • Elasticsearch: The search and analytics engine that stores and indexes logs.
  • Filebeat: Lightweight log shipper for file-based logs.
  • Logstash: Server-side pipeline for parsing, filtering, and enriching logs.
  • Kibana: Visualization and dashboarding tool for exploring indexed logs.
  • Fluent Bit: Alternative lightweight shipper, ideal for Kubernetes.
  • Vector: High-performance, single-binary agent with rich transformations.

Monitoring and Observability

  • Elastic APM: Instrument applications to correlate logs with performance metrics.
  • Prometheus + Grafana: Monitor Filebeat and Elasticsearch metrics via exporters.
  • ELK Stack Monitoring: Built-in monitoring dashboard in Kibana under Stack Monitoring.

Real Examples

Example 1: Indexing Kubernetes Pod Logs

In a Kubernetes cluster, logs from pods are typically stored at /var/log/containers/ on the node. Filebeat can be deployed as a DaemonSet to collect them:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: filebeat
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: filebeat
  template:
    metadata:
      labels:
        app: filebeat
    spec:
      containers:
      - name: filebeat
        image: docker.elastic.co/beats/filebeat:8.12.0
        args: ["-c", "/etc/filebeat.yml", "-e"]
        env:
        - name: NODE_NAME   # referenced as ${NODE_NAME} in filebeat.yml below
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        volumeMounts:
        - name: config-volume
          mountPath: /etc/filebeat.yml
          subPath: filebeat.yml
        - name: varlog
          mountPath: /var/log/containers
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
      volumes:
      - name: config-volume
        configMap:
          name: filebeat-config
      - name: varlog
        hostPath:
          path: /var/log/containers
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers

Filebeat configuration:

filebeat.inputs:
- type: container
  paths:
    - /var/log/containers/*.log
  processors:
    - add_kubernetes_metadata:
        in_cluster: true
        host: ${NODE_NAME}
  json.keys_under_root: true
  json.add_error_key: true

output.elasticsearch:
  hosts: ["https://elasticsearch:9200"]
  ssl.certificate_authorities: ["/etc/pki/tls/certs/ca.crt"]
  username: "filebeat"
  password: "${ELASTIC_PASSWORD}"
  index: "k8s-logs-%{+yyyy.MM.dd}"

This setup automatically enriches logs with Kubernetes metadata: pod name, namespace, labels, and container ID.

Example 2: Indexing AWS CloudWatch Logs

Use the AWS CLI or Lambda to forward CloudWatch logs to Elasticsearch:

import base64
import gzip
import json

import requests

ES_ENDPOINT = "https://your-es-domain.us-east-1.es.amazonaws.com"
ES_AUTH = ("es-user", "secret")

def lambda_handler(event, context):
    # CloudWatch Logs subscription filters deliver a gzipped, base64-encoded
    # payload under event['awslogs']['data'].
    payload = json.loads(
        gzip.decompress(base64.b64decode(event["awslogs"]["data"]))
    )
    index_name = "cloudwatch-logs-2024.06.01"
    for log_event in payload["logEvents"]:
        doc = {
            "@timestamp": log_event["timestamp"],   # epoch milliseconds
            "message": log_event["message"],
            "logGroup": payload["logGroup"],
            "logStream": payload["logStream"],
        }
        response = requests.post(
            f"{ES_ENDPOINT}/{index_name}/_doc",
            auth=ES_AUTH,
            json=doc,
            headers={"Content-Type": "application/json"},
        )
        if response.status_code != 201:
            print(f"Failed to index: {response.text}")

Trigger this Lambda via CloudWatch Logs subscription filter. This method is useful for centralized ingestion from multiple AWS accounts.

Example 3: Centralized Application Logs with Docker Compose

Deploy a full stack using Docker Compose:

version: '3.8'

services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.12.0
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
    ports:
      - "9200:9200"
    volumes:
      - esdata:/usr/share/elasticsearch/data

  kibana:
    image: docker.elastic.co/kibana/kibana:8.12.0
    ports:
      - "5601:5601"
    depends_on:
      - elasticsearch

  filebeat:
    image: docker.elastic.co/beats/filebeat:8.12.0
    volumes:
      - ./filebeat.yml:/usr/share/filebeat/filebeat.yml
      - ./logs:/var/log/app
    depends_on:
      - elasticsearch

volumes:
  esdata:

Run docker-compose up and place sample logs in the ./logs directory. Filebeat will pick them up and index them into Elasticsearch.

FAQs

What is the difference between indexing and searching logs in Elasticsearch?

Indexing is the process of ingesting raw log data and converting it into structured documents stored in Elasticsearch indices. Searching is the act of querying those indexed documents using the Query DSL to retrieve specific logs based on filters, ranges, or keywords. Indexing must happen before searching.

Can I index logs without using Filebeat?

Yes. You can use Logstash, Fluent Bit, Vector, or even custom scripts (e.g., Python with Elasticsearch client) to send logs. However, Filebeat is the most reliable, lightweight, and officially supported option for file-based logs.

How much disk space do logs consume in Elasticsearch?

It depends on volume and compression. On average, structured JSON logs consume 1 to 5 GB per million events. Using the best_compression codec and ILM can reduce this by 30 to 50%. Monitor usage with _cat/indices?v and set retention policies accordingly.
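A back-of-the-envelope estimate helps with capacity planning before you commit to a retention policy. A Python sketch (the 0.7 compression factor is an assumption; measure your own data):

```python
def storage_gb(events_per_day: int, avg_event_bytes: int,
               retention_days: int, replicas: int = 1,
               compression_factor: float = 0.7) -> float:
    """Rough disk estimate: primaries plus replicas, scaled by an
    assumed compression factor."""
    raw = events_per_day * avg_event_bytes * retention_days
    total = raw * (1 + replicas) * compression_factor
    return total / 1024 ** 3

# 10M events/day at ~500 bytes each, kept 30 days with one replica:
estimate = storage_gb(10_000_000, 500, 30)
```

Replicas double the footprint, which is why retention and replica counts are usually tuned together.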

Why are my logs not appearing in Kibana?

Common causes:

  • Filebeat is not running or has connection errors.
  • Elasticsearch index pattern in Kibana doesn't match the actual index name (e.g., nginx-* vs nginx-access-*).
  • Missing or incorrect @timestamp field.
  • Index template not loaded or overridden.

Check Filebeat logs, Elasticsearch logs, and verify the index pattern in Kibana under Stack Management > Index Patterns.

Should I use one index or many indices for logs?

Use many time-series indices (e.g., daily or weekly) with ILM. A single large index is harder to manage, slower to query, and harder to delete. Time-based indices improve performance, simplify backups, and enable granular retention.

How do I handle multiline logs (e.g., Java stack traces)?

In Filebeat, configure multiline handling on the input; with the filestream input this is done via the multiline parser:

filebeat.inputs:
- type: filestream
  paths:
    - /var/log/myapp/*.log
  parsers:
    - multiline:
        type: pattern
        pattern: '^[[:space:]]+(at|\.{3})\b|^Caused by:'
        negate: false
        match: after

This appends lines starting with whitespace followed by "at" (or "..."), or starting with "Caused by:", to the preceding event, preserving stack trace context.
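The folding logic itself is simple. A Python sketch of the "match: after" behavior (note that Python's re uses \s where Filebeat's pattern uses the POSIX [[:space:]] class):

```python
import re

# Python-re version of the Filebeat pattern above ([[:space:]] -> \s).
CONTINUATION = re.compile(r'^\s+(at|\.{3})\b|^Caused by:')

def join_multiline(lines):
    """Fold continuation lines into the preceding event (match: after)."""
    events = []
    for line in lines:
        if events and CONTINUATION.match(line):
            events[-1] += "\n" + line   # append to the previous event
        else:
            events.append(line)         # start a new event
    return events

events = join_multiline([
    "ERROR Unhandled exception",
    "    at com.example.Foo.bar(Foo.java:42)",
    "Caused by: java.io.IOException",
    "INFO next request",
])
```

Without this folding, each stack-trace line would be indexed as a separate, context-free document.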

Can I index logs from SaaS applications?

Yes. Many SaaS platforms (e.g., Datadog, Sentry, Heroku) offer webhook or syslog integrations. You can forward their logs to a central Filebeat or Logstash instance, then into Elasticsearch.

Is Elasticsearch the only option for log indexing?

No. Alternatives include OpenSearch, Loki (with Promtail), Splunk, and Graylog. However, Elasticsearch remains the most popular due to its rich ecosystem, performance, and integration with Kibana.

Conclusion

Indexing logs into Elasticsearch is a critical capability for modern infrastructure observability. By following the steps outlined in this guide, from selecting the right log shipper and configuring index templates to implementing ILM and securing your stack, you can build a robust, scalable, and maintainable log management system. The key to success lies in structuring your logs consistently, automating lifecycle management, and monitoring every component of the pipeline.

As your environment grows, so too should your logging strategy. Start simple with Filebeat and JSON logs, then progressively add complexity with Logstash, Kubernetes integration, and advanced Kibana visualizations. Always prioritize performance, security, and cost-efficiency. With Elasticsearch as your central log repository, you gain the power to not only react to incidents but to predict and prevent them through data-driven insights.

Remember: logs are not just for debugging; they are your system's memory. Index them well, and you'll never lose sight of what's happening inside your applications, no matter how complex they become.