How to Backup Elasticsearch Data
Elasticsearch is a powerful, distributed search and analytics engine widely used for real-time data indexing, log analysis, application monitoring, and full-text search. As organizations increasingly rely on Elasticsearch to store mission-critical data, ranging from user activity logs to product catalogs, the risk of data loss becomes a serious concern. Whether due to hardware failure, human error, software bugs, or cyberattacks, losing Elasticsearch data can result in costly downtime, compliance violations, and operational disruption.
Backing up Elasticsearch data is not optional; it's a fundamental requirement for any production environment. A well-planned backup strategy ensures data durability, enables rapid recovery, and supports compliance with data governance policies. This guide provides a comprehensive, step-by-step approach to backing up Elasticsearch data, covering best practices, recommended tools, real-world examples, and answers to frequently asked questions. By the end of this tutorial, you'll have the knowledge and confidence to implement a robust, scalable backup solution tailored to your infrastructure.
Step-by-Step Guide
Understand Elasticsearch Snapshot Architecture
Before initiating any backup process, it's essential to understand how Elasticsearch handles backups natively. Elasticsearch does not support traditional file-level backups. Instead, it uses a feature called Snapshot and Restore, which creates point-in-time backups of indices and cluster metadata. These snapshots are stored in a shared repository, which can be located on a file system, S3-compatible object storage, HDFS, or Azure Blob Storage.
Each snapshot contains:
- Index data (shards and segments)
- Cluster state and metadata (settings, mappings, aliases)
- Reference to the actual data files, not copies (incremental and efficient)
Because snapshots are incremental, only new or changed data since the last snapshot is stored. This makes subsequent backups fast and storage-efficient. However, the repository must be accessible by all nodes in the cluster; this is a critical architectural consideration.
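To make the incremental behavior concrete, here is a toy Python model of segment-level deduplication: each snapshot references a set of segment files, and a segment already present in the repository is never uploaded again. This is only an illustration of the idea, not Elasticsearch's actual storage format; the segment names and sizes are made up.

```python
def bytes_uploaded(snapshots, segment_sizes):
    """Model incremental snapshots: for each snapshot (a set of segment
    names), only segments not already in the repository are uploaded.
    Returns the number of bytes uploaded per snapshot."""
    stored = set()          # segments already present in the repository
    uploaded = []
    for segments in snapshots:
        new = segments - stored
        uploaded.append(sum(segment_sizes[s] for s in new))
        stored |= segments
    return uploaded

# Three daily snapshots of the same index: most segments are unchanged,
# so later snapshots upload only the new segments.
sizes = {"seg_a": 100, "seg_b": 200, "seg_c": 50, "seg_d": 75}
daily = [
    {"seg_a", "seg_b"},            # day 1: full upload
    {"seg_a", "seg_b", "seg_c"},   # day 2: only seg_c is new
    {"seg_b", "seg_c", "seg_d"},   # day 3: only seg_d is new
]
print(bytes_uploaded(daily, sizes))  # [300, 50, 75]
```

The same principle is why the first snapshot of a large cluster is slow while subsequent ones are cheap.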
Step 1: Choose a Repository Type
The first step in creating a backup is selecting a suitable repository type. Elasticsearch supports several repository plugins, each suited for different environments:
- File System Repository: Best for single-node or small clusters with shared storage (e.g., NFS). Simple to configure but not recommended for production clusters with multiple nodes unless the storage is highly available.
- S3 Repository: Ideal for cloud-native deployments. Uses the repository-s3 plugin and integrates seamlessly with AWS S3. Highly scalable and durable.
- Azure Blob Storage Repository: For Azure-hosted environments. Uses the repository-azure plugin.
- HDFS Repository: For organizations using Hadoop ecosystems. Uses the repository-hdfs plugin.
For most modern deployments, S3 is the recommended choice due to its durability, scalability, and cost-effectiveness.
Step 2: Install the Required Repository Plugin
If you're using S3 (the most common scenario), you must install the S3 repository plugin on every Elasticsearch node. This plugin is not included by default.
On Linux systems, run the following command on each node:
bin/elasticsearch-plugin install repository-s3
After installation, restart each Elasticsearch node to load the plugin:
sudo systemctl restart elasticsearch
Verify the plugin is installed by checking the plugins directory or using the Elasticsearch API:
GET _cat/plugins?v
You should see repository-s3 listed in the output.
Step 3: Configure AWS Credentials
To allow Elasticsearch to write to S3, you must provide AWS credentials. There are several ways to do this:
- Explicit credentials in repository settings (less secure)
- EC2 Instance Profile (recommended for AWS-hosted clusters)
- Environment variables
- AWS credentials file (~/.aws/credentials)
For production environments, the EC2 Instance Profile method is strongly preferred. Assign an IAM role to your EC2 instances with the following permissions:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:ListBucket"
],
"Resource": [
"arn:aws:s3:::your-backup-bucket"
]
},
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject",
"s3:DeleteObject"
],
"Resource": [
"arn:aws:s3:::your-backup-bucket/*"
]
}
]
}
If you must use explicit credentials, store them in the Elasticsearch keystore on each node. In recent Elasticsearch versions these are secure settings and cannot be placed in elasticsearch.yml:
bin/elasticsearch-keystore add s3.client.default.access_key
bin/elasticsearch-keystore add s3.client.default.secret_key
A custom endpoint (for example, for S3-compatible storage) can still be set in elasticsearch.yml:
s3.client.default.endpoint: s3.amazonaws.com
Warning: Never commit credentials to version control. Use secrets management tools like HashiCorp Vault or AWS Secrets Manager instead.
Step 4: Register a Snapshot Repository
Once the plugin is installed and credentials are configured, register your S3 bucket as a snapshot repository using the Elasticsearch REST API.
Use the following PUT request to create a repository named backup-s3-repo:
PUT _snapshot/backup-s3-repo
{
"type": "s3",
"settings": {
"bucket": "your-backup-bucket",
"region": "us-east-1",
"base_path": "elasticsearch/snapshots",
"compress": true,
"chunk_size": "500mb"
}
}
Key settings explained:
- bucket: The name of your S3 bucket.
- region: The AWS region where the bucket resides.
- base_path: Optional subdirectory within the bucket to organize snapshots.
- compress: Enables compression of metadata (recommended).
- chunk_size: Maximum size of the data chunks uploaded to S3 (the default varies by Elasticsearch version; reduce it for slower or less reliable networks).
After sending the request, Elasticsearch will validate access to the bucket and register the repository. You can verify registration with:
GET _snapshot
You should see your repository listed:
{
"backup-s3-repo": {
"type": "s3",
"settings": {
"bucket": "your-backup-bucket",
"region": "us-east-1",
...
}
}
}
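If you register repositories from provisioning scripts rather than Dev Tools, the request is easy to assemble in code. The sketch below only builds the URL and JSON body; the endpoint http://localhost:9200 and the helper name are assumptions for illustration, and the body mirrors the settings shown above.

```python
import json

def build_repo_registration(repo_name, bucket, region,
                            base_path="elasticsearch/snapshots",
                            chunk_size="500mb",
                            es_url="http://localhost:9200"):
    """Return the (url, body) pair for a PUT _snapshot/<repo> request."""
    url = f"{es_url}/_snapshot/{repo_name}"
    body = {
        "type": "s3",
        "settings": {
            "bucket": bucket,
            "region": region,
            "base_path": base_path,
            "compress": True,
            "chunk_size": chunk_size,
        },
    }
    return url, json.dumps(body)

url, body = build_repo_registration("backup-s3-repo", "your-backup-bucket", "us-east-1")
# Send it with any HTTP client, e.g.:
#   curl -X PUT "$url" -H 'Content-Type: application/json' -d "$body"
```

Keeping repository registration in a script makes it repeatable across dev, staging, and production clusters.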
Step 5: Create Your First Snapshot
Now that the repository is registered, you can create a snapshot. Snapshots can include all indices, specific indices, or exclude certain indices.
To back up all indices:
PUT _snapshot/backup-s3-repo/snapshot-2024-06-15
{
"indices": "*",
"ignore_unavailable": true,
"include_global_state": true
}
Key parameters:
- indices: Use * for all indices, or specify comma-separated names like logs-2024-06-15,users.
- ignore_unavailable: Prevents the snapshot from failing if some indices are offline or missing.
- include_global_state: Includes cluster settings and persistent settings (recommended for full recovery).
By default, snapshots are created asynchronously; append ?wait_for_completion=true to the PUT request if you want it to block until the snapshot finishes. To monitor progress, use:
GET _snapshot/backup-s3-repo/snapshot-2024-06-15
Response will show status as IN_PROGRESS initially, then SUCCESS or FAILED.
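In a script, you would typically generate the date-stamped snapshot name at run time. A minimal Python sketch, assuming an unauthenticated cluster at http://localhost:9200 (adjust the URL and add auth headers for your security setup):

```python
import datetime
import json
import urllib.request

def snapshot_request(repo, es_url="http://localhost:9200"):
    """Build a PUT request that creates a date-stamped snapshot of all indices."""
    name = "snapshot-" + datetime.date.today().isoformat()  # e.g. snapshot-2024-06-15
    body = {
        "indices": "*",
        "ignore_unavailable": True,
        "include_global_state": True,
    }
    return urllib.request.Request(
        f"{es_url}/_snapshot/{repo}/{name}",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

# Against a live cluster you would send it with:
#   with urllib.request.urlopen(snapshot_request("backup-s3-repo")) as resp:
#       print(resp.status)
```

The same function can be invoked from a cron job or scheduler to produce a uniquely named snapshot each day.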
Step 6: Automate Snapshots with Snapshot Lifecycle Management (SLM)
Manually creating snapshots is not scalable. For production environments, automate backups using Elasticsearch's Snapshot Lifecycle Management (SLM). Note that Index Lifecycle Management (ILM) manages the lifecycle of the indices themselves (rollover, tiering, deletion); it does not create snapshots. Snapshot automation is SLM's job, and the two features are complementary.
First, define an SLM policy:
PUT _slm/policy/daily-backup
{
  "schedule": "0 30 2 * * ?",
  "name": "<logs-snapshot-{now/d}>",
  "repository": "backup-s3-repo",
  "config": {
    "indices": "logs-*",
    "ignore_unavailable": true,
    "include_global_state": false
  },
  "retention": {
    "expire_after": "30d",
    "min_count": 7,
    "max_count": 60
  }
}
Key settings explained:
- schedule: A cron expression; this example runs daily at 02:30 UTC.
- name: A date-math pattern that generates a unique snapshot name per run.
- config: Accepts the same options as the snapshot API (indices, ignore_unavailable, include_global_state).
- retention: Deletes snapshots older than expire_after while always keeping at least min_count and at most max_count.
Trigger the policy immediately to verify it works:
POST _slm/policy/daily-backup/_execute
If you also want to manage the indices themselves, define an ILM policy (for example, roll over after 30 days or 50gb and delete after 365 days) and associate it with an index template:
PUT _ilm/policy/logs-lifecycle
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_age": "30d",
            "max_size": "50gb"
          }
        }
      },
      "delete": {
        "min_age": "365d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
PUT _index_template/logs-template
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "number_of_shards": 3,
      "number_of_replicas": 1,
      "index.lifecycle.name": "logs-lifecycle",
      "index.lifecycle.rollover_alias": "logs"
    }
  }
}
Together, SLM creates a daily snapshot of every index matching logs-*, and ILM deletes each index after 365 days. This ensures consistent, scheduled backups without manual intervention.
Step 7: Test Your Backup by Restoring
Creating a backup is only half the battle. You must validate that you can restore from it. Never assume your backup works until you've tested it.
To restore a snapshot to a new index:
POST _snapshot/backup-s3-repo/snapshot-2024-06-15/_restore
{
"indices": "logs-2024-06-15",
"rename_pattern": "logs-(.+)",
"rename_replacement": "restored-logs-$1",
"include_global_state": false
}
Key parameters:
- rename_pattern and rename_replacement: Allow you to restore with a different index name, avoiding conflicts.
- include_global_state: Set to false unless you want to overwrite cluster-wide settings.
Monitor restore progress (a restore shows up as a shard recovery):
GET _cat/recovery?v
Once complete, verify data integrity by searching:
GET restored-logs-2024-06-15/_search
Compare document counts and sample data with the original index. If they match, your backup is valid.
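That comparison is easy to script once you have document counts for each side (for example, from GET <index>/_count). The helper below is a hypothetical sketch; it assumes restored indices were renamed with a restored- prefix, as in the example above.

```python
def verify_restore(original_counts, restored_counts, rename_prefix="restored-"):
    """Compare document counts of original indices against their restored
    counterparts. Returns a list of (index, original, restored) mismatches;
    an empty list means every restored index matches its source."""
    mismatches = []
    for index, count in original_counts.items():
        restored = restored_counts.get(rename_prefix + index)
        if restored != count:
            mismatches.append((index, count, restored))
    return mismatches

orig = {"logs-2024-06-15": 1_000_000}
rest = {"restored-logs-2024-06-15": 1_000_000}
assert verify_restore(orig, rest) == []   # counts match: restore looks valid
```

Count equality is a necessary check, not a sufficient one; spot-check sample documents as well.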
Best Practices
1. Schedule Regular Snapshots
Establish a consistent backup schedule based on your data volatility and recovery point objective (RPO). For high-traffic systems, daily snapshots are recommended. For less dynamic data, weekly snapshots may suffice. Use cron jobs or orchestration tools (like Apache Airflow or Kubernetes CronJobs) to trigger snapshots via the Elasticsearch API.
2. Retain Multiple Versions
Don't overwrite snapshots. Keep at least 7 daily snapshots, 4 weekly, and 12 monthly. This provides multiple recovery points and protects against latent corruption or accidental deletion. Use lifecycle policies to automatically delete older snapshots after a set period.
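SLM's built-in retention works on age and count, so a tiered daily/weekly/monthly scheme like this usually needs a small external script. The function below is a hypothetical sketch of grandfather-father-son pruning over snapshot dates:

```python
import datetime

def snapshots_to_keep(dates, daily=7, weekly=4, monthly=12):
    """Given snapshot dates, return the set to retain: the newest `daily`
    days, the newest snapshot of each of the newest `weekly` ISO weeks,
    and the newest snapshot of each of the newest `monthly` months."""
    dates = sorted(set(dates), reverse=True)   # newest first
    keep = set(dates[:daily])                  # last N daily snapshots
    by_week, by_month = {}, {}
    for d in dates:                            # newest snapshot wins per bucket
        by_week.setdefault(d.isocalendar()[:2], d)
        by_month.setdefault((d.year, d.month), d)
    keep |= set(list(by_week.values())[:weekly])
    keep |= set(list(by_month.values())[:monthly])
    return keep

# Example: 30 daily snapshots ending 2024-06-30. The result retains the
# last 7 days plus the newest snapshot of each recent week and month.
dates = [datetime.date(2024, 6, 1) + datetime.timedelta(days=i) for i in range(30)]
keep = snapshots_to_keep(dates)
```

Any snapshot not in the returned set can be deleted with DELETE _snapshot/<repo>/<name>.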
3. Use Dedicated Storage
Never store snapshots on the same storage as your Elasticsearch data. Use a separate, geographically redundant object store (e.g., S3 with cross-region replication). This ensures availability even if your cluster is destroyed.
4. Encrypt Snapshots
Enable server-side encryption on your S3 bucket (SSE-S3 or SSE-KMS). Elasticsearch does not encrypt snapshot data at rest by default. Encryption protects sensitive data from unauthorized access if the bucket is compromised.
5. Monitor Snapshot Health
Set up alerts for failed snapshots using Elasticsearchs monitoring features or external tools like Prometheus + Grafana. Monitor metrics such as:
- snapshot_stats.snapshot_count
- snapshot_stats.failed_snapshot_count
- snapshot_stats.bytes_per_second
Failure to detect a failed snapshot can lead to a false sense of security.
6. Exclude Unnecessary Indices
Not all indices need to be backed up. Exclude temporary, internal, or cache indices (e.g., .kibana_*, .monitoring*, .logstash*) unless they contain critical configuration. This reduces snapshot size and speeds up the process.
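A small helper can apply such exclusions before filling in the indices parameter of a snapshot request. The patterns here are examples only; adjust them to the system indices actually present in your cluster:

```python
import fnmatch

EXCLUDE_PATTERNS = [".kibana_*", ".monitoring*", ".logstash*"]

def indices_to_snapshot(all_indices, exclude=EXCLUDE_PATTERNS):
    """Filter out internal/temporary indices and return the comma-separated
    value for the snapshot request's "indices" parameter."""
    kept = [
        name for name in all_indices
        if not any(fnmatch.fnmatch(name, pat) for pat in exclude)
    ]
    return ",".join(sorted(kept))

indices = ["logs-2024-06-15", "users", ".kibana_8.13.0", ".monitoring-es-7"]
print(indices_to_snapshot(indices))  # logs-2024-06-15,users
```

The full index list can come from GET _cat/indices?h=index&format=json.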
7. Test Restores Periodically
Perform a full restore test at least quarterly. Simulate a disaster scenario: shut down a node, delete an index, and restore from snapshot. Document the steps and time required. This ensures your team is prepared for real emergencies.
8. Secure Access to Snapshots
Restrict access to your snapshot repository. Use IAM policies, VPC endpoints, or private S3 buckets with bucket policies that only allow access from your Elasticsearch clusters IP range or VPC. Never expose snapshot repositories to the public internet.
9. Document Your Backup Strategy
Create a runbook detailing:
- Repository configuration
- Snapshot schedule
- Retention policy
- Restore procedure
- Contact persons for recovery
Store this documentation in a version-controlled repository (e.g., Git) so it's accessible during outages.
10. Plan for Cross-Cluster Recovery
If you operate multiple clusters (e.g., dev, staging, prod), ensure snapshots can be restored across clusters. Snapshot metadata is version-sensitive: snapshots created on Elasticsearch 8.x cannot be restored on 7.x. Always maintain version compatibility between source and target clusters.
Tools and Resources
Elasticsearch Native Tools
- Snapshot and Restore API: The core mechanism for creating and managing backups. Accessible via REST API or Kibana Dev Tools.
- Kibana Snapshot and Restore UI: Available under Stack Management in Kibana, including on Elastic Cloud. Provides a graphical interface to manage repositories and snapshots without writing API requests.
- Snapshot Lifecycle Management (SLM): Automates snapshot creation and retention on a schedule.
- Index Lifecycle Management (ILM): Automates index rollover, tiering, and deletion based on index age or size; pairs well with SLM.
- Elasticsearch Monitoring: Built-in metrics and alerts for snapshot success/failure rates.
Third-Party Tools
- Elastic Cloud (Elasticsearch Service): Fully managed service that includes automated snapshots, cross-region replication, and one-click restore. Ideal for teams without dedicated DevOps staff.
- Curator: A Python-based tool for managing Elasticsearch indices, including snapshot creation and deletion. Useful for legacy deployments or complex filtering.
- Logstash + S3 Output: Not a true backup tool, but useful for exporting data to S3 for archival. Does not preserve mappings or settings.
- Velero: Kubernetes-native backup tool that can back up Elasticsearch stateful sets and associated PVCs. Best used in conjunction with Elasticsearch snapshots for full-stack recovery.
- Percona Backup for MongoDB (PBM): Not for Elasticsearch, but worth noting for comparison; many tools are now adopting snapshot-based architectures inspired by Elasticsearch's model.
Documentation and Community
- Official Elasticsearch Snapshot and Restore Guide
- Repository Types and Configuration
- Elastic Discuss Forum: community support and troubleshooting
- Elasticsearch GitHub Repository: source code and issue tracking
Monitoring and Alerting Tools
- Prometheus + Elasticsearch Exporter: Collects snapshot metrics for visualization.
- Grafana: Dashboards for snapshot success rate, duration, and size trends.
- ELK Stack (Elasticsearch, Logstash, Kibana): Use Kibana to monitor your own backup health via custom visualizations.
- PagerDuty / Opsgenie: Integrate with Elasticsearch alerts to notify on snapshot failures.
Real Examples
Example 1: E-Commerce Platform with Daily Snapshots
A mid-sized e-commerce company runs Elasticsearch to index product catalogs, user reviews, and search logs. They process 2TB of data daily across 10 indices.
Strategy:
- Repository: S3 bucket in us-west-2 with versioning enabled
- Schedule: Daily snapshot at 2 AM UTC
- Retention: 30 daily, 12 weekly, 6 monthly snapshots
- Automation: Cron job triggers API call via curl
- Monitoring: Prometheus scrapes snapshot metrics; alert triggered if snapshot fails for 2 consecutive days
- Restore Test: Quarterly full restore to a staging cluster
Outcome: After a database corruption incident caused by a faulty data pipeline, the team restored the product catalog from a 24-hour-old snapshot in under 45 minutes. Downtime was limited to 1 hour, and no customer data was lost.
Example 2: Financial Services Log Aggregation
A bank uses Elasticsearch to store compliance logs from 50+ applications. Logs must be retained for 7 years for audit purposes.
Strategy:
- Repository: S3 with lifecycle policy moving data to Glacier Deep Archive after 1 year
- Schedule: Hourly snapshots for last 7 days; daily for last 30 days
- Lifecycle: ILM rolls indices over daily and freezes them after 90 days; an SLM policy snapshots them after 180 days
- Encryption: SSE-KMS with customer-managed key
- Access Control: S3 bucket policy allows access only from VPC endpoint
- Compliance: Snapshots audited monthly; checksums stored in AWS CloudTrail
Outcome: During a regulatory audit, auditors requested logs from 2 years ago. The team restored the required index from a snapshot in 12 minutes, demonstrating full compliance.
Example 3: Startup Using Elastic Cloud
A startup with limited engineering resources uses Elastic Cloud (hosted Elasticsearch) to power its analytics dashboard.
Strategy:
- Repository: Managed by Elastic Cloud (automatically configured)
- Schedule: Automatic daily snapshots with 14-day retention
- Restore: One-click restore via Kibana UI
- Backup Verification: Elastic Cloud performs integrity checks on snapshots
Outcome: After a misconfigured script deleted all user data, the team restored the entire cluster from the most recent snapshot in 15 minutes using the Elastic Cloud console. No custom tooling was required.
FAQs
Can I backup Elasticsearch by copying the data directory?
No. Directly copying the data directory is unsupported and will result in corrupted or incomplete backups. Elasticsearch shards are distributed and actively written to. A file-level copy will capture inconsistent states and may not be restorable. Always use the Snapshot and Restore API.
How long does a snapshot take?
Snapshot time depends on data size, network bandwidth, and storage performance. Small clusters (under 100GB) may complete in minutes. Large clusters (10TB+) may take hours. Incremental snapshots are much faster than full ones. Use the GET _snapshot/_status API to monitor progress in real time.
Do snapshots include security settings and users?
Yes, if include_global_state: true is set. This includes role mappings, API keys, and other security configurations. However, if you're restoring to a cluster with different security settings (e.g., different realm configurations), you may need to manually reconcile permissions.
Can I restore a snapshot to a different Elasticsearch version?
Within a major version, restores are supported to the same or a higher minor version (e.g., 8.1 to 8.5), and a snapshot can generally be restored into a cluster one major version newer (e.g., 7.x into 8.x). Restoring to an older version, or skipping more than one major version, is not supported; in those cases you must reindex the data instead. Always test cross-version compatibility in a non-production environment.
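Before attempting a cross-cluster restore, it can help to encode the rule as a pre-flight check. The function below is a deliberately simplified sketch (same major with an equal-or-newer minor, or the next major version); always consult the official snapshot compatibility matrix for your exact versions:

```python
def can_restore(source, target):
    """Simplified snapshot-restore compatibility check between two
    "major.minor.patch" version strings: allow the same major with an
    equal-or-newer minor, or the next major version. Not authoritative;
    check the official compatibility matrix for edge cases."""
    s_major, s_minor = (int(x) for x in source.split(".")[:2])
    t_major, t_minor = (int(x) for x in target.split(".")[:2])
    if t_major == s_major:
        return t_minor >= s_minor
    return t_major == s_major + 1

assert can_restore("8.1.0", "8.5.2")       # same major, newer minor
assert not can_restore("8.5.0", "7.17.9")  # never restore backward
```

Running this check in your restore runbook catches version mismatches before any data is moved.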
What happens if my snapshot repository becomes unavailable?
Snapshots are stored in the repository, so if the repository (e.g., S3 bucket) is deleted or inaccessible, you lose access to all snapshots. Never delete or modify the repository manually. Use versioning and bucket policies to prevent accidental deletion.
Are snapshots compressed?
Snapshot metadata is compressed when "compress": true is set in the repository settings (the default). Elasticsearch does not recompress the index data files themselves; the underlying Lucene segments are already stored compactly, and because snapshots are incremental, overall storage and transfer costs stay low.
Can I backup only specific fields or documents?
No. Snapshots are index-level and include all documents and mappings. To backup subsets of data, use the reindex API to copy filtered data into a new index, then snapshot that index.
How much does storing snapshots cost?
Costs depend on your storage provider. For example, AWS S3 Standard costs approximately $0.023 per GB/month. With compression and incremental snapshots, storage costs are typically 10-20% of your total Elasticsearch data volume. Glacier storage reduces this further to $0.004 per GB/month.
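The arithmetic is easy to sanity-check in a few lines. The figures below reuse the example prices from this answer; substitute current rates and your own snapshot-to-data ratio:

```python
def monthly_snapshot_cost(data_gb, snapshot_ratio=0.15, price_per_gb=0.023):
    """Estimate monthly snapshot storage cost in USD: incremental,
    compressed snapshots typically occupy a fraction of the live data."""
    return data_gb * snapshot_ratio * price_per_gb

# 2 TB of live data, snapshots occupying ~15% of it:
s3 = monthly_snapshot_cost(2048)                           # S3 Standard
glacier = monthly_snapshot_cost(2048, price_per_gb=0.004)  # Glacier tier
print(f"S3: ${s3:.2f}/mo, Glacier: ${glacier:.2f}/mo")
```

Even at multi-terabyte scale, snapshot storage is usually a rounding error compared to cluster compute costs.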
Should I backup Kibana saved objects separately?
Yes. Kibana dashboards, visualizations, and saved searches are stored in the .kibana_* indices. If you snapshot these indices, they'll be restored with your cluster. Alternatively, export them manually via Kibana's Saved Objects feature for safekeeping outside the cluster.
Can I snapshot a single shard?
No. Snapshots are taken at the index level. You cannot snapshot individual shards. However, you can snapshot specific indices, which may contain a single shard if youve configured them that way.
Conclusion
Backing up Elasticsearch data is not a one-time task; it's an ongoing operational discipline. The native Snapshot and Restore feature provides a powerful, efficient, and scalable mechanism to protect your data, but only if implemented correctly. By following the step-by-step guide above, adopting best practices, leveraging automation tools, and regularly testing restores, you can ensure your Elasticsearch clusters remain resilient against data loss.
Remember: A backup that hasn't been tested is not a backup. Regular validation, clear documentation, and automated monitoring are what separate reactive teams from proactive, reliable ones. Whether you're managing a small development cluster or a global enterprise system, investing time in a robust backup strategy today will save you from catastrophic failure tomorrow.
Start by registering your first repository. Schedule your first snapshot. Test your first restore. Then repeat. Your data, and your organization, will thank you.