How to Restore Elasticsearch Snapshot
Elasticsearch snapshots are a critical component of any robust data management strategy. Whether you're recovering from accidental deletion, migrating data across clusters, or preparing for disaster recovery, the ability to restore an Elasticsearch snapshot ensures business continuity and data integrity. A snapshot is a point-in-time backup of one or more indices, stored in a shared repository such as Amazon S3, HDFS, or a network file system. Restoring a snapshot allows you to recover your data to a previous state, minimizing downtime and data loss. In this comprehensive guide, we'll walk you through the entire process of restoring Elasticsearch snapshots, from preparation and configuration to execution and validation, along with best practices, real-world examples, and essential tools to ensure success.
Step-by-Step Guide
Prerequisites: Preparing Your Environment
Before initiating a restore operation, ensure your environment meets the following prerequisites:
- Elasticsearch cluster running: The target cluster must be operational and accessible.
- Snapshot repository registered: The repository where the snapshot was created must be registered in the target cluster. If the repository is not already registered, register it using the same settings as the source cluster.
- Compatible versions: Elasticsearch snapshots are backward compatible within the same major version. For example, a snapshot created on Elasticsearch 8.5 can be restored on 8.6 or 8.7, but not on 7.x. Always verify version compatibility before proceeding.
- Sufficient disk space: The target cluster must have adequate storage capacity to accommodate the restored indices. Monitor available disk space using the _cat/allocation API.
- Appropriate permissions: Ensure the user executing the restore has the necessary privileges, such as manage_snapshots and create_index on the target indices.
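The version rule above can be sketched as a small helper for pre-flight scripts. This is a simplified check, not an official API: it encodes the same-major rule described above plus the one-major-ahead case used in cross-major migrations (such as the 7.17 to 8.x example later in this guide); always confirm against the official compatibility matrix for your exact versions.

```python
def can_restore(cluster_version, snapshot_version):
    """Rough restore-compatibility check (simplified; see official docs).

    A snapshot restores into the same major version at an equal or newer
    minor, or into the next major version.
    """
    cmaj, cmin = (int(p) for p in cluster_version.split(".")[:2])
    smaj, smin = (int(p) for p in snapshot_version.split(".")[:2])
    if cmaj == smaj:
        return cmin >= smin
    return cmaj == smaj + 1  # e.g. a 7.17 snapshot into an 8.x cluster

print(can_restore("8.6.0", "8.5.0"))   # True
print(can_restore("7.17.9", "8.5.0"))  # False: snapshots are not forward compatible
```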
Step 1: List Available Snapshots
Begin by listing all snapshots stored in your registered repository to identify the exact snapshot you wish to restore. Use the following API request:
GET /_snapshot/my_backup_repository/_all
Replace my_backup_repository with the name of your registered repository. The response will include a JSON array of all snapshots, each containing:
- snapshot: The unique name of the snapshot
- version: The Elasticsearch version used to create the snapshot
- state: The current state (e.g., SUCCESS, FAILED)
- start_time and end_time: Timestamps for when the snapshot was taken
- indices: List of indices included in the snapshot
Example response snippet:
{
  "snapshots": [
    {
      "snapshot": "snapshot_2024_04_01",
      "version": "8.12.0",
      "state": "SUCCESS",
      "start_time": "2024-04-01T02:00:00.000Z",
      "end_time": "2024-04-01T02:15:00.000Z",
      "indices": [
        "logs-2024-03",
        "metrics-2024-03",
        "events-index"
      ]
    }
  ]
}
Take note of the snapshot name and the indices it contains. This information is critical for the next step.
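When scripting a restore, you typically want the newest snapshot whose state is SUCCESS. A minimal sketch, assuming the response shape shown above (the ISO 8601 UTC timestamps sort correctly as plain strings):

```python
def latest_successful(response):
    """Return the name of the newest SUCCESS snapshot, or None."""
    ok = [s for s in response.get("snapshots", []) if s["state"] == "SUCCESS"]
    if not ok:
        return None
    # ISO 8601 timestamps in UTC sort chronologically as strings
    return max(ok, key=lambda s: s["start_time"])["snapshot"]

response = {
    "snapshots": [
        {"snapshot": "snapshot_2024_03_01", "state": "SUCCESS",
         "start_time": "2024-03-01T02:00:00.000Z"},
        {"snapshot": "snapshot_2024_04_01", "state": "SUCCESS",
         "start_time": "2024-04-01T02:00:00.000Z"},
        {"snapshot": "snapshot_2024_04_02", "state": "FAILED",
         "start_time": "2024-04-02T02:00:00.000Z"},
    ]
}
print(latest_successful(response))  # snapshot_2024_04_01
```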
Step 2: Check the Status of the Snapshot
Before restoring, verify that the snapshot is complete and healthy. Use the following command to inspect the status of a specific snapshot:
GET /_snapshot/my_backup_repository/snapshot_2024_04_01
This returns detailed metadata about the snapshot, including the number of files, total size, and any failed shards. A snapshot with a state of FAILED or IN_PROGRESS should not be restored until the issue is resolved.
Step 3: Close or Delete Conflicting Indices (Optional)
If you are restoring a snapshot that contains indices with the same names as existing indices in your target cluster, you must either:
- Close the existing indices:
POST /logs-2024-03/_close
- Delete the existing indices:
DELETE /logs-2024-03
Restoring into an open index with the same name will result in an error. Closing an index preserves its mapping and settings but prevents writes. Deleting removes it entirely. Choose based on your recovery goals.
Use the _cat/indices API to confirm the current state of your indices:
GET /_cat/indices?v
Step 4: Execute the Restore Command
Once prerequisites are met, initiate the restore using the _restore API. The simplest form restores all indices from the snapshot:
POST /_snapshot/my_backup_repository/snapshot_2024_04_01/_restore
This command restores all indices in the snapshot with their original names and settings. However, you can customize the restore process using optional parameters:
Restore Specific Indices
To restore only a subset of indices from the snapshot:
POST /_snapshot/my_backup_repository/snapshot_2024_04_01/_restore
{
  "indices": "logs-2024-03,metrics-2024-03",
  "ignore_unavailable": true,
  "include_global_state": false
}
- indices: Comma-separated list of indices to restore.
- ignore_unavailable: If true, requested indices that are missing from the snapshot are skipped instead of failing the restore (useful when restoring partial data).
- include_global_state: If true, restores cluster-wide settings and templates. Use with caution: this may overwrite existing configurations.
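If you drive restores from a script, it helps to build the request body in one place. A minimal sketch; the helper name is ours, but the keys are exactly the parameters described above:

```python
import json

def build_restore_body(indices, ignore_unavailable=True, include_global_state=False):
    """Build the JSON body for POST /_snapshot/{repo}/{snapshot}/_restore."""
    return {
        "indices": ",".join(indices),
        "ignore_unavailable": ignore_unavailable,
        "include_global_state": include_global_state,
    }

body = build_restore_body(["logs-2024-03", "metrics-2024-03"])
print(json.dumps(body, indent=2))
```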
Rename Indices During Restore
One of the most powerful features of Elasticsearch restore is the ability to rename indices during the process. This is essential when restoring into a production cluster without overwriting live data:
POST /_snapshot/my_backup_repository/snapshot_2024_04_01/_restore
{
  "indices": "logs-2024-03",
  "rename_pattern": "logs-(.+)",
  "rename_replacement": "logs-2024-03-backup-$1"
}
In this example:
- rename_pattern: A regular expression matched against the original index name (logs-2024-03).
- rename_replacement: The replacement for the matched pattern, producing the new name (logs-2024-03-backup-2024-03).
This technique is invaluable for testing restores in staging environments or creating archives without disrupting active indices.
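You can preview what a rename_pattern/rename_replacement pair will produce before running the restore. Note that Elasticsearch uses Java-style $1 group references while Python's re module uses \1, so this sketch translates between the two (handling $1 through $9 only):

```python
import re

def preview_rename(index_name, rename_pattern, rename_replacement):
    """Apply an Elasticsearch rename_pattern/rename_replacement pair locally."""
    # Translate Java-style $1..$9 backreferences to Python's \1..\9
    py_replacement = re.sub(r"\$(\d)", r"\\\1", rename_replacement)
    return re.sub(rename_pattern, py_replacement, index_name)

print(preview_rename("logs-2024-03", "logs-(.+)", "logs-2024-03-backup-$1"))
# logs-2024-03-backup-2024-03
```

Running this against every index name in the snapshot is a cheap way to catch a bad pattern before it creates dozens of misnamed indices.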
Step 5: Monitor the Restore Progress
After initiating the restore, monitor its progress using the following API:
GET /_recovery?pretty
This returns detailed information about all ongoing recovery operations, including:
- Index name
- Shard ID
- Source repository
- Bytes transferred
- Percentage completed
- Elapsed recovery time per shard
For a focused view of a specific index:
GET /logs-2024-03-backup-2024-03/_recovery?pretty
Alternatively, use the snapshot API to re-check the snapshot itself:
GET /_snapshot/my_backup_repository/_all?pretty
Note that the state field in this response describes the snapshot (e.g., SUCCESS), not the restore operation; the _recovery API above is the authoritative view of restore progress.
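If you poll programmatically, the _recovery response can be reduced to one percentage per index. A sketch under the assumption that the response follows the documented shape, with shards of type SNAPSHOT carrying an index.size.percent field:

```python
def restore_progress(recovery):
    """Average per-shard byte percentages for snapshot-sourced recoveries."""
    progress = {}
    for index, info in recovery.items():
        pcts = [
            float(shard["index"]["size"]["percent"].rstrip("%"))
            for shard in info.get("shards", [])
            if shard.get("type") == "SNAPSHOT"  # ignore peer/replica recoveries
        ]
        if pcts:
            progress[index] = sum(pcts) / len(pcts)
    return progress

sample = {
    "logs-2024-03-backup-2024-03": {
        "shards": [
            {"type": "SNAPSHOT", "index": {"size": {"percent": "100.0%"}}},
            {"type": "SNAPSHOT", "index": {"size": {"percent": "50.0%"}}},
        ]
    }
}
print(restore_progress(sample))  # {'logs-2024-03-backup-2024-03': 75.0}
```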
Step 6: Validate the Restored Data
After the restore completes, validate the integrity and completeness of the data:
Check Index Health
GET /_cluster/health/logs-2024-03-backup-2024-03?pretty
Ensure the status is green (all primary and replica shards allocated) or at least yellow (all primary shards allocated).
Count Documents
Compare the document count in the restored index with the original:
GET /logs-2024-03-backup-2024-03/_count
If the count matches the expected value from the source, the restore was successful.
Query Sample Data
Perform a sample search to confirm data integrity:
GET /logs-2024-03-backup-2024-03/_search
{
  "size": 1,
  "query": {
    "match_all": {}
  }
}
Verify that the returned documents contain expected fields and values.
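The three validation checks above can be combined into a single pass/fail gate in a script. A minimal sketch; the function name and structure are ours:

```python
def restore_is_valid(health_status, expected_docs, actual_docs, sample_hit):
    """Gate a restore on index health, document count, and one sample document."""
    if health_status not in ("green", "yellow"):
        return False          # unassigned primary shards
    if actual_docs != expected_docs:
        return False          # incomplete or divergent data
    return bool(sample_hit)   # at least one readable document came back

print(restore_is_valid("green", 12_005_432, 12_005_432, {"_id": "1"}))  # True
print(restore_is_valid("red", 100, 100, {"_id": "1"}))                  # False
```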
Step 7: Reopen or Reindex (If Needed)
If you closed indices before restoring, reopen them after validation:
POST /logs-2024-03/_open
If you restored into a renamed index and need to replace the original, you can use the Reindex API to copy data:
POST /_reindex
{
  "source": {
    "index": "logs-2024-03-backup-2024-03"
  },
  "dest": {
    "index": "logs-2024-03"
  }
}
Reindexing is useful when you need to preserve the original index name while ensuring data consistency.
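The rename-then-reindex recovery can be expressed as an ordered list of REST calls, which is handy to review before anything destructive runs. A sketch; the helper and the ordering are ours, and the final DELETE of the temporary index is optional:

```python
def reindex_swap_plan(restored_index, original_index):
    """Ordered (method, path, body) calls to copy a restored index back."""
    return [
        ("POST", "/_reindex", {
            "source": {"index": restored_index},
            "dest": {"index": original_index},
        }),
        ("GET", f"/{original_index}/_count", None),  # verify before cleanup
        ("DELETE", f"/{restored_index}", None),      # optional: drop the copy
    ]

for method, path, body in reindex_swap_plan("logs-2024-03-backup-2024-03",
                                            "logs-2024-03"):
    print(method, path)
```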
Best Practices
1. Automate Snapshot Creation and Retention
Manually creating snapshots is error-prone and unsustainable. Use Elasticsearch's Snapshot Lifecycle Management (SLM) or third-party tools like Elastic Curator to automate snapshot creation on a schedule (daily, weekly). Define retention policies to automatically delete snapshots older than a specified period (e.g., 30 days) to avoid storage bloat.
2. Test Restores Regularly
A snapshot is only as good as its ability to be restored. Schedule quarterly restore drills in a non-production environment. Simulate real-world scenarios: restore a single index, rename indices, restore from a corrupted snapshot. Document the process and refine it based on findings.
3. Use Separate Repositories for Different Environments
Do not share snapshot repositories between development, staging, and production clusters. Use distinct repositories (e.g., prod-backup-s3, staging-backup-nfs) to avoid accidental overwrites and ensure isolation.
4. Avoid Restoring Global State Unless Necessary
The include_global_state parameter restores cluster settings, templates, and machine learning jobs. This can overwrite critical configurations in your target cluster. Only enable this if you are restoring an entire cluster from scratch and have a full backup of the current configuration.
5. Monitor Storage and Network Bandwidth
Snapshot restores can consume significant network bandwidth and disk I/O. Schedule restores during off-peak hours. For large snapshots (>100GB), consider using high-bandwidth connections and SSD-backed storage to reduce restore time.
6. Use Repository Types Wisely
Choose the right repository type based on your infrastructure:
- S3: Ideal for cloud deployments; highly durable and scalable.
- FS (File System): Suitable for on-premises clusters with shared storage (NFS, SAN).
- Azure, HDFS, GCS: Use for cloud providers with native integrations.
Ensure the repository is configured with proper access controls and encryption.
7. Enable Snapshot Verification
When registering a repository, Elasticsearch verifies by default that it can read from and write to the repository. You can control this behavior with the verify query parameter:
PUT /_snapshot/my_backup_repository?verify=true
{
  "type": "s3",
  "settings": {
    "bucket": "my-es-backups",
    "region": "us-east-1",
    "base_path": "snapshots"
  }
}
This prevents silent failures due to misconfigured credentials or permissions.
8. Document Your Snapshot Strategy
Document:
- Which indices are included in snapshots
- Frequency of snapshot creation
- Retention policy
- Restore procedure and contact points
- Known limitations (e.g., version compatibility)
Ensure this documentation is accessible to all relevant team members and reviewed annually.
Tools and Resources
Elasticsearch Built-in APIs
Elasticsearch provides a rich set of REST APIs for managing snapshots:
- GET /_snapshot: List all registered repositories
- GET /_snapshot/{repository}/_all: List snapshots in a repository
- GET /_snapshot/{repository}/{snapshot}: Get detailed snapshot info
- POST /_snapshot/{repository}/{snapshot}/_restore: Initiate a restore
- GET /_recovery: Monitor restore progress
- GET /_cat/snapshots/{repository}: Human-readable snapshot list
Elastic Curator
Elastic Curator is a Python-based command-line tool for managing Elasticsearch indices and snapshots. It allows you to:
- Automate snapshot creation via cron jobs
- Apply retention policies
- Perform restores using YAML configuration files
Example Curator configuration for daily snapshots:
actions:
  1:
    action: snapshot
    description: "Create daily snapshot"
    options:
      repository: my_backup_repository
      name: "daily-snapshot-%Y.%m.%d"
      ignore_unavailable: false
      include_global_state: false
    filters:
      - filtertype: pattern
        kind: regex
        value: '^(logs|metrics|events)-.*'
  2:
    action: delete_snapshots
    description: "Delete snapshots older than 30 days"
    options:
      repository: my_backup_repository
      timeout_override: 300
      continue_if_exception: false
    filters:
      - filtertype: age
        source: creation_date
        direction: older
        unit: days
        unit_count: 30
Third-Party Tools
- Portworx: Provides container-native storage with snapshot capabilities for Kubernetes-hosted Elasticsearch.
- Stash by AppsCode: Kubernetes-native backup solution that supports Elasticsearch snapshots via plugins.
- Elastic Cloud: Managed Elasticsearch service that includes automated snapshots and one-click restore functionality via the UI.
Monitoring and Alerting
Integrate snapshot and restore operations into your observability stack:
- Use Elastic Observability to monitor snapshot success/failure rates.
- Set up alerts in Alerting for failed snapshots or long-running restores.
- Log restore events to a SIEM system for audit purposes.
Documentation and Community
Always refer to the official Elasticsearch documentation for the version you are running. Community forums like Discuss Elastic and Stack Overflow are valuable for troubleshooting edge cases.
Real Examples
Example 1: Restoring After Accidental Index Deletion
Scenario: A developer accidentally ran DELETE /sales-data in production. The index contained 12 million documents and was critical for daily reporting.
Resolution:
- Identified the most recent snapshot: snapshot_2024_04_01 (created at 2:00 AM).
- Confirmed the snapshot contained sales-data using GET /_snapshot/my_backup_repository/snapshot_2024_04_01.
- Executed a restore with a rename to avoid conflicts: rename_replacement: "sales-data-restored".
- Monitored restore progress via the _recovery API (took 18 minutes).
- Verified document count: 12,005,432 (matches original).
- Used the Reindex API to copy data back to sales-data.
- Confirmed application functionality with the QA team.
Outcome: Zero data loss. Downtime: 25 minutes.
Example 2: Migrating Data Between Clusters
Scenario: Migrating from an on-premises Elasticsearch 7.17 cluster to a cloud-hosted 8.12 cluster.
Resolution:
- Created a snapshot on the source cluster using an S3 repository.
- Registered the same S3 repository on the target cluster with identical credentials.
- Verified snapshot state: SUCCESS.
- Restored indices with a rename pattern: logs-(.*) → logs-prod-$1.
- Updated Logstash and Kibana configurations to point to the new index names.
- Performed end-to-end testing with sample queries and dashboards.
- Decommissioned old cluster after 72 hours of stable operation.
Outcome: Successful migration with no service disruption.
Example 3: Disaster Recovery After Node Failure
Scenario: A data center outage caused 3 out of 5 master nodes to fail. The cluster became unresponsive.
Resolution:
- Provisioned a new 5-node cluster in a different region.
- Registered the snapshot repository (S3) on the new cluster.
- Restored the latest snapshot with include_global_state: true to recover cluster settings and templates.
- Restored all indices with their original names.
- Reconfigured load balancers to point to the new cluster.
- Monitored cluster health for 24 hours.
Outcome: Full cluster recovery in 4 hours. Data integrity confirmed.
FAQs
Can I restore a snapshot from a higher Elasticsearch version to a lower one?
No. Elasticsearch snapshots are not forward compatible. A snapshot created on version 8.x cannot be restored on 7.x. Always ensure the target cluster is running the same version as the snapshot or a newer one.
What happens if a snapshot is corrupted or incomplete?
If a snapshot is marked as FAILED or has missing files, the restore will fail. Elasticsearch validates snapshot integrity before restore. If corruption is suspected, recreate the snapshot from a healthy source. Use the verify flag when registering repositories to catch issues early.
Can I restore a snapshot to a different cluster with fewer nodes?
Yes, but Elasticsearch will allocate shards based on available nodes. If the number of replicas exceeds available nodes, some shards will remain unassigned (cluster status: yellow). You can reduce the number of replicas during restore using the index_settings parameter:
POST /_snapshot/my_backup_repository/snapshot_2024_04_01/_restore
{
  "indices": "logs-*",
  "index_settings": {
    "index.number_of_replicas": 0
  }
}
Do snapshots include security settings and users?
By default, no. Snapshots do not include security-related data (users, roles, API keys) unless you explicitly enable include_global_state: true. However, even then, security data may not be fully compatible across clusters with different authentication backends (e.g., LDAP vs. native realm).
How long does a restore take?
Restore time depends on:
- Snapshot size (GB/TB)
- Network bandwidth between repository and cluster
- Storage performance (SSD vs. HDD)
- Number of shards and documents
As a rough estimate: 10GB takes 5–10 minutes; 1TB may take 2–4 hours. Monitor progress via the _recovery API.
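Those rules of thumb follow from simple arithmetic: bytes to move divided by effective throughput. A back-of-envelope sketch that ignores shard parallelism, repository overhead, and throttling (the throughput figures below are illustrative, not measured):

```python
def estimate_restore_minutes(size_gb, throughput_mb_per_s):
    """Naive restore-time estimate: total size over effective throughput."""
    seconds = (size_gb * 1024) / throughput_mb_per_s
    return seconds / 60

# 10 GB at an assumed ~30 MB/s effective throughput: a few minutes
print(round(estimate_restore_minutes(10, 30), 1))
# 1 TB (1024 GB) at an assumed ~100 MB/s: roughly 3 hours
print(round(estimate_restore_minutes(1024, 100) / 60, 1))
```

Measuring your own effective throughput from a small test restore makes this far more useful than any published figure.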
Can I restore only the mapping or settings without data?
No. Elasticsearch snapshots are atomic: they restore indices as a whole. To restore only settings or mappings, export them manually using the GET /{index}/_mapping and GET /{index}/_settings APIs, then recreate the index with those settings and reindex data.
Is it safe to restore while the cluster is under heavy load?
It's not recommended. Restores consume significant I/O and network resources. Schedule restores during maintenance windows or low-traffic periods to avoid impacting query performance.
What if I need to restore a snapshot that contains deleted indices?
Use the ignore_unavailable: true parameter. This allows the restore to proceed even if some of the indices you request are missing from the snapshot: the available indices are restored and the missing ones are skipped.
Conclusion
Restoring an Elasticsearch snapshot is a fundamental skill for any engineer managing data at scale. Whether you're recovering from human error, hardware failure, or migrating infrastructure, the ability to restore data quickly and accurately is non-negotiable. This guide has provided a comprehensive, step-by-step walkthrough, from verifying snapshot integrity to renaming indices and validating results, along with best practices to prevent common pitfalls.
Remember: a snapshot is only valuable if it can be restored. Automate your backup strategy, test your restores regularly, document your procedures, and choose the right tools for your environment. With the right approach, Elasticsearch snapshots become not just a safety net, but a cornerstone of your data resilience strategy.
As data volumes grow and system complexity increases, the importance of reliable, repeatable restore processes will only rise. Start today by auditing your current snapshot strategy. Are your snapshots being created? Are they being tested? If not, take the first step now, because when disaster strikes, you won't have time to learn how to restore.