How to Scale Elasticsearch Nodes


Nov 6, 2025 - 10:40

Elasticsearch is a distributed, scalable, and highly available search and analytics engine built on Apache Lucene. As data volumes grow and query loads intensify, the performance and reliability of your Elasticsearch cluster depend heavily on how well you scale its nodes. Scaling Elasticsearch isn't just about adding more servers; it's about strategically expanding your architecture to maintain low latency, high throughput, and fault tolerance under increasing demand. Whether you're managing a small deployment or a large enterprise system, understanding how to scale Elasticsearch nodes effectively ensures your search infrastructure remains responsive, resilient, and cost-efficient.

Many organizations encounter bottlenecks when their Elasticsearch clusters reach capacity: slow search responses, frequent shard relocations, node failures, or out-of-memory errors. These issues often stem from improper scaling decisions: adding too few nodes, misconfiguring shard counts, or ignoring hardware-to-workload alignment. This guide provides a comprehensive, step-by-step roadmap to scaling Elasticsearch nodes, grounded in real-world best practices, architectural principles, and operational insights.

Step-by-Step Guide

1. Assess Your Current Cluster Health

Before scaling, you must understand your cluster's current state. Use Elasticsearch's built-in monitoring tools to gather metrics and identify bottlenecks. Start by querying the cluster health endpoint:

GET _cluster/health

Pay attention to the status (green, yellow, red), number of nodes, active shards, and unassigned shards. A yellow status indicates replica shards are not allocated, often a sign of insufficient nodes. A red status means primary shards are missing, which can lead to data unavailability.

Next, inspect node statistics:

GET _nodes/stats

Look for high CPU usage (>80% sustained), memory pressure (heap usage >75%), disk I/O latency, and thread pool rejections (especially in the search and index pools). High GC times (>10% of total time) indicate heap pressure. Use Kibana's Stack Monitoring dashboards for visual insights.

Also, review shard allocation:

GET _cat/shards?v&h=index,shard,prirep,state,docs,store,node

If you see many shards on a single node or uneven distribution, your cluster is not optimized for scaling. A general rule: aim for 20–50 shards per node, depending on hardware and query complexity.
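
To make the triage above repeatable, here is a minimal Python sketch that inspects a parsed `GET _cluster/health` response and flags common scaling problems. The field names match the real health API; the thresholds and the helper function itself are illustrative, not an official tool.

```python
def triage_cluster_health(health: dict) -> list[str]:
    """Return human-readable warnings for a parsed _cluster/health response."""
    warnings = []
    status = health.get("status")
    if status == "red":
        warnings.append("RED: primary shards missing - data may be unavailable")
    elif status == "yellow":
        warnings.append("YELLOW: replica shards unassigned - consider adding nodes")
    unassigned = health.get("unassigned_shards", 0)
    if unassigned > 0:
        warnings.append(f"{unassigned} unassigned shard(s)")
    nodes = health.get("number_of_data_nodes", 0)
    shards = health.get("active_shards", 0)
    # Illustrative threshold from the 20-50 shards-per-node guideline above
    if nodes and shards / nodes > 50:
        warnings.append(f"~{shards // nodes} shards per data node - above the 20-50 guideline")
    return warnings

# Example response for a two-node cluster missing all its replicas
sample = {
    "status": "yellow",
    "number_of_data_nodes": 2,
    "active_shards": 120,
    "unassigned_shards": 120,
}
for w in triage_cluster_health(sample):
    print(w)
```

In practice you would feed this the JSON body returned by the health endpoint; the helper only encodes the rules of thumb stated in this section.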

2. Define Your Scaling Goals

Scaling should be goal-driven. Ask yourself:

  • Are you scaling for performance (faster queries, lower latency)?
  • Are you scaling for capacity (more data, higher ingestion rates)?
  • Are you scaling for availability (fault tolerance, zero downtime)?

Each goal requires a different approach. Performance scaling often means adding data nodes with faster CPUs and SSDs. Capacity scaling requires more disk space and careful shard planning. Availability scaling demands redundancy: multiple replicas and distributed node roles.

Establish measurable KPIs: target query latency, ingestion throughput (>10k docs/sec), and uptime (>99.95%). Use these to validate your scaling success.

3. Choose the Right Node Roles

Elasticsearch 7.0+ introduced dedicated node roles. Assigning specific roles improves scalability and stability. Use the following roles:

  • Master-eligible nodes: Only 3–5 per cluster. They handle cluster state management and should not store data or handle queries.
  • Data nodes: Store data and handle search/index requests. These are your primary scaling target.
  • Ingest nodes: Preprocess data (Painless scripts, enrich processors) and offload that work from data nodes.
  • Coordinating nodes: Handle client requests and distribute them. Optional if data nodes handle routing.

Deploy separate node types to prevent resource contention. For example, if ingest nodes are overloaded with transformations, they can slow down data ingestion on data nodes. Isolate them.

Configure roles in elasticsearch.yml:

node.roles: [ master, data, ingest ]

For production, separate roles:

Master node

node.roles: [ master ]

Data node

node.roles: [ data ]

Ingest node

node.roles: [ ingest ]

4. Optimize Shard Allocation

Shards are the fundamental unit of scalability in Elasticsearch. Each index is split into primary shards and replicated into replica shards. Too many shards cause overhead; too few limit parallelism.

Best practices:

  • Keep shard size between 10–50 GB. Larger shards increase recovery time and reduce parallelism.
  • Avoid shards smaller than 1 GB; they create excessive metadata overhead.
  • Use index lifecycle management (ILM) to rollover indices based on size or age.

Calculate optimal shard count:

Suppose you expect 5 TB of data and want 30 GB shards: 5000 GB / 30 GB ≈ 167 primary shards. With 2 replicas, total shards = 167 × 3 = 501.

If you have 10 data nodes, each handles ~50 shards, within the recommended range.
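
The arithmetic above can be wrapped in a small helper so you can replan as data volume or node count changes. This is an illustrative sketch; the function name and return shape are my own.

```python
import math

def plan_shards(total_gb: float, target_shard_gb: float, replicas: int, data_nodes: int) -> dict:
    """Compute primary/total shard counts for a target shard size."""
    primaries = math.ceil(total_gb / target_shard_gb)
    total = primaries * (1 + replicas)  # each primary plus its replicas
    return {
        "primary_shards": primaries,
        "total_shards": total,
        "shards_per_node": total / data_nodes,
    }

# The worked example from this section: 5 TB of data, 30 GB shards, 2 replicas, 10 nodes
plan = plan_shards(total_gb=5000, target_shard_gb=30, replicas=2, data_nodes=10)
print(plan)  # {'primary_shards': 167, 'total_shards': 501, 'shards_per_node': 50.1}
```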

Use index.number_of_shards and index.number_of_replicas during index creation:

PUT /my-index
{
  "settings": {
    "number_of_shards": 167,
    "number_of_replicas": 2
  }
}

For time-series data (logs, metrics), use index rollover with ILM:

PUT _ilm/policy/my-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50GB",
            "max_age": "30d"
          }
        }
      }
    }
  }
}

5. Add Data Nodes Strategically

Once you've optimized shard allocation, add data nodes to increase capacity and performance.

Steps:

  1. Provision new nodes with identical or better specs than existing ones (CPU, RAM, disk type).
  2. Install the same Elasticsearch version and configuration.
  3. Set node.roles: [ data ] and ensure cluster.name matches.
  4. Start the node. Elasticsearch automatically rebalances shards across the cluster.

Monitor the rebalance process:

GET _cat/recovery?v

Rebalancing can take hours for large clusters. Avoid adding multiple nodes simultaneously unless you have high network bandwidth.
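
If you script the rebalance check, the JSON form of the recovery API (`GET _cat/recovery?format=json`) is easier to parse than the tabular output. A hedged sketch, assuming the usual `stage` and `bytes_percent` fields (verify them against your cluster version):

```python
def recovery_summary(entries: list[dict]) -> dict:
    """Summarize shard recovery progress from _cat/recovery JSON entries."""
    in_flight = [e for e in entries if e.get("stage") != "done"]
    # bytes_percent is reported as a string like "40.0%"
    pcts = [float(e.get("bytes_percent", "0%").rstrip("%")) for e in in_flight]
    return {
        "total_recoveries": len(entries),
        "in_flight": len(in_flight),
        "avg_bytes_percent": sum(pcts) / len(pcts) if pcts else 100.0,
    }

# Example: one finished recovery, two still copying segment files
sample = [
    {"index": "logs-01", "shard": "0", "stage": "done", "bytes_percent": "100.0%"},
    {"index": "logs-01", "shard": "1", "stage": "index", "bytes_percent": "40.0%"},
    {"index": "logs-02", "shard": "0", "stage": "index", "bytes_percent": "60.0%"},
]
print(recovery_summary(sample))
```

Polling a summary like this until `in_flight` reaches zero tells you when it is safe to add the next node.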

Use shard allocation filtering to control where new shards go:

PUT /my-index/_settings
{
  "index.routing.allocation.require.box_type": "hot"
}

Then tag nodes:

node.attr.box_type: hot

6. Scale Memory and Heap Correctly

Elasticsearch relies heavily on the JVM heap. The heap should be no more than 50% of available RAM, capped at 32 GB (due to compressed object pointers).

Set heap size in jvm.options:

-Xms31g
-Xmx31g

Never exceed 32 GB. For nodes with 128 GB RAM, allocate 31 GB heap and leave the rest for the OS file system cache, which is critical for Lucene performance.

Monitor heap usage with:

GET _nodes/stats/jvm

If heap usage consistently exceeds 75%, either add more nodes or reduce shard count. Increasing heap beyond 32 GB leads to longer GC pauses and degraded performance.
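
The heap-sizing rule reduces to a one-liner. A minimal sketch, using 31 GB as a safe ceiling below the compressed-oops threshold:

```python
COMPRESSED_OOPS_SAFE_GB = 31  # stay below ~32 GB so compressed object pointers apply

def recommended_heap_gb(ram_gb: int) -> int:
    """Half of RAM, capped at the compressed-oops-safe ceiling."""
    return min(ram_gb // 2, COMPRESSED_OOPS_SAFE_GB)

for ram in (16, 64, 128):
    heap = recommended_heap_gb(ram)
    print(f"{ram} GB RAM -> -Xms{heap}g -Xmx{heap}g")
```

For the 128 GB node above this yields the 31 GB heap recommended in this section, leaving 97 GB for the file system cache.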

7. Optimize Disk I/O and Storage

Storage performance directly impacts indexing and search speed. Use SSDs, preferably NVMe, for all data nodes. Avoid spinning disks for production clusters.

Ensure adequate disk space. By default, Elasticsearch stops allocating shards to a node once its disk is 85% full, reserving the remaining headroom for segment merges and recovery. Configure the thresholds:

cluster.routing.allocation.disk.watermark.low: 85%
cluster.routing.allocation.disk.watermark.high: 90%
cluster.routing.allocation.disk.watermark.flood_stage: 95%

Use dedicated disks for data and logs. Avoid sharing disks with other services (e.g., databases, applications).

For high ingestion workloads, consider using RAID 0 (striping) for performance, but only if you have redundancy at the cluster level (via replicas).
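
To see how the three watermarks interact, here is an illustrative helper that classifies a node's disk usage against the thresholds configured above (the behavior descriptions summarize Elasticsearch's documented watermark actions; the helper itself is a sketch):

```python
# Mirror the watermark settings shown in the configuration above
WATERMARKS = {"low": 85.0, "high": 90.0, "flood_stage": 95.0}

def disk_watermark_state(used_percent: float) -> str:
    """Classify a node's disk usage against the allocation watermarks."""
    if used_percent >= WATERMARKS["flood_stage"]:
        return "flood_stage: index write blocks applied"
    if used_percent >= WATERMARKS["high"]:
        return "high: shards relocated away from this node"
    if used_percent >= WATERMARKS["low"]:
        return "low: no new shards allocated to this node"
    return "ok"

print(disk_watermark_state(87.0))  # low: no new shards allocated to this node
```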

8. Tune Thread Pools and Queues

Elasticsearch uses thread pools for indexing, search, and bulk operations. When queues fill up, requests are rejected, causing client timeouts.

Check thread pool stats:

GET _nodes/stats/thread_pool

Look for rejected counts in index, search, and bulk pools.

Adjust settings in elasticsearch.yml if needed:

thread_pool.index.size: 32
thread_pool.index.queue_size: 1000
thread_pool.search.size: 48
thread_pool.search.queue_size: 1000

Be cautious: increasing queue size delays rejection but doesn't solve root causes. Focus on scaling nodes and optimizing queries instead.
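
A small sketch for spotting rejections programmatically, assuming the nested layout of the real `_nodes/stats/thread_pool` response (node IDs mapping to per-pool counters):

```python
def rejection_report(stats: dict) -> list[str]:
    """Flag thread pools with rejected requests in a _nodes/stats response."""
    findings = []
    for node_id, node in stats.get("nodes", {}).items():
        name = node.get("name", node_id)
        for pool, vals in node.get("thread_pool", {}).items():
            rejected = vals.get("rejected", 0)
            if rejected > 0:
                findings.append(f"{name}: pool '{pool}' rejected {rejected} request(s)")
    return findings

# Example: one node whose search queue is full and rejecting requests
sample = {"nodes": {"abc123": {"name": "data-1", "thread_pool": {
    "search": {"queue": 998, "rejected": 42},
    "write": {"queue": 0, "rejected": 0},
}}}}
for line in rejection_report(sample):
    print(line)
```

A non-empty report is the signal, per the advice above, to scale nodes or optimize queries rather than simply enlarge the queue.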

9. Enable Cross-Cluster Replication (CCR) for Geographical Scaling

If your users are distributed globally, use CCR to replicate indices across clusters in different regions. This reduces latency by serving queries from the nearest cluster.

Setup steps:

  1. Enable CCR on both clusters (source and follower).
  2. Configure network connectivity (SSL, firewall rules).
  3. Create a follower index that replicates from the source.
PUT /follower-index/_ccr/follow
{
  "remote_cluster": "source-cluster",
  "leader_index": "leader-index"
}

CCR is ideal for read-heavy, write-once workloads (e.g., audit logs, user activity tracking).

10. Automate Scaling with Kubernetes or Cloud Services

For dynamic environments, automate node scaling using orchestration tools:

  • Elastic Cloud: Fully managed Elasticsearch on AWS, GCP, or Azure. Auto-scaling based on CPU, memory, or disk usage.
  • Kubernetes with Elastic Cloud Operator: Deploy Elasticsearch as a StatefulSet. Use Horizontal Pod Autoscaler (HPA) to scale data nodes based on custom metrics (e.g., heap usage, shard count).
  • Custom scripts: Use Elasticsearchs REST API to monitor metrics and trigger node addition via cloud APIs (AWS EC2, GCP Compute Engine).

Example: Auto-scale when heap usage >80% for 5 minutes:

#!/bin/bash
# Average heap usage across all nodes
HEAP_USAGE=$(curl -s http://localhost:9200/_nodes/stats/jvm | jq -r '.nodes[] | .jvm.mem.heap_used_percent' | awk '{sum += $1} END {print sum/NR}')

if (( $(echo "$HEAP_USAGE > 80" | bc -l) )); then
  aws ec2 run-instances --image-id ami-123456 --instance-type r5.xlarge --count 1
fi

Combine with a load balancer to route traffic to new nodes once they join the cluster.

Best Practices

1. Avoid Over-Sharding

Shards are not free. Each shard consumes memory for metadata, segment information, and open file handles. A cluster with 10,000 shards may have 50+ GB of heap consumed by shard metadata alone, leaving little for caching and queries.

Rule of thumb: Keep total shards under 1,000 per node. For a 20-node cluster, don't exceed 20,000 total shards.
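
The budget check is simple enough to automate. An illustrative helper (the 1,000-per-node limit mirrors the rule of thumb above, which also matches Elasticsearch's default `cluster.max_shards_per_node`):

```python
def shard_budget_ok(total_shards: int, data_nodes: int, per_node_limit: int = 1000) -> bool:
    """True if the cluster's total shard count fits the per-node budget."""
    return total_shards <= data_nodes * per_node_limit

print(shard_budget_ok(18_000, 20))  # True: under the 20,000 budget
print(shard_budget_ok(25_000, 20))  # False: over budget - consolidate indices
```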

2. Use Index Lifecycle Management (ILM)

ILM automates index rollover, cold storage migration, and deletion. This prevents uncontrolled growth and ensures optimal shard sizing.

Example ILM workflow:

  • Hot: Active writes, high-performance SSDs, 3 replicas.
  • Warm: Read-only, fewer replicas (1), slower disks.
  • Cold: Archived, no replicas, long-term storage (S3, HDFS).
  • Frozen: Read-only, offloaded to Elasticsearch's frozen tier (low memory).
  • Delete: Remove after retention period.

3. Monitor and Alert Proactively

Set up alerts for:

  • Cluster status changes (yellow/red)
  • Heap usage >75%
  • Thread pool rejections
  • Disk usage approaching watermarks
  • Slow queries (>1s)

Use Prometheus + Grafana with the Elasticsearch exporter, or Elastic's built-in alerting in Kibana.

4. Plan for Node Failures

Always have at least 2 replicas for critical data. With 3 master-eligible nodes, you can tolerate 1 node failure without losing quorum.

Never run a cluster with only 1 master node. Use an odd number (3, 5, 7) for master-eligible nodes to avoid split-brain scenarios.

5. Avoid Node Hotspots

Uneven shard distribution creates hotspots. Use shard allocation awareness to spread shards across availability zones or racks:

cluster.routing.allocation.awareness.attributes: az

Tag nodes:

node.attr.az: us-east-1a

Elasticsearch will then balance shards across AZs, improving fault tolerance.

6. Optimize Queries and Mappings

Scaling nodes won't fix bad queries. Avoid:

  • Wildcard searches (*term*)
  • Deep pagination (from: 10000)
  • Unnecessary fields in _source
  • Script fields in high-frequency queries

Use keyword fields for aggregations, not text. Keep doc_values enabled (the default) on all aggregatable fields.

7. Use Filter Context for Better Caching

Use filter context instead of query context for boolean conditions that don't require scoring. Filter clauses are cached, improving performance on repeated queries.

{
  "query": {
    "bool": {
      "filter": [
        { "term": { "status": "active" } },
        { "range": { "date": { "gte": "2024-01-01" } } }
      ]
    }
  }
}

8. Regularly Force Merge Read-Only Indices

After rollover, force merge read-only indices to reduce segment count:

POST /my-index-000001/_forcemerge?max_num_segments=1

This reduces disk usage and improves search speed. Schedule this during off-peak hours.

Tools and Resources

1. Elasticsearch Built-in Tools

  • _cat APIs: _cat/nodes, _cat/shards, _cat/indices for quick diagnostics.
  • Cluster Allocation Explain API: GET _cluster/allocation/explain reveals why shards are unassigned.
  • Index Stats: GET /_stats for indexing/search performance metrics.
  • Snapshot and Restore: Use S3, HDFS, or Azure Blob to back up and restore data during scaling.

2. Monitoring Tools

  • Kibana Stack Monitoring: Real-time metrics, alerts, and visualizations.
  • Prometheus + Elasticsearch Exporter: Open-source monitoring with custom dashboards.
  • Datadog / New Relic: Commercial APM tools with Elasticsearch integrations.
  • Elasticsearch Observability: Full-stack observability with logs, metrics, and traces.

3. Automation and Orchestration

  • Elastic Cloud: Managed service with auto-scaling, backups, and updates.
  • Elastic Cloud Operator (ECK): Kubernetes operator for deploying and managing Elasticsearch clusters.
  • Terraform: Provision cloud infrastructure (nodes, disks, networks) declaratively.
  • Ansible / Puppet: Configure node settings at scale across environments.


Real Examples

Example 1: E-commerce Platform Scaling from 5 to 50 Nodes

A global e-commerce company experienced slow product search during peak sales. Their cluster had 5 data nodes, each with 64 GB RAM and 2 TB SSD. They had 120 indices with 15 shards each (1,800 total shards), averaging 100 GB per shard.

Problems:

  • Shard size too large → slow recovery
  • Heap usage at 90%
  • Search latency >1.5s

Solution:

  1. Reduced shard count to 50 per index using ILM rollover at 30 GB.
  2. Added 15 new data nodes with 128 GB RAM and NVMe drives.
  3. Separated master and ingest nodes.
  4. Enabled filter caching and optimized mappings.

Results:

  • Shard size: 25 GB
  • Heap usage: 60%
  • Search latency: 120ms
  • Throughput increased 300%

Example 2: Log Aggregation System with 100+ Nodes

A cloud provider ingested 5 TB/day of logs. Their cluster had 80 data nodes, but queries were slow due to uneven shard distribution and lack of ILM.

Problems:

  • Shards unevenly distributed: some nodes had 200+, others had 50.
  • Old logs not deleted → disk full.
  • No replication → data loss during node failure.

Solution:

  1. Implemented ILM with 7-day hot, 30-day warm, 1-year cold lifecycle.
  2. Used shard allocation awareness across 3 availability zones.
  3. Set replica count to 2 for hot indices.
  4. Automated deletion of indices older than 2 years.

Results:

  • Storage costs reduced by 40%
  • Query performance improved 50%
  • Zero data loss during 3 node failures

Example 3: Financial Services Real-Time Analytics

A bank needed real-time fraud detection using Elasticsearch. They had 10 nodes, but bulk indexing was slow due to network saturation and lack of dedicated ingest nodes.

Problems:

  • Indexing rate: 1,200 docs/sec
  • Network bandwidth saturated
  • High CPU on data nodes from transformations

Solution:

  1. Added 5 dedicated ingest nodes with 32 GB RAM and 16 cores.
  2. Used Kafka as a buffer between producers and Elasticsearch.
  3. Offloaded enrichment to ingest pipelines.
  4. Increased bulk thread pool size to 16.

Results:

  • Indexing rate: 8,500 docs/sec
  • Latency reduced from 500ms to 80ms
  • System handled 10x peak load during market events

FAQs

How many nodes do I need to scale Elasticsearch?

There's no fixed number. Start with 3 master-eligible nodes and 3–5 data nodes for small deployments. Scale data nodes as your data grows or query load increases. A typical enterprise cluster may have 20–100+ data nodes. Use shard count and heap usage as your guides, not arbitrary node counts.

Can I scale Elasticsearch without downtime?

Yes. Add new nodes while the cluster is running. Elasticsearch automatically rebalances shards. Avoid rolling restarts during peak hours. Use node shutdown with allocation deciders to drain traffic before decommissioning old nodes.

Whats the maximum number of shards per node?

Keep it under 50–100 shards per node for optimal performance. Above 200, you risk metadata overhead, slow recovery, and increased GC pressure. Monitor with _cat/shards and _cat/nodes.

Should I use SSDs or HDDs for Elasticsearch nodes?

Always use SSDs, preferably NVMe, for data nodes. HDDs are too slow for the random I/O required by Lucene segments. Only use HDDs for cold storage or backups.

How do I know if I need more memory or more nodes?

If heap usage is consistently >75% and GC pauses are long (>5s), you need more nodes, not more heap. If disk I/O is saturated or shard count is too low, add nodes to distribute load. Memory scaling helps caching; node scaling helps parallelism.
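
That heuristic can be expressed directly. A minimal sketch; the thresholds come from the answer above and are guidelines, not hard limits:

```python
def scaling_advice(heap_used_percent: float, gc_pause_seconds: float) -> str:
    """Apply the heap/GC heuristic: sustained pressure plus long pauses means more nodes."""
    if heap_used_percent > 75 and gc_pause_seconds > 5:
        return "add data nodes (heap pressure with long GC pauses)"
    if heap_used_percent > 75:
        return "reduce shard count or add nodes (heap pressure)"
    return "no scaling action indicated by heap metrics"

print(scaling_advice(82, 7))    # add data nodes (heap pressure with long GC pauses)
print(scaling_advice(60, 0.2))  # no scaling action indicated by heap metrics
```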

Can I scale Elasticsearch horizontally and vertically at the same time?

Yes. Horizontal scaling (adding nodes) is preferred for elasticity and fault tolerance. Vertical scaling (upgrading node specs) can be used for existing nodes, but requires restarts. Combine both: upgrade existing nodes while adding new ones for minimal disruption.

What happens if I add too many shards?

Too many shards increase cluster state size, slow down cluster operations (recovery, routing), and consume excessive heap memory. They can cause master node instability and long restart times. Always aim for shard sizes between 10–50 GB.

How often should I rebalance shards manually?

Never manually rebalance unless absolutely necessary. Elasticsearch's automatic shard allocation is highly optimized. Use cluster.reroute only to fix stuck shards or enforce allocation rules.

Conclusion

Scaling Elasticsearch nodes is a strategic, multi-faceted process that goes beyond simply adding hardware. It requires a deep understanding of your workload, careful planning of shard allocation, intelligent node role separation, and proactive monitoring. The goal is not just to handle more data; it's to maintain sub-second search performance, ensure high availability, and reduce operational complexity as your system evolves.

By following the step-by-step guide in this tutorial (assessing your current state, defining clear goals, optimizing shards, adding nodes with the right specs, and automating where possible), you can build a scalable, resilient Elasticsearch infrastructure that grows with your business.

Remember: scaling is not a one-time event. It's an ongoing discipline. Regularly review your metrics, refine your ILM policies, and stay aligned with Elasticsearch's evolving best practices. The most successful deployments are those that anticipate growth rather than react to failure.

With the right approach, your Elasticsearch cluster won't just survive scaling; it will thrive.