How to Search Data in Elasticsearch

Nov 6, 2025 - 10:43

Elasticsearch is a powerful, distributed search and analytics engine built on Apache Lucene. It enables near real-time searching across vast datasets with high scalability and performance. Whether you're indexing logs, e-commerce product catalogs, user behavior data, or sensor readings, Elasticsearch provides flexible, full-text search capabilities that go far beyond traditional SQL-based queries. Mastering how to search data in Elasticsearch is essential for developers, data engineers, and analysts working with large-scale, unstructured, or semi-structured data. This tutorial provides a step-by-step guide to searching data in Elasticsearch, from basic queries to advanced filtering, aggregations, and performance optimization, so that you can extract meaningful insights efficiently and accurately.

Step-by-Step Guide

1. Understanding Elasticsearch Indexes and Documents

Before you can search data, you must understand the foundational structure of Elasticsearch: indexes and documents. An index is akin to a database table in relational systems, but it stores a collection of documents. A document is a JSON object representing a single record, such as a product, user, or log entry. Each document has a unique ID and is stored in an index with a defined mapping that specifies the data types of its fields.

For example, an index named products might contain documents like:

{
  "id": "1",
  "name": "Wireless Headphones",
  "category": "Electronics",
  "price": 129.99,
  "in_stock": true,
  "description": "Noise-canceling wireless headphones with 30-hour battery life"
}

To search effectively, ensure your data is properly indexed with accurate mappings. Use the PUT endpoint to create an index with a custom mapping:

PUT /products
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fields": { "keyword": { "type": "keyword" } }
      },
      "category": { "type": "keyword" },
      "price": { "type": "float" },
      "in_stock": { "type": "boolean" },
      "description": { "type": "text" }
    }
  }
}

Use GET /products/_mapping to verify your index structure. Incorrect mappings, such as treating a numeric field as text, can severely impact search accuracy and performance.

2. Indexing Sample Data

Once your index is created, populate it with data using the _bulk API for efficiency or individual POST requests for simplicity. Here's how to index multiple products:

POST /products/_bulk
{ "index": { "_id": "1" } }
{ "name": "Wireless Headphones", "category": "Electronics", "price": 129.99, "in_stock": true, "description": "Noise-canceling wireless headphones with 30-hour battery life" }
{ "index": { "_id": "2" } }
{ "name": "Smart Watch", "category": "Electronics", "price": 199.99, "in_stock": false, "description": "Fitness tracker with heart rate monitor and GPS" }
{ "index": { "_id": "3" } }
{ "name": "Coffee Maker", "category": "Home & Kitchen", "price": 89.99, "in_stock": true, "description": "Programmable drip coffee maker with thermal carafe" }
{ "index": { "_id": "4" } }
{ "name": "Bluetooth Speaker", "category": "Electronics", "price": 79.99, "in_stock": true, "description": "Waterproof portable speaker with 20-hour playtime" }

After indexing, confirm the data is present with:

GET /products/_search
{ "query": { "match_all": {} } }

This returns all documents in the index and confirms successful ingestion.
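When the documents come from application code rather than the console, the bulk body has to be assembled as newline-delimited JSON. The sketch below shows one way to do that in Python using only the standard library; build_bulk_body is a hypothetical helper name, and sending the string to the cluster is left out.

```python
import json


def build_bulk_body(docs):
    """Build an NDJSON body for the _bulk API from a {doc_id: source} mapping.

    Hypothetical helper for illustration: each document becomes an action
    line followed by its source line.
    """
    lines = []
    for doc_id, source in docs.items():
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


products = {
    "1": {"name": "Wireless Headphones", "category": "Electronics",
          "price": 129.99, "in_stock": True},
    "3": {"name": "Coffee Maker", "category": "Home & Kitchen",
          "price": 89.99, "in_stock": True},
}

bulk_body = build_bulk_body(products)
print(bulk_body, end="")
```

The resulting string would be sent as the body of POST /products/_bulk with a newline-delimited JSON content type.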

3. Performing a Basic Match Query

The most common search operation in Elasticsearch is the match query, which performs full-text search on analyzed text fields. It breaks down the search term into tokens and matches against the inverted index.

To find all products containing the word "wireless" in the name field:

GET /products/_search
{
  "query": {
    "match": {
      "name": "wireless"
    }
  }
}

This returns documents where "wireless" appears in the name field. Text fields use the standard analyzer by default, which splits text on word boundaries, drops most punctuation, and lowercases each token, so the query also matches "Wireless".

You can also search across multiple fields using multi_match:

GET /products/_search
{
  "query": {
    "multi_match": {
      "query": "noise canceling",
      "fields": ["name", "description"]
    }
  }
}

This finds documents where "noise" or "canceling" appears in either the name or the description, improving recall for user-facing search interfaces.

4. Using Term Queries for Exact Matches

For non-analyzed fields like category or in_stock, use the term query to match exact values. Unlike match, term does not analyze the input: it looks for the literal term as stored.

To find all products in the Electronics category:

GET /products/_search
{
  "query": {
    "term": {
      "category": "Electronics"
    }
  }
}

Important: term queries are case-sensitive. If your data contains electronics in lowercase, the query above will return no results. Always ensure your data and queries match in casing, or use keyword fields with consistent normalization.
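The difference between the two query types can be modeled in a few lines of Python. This is a toy illustration and not how Elasticsearch is implemented: analyze is a crude stand-in for the standard analyzer, and match_hits and term_hits are hypothetical helper names.

```python
import re


def analyze(text):
    """Crude stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


def match_hits(field_value, query_text):
    """Model of match: analyze both sides, hit if any query token is in the field."""
    field_tokens = set(analyze(field_value))
    return any(tok in field_tokens for tok in analyze(query_text))


def term_hits(stored_keyword, query_value):
    """Model of term on a keyword field: literal comparison, no analysis."""
    return stored_keyword == query_value


print(match_hits("Wireless Headphones", "WIRELESS"))  # analyzed on both sides
print(term_hits("Electronics", "electronics"))        # casing differs
print(term_hits("Electronics", "Electronics"))        # exact literal
```

In this model the match lookup succeeds regardless of casing, while the term lookup succeeds only when the stored value and the query value are identical.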

5. Combining Queries with Bool Queries

Elasticsearch's bool query allows you to combine multiple queries using four clause types: must, should, must_not, and filter.

To find all in-stock electronics products priced under $150:

GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        { "term": { "category": "Electronics" } },
        { "range": { "price": { "lt": 150 } } }
      ],
      "filter": [
        { "term": { "in_stock": true } }
      ]
    }
  }
}

Here, must ensures both conditions are required, while filter is used for conditions that don't affect scoring (i.e., they only include or exclude results). Filter clauses skip relevance scoring, and frequently used ones can be cached, so they are usually faster than scoring queries. Because the category and price conditions above are also yes/no checks, they could be moved into filter as well.
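In application code, it can help to keep scoring clauses and pure filters apart when assembling a bool query. The following is a minimal sketch under that assumption; build_bool_query and its parameter names are hypothetical, and only the resulting JSON structure comes from the Query DSL.

```python
import json


def build_bool_query(scoring=None, filters=None, excluded=None):
    """Assemble a bool query body, omitting clause types that are empty.

    scoring  -> must      (contributes to _score)
    filters  -> filter    (yes/no conditions, no scoring)
    excluded -> must_not  (documents matching these are dropped)
    """
    bool_body = {}
    if scoring:
        bool_body["must"] = list(scoring)
    if filters:
        bool_body["filter"] = list(filters)
    if excluded:
        bool_body["must_not"] = list(excluded)
    return {"query": {"bool": bool_body}}


body = build_bool_query(
    scoring=[{"match": {"name": "wireless"}}],
    filters=[
        {"term": {"category": "Electronics"}},
        {"term": {"in_stock": True}},
        {"range": {"price": {"lt": 150}}},
    ],
)
print(json.dumps(body, indent=2))
```

Only the full-text match influences ranking here; the category, stock, and price conditions narrow the result set without affecting scores.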

6. Filtering with Range Queries

Range queries are essential for numeric and date fields. You can specify boundaries using gt (greater than), gte (greater than or equal), lt (less than), and lte (less than or equal).

To find products priced between $80 and $120:

GET /products/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 80,
        "lte": 120
      }
    }
  }
}

For date fields, use ISO 8601 format:

GET /logs/_search
{
  "query": {
    "range": {
      "timestamp": {
        "gte": "2024-01-01T00:00:00Z",
        "lt": "2024-02-01T00:00:00Z"
      }
    }
  }
}
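Half-open date ranges like the one above (inclusive start, exclusive end) are easy to generate programmatically. The sketch below builds the range clause for a whole calendar month in UTC; month_range is a hypothetical helper, and the field name is passed in by the caller.

```python
from datetime import datetime, timezone


def month_range(field, year, month):
    """Return a range clause covering one calendar month: [first day, first day of next month)."""
    start = datetime(year, month, 1, tzinfo=timezone.utc)
    # Roll over to January of the following year after December.
    if month == 12:
        end = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    else:
        end = datetime(year, month + 1, 1, tzinfo=timezone.utc)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return {"range": {field: {"gte": start.strftime(fmt), "lt": end.strftime(fmt)}}}


january = month_range("timestamp", 2024, 1)
print(january)
```

Using gte with lt avoids both gaps and double counting when consecutive months are queried one after another.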

7. Sorting Results

By default, Elasticsearch sorts results by relevance score (_score). You can override this with explicit sorting on any field.

To sort products by price ascending:

GET /products/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "price": {
        "order": "asc"
      }
    }
  ]
}

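To sort by multiple fields, e.g., price ascending, then name descending:

GET /products/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "price": {
        "order": "asc"
      }
    },
    {
      "name.keyword": {
        "order": "desc"
      }
    }
  ]
}

Notice the use of name.keyword: this is the raw, unanalyzed keyword sub-field of name, which gives accurate alphabetical sorting. Text fields cannot be sorted on directly by default, so sort on a keyword sub-field instead. The sub-field must exist in the mapping: dynamic mapping adds it to string fields automatically, while an explicit mapping has to declare it under fields.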

8. Pagination with From and Size

By default, a search returns only the top 10 hits. Use the from and size parameters to paginate results.

To retrieve the second page of 5 products:

GET /products/_search
{
  "query": {
    "match_all": {}
  },
  "from": 5,
  "size": 5
}

This skips the first 5 results and returns the next 5. By default, from + size cannot exceed 10,000 (the index.max_result_window setting). For deeper pagination, use search_after or the scroll API to avoid the performance cost of high from values.
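User interfaces usually think in page numbers rather than offsets. The sketch below converts a 1-based page number into from and size and refuses requests that would cross the default 10,000-hit window; page_params and MAX_RESULT_WINDOW are hypothetical names, and the constant assumes the index setting has not been changed.

```python
MAX_RESULT_WINDOW = 10_000  # assumed default of index.max_result_window


def page_params(page, per_page):
    """Translate a 1-based page number into from/size search parameters."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    offset = (page - 1) * per_page
    if offset + per_page > MAX_RESULT_WINDOW:
        # Beyond this point from/size is rejected; switch to search_after.
        raise ValueError("page is beyond the result window; use search_after")
    return {"from": offset, "size": per_page}


print(page_params(2, 5))
```

Failing early in the application gives a clearer error than letting the cluster reject the request.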

9. Highlighting Search Terms

When building user interfaces, highlight matching terms to improve UX. Elasticsearch's highlight feature wraps matched text in HTML tags (<em> by default).

GET /products/_search
{
  "query": {
    "match": {
      "description": "wireless headphones"
    }
  },
  "highlight": {
    "fields": {
      "name": {},
      "description": {}
    }
  }
}

The response includes a highlight section for each hit:

"highlight": {
  "description": ["Noise-canceling <em>wireless</em> <em>headphones</em> with 30-hour battery life"]
}

Because the query targets only the description field, only description is highlighted by default, even though name is also listed under fields. You can customize the highlight tags using the pre_tags and post_tags parameters.
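As a sketch of how custom tags might be attached from application code, the helper below adds a highlight section with pre_tags and post_tags to an existing request body. The function name with_highlight and the choice of <mark> tags are assumptions made for illustration.

```python
def with_highlight(body, fields, pre_tag="<mark>", post_tag="</mark>"):
    """Return a copy of a search body with a highlight section added."""
    highlighted = dict(body)  # shallow copy so the caller's dict is untouched
    highlighted["highlight"] = {
        "pre_tags": [pre_tag],
        "post_tags": [post_tag],
        "fields": {name: {} for name in fields},
    }
    return highlighted


search = {"query": {"match": {"description": "wireless headphones"}}}
request = with_highlight(search, ["description"])
print(request["highlight"])
```

With this request, matched terms in description would be wrapped in <mark> tags rather than the default <em>.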

10. Using Aggregations for Data Analysis

Aggregations allow you to perform analytics on your data, similar to SQL GROUP BY. Common use cases include counting categories, computing averages, or creating histograms.

To count products by category:

GET /products/_search
{
  "size": 0,
  "aggs": {
    "categories": {
      "terms": {
        "field": "category"
      }
    }
  }
}

The size: 0 suppresses document results, returning only the aggregation. The output:

"aggregations": {
  "categories": {
    "buckets": [
      {
        "key": "Electronics",
        "doc_count": 3
      },
      {
        "key": "Home & Kitchen",
        "doc_count": 1
      }
    ]
  }
}

To compute average price per category:

GET /products/_search
{
  "size": 0,
  "aggs": {
    "categories": {
      "terms": {
        "field": "category"
      },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}

Aggregations are invaluable for dashboards, reporting, and business intelligence applications.
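On the client side, a terms aggregation response is typically flattened into a simple lookup before it reaches a dashboard. The sketch below does that for the category counts shown earlier; bucket_counts is a hypothetical helper, and sample_response reproduces only the aggregations part of a response.

```python
def bucket_counts(response, agg_name):
    """Flatten a terms aggregation into a {key: doc_count} dictionary."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


# Aggregations section from the category count example in this tutorial.
sample_response = {
    "aggregations": {
        "categories": {
            "buckets": [
                {"key": "Electronics", "doc_count": 3},
                {"key": "Home & Kitchen", "doc_count": 1},
            ]
        }
    }
}

counts = bucket_counts(sample_response, "categories")
print(counts)
```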

Best Practices

1. Choose the Right Field Types

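Use text for full-text search (analyzed) and keyword for exact matches, sorting, and aggregations. Avoid sorting or aggregating on text fields: by default such requests are rejected, and enabling fielddata to allow them consumes significant heap memory and operates on analyzed tokens rather than the original values.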

2. Use Filters Over Queries When Possible

Filters are cached and do not compute relevance scores. Use them for conditions that don't affect ranking, e.g., status flags, date ranges, or category filters. Queries are for when you need scoring (e.g., full-text relevance).

3. Optimize Index Mappings

Define mappings explicitly rather than relying on dynamic mapping. Disable dynamic field creation with "dynamic": "strict" to prevent accidental schema drift:

PUT /products
{
  "mappings": {
    "dynamic": "strict",
    "properties": { ... }
  }
}

4. Avoid Deep Pagination

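By default, requests where from + size exceeds 10,000 are rejected, and raising that limit makes every deep page more expensive in memory and CPU. For large datasets, use search_after with the sort values from the last result:

GET /products/_search
{
  "size": 10,
  "sort": [
    { "price": "asc" },
    { "_id": "asc" }
  ],
  "search_after": [129.99, "1"]
}

This method scales efficiently because each request only has to collect the hits that follow the given sort values. The sort needs a unique tiebreaker; sorting on _id may be disabled on recent versions, in which case a unique keyword field can serve the same purpose.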
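Each hit in a sorted search response carries a sort array, and those values are what the next request passes as search_after. The following sketch shows that hand-off without any network calls; next_page_body is a hypothetical helper, and last_hit is a hand-written stand-in for the final hit of the previous page.

```python
def next_page_body(body, last_hit):
    """Build the follow-up request that resumes after the given hit."""
    # search_after replaces offset-based paging, so drop any "from" value.
    next_body = {key: value for key, value in body.items() if key != "from"}
    next_body["search_after"] = last_hit["sort"]
    return next_body


first_page = {
    "size": 10,
    "sort": [{"price": "asc"}, {"_id": "asc"}],
}
# Stand-in for the last hit of the first page; real hits contain more keys.
last_hit = {"_id": "1", "sort": [129.99, "1"]}

second_page = next_page_body(first_page, last_hit)
print(second_page)
```

Repeating this step until a page comes back with no hits walks the whole result set in sort order.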

5. Use Index Aliases for Zero-Downtime Operations

When reindexing or updating schemas, use aliases to point to the current index. This allows seamless transitions without changing application code:

POST /_aliases
{
  "actions": [
    { "add": { "index": "products_v2", "alias": "products" } }
  ]
}

6. Monitor Query Performance with Profile API

To diagnose slow queries, use the profile parameter:

GET /products/_search
{
  "profile": true,
  "query": {
    "match": { "name": "wireless" }
  }
}

The response includes timing and execution details for each query component, helping you identify bottlenecks.

7. Enable Caching for Frequent Queries

Elasticsearch automatically caches the results of frequently used filters. To keep filters cacheable, avoid values that change on every request (e.g., the current timestamp or an unrounded now). Instead, precompute ranges or use rounded date math such as now-1h/m.

8. Use Index Templates for Consistency

Define index templates to automatically apply mappings, settings, and aliases to new indices. This ensures uniformity across time-series or log data:

PUT _index_template/products_template
{
  "index_patterns": ["products-*"],
  "template": {
    "settings": { "number_of_shards": 3 },
    "mappings": {
      "properties": {
        "name": { "type": "text" },
        "category": { "type": "keyword" }
      }
    }
  }
}

9. Avoid Wildcard Queries in Production

Queries like *term* or te*m are slow, especially with a leading wildcard, because they cannot use the inverted index efficiently. Use n-gram or edge-ngram analyzers for prefix/suffix matching instead.

10. Regularly Optimize Indexes with Force Merge

After bulk indexing or deletions, use _forcemerge to reduce segment count and improve search performance. Force merge is best reserved for indices that no longer receive writes:

POST /products/_forcemerge?max_num_segments=1

Run this during off-peak hours, as it's I/O intensive.

Tools and Resources

1. Kibana

Kibana is the official visualization and data exploration tool for Elasticsearch. Use the Dev Tools console to write and test queries, visualize aggregation results, and monitor cluster health. Kibana's Discover tab allows interactive exploration of indexed data with filters, sorting, and field selection.

2. Elasticsearch REST API

Direct interaction with Elasticsearch is done via HTTP REST endpoints. Tools like cURL, Postman, or Insomnia are ideal for testing queries outside of applications. Always use HTTPS in production and authenticate with API keys or X-Pack security.

3. Elasticsearch Client Libraries

For integration into applications, use official client libraries:

  • Python: elasticsearch-py
  • Java: Java High Level REST Client (deprecated) or Elasticsearch Java API Client
  • Node.js: @elastic/elasticsearch
  • .NET: Elastic.Clients.Elasticsearch

These libraries handle serialization, connection pooling, and retries automatically.

4. Elasticsearch Query DSL Reference

The official Query DSL documentation is your primary reference for all query types, parameters, and examples. Bookmark it for daily use.

5. Elasticsearch Monitoring Tools

Use the GET /_cluster/health and GET /_nodes/stats endpoints to monitor cluster status, memory usage, and query latency. Integrate with Prometheus and Grafana for long-term observability.

6. Elasticsearch Playground

The Elasticsearch Getting Started Guide includes a free sandbox environment where you can experiment with sample datasets and queries without installation.

7. OpenSearch

For open-source alternatives, consider OpenSearch, a fork of Elasticsearch 7.10.2 with community-driven enhancements. It's largely API-compatible with that version and supports similar search features.

8. Online Courses and Books

  • Elasticsearch in Action by Radu Gheorghe, Matthew Lee Hinman, and Roy Russo
  • Udemy: Elasticsearch 7 and the Elastic Stack
  • Pluralsight: Elasticsearch Fundamentals

Real Examples

Example 1: E-Commerce Product Search

Scenario: An online store wants to let users search for products by name, filter by category and price range, and sort by popularity.

Index mapping:

PUT /products
{
  "mappings": {
    "properties": {
      "name": { "type": "text", "analyzer": "standard" },
      "category": { "type": "keyword" },
      "price": { "type": "float" },
      "in_stock": { "type": "boolean" },
      "popularity_score": { "type": "integer" },
      "tags": { "type": "keyword" }
    }
  }
}

Sample query:

GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "wireless headphones",
            "fields": ["name^3", "tags"],
            "type": "best_fields"
          }
        }
      ],
      "filter": [
        { "term": { "in_stock": true } },
        { "range": { "price": { "lte": 200 } } }
      ]
    }
  },
  "sort": [
    { "popularity_score": { "order": "desc" } },
    { "price": { "order": "asc" } }
  ],
  "highlight": {
    "fields": { "name": {} }
  },
  "aggs": {
    "categories": {
      "terms": {
        "field": "category",
        "size": 10
      }
    }
  }
}

Results show the most popular in-stock products matching "wireless headphones" priced at $200 or less, with highlighted matches and category distribution for UI filters.

Example 2: Log Analysis for Error Trends

Scenario: A DevOps team needs to find all ERROR logs from the last 24 hours and count occurrences by service.

Index: logs-2024-06-15 with timestamp (date) and level (keyword) fields.

Query:

GET /logs-*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        { "term": { "level": "ERROR" } },
        {
          "range": {
            "timestamp": {
              "gte": "now-24h/h",
              "lt": "now/h"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "services": {
      "terms": {
        "field": "service.keyword",
        "size": 20
      }
    }
  }
}

Response reveals top 20 services generating errors, enabling rapid incident triage.

Example 3: Geospatial Search for Nearby Stores

Scenario: A retail app needs to find stores within 10 km of a user's location.

Mapping includes a geo-point field:

"location": {
  "type": "geo_point"
}

Query:

GET /stores/_search
{
  "query": {
    "geo_distance": {
      "distance": "10km",
      "location": {
        "lat": 40.7128,
        "lon": -74.0060
      }
    }
  },
  "sort": [
    {
      "_geo_distance": {
        "location": {
          "lat": 40.7128,
          "lon": -74.0060
        },
        "order": "asc",
        "unit": "km"
      }
    }
  ]
}

This returns stores sorted by proximity, ideal for location-based services.
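To build intuition for what a distance filter plus distance sort computes, the sketch below applies the haversine great-circle formula to a handful of points in plain Python. It only approximates what an arc-distance check does and is not Elasticsearch's implementation; the store names and coordinates are invented for illustration, and the Earth radius is a rounded mean value.

```python
import math

EARTH_RADIUS_KM = 6371.0  # rounded mean Earth radius (approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def stores_within(stores, lat, lon, radius_km):
    """Return store names within radius_km of the point, nearest first."""
    measured = [(haversine_km(lat, lon, s_lat, s_lon), name)
                for name, (s_lat, s_lon) in stores.items()]
    return [name for dist, name in sorted(measured) if dist <= radius_km]


# Hypothetical stores: name -> (lat, lon)
stores = {
    "Upstate": (41.7128, -74.0060),
    "Midtown": (40.7549, -73.9840),
    "Downtown": (40.7128, -74.0060),
}

nearby = stores_within(stores, 40.7128, -74.0060, 10)
print(nearby)
```

Doing this in the cluster rather than in the client lets the search use the geo index instead of measuring every document.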

FAQs

What is the difference between match and term queries in Elasticsearch?

The match query analyzes the input text and searches across analyzed fields using tokenized terms. It's ideal for full-text search. The term query searches for exact, unanalyzed values and is used for structured fields like keywords, numbers, or booleans.

Why is my search not returning expected results?

Common causes include mismatched field types (e.g., using text for sorting), case sensitivity in term queries, or incorrect analyzer settings. Use the _analyze API to see how your text is tokenized:

GET /products/_analyze
{
  "field": "name",
  "text": "Wireless Headphones"
}

How do I search across multiple indexes?

Simply specify multiple index names in the request URL: GET /products,logs/_search or use wildcards: GET /logs-*/_search. Elasticsearch will search all matching indices.

Can I search for partial words in Elasticsearch?

Yes, but not efficiently with wildcards. Use n-gram or edge-ngram analyzers to index substrings. For example, index "wireless" as w, wi, wir, wire, etc., enabling fast prefix matching at query time in exchange for a larger index.
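The expansion an edge n-gram filter performs at index time can be sketched in a few lines. The function below is an illustration only; edge_ngrams is a hypothetical helper whose min_gram and max_gram parameters mirror the filter settings of the same names, with defaults chosen here for readability rather than taken from Elasticsearch.

```python
def edge_ngrams(token, min_gram=1, max_gram=10):
    """Return the prefixes of a token from min_gram to max_gram characters."""
    token = token.lower()
    longest = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, longest + 1)]


print(edge_ngrams("wireless", 1, 4))
print(edge_ngrams("Wireless"))
```

Every emitted prefix becomes its own indexed term, which is why a query for "wire" can be answered with an ordinary term lookup instead of a wildcard scan.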

How do I handle synonyms in Elasticsearch searches?

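Use the synonym_graph token filter in a custom analyzer. Define synonyms in a file or inline. The synonym_graph filter is intended for search-time analyzers, so set it as the field's search analyzer; to expand synonyms at index time, use the synonym token filter instead.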

Is Elasticsearch case-sensitive?

By default, no: text fields are analyzed and lowercased, so full-text queries on them are case-insensitive. However, keyword fields are case-sensitive. Use a keyword field (or a .keyword sub-field) for case-sensitive exact matches.

How can I improve search performance?

Use filters instead of queries, limit result size, avoid deep pagination, pre-warm caches, use appropriate field types, and optimize index segments. Monitor slow queries with the Profile API.

What happens if I delete documents in Elasticsearch?

Deleted documents are marked for removal but remain on disk until their segments are merged, either automatically in the background or through _forcemerge. Search results exclude them immediately, but storage is reclaimed only when the merge happens.

Can Elasticsearch handle real-time search?

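Yes, in near real time. By default, Elasticsearch refreshes an index every second, making new data searchable within about one second. If an indexing request must not return until its changes are searchable, use refresh=wait_for; refresh=true forces an immediate refresh at a higher cost.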

How do I secure Elasticsearch searches?

Enable X-Pack security (or OpenSearch Security) to enforce authentication, role-based access control, and field-level security. Use API keys or TLS certificates for client communication.

Conclusion

Searching data in Elasticsearch is a powerful skill that unlocks the full potential of modern data applications. From basic full-text queries to complex aggregations and geospatial searches, Elasticsearch provides a rich, flexible query DSL that adapts to nearly any use case. By following the step-by-step guide in this tutorial, you've learned how to structure queries, optimize performance, and apply best practices that ensure accurate, scalable, and efficient search results.

Remember: the key to mastering Elasticsearch search lies in understanding your data structure, choosing the right query types, and leveraging filters and aggregations wisely. Combine this knowledge with the tools and real-world examples provided, and you'll be equipped to build high-performance search systems that deliver instant, relevant results, even across petabytes of data.

Continue experimenting with the Dev Tools in Kibana, explore the official documentation, and challenge yourself with increasingly complex queries. The deeper your understanding, the more effectively you'll harness Elasticsearch's power to turn raw data into actionable insights.