Elasticsearch guides

Select the category & articles you are interested in

Elasticsearch
OpenSearch
Basics
Best practices
Capacity planning
Data architecture
High availability
How to's
Operations
Search APIs
Security
_source

Elasticsearch keeps the original JSON document in a field called _source. The source field serves special purposes such as...

Aggregation

The aggregations framework is a tool built into every Elasticsearch deployment. The different aggregation types: Bucket, Metric & Pipeline...

Alias

In Elasticsearch, an alias is a secondary name to refer to one or more indices. Aliases can be created and deleted dynamically using...
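
As a minimal sketch of creating and deleting aliases dynamically, the snippet below builds a request body for the `_aliases` endpoint that moves an alias from one index to another in a single call. The alias and index names are hypothetical, and the body would still need to be sent with an HTTP client of your choice.

```python
import json


def build_alias_swap(alias, old_index, new_index):
    """Build an _aliases request body that moves `alias` from old_index to new_index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical names, used only for illustration.
alias_body = build_alias_swap("logs-current", "logs-2023", "logs-2024")
print("POST /_aliases")
print(json.dumps(alias_body, indent=2))
```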

Bulk

Elasticsearch bulk makes it possible to perform many write operations in a single API call, which increases indexing speed. Using bulk API...
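
To illustrate how several writes fit into one call, here is a small sketch that assembles a newline-delimited JSON body for the `_bulk` endpoint: one action line followed by one source line per document, with a trailing newline. The index name and documents are made up for the example, and sending the payload (with `Content-Type: application/x-ndjson`) is left to whichever client you use.

```python
import json


def build_bulk_body(index, docs):
    """Return an NDJSON string of index actions for the _bulk endpoint.

    `docs` is an iterable of (document_id, source_dict) pairs.
    """
    lines = []
    for doc_id, source in docs:
        # Action line: which operation to run and on which index/_id.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Source line: the document itself.
        lines.append(json.dumps(source))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical index and documents.
bulk_payload = build_bulk_body(
    "products",
    [("1", {"name": "kettle"}), ("2", {"name": "toaster"})],
)
print(bulk_payload, end="")
```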

Cache: Node Request, Shard Data & Field Data Cache

Elasticsearch uses 3 types of caches to improve the efficiency of operation: the node request cache, the shard data cache and the field data cache. It is possible to...

Circuit Breakers

Elasticsearch has circuit breakers to deal with OutOfMemory errors that cause nodes to crash. Each breaker is used to...

Client

Official Elasticsearch clients are available for Java, JavaScript, Perl, PHP, Python, Ruby and .NET. To avoid surprises, keep your client....

Cluster

An Elasticsearch cluster consists of a number of servers (nodes) working together as one to store data and respond to requests. It enables...

DELETE

DELETE is an Elasticsearch API which removes a document from a specific index. It requires an index name and the document _id in order to...

Deprecation

To find out which functions have been deprecated in Elasticsearch, you can use deprecation logs, the deprecation API, read the breaking changes pages...

Discovery

Discovery occurs when an Elasticsearch node starts, restarts or loses contact with the master node. In those cases the node needs to...

DiskThreshold

Elasticsearch uses several parameters to enable it to manage hard disk storage across the cluster, such as...

Document

Each Elasticsearch document is a JSON structure, which is ultimately considered to be a series of key:value pairs. An example for creating...

Fielddata

In Elasticsearch the term Fielddata is relevant when performing sorting and aggregations on text fields. To set fielddata=true, you...

Filter

Elasticsearch Filters apply conditions inside the query to narrow down the matching results. A filter clause can be used in...

Flush, Translog and Refresh

In Elasticsearch, flush is the process of permanently storing data onto the disk for all of the operations that have been stored in memory.

Index - How to create Elasticsearch Index and what it is

How to create an Elasticsearch Index & what it is with a general overview - an index (plural: indices) contains a schema and can have

Indexing

Indexing is the process of adding documents to, or updating documents in, an Elasticsearch index. In its simplest form, you can index a document by...

Lucene

Elasticsearch Lucene or Apache Lucene is an open-source Java library used as a search engine. Elasticsearch is built on top of Lucene...

Mapping

Mapping contains the properties of each field in the index. A common issue in Elasticsearch is an incorrectly defined mapping. Examples of...

Metadata

Elasticsearch metadata refers to additional information stored for each document using metadata fields. Metadata fields can be customized...

Nodes

There are different types of nodes in Elasticsearch. Each has its own role and purpose. Master, coordinating and data nodes differ...

Persistent

In Elasticsearch, Persistent refers to cluster settings that persist across cluster restarts. This setting is used in the Cluster Update Settings API...

Plugins

Plugins in Elasticsearch are used to extend the functionality of Elasticsearch. An Elasticsearch plugin is installed and removed using the...

Queue

Queues in Elasticsearch exist in the context of Thread Pools. Queues are used to hold the pending requests for thread pools instead of...

Rebalance

Cluster rebalancing is the process by which an Elasticsearch cluster distributes data across the nodes. To force rebalance manually...

Recovery

In Elasticsearch, recovery refers to the process of recovering an index or shard when something goes wrong. You can recover data by using...

Refresh Interval

Elasticsearch requires a refresh operation to make indexed information available for search. You can set the refresh interval by...
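
One common way to set the refresh interval is through the index settings API. The sketch below only builds the request body; the index name `my-index` and the `30s` value are assumptions chosen for illustration, not recommendations.

```python
import json


def refresh_interval_settings(interval):
    """Build an index settings body that sets index.refresh_interval."""
    return {"index": {"refresh_interval": interval}}


# Hypothetical index name and interval.
refresh_request_line = "PUT /my-index/_settings"
refresh_body = refresh_interval_settings("30s")
print(refresh_request_line)
print(json.dumps(refresh_body))
```

A longer interval generally means fewer refreshes during heavy indexing, at the cost of new documents taking longer to become searchable.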

Reindex

Reindex is the concept of copying existing data from a source index to a destination index. In some scenarios, the reindex API is...

Replica

In Elasticsearch there are two types of shards: the primary shard & the replica copy. Each replica is located on a different node to ensure...

Replication

Elasticsearch replication refers to storing a redundant copy of the data. Elasticsearch creates 1 primary shard with a replication factor...

Repository

An Elasticsearch repository needs to be registered using the _snapshot endpoint. The supported repository types are: S3, HDFS, Azure...

Restore

In Elasticsearch, restore refers to a snapshot restore mechanism. To restore a cluster from the snapshot, an index, or selected indices...

Routing

In Elasticsearch, routing refers to document routing. When you index a document, Elasticsearch will determine which shard will be used...

Scroll

The Elasticsearch scroll API is useful when a search returns a large set of results. Large search results are taxing for the system...

Search

To search in Elasticsearch, send a GET request to the _search endpoint in the search API. In the query phase and the fetch phase there are...

Settings

Elasticsearch settings can be configured on the cluster-level, node-level and index-level. Here's how to set up and optimize your settings...

Shards

The number of shards is set when an index is created, and cannot be changed without reindexing. To handle unassigned Elasticsearch shards...

Task

A task is equivalent to an Elasticsearch operation, any request performed on an Elasticsearch cluster. The following commands are used...

Template

An Elasticsearch template falls into one of these categories: index templates or search templates. Examples of index templates include...

Threadpool

Elasticsearch threadpools are used to manage how requests are processed and to optimize the use of resources. The write threadpool...

Upgrade

An Elasticsearch upgrade of an existing cluster can be done in 2 ways: through a rolling upgrade or a full cluster restart. To upgrade...

Version

A version corresponds to the Elasticsearch built-in tracking system that tracks the changes in each document. By using _version...

All Script Types are Allowed to Run

Elasticsearch scripts can place heavy loads on clusters if they are not written carefully. It is a best practice to limit the type of..

Bootstrap Checks

Elasticsearch carries out "bootstrap checks" to ensure that important settings have been set correctly. If any of these fail, ES won't start.

Bootstrap Memory_Lock is Set to False

Elasticsearch can be configured to prevent memory swapping on its host machine by adding bootstrap.memory_lock: true. If bootstrap checks...

Cluster Blocks Read-Only

A read-only delete block can be applied automatically by the cluster because of a disk space issue. It can also be applied manually by...

Dangerous Default Settings

Cluster name and data path are default settings that could be destructive for proper Elasticsearch function if handled incorrectly. If you...

Enable Adaptive Replica Selection

Adaptive replica selection is a process that prevents a distressed Elasticsearch node from delaying the response to queries. To enable it...

Enable Shard Rebalancing and Allocation

Cluster shard rebalancing and allocation are often confused with each other. If cluster shard rebalancing isn't enabled, then...

Expensive Queries are Allowed to Run

By default, Elasticsearch expensive queries are allowed to run. By setting search.allow_expensive_queries to false, you can prevent users...
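
As a rough sketch, the setting named above can be applied as a persistent cluster setting through the cluster update settings API. The snippet only constructs and serializes the request body; sending it is left to your HTTP client.

```python
import json

# Body for PUT /_cluster/settings that disallows expensive queries cluster-wide.
expensive_queries_body = {
    "persistent": {
        "search.allow_expensive_queries": False,
    }
}

# Python's False is serialized as JSON false.
expensive_queries_json = json.dumps(expensive_queries_body)
print("PUT /_cluster/settings")
print(expensive_queries_json)
```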

File Descriptors

File descriptors are required to keep track of all the files Elasticsearch has open at any given time, as well as all network...

Misuse of Wildcards

It's possible to reduce the risk of accidental deletion of indices by preventing the use of wildcards for destructive operations. To check...

Rest-high-level

Rest-high-level is built on top of low-level rest-client and is a method of communicating with Elasticsearch based on HTTP REST endpoints...

Script Regex is Enabled in Painless Scripts

Script regex is disabled in Elasticsearch by default, but you can decide to enable it. Regex must be used with care in Painless scripts...

Split Brain

Elasticsearch split brain occurs when there is more than one master in the cluster. By setting the quorum of minimum master nodes...
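
The quorum mentioned here is normally a strict majority of the master-eligible nodes. The helper below is a small worked sketch of that arithmetic; the function name is hypothetical. In versions before 7 the value was set by hand, whereas from version 7 onward the cluster manages its voting configuration itself.

```python
def master_quorum(master_eligible_nodes):
    """Return the strict majority of master-eligible nodes: floor(n / 2) + 1."""
    if master_eligible_nodes < 1:
        raise ValueError("a cluster needs at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


for node_count in (1, 2, 3, 5):
    print(node_count, "master-eligible nodes -> quorum of", master_quorum(node_count))
```

Note that two master-eligible nodes need a quorum of two, so losing either one leaves the cluster unable to elect a master; this is one reason three is the usual minimum.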

Zen Discovery Settings

Zen discovery settings for cluster formation were deprecated in Elasticsearch V.7 and should be removed from version 7 and above due to...

Choosing the right amount of memory based on number of shards in Elasticsearch

If the ratio of memory to number of shards in the cluster is low, it suggests that you have insufficient memory compared to the volume...

Disk Watermark

There are various watermark thresholds on an Elasticsearch cluster. As the disk fills up on a node, the 1st threshold to be crossed is...

Elastic Pricing Calculator - How to Use the Elastic Pricing Calculator

The different categories in the Elastic Pricing Calculator can impact your final cost. Here's how to efficiently use the pricing calculator.

Elasticsearch Cluster State

Elasticsearch clusters need to maintain the cluster state in memory on every node, which requires a large amount of resources...

Elasticsearch Large Cluster State - How to Discover, Resolve and Prevent (Customer Post Mortem)

When cluster state becomes too large it poses many challenges. In order to determine the size of your cluster state and reduce it, you...

Elasticsearch Multi-Tier Architecture - How to Set Up a Hot/Warm/Cold/Frozen Elasticsearch Architecture

In Elasticsearch’s multi-tier architecture, the tiers are named hot, warm, cold & frozen. This Elasticsearch architecture allows better...

Flood Stage Disk Watermark

When the “disk flood stage” threshold is exceeded on an Elasticsearch cluster, it will start to block core actions. To resolve this issue...

Heap Size Usage and JVM Garbage Collection

A high heap size in Elasticsearch will give your node more memory for indexing and search operations. However, your node also requires...

High Disk Watermark

High disk watermark is one of the various thresholds on your Elasticsearch cluster. Passing this threshold is a warning and you should not...

How to Choose the Correct Number of Shards per Index in Elasticsearch

Finding the right number of shards for your Elasticsearch indices, and the right size for each shard depends on many factors, including...

How to Optimize Elasticsearch Disk Space and Usage

If you don’t have enough disk space available, Elasticsearch will stop allocating shards to the node. This will eventually prevent you from...

How to Reduce the Number of Shards in an Elasticsearch Cluster

When you have too many shards in your Elasticsearch cluster, there are a few steps you can take in order to reduce the number of shards...

Loaded Data Nodes

Sometimes you can observe that the CPU and load on some of your data nodes is higher than on others. This can occasionally be caused by...

Low Disk Watermark

Low disk watermark is one of the various thresholds on your Elasticsearch cluster. Here are possible actions you can take to resolve...

Memory Usage Guide

The Elasticsearch process is very memory intensive. Here are the memory requirements and some tips to reduce your Elasticsearch memory usage.

Oversharding

A large number of shards on an Elasticsearch cluster requires extra resources. Learn key ways to avoid and correct oversharding...

Shard Allocation is Unbalanced

Shard allocation is an algorithm by which Elasticsearch decides which unallocated shards should go on which nodes. To resolve unbalanced...

Shards Too Large

It is a best practice that Elasticsearch shard size should not go above 50GB for a single shard. If you go above this limit...
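
A quick back-of-the-envelope check of the 50GB guideline can be sketched as below: given an expected index size, it returns the smallest number of primary shards that keeps each shard at or under the cap, assuming data is spread evenly. The function name and the sample sizes are illustrative only.

```python
import math


def min_primary_shards(index_size_gb, max_shard_gb=50):
    """Smallest primary shard count that keeps every shard at or below max_shard_gb."""
    if index_size_gb < 0 or max_shard_gb <= 0:
        raise ValueError("sizes must be positive")
    # An index always has at least one primary shard.
    return max(1, math.ceil(index_size_gb / max_shard_gb))


print(min_primary_shards(320))  # 320 / 50 = 6.4, rounded up to 7
print(min_primary_shards(50))   # exactly at the cap, so 1 shard suffices
```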

An Overview of Source Filtering, Stored Fields, Fields and Docvalues Fields

There are various methods for retrieving fields in Elasticsearch, including: _source, stored_fields, fields & docvalue_fields. To retrieve...

Elasticsearch Data Stream

The Elasticsearch data stream is an abstraction layer between the names used by applications to facilitate ingestion and search operations...

Elasticsearch Field Size - How to Calculate the Storage Size of Specific Fields in an Index 

The 3 main methods in Elasticsearch to calculate the storage size of specific fields in an index are: using the _disk_usage API, creating...

Elasticsearch Global Ordinals, Eager Global Ordinals & High Cardinality Fields

Terms aggregations rely on an internal data structure known as global ordinals. The eager_global_ordinals parameter is used to...

Elasticsearch match_only_text Field Type (For Storage Optimization)

The new match_only_text feature in Elasticsearch can save up to 10% of disk space on logging datasets. This field type will set a flat...

Elasticsearch Text Analyzers - Tokenizers, Standard Analyzers, Stopwords and More

The text analysis process is tasked with two functions: tokenization and normalization and is carried out by employing analyzers. When you...

Elasticsearch Token Filters

A tokenizer decides how Elasticsearch will take a set of words and divide it into separated terms called “tokens”. To work with synonyms...

How to configure all Elasticsearch node roles (master, data, coordinating..)

Follow these steps to configure all Elasticsearch node role types (master, data, coordinating, ingest, machine learning, remote eligible...

How to Define Efficient Mapping in Elasticsearch

Mappings are the core element of index creation in Elasticsearch. Defining them correctly can improve performance. Mapping types include...

How to leverage ingest pipelines to transform data transparently in Elasticsearch

Ingest pipelines sit within the Elasticsearch node and will perform a set of alterations on your data that you...

Index Lifecycle Management & Policy

Index lifecycle management helps automate the creation, management & removal of an Elasticsearch index. Define the index lifecycle policy...

Index Templating in Elasticsearch - How to Use Composable Templates

Elasticsearch index templates allow us to create indices with user defined configuration. An index can pull the configuration from these...

Object Fields VS. Nested Field Types in Elasticsearch

Nested is a special object type that is indexed as a separate document. To demonstrate the use of Elasticsearch nested VS. object fields...

When You Should Transform Your Data Instead of Using Aggregations

There are at least three use cases where you should consider using transforms instead of aggregations in Elasticsearch. First, when the...

Dedicated Client Nodes

Many clusters use coordinating or ingest nodes, while others leave the ingest and coordination functions to the data nodes. In order to...

Dedicated Master Node

Once an Elasticsearch cluster reaches a certain size, it's recommended to create 3 dedicated master nodes. Here is how you can create...

Elasticsearch Coordinating Node - When to Use Coordinating Only Nodes

A coordinating node is a node that handles HTTP(S) requests for the cluster, especially indexing & search requests. A coordinating only...

Elasticsearch Indexing Downtime (Customer Post Mortem)

When looking at Shard View for the index, it was clear that the index in question wasn’t carrying out the highest indexing rate and wasn’t...

Lack of Quorum

This error occurs when the Elasticsearch cluster doesn't have a quorum of nodes with voting rights to elect a new master node. To resolve...

Node Concurrent Recoveries Setting is Too High / Low

The node concurrent recoveries setting determines the max number of shards that can be recovered at once from each node. It's important to...

Number of Master Nodes

Master nodes are responsible for actions such as creating or deleting indices. If you don't have enough master nodes, it could lead to...

Setting Up Zone Awareness for Shard Allocation in Elasticsearch

Setting up zone awareness for shard allocation ensures high availability in the case of several servers going down. Here's how to...

Autocomplete Guide

There are various approaches for autocomplete in Elasticsearch. Here are some tips & examples for choosing the approach best suited to your...

Delete By Query

Elasticsearch delete by query is an API, which provides functionality to delete all documents based on the matching query. If you don't...

Elasticsearch Pagination - Which Technique to Use Depending on Your Use Case

Elasticsearch currently provides 3 different techniques for fetching many results: Pagination, Search-After and Scroll. To learn how to...

Elasticsearch Rollup: How to Rollup Data in Elasticsearch

Rollup jobs in Elasticsearch reduce old data storage costs by storing summaries of data for a given time period. Rollup examples include...

Elasticsearch Runtime Fields - How to Use Runtime Fields in Elasticsearch

An Elasticsearch runtime field is a field evaluated at query time instead of indexing time, which allows us to modify our schema at the...

Elasticsearch Search Suggestion - Term Suggester, Phrase Suggester, Completion Suggester (Autocomplete)

Elasticsearch offers three types of suggesters: term suggesters, phrase suggesters & completion suggesters (autocomplete). Suggesters work...

How to Activate and Use Elasticsearch Slow Logs

By analyzing your slow logs, you can understand why searches are slow and how to optimize them. To enable slow logging in Elasticsearch...

How to Ensure Slow Logs Don’t Get Cut Off (Applicable before ES 8.0)

Analyzing search slow logs can provide users with advanced insights like the number of costly queries, reasons why queries were costly, so...

How to Improve your Elasticsearch Aggregation Performance

There are multiple ways to improve your Elasticsearch aggregation performance. First, you should limit the scope by filtering documents...

How to Increase Elasticsearch Search Speed

By optimizing and maintaining Elasticsearch search speed, you can improve your product’s user experience. Here's how to speed up search...

How to Optimize Search Performance in Elasticsearch

One of the most difficult issues to manage and resolve in Elasticsearch is poor search performance. This blog goes through clear steps to...

Register Snapshot Repository

To create & restore snapshots, you need to register a snapshot repository with every Elasticsearch node in the cluster. Here are the steps...

Search Latency

This guide explores how to reduce Elasticsearch search latency based on a key study. The first lesson is to always...

Snapshot

An Elasticsearch snapshot is a backup of an index taken from a running cluster. It's better to use snapshots instead of disk backups due...

Cluster Concurrent Rebalance High / Low

The cluster concurrent rebalance setting determines the maximum number of shards the cluster can move to rebalance the distribution of...

Elasticsearch - Many Index Get Requests with Missing Documents

When you try to retrieve a document by ID, Elasticsearch will count the number of times that it searches for an ID which doesn't exist...

Elasticsearch Circuit Breaker Exceptions: How to Handle Circuit Breakers

Circuit breaker exceptions are thrown to alert us that something needs to be fixed in Elasticsearch in order to reduce memory usage. To fix...

Elasticsearch High Indexing Throttle Time

When Elasticsearch detects that the merge process cannot keep up with the rate of indexing, then it will start to throttle indexing...

Elasticsearch Hotspots - Load Balancing, Data Allocation and How to Avoid Hotspots

"Hotspots" refers to a situation in which a cluster with multiple nodes is not balanced - some nodes are handling more load than others...

Elasticsearch Memory and Disk Usage Management

One way to evaluate whether your resources are cost efficient is to check the ratio of disk usage to the memory allocated...

Elasticsearch Rolling Restart: How to Perform Rolling Restarts Using the API

By executing Elasticsearch rolling restarts with the help of the API, you can maintain high cluster availability & avoid downtime. To do..

Elasticsearch Version Upgrades - Using Feature Migration APIs to Avoid Deprecation Issues

When upgrading to a new Elasticsearch version, you can use the feature migration APIs to avoid deprecation issues. These APIs simplify...

Heavy Merges Were Detected

Heavy merges use CPU, memory and disk resources, which can slow down the cluster’s response speed. In order to fix...

High Cluster Pending Tasks

Elasticsearch cluster pending tasks are updates to the cluster state that were initiated by a user or the cluster. To resolve, list the...

High CPU

High CPU is often a symptom of other underlying issues. It should be fixed because a distressed node will slow query response time and...

High Management Queue

A high number of tasks in the management queue can cause Elasticsearch cluster instability which could result in data loss. To resolve...

How to Increase Primary Shard Count in Elasticsearch

There are 2 methods to increase the primary shard count in Elasticsearch: _reindex API & the _split API. Before using either method, you…

How to Upgrade Elasticsearch Versions

The 2 approaches for upgrading Elasticsearch versions are full cluster restarts & rolling restarts. Before making an Elasticsearch upgrade...

Index Queue Size Is High

Once an indexing queue exceeds the maximum size, the Elasticsearch node will start rejecting index requests. To resolve this, check the...

Loaded Client Nodes

A saturated coordinating node could cause an increase in search or indexing response latency. This can be fixed by putting a load balancer...

Loaded Master Nodes

An overloaded master node may cause instability in the cluster. There are 3 ways to fix loaded master nodes: (1) Checking for...

Master Node Not Discovered

An Elasticsearch cluster requires a master node to be identified in the cluster. Reasons why a master node is not discovered yet include...

Max Shards Per Node Exceeded

If the maximum number of shards per node is exceeded in Elasticsearch, shards can't be allocated. It is crucial to check if the limit is set at a...

Node Disconnected

An Elasticsearch node can disconnect from a cluster for several reasons, including: excessive garbage collection from JVM, configuration...

Red Status

Elasticsearch red status indicates not only that the primary shard has been lost, but also that a replica has not been promoted...

Rejected Search Requests in Elasticsearch - Causes and Solutions

There are a number of reasons why a search request can be rejected by the Elasticsearch cluster. To resolve the issue, you need to...

Search is Slow in Nodes

There are a number of possible causes for slow searches on particular nodes. To correct the issue and improve search performance, you...

Search Rejected Queue

An Elasticsearch cluster can start to reject search requests for several reasons. To resolve this, check the state of the thread pool and..

Slow Indexing in Nodes

If the indexing queue is high/causes timeouts, it hints that Elasticsearch nodes can't keep up with the indexing rate. To fix slow indexing...

Yellow Status

Yellow status indicates that one or more of the replica shards on the Elasticsearch cluster are not allocated to a node. This could occur...

Combined_Fields Query Type in Elasticsearch

In Elasticsearch, the combined_fields query allows you to search several text fields as though their values had been indexed into...

Elasticsearch Boolean Queries

There are 4 types of Elasticsearch boolean clauses: filter, must, should & must_not. A single bool query can contain a mix of them. To use...
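
For illustration, a single bool query mixing all four clause types might look like the sketch below. The field names and values are invented for the example; only the clause names come from the description above.

```python
import json

# Hypothetical fields: title, status, tags, age_days.
bool_search_body = {
    "query": {
        "bool": {
            # must: has to match, and contributes to the relevance score.
            "must": [{"match": {"title": "search"}}],
            # filter: has to match, but is not scored.
            "filter": [{"term": {"status": "published"}}],
            # should: optional, raises the score when it matches.
            "should": [{"match": {"tags": "elasticsearch"}}],
            # must_not: excludes matching documents, not scored.
            "must_not": [{"range": {"age_days": {"gt": 365}}}],
        }
    }
}

print(json.dumps(bool_search_body, indent=2))
```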

Elasticsearch Boosting Query

The Elasticsearch boosting query is used to return documents that match a positive query while reducing the score of documents that...

Elasticsearch Composite Aggregations

An Elasticsearch composite aggregation allows you to paginate every bucket from a multi-level aggregation effectively. An example of....

Elasticsearch Constant Score Query

In Elasticsearch, the constant score query wraps other queries by executing them in a filter context. To implement constant_score query...

Elasticsearch DSL Exists Query

The exists query is used for returning the documents that have an indexed value for a specific field, which means it returns the documents...

Elasticsearch Keyword vs. Text

Elasticsearch keyword vs. text vs. wildcard field types. All have different features and are ideal for different use cases

Elasticsearch Runtime Fields: How to Use Lookup Runtime Fields

Elasticsearch runtime fields with a type of lookup can retrieve field values from the associated indices using the fields parameter on...

Async Search in Elasticsearch

The Elasticsearch async search API retrieves large amounts of data in a streaming fashion instead of in a single request. To limit the maximum response size...

Named Queries

Named queries allow you to label your queries with a name. Named queries can be utilized in a variety of use cases such as...

Slow Query Troubleshooting Guide

There are several potential reasons for a slow query in Elasticsearch. Slow logs can be used to detect & troubleshoot slow query issues...

Terms Enum API in Elasticsearch (For Low Latency Lookups)

In Elasticsearch, the Terms enum API looks for similarities in the index based on partial matches. To use the terms_enum API...

How to Secure an Elasticsearch Cluster: TLS, SSL & CERTUTIL Certificates

Securing an Elasticsearch cluster and creating TLS certificates will require some downtime on your cluster. Here's how to create...

X-Pack Basic Security is Off

The popularity of Elasticsearch has made it a target for hackers. It's important to protect your cluster by enabling X-Pack Security...
