Elasticsearch

_source

Elasticsearch keeps the original JSON document in a field called _source. The source field serves special purposes such as...

Aggregation

The aggregations framework is a tool built into every Elasticsearch deployment. The different aggregation types: Bucket, Metric & Pipeline...

Alias

In Elasticsearch, an alias is a secondary name to refer to one or more indices. Aliases can be created and deleted dynamically using...

All Script Types are Allowed to Run

Elasticsearch scripts can place heavy loads on clusters if they are not written carefully. It is a best practice to limit the type of...

An Overview of Source Filtering, Stored Fields, Fields and Docvalues Fields

There are various methods for retrieving fields in Elasticsearch, including: _source, stored_fields, fields & docvalue_fields. To retrieve...

Autocomplete Guide

There are various approaches for autocomplete in Elasticsearch. Here are some tips & examples for choosing the approach best suited to your...

Bootstrap Checks

Elasticsearch carries out "bootstrap checks" to ensure that important settings have been set correctly. If any of these fail, ES won't start.

Bootstrap Memory_Lock is Set to False

Elasticsearch can be configured to prevent memory swapping on its host machine by setting bootstrap.memory_lock: true. If bootstrap checks...

Bulk

Elasticsearch bulk makes it possible to perform many write operations in a single API call, which increases indexing speed. Using bulk API...
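
As a rough illustration of what "many write operations in a single API call" looks like, the sketch below builds the newline-delimited JSON body that the _bulk endpoint accepts: one action line followed by one source line per document, with a trailing newline. The index name `products`, the document IDs and the field `name` are hypothetical, and the code only builds the payload; it does not send anything to a cluster.

```python
import json


def build_bulk_body(index_name, docs):
    """Serialize (doc_id, source) pairs into an NDJSON body for the _bulk API."""
    lines = []
    for doc_id, source in docs:
        # Action line: which operation to run, and on which index and _id.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        # Source line: the document itself.
        lines.append(json.dumps(source))
    # The bulk body must be terminated by a newline.
    return "\n".join(lines) + "\n"


# Hypothetical index name and documents, for illustration only.
body = build_bulk_body(
    "products",
    [("1", {"name": "laptop"}), ("2", {"name": "phone"})],
)
print(body)
```

The resulting string would be sent as the body of a single POST to _bulk with the Content-Type header set to application/x-ndjson.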

Cache: Node Request, Shard Data & Field Data Cache

Elasticsearch uses 3 types of caches to improve the efficiency of operation: the node request, shard data and field data caches. It is possible to...

Choosing the right amount of memory based on number of shards in Elasticsearch

If the ratio of memory to number of shards in the cluster is low, it suggests that you have insufficient memory compared to the volume...

Circuit Breakers

Elasticsearch has circuit breakers to deal with OutOfMemory errors that cause nodes to crash. Each breaker is used to...

Client

Official Elasticsearch clients are available for Java, JavaScript, Perl, PHP, Python, Ruby and .NET. To avoid surprises, keep your client...

Cluster

An Elasticsearch cluster consists of a number of servers (nodes) working together as one to store data and respond to requests. It enables...

Cluster Blocks Read-Only

A read-only delete block can be applied automatically by the cluster because of a disk space issue. It can also be applied manually by...

Cluster Concurrent Rebalance High / Low

The cluster concurrent rebalance setting determines the maximum number of shards the cluster can move to rebalance the distribution of...

Combined_Fields Query Type in Elasticsearch

In Elasticsearch, the combined_fields query allows you to search several text fields as though their indexed values have been indexed into...

Composite Aggregations

In Elasticsearch, the composite aggregation allows you to paginate through every bucket of a multi-level aggregation efficiently. An example of...

Dangerous Default Settings

Cluster name and data path are default settings that could be destructive for proper Elasticsearch function if handled incorrectly. If you...

Dedicated Client Nodes

Many clusters use coordinating or ingest nodes, while others leave the ingest and coordination functions to the data nodes. In order to...

Dedicated Master Node

Once an Elasticsearch cluster reaches a certain size, it's recommended to create 3 dedicated master nodes. Here is how you can create...

DELETE

DELETE is an Elasticsearch API which removes a document from a specific index. It requires an index name and the document _id in order to...

Delete By Query

Elasticsearch delete by query is an API that deletes all documents matching a given query. If you don't...

Deprecation

To find out which functions have been deprecated in Elasticsearch, you can use deprecation logs, deprecation API, read breaking pages...

Discovery

Discovery occurs when an Elasticsearch node starts, restarts or loses contact with the master node. In those cases the node needs to...

Disk Watermark

There are various watermark thresholds on your Elasticsearch cluster. Elasticsearch considers the available disk space before...

DiskThreshold

Elasticsearch uses several parameters to enable it to manage hard disk storage across the cluster, such as...

Document

Each Elasticsearch document is a JSON structure, which is ultimately considered to be a series of key:value pairs. An example for creating...

Elasticsearch - Many Index Get Requests with Missing Documents

When you try to retrieve a document by ID, Elasticsearch will count the number of times that it searches for an ID which doesn't exist...

Elasticsearch Boolean Queries

There are 4 types of Elasticsearch boolean clauses: filter, must, should & must_not. A single bool query can contain a mix of them. To use...
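
To make the four clause types concrete, here is a minimal sketch that assembles a single bool query mixing all of them. The helper name `build_bool_query` and the field names (`title`, `status`, `tags`, `age_days`) are hypothetical; the code only builds the request body as a Python dict and does not contact a cluster.

```python
import json


def build_bool_query(must=None, filters=None, should=None, must_not=None):
    """Build a search body with a bool query, omitting empty clause lists."""
    clauses = {
        "must": must,          # must match, contributes to the score
        "filter": filters,     # must match, does not contribute to the score
        "should": should,      # optional, contributes to the score when it matches
        "must_not": must_not,  # must not match
    }
    return {"query": {"bool": {name: c for name, c in clauses.items() if c}}}


# Hypothetical fields, for illustration only.
query = build_bool_query(
    must=[{"match": {"title": "elasticsearch"}}],
    filters=[{"term": {"status": "published"}}],
    should=[{"match": {"tags": "search"}}],
    must_not=[{"range": {"age_days": {"gt": 365}}}],
)
print(json.dumps(query, indent=2))
```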

Elasticsearch Circuit Breaker Exceptions: How to Handle Circuit Breakers

Circuit breaker exceptions are thrown to alert us that something needs to be fixed in Elasticsearch in order to reduce memory usage. To fix...

Elasticsearch Cluster State

Elasticsearch clusters need to maintain the cluster state in memory on every node, which requires a large amount of resources...

Elasticsearch Coordinating Node - When to Use Coordinating Only Nodes

A coordinating node is a node that handles HTTP(S) requests for the cluster, especially indexing & search requests. A coordinating only...

Elasticsearch Data Stream

The Elasticsearch data stream is an abstraction layer between the names used by applications to facilitate ingestion and search operations...

Elasticsearch DSL Exists Query

The exists query is used for returning the documents that have an indexed value for a specific field, which means it returns the documents...

Elasticsearch Field Size - How to Calculate the Storage Size of Specific Fields in an Index

The 3 main methods in Elasticsearch to calculate the storage size of specific fields in an index are: using the _disk_usage API, creating...

Elasticsearch Global Ordinals, Eager Global Ordinals & High Cardinality Fields

Terms aggregations rely on an internal data structure known as global ordinals. The eager_global_ordinals parameter is used to...

Elasticsearch High Indexing Throttle Time

When Elasticsearch detects that the merge process cannot keep up with the rate of indexing, then it will start to throttle indexing...

Elasticsearch Hotspots - Load Balancing, Data Allocation and How to Avoid Hotspots

"Hotspots" refers to a situation in which a cluster with multiple nodes is not balanced - some nodes are handling more load than others...

Elasticsearch Indexing Downtime (Customer Post Mortem)

When looking at Shard View for the index, it was clear that the index in question wasn’t carrying out the highest indexing rate and wasn’t...

Elasticsearch Keyword vs. Text

Elasticsearch keyword vs. text vs. wildcard field types. All have different features and are ideal for different use cases.

Elasticsearch Large Cluster State - How to Discover, Resolve and Prevent (Customer Post Mortem)

When cluster state becomes too large it poses many challenges. In order to determine the size of your cluster state and reduce it, you...

Elasticsearch match_only_text Field Type (For Storage Optimization)

The new match_only_text feature in Elasticsearch can save up to 10% of disk space on logging datasets. This field type will set a flat...

Elasticsearch Memory and Disk Usage Management

One way to evaluate whether your resources are cost efficient is to check the ratio of disk usage to the memory allocated...

Elasticsearch Multi-Tier Architecture - How to Set Up a Hot/Warm/Cold/Frozen Elasticsearch Architecture

In Elasticsearch’s multi-tier architecture, the tiers are named hot, warm, cold & frozen. This Elasticsearch architecture allows better...

Elasticsearch Pagination - Which Technique to Use Depending on Your Use Case

Elasticsearch currently provides 3 different techniques for fetching many results: Pagination, Search-After and Scroll. To learn how to...
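
The difference between plain pagination and search-after can be sketched as two request-body builders. With from/size, the offset grows with the page number; with search_after, each request carries the sort values of the last hit from the previous page. The sort fields `timestamp` and `event_id` are hypothetical (the second acts as a tiebreaker), and the helper names are illustrative, not part of any client library.

```python
# Hypothetical sort definition; a unique tiebreaker field keeps the ordering stable.
SORT = [{"timestamp": "desc"}, {"event_id": "asc"}]


def from_size_body(page, size):
    """Request body for classic pagination; page numbers start at 1."""
    return {"from": (page - 1) * size, "size": size}


def next_page_body(previous_hits, size, sort):
    """Request body for search_after, based on the hits of the previous page."""
    body = {"size": size, "sort": sort}
    if previous_hits:
        # Each hit of a sorted search carries its sort values in a "sort" array.
        body["search_after"] = previous_hits[-1]["sort"]
    return body


# Simulated hits from a previous response, for illustration only.
hits = [
    {"_id": "a", "sort": [1700000000000, "e1"]},
    {"_id": "b", "sort": [1699999999000, "e2"]},
]
print(from_size_body(3, 10))
print(next_page_body(hits, 2, SORT))
```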

Elasticsearch Rollup: How to Rollup Data in Elasticsearch

Rollup jobs in Elasticsearch reduce old data storage costs by storing summaries of data for a given time period. Rollup examples include...

Elasticsearch Runtime Fields - How to Use Runtime Fields in Elasticsearch

An Elasticsearch runtime field is a field evaluated at query time instead of indexing time, which allows us to modify our schema at the...

Elasticsearch Search Suggestion - Term Suggester, Phrase Suggester, Completion Suggester (Autocomplete)

Elasticsearch offers three types of suggesters: term suggesters, phrase suggesters & completion suggesters (autocomplete). Suggesters work...

Elasticsearch Text Analyzers - Tokenizers, Standard Analyzers, Stopwords and More

The text analysis process is tasked with two functions, tokenization and normalization, and is carried out by analyzers. When you...

Elasticsearch Token Filters

A tokenizer decides how Elasticsearch will take a set of words and divide it into separated terms called “tokens”. To work with synonyms...

Elasticsearch Version Upgrades - Using Feature Migration APIs to Avoid Deprecation Issues

When upgrading to a new Elasticsearch version, you can use the feature migration APIs to avoid deprecation issues. These APIs simplify...

Async Search in Elasticsearch

The Elasticsearch async search API retrieves large amounts of data in a streaming fashion instead of in a single request. To limit the maximum response size...

Enable Adaptive Replica Selection

Adaptive replica selection is a process that prevents a distressed Elasticsearch node from delaying the response to queries. To enable it...

Enable Shard Rebalancing and Allocation

Cluster shard rebalancing and allocation are often confused with each other. If cluster shard rebalancing isn't enabled, then...

Expensive Queries are Allowed to Run

By default, Elasticsearch expensive queries are allowed to run. By setting search.allow_expensive_queries to false, you can prevent users...
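
A minimal sketch of the request body that turns this setting off is shown below. It assumes the setting is applied through the cluster update settings API (PUT _cluster/settings) under the persistent scope; the choice of scope and the helper name are assumptions made for illustration, and the code only prints the request rather than sending it.

```python
import json


def expensive_queries_settings(allow):
    """Build a cluster settings body that allows or blocks expensive queries."""
    # "persistent" is an assumed scope, chosen so the value survives restarts.
    return {"persistent": {"search.allow_expensive_queries": allow}}


body = expensive_queries_settings(False)
print("PUT _cluster/settings")
print(json.dumps(body))
```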

Fielddata

In Elasticsearch, the term fielddata is relevant when sorting and aggregating on text fields. To set fielddata=true, you...

File Descriptors

File descriptors are required to keep track of all the files Elasticsearch has open at any given time, as well as all network...

Filter

Elasticsearch filters apply conditions inside the query to narrow down the matching results. A filter clause can be used in...

Flood Stage Disk Watermark

When the “disk flood stage” threshold is exceeded on an Elasticsearch cluster, it will start to block core actions. To resolve this issue...

Flush, Translog and Refresh

In Elasticsearch, flush is the process of permanently storing data onto the disk for all of the operations that have been stored in memory.

Heap Size Usage and JVM Garbage Collection

A high heap size in Elasticsearch will give your node more memory for indexing and search operations. However, your node also requires...

Heavy Merges Were Detected

Heavy merges use CPU, memory and disk resources, which can slow down the cluster’s response speed. In order to fix...

High Cluster Pending Tasks

Elasticsearch cluster pending tasks are updates to the cluster state that were initiated by a user or the cluster. To resolve, list the...

High CPU

High CPU is often a symptom of other underlying issues. It should be fixed because a distressed node will slow query response time and...

High Disk Watermark

High disk watermark is one of the various thresholds on your Elasticsearch cluster. Passing this threshold is a warning and you should not...

High Management Queue

A high number of tasks in management queue can cause Elasticsearch cluster instability which could result in data loss. To resolve...

How to Activate and Use Elasticsearch Slow Logs

By analyzing your slow logs, you can understand why searches are slow and how to optimize them. To enable slow logging in Elasticsearch...

How to Choose the Correct Number of Shards per Index in Elasticsearch

Finding the right number of shards for your Elasticsearch indices, and the right size for each shard depends on many factors, including...

How to Define Efficient Mapping in Elasticsearch

Mappings are the core element of index creation in Elasticsearch. Defining them correctly can improve performance. Mapping types include...

How to Ensure Slow Logs Don’t Get Cut Off (Applicable before ES 8.0)

Analyzing search slow logs can provide users with advanced insights like the number of costly queries, reasons why queries were costly, so...

How to Improve your Elasticsearch Aggregation Performance

There are multiple ways to improve your Elasticsearch aggregation performance. First, you should limit the scope by filtering documents...

How to Increase Elasticsearch Search Speed

By optimizing and maintaining Elasticsearch search speed, you can improve your product’s user experience. Here's how to speed up search...

How to Increase Primary Shard Count in Elasticsearch

There are 2 methods to increase the primary shard count in Elasticsearch: _reindex API & the _split API. Before using either method, you…

How to leverage ingest pipelines to transform data transparently in Elasticsearch

Ingest pipelines sit within the Elasticsearch node and will perform a set of alterations on your data that you...

How to Optimize Elasticsearch Disk Space and Usage

If you don’t have enough disk space available, Elasticsearch will stop allocating shards to the node. This will eventually prevent you from...

How to Optimize Search Performance in Elasticsearch

One of the most difficult issues to manage and resolve in Elasticsearch is poor search performance. This blog goes through clear steps to...

How to Reduce the Number of Shards in an Elasticsearch Cluster

When you have too many shards in your Elasticsearch cluster, there are a few steps you can take in order to reduce the number of shards...

How to Secure an Elasticsearch Cluster

Securing an Elasticsearch cluster and creating TLS certificates might require some downtime, but the key is simply to...

How to Upgrade Versions in Elasticsearch

The 2 approaches for upgrading Elasticsearch versions are full cluster restarts & rolling restarts. Before starting an upgrade, you need...

Index - How to Create an Elasticsearch Index and What It Is

How to create an Elasticsearch Index & what it is with a general overview - an index (plural: indices) contains a schema and can have

Index Lifecycle Management & Policy

Index lifecycle management helps automate the creation, management & removal of an Elasticsearch index. Define the index lifecycle policy...

Index Queue Size Is High

Once an indexing queue exceeds the maximum size, the Elasticsearch node will start rejecting index requests. To resolve this, check the...

Index Templating in Elasticsearch - How to Use Composable Templates

Elasticsearch index templates allow us to create indices with user defined configuration. An index can pull the configuration from these...

Indexing

Indexing is the process of adding or updating new documents to an Elasticsearch index. In its simplest form, you can index a document by...

Lack of Quorum

This error occurs when the Elasticsearch cluster doesn't have a quorum of nodes with voting rights to elect a new master node. To resolve...

Loaded Client Nodes

A saturated coordinating node could cause an increase in search or indexing response latency. This can be fixed by putting a load balancer...

Loaded Data Nodes

Sometimes you can observe that the CPU and load on some of your data nodes is higher than on others. This can occasionally be caused by...

Loaded Master Nodes

An overloaded master node may cause instability in the cluster. There are 3 ways to fix loaded master nodes: (1) Checking for...

Low Disk Watermark

Low disk watermark is one of the various thresholds on your Elasticsearch cluster. Here are possible actions you can take to resolve...

Lucene

Elasticsearch Lucene or Apache Lucene is an open-source Java library used as a search engine. Elasticsearch is built on top of Lucene...

Mapping

Mapping contains the properties of each field in the index. A common issue in Elasticsearch is an incorrectly defined mapping. Examples of...

Master Node Not Discovered

An Elasticsearch cluster requires a master node to be identified in the cluster. Reasons why a master node is not discovered yet include...

Max Shards Per Node Exceeded

If the maximum number of shards per node is exceeded in Elasticsearch, shards can't be allocated. It is crucial to check if the limit is set at a...

Memory Usage Guide

The Elasticsearch process is very memory intensive. Here are the memory requirements and some tips to reduce your Elasticsearch memory usage.

Metadata

Elasticsearch metadata refers to additional information stored for each document using metadata fields. Metadata fields can be customized...

Misuse of Wildcards

It's possible to reduce the risk of accidental deletion of indices by preventing the use of wildcard for destructive operations. To check...

Named Queries

Named queries allow you to label your queries with a name. Named queries can be utilized in a variety of use cases such as...

Node Concurrent Recoveries Setting is Too High / Low

The node concurrent recoveries setting determines the max number of shards that can be recovered at once from each node. It's important to...

Node Disconnected

An Elasticsearch node can disconnect from a cluster for several reasons, including: excessive garbage collection from JVM, configuration...

Nodes

There are different types of nodes in Elasticsearch. Each has its own role and purpose. Master, coordinating and data nodes differ...

Number of Master Nodes

Master nodes are responsible for actions such as creating or deleting indices. If you don't have enough master nodes, it could lead to...

Object Fields VS. Nested Field Types in Elasticsearch

Nested is a special object type that is indexed as a separate document. To demonstrate the use of Elasticsearch nested VS. object fields...

Oversharding

A large number of shards on an Elasticsearch cluster requires extra resources. Learn key ways to avoid and correct oversharding...

Persistent

In Elasticsearch, Persistent refers to cluster settings that persist across cluster restarts. This setting is used in Cluster Update API...

Plugins

Plugins in Elasticsearch are used to extend the functionality of Elasticsearch. An Elasticsearch plugin is installed and removed using the...

Queue

Queues in Elasticsearch exist in the context of Thread Pools. Queues are used to hold the pending requests for thread pools instead of...

Rebalance

Cluster rebalancing is the process by which an Elasticsearch cluster distributes data across the nodes. To force rebalance manually...

Recovery

In Elasticsearch, recovery refers to the process of recovering an index or shard when something goes wrong. You can recover data by using...

Red Status

Elasticsearch red status indicates not only that the primary shard has been lost, but also that a replica has not been promoted...

Refresh Interval

Elasticsearch requires a refresh operation to make indexed information available for search. You can set the refresh interval by...

Register Snapshot Repository

To create & restore snapshots, you need to register a snapshot repository with every Elasticsearch node in the cluster. Here are the steps...

Reindex

Reindex is the concept of copying existing data from a source index to a destination index. In some scenarios, the reindex API is...

Rejected Search Requests in Elasticsearch - Causes and Solutions

There are a number of reasons why a search request can be rejected by the Elasticsearch cluster. To resolve the issue, you need to...

Replica

In Elasticsearch there are two types of shards: the primary shard & the replica copy. Each replica is located on a different node to ensure...

Replication

Elasticsearch replication refers to storing a redundant copy of the data. Elasticsearch creates 1 primary shard with a replication factor...

Repository

The Elasticsearch snapshot provides a backup mechanism that takes the current state and data in the cluster and saves it to a repository.

Rest-high-level

Rest-high-level is built on top of low-level rest-client and is a method of communicating with Elasticsearch based on HTTP REST endpoints...

Restore

In Elasticsearch, restore refers to a snapshot restore mechanism. To restore a cluster from the snapshot, an index, or selected indices...

Routing

In Elasticsearch, routing refers to document routing. When you index a document, Elasticsearch will determine which shard will be used...

Script Regex is Enabled in Painless Scripts

Script regex is disabled in Elasticsearch by default, but you can decide to enable it. Regex must be used with care in painless scripts...

Scroll

The Elasticsearch scroll API is useful when a search returns a large set of results. Large search results are taxing for the system...

Search

To search in Elasticsearch, send a GET request to the _search endpoint in the search API. In the query phase and the fetch phase there are...

Search is Slow in nodesNames

There are a number of possible causes for slow searches on particular nodes. To correct the issue and improve search performance, you...

Search Latency

This guide explores how to reduce Elasticsearch search latency based on a key study. The first lesson is to always...

Search Rejected Queue

An Elasticsearch cluster can start to reject search requests for several reasons. To resolve this, check the state of the thread pool and...

Setting Up Zone Awareness for Shard Allocation in Elasticsearch

Setting up zone awareness for shard allocation ensures high availability in the case of several servers going down. Here's how to...

Settings

Elasticsearch settings can be configured on the cluster-level, node-level and index-level. Here's how to set up and optimize your settings...

Shard Allocation is Unbalanced

Shard allocation is an algorithm by which Elasticsearch decides which unallocated shards should go on which nodes. To resolve unbalanced...

Shards

The number of shards is set when an index is created, and cannot be changed without reindexing. To handle unassigned Elasticsearch shards...

Shards Too Large

It is a best practice that Elasticsearch shard size should not go above 50GB for a single shard. If you go above this limit...
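
As a worked example of the 50GB guideline, the sketch below estimates the smallest number of primary shards that keeps every shard at or below that size, assuming data is spread evenly across shards. The function name and the sample index sizes are hypothetical, and real sizing depends on other factors as well.

```python
import math

MAX_SHARD_SIZE_GB = 50  # best-practice ceiling quoted above


def min_primary_shards(index_size_gb, max_shard_gb=MAX_SHARD_SIZE_GB):
    """Smallest primary shard count that keeps each shard within max_shard_gb."""
    if index_size_gb <= 0:
        # An empty index still needs at least one primary shard.
        return 1
    return max(1, math.ceil(index_size_gb / max_shard_gb))


# Hypothetical index sizes in GB, for illustration only.
for size_gb in (30, 120, 500):
    print(size_gb, "GB ->", min_primary_shards(size_gb), "primary shard(s)")
```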

Slow Indexing in Nodes

If the indexing queue is high/causes timeouts, it hints that Elasticsearch nodes can't keep up with the indexing rate. To fix slow indexing...

Slow Query Troubleshooting Guide

There are several potential reasons for a slow query in Elasticsearch. Slow logs can be used to detect & troubleshoot slow query issues...

Snapshot

An Elasticsearch snapshot is a backup of an index taken from a running cluster. It's better to use snapshots instead of disk backups due...

Split Brain

Elasticsearch split brain occurs when there is more than one master in the cluster. By setting the quorum of minimum master nodes...

Task

A task is equivalent to an Elasticsearch operation, any request performed on an Elasticsearch cluster. The following commands are used...

Template

An Elasticsearch template falls into one of these categories: index templates or search templates. Examples of index templates include...

Terms Enum API in Elasticsearch (For Low Latency Lookups)

The Terms enum API looks for similarities in the index based on partial matches. This approach can help us run...

Threadpool

Elasticsearch threadpools are used to manage how requests are processed and to optimize the use of resources. The write threadpool...

Upgrade

An Elasticsearch upgrade of an existing cluster can be done in 2 ways: through a rolling upgrade or a full cluster restart. To upgrade...

Version

A version corresponds to the Elasticsearch built-in tracking system that tracks the changes in each document with the purpose of...

When You Should Transform Your Data Instead of Using Aggregations

There are at least three use cases where you should consider using transforms instead of aggregations in Elasticsearch. First, when the...

X-Pack Basic Security is Off

The popularity of Elasticsearch has made it a target for hackers. It's important to protect your cluster by enabling X-Pack Security...

Yellow Status

Yellow status indicates that one or more of the replica shards on the Elasticsearch cluster are not allocated to a node. This could occur...

Zen Discovery Settings

Zen discovery settings for cluster formation were deprecated in Elasticsearch 7 and should be removed from clusters running version 7 and above due to...
