Elasticsearch

Aggregation

In Elasticsearch, an aggregation summarizes data by gathering related items together, for example by grouping documents into buckets or computing metrics over them. It's important to understand how...
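
As a minimal illustration, the sketch below builds the request body for a terms aggregation, which groups documents into buckets by the values of a field. It only constructs the JSON that would be sent to `POST /<index>/_search`. The field name `status` and the aggregation name `by_status` are hypothetical placeholders, and sending the request is left to whichever client you use.

```python
import json
from typing import Any, Dict


def terms_agg_body(field: str, agg_name: str = "by_value", size: int = 10) -> Dict[str, Any]:
    """Build a search body that returns only aggregation buckets, no hits."""
    return {
        "size": 0,  # skip returning documents; only the buckets are wanted
        "aggs": {
            agg_name: {
                # one bucket per distinct value, top `size` values by doc count
                "terms": {"field": field, "size": size}
            }
        },
    }


if __name__ == "__main__":
    # "status" is a hypothetical keyword field used for illustration only.
    print(json.dumps(terms_agg_body("status", "by_status"), indent=2))
```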

Alias

In Elasticsearch, an alias is a secondary name to refer to one or more indices. Aliases can be created and removed dynamically using...

All Script Types are Allowed to Run

Elasticsearch scripts can place heavy loads on clusters if they are not written carefully. It is a best practice to limit the type of...

An Overview of Source Filtering, Stored Fields, Fields and Docvalues Fields

There are various options for retrieving fields in Elasticsearch that can boost performance or enable additional formatting options when...

Async Search in Elasticsearch

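The async search API is designed for long-running searches over large amounts of data: it runs the search asynchronously and lets you retrieve partial results as they become available, instead of waiting on a single blocking request. It's important to...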

Autocomplete Guide

There are various approaches for autocomplete in Elasticsearch. Here are some tips for choosing the approach best suited for your needs.

Bootstrap Checks

Elasticsearch carries out "bootstrap checks" to ensure that important settings have been set correctly. If any of these fail, ES won't start.

Bootstrap Memory Lock is Set to False

Elasticsearch can be configured to automatically prevent memory swapping on its host machine by adding the bootstrap...

Bulk

The Elasticsearch bulk API makes it possible to perform many write operations in a single API call, which increases indexing speed.
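
A minimal sketch of how a bulk request body is laid out: newline-delimited JSON (NDJSON) in which each document is preceded by an action line, with a newline at the very end of the body. The index name `logs` and the documents are hypothetical; the resulting string would be sent to `POST /_bulk` with the `Content-Type: application/x-ndjson` header.

```python
import json
from typing import Any, Dict, List


def build_bulk_body(index: str, docs: List[Dict[str, Any]]) -> str:
    """Serialize documents as NDJSON for the _bulk endpoint.

    Each document becomes two lines: an action/metadata line and a source line.
    """
    lines: List[str] = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))  # document source line
    # The bulk API requires the body to end with a newline character.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_bulk_body("logs", [{"msg": "started"}, {"msg": "stopped"}]), end="")
```

Batching many documents into one body like this is what saves the per-request overhead that individual index calls would incur.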

Cache

Elasticsearch uses three types of caches to improve the efficiency of operation: the node query cache, the shard request cache and the field data cache. It is possible to...

Circuit Breakers

Elasticsearch has circuit breakers to prevent OutOfMemory errors that would cause nodes to crash. Each breaker is used to...

Client

Any application that interfaces with Elasticsearch using various APIs can be considered a client. Elasticsearch clients follow a similar...

Cluster

An Elasticsearch cluster consists of a number of servers (nodes) working together as one to store data and respond to requests. It enables...

Cluster Blocks Read-Only

A read-only delete block can be applied automatically by the cluster because of a disk space issue. It can also be applied manually by...
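
As a sketch, assuming disk space has already been freed, the block can be removed by resetting `index.blocks.read_only_allow_delete` to `null` through the index settings API. The helper below only builds the request; the index name is a placeholder, and `_all` targets every index.

```python
import json
from typing import Any, Dict, Tuple


def clear_read_only_block(index: str = "_all") -> Tuple[str, str, Dict[str, Any]]:
    """Return (method, path, body) for removing the read-only-allow-delete block."""
    # Python None serializes to JSON null, which resets the setting to its default.
    body: Dict[str, Any] = {"index.blocks.read_only_allow_delete": None}
    return "PUT", f"/{index}/_settings", body


if __name__ == "__main__":
    method, path, body = clear_read_only_block("my-index")  # hypothetical index
    print(method, path, json.dumps(body))
```

Since version 7.4 Elasticsearch releases this block automatically once disk usage drops below the high watermark, so the manual reset matters mostly on older clusters or when the block was applied by hand.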

Cluster Concurrent Rebalance High / Low

The cluster concurrent rebalance setting determines the maximum number of shards the cluster can move to rebalance the distribution of...

Dangerous Default Settings

The default values for cluster name and data path can be destructive to proper Elasticsearch function if handled incorrectly. If you...

Dedicated Client Nodes

Many clusters use coordinating or ingest nodes, while others leave the ingest and coordination functions to the data nodes. In order to...

Dedicated Master Node

Once an Elasticsearch cluster reaches a certain size, it's recommended to create 3 dedicated master nodes. Here is how you can create...

DELETE

DELETE is an Elasticsearch API which removes a document from a specific index. It requires an index name and the document's _id in order to...

Delete-By-Query

Delete-by-query is an Elasticsearch API that deletes all documents matching a given query. If you don't...

Deprecation

There are a number of ways you can find out which functions in Elasticsearch have been deprecated, including: deprecation logs and...

Discovery

Discovery occurs when an Elasticsearch node starts, restarts or loses contact with the master node. In those cases the node needs to...

Disk Watermark

There are various watermark thresholds on your Elasticsearch cluster. Elasticsearch considers the available disk space before...
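
The three thresholds can be pictured as stages of a single disk-usage percentage. The sketch below classifies a node's usage using the default percentages for `cluster.routing.allocation.disk.watermark.low`, `.high` and `.flood_stage` (85%, 90% and 95%). It is a simplification: it handles percentage thresholds only, although the watermarks can also be configured as absolute byte values.

```python
def watermark_stage(used_percent: float,
                    low: float = 85.0,
                    high: float = 90.0,
                    flood: float = 95.0) -> str:
    """Classify node disk usage against percentage-based disk watermarks."""
    if used_percent >= flood:
        # indices with a shard on this node get a read-only-allow-delete block
        return "flood_stage"
    if used_percent >= high:
        # Elasticsearch tries to relocate shards away from this node
        return "high"
    if used_percent >= low:
        # Elasticsearch stops allocating new shards to this node
        return "low"
    return "ok"


if __name__ == "__main__":
    for pct in (50, 86, 91.5, 97):
        print(pct, watermark_stage(pct))
```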

Document

Each ES document is a JSON structure, which is ultimately considered to be a series of key:value pairs. It's important to understand...

Elasticsearch Global Ordinals and High Cardinality Fields

Terms aggregations rely on an internal data structure known as global ordinals. These structures maintain statistics for each unique...

Elasticsearch match_only_text Field Type (For Storage Optimization)

The new match_only_text feature in Elasticsearch can save up to 10% of disk space on logging datasets. This field type will set a flat...

Elasticsearch Token Filters

A tokenizer decides how Elasticsearch will take a set of words and divide it into separate terms called “tokens”. To work with synonyms...

Enable Adaptive Replica Selection

Adaptive replica selection is a process that prevents a distressed Elasticsearch node from delaying the response to queries. To enable it...

Enable Shard Rebalance and Allocation

Cluster shard rebalancing and allocation are often confused with each other. If cluster shard rebalancing isn't enabled, then...

Expensive Queries are Allowed to Run

By default, Elasticsearch allows expensive queries to run. To prevent users from running certain expensive queries, you can add...
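
For example, assuming a version that supports it (7.7 or later), setting `search.allow_expensive_queries` to `false` in the persistent cluster settings makes Elasticsearch reject the query types it classifies as expensive. The helper below only builds the body for `PUT /_cluster/settings`.

```python
import json
from typing import Any, Dict


def disallow_expensive_queries_body() -> Dict[str, Any]:
    """Body for PUT /_cluster/settings that blocks expensive query types."""
    # "persistent" keeps the setting across full cluster restarts.
    return {"persistent": {"search.allow_expensive_queries": False}}


if __name__ == "__main__":
    print(json.dumps(disallow_expensive_queries_body()))
```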

Fielddata

In Elasticsearch, the term Fielddata is relevant when sorting and aggregating on text fields. The field is important because...

File Descriptors

File descriptors are required to keep track of all the files Elasticsearch has open at any given time, as well as all network...

Filter

Filters in Elasticsearch apply conditions inside the query to narrow down the matching results. The most common problems with filters are...

Flood Stage Disk Watermark

When the “disk flood stage” threshold is passed on an Elasticsearch cluster, it will start to block core actions. To resolve this issue...

Flush, Translog and Refresh

In Elasticsearch, flushing is the process of permanently storing data onto the disk for all of the operations that have been stored in memory.

Heap Size Usage and JVM Garbage Collection

Higher heap sizes will give nodes more memory for indexing & search operations. However, it's key to maintain a healthy balance. First, you...

Heavy Merges Were Detected

Heavy merges use CPU, memory and disk resources, which can slow down the cluster’s response speed. In order to fix...

High Cluster Pending Tasks

Cluster pending tasks are updates to the cluster state which may have been initiated directly by a user or by the cluster itself. When...

High CPU

High CPU is often a symptom of other underlying issues. It should be fixed because a distressed node will slow query response time and...

High Disk Watermark

The high disk watermark is one of the various thresholds on your ES cluster. It's important to understand what it means and how to handle...

High Management Queue

A high number of tasks in the management queue can cause Elasticsearch cluster instability, which could result in data loss. To resolve...

How to Activate and Use Elasticsearch Slow Logs

Elasticsearch can log every search that takes longer than a specified amount of time. You should...
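
A minimal sketch of search slow log thresholds, which are set per index through `PUT /<index>/_settings`. The setting names follow the `index.search.slowlog.threshold.*` pattern for the query and fetch phases; the threshold values are arbitrary examples, not recommendations.

```python
import json
from typing import Dict


def slowlog_settings(query_warn: str = "10s",
                     query_info: str = "5s",
                     fetch_warn: str = "1s") -> Dict[str, str]:
    """Index settings that log searches slower than the given thresholds."""
    return {
        # query phase: finding matching documents on each shard
        "index.search.slowlog.threshold.query.warn": query_warn,
        "index.search.slowlog.threshold.query.info": query_info,
        # fetch phase: loading the documents that will be returned
        "index.search.slowlog.threshold.fetch.warn": fetch_warn,
    }


if __name__ == "__main__":
    print(json.dumps(slowlog_settings(), indent=2))
```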

How to Choose the Correct Number of Shards per Index in Elasticsearch

Finding the right number of shards for your ES indices, and the right size for each shard, depends on many factors. These factors include...

How to Create Data Streams in Elasticsearch

The Elasticsearch data stream is an abstraction layer between the names used by applications to facilitate ingestion and search operations...

How to Define Efficient Mapping in Elasticsearch

Mappings are the core element of index creation in Elasticsearch. Defining them correctly can vastly improve performance. Here's how to...

How to Handle Circuit Breakers in Elasticsearch

Circuit breaker exceptions are thrown to alert us that something needs to be fixed in Elasticsearch in order to reduce memory usage. To fix...

How to Improve your Elasticsearch Aggregation Performance

There are multiple ways you can improve your aggregation performance. First, you should limit the scope by filtering documents out. Then...

How to Increase Elasticsearch Search Speed

By optimizing and maintaining Elasticsearch search speed, you can improve your product’s user experience. Here's how to easily increase...

How to Leverage Ingest Pipelines to Transform Data Transparently in Elasticsearch

Ingest pipelines sit within the Elasticsearch node and will perform a set of alterations on your data that you...

How to Optimize Search Performance in Elasticsearch

One of the most difficult issues to manage and resolve in Elasticsearch is poor search performance. This blog goes through clear steps to...

How to Reduce the Number of Shards in an Elasticsearch Cluster

When you have too many shards in your Elasticsearch cluster, there are a few steps you can take in order to reduce the number of shards...

How to Roll Up Data in Elasticsearch

Rollup jobs in Elasticsearch reduce storage costs for old data by storing summaries of data for a given time period. This way, you can...

How to Secure an Elasticsearch Cluster

It's essential to secure your Elasticsearch cluster effectively. Doing so will probably require some downtime, but the key is simply to...

How to Set Up a Hot/Warm/Cold/Frozen Architecture in Elasticsearch - The Complete Guide

A multi-tier architecture allows for better organization of resources to fit various search use cases in Elasticsearch. In order to...

How to Use Runtime Fields in Elasticsearch

A runtime field is a field evaluated at query time instead of indexing time, which allows us to modify our schema at the query stage and to...

Index

In Elasticsearch, an index (plural: indices) can be thought of as a table inside a database that holds a number of related documents.

Index Lifecycle Management

Index lifecycle management is a feature that helps automate the creation, management and deletion of an Elasticsearch index. Here's how to...

Index Queue Size Is High

Once an indexing queue exceeds the maximum size, the Elasticsearch node will start rejecting index requests. To resolve this, check the...

Indexing

Indexing is the process of adding new documents to an Elasticsearch index or updating existing ones. In its simplest form, you can index a document by...

Lack of Quorum

This error occurs when the Elasticsearch cluster doesn't have a quorum of nodes with voting rights to elect a new master node. To resolve...

Loaded Client Nodes

A saturated coordinating node could cause an increase in search or indexing response latency. This can be fixed by putting a load balancer...

Loaded Data Nodes

Sometimes you can observe that the CPU usage and load on some of your data nodes are higher than on others. This can occasionally be caused by...

Loaded Master Nodes

An overloaded master node may cause instability in the cluster. There are 3 ways to fix loaded master nodes: (1) Checking for...

Low Disk Watermark

Low disk watermark is one of the various thresholds on your Elasticsearch cluster. Here are possible actions you can take to resolve...

Lucene

Apache Lucene is an open-source Java search library. Elasticsearch is built on top of Lucene...

Mapping

Mapping contains the properties of each field in the index. The most common issue in Elasticsearch is an incorrectly defined mapping which...

Master Node Not Discovered

An Elasticsearch cluster requires a master node to be identified in the cluster in order for it to start properly. If the cluster lacks a...

Max Shards Per Node Exceeded

If the maximum number of shards per node is exceeded in Elasticsearch, shards can't be allocated. It is crucial to check if the limit is set at a...

Memory Usage Guide

The Elasticsearch process is very memory intensive. Elasticsearch uses a JVM (Java Virtual Machine), and close to 50% of the memory should...
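
The usual rule of thumb can be sketched as: give the heap about half of the machine's RAM, leaving the rest to the operating system's file system cache, and keep it below the compressed-oops threshold. The 30 GB cap here is an assumption (the safe threshold varies by system), and `-Xms` and `-Xmx` are set to the same value. Recent versions can also size the heap automatically, in which case these flags are unnecessary.

```python
import math
from typing import List


def heap_jvm_options(ram_gb: float, cap_gb: float = 30.0) -> List[str]:
    """Return matching -Xms/-Xmx flags: half of RAM, capped at cap_gb (whole GB)."""
    heap_gb = math.floor(min(ram_gb / 2, cap_gb))
    if heap_gb < 1:
        raise ValueError("machine too small for this rule of thumb")
    # Equal min and max heap avoids resizing the heap while the node runs.
    return [f"-Xms{heap_gb}g", f"-Xmx{heap_gb}g"]


if __name__ == "__main__":
    for ram in (16, 64, 128):
        print(ram, heap_jvm_options(ram))
```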

Metadata

Metadata in Elasticsearch refers to additional information stored with each document, using metadata fields. This is how you can check metadata...

Misuse of Wildcards

It's possible to reduce the risk of accidental deletion of indices by preventing the use of wildcards in destructive operations. To check...
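
On the cluster side, the relevant setting is `action.destructive_requires_name`; when it is `true`, destructive operations such as deleting indices must name them explicitly (its default became `true` in 8.0). The sketch builds the `PUT /_cluster/settings` body and adds a hypothetical client-side guard, `is_destructive_pattern`, that flags wildcard or `_all` targets before a request is even sent.

```python
import json
from typing import Any, Dict


def require_explicit_names_body() -> Dict[str, Any]:
    """Body for PUT /_cluster/settings that forbids wildcard destructive actions."""
    return {"persistent": {"action.destructive_requires_name": True}}


def is_destructive_pattern(index_expr: str) -> bool:
    """Hypothetical client-side check: True if any target uses a wildcard or _all."""
    targets = [t.strip() for t in index_expr.split(",")]
    return any(t == "_all" or "*" in t for t in targets)


if __name__ == "__main__":
    print(json.dumps(require_explicit_names_body()))
    print(is_destructive_pattern("logs-*"))  # wildcard target
```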

Node Concurrent Recoveries Setting is Too High / Low

The node concurrent recoveries setting determines the max number of shards that can be recovered at once from each node. It's important to...

Node Disconnected

It's important to realize that node disconnection is often a symptom of underlying problems which must be found and solved - learn how to...

Nodes

There are different types of nodes in Elasticsearch. Each has its own role and purpose. It's important to understand that...

Number of Master Nodes

Master nodes are responsible for actions such as creating or deleting indices. If you don't have enough master nodes, it could lead to...

Object Fields VS. Nested Field Types in Elasticsearch

Nested is a special type of object that is indexed as a separate document, and is required for certain types of queries. However, only...

Oversharding

A large number of shards on an Elasticsearch cluster requires extra resources. Learn key ways to avoid and correct oversharding...

Persistent

In Elasticsearch, Persistent refers to cluster settings that persist across cluster restarts. This setting is used in the Cluster Update Settings API...

Plugins

Plugins in Elasticsearch are used to extend the functionality of Elasticsearch. In addition to the core plugins available, there are also...

Queue

Queues in Elasticsearch exist in the context of Thread Pools. Queues are used to hold the pending requests for thread pools instead of...

Rebalance

Cluster rebalancing is the process by which an Elasticsearch cluster distributes data across the nodes. It can be done automatically or...

Recovery

In Elasticsearch, recovery refers to the process of recovering an index or shard when something goes wrong. You can recover data by...

Refresh

Elasticsearch requires a refresh operation to make indexed information available for search. This means that there is a time delay between...

Register Snapshot Repository

To back up Elasticsearch indices you need to use the Elasticsearch snapshot mechanism. It is not sufficient to have backups of the...

Reindex

Reindexing is the process of copying existing data from a source index to a destination index. Reindexing is mostly required for updating...

Rejected Search Requests in Elasticsearch - Causes and Solutions

There are a number of reasons why a search request can be rejected by the Elasticsearch cluster. To resolve the issue, you need to...

Replica

In Elasticsearch there are two types of shards, the primary shard and the replica copy. This is done so that...

Replication

In Elasticsearch, replication refers to storing redundant copies of the data. Replicas are never assigned to the same node as...

Repository

The Elasticsearch snapshot provides a backup mechanism that takes the current state and data in the cluster and saves it to a repository.

Rest-high-level

Rest-high-level (the Java REST high-level client) is built on top of the low-level REST client and is a method of communicating with Elasticsearch based on HTTP REST endpoints...

Restore

In Elasticsearch, restore refers to a snapshot restore mechanism. A restoration can be carried out once you set up the snapshot repository...

Routing

In Elasticsearch, routing refers to document routing. When you index a document, Elasticsearch will determine which shard will be used...
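
Conceptually, the target shard is derived from a hash of the routing value (the document `_id` by default) modulo the number of primary shards. The sketch is deliberately simplified: Elasticsearch actually uses a Murmur3 hash and an additional `routing_num_shards` factor, whereas `zlib.crc32` is used here purely as a stand-in hash.

```python
import zlib


def pick_shard(routing: str, num_primary_shards: int) -> int:
    """Simplified document routing: hash(_routing) % number_of_primary_shards."""
    if num_primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    # crc32 stands in for Elasticsearch's Murmur3 hash in this illustration.
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


if __name__ == "__main__":
    for doc_id in ("doc-1", "doc-2", "doc-3"):
        print(doc_id, pick_shard(doc_id, 5))
```

Because the result depends on the primary shard count, the same document would land on a different shard if that count changed, which is why the number of primary shards is fixed when an index is created.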

Script Regex is Enabled in Painless Scripts

Script regex is disabled in Elasticsearch by default, so if it has been enabled on your cluster, there may be a reason. Be careful because...

Scroll

In Elasticsearch, the scroll API is useful when a search returns a large set of results. Large search results are taxing for the system...

Search

To search in Elasticsearch, send a GET request to the _search endpoint in the search API. In the query phase and the fetch phase there are...

Search Latency

Based on a key case study, this guide explores how to reduce search latency in Elasticsearch. The first lesson is to always...

Search Rejected Queue

If the Elasticsearch cluster starts to reject search requests, there could be a number of causes. Generally it indicates that...

Setting Up Zone Awareness for Shard Allocation in Elasticsearch

Setting up zone awareness for shard allocation ensures high availability in the case of several servers going down. Here's how to...

Settings

Elasticsearch settings can be configured on the cluster-level, node-level and index-level. Here's how to set up and optimize your settings...

Shard Allocation is Unbalanced

Elasticsearch usually balances index shards evenly across all of the active data nodes in the cluster. If this isn't happening, it's because...

Shards

The number of shards is set when an index is created, and this number cannot be changed later without reindexing. Unassigned shards can...

Shards Too Large

As a best practice, a single Elasticsearch shard should not exceed 50GB. If you go above this limit...
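
A quick way to apply the guideline is to divide the expected primary data size by the maximum shard size and round up. The sketch assumes the 50GB figure above and ignores replicas, which add copies but do not change the size of each shard.

```python
import math


def primaries_for_size(total_primary_gb: float, max_shard_gb: float = 50.0) -> int:
    """Smallest number of primary shards keeping each shard at or below max_shard_gb."""
    if total_primary_gb <= 0:
        return 1  # an index always has at least one primary shard
    return math.ceil(total_primary_gb / max_shard_gb)


if __name__ == "__main__":
    print(primaries_for_size(120))  # 120GB of primary data
```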

Slow Indexing in Nodes

If the indexing queue is high or produces timeouts, it indicates that one or more Elasticsearch nodes cannot keep up with the rate of...

Slow Query Troubleshooting Guide

There are several potential reasons for slow queries. You can use slow logs to detect and troubleshoot issues related to slow queries by...

Snapshot

Snapshots in Elasticsearch are used to back up and restore data. It's better to use Elasticsearch snapshots instead of disk backups because...

Source

Elasticsearch keeps the original JSON document in a field called _source. The source field serves special purposes such as...

Split Brain

Elasticsearch Split Brain occurs when there is more than 1 master in the cluster. It can be avoided by setting the min number of...
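
The minimum number of master-eligible nodes needed to avoid split brain is a simple majority, floor(n / 2) + 1. The sketch computes it. On versions before 7.0 this value was set manually via `discovery.zen.minimum_master_nodes`, while 7.0 and later manage the voting configuration automatically.

```python
def quorum(master_eligible_nodes: int) -> int:
    """Majority of master-eligible nodes required to elect a master."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        # n - quorum(n) is how many master-eligible nodes can fail safely
        print(n, quorum(n), n - quorum(n))
```

The table this prints also shows why 3 dedicated master nodes is a common choice: a quorum of 2 tolerates the loss of one node, while adding a fourth node raises the quorum to 3 without tolerating any additional failure.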

Status Red

Elasticsearch red status indicates not only that the primary shard has been lost, but also that a replica has not been promoted...

Status Yellow

Yellow status indicates that one or more of the replica shards on the Elasticsearch cluster are not allocated to a node. This could occur...

Task

A task represents an Elasticsearch operation, meaning any request performed on an Elasticsearch cluster. The following commands are used...

Template

A template in Elasticsearch falls into one of these categories: index templates or search templates. Templates are indexed inside Elasticsearch using...

Terms Enum API in Elasticsearch (For Low Latency Lookups)

The Terms enum API returns terms from the index that match a partial string. This approach can help us run...

Threadpool

Threadpools are used to manage how requests are processed and to optimize the use of resources on each node in the cluster. To see which...

Threshold

Elasticsearch uses several parameters to enable it to manage hard disk storage across the cluster, such as...

Upgrade

Upgrade refers to migrating your Elasticsearch version to a newer version. There are two ways to upgrade existing clusters...

Version

A version corresponds to the Elasticsearch built-in tracking system that tracks the changes in each document with the purpose of...

When You Should Transform Your Data Instead of Using Aggregations

There are at least three use cases where you should consider using transforms instead of aggregations in Elasticsearch. First, when the...

Which Pagination Technique to Use Depending on Your Use Case

Elasticsearch currently provides 3 different techniques for fetching many results: Pagination, Search-After and Scroll. To learn how to...
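
As a sketch of the search-after technique: each page repeats the same query and sort, and passes the `sort` values of the last hit from the previous page as `search_after`. The field names `timestamp` and `event_id` are hypothetical; the second sort key acts as a tiebreaker so the order is unique. In recent versions, combining `search_after` with a point in time (PIT) is recommended for consistent deep pagination.

```python
from typing import Any, Dict, List, Optional

# Hypothetical sort: timestamp first, then a unique field as a tiebreaker.
SORT: List[Dict[str, str]] = [{"timestamp": "asc"}, {"event_id": "asc"}]


def next_page_body(query: Dict[str, Any],
                   sort: List[Dict[str, str]],
                   size: int,
                   last_hit: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build the search body for the next page of a search_after traversal."""
    body: Dict[str, Any] = {"query": query, "sort": sort, "size": size}
    if last_hit is not None:
        # Resume right after the final hit of the previous page.
        body["search_after"] = last_hit["sort"]
    return body


if __name__ == "__main__":
    first = next_page_body({"match_all": {}}, SORT, 100)
    second = next_page_body({"match_all": {}}, SORT, 100,
                            last_hit={"sort": [1700000000000, "abc"]})
    print(first)
    print(second)
```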

X-Pack Basic Security is Off

The growing popularity of Elasticsearch has made it a target for hackers. It's important to protect your Elasticsearch cluster by enabling...

Zen Discovery Settings

Zen discovery settings for cluster formation were deprecated in Elasticsearch 7 and should be removed from configurations on version 7 and above due to...