Elasticsearch guides

  • a
  • All Script Types are Allowed to Run
    Overview Elasticsearch scripts can place a heavy load on your cluster, particularly if the scripts are not written carefully with thought for the resources they may require. For this reason it is a best practice to limit both the type of scripts that can run on a cluster, and also the contexts in which scripts can run.  How to resolve it Script settings are advanced settings which require you to have knowledge of how scripts on your cluster are implemented (if at all). If you are(...) Read More
  • Autocomplete Guide
    In addition to reading this guide, you should run Opster's Search Log Analyzer if you want to improve your search performance in Elasticsearch. With Opster's Analyzer, you can easily locate slow searches and understand what led to them adding additional load to your system. You'll receive customized recommendations for how to reduce search latency and improve your search performance. The tool is free and takes just 2 minutes to run. Background In this article we will cover how(...) Read More
  • b
  • Bootstrap Checks
    Overview Elasticsearch has many settings that can cause significant performance problems if not set correctly. To prevent this happening, Elasticsearch carries out "bootstrap checks" to ensure that these important settings have been covered. If any of the checks fail, Elasticsearch will write an error to the logs and will not start. In this guide we cover common bootstrap checks you should know and how to configure your settings correctly to pass the checks successfully. Bootstrap(...) Read More
  • Bootstrap Memory Lock is Set to False
    Overview Elasticsearch performance can be heavily penalised if the node is allowed to swap memory to disk. Elasticsearch can be configured to automatically prevent memory swapping on its host machine by adding the bootstrap.memory_lock: true setting to elasticsearch.yml. If bootstrap checks are enabled, Elasticsearch will not start if memory swapping is not disabled. You can learn more about bootstrap checks here: Bootstraps Check in Elasticsearch - A Detailed Guide With(...) Read More
  • c
  • Cluster Blocks Read-Only
    Overview A read-only delete block can be applied automatically by the cluster because of a disk space issue, or may be applied manually by an operator to prevent indexing to the Elasticsearch cluster. There are two types of block: cluster.blocks.read_only and cluster.blocks.read_only_allow_delete. A read-only block is typically applied by an operator because some sort of cluster maintenance is taking place or in order to recover cluster stability. A read-only allow delete block(...) Read More
  • Cluster Concurrent Rebalance High / Low
    What it means The cluster concurrent rebalance setting determines the maximum number of shards which the cluster can move to rebalance the distribution of disk space requirements across the nodes at any one time. When moving shards, a shard rebalance is required in order to rebalance the disk usage requirements across the clusters. This rebalance uses cluster resources. Therefore, it’s advisable to reduce the concurrent rebalance setting to limit the number of shards that can be(...) Read More
  • d
  • Dangerous Default Settings
    Overview Cluster name It is important to change the name of the cluster in elasticsearch.yml to avoid Elasticsearch nodes joining the wrong cluster. This is particularly important when development, staging and production environments can find themselves on the same network.  How to prevent it from happening If you want to change the name of the cluster, then you need to modify the setting in elasticsearch.yml and perform a rolling restart: cluster.name:(...) Read More
  • Dedicated Client Nodes
    Overview There is some confusion in the use of coordinating node terminology. Client nodes were removed from Elasticsearch after version 2.4 and became Coordinating Nodes. At the same time a new node type, Ingest Node, also appeared. Many clusters do not use dedicated coordinating or ingest nodes, and leave the ingest and coordination functions to the data nodes.  Coordinating Node A coordinating (or client) node is a node which has: node.master: false node.data: false(...) Read More
  • Dedicated Master Node
    Overview Master nodes are responsible for actions such as creating or deleting indices, deciding which shards should be allocated on which nodes, and maintaining the cluster state of all nodes. The cluster state includes information about which shards are on which node, index mappings, which nodes are in the cluster and other settings necessary for the cluster to operate. Even though these actions are not resource intensive, it is essential for cluster stability to ensure that the(...) Read More
  • e
  • Elasticsearch Global Ordinals and High Cardinality Fields
    Global ordinals in Elasticsearch  Terms aggregations rely on an internal data structure known as global ordinals. These structures maintain statistics for each unique value of a given field. Those statistics are calculated, kept at the shard level and further combined in the reduce phase to produce a final result. The performance of terms aggregations on a given field can be harmed as the number of unique possible values for that field increases (high cardinality), but also because(...) Read More
  • Elasticsearch match_only_text Field Type (For Storage Optimization)
    Overview A new feature of Elasticsearch 7.14 is the new match_only_text that can save up to 10% of disk space on logging datasets. When defining mappings, a trivial decision is whether to set a field as “keyword” or “text”, depending on how we are querying it. Keyword field We use keyword fields when we want to look for exact matches and when we want to filter documents, such as showing the user a select box with options (e.g. status = “done”). This also works for operations(...) Read More
  • Enable Adaptive Replica Selection
    Overview Adaptive replica selection is a process intended to prevent a distressed Elasticsearch node from delaying the response to queries, while reducing the search load on that node. To understand how it works, imagine a situation where a single node is in distress. This could be because of hardware, network or configuration issues, but as a consequence the response times for shards on that node are much longer than the response times from the other nodes. When an Elasticsearch(...) Read More
  • Enable Shard Rebalance and Allocation
    Overview Cluster shard rebalancing and allocation are often confused with each other. Cluster shard allocation This refers to the process by which any shard, including new, recovered or rebalanced shards, is allocated to Elasticsearch nodes. Cluster shard allocation may be temporarily disabled during maintenance in order to prevent shards from being relocated to nodes that are being restarted and may temporarily leave the cluster. If cluster shard allocation is NOT enabled, then(...) Read More
  • Expensive Queries are Allowed to Run
    Overview By default this setting is set to true. This means that users can use certain query types which require a lot of resources to return results, causing slow results for other users and possibly affecting the stability of the cluster. It is particularly appropriate in installations where you have no control over the queries being run (e.g. where users have access to Kibana or other graphical interface tools). Setting this to false will prevent running the following(...) Read More
  • f
  • File Descriptors
    What it means File descriptors are required so that the Elasticsearch process can keep track of all the files it has open at any given time as well as all network connections to other nodes. Running out of file descriptors would result in the Elasticsearch process not being able to keep track of the files it has open or not being able to open new files or socket connections when it needs to, and will most probably lead to data loss. The Elasticsearch process should be permitted(...) Read More
  • Flood Stage Disk Watermark
    Overview There are various “watermark” thresholds on your Elasticsearch cluster. As the disk fills up on a node, the first threshold to be crossed will be the “low disk watermark”. The second threshold will then be the “high disk watermark threshold”. Finally, the “disk flood stage” will be reached. Once this threshold is passed, the cluster will then block writing to ALL indices that have one shard (primary or replica) on the node which has passed the watermark. Reads (searches) will(...) Read More
  • Flush, Translog and Refresh
    What is an Elasticsearch flush? In Elasticsearch, flushing is the process of permanently storing data onto the disk for all of the operations that have temporarily been stored in memory. This is also known as a Lucene commit. How are Elasticsearch documents indexed? To understand the relevance of flushing, it is necessary to understand how Elasticsearch indexes documents. As new documents are indexed, the operations are recorded on disk in the translog and stored in memory in(...) Read More
  • h
  • Heap Size Usage and JVM Garbage Collection
    Overview The heap size is the amount of RAM allocated to the Java Virtual Machine of an Elasticsearch node. As a general rule, you should set -Xms and -Xmx to the SAME value, which should be 50% of your total available RAM subject to a maximum of (approximately) 31GB. A higher heap size will give your node more memory for indexing and search operations. However, your node also requires memory for caching, so using 50% maintains a healthy balance between the two. For this same(...) Read More
  • Heavy Merges Were Detected
    Overview Elasticsearch indices are stored in shards, and each shard in turn stores the data on disk in segments. Elasticsearch processes such as updates and deletion can result in many small segments being created on disk, which Elasticsearch will merge into bigger sized segments in order to optimize disk usage. The merging process uses CPU, memory and disk resources, which can slow down the cluster’s response speed. How to fix it In general, the Elasticsearch merging process is(...) Read More
  • High Cluster Pending Tasks
    What are cluster pending tasks? Cluster pending tasks are updates to the cluster state which may have been initiated directly by a user or by the cluster itself. Note that cluster pending tasks are specific tasks relating to the cluster state, and are not necessarily the same as the tasks from the task API (although there may be some overlap).  The task API relates to tasks created by users or the cluster but these are not necessarily related to cluster state. The reason to be(...) Read More
  • High CPU
    Overview High CPU is often a symptom of other underlying issues, and as such there are a number of possible causes for it. Causes of high CPU should be investigated and fixed, because a distressed node will at best slow down query response times resulting in time outs for clients, and at worst cause the node to disconnect and be lost from the cluster altogether. How to resolve it To minimize the impact of distressed nodes on your search queries, make sure you have the(...) Read More
  • High Disk Watermark
    Overview There are various “watermark” thresholds on your Elasticsearch cluster. As the disk fills up on a node, the first threshold to be crossed will be the “low disk watermark”. The second threshold will then be the “high disk watermark”. If you pass this threshold then Elasticsearch will try to relocate shards away from the node to other nodes in the cluster. How to resolve this issue Passing this threshold is a warning and you should not delay in taking action before the(...) Read More
  • High Management Queue
    Overview The management queue is where tasks such as node allocation or index management tasks are queued if they cannot be carried out immediately. In a stable cluster, it would be normal to have one management thread per node, with no rejections. If management tasks start to back up, it’s an indication that either an excessive number of management tasks are being created, or something is stopping the management tasks from being carried out properly.  A high number of tasks in the(...) Read More
  • How to Activate and Use Elasticsearch Slow Logs
    If you're suffering from search latency issues or poor search performance, you should run Opster's free Search Log Analyzer to optimize your searches. With Opster's Search Analyzer, you can easily locate slow searches and understand what led to them adding additional load to your system. You'll receive customized recommendations for how to reduce search latency and improve your search performance. The tool is free and takes just 2 minutes to run.(...) Read More
  • How to Choose the Correct Number of Shards per Index in Elasticsearch
    How can we choose the correct number of shards per index? An Elasticsearch index consists of one or more primary shards. As of Elasticsearch version 7, the current default value for the number of primary shards per index is 1. In earlier versions, the default was 5 shards. Finding the right number of primary shards for your indices, and the right size for each shard, depends on a variety of factors. These factors include: the amount of data that you have, your use case(s), your(...) Read More
  • How to Create Data Streams in Elasticsearch
    What is a data stream? The Elasticsearch data stream is an abstraction layer between the names used by applications to facilitate ingestion and search operations on data, and the underlying indices used by Elasticsearch to store that data. Data streams let you store append-only time series data across multiple indices while providing you with a single named resource for requests. Data sent to a data stream is stored in indices with a name format like this: .ds-<data-stream>-<yyyy.MM.dd>-<generation> The date is(...) Read More
  • How to Define Efficient Mapping in Elasticsearch
    Mapping in Elasticsearch Mapping is the core element of index creation. Mapping acts as the skeleton structure that represents the document and the definition of each field showing how the document will be indexed or searched. Mappings are a set of key-value pairs, where the key is the field and the value is the type of the field and other parameters like index, store options.  Elasticsearch doesn’t impose a strict structure for documents - any document can be stored. A document can(...) Read More
  • How to Handle Circuit Breakers in Elasticsearch
    What are circuit breakers? As explained in Opster’s Elasticsearch Memory Usage Guide, 50% of memory on an Elasticsearch node is generally used for the JVM (Java Virtual Machine) heap, while the other half of the memory is used for other requirements such as cache. In order to prevent “Out of Memory” (OOM) errors, Elasticsearch implements circuit breakers. If a certain request could cause errors in the node because of memory issues, Elasticsearch will throw a(...) Read More
  • How to Improve your Elasticsearch Aggregation Performance
    Improving aggregation performance in Elasticsearch Even though Elasticsearch is most known for its full text search capabilities, many use cases also take advantage of another very powerful feature Elasticsearch delivers out of the box: the aggregations framework. Aggregations are used everywhere in Kibana. Every dashboard with visualization that sums up data collected from the Beats agents uses aggregations. Elastic's APM, which is Elastic's alternative to instrumentation and(...) Read More
  • How to Increase Elasticsearch Search Speed
    Search speed is the major selling point of Elasticsearch. Most of the time, it’s the reason people decide to use Elasticsearch in the first place - which is why it’s key to ensure it produces results quickly. By optimizing and maintaining Elasticsearch search speed, you can improve your product’s user experience and in turn improve your product’s conversion rate. In this article, we will detail how to increase Elasticsearch speed by optimizing query and Elasticsearch(...) Read More
  • How to leverage ingest pipelines to transform data transparently in Elasticsearch
    What ingest pipelines are used for Do you have some adjustments you’d like to make to your data, but would like to use a method that is more lightweight than Logstash or some other data parsing tool? Ingest pipelines may just be what you’re looking for.  With ingest pipelines you can manipulate your data to fit your needs without much overhead. Ingest pipelines sit within the Elasticsearch node (the ingest node, if you’ve defined one), and will perform a set of(...) Read More
  • How to Optimize Search Performance in Elasticsearch
    One of the most difficult issues to manage and resolve in Elasticsearch is poor search performance. To optimize Elasticsearch search performance, you need to find the heavy and slow searches in your system, which is no easy task.  Once you’ve succeeded at finding a “culprit” search that is degrading search performance, you need to know exactly how to configure your settings differently to resolve the issue and optimize future searches. Aside from configuration, you also want to(...) Read More
  • How to Reduce the Number of Shards in an Elasticsearch Cluster
    Elasticsearch Reduce Number of Shards - Explanation & Code Snippets When you have too many shards in your cluster, there are a few steps you can take in order to reduce the number of shards. Deleting or closing indices and reindexing into larger indices are covered in this Opster guide. Below, we will review how to reduce the number of shards of newly created indices, how to reduce the number of shards of already existing indices, how to reduce the number of primary shards and how to(...) Read More
  • How to Roll Up Data in Elasticsearch
    Why you may want to roll up your data The cost of running an Elasticsearch cluster is largely relative to the volume of data stored on the cluster. If you are storing time-based data, it’s common to find that old data is queried less often than the newer data, and that the old data is often only used to look at the “bigger picture” or to compare historical trends.   Rollup jobs provide a way to drastically reduce storage cost for old data, by means of storing documents which(...) Read More
  • How to Secure an Elasticsearch Cluster
    Elasticsearch Cluster Security Securing an Elasticsearch cluster and creating TLS certificates will almost inevitably require some downtime on your cluster, since the cluster will not be available until all nodes have their certificates installed.    What is TLS and why do we need it? TLS (Transport Layer Security) certificates are necessary to provide encryption keys to enable the nodes to encrypt their communications. Furthermore, each certificate must be created with the(...) Read More
  • How to Use Runtime Fields in Elasticsearch
    Overview Elasticsearch 7.12 released a new feature called runtime fields. A runtime field is a field evaluated at query time instead of indexing time, which allows us to modify our schema at the query stage. Below we’ll review query and index phases, and how, when and why you should (or shouldn’t) use runtime fields. Index time vs query time When we talk about index time we refer to the actions before running our queries, for example ingesting documents or setting up mappings.(...) Read More
  • i
  • Index Lifecycle Management
    Why index lifecycle management is necessary Index lifecycle management is a feature that helps automate the creation, management and deletion of an Elasticsearch index. Being able to automate the creation of a new index when the index reaches the optimum size of 50GB per shard is very useful. Setting up a time-based index with one index per day, or one index per month, is likely to create index shards that have an optimal size. Shards that are either too small or too large can cause(...) Read More
  • Index Queue Size Is High
    Overview If the Elasticsearch cluster starts to reject indexing requests, there could be a number of causes. Generally it is an indication that one or more nodes cannot keep up with the volume of indexing / delete / update / bulk requests, resulting in a queue building up on that node. Once the indexing queue exceeds the index queue maximum size (as defined here: Threadpools) then the node will start to reject the indexing requests. How to resolve it You should check the state of(...) Read More
  • l
  • Lack of Quorum
    Overview This error is produced when the Elasticsearch cluster does not have a “quorum” of nodes with voting rights to elect a new master node.   Nodes with voting rights may be any nodes with either of the following configurations: node.master: true node.voting_only: true It does not matter whether the node is a dedicated master node or not. Quorum can be lost for one or more of the following reasons: Bad configuration (insufficient nodes configured with voting(...) Read More
  • Loaded Client Nodes
    Overview Sometimes you can observe that the CPU and load on some coordinating nodes (client nodes) is higher than others. This can be caused by applications that are not load balancing correctly across the coordinating nodes, and are making all their HTTP calls to just one or some of the nodes. Possible effects A saturated coordinating node could cause an increase in search or indexing response latency, or an increase in write queue/search queue when the cluster is under load(...) Read More
  • Loaded Data Nodes
    Overview Sometimes you can observe that the CPU and load on some of your data nodes is higher than on others. This can occasionally be caused by applications that are not load balancing correctly across the data nodes, and are making all their HTTP calls to just one or some of the nodes. You should fix this in your application. However it is more frequently caused by “hot” indices being located on just a small number of nodes. A typical example of this would be a logging(...) Read More
  • Loaded Master Nodes
    Overview Sometimes you can observe that the CPU and load on one of your master nodes is higher than on others. This is absolutely normal behavior assuming that the loaded master node is the elected master. Although you need more than one master node (and ideally an odd number), only one of these nodes will be active at any one time. If CPU is very high and the node appears to be overloaded, then this may be cause for concern, since an overloaded master node may cause instability in(...) Read More
  • Low Disk Watermark
    Overview There are various “watermark” thresholds on your Elasticsearch cluster. As the disk fills up on a node, the first threshold to be crossed will be the “low disk watermark”.  Once this threshold is crossed, the Elasticsearch cluster will stop allocating shards to that node.  This means that your cluster may become yellow. How to resolve it Passing this threshold is a warning and you should not delay in taking action before the higher thresholds are reached. Here are(...) Read More
  • m
  • Master Node Not Discovered
    Overview An Elasticsearch cluster requires a master node to be identified in the cluster in order for it to start properly. Furthermore, the election of the master node requires a quorum of more than 50% of the nodes with voting rights. If the cluster lacks a quorum, it will not start. For further information please see this guide on the split-brain problem. Possible causes Incorrect discovery settings If you are getting this warning in the logs: Master(...) Read More
  • Max Shards Per Node Exceeded
    Overview Elasticsearch permits you to set a limit of shards per node, which could result in shards not being allocated once that limit is exceeded. The effect of having unallocated replica shards is that you do not have replica copies of your data, and could lose data if the primary shard is lost or corrupted (cluster yellow). The outcome of having unallocated primary shards is that you are not able to write data to the index at all (cluster red). If you get this warning it is(...) Read More
  • Memory Usage Guide
    Elasticsearch memory requirements The Elasticsearch process is very memory intensive. Elasticsearch uses a JVM (Java Virtual Machine), and close to 50% of the memory available on a node should be allocated to the JVM. The JVM uses memory because the Lucene process needs to know where to look for index values on disk. The other 50% is required for the file system cache which keeps data that is regularly accessed in memory. For a full explanation of JVM management, please see:(...) Read More
  • Misuse of Wildcards
    Overview It is possible to reduce the risk of accidental deletion of indices by preventing the use of wildcards for destructive (deletion) operations. How to fix the issue To check whether this setting exists on the cluster, run: GET /_cluster/settings/action* Look for a setting called: action.destructive_requires_name To apply this setting use: PUT /_cluster/settings { "transient": { "action.destructive_requires_name":true } } To remove this setting(...) Read More
  • n
  • Node Concurrent Recoveries Setting is Too High / Low
    An overview of Node_Concurrent_Recoveries_High and Node_Concurrent_Recoveries_Low.  What it means The node concurrent recoveries setting determines the maximum number of shards that can be recovered at once from each node. Recovering shards requires both disk and network resources, so it is advisable to limit the number of shards that can be recovered from a given node at any one time.  If, on the other hand, the concurrent recoveries setting is too limited and is set too low,(...) Read More
  • Node Disconnected
    Overview There are a number of possible reasons for a node to become disconnected from a cluster. It is important to take into account that node disconnection is often a symptom of some underlying problem which must be investigated and solved.  How to diagnose The best way to understand what is going on in your cluster is to look at monitoring data and look at Elasticsearch logs. Possible causes Excessive garbage collection from JVM If you can see that the JVM heap is not(...) Read More
  • Number of Master Nodes
    Overview Master nodes are responsible for actions such as creating or deleting indices, deciding which shards should be allocated on which nodes, and maintaining and updating the cluster state on all of the nodes. The cluster state includes information about which shards are on which node, index mappings, which nodes are in the cluster and other settings necessary for the cluster to operate.  If you have just one or two master nodes in your Elasticsearch cluster, then the loss of(...) Read More
  • o
  • Object Fields VS. Nested Field Types in Elasticsearch
    Overview  When defining mappings, Elasticsearch will configure the fields that contain an array of objects within them as “object” type. This is fine in many cases, but sometimes the mappings will need to be adjusted. Below we will cover different scenarios and how to choose the correct mapping for every case. Object fields One of the advantages of using document based structures is that its properties can be grouped in a hierarchical shape. This is what we call objects. { (...) Read More
  • Oversharding
    In addition to reading this guide, run the free Elasticsearch Health Check-Up. Get actionable recommendations that can improve performance and prevent incidents (does not require any installation). The check-up includes a specific check on shard sizes and can provide an actionable recommendation specific to your ES deployment. Overview Oversharding is a status that indicates that you have too many shards, and thus they are too small. While there is no minimum limit for an(...) Read More
  • r
  • Register Snapshot Repository
    Overview To backup Elasticsearch indices you need to use the Elasticsearch snapshot mechanism. It is not sufficient to have backups of the individual data directories of the data nodes, because if you were to restore these directories there is no guarantee that the data recovered would form a consistent copy of the cluster. At best, data could be lost, and at worst it could be impossible to restore the cluster entirely. To create and restore snapshots, you need to register a(...) Read More
  • Rejected Search Requests in Elasticsearch - Causes and Solutions
    Rejected Search Requests There are a number of reasons why a search request can be rejected by the cluster. These reasons generally break down into 2 main groups: performance / workload related issues, and mapping or syntax related issues. Performance / workload related issues These are some of the issues that could cause search requests to be rejected: 403 Request throttled due to too many requests; 400 Circuit Breaker Errors; 400 Queue Full Errors. As a general rule, you should(...) Read More
  • s
  • Script Regex is Enabled in Painless Scripts
    Overview Regex (short for regular expression) refers to a technique for searching using a sequence of characters defining a search pattern. For example, gray|grey would find both words gray and grey. Regex must be used with care in painless scripts, since some expressions can be extremely slow and require a great deal of resources to run. For this reason regex is disabled by default in painless scripts. If you decide to enable regex, remember the following best practices:(...) Read More
  • Search Latency
    If you’re suffering from search latency issues, you should run Opster’s Search Log Analyzer. With Opster's Analyzer, you can easily locate slow searches and understand what led to them adding additional load to your system. You'll receive customized recommendations for how to reduce search latency and improve your search performance. The tool is free and takes just 2 minutes to run. Background Opster(...) Read More
  • Search Rejected Queue
    If you're suffering from search related issues or poor search performance, you should run Opster's free Search Log Analyzer to optimize your searches. With Opster's Analyzer, you can easily locate slow searches and understand what led to them adding additional load to your system. You'll receive customized recommendations for how to handle rejected searches and improve your search performance. The tool is free and takes just 2 minutes to(...) Read More
  • Setting Up Zone Awareness for Shard Allocation in Elasticsearch
    What is zone awareness and why is it used? Elasticsearch is a distributed system designed to maintain data availability, even in cases when individual Elasticsearch nodes become unavailable. For this reason, Elasticsearch creates replicas of shards. If one node crashes or becomes unavailable, the replica shard will be promoted to become the primary shard, and a new replica will be created to replace the one that was lost.  By default, Elasticsearch will ensure that a replica shard(...) Read More
  • Shard Allocation is Unbalanced
    Overview Elasticsearch will usually balance the index shards evenly across all active data nodes in the cluster. This is generally a process which happens automatically without any specific user intervention. If this is not happening, it is usually because there are certain settings on the cluster which are preventing shard balancing from occurring as expected. In an extreme case, these settings may result in NO shards being allocated to an individual node. There are two basic(...) Read More
  • Shards Too Large
    Overview It is a best practice that Elasticsearch shard size should not go above 50GB for a single shard.   The limit for shard size is not directly enforced by Elasticsearch. However, if you go above this limit you can find that Elasticsearch is unable to relocate or recover index shards (with the consequence of possible loss of data) or you may reach the Lucene hard limit of 2³¹ documents per index. How to resolve this issue If your shards are too large, then you have 3(...) Read More
  • Slow Indexing in Nodes
    Overview If the indexing queue is high or produces timeouts, this indicates that one or more Elasticsearch nodes cannot keep up with the rate of indexing. Rejected indexing might occur as a result of slow indexing. Elasticsearch will reject indexing requests when the number of queued index requests exceeds the queue size. See the recommendations below to resolve this. Possible causes Suboptimal indexing procedure Apply as many of the indexing tips as you can from the(...) Read More
  • Slow Query Troubleshooting Guide
    Overview How to use slow logs to detect and troubleshoot issues related to slow queries. To read more about slow logs and how to use them, read this guide on how to activate and use Elasticsearch slow logs. Slow queries are often caused by: poorly written or expensive search queries; poorly configured Elasticsearch clusters or indices; saturated CPU, memory, disk and network resources on the cluster; periodic background processes like snapshots or merging segments that(...) Read More
  • Split Brain
    Overview Elasticsearch is a distributed system and may contain one or more nodes in each cluster. For a cluster to become operational, Elasticsearch needs a quorum of a minimum number of master nodes. By default, every node in Elasticsearch is master eligible. These master nodes are responsible for all the cluster coordination tasks to manage the cluster state.  When you create a cluster, no matter how many nodes you are configuring, the quorum is by default set to one. That means(...) Read More
  • Status Red
    Overview A red status indicates that one or more indices do not have allocated primary shards. The causes may be similar to those described in Status Yellow, but certainly indicate that something is not right with the cluster. What it means A red status indicates that not only has the primary shard been lost, but also that a replica has not been promoted to primary in its place. However, just as with yellow status, you should not panic and start firing off commands without(...) Read More
  • Status Yellow
    Overview Yellow status indicates that one or more of the replica shards on the Elasticsearch cluster are not allocated to a node. No need to panic! There are several reasons why a yellow status can be perfectly normal, and in many cases Elasticsearch will recover to green by itself, so the worst thing you can do is start tweaking things without knowing exactly what the cause is. While status is yellow, search and index operations are still available. How to resolve There are(...) Read More
  • w
  • When You Should Transform Your Data Instead of Using Aggregations
    Transform API Starting from version 7.3, Elasticsearch offers the Transform API, which allows you to convert existing Elasticsearch indices into summarized indices. This provides opportunities for new insights and analytics. With this API you can, for example: Pivot your data into entity-centric indices that summarize the behavior of users, sessions or other entities in your data. Find the latest document among all the documents that have a certain unique key. There are at(...) Read More
  • Which Pagination Technique to Use Depending on Your Use Case
    Elasticsearch Pagination Techniques Elasticsearch currently provides 3 different techniques for fetching many results: pagination, Search-After and Scroll. Each use case calls for a different technique. We’ll cover the considerations in this guide.  When you build a user facing search application or an API reading from Elasticsearch, it’s crucial to think about the number of results to be returned per search request.  In many search applications, 10 hits are shown on the first(...) Read More
  • x
  • X-Pack Basic Security is Off
    Overview The growing popularity of Elasticsearch has made both Elasticsearch and Kibana targets for hackers and ransomware, so it is important never to leave your Elasticsearch cluster unprotected. From Elasticsearch version 6.8 onwards, the X-Pack Basic License (free) includes security in the standard Elasticsearch version, while prior to that it was a paid-for feature. How to resolve it Bear in mind that the following steps will inevitably require some cluster down time. If(...) Read More
  • z
  • Zen Discovery Settings
    Overview Zen discovery settings for cluster formation were deprecated in Elasticsearch version 7. If these settings are included in elasticsearch.yml files for version 7 and above, they should be removed to avoid confusion. Reason for the changes Up until version 6 it was possible, using zen discovery mechanism, to inadvertently set unsafe settings which could result in a cluster becoming separated into two separate clusters (the split brain problem). The changes introduced in(...) Read More