Elasticsearch Glossary

  • a
  • Aggregation
    Aggregation in Elasticsearch Overview In Elasticsearch, an aggregation is a collection or the gathering of related things together. The aggregation framework collects all the data based on the documents that match a search request, which helps in building summaries of the data. Unlike Elasticsearch facets, aggregations can be nested. So aggregations can have sub-aggregations that operate on the documents produced by the parent aggregation. Below are the(...) Read More
  • Alias
    Alias in Elasticsearch In Elasticsearch, an alias is a secondary name to refer to one or more indices. Aliases can be created and removed dynamically using the _aliases REST endpoint. What it is used for Aliases are used for multiple purposes such as to search across more than one index with a single name, perform the reindexing process with zero downtime and query data based on predefined filters. Examples Creating an alias on a(...) Read More
  • Autocomplete Guide
    How to avoid critical performance mistakes, why the Elasticsearch default solution doesn't cut it and important implementation considerations. Background All modern-day websites have autocomplete features on their search bar to improve user experience (no one wants to type entire search terms...). It's imperative that the autocomplete be faster than the standard search, as the whole point of autocomplete is to start showing the results while the user is typing. If the latency is(...) Read More
  • b
  • Bootstrap Checks
    Bootstrap Checks in Elasticsearch Overview Elasticsearch has many settings that can cause significant performance problems if not set correctly. To prevent this from happening, Elasticsearch carries out what are known as bootstrap checks to ensure that these important settings have been covered. If any of the checks fail, Elasticsearch will write an error to the logs and will not start. Bootstrap checks are carried out when the network.host setting in network.host:(...) Read More
  • Breaker
    Breaker in Elasticsearch Elasticsearch has the concept of circuit breakers to deal with OutOfMemory errors that cause nodes to crash. When a request reaches Elasticsearch nodes, the circuit breakers first estimate the amount of memory needed to load the required data. They then compare the estimated size with the configured heap size limit. If the estimated size is greater than the heap size, the query is terminated and an exception is thrown to avoid the node loading more than the(...) Read More
  • Bulk
    Bulk in Elasticsearch What it is In Elasticsearch, when using the Bulk API it is possible to perform many write operations in a single API call, which increases the indexing speed. Using the Bulk API is more efficient than sending multiple, separate requests. This can be done for the following four actions: Index, Update, Create and Delete. Examples The bulk request below will index a document, delete another document, and update an existing document. POST _bulk { "index" : {(...) Read More
  • c
  • Cache
    Cache in Elasticsearch What it is Elasticsearch uses three types of cache to improve the efficiency of operation: the node query cache, the shard request cache and the field data cache. How it works The node query cache maintains the results of queries used in a filter context. The results are evicted on a least recently used basis. The shard request cache maintains the results of frequently used queries where size=0, particularly the results of aggregations. (...) Read More
  • Circuit Breakers
    Breakers in Elasticsearch What it is Circuit breakers are used to prevent operations from causing an OutOfMemoryError in Elasticsearch. There are many settings related to circuit breakers, and each of those settings can be configured using the cluster update API. There are many types of circuit breakers, such as parent level circuit breakers, request circuit breakers, field data circuit breakers, script compilation circuit breakers and more. All of these put default limits on(...) Read More
  • Client
    Elasticsearch Client What it is Any application that interfaces with Elasticsearch to index, update or search data, or to monitor and maintain Elasticsearch using various APIs can be considered a client. It is very important to configure clients properly in order to ensure optimum use of Elasticsearch resources. Examples There are many open-source client applications for monitoring, alerting and visualization, such as ElasticHQ, ElastAlert, and Grafana to name a few. On top(...) Read More
  • Cluster
    The Basics An Elasticsearch cluster consists of a number of servers (nodes) working together as one. Clustering is a technology which enables Elasticsearch to scale up to hundreds of nodes that together are able to store many Terabytes of data and respond coherently to large numbers of requests at the same time. Search or indexing requests will usually be load-balanced across the Elasticsearch data nodes, and the node that receives the request will relay requests to other nodes as(...) Read More
  • Cluster Blocks Read-Only
    An Explanation On cluster.blocks.read_only & cluster.blocks.read_only_allow_delete What Does it Mean? A read-only delete block can be applied automatically by the cluster because of a disk space issue, or may be applied manually by an operator to prevent indexing to the Elasticsearch cluster. There are two types of block: cluster.blocks.read_only and cluster.blocks.read_only_allow_delete. A read-only block is typically applied by an operator because some sort of cluster(...) Read More
  • Cluster Concurrent Rebalance High / Low
    An overview of CLUSTER_CONCURRENT_REBALANCE_HIGH and CLUSTER_CONCURRENT_REBALANCE_LOW. What Does it Mean The cluster concurrent rebalance setting determines the maximum number of shards which the cluster can move at any one time to rebalance the distribution of disk space requirements across the nodes. When shards are moved, a shard rebalance is required in order to even out the disk usage requirements across the cluster. This rebalance uses cluster resources. Therefore, it’s(...) Read More
  • d
  • Dangerous Default Settings
    A review of two dangerous default settings in Elasticsearch: Cluster Name and Data Path. Cluster Name is Default ‘elasticsearch’ What Does it Mean? It is important to change the name of the cluster in elasticsearch.yml to avoid Elasticsearch nodes joining the wrong cluster. This is particularly important when development, staging and production environments can find themselves on the same network.  How to Prevent it from Happening If you want to change the name of the(...) Read More
  • Dedicated Client Node / Coordinating and Ingest Nodes
    What Does it Mean? There is some confusion in the use of coordinating node terminology. Client nodes were removed from Elasticsearch after version 2.4 and became Coordinating Nodes. At the same time a new node type, Ingest Node, also appeared. Many clusters do not use dedicated coordinating or ingest nodes, and leave the ingest and coordination functions to the data nodes.  Coordinating Node A coordinating (or client) node is a node which has: node.master: false(...) Read More
  • Dedicated Master Node
    What Does it Mean? Master nodes are responsible for actions such as creating or deleting indices, deciding which shards should be allocated on which nodes, and maintaining the cluster state on all of the nodes. The cluster state includes information about which shards are on which node, index mappings, which nodes are in the cluster and other settings necessary for the cluster to operate. Even though these actions are not resource intensive, it is essential for cluster stability to(...) Read More
  • DELETE
    DELETE Elasticsearch API What is it DELETE is an Elasticsearch API which removes a document from a specific index. This API requires an index name and the document _id to delete the document. Delete a document DELETE /my_index/_doc/1 Notes A delete request returns a 404 error code if the document does not already exist in the index. If you want to delete a set of documents that match a query, you need to use the Delete By Query API. Read More
  • Delete-By-Query
      Delete-By-Query in Elasticsearch   Delete-by-query is an Elasticsearch API, which was introduced in version 5.0 and provides functionality to delete all the documents based on the matching query. In lower versions, users had to install the Delete-By-Query plugin and use the DELETE /_query endpoint for this same use case.   What it is used for This API is used for deleting all the documents from indices based on the matching query. Once the(...) Read More
  • Deprecation
    Deprecation in Elasticsearch What it is Deprecation refers to processes and functions that are in the process of being eliminated and (possibly) replaced by newer ones. Typically, a function will not disappear from one version to the next without warning. Normally this will happen across a number of versions. When you use a deprecated function in intermediate versions, it will continue to work as before, but you will receive warnings that the function in question is intended(...) Read More
  • Discovery
    Discovery in Elasticsearch What it is The process known as discovery occurs when an Elasticsearch node starts, restarts or loses contact with the master node for any reason. In those cases, the node needs to contact other nodes in the cluster to find the existing master node for the cluster or initiate the election of a new master node. How It Works Upon startup, each node looks for other nodes, firstly by contacting the IP addresses of eligible(...) Read More
  • Disk Watermark
    Disk watermarks in Elasticsearch Elasticsearch considers the available disk space before deciding whether to allocate new shards, relocate shards away or put all indices into read-only mode, with a different threshold for each action. The reason is that Elasticsearch indices consist of different shards which are persisted on data nodes, and low disk space can cause issues. Relevant settings are related to cluster.routing.allocation.disk.watermark and have(...) Read More
  • Document
    Document in Elasticsearch Overview A document is simply a JSON document that is stored in an Elasticsearch index. It consists of one or more fields, where each field has its own data type. This field type defines the type of data that can be stored in the field, such as integer, string or object. Documents are schema-free, which means you do not need to specify a schema before indexing a document. When a field is indexed for the first time, its type is decided and(...) Read More
  • e
  • Enable Adaptive Replica Selection
    What Does it Mean? Adaptive replica selection is a process intended to prevent a distressed Elasticsearch node from delaying the response to queries, while reducing the search load on that node. To understand how it works, imagine a situation where a single node is in distress. This could be because of hardware, network or configuration issues, but as a consequence the response times for shards on that node are much longer than the response times from the other nodes. When an(...) Read More
  • Enable Shard Rebalance and Shard Allocation
    What does it mean? Cluster shard rebalancing and allocation are often confused with each other. Cluster shard allocation This refers to the process by which any shard, including new, recovered or rebalanced shards, is allocated to Elasticsearch nodes. Cluster shard allocation may be temporarily disabled during maintenance in order to prevent shards from being relocated to nodes which are being restarted and may temporarily leave the cluster. If cluster shard allocation is NOT(...) Read More
  • Enable X-Pack Basic Security
    What does it mean? The growing popularity of Elasticsearch has made both Elasticsearch and Kibana targets for hackers and ransomware, so it is important never to leave your Elasticsearch cluster unprotected. From Elasticsearch version 6.8 onwards, the X-Pack Basic License (free) includes security in the standard Elasticsearch version, while prior to that it was a paid-for feature. How to resolve Bear in mind that the following steps will inevitably require some cluster down(...) Read More
  • Expensive Queries are Allowed to Run
    What does it mean? By default this setting is set to true. This means that users can use certain query types which require a lot of resources to return results, causing slow results for other users and possibly affecting the stability of the cluster. This setting is particularly relevant in installations where you have no control over the queries being run (e.g. where users have access to Kibana or other graphical interface tools). Setting this to false will prevent running the following(...) Read More
  • f
  • Fielddata
    Fielddata in Elasticsearch What it is In Elasticsearch, the term Fielddata is relevant when doing sorting and aggregations (similar to the SQL GROUP BY, COUNT and AVERAGE functions) on text fields. For performance reasons, there are some rules as to which kinds of fields you can aggregate. You can group by any numeric field, but text fields have to be of keyword type or have fielddata=true, since they don't support doc_values (Doc values are the on-disk inverted index(...) Read More
  • Filter
    Filter in Elasticsearch A filter in Elasticsearch is all about applying some conditions inside the query that are used to narrow down the matching result set. What it is used for When a query is executed, Elasticsearch by default calculates the relevance score of the matching documents. In some cases, however, scores do not need to be calculated, for instance when checking whether a document falls in the range between two given timestamps. For all these Yes/No criteria, a filter clause(...) Read More
  • Flood stage disk watermark
    What Does it Mean? There are various “watermark” thresholds on your Elasticsearch cluster.  As the disk fills up on a node, the first threshold to be crossed will be the “low disk watermark”.  The second threshold will then be the “high disk watermark threshold”.  Finally, the “disk flood stage” will be reached. Once this threshold is passed, the cluster will then block writing to ALL indices that have one shard (primary or replica) on the node which has passed the watermark. Reads(...) Read More
  • h
  • Heap Size Usage and JVM Garbage Collection
    What Does it Mean? The heap size is the amount of RAM allocated to the Java Virtual Machine of an Elasticsearch node.   As a general rule, you should set -Xms and -Xmx to the SAME value, which should be 50% of your total available RAM subject to a maximum of (approximately) 31GB.    A higher heap size will give your node more memory for indexing and search operations. However, your node also requires memory for cache, so using 50% maintains a healthy balance between the two. For(...) Read More
  • Heavy Merges Were Detected
    What Does it Mean? Elasticsearch indices are stored in shards, and each shard in turn stores the data on disk in segments. Elasticsearch processes such as updates and deletions can result in many small segments being created on disk, which Elasticsearch will merge into larger segments in order to optimize disk usage. The merging process uses CPU, memory and disk resources, which can slow down the cluster’s response speed. How to Fix it In general, the Elasticsearch(...) Read More
  • High Disk Watermark
    What Does it Mean? There are various “watermark” thresholds on your Elasticsearch cluster.  As the disk fills up on a node, the first threshold to be crossed will be the “low disk watermark”.  The second threshold will then be the “high disk watermark threshold”.  If you pass this threshold then Elasticsearch will try to relocate shards from the node to other nodes in the cluster. How to Resolve it Passing this threshold is a warning and you should not delay in taking action(...) Read More
  • i
  • Index
    Index in Elasticsearch https://youtu.be/pYPMpQhBrKQ?mute=1&mute=1; What it is In Elasticsearch, an index (indices in plural) can be thought of as a table inside a database that has a schema and can have one or more shards and replicas. An Elasticsearch index is divided into shards and each shard is an instance of a Lucene index. Indices are used to store the documents in dedicated data structures corresponding to the data type of fields. For example, text fields are(...) Read More
  • Indexing
    Elasticsearch Indexing What it is Indexing is the process of adding or updating new documents to an Elasticsearch index. Examples In its simplest form, you can index a document like this: POST /test/_doc { "message": "Opster Rocks Elasticsearch Management" } This will create the index “test” (if it doesn’t already exist) and add a document with the source equal to the body of the POST call.  In this case, the ID will be created automatically. If you repeat this(...) Read More
  • l
  • License
    License in Elasticsearch What is it Elasticsearch offers various licenses with different sets of features, ranging from Open Source and Basic to Gold, Platinum and Enterprise. The default is set to basic. The basic license is a forever free plan but lacks many advanced X-Pack features, such as alerts and advanced security. The following parameter is used inside the elasticsearch.yml file to set a license type: xpack.license.self_generated.type: basic Read More
  • Loaded Client Nodes/Coordinating Nodes
    What Does it Mean Sometimes you can observe that the CPU and load on some coordinating nodes (client nodes) are higher than on others. This can be caused by applications that are not load balancing correctly across the coordinating nodes, and are making all their HTTP calls to just one or some of the nodes. Possible Effects A saturated coordinating node could cause an increase in search or indexing response latency, or an increase in write queue/search queue when the cluster is under(...) Read More
  • Loaded Data Nodes
    What Does it Mean? Sometimes you can observe that the CPU and load on some of your data nodes is higher than on others. This can occasionally be caused by applications that are not load balancing correctly across the data nodes, and are making all their HTTP calls to just one or some of the nodes. You should fix this in your application. However it is more frequently caused by “hot” indices being located on just a small number of nodes.  A typical example of this would be a(...) Read More
  • Loaded Master Nodes
    What Does it Mean Sometimes you can observe that the CPU and load on one of your master nodes is higher than on others. This is absolutely normal behavior assuming that the loaded master node is the elected master. Although you need more than one master node (and ideally an odd number), only one of these nodes will be active at any one time. If CPU is very high and the node appears to be overloaded, then this may be cause for concern, since an overloaded master node may cause(...) Read More
  • Low Disk Watermark
    What Does it Mean? There are various “watermark” thresholds on your Elasticsearch cluster.  As the disk fills up on a node, the first threshold to be crossed will be the “low disk watermark”.  Once this threshold is crossed, the Elasticsearch cluster will stop allocating replica shards to that node.  This means that your cluster may become YELLOW. How to Resolve it Passing this threshold is a warning and you should not delay in taking action before the higher thresholds are(...) Read More
  • Lucene
    Lucene and Elasticsearch What it is  Lucene or Apache Lucene is an open-source Java library used as a search engine. Elasticsearch is built on top of Lucene.  Elasticsearch converts Lucene into a distributed system/search engine for scaling horizontally. Elasticsearch also provides other features like thread-pool, queues, node/cluster monitoring API, data monitoring API, Cluster management, etc. In short , Elasticsearch extends Lucene and provides additional features in addition(...) Read More
  • m
  • Mapping
    Mapping in Elasticsearch Mapping is similar to database schemas that define the properties of each field in the index. These properties may contain the data type of each field and how fields are going to be tokenized and indexed. In addition, the mapping may also contain various advanced level properties for each field to define the options exposed by Lucene and Elasticsearch. You can create a mapping of an index using _mappings REST endpoint. The very first time Elasticsearch(...) Read More
  • Master Node Not Discovered
    What Does it Mean? An Elasticsearch cluster requires a master node to be identified in the cluster in order for it to start properly. Furthermore, the election of the master node requires a quorum of more than 50% of the nodes with voting rights. If the cluster lacks a quorum, it will not start. For further information please see this guide on the split-brain problem. Possible Causes Incorrect Discovery Settings If you are getting this warning in the(...) Read More
  • Max Shards Per Node Exceeded
    What Does it Mean? Elasticsearch permits you to set a limit of shards per node, which could result in shards not being allocated once that limit is exceeded. The effect of having unallocated replica shards is that you do not have replica copies of your data, and could lose data if the primary shard is lost or corrupted (cluster yellow). The outcome of having unallocated primary shards is that you are not able to write data to the index at all (cluster red). If you get this warning it(...) Read More
  • Memory
    Memory in Elasticsearch What is it Memory is one of the most critical resources to monitor in Elasticsearch. Elasticsearch runs on the JVM and uses heap memory for the query cache, the request cache, accessing Lucene segments and storing fielddata for aggregations and sorting. Common problems and important points The most common error that arises in Elasticsearch is the OutOfMemory error. This error occurs when the node is not able to cope with the required heap space. To(...) Read More
  • Metadata
    Metadata in Elasticsearch What it is Metadata in Elasticsearch refers to storing some additional information for each document. This is achieved using the specific metadata fields available in Elasticsearch. The default behavior of some of these metadata fields can be customized during mapping creation. Examples Using _meta meta-field for storing application-specific information with the mapping: PUT /my_index?pretty { "mappings": { "_meta": { (...) Read More
  • Minimum Master Node Higher Than
    An Overview on Errors What Does it Mean? This error is produced when the Elasticsearch cluster does not have a “quorum” of nodes with voting rights to elect a new master node.   Nodes with voting rights may be any nodes with either of the following configurations: node.master: true node.voting_only: true It does not matter whether the node is a dedicated master node or not. Quorum can be lost for one or more of the following reasons: Bad configuration (insufficient(...) Read More
  • n
  • Node
    Nodes in Elasticsearch What it is Simply explained, a node is a single server that is part of a cluster. Each node is assigned one or more roles, which describe the node’s responsibilities and operations. Data nodes store the data and participate in the cluster’s indexing and search capabilities, while master nodes are responsible for managing the cluster activities and storing the cluster state, including the metadata. While it’s possible to run several Node instances of(...) Read More
  • Node Disconnected - Possible Root Causes
    What Does it Mean? There are a number of possible reasons for a node to become disconnected from a cluster. It is important to take into account that node disconnection is often a symptom of some underlying problem which must be investigated and solved. How To Diagnose The best way to understand what is going on in your cluster is to look at monitoring data and look at Elasticsearch logs. Possible Causes Excessive Garbage Collection from JVM If you can see that the JVM(...) Read More
  • p
  • Persistent
    Persistent in Elasticsearch What it is In Elasticsearch, Persistent refers to cluster settings that persist across cluster restarts. This setting is used in Cluster Update API calls. Persistent settings can also be configured in elasticsearch.yml file. Examples ## enable shard routing PUT /_cluster/settings { "persistent" : { "cluster.routing.allocation.enable" : "all" } } ## enable rebalancing of shards PUT /_cluster/settings { "persistent" : { (...) Read More
  • Plugin
    Plugin in Elasticsearch https://youtu.be/lY4-C0ZZyeY What it is A plugin is used to enhance the core functionalities of Elasticsearch. Elasticsearch provides some core plugins as a part of their release installation. In addition to those core plugins, it is possible to write your own custom plugins as well. There are several community plugins available on GitHub for various use cases. Examples: Get all the instructions for the plugin usage sudo(...) Read More
  • q
  • Queue
    Queue in Elasticsearch The queue term in Elasticsearch is used in the context of Thread Pools. Each node of the Elasticsearch cluster holds various thread pools to manage the memory consumption on that node for different types of requests. The queues come with initial default limits based on node size but can be modified dynamically using the _settings REST endpoint. What it is used for Queues are used to hold the pending requests for the corresponding thread pool(...) Read More
  • r
  • Rebalance
    Elasticsearch Rebalance What it is Cluster rebalancing is the process by which an Elasticsearch cluster distributes data across the nodes.  Specifically, it refers to the movement of existing data shards to another node to improve the balance across the nodes (as opposed to the allocation of new shards to nodes).  Usually, it is a completely automatic process that requires no outside intervention. However, there are a number of parameters Elasticsearch uses to regulate this(...) Read More
  • Recovery
    Recovery in Elasticsearch What it is In Elasticsearch, recovery refers to the process of recovering an index/shard when something goes wrong. You can recover an index/shards in many ways such as by re-indexing the data from a  backup/failover cluster to the current one or by restoring from an Elasticsearch snapshot. Alternatively, Elasticsearch may be performing recoveries automatically in some cases, such as when a node restarts or when a node disconnects and connects again. There(...) Read More
  • Refresh
    Refresh in Elasticsearch What it is When indexing data, Elasticsearch requires a “refresh” operation to make indexed information available for search. This means that there is a time delay between indexing and the updated information actually becoming available for the client applications. How it works Index operations occur in memory. The operations are accumulated in a buffer until refreshed, which requires that the buffer be transferred to a newly created lucene(...) Read More
  • Register Snapshot Repository
    What does it mean? To back up Elasticsearch indices you need to use the Elasticsearch snapshot mechanism. It is not sufficient to have backups of the individual data directories of the data nodes, because if you were to restore these directories there is no guarantee that the data recovered would form a consistent copy of the cluster. At best, data could be lost, and at worst it could be impossible to restore the cluster entirely. To create and restore snapshots, you need to register(...) Read More
  • Reindex
    Elasticsearch Reindex What it is Reindex is the concept of copying existing data from a source index to a destination index, which can be inside the same or a different cluster. Elasticsearch has a dedicated endpoint, _reindex, for this purpose. Reindexing is mostly required when updating mappings or settings. Examples Reindex data from a source index to a destination index in the same cluster POST /_reindex?pretty { "source": { "index": "news" }, "dest": { (...) Read More
  • Replica
    Replicas in Elasticsearch What it is In order to understand Replicas in Elasticsearch, you need to have a thorough understanding of shards and their use in Elasticsearch. While each shard contains a single copy of the data, an index can contain multiple copies of the shard. There are thus two types of shards, the primary shard and a copy, or replica. Each replica of the shard is always located on a different node, which ensures access to your data in the event of a node failure.(...) Read More
  • Replication
    Replication in Elasticsearch What it is Replication refers to storing a redundant copy of the data. Starting from version 7.x, Elasticsearch creates one primary shard with a replication factor set to 1. Replicas never get assigned to the same node as their primary shards, which means you should have at least two nodes in the cluster to assign the replicas. If a primary shard goes down, the replica automatically acts as a primary shard. What it is used(...) Read More
  • Repositories
    Repository in Elasticsearch What it is An Elasticsearch snapshot provides a backup mechanism that takes the current state and data in the cluster and saves it to a repository (read the Glossary term Snapshot for more information). The backup process requires a repository to be created first. The repository needs to be registered using the _snapshot endpoint, and multiple repositories per cluster can be created. The following repository types are supported.  Repository(...) Read More
  • Rest-high-level
    Rest-high-level in Elasticsearch Rest-high-level is built on top of the low-level rest-client and is a method of communicating with Elasticsearch based on HTTP REST endpoints. This concept is mainly popular in the context of a Java-based Elasticsearch client. From day one, Elasticsearch has supported transport clients for Java to communicate with Elasticsearch. In version 5.0, a low-level rest-client was released with lots of advantages over the existing transport client such as version(...) Read More
  • Restore
    Restore in Elasticsearch What it is In Elasticsearch, restore refers to a snapshot restore mechanism. A snapshot restore can be carried out once you have already set up the snapshot repository and have taken the snapshot. You can restore the entire cluster from the snapshot or restore an individual index or selected indices. Examples To restore the whole snapshot : POST /_snapshot/my_backup/snapshot-01-11-2019/_restore To restore an individual index : POST(...) Read More
  • Routing
    Routing in Elasticsearch What it is In Elasticsearch, routing refers to document routing. When you index a document, Elasticsearch will determine which shard the document is indexed to. The shard is selected based on the following formula: shard = hash(_routing) % number_of_primary_shards Where the default value of _routing is _id. It is important to know which shard the document is routed to, because Elasticsearch will need to determine where to find that(...) Read More
  • s
  • Scroll
    Scroll in Elasticsearch What it is In Elasticsearch, the concept of scroll comes into play when you have a large set of search results. Large search results are taxing for both the Elasticsearch cluster and the requesting client in terms of memory and processing. The scroll API enables you to take a snapshot of a large number of results from a single search request. Examples To perform a scroll search, you need to add the scroll parameter to the search query and specify how(...) Read More
  • Search
    Search in Elasticsearch What it is Search refers to the searching of documents in an index or multiple indices. The simplest search is just a GET request to the _search endpoint. The search query can be provided either in the query string or through a request body. Examples When looking for any documents in this index, if search parameters are not provided, every document is a hit and by default 10 hits will be returned. GET my_documents/_search A JSON object is returned in(...) Read More
  • Search is Slow in nodesNames
    What Does it Mean Slow search might become a bottleneck and may cause a waiting queue to build. There are a number of possible causes for slow search on particular nodes. Your application is not load balancing properly across all of the data nodes. Search and/or indexing operations are concentrated on specific nodes because of the way shards are allocated. The queries running on certain indices (concentrated on the nodes in question) are slow and need optimization. There are other(...) Read More
  • Search Latency In-Depth Guide
    Opster incorporates deep knowledge learned from some of the best Elasticsearch experts around the world. This troubleshooting guide is based on our very own Elasticsearch expert’s first-hand encounter with a burst of search traffic and focuses on how the correct configuration of primary shards and replicas can help ES handle such cases (explained through a case study). For the basic internals and optimization of shards and replicas please visit our blog post: Elasticsearch Shards and(...) Read More
  • Settings
    Settings in Elasticsearch What it is In Elasticsearch, you can configure cluster-level, node-level and index-level settings. Here we discuss each of them. A. Cluster Wide Settings These settings can be either persistent, meaning they apply across restarts, or transient, meaning they won’t survive a full cluster restart. If a transient setting is reset, the first one of these values that is defined is applied: the persistent setting, the setting in the(...) Read More
  • Shards
    Shards in Elasticsearch https://youtu.be/qj9lO0TdO3k What it is Data in an Elasticsearch index can grow to massive proportions. In order to keep it manageable, it is split into a number of shards. Each Elasticsearch shard is an Apache Lucene index, with each individual Lucene index containing a subset of the documents in the Elasticsearch index. Splitting indices in this way keeps resource usage under control. An Apache Lucene index has a limit of 2,147,483,519(...) Read More
  • Shards Too Large
    What Does it Mean? It is a best practice to keep each Elasticsearch shard at or below 50GB. This limit is not directly enforced by Elasticsearch. However, if you go above it you may find that Elasticsearch is unable to relocate or recover index shards (with the consequence of possible loss of data), or you may reach the Lucene hard limit of roughly 2³¹ documents per Lucene index (that is, per shard). How to Resolve it If your shards are too large, then you have(...) Read More
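As a rough illustration of the 50GB guideline, the arithmetic for a minimum primary shard count can be sketched as follows. `min_primary_shards` is a hypothetical helper, and the calculation assumes data is spread evenly across shards, which a real index may not achieve.

```python
import math


def min_primary_shards(index_size_gb: float, max_shard_size_gb: float = 50.0) -> int:
    """Smallest shard count that keeps each shard at or below the size limit.

    Assumes documents are distributed evenly across shards.
    """
    if index_size_gb < 0 or max_shard_size_gb <= 0:
        raise ValueError("sizes must be non-negative and the limit positive")
    # Even an empty index needs one primary shard.
    return max(1, math.ceil(index_size_gb / max_shard_size_gb))


print(min_primary_shards(120))  # 120 / 50 = 2.4, rounded up to 3
```

The result is a lower bound only; expected growth and query patterns usually matter as much as current size when choosing the final number.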
  • Shards Too Small (Oversharding)
    What Does it Mean? While there is no minimum limit for an Elasticsearch shard size, a large number of shards on an Elasticsearch cluster requires extra resources since the cluster needs to maintain metadata on the state of all the shards in the cluster state. While there is no absolute limit, as a guideline, the ideal shard size is between a few GB and a few tens of GB. You can learn more about scalability in this official guide. This issue should be considered in combination with(...) Read More
  • Slow Indexing in Nodes
    What does it mean? If the indexing queue is long or produces timeouts, this indicates that one or more Elasticsearch nodes cannot keep up with the rate of indexing. Rejected indexing might occur as a result of slow indexing. Elasticsearch will reject indexing requests when the number of queued index requests exceeds the queue size. See the recommendations below to resolve this. Possible Causes Suboptimal Indexing Procedure Apply as many of the indexing tips as you can from(...) Read More
  • Slow Log Search Queries
    Overview The search slow log can be very handy while troubleshooting Elasticsearch performance issues. There are two main operations in Elasticsearch (search and indexing) and both are logged separately. This troubleshooting snippet targets search-heavy systems where search TPS (transactions per second) is much higher than the indexing TPS, such as e-commerce sites or platforms like Medium and Quora. Slow queries are often caused by: Poorly written or expensive search(...) Read More
  • Snapshot
    Snapshots in Elasticsearch What it is An Elasticsearch snapshot is a backup of an index taken from a running cluster. Snapshots are taken incrementally, i.e. when it creates a snapshot of an index, Elasticsearch will not copy any data that is already stored in the Elasticsearch repository as part of an earlier snapshot of the index (for the one that is already completed with no further writes). Therefore you can take snapshots quite often and efficiently. You can restore snapshots(...) Read More
  • Source
    Source in Elasticsearch What it is When a document is sent for indexing, Elasticsearch indexes all the fields into an inverted index, but it also keeps the original JSON document in a special field called _source. Examples Disabling source field in the index PUT /api-logs?pretty { "mappings": { "_source": { "enabled": false } } } Store only selected fields as a part of _source field PUT api-logs { "mappings": { "_source": { (...) Read More
  • Split Brain
    Overview Elasticsearch is a distributed system and may contain one or more nodes in each cluster. For a cluster to become operational, Elasticsearch needs a quorum of a minimum number of master nodes. By default, every node in Elasticsearch is master eligible. These master nodes are responsible for all the cluster coordination tasks to manage the cluster state. When you create a cluster, no matter how many nodes you are configuring, the quorum is by default set to one. That means if a(...) Read More
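To make the quorum idea concrete, here is a small sketch of a majority rule. The majority formula (half the master-eligible nodes, rounded down, plus one) is the commonly recommended quorum for avoiding split brain and is brought in here as an assumption; it is not stated in the excerpt above, and both function names are hypothetical.

```python
def majority_quorum(master_eligible_nodes: int) -> int:
    """Majority of master-eligible nodes: floor(n / 2) + 1."""
    if master_eligible_nodes < 1:
        raise ValueError("a cluster needs at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


def can_elect_master(visible_nodes: int, master_eligible_nodes: int) -> bool:
    """A partition may elect a master only if it can see a majority."""
    return visible_nodes >= majority_quorum(master_eligible_nodes)


# A 3-node cluster split into partitions of 2 and 1 nodes:
print(can_elect_master(2, 3))  # the larger side still has a majority
print(can_elect_master(1, 3))  # the isolated node does not
```

With a quorum of one, both partitions in this example could elect their own master; with a majority quorum, at most one partition can.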
  • Status Red
    A red status indicates that one or more indices do not have allocated primary shards. The causes may be similar to those described in Status Yellow, but certainly indicate that something is not right with the cluster. What does it mean? A red status indicates that not only has the primary shard been lost, but also that a replica has not been promoted to primary in its place. However, just as with yellow status, you should not panic and start firing off commands without finding(...) Read More
  • Status Yellow
    There are several reasons why your Elasticsearch cluster could indicate a yellow status. What does it mean? Yellow status indicates that one or more of the replica shards on the Elasticsearch cluster are not allocated to a node. No need to panic! There are several reasons why a yellow status can be perfectly normal, and in many cases Elasticsearch will recover to green by itself, so the worst thing you can do is start tweaking things without knowing exactly what the cause is.(...) Read More
  • t
  • Task
    Task in Elasticsearch What it is A task is equivalent to an Elasticsearch operation, which can be any request performed on an Elasticsearch cluster, for example a delete by query request or a search request. Elasticsearch provides a dedicated Task API for task management, which includes various actions, from retrieving the status of currently running tasks to canceling any long-running task. Examples Get all currently running tasks on all nodes of the(...) Read More
  • Template
    Template in Elasticsearch What it is A template in Elasticsearch falls into one of the two following categories and is indexed inside Elasticsearch using its dedicated endpoint: Index Templates, which are a way to define a set of rules including index settings, mappings and an index pattern. The template is applied automatically whenever a new index is created with the matching pattern. Templates are also used to dynamically apply custom mapping for the fields which are not(...) Read More
  • The Bootstrap Memory Lock Setting is Set to False
    What Does it Mean Elasticsearch performance can be heavily penalised if the node is allowed to swap memory to disk. Elasticsearch can be configured to automatically prevent memory swapping on its host machine by adding the bootstrap.memory_lock: true setting to elasticsearch.yml. If bootstrap checks are enabled, Elasticsearch will not start if memory swapping is not disabled. You can learn more about bootstrap checks here: Bootstraps Check in Elasticsearch - A Detailed Guide With(...) Read More
  • Threadpool
    Elasticsearch Threadpool What it is Elasticsearch uses thread pools to manage how requests are processed and to optimize the use of resources on each node in the cluster. What it is used for The main thread pools are for search, get and write, but there are a number of others which you can see by running: GET /_cat/thread_pool/?v&h=id,name,active,rejected,completed,size,type&pretty You can see by running the above command that each node has a number of different thread(...) Read More
  • Threshold
    Threshold in Elasticsearch What it is Elasticsearch uses several parameters to enable it to manage hard disk storage across the cluster. What it’s used for Elasticsearch will actively try to relocate shards away from nodes which exceed the disk watermark high threshold. Elasticsearch will NOT locate new shards or relocate shards on to nodes which exceed the disk watermark low threshold. Elasticsearch will prevent all writes to an index which has any shard on a node that(...) Read More
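The low and high watermark rules described above can be sketched as a simple decision function. This is a toy model, not Elasticsearch's allocation logic: `allocation_decision` and its return labels are hypothetical, and the 85% and 90% values are illustrative defaults assumed for the example.

```python
def allocation_decision(
    disk_used_percent: float,
    low_watermark: float = 85.0,   # assumed illustrative value
    high_watermark: float = 90.0,  # assumed illustrative value
) -> str:
    """Classify a node by disk usage against the low and high watermarks."""
    if not 0.0 <= disk_used_percent <= 100.0:
        raise ValueError("disk usage must be a percentage between 0 and 100")
    if disk_used_percent > high_watermark:
        # Above high: shards are actively moved off this node.
        return "relocate_shards_away"
    if disk_used_percent > low_watermark:
        # Above low: existing shards stay, but no shards are placed here.
        return "no_new_shards"
    return "ok"


for usage in (70.0, 87.5, 93.0):
    print(usage, "->", allocation_decision(usage))
```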
  • u
  • Upgrade
    Upgrade in Elasticsearch What is it Upgrade refers to migrating your Elasticsearch version to a newer version. An upgrade of an existing cluster can be done in two ways: through a rolling upgrade and through a full cluster restart. The benefit of a rolling upgrade is having zero downtime. Common problems and important points The major problem with upgrades is version incompatibility between upgrades. Elasticsearch supports rolling upgrades only between minor versions. You(...) Read More
  • Use of Wildcards Can Accidentally Cause Index Deletion
    What Does it Mean? It is possible to reduce the risk of accidental deletion of indices by preventing the use of wildcards for destructive (delete) operations. How to Fix it To check whether this setting exists on the cluster, run: GET /_cluster/settings/action* Look for a setting called: action.destructive_requires_name To apply this setting use: PUT /_cluster/settings { "transient": { "action.destructive_requires_name":true } } To remove this setting(...) Read More
  • v
  • Version
    Version in Elasticsearch A version corresponds to the Elasticsearch built-in tracking system that tracks the changes made to each document. When a document is indexed for the first time, it is assigned version 1 in the _version key. On each subsequent change to the same document, the _version is incremented by 1 with every index, update or delete API call. What it is used for A version is used to handle the concurrency issues in Elasticsearch which come into(...) Read More
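The version counter described above can be modelled with a small in-memory sketch. Everything here is hypothetical and for illustration only: `VersionedStore`, `VersionConflictError` and the `expected_version` parameter are not Elasticsearch API names, and the conflict check is just one simple way a version number can be used to detect concurrent writes.

```python
class VersionConflictError(Exception):
    """Raised when the caller's expected version is stale."""


class VersionedStore:
    """Toy document store that increments a version on every write."""

    def __init__(self):
        # Maps doc_id to a (version, source) pair.
        self._docs = {}

    def write(self, doc_id, source, expected_version=None):
        current_version = self._docs[doc_id][0] if doc_id in self._docs else 0
        if expected_version is not None and expected_version != current_version:
            raise VersionConflictError(
                f"expected version {expected_version}, found {current_version}"
            )
        new_version = current_version + 1  # first write yields version 1
        self._docs[doc_id] = (new_version, source)
        return new_version


store = VersionedStore()
first = store.write("1", {"title": "draft"})
second = store.write("1", {"title": "final"})
print(first, second)
```

A writer that read the document at version 1 and then passes expected_version=1 after another write has landed would get a VersionConflictError instead of silently overwriting the newer data.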
  • z
  • ZEN_DISCOVERY_SETTINGS_NOT_USED
    What Does it Mean? Zen discovery settings for cluster formation were deprecated in Elasticsearch version 7. If these settings are included in elasticsearch.yml files for version 7 and above, they should be removed to avoid confusion. Reason for the Changes Up until version 6 it was possible, using the zen discovery mechanism, to inadvertently apply unsafe settings which could result in a cluster splitting into two separate clusters (the split brain problem). The changes(...) Read More