Elasticsearch Glossary

  • a
  • Aggregation
    Aggregations in Elasticsearch What is an Elasticsearch aggregation? The aggregations framework is a powerful tool built into every Elasticsearch deployment. In Elasticsearch, an aggregation is a collection, or gathering together, of related things. The aggregation framework collects data from the documents that match a search request, which helps in building summaries of the data. With aggregations you can not only search your data, but also take it a step further and extract(...) Read More
  • Alias
    Overview In Elasticsearch, an alias is a secondary name to refer to one or more indices. Aliases can be created and removed dynamically using the _aliases REST endpoint. What it is used for Aliases are used for multiple purposes, such as searching across more than one index with a single name, performing the reindexing process with zero downtime, and querying data based on predefined filters. Examples Creating an alias on a single index: POST /_aliases?pretty { "actions": [ { (...) Read More
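    As a rough illustration of the zero-downtime reindexing use case mentioned above, the Python sketch below builds a request body for the _aliases endpoint that moves an alias from one index to another in a single call. The alias and index names are hypothetical, and the sketch only constructs the JSON; sending it is left to whichever client you use.

```python
import json


def swap_alias_actions(alias, old_index, new_index):
    """Build an _aliases body that repoints `alias` from old_index to new_index."""
    return {
        "actions": [
            # Detach the alias from the index being retired.
            {"remove": {"index": old_index, "alias": alias}},
            # Attach the alias to the freshly built index.
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical names, chosen only for illustration.
body = swap_alias_actions("news", "news_v1", "news_v2")
print(json.dumps(body, indent=2))
```

    Because both actions travel in one request, clients that query the alias do not need to change when the underlying index does.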
  • b
  • Bulk
    Overview In Elasticsearch, when using the Bulk API it is possible to perform many write operations in a single API call, which increases the indexing speed. Using the Bulk API is more efficient than sending multiple separate requests. This can be done for the following four actions: Index, Update, Create, Delete. Examples The bulk request below will index a document, delete another document, and update an existing document. POST _bulk { "index" : { "_index" : "myindex", "_id" :(...) Read More
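    The Bulk API body is newline-delimited JSON: one action line per operation, followed by a payload line for every action except delete. The Python sketch below assembles such a body for an index, a delete, and an update. The helper name, document IDs, and field values are assumptions made for illustration; only the string is built, nothing is sent to a cluster.

```python
import json


def build_bulk_body(operations):
    """Serialize (action, metadata, payload) tuples into an NDJSON bulk body."""
    lines = []
    for action, metadata, payload in operations:
        # Action line, e.g. {"index": {"_index": "myindex", "_id": "1"}}
        lines.append(json.dumps({action: metadata}))
        # Delete actions carry no payload line.
        if payload is not None:
            lines.append(json.dumps(payload))
    # The bulk body must end with a trailing newline.
    return "\n".join(lines) + "\n"


# Hypothetical documents, for illustration only.
operations = [
    ("index", {"_index": "myindex", "_id": "1"}, {"title": "first document"}),
    ("delete", {"_index": "myindex", "_id": "2"}, None),
    ("update", {"_index": "myindex", "_id": "3"}, {"doc": {"title": "updated"}}),
]
body = build_bulk_body(operations)
print(body)
```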
  • c
  • Cache
    Overview Elasticsearch uses three types of caches to improve the efficiency of operation: the node query cache, the shard request cache, and the field data cache. How they work The node query cache maintains the results of queries used in a filter context. The results are evicted on a least recently used basis. The shard request cache maintains the results of frequently used queries where size=0, particularly the results of aggregations. This cache is particularly relevant for logging use cases where(...) Read More
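    The least-recently-used eviction mentioned above can be sketched in a few lines of Python. This is a generic LRU cache built on the standard library, not Elasticsearch's implementation; the class name, capacity, and keys are all hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Toy cache that evicts the least recently used entry when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()

    def get(self, key):
        if key not in self._entries:
            return None
        # A hit makes the entry the most recently used.
        self._entries.move_to_end(key)
        return self._entries[key]

    def put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self.capacity:
            # Drop the entry at the least recently used end.
            self._entries.popitem(last=False)


cache = LRUCache(capacity=2)
cache.put("filter_a", "result_a")
cache.put("filter_b", "result_b")
cache.get("filter_a")              # touch filter_a so filter_b becomes the oldest
cache.put("filter_c", "result_c")  # exceeds capacity, so filter_b is evicted
```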
  • Circuit Breakers
    Overview Elasticsearch uses circuit breakers to prevent the OutOfMemory errors that cause nodes to crash. When a request reaches Elasticsearch nodes, the circuit breakers first estimate the amount of memory needed to load the required data. They then compare the estimated size with the configured heap size limit. If the estimated size is greater than that limit, the query is terminated and an exception is thrown to avoid the node loading more than the available heap(...) Read More
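    The estimate-then-compare check described above might be sketched as follows. This is a simplified illustration in Python, not Elasticsearch's actual breaker logic; the exception name, function name, and byte values are hypothetical.

```python
class CircuitBreakerTripped(Exception):
    """Hypothetical exception raised when a request is rejected."""


def check_request(estimated_bytes, limit_bytes):
    """Reject the request when its estimated memory exceeds the limit."""
    if estimated_bytes > limit_bytes:
        raise CircuitBreakerTripped(
            f"estimated {estimated_bytes} bytes exceeds limit of {limit_bytes} bytes"
        )
    # Under the limit: the request is allowed to proceed.
    return estimated_bytes


limit = 1_000_000  # assumed limit, for illustration only
accepted = check_request(250_000, limit)

try:
    check_request(2_000_000, limit)
    tripped = False
except CircuitBreakerTripped:
    tripped = True
```

    The point of rejecting early is that a failed request is cheaper than a crashed node.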
  • Client
    Overview Any application that interfaces with Elasticsearch to index, update or search data, or to monitor and maintain Elasticsearch using various APIs, can be considered a client. It is very important to configure clients properly in order to ensure optimum use of Elasticsearch resources. Examples There are many open-source client applications for monitoring, alerting and visualization, such as ElasticHQ, ElastAlert, and Grafana to name a few. On top of Elastic client(...) Read More
  • Cluster
    Overview An Elasticsearch cluster consists of a number of servers (nodes) working together as one. Clustering is a technology which enables Elasticsearch to scale up to hundreds of nodes that together are able to store many terabytes of data and respond coherently to large numbers of requests at the same time. Search or indexing requests will usually be load-balanced across the Elasticsearch data nodes, and the node that receives the request will relay requests to other nodes as(...) Read More
  • d
  • DELETE
    Overview DELETE is an Elasticsearch API which removes a document from a specific index. This API requires an index name and the document's _id to delete the document. Delete a document DELETE /my_index/_doc/1 Notes A delete request returns a 404 error code if the document does not exist in the index. If you want to delete a set of documents that matches a query, you need to use the delete by query API. Read More
  • Delete-By-Query
    Overview Delete-by-query is an Elasticsearch API, which was introduced in version 5.0 and provides functionality to delete all documents that match the provided query. In lower versions, users had to install the Delete-By-Query plugin and use the DELETE /_query endpoint for this same use case. What it is used for This API is used for deleting all the documents from indices based on a query. Once the query is executed, Elasticsearch runs the process in the background to delete all(...) Read More
  • Deprecation
    Overview Deprecation refers to processes and functions that are in the process of being eliminated and (possibly) replaced by newer ones. Typically, a function will not disappear from one version to the next without warning. Normally this will happen across a number of versions. When you use a deprecated function in intermediate versions, it will continue to work as before, but you will receive warnings that the function in question is intended to disappear in the future. How it(...) Read More
  • Discovery
    Overview The process known as discovery occurs when an Elasticsearch node starts, restarts or loses contact with the master node for any reason. In those cases, the node needs to contact other nodes in the cluster to find any existing master node or initiate the election of a new master node.  How it works Upon startup, each node looks for other nodes, firstly by contacting the IP addresses of eligible master nodes held in the previous cluster state.  If they are not available,(...) Read More
  • Disk Watermark
    Overview There are various “watermark” thresholds on your Elasticsearch cluster. As the disk fills up on a node, the first threshold to be crossed will be the “low disk watermark”. The second threshold will then be the “high disk watermark threshold”. Finally, the “disk flood stage” will be reached. Once this threshold is passed, the cluster will then block writing to ALL indices that have one shard (primary or replica) on the node which has passed the watermark. Reads (searches) will(...) Read More
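    The order in which the thresholds are crossed can be sketched as a small Python function. The default percentages used here (85, 90 and 95) are assumed from Elasticsearch's documented defaults and are configurable on a real cluster; the function name and return labels are hypothetical, and treating a value exactly at a threshold as having crossed it is a simplification.

```python
def disk_watermark_stage(used_percent, low=85.0, high=90.0, flood_stage=95.0):
    """Return which disk watermark, if any, a node's disk usage has reached."""
    # Check the strictest threshold first so the highest stage wins.
    if used_percent >= flood_stage:
        return "flood_stage"
    if used_percent >= high:
        return "high"
    if used_percent >= low:
        return "low"
    return "ok"


for usage in (70.0, 86.5, 91.0, 97.2):
    print(usage, disk_watermark_stage(usage))
```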
  • Document
    Document in Elasticsearch What is an Elasticsearch document? While an SQL database has rows of data stored in tables, Elasticsearch stores data as multiple documents inside an index. This is where the analogy must end however, since the way that Elasticsearch treats documents and indices differs significantly from a relational database. For example, documents could be: products in an e-commerce index, log lines in a data logging application, invoice lines in an invoicing(...) Read More
  • f
  • Fielddata
    Overview In Elasticsearch the term fielddata is relevant when sorting and doing aggregations (similar to SQL GROUP BY COUNT and AVERAGE functions) on text fields. For performance reasons, there are some rules as to the kinds of fields that can be aggregated. You can group by any numeric field, but text fields must either be of keyword type or have fielddata=true, since text fields don't support doc_values (doc values are the on-disk data structure, built at document(...) Read More
  • File Descriptors
    What it means File descriptors are required so that the Elasticsearch process can keep track of all the files it has open at any given time as well as all network connections to other nodes. Running out of file descriptors would result in the Elasticsearch process not being able to keep track of the files it has open or not being able to open new files or socket connections when it needs to, and will most probably lead to data loss. The Elasticsearch process should be permitted(...) Read More
  • Filter
    Overview A filter in Elasticsearch applies conditions inside the query that narrow down the matching result set. What it is used for When a query is executed, Elasticsearch by default calculates the relevance score of the matching documents. In some cases, however, scores do not need to be calculated, for instance when checking whether a document falls within the range of two given timestamps. For all these Yes/No criteria, a filter clause is used.(...) Read More
  • i
  • Index
    Overview In Elasticsearch, an index (plural: indices) contains a schema and can have one or more shards and replicas. An Elasticsearch index is divided into shards and each shard is an instance of a Lucene index. Indices are used to store the documents in dedicated data structures corresponding to the data type of fields. For example, text fields are stored inside an inverted index whereas numeric and geo fields are stored inside BKD trees. Examples Create index The(...) Read More
  • Indexing
    Overview Indexing is the process of adding documents to and updating documents on an Elasticsearch index. Examples In its simplest form, you can index a document like this: POST /test/_doc { "message": "Opster Rocks Elasticsearch Management" } This will create the index “test” (if it doesn’t already exist) and add a document with the source equal to the body of the POST call.  In this case, the ID will be created automatically. If you repeat this command, a second(...) Read More
  • l
  • Lucene
    Overview Lucene or Apache Lucene is an open-source Java library used as a search engine. Elasticsearch is built on top of Lucene.  Elasticsearch converts Lucene into a distributed system/search engine for scaling horizontally. Elasticsearch also provides other features like thread-pool, queues, node/cluster monitoring API, data monitoring API, Cluster management, etc. In short, Elasticsearch extends Lucene and provides additional features beyond it. Elasticsearch hosts data on(...) Read More
  • m
  • Mapping
    Overview A mapping is similar to a database schema: it defines the properties of each field in the index. These properties may contain the data type of each field and how fields are going to be tokenized and indexed. In addition, the mapping may also contain various advanced level properties for each field to define the options exposed by Lucene and Elasticsearch. You can create a mapping of an index using the _mappings REST endpoint. The very first time Elasticsearch finds a new field(...) Read More
  • Metadata
    Overview Metadata in Elasticsearch refers to additional information stored for each document. This is achieved using the specific metadata fields available in Elasticsearch. The default behavior of some of these metadata fields can be customized during mapping creation. Examples Using _meta meta-field for storing application-specific information with the mapping: PUT /my_index?pretty { "mappings": { "_meta": { "domain": "security", "release_information": { (...) Read More
  • n
  • Nodes
    Overview To put it simply, a node is a single server that is part of a cluster. Each node is assigned one or more roles, which describe the node's responsibility and operations. Data nodes store the data, and participate in the cluster’s indexing and search capabilities, while master nodes are responsible for managing the cluster's activities and storing the cluster state, including the metadata. While it is possible to run several node instances of Elasticsearch on the same(...) Read More
  • p
  • Persistent
    Overview In Elasticsearch, persistent refers to cluster settings that persist across cluster restarts. This setting is used in Cluster Update API calls. Persistent settings can also be configured in the elasticsearch.yml file. Examples ## enable shard routing PUT /_cluster/settings { "persistent" : { "cluster.routing.allocation.enable" : "all" } } ## enable rebalancing of shards PUT /_cluster/settings { "persistent" : { (...) Read More
  • Plugins
    Overview A plugin is used to enhance the core functionalities of Elasticsearch. Elasticsearch provides some core plugins as a part of their release installation. In addition to those core plugins, it is possible to write your own custom plugins as well. There are several community plugins available on GitHub for various use cases. Examples Get all of the instructions for the plugin: sudo bin/elasticsearch-plugin -h Installing the S3 plugin for storing Elasticsearch(...) Read More
  • q
  • Queue
    Overview In Elasticsearch, the term queue is used in the context of thread pools. Each node of the Elasticsearch cluster holds various thread pools to manage the memory consumption on that node for different types of requests. The queues come with default limits based on node size, but these can be modified dynamically using the _settings REST endpoint. What it is used for Queues are used to hold the pending requests for the corresponding thread pool instead of requests being(...) Read More
  • r
  • Rebalance
    Overview Cluster rebalancing is the process by which an Elasticsearch cluster distributes data across the nodes. Specifically, it refers to the movement of existing data shards to another node to improve the balance across the nodes (as opposed to the allocation of new shards to nodes). Usually, it is a completely automatic process that requires no outside intervention. However, there are a number of parameters Elasticsearch uses to regulate this process. Examples The command(...) Read More
  • Recovery
    Overview In Elasticsearch, recovery refers to the process of recovering an index or shard when something goes wrong. There are many ways to recover an index or shard, such as by re-indexing the data from a backup / failover cluster to the current one, or by restoring from an Elasticsearch snapshot. Alternatively, Elasticsearch performs recoveries automatically, such as when a node restarts or disconnects and connects again. There is an API to check the updated status of index / shard(...) Read More
  • Refresh
    Overview When indexing data, Elasticsearch requires a “refresh” operation to make indexed information available for search. This means that there is a time delay between indexing and the updated information actually becoming available for the client applications. How it works Index operations occur in memory. The operations are accumulated in a buffer until refreshed, which requires that the buffer itself be transferred to a newly created Lucene segment. Refresh happens by(...) Read More
  • Reindex
    Overview Reindexing means copying existing data from a source index to a destination index, which can be inside the same or a different cluster. Elasticsearch has a dedicated endpoint, _reindex, for this purpose. Reindexing is mostly required when updating mappings or settings. Examples Reindex data from a source index to destination index in the same cluster POST /_reindex?pretty { "source": { "index": "news" }, "dest": { "index": "news_v2" (...) Read More
  • Replica
    Overview In order to understand replicas in Elasticsearch, you need to have a thorough understanding of shards and their use in Elasticsearch. While each shard contains a single copy of the data, an index can contain multiple copies of the shard. There are thus two types of shards, the primary shard and a replica, or copy. Each replica is located on a different node, which ensures access to your data in the event of a node failure. In addition to providing redundancy and their role in(...) Read More
  • Replication
    Overview Replication refers to storing a redundant copy of the data. Starting from version 7.x, Elasticsearch creates one primary shard with a replication factor set to 1. Replicas never get assigned to the same node on which primary shards are assigned, which means you should have at least two nodes in the cluster to assign the replicas. If a primary shard goes down, the replica automatically acts as a primary shard. What it is used for Replicas are used to provide high(...) Read More
  • Repository
    Overview An Elasticsearch snapshot provides a backup mechanism that takes the current state and data in the cluster and saves it to a repository (read snapshot for more information). The backup process requires a repository to be created first. The repository needs to be registered using the _snapshot endpoint, and multiple repositories can be created per cluster. The following repository types are supported:  Repository types Repository typeConfiguration typeShared file(...) Read More
  • Rest-high-level
    Overview Rest-high-level is built on top of the low-level rest-client and is a method of communicating with Elasticsearch based on HTTP REST endpoints. This concept is mostly relevant in the context of a Java-based Elasticsearch client. From day one, Elasticsearch has supported transport clients for Java to communicate with Elasticsearch. In version 5.0, a low-level rest-client was released with lots of advantages over the existing transport client such as version independence, increased(...) Read More
  • Restore
    Overview In Elasticsearch, restore refers to the snapshot restore mechanism, which returns indices or clusters to a previous, saved state. You can restore the entire cluster from the snapshot or restore an individual index or selected indices. Examples To restore the whole snapshot: POST /_snapshot/my_backup/snapshot-01-11-2019/_restore To restore an individual index: POST /_snapshot/my_backup/snapshot-01-11-2019/_restore { "indices": "my_index" } Notes If you(...) Read More
  • Routing
    Overview In Elasticsearch, routing refers to document routing. When you index a document, Elasticsearch will determine which shard the document should be routed to for indexing. The shard is selected based on the following formula: shard = hash(_routing) % number_of_primary_shards, where the default value of _routing is _id. It is important to know which shard the document is routed to, because Elasticsearch will need to determine where to find that document later on for(...) Read More
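    The routing formula above can be tried out with a short Python sketch. It uses zlib.crc32 purely as a stand-in hash so the example stays within the standard library; Elasticsearch uses its own hash function, so the shard numbers produced here will not match a real cluster. The function name and document IDs are hypothetical.

```python
import zlib


def pick_shard(routing, number_of_primary_shards):
    """Apply shard = hash(_routing) % number_of_primary_shards."""
    # Stand-in hash (assumption): CRC32 of the UTF-8 routing value.
    digest = zlib.crc32(routing.encode("utf-8"))
    return digest % number_of_primary_shards


# By default the routing value is the document's _id.
for doc_id in ("doc-1", "doc-2", "doc-3"):
    print(doc_id, "->", pick_shard(doc_id, 5))
```

    The same routing value always lands on the same shard as long as the number of primary shards stays the same, which is what lets a later lookup find the document.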
  • s
  • Scroll
    Overview In Elasticsearch, the concept of scroll comes into play when you have a large set of search results. Large search results are taxing for both the Elasticsearch cluster and the requesting client in terms of memory and processing. The scroll API enables you to take a snapshot of a large number of results from a single search request. Examples To perform a scroll search, you need to add the scroll parameter to a search query and specify how long Elasticsearch should(...) Read More
  • Search
    Overview Search refers to the searching of documents in an index or multiple indices. The simplest search is just a GET API request to the _search endpoint. The search query can be provided either in the query string or through a request body. Examples When looking for any documents in this index, if search parameters are not provided, every document is a hit and by default 10 hits will be returned. GET my_documents/_search A JSON object is returned in response to a search query.(...) Read More
  • Settings
    Settings in Elasticsearch In Elasticsearch, you can configure cluster-level settings, node-level settings and index level settings. Here is a quick rundown of each level. A. Cluster settings These settings can either be: Persistent, meaning they apply across restarts, or Transient, meaning they won’t survive a full cluster restart. If a transient setting is reset, the first one of these values that is defined is applied: The persistent setting The setting in the(...) Read More
  • Shards
    Overview Data in an Elasticsearch index can grow to massive proportions. In order to keep it manageable, it is split into a number of shards. Each Elasticsearch shard is an Apache Lucene index, with each individual Lucene index containing a subset of the documents in the Elasticsearch index. Splitting indices in this way keeps resource usage under control. An Apache Lucene index has a limit of 2,147,483,519 documents. Examples The number of shards is set when an index is(...) Read More
  • Snapshot
    Overview An Elasticsearch snapshot is a backup of an index taken from a running cluster. Snapshots are taken incrementally. This means that when Elasticsearch creates a snapshot of an index, it will not copy any data that was already backed up in an earlier snapshot of the index (unless it was changed). Therefore, it is recommended to take snapshots often. You can restore snapshots into a running cluster via the restore API. Snapshots can only be restored to versions of(...) Read More
  • Source
    Overview When a document is sent for indexing, Elasticsearch indexes all the fields in the format of an inverted index, but it also keeps the original JSON document in a special field called _source.  Examples Disabling source field in the index: PUT /api-logs?pretty { "mappings": { "_source": { "enabled": false } } } Store only selected fields as a part of _source field: PUT api-logs { "mappings": { "_source": { "includes": [ (...) Read More
  • t
  • Task
    Overview A task is an Elasticsearch operation, which can be any request performed on an Elasticsearch cluster, such as a delete by query request, a search request and so on. Elasticsearch provides a dedicated Task API for the task management which includes various actions, from retrieving the status of current running tasks to canceling any long running task. Examples Get all currently running tasks on all nodes of the cluster Apart from other information, the response of the(...) Read More
  • Template
    Overview A template in Elasticsearch falls into one of the two following categories and is indexed inside Elasticsearch using its dedicated endpoint:  Index templates, which are a way to define a set of rules including index settings, mappings and an index pattern. The template is applied automatically whenever a new index is created with the matching pattern. Templates are also used to dynamically apply custom mapping for the fields which are not predefined inside existing(...) Read More
  • Threadpool
    Overview Elasticsearch uses threadpools to manage how requests are processed and to optimize the use of resources on each node in the cluster. What it's used for The main threadpools are for search, get and write, but there are a number of others which you can see by running:  GET /_cat/thread_pool/?v&h=id,name,active,rejected,completed,size,type&pretty You can see by running the above command that each node has a number of different thread pools, what the size and type of(...) Read More
  • Threshold
    Overview Elasticsearch uses several parameters to enable it to manage hard disk storage across the cluster. What it’s used for Elasticsearch will actively try to relocate shards away from nodes which exceed the disk watermark high threshold. Elasticsearch will NOT allocate new shards or relocate shards on to nodes which exceed the disk watermark low threshold. Elasticsearch will prevent all writes to an index which has any shard on a node that exceeds the disk.watermark.flood_stage(...) Read More
  • u
  • Upgrade
    Overview Upgrade refers to migrating your Elasticsearch version to a newer version. An upgrade of an existing cluster can be done in two ways: through a rolling upgrade and through a full cluster restart. The benefit of a rolling upgrade is having zero downtime. Common problems and important points The major problem with upgrades is version incompatibility. Elasticsearch supports rolling upgrades only between minor versions. You need to make sure to go through the official(...) Read More
  • v
  • Version
    Overview A version corresponds to the Elasticsearch built-in tracking system that tracks the changes made to each document. When a document is indexed for the first time, it is assigned version 1 using the _version key. When the same document gets a subsequent update, the _version is incremented by 1 with every index, update or delete API call. What it is used for A version is used to handle the concurrency issues in Elasticsearch which come into play during simultaneous(...) Read More
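    The version counting described above can be mimicked with an in-memory stand-in. The Python sketch below is not how Elasticsearch stores documents; the class name, method name, and document contents are hypothetical, and it models only the rule that the first write gets version 1 and each later write to the same ID adds 1.

```python
class VersionedStore:
    """Toy document store that tracks a per-document version counter."""

    def __init__(self):
        # Maps document ID to a (version, source) pair.
        self._docs = {}

    def index(self, doc_id, source):
        """Store the document and return its new version number."""
        previous_version = self._docs.get(doc_id, (0, None))[0]
        version = previous_version + 1
        self._docs[doc_id] = (version, source)
        return version


store = VersionedStore()
first = store.index("1", {"message": "hello"})
second = store.index("1", {"message": "hello again"})
other = store.index("2", {"message": "a different document"})
print(first, second, other)
```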