Elasticsearch Glossary

  • a

  • Aggregation
    Aggregation in Elasticsearch. Overview: In Elasticsearch, an aggregation gathers related data together. The aggregation framework collects data from the documents that match a search request and builds summaries of that data. Unlike the facets of earlier Elasticsearch versions, aggregations can be nested. An aggregation can therefore have sub-aggregations that operate on the documents in the buckets generated by the parent aggregation. Below are the different types of(...)
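Nesting can be pictured as bucketing followed by a per-bucket metric. The Python sketch below first shows a request body that pairs a standard "terms" bucket aggregation with an "avg" sub-aggregation. It then simulates the result in memory. The documents and field names are made up, and "category" is assumed to be a keyword field. The simulation only illustrates the idea and does not reproduce Elasticsearch's execution; for example, it does not sort buckets by doc_count.

```python
from collections import defaultdict

# Hypothetical documents that matched a search request.
docs = [
    {"category": "books", "price": 10.0},
    {"category": "books", "price": 20.0},
    {"category": "music", "price": 5.0},
]

# Request body using the "terms" and "avg" aggregation types
# (field names are illustrative; "category" assumed to be a keyword field).
request_body = {
    "size": 0,
    "aggs": {
        "by_category": {
            "terms": {"field": "category"},
            "aggs": {"avg_price": {"avg": {"field": "price"}}},
        }
    },
}

def terms_with_avg(docs, term_field, metric_field):
    """Group docs into buckets by term_field (parent aggregation), then
    average metric_field inside each bucket (sub-aggregation)."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[term_field]].append(doc[metric_field])
    return {
        key: {"doc_count": len(values), "avg": sum(values) / len(values)}
        for key, values in buckets.items()
    }

print(terms_with_avg(docs, "category", "price"))
# {'books': {'doc_count': 2, 'avg': 15.0}, 'music': {'doc_count': 1, 'avg': 5.0}}
```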

  • b

  • Bulk
    Bulk in Elasticsearch. What it is: In Elasticsearch, the Bulk API makes it possible to perform many write operations in a single API call, which increases indexing speed. Using the Bulk API is more efficient than sending many separate requests. It supports the following four actions: index, update, create and delete. Examples: The bulk request below will index a document, delete another document, and update an existing document. POST _bulk { "index" : {(...)
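A bulk body is newline-delimited JSON (NDJSON). Each action has one action/metadata line, and every action except delete is followed by a source line. The minimal Python sketch below builds such a body. The index name "test", the IDs and the field names are placeholders. Sending the body (with Content-Type: application/x-ndjson) is left out.

```python
import json

def build_bulk_body(actions):
    """Serialize (action, metadata, source) tuples into the NDJSON body
    that POST _bulk expects."""
    lines = []
    for action, metadata, source in actions:
        lines.append(json.dumps({action: metadata}))
        # "delete" has no source line; index, create and update carry one.
        if action != "delete":
            lines.append(json.dumps(source))
    # The bulk body must end with a newline character.
    return "\n".join(lines) + "\n"

body = build_bulk_body([
    ("index", {"_index": "test", "_id": "1"}, {"field1": "value1"}),
    ("delete", {"_index": "test", "_id": "2"}, None),
    ("update", {"_index": "test", "_id": "1"}, {"doc": {"field2": "value2"}}),
])
print(body)
```

For an update action, the source line wraps the partial document in "doc".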

  • c

  • Cache
    What it is: Elasticsearch uses three types of cache to make operations more efficient: the node query cache, the shard request cache and the field data cache. How it works: The node query cache maintains the results of queries used in a filter context. Its entries are evicted on a least recently used (LRU) basis. The shard request cache maintains the results of frequently used requests where size=0, particularly the results of aggregations. This cache is particularly relevant for logging use cases(...)
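The minimal Python LRU cache below illustrates the least-recently-used eviction policy mentioned above. It is a generic sketch of the policy and not Elasticsearch's internal cache implementation. The cache keys are made-up filter strings.

```python
from collections import OrderedDict

class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()

    def get(self, key):
        if key not in self._entries:
            return None
        # Mark the entry as most recently used.
        self._entries.move_to_end(key)
        return self._entries[key]

    def put(self, key, value):
        if key in self._entries:
            self._entries.move_to_end(key)
        self._entries[key] = value
        if len(self._entries) > self.capacity:
            # popitem(last=False) removes the least recently used entry.
            self._entries.popitem(last=False)

cache = LRUCache(capacity=2)
cache.put("filter:status=active", [1, 4, 7])
cache.put("filter:type=log", [2, 3])
cache.get("filter:status=active")        # touch: now most recently used
cache.put("filter:level=error", [3])     # evicts "filter:type=log"
print(cache.get("filter:type=log"))      # None
```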

  • Circuit
    Circuit in Elasticsearch. What it is: Circuit breakers are used to prevent operations from causing an OutOfMemoryError in Elasticsearch. There are many settings related to circuit breakers, and most of them can be updated dynamically using the cluster update settings API. There are several types of circuit breakers, such as the parent circuit breaker, the request circuit breaker, the field data circuit breaker, the script compilation circuit breaker and more. All of these put default limits on the(...)
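The core idea of a memory circuit breaker can be sketched in Python as follows. The breaker estimates the memory an operation will need and rejects the operation if the running total would exceed a limit, rather than letting it run out of memory. This is a simplified illustration under that assumption. The class and exception names are hypothetical, and the real Elasticsearch breakers (parent, request, fielddata, etc.) use their own accounting and limits.

```python
class CircuitBreakingError(Exception):
    """Hypothetical error raised when a reservation would exceed the limit."""


class MemoryCircuitBreaker:
    """Toy breaker: tracks estimated bytes and rejects operations that
    would push usage over a fixed limit."""

    def __init__(self, name, limit_bytes):
        self.name = name
        self.limit_bytes = limit_bytes
        self.used_bytes = 0

    def add_estimate(self, num_bytes):
        new_total = self.used_bytes + num_bytes
        if new_total > self.limit_bytes:
            # Trip: refuse the operation instead of allocating the memory.
            raise CircuitBreakingError(
                f"[{self.name}] would use {new_total} bytes, limit is {self.limit_bytes}"
            )
        self.used_bytes = new_total

    def release(self, num_bytes):
        # Give back memory once the operation has finished.
        self.used_bytes = max(0, self.used_bytes - num_bytes)


breaker = MemoryCircuitBreaker("request", limit_bytes=1000)
breaker.add_estimate(600)
try:
    breaker.add_estimate(500)   # 1100 > 1000, so the breaker trips
except CircuitBreakingError as err:
    print(err)
breaker.release(600)
```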

  • Client
    Elasticsearch Client. What it is: Any application that interfaces with Elasticsearch to index, update or search data, or to monitor and maintain Elasticsearch through its various APIs, can be considered a client. It is very important to configure clients properly to make optimal use of Elasticsearch resources. Examples: There are many open-source client applications for monitoring, alerting and visualization, such as ElasticHQ, ElastAlert and Grafana, to name a few. On top(...)

  • Cluster
    Cluster in Elasticsearch. What it is: In Elasticsearch, a cluster is a collection of one or more nodes (servers / VMs). A cluster can consist of any number of nodes. The cluster provides an interface for indexing and storing data, as well as search capability across all of the data stored in the data nodes. Each cluster has a single master node, which is elected by the master-eligible nodes. In cases where the master is not available, the other connected master-eligible nodes(...)

  • d

  • DELETE
    DELETE Elasticsearch API. What it is: DELETE is an Elasticsearch API that removes a document from a specific index. The API requires the index name and the _id of the document to delete. Delete a document: DELETE /my_index/_doc/1 Notes: A delete request returns a 404 status code if the document does not exist in the index. To delete a set of documents that match a query, use the delete by query API.
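Both requests can be prepared with Python's standard library, without a client library. The sketch assumes a cluster at http://localhost:9200 with no authentication. The "status" field and its value in the delete-by-query body are hypothetical. The requests are only constructed here and are not sent.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local, unsecured cluster

# DELETE /my_index/_doc/1 : remove a single document by index name and _id.
delete_doc = urllib.request.Request(f"{ES_URL}/my_index/_doc/1", method="DELETE")

# POST /my_index/_delete_by_query : remove every document matching a query.
delete_by_query = urllib.request.Request(
    f"{ES_URL}/my_index/_delete_by_query",
    data=json.dumps({"query": {"match": {"status": "obsolete"}}}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# To actually send a request (requires a running cluster):
#     with urllib.request.urlopen(delete_doc) as resp:
#         print(resp.status)
# urlopen raises urllib.error.HTTPError (code 404) if the document is missing.
print(delete_doc.get_method(), delete_doc.full_url)
```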

  • Document
    Document in Elasticsearch. Overview: A document is a JSON object stored in an Elasticsearch index. It consists of one or more fields, each with its own data type. The field type defines the kind of data that can be stored in the field, such as integer, string or object. Documents are schema-free, which means you do not need to specify a schema before indexing a document; when a field is indexed for the first time, its type is decided and(...)
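When a field is first seen, Elasticsearch's default dynamic mapping chooses its type from the JSON value. The Python function below is a simplified sketch of those default rules. It ignores date detection, numeric detection and dynamic templates. It is an illustration and not Elasticsearch's code.

```python
def infer_field_type(value):
    """Simplified default dynamic mapping for a JSON value
    (date/numeric detection on strings is ignored)."""
    if value is None:
        return None  # null values do not add a field mapping
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "text (with a .keyword sub-field)"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # Arrays take the type of their first non-null element.
        for item in value:
            if item is not None:
                return infer_field_type(item)
        return None
    raise TypeError(f"not a JSON value: {value!r}")

doc = {"user": "kimchy", "age": 42, "score": 7.5, "active": True, "tags": ["a", "b"]}
for field, value in doc.items():
    print(field, "->", infer_field_type(value))
```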

  • f

  • Fielddata
    Fielddata in Elasticsearch. What it is: In Elasticsearch, the term fielddata is relevant when sorting and aggregating (similar to the SQL GROUP BY, COUNT and AVG functions) on text fields. For performance reasons, there are rules about which kinds of fields you can aggregate on. You can group by any numeric field. String fields, however, must either be of keyword type or, if they are text fields, have fielddata=true, because text fields do not support doc_values. (Doc values are the on-disk, column-oriented counterpart of the inverted index(...)

  • i

  • Index
    Index in Elasticsearch. What it is: In Elasticsearch, an index (plural: indices) can be thought of as a table inside a database. It has a schema and can have one or more shards and replicas. An Elasticsearch index is divided into shards, and each shard is an instance of a Lucene index. Indices store documents in dedicated data structures that correspond to the data types of their fields. For example, text fields are stored inside an inverted index, whereas numeric(...)
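The minimal Python sketch below shows why an inverted index suits text search. Each term points to the documents that contain it, so finding the documents for a term is a single lookup. The analyze function is a crude stand-in for an analyzer that lowercases text and splits it on non-word characters. Lucene's real postings format is far more elaborate.

```python
import re
from collections import defaultdict

def analyze(text):
    """Crude analyzer stand-in: lowercase, then split on non-word characters."""
    return [token for token in re.split(r"\W+", text.lower()) if token]

def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

docs = {
    1: "Opster Rocks Elasticsearch Management",
    2: "Elasticsearch is built on Lucene",
}
index = build_inverted_index(docs)
print(index["elasticsearch"])  # [1, 2]
print(index["lucene"])         # [2]
```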

  • Indexing
    Elasticsearch Indexing. What it is: Indexing is the process of adding new documents to an Elasticsearch index or updating existing ones. Examples: In its simplest form, you can index a document like this: POST /test/_doc { "message": "Opster Rocks Elasticsearch Management" } This will create the index “test” (if it doesn’t already exist) and add a document whose source is the body of the POST call. In this case, the ID is generated automatically. If you repeat this(...)

  • l

  • License
    License in Elasticsearch. What it is: Elasticsearch offers various licenses with different sets of features, ranging from Open Source and Basic to Gold, Platinum and Enterprise. The default is Basic. The Basic license is free forever but lacks many advanced X-Pack features, such as alerting and advanced security. The following setting in the elasticsearch.yml file sets the self-generated license type: xpack.license.self_generated.type: basic

  • Lucene
    Lucene and Elasticsearch. What it is: Lucene, or Apache Lucene, is an open-source Java library for full-text indexing and search. Elasticsearch is built on top of Lucene and turns it into a distributed search engine that scales horizontally. Elasticsearch also provides other features, such as thread pools, queues, node/cluster monitoring APIs, data monitoring APIs and cluster management. In short, Elasticsearch extends Lucene and provides additional features in addition(...)

  • m

  • Memory
    Memory in Elasticsearch. What it is: Memory is one of the most critical resources to monitor in Elasticsearch. Elasticsearch runs on the JVM and uses heap memory for the query cache, the request cache, accessing Lucene segments and storing fielddata for aggregations and sorting. Common problems and important points: The most common error in Elasticsearch is the OutOfMemoryError. It occurs when a node cannot obtain the heap space it requires. To(...)

  • Metadata
    Metadata in Elasticsearch. What it is: Metadata is information about the data. In Elasticsearch, each document has associated metadata, such as the _id and _index meta fields. Examples: Routing meta field: _routing, a routing value that places a document on a particular shard. Other meta field: _meta, which is not used by Elasticsearch but can be used to store application-specific metadata. PUT index_01 { "mappings": { "_meta": { "class": "App01::User01", "version": "01" (...)

  • n

  • Node
    Nodes in Elasticsearch. What it is: Simply put, a node is a single server that is part of a cluster. Each node is assigned one or more roles, which describe the node’s responsibilities and operations. Data nodes store the data and participate in the cluster’s indexing and search capabilities, while master nodes are responsible for managing cluster activities and storing the cluster state, including the metadata. While it’s possible to run several Node instances of(...)

  • p

  • Persistent
    Persistent in Elasticsearch. What it is: In Elasticsearch, persistent refers to cluster settings that persist across cluster restarts. Such settings are applied through Cluster Update Settings API calls. Persistent settings can also be configured in the elasticsearch.yml file. Examples: ## enable shard allocation PUT /_cluster/settings { "persistent" : { "cluster.routing.allocation.enable" : "all" } } ## enable rebalancing of shards PUT /_cluster/settings { "persistent" : { (...)

  • Plugin
    Plugin in Elasticsearch. What it is: Plugins are used to extend the functionality of Elasticsearch. In addition to the core plugins available to you, it is possible to write custom plugins. Plugins are packaged as zip files with a mandatory file structure. Examples: Core plugins: X-Pack for security and monitoring, discovery plugins for EC2. Adding the S3 plugin for storing snapshots on S3: sudo bin/elasticsearch-plugin install repository-s3 Adding the HDFS plugin for(...)

  • r

  • Rebalance
    Elasticsearch Rebalance. What it is: Cluster rebalancing is the process by which an Elasticsearch cluster distributes data across its nodes. Specifically, it refers to moving existing data shards to other nodes to improve the balance across the nodes (as opposed to allocating new shards to nodes). It is usually a completely automatic process that requires no outside intervention. However, there are a number of parameters Elasticsearch uses to regulate this(...)
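The goal of rebalancing can be illustrated with a toy greedy algorithm in Python, which moves shards from the most loaded node to the least loaded one. This is not Elasticsearch's allocator. The real allocator weighs shard counts per node and per index, honors allocation deciders such as disk watermarks and awareness rules, and throttles concurrent moves, among other factors. The sketch only shows what "improving the balance" means. The node names are hypothetical.

```python
def rebalance(shards_per_node):
    """Greedy toy rebalancer: move one shard at a time from the fullest
    node to the emptiest node until counts differ by at most one.
    Returns the list of (from_node, to_node) moves."""
    counts = dict(shards_per_node)
    moves = []
    while True:
        fullest = max(counts, key=counts.get)
        emptiest = min(counts, key=counts.get)
        if counts[fullest] - counts[emptiest] <= 1:
            return moves
        counts[fullest] -= 1
        counts[emptiest] += 1
        moves.append((fullest, emptiest))

# A new, empty node-3 joins a cluster whose two nodes hold 6 shards each.
moves = rebalance({"node-1": 6, "node-2": 6, "node-3": 0})
print(moves)
```

Here four moves bring every node to 4 shards.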

  • Recovery
    Recovery in Elasticsearch. What it is: In Elasticsearch, recovery refers to the process of recovering an index or shard when something goes wrong. You can recover an index or its shards in many ways, such as by re-indexing the data from a backup/failover cluster into the current one or by restoring from an Elasticsearch snapshot. Elasticsearch also performs recoveries automatically in some cases, such as when a node restarts or when a node disconnects and reconnects. There(...)

  • Reindex
    Elasticsearch Reindex. What it is: Reindexing copies existing data from a source index to a destination index, which can be in the same cluster or in a different one. Elasticsearch has a dedicated endpoint, _reindex, for this purpose. Reindexing is mostly required when updating mappings or settings. Examples: Reindex data from a source index to a destination index in the same cluster: POST /_reindex?pretty { "source": { "index": "news" }, "dest": { (...)

  • Replica
    Replicas in Elasticsearch. What it is: To understand replicas in Elasticsearch, you need a thorough understanding of shards and their use in Elasticsearch. While each shard contains a single copy of its data, an index can contain multiple copies of each shard. There are thus two types of shards: the primary shard and its copies, the replicas. Each replica of a shard is always located on a different node from the primary and from the other replicas, which ensures access to your data in the event of a node failure.(...)
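Every copy of a shard must sit on its own node. Two pieces of arithmetic follow from this: how many shard copies the cluster holds, and how many nodes are needed before all of them can be assigned. A small Python sketch follows. The function names are hypothetical helpers and not Elasticsearch APIs.

```python
def total_shard_copies(primaries, replicas):
    """Each primary has `replicas` copies, so it contributes 1 + replicas."""
    return primaries * (1 + replicas)

def min_nodes_for_full_allocation(replicas):
    """No two copies of the same shard may share a node, so the primary and
    each of its replicas need a node of their own."""
    return replicas + 1

print(total_shard_copies(primaries=3, replicas=2))   # 9
print(min_nodes_for_full_allocation(replicas=2))    # 3
```

With fewer nodes than that, the surplus replicas simply remain unassigned.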

  • Replication
    Replication in Elasticsearch. What it is: Replication refers to storing redundant copies of the data. Starting with version 7.0, Elasticsearch creates each index with one primary shard and one replica by default. Replicas are never assigned to the same node as their primary shard, which means the cluster needs at least two nodes for the replicas to be assigned. If a primary shard fails, a replica is automatically promoted to primary. What it is used(...)

  • Repositories
    Repositories/Repository in Elasticsearch. What it is: An Elasticsearch snapshot provides a backup mechanism that takes the current state and data of the cluster and saves it to a repository (see the glossary term Snapshot for more information). The backup process requires a repository to be created first. The repository must be registered using the _snapshot endpoint, and multiple repositories can be registered per cluster. The following repository types are(...)

  • Restore
    Restore in Elasticsearch. What it is: In Elasticsearch, restore refers to the snapshot restore mechanism. A snapshot can be restored once you have set up a snapshot repository and taken the snapshot. You can restore the entire cluster from the snapshot, or restore an individual index or selected indices. Examples: To restore the whole snapshot: POST /_snapshot/my_backup/snapshot-01-11-2019/_restore To restore an individual index: POST(...)

  • Routing
    Routing in Elasticsearch. What it is: In Elasticsearch, routing refers to document routing. When you index a document, Elasticsearch determines which shard the document will be indexed into. The shard is selected with the following formula: shard = hash(_routing) % number_of_primary_shards where the default value of _routing is _id. It is important to know which shard the document is routed to, because Elasticsearch will need to determine where to find that(...)
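The routing formula can be tried out directly in Python. In the sketch below, zlib.crc32 is only a stand-in hash chosen because it is deterministic. Elasticsearch actually uses Murmur3, and recent versions add a routing factor, so the shard numbers will not match a real cluster. The point of the sketch is that the same routing value always lands on the same shard.

```python
import zlib

def pick_shard(routing, number_of_primary_shards):
    """shard = hash(_routing) % number_of_primary_shards, with crc32 as a
    stand-in for Elasticsearch's Murmur3-based hash."""
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards

# Without custom routing, _routing defaults to the document _id.
for doc_id in ["1", "2", "3", "user-42"]:
    print(doc_id, "-> shard", pick_shard(doc_id, number_of_primary_shards=3))
```

Because the divisor is the number of primary shards, changing that number would send existing documents to different shards. This is why the primary shard count of an existing index cannot simply be changed in place.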

  • s

  • Scroll
    Scroll in Elasticsearch. What it is: In Elasticsearch, scroll comes into play when you have a large set of search results. Large result sets are costly for both the Elasticsearch cluster and the requesting client in terms of memory and processing. The scroll API lets you take a snapshot of a large number of results from a single search request and retrieve them in batches. Examples: To perform a scroll search, you need to add the scroll parameter to the search query and specify how(...)
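On the client side, the pattern is to run the initial search with a scroll parameter and then keep requesting the next batch with the returned scroll ID until a batch comes back empty. The Python sketch below simulates that loop against an in-memory snapshot of the hits. ScrollContext and its ID format are hypothetical. The sketch models only the point-in-time behaviour and does not cover the real scroll endpoints or scroll timeouts.

```python
import itertools

class ScrollContext:
    """Toy scroll: freezes the hit list when the search runs, then returns
    it in fixed-size batches, ignoring later changes to the index."""

    _ids = itertools.count(1)

    def __init__(self, index_docs, size):
        self.scroll_id = f"scroll-{next(self._ids)}"  # hypothetical ID format
        self._hits = tuple(index_docs)                # point-in-time copy
        self._size = size
        self._offset = 0

    def next_batch(self):
        batch = self._hits[self._offset:self._offset + self._size]
        self._offset += len(batch)
        return list(batch)

index_docs = [f"doc-{i}" for i in range(5)]
ctx = ScrollContext(index_docs, size=2)
index_docs.append("doc-new")   # written after the scroll started: not visible

batches = []
while True:
    batch = ctx.next_batch()
    if not batch:              # an empty batch means the scroll is exhausted
        break
    batches.append(batch)
print(batches)  # [['doc-0', 'doc-1'], ['doc-2', 'doc-3'], ['doc-4']]
```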

  • Search
    Search in Elasticsearch. What it is: Search refers to searching for documents in one or more indices. The simplest search is a GET request to the _search endpoint. The search query can be provided either in the query string or in a request body. Examples: When searching the my_documents index without any search parameters, every document is a hit and, by default, 10 hits are returned. GET my_documents/_search A JSON object is returned in(...)

  • Settings
    Settings in Elasticsearch. What it is: In Elasticsearch, you can configure cluster-level settings, node-level settings and index-level settings. Each of them is discussed below. A. Cluster-wide settings: These settings can be either persistent, meaning they apply across restarts, or transient, meaning they won’t survive a full cluster restart. If a transient setting is reset, the first one of these values that is defined is applied: the persistent setting, the setting in the(...)
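The precedence order for cluster settings can be expressed as a simple lookup: transient first, then persistent, then elasticsearch.yml, then the built-in default. The Python sketch below models each source as a plain dict. The function name is hypothetical, and the real resolution happens inside Elasticsearch.

```python
def effective_setting(name, transient, persistent, yml, defaults):
    """Return the value from the first source that defines `name`, in the
    order transient, persistent, elasticsearch.yml, default."""
    for source in (transient, persistent, yml, defaults):
        if name in source:
            return source[name]
    raise KeyError(name)

key = "cluster.routing.allocation.enable"
persistent = {key: "primaries"}
defaults = {key: "all"}

print(effective_setting(key, {key: "none"}, persistent, {}, defaults))  # none
# Resetting the transient value falls back to the persistent one:
print(effective_setting(key, {}, persistent, {}, defaults))             # primaries
```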

  • Shards
    Shards in Elasticsearch. What it is: Data in an Elasticsearch index can grow to massive proportions. To keep it manageable, the index is split into a number of shards. Each Elasticsearch shard is an Apache Lucene index, and each individual Lucene index contains a subset of the documents in the Elasticsearch index. Splitting indices in this way keeps resource usage under control. An Apache Lucene index has a limit of 2,147,483,519 documents. Examples: It is when an index(...)
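The per-Lucene-index document limit sets a hard lower bound on the number of primary shards. The worked Python sketch below assumes that documents are spread evenly across shards, which routing makes approximately true. In practice, shard count is driven mainly by other factors such as shard size, so treat the result only as a floor. The function name is hypothetical.

```python
import math

LUCENE_MAX_DOCS = 2_147_483_519  # document limit of one Lucene index (one shard)

def min_primary_shards(expected_docs):
    """Smallest primary shard count keeping each shard under the Lucene
    limit, assuming an even distribution of documents."""
    return max(1, math.ceil(expected_docs / LUCENE_MAX_DOCS))

print(min_primary_shards(5_000_000_000))  # 3
```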

  • Snapshot
    Snapshots in Elasticsearch. What it is: An Elasticsearch snapshot is a backup of an index taken from a running cluster. Snapshots are incremental: when it creates a snapshot of an index, Elasticsearch does not copy any data that is already stored in the repository as part of an earlier snapshot of that index (that is, data that was already complete and has received no further writes). You can therefore take snapshots often and efficiently. You can restore snapshots(...)
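Incremental snapshots work because Lucene segment files are immutable once written, so a file that is already in the repository never needs to be uploaded again. The minimal Python sketch below illustrates that idea and assumes files can be compared by name alone. The real implementation also compares file metadata such as length and checksum. The segment file names are made up.

```python
def files_to_copy(index_segment_files, repository_files):
    """Only segment files not already stored in the repository by an
    earlier snapshot need to be copied."""
    return sorted(set(index_segment_files) - set(repository_files))

repository = set()

# First snapshot: every file is new.
snap1 = files_to_copy(["_0.cfs", "_1.cfs"], repository)
repository.update(snap1)

# New writes produced segment _2; _0 and _1 are unchanged (immutable).
snap2 = files_to_copy(["_0.cfs", "_1.cfs", "_2.cfs"], repository)
print(snap1, snap2)  # ['_0.cfs', '_1.cfs'] ['_2.cfs']
```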

  • Source
    Source in Elasticsearch. What it is: When a document is sent for indexing, Elasticsearch indexes its fields (text fields, for example, go into an inverted index), but it also keeps the original JSON document in a special field called _source. Examples: Disabling the _source field in the index: PUT /api-logs?pretty { "mappings": { "_source": { "enabled": false } } } Storing only selected fields as part of the _source field: PUT api-logs { "mappings": { "_source": { (...)

  • t

  • Task
    Task in Elasticsearch. What it is: A task is equivalent to an Elasticsearch operation, which can be any request performed on an Elasticsearch cluster, for example a delete by query request or a search request. Elasticsearch provides a dedicated Task API for task management, which supports various actions, from retrieving the status of currently running tasks to canceling long-running tasks. Examples: Get all currently running tasks on all nodes of the(...)

  • Template
    Template in Elasticsearch. What it is: A template in Elasticsearch falls into one of the following two categories and is stored in Elasticsearch through its dedicated endpoint. Index templates are a way to define a set of rules, including index settings, mappings and an index pattern. The template is applied automatically whenever a new index is created with a name matching the pattern. Templates are also used to dynamically apply custom mappings to fields that are not(...)
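When a new index is created, its name is compared with each template's index patterns, which use "*" wildcards. The Python sketch below uses fnmatchcase as a stand-in for that matching. Note that fnmatchcase also understands "?" and "[...]", which Elasticsearch patterns do not. The template names and patterns are hypothetical, and the sketch does not model resolution rules such as priority among matching templates.

```python
from fnmatch import fnmatchcase

# Hypothetical templates: name -> index patterns (like "index_patterns").
templates = {
    "logs-template": ["logs-*"],
    "metrics-template": ["metrics-*", "system-metrics-*"],
}

def matching_templates(index_name):
    """Names of templates with a wildcard pattern matching the new index."""
    return sorted(
        name
        for name, patterns in templates.items()
        if any(fnmatchcase(index_name, pattern) for pattern in patterns)
    )

print(matching_templates("logs-2024.01.01"))   # ['logs-template']
print(matching_templates("orders"))            # []
```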

  • Threadpool
    Elasticsearch Threadpool. What it is: Elasticsearch uses thread pools to manage how requests are processed and to optimize the use of resources on each node in the cluster. What it is used for: The main thread pools are for search, get and write, but there are a number of others, which you can see by running: GET /_cat/thread_pool/?v&h=id,name,active,rejected,completed,size,type&pretty The output of the above command shows that each node has a number of different thread(...)

  • Threshold
    Threshold in Elasticsearch. What it is: Elasticsearch uses several parameters to manage disk storage across the cluster. What it’s used for: Elasticsearch will actively try to relocate shards away from nodes that exceed the high disk watermark threshold. Elasticsearch will NOT allocate new shards or relocate shards onto nodes that exceed the low disk watermark threshold. Elasticsearch will prevent all writes to an index that has any shard on a node that(...)
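The three watermarks stack, with each higher threshold adding a stricter action on top of the previous ones. The simplified Python sketch below assumes the default percentage values for the low, high and flood-stage watermarks (85%, 90% and 95%). Real clusters may configure other percentages or absolute byte values. Finer details are ignored, such as the low watermark not applying to primary shards of newly created indices.

```python
# Assumed defaults for cluster.routing.allocation.disk.watermark.low,
# .high and .flood_stage, as percentages of disk used.
LOW, HIGH, FLOOD_STAGE = 85.0, 90.0, 95.0

def disk_actions(used_percent):
    """Summarize what happens to a node at a given disk usage."""
    actions = []
    if used_percent > LOW:
        actions.append("no new shards allocated to this node")
    if used_percent > HIGH:
        actions.append("relocate shards away from this node")
    if used_percent > FLOOD_STAGE:
        actions.append("writes blocked on indices with a shard on this node")
    return actions

for usage in (70, 87, 92, 96):
    print(usage, disk_actions(usage))
```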

  • u

  • Upgrade
    Upgrade in Elasticsearch. What it is: Upgrade refers to migrating your Elasticsearch cluster to a newer version. An existing cluster can be upgraded in two ways: through a rolling upgrade or through a full cluster restart. The benefit of a rolling upgrade is zero downtime. Common problems and important points: The major problem with upgrades is incompatibility between versions. Elasticsearch supports rolling upgrades only between minor versions. You(...)