Last Update: March 2020
Lucene and Elasticsearch
What it is
Elasticsearch builds on Lucene and turns it into a distributed search engine that scales horizontally. It also provides features such as thread pools, queues, node and cluster monitoring APIs, data monitoring APIs, and cluster management. In short, Elasticsearch extends Lucene and adds features on top of it.
Elasticsearch stores data on data nodes. Each data node hosts one or more indices, and each index is divided into shards, with each shard holding part of the index data. From both a technical and an operational perspective, each shard created in Elasticsearch is a separate Lucene instance (a self-contained Lucene index running inside the node's JVM, not a separate operating-system process). This concept is therefore important for understanding Elasticsearch internals, indices and shards.
Notes and Good Things to Know:
When an index is created in Elasticsearch, it is divided into one or more primary shards so that the data can scale and be spread across multiple nodes. Lucene becomes relevant when you decide how many shards your index should have: every shard is a Lucene instance, so too many shards means too many Lucene instances, each consuming resources, which damages performance.
Deciding the number of primary shards for your index takes proper planning that accounts for the index size, its maximum expected growth, and the number of data nodes.
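The planning step above can be sketched as a small calculation. This is only an illustrative estimate, not an official formula: the function name estimate_primary_shards, the growth factor parameter, and the 30 GB target shard size are all assumptions introduced here, and the right target depends on your workload.

```python
import math


def estimate_primary_shards(current_size_gb, max_growth_factor, data_nodes,
                            target_shard_size_gb=30.0):
    """Rough estimate of primary shards for one index (hypothetical helper).

    target_shard_size_gb is an assumed sizing target, not a value
    taken from Elasticsearch itself.
    """
    if current_size_gb < 0 or max_growth_factor < 1:
        raise ValueError("size must be >= 0 and growth factor >= 1")
    if data_nodes < 1 or target_shard_size_gb <= 0:
        raise ValueError("need at least one data node and a positive target size")

    # Size the index for its maximum expected growth, not its current size.
    projected_size_gb = current_size_gb * max_growth_factor

    # Every shard is a Lucene instance, so use as few as the target size allows.
    primary_shards = max(1, math.ceil(projected_size_gb / target_shard_size_gb))

    # Sanity check: how many of these primaries land on each data node.
    shards_per_node = math.ceil(primary_shards / data_nodes)
    return primary_shards, shards_per_node


if __name__ == "__main__":
    shards, per_node = estimate_primary_shards(200, 1.5, data_nodes=3)
    print(f"primary shards: {shards}, primaries per data node: {per_node}")
```

If the per-node figure comes out high, that is a hint that the index may be over-sharded for the cluster, or that the cluster needs more data nodes.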
Previous versions of Elasticsearch defaulted to creating five shards per index. Starting with 7.0.0, the default is now one shard per index.
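Because the default differs between versions, it can be safer to set the shard count explicitly when creating an index. The sketch below only builds the request for the create index API (PUT on the index name with number_of_shards and number_of_replicas under settings) and does not send it. The helper name build_create_index_request and the index name my-index are placeholders chosen for illustration.

```python
import json


def build_create_index_request(index_name, primary_shards=1, replicas=1):
    """Build (method, path, body) for a create-index call (hypothetical helper)."""
    if primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    if replicas < 0:
        raise ValueError("replica count cannot be negative")

    body = {
        "settings": {
            "index": {
                # Fixed at creation time; each primary is one Lucene instance.
                "number_of_shards": primary_shards,
                # Replicas can be changed later on a live index.
                "number_of_replicas": replicas,
            }
        }
    }
    return "PUT", f"/{index_name}", json.dumps(body)


if __name__ == "__main__":
    method, path, body = build_create_index_request("my-index", primary_shards=3)
    print(method, path)
    print(body)
```

The resulting method, path and JSON body can be sent with any HTTP client or pasted into a REST console.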