Elasticsearch Oversharding

Average Read Time

3 Mins

Elasticsearch Oversharding

Opster Team

October 2021

Average Read Time

3 Mins


In addition to reading this guide, we recommend you run the Elasticsearch Health Check-Up. It will detect issues and improve your Elasticsearch performance by analyzing your shard sizes, threadpools, memory, snapshots, disk watermarks and more.

The Elasticsearch Check-Up is free and requires no installation.

Run the Elasticsearch check-up to receive recommendations like this:

checklist Run Check-Up
error

There are more shards per node than is recommended on the following nodes

error-img

Description

There are too many shards per node. Oversharding can cause a cluster to become unstable because the master nodes become unable to keep track of a large number of shards across the nodes. Quick solutions may be....

error-img

Recommendation

Based on your specific ES deployment you should run the following command in order to...

1

X-PUT curl -H "Content-Type: application/json" [customized recommendation]

In addition to reading this guide, run the free Elasticsearch Health Check-Up. Get actionable recommendations that can improve performance and prevent incidents (does not require any installation).

The check-up includes a specific check on shard sizes and can provide an actionable recommendation specific to your ES deployment.

Overview

Oversharding is a status that indicates that you have too many shards, and thus they are too small. While there is no minimum limit for an Elastic shard size, having a larger number of shards on an Elasticsearch cluster requires extra resources since the cluster needs to maintain metadata on the state of all the shards in the cluster.

While there is no absolute limit, as a guideline, the ideal shard size is between a few GB and a few tens of GB. You can learn more about scalability in this official guide.

This issue should be considered in combination with the opposite problem, “Shards Too Big”.

To learn about search latency problems that relates to shards that are too small go to Opster’s search latency guide.

How to resolve the issue

If your shards are too small, then you have 2 options:

Delete or close indices

DELETE myindex-2014*

POST myindex-2014*/_close

Closing certain indices is often a good quick option to get your cluster running quickly again while you decide on the best long term solution.  Closing an index will release the memory resources being used by the indices while keeping the index on disk, and is easily and quickly reversed (POST myindex/_open).

Re-index into bigger indices

The following would reindex multiple daily or monthly indices into one single index for the year (Make sure you don’t cause the opposite problem of shards too large by adjusting the number of shards on the new index according to the total data volume. Make sure that the shard size on the new index does not exceed 50GB/shard).

POST _reindex
{

  "source": {
	"index": "logs-2017*"
  },
  "dest": {
	"index": "logs-2017"
  }
 
}

Notes to keep in mind

Watch out for field explosions and mappings

When combining smaller indices into a single larger index, remember that the mapping or the new index will contain all of the fields from all of the constituent indices.  For this reason, make sure that the indices you are combining have largely similar mappings.  

Watch out for duplicate data during the reindex process

During the reindex process (please refer to our reindexing tips), if your application uses wildcards in the index name (eg. logs-*) then results may be duplicated during the reindex process since you will have two copies of the data until the process completes. This can be avoided through the use of aliases.

Watch out for modifications and updates during the reindex process

If data is susceptible to updates during the reindex process, then a procedure needs to be identified to capture them and ensure that the new indices reflect all modifications that are made during the transition reindexing period.

PUT _cluster/settings
{
  "transient": {
   
    "cluster.routing.allocation.disk.watermark.low": "85%",
    "cluster.routing.allocation.disk.watermark.high": "90%",
    "cluster.routing.allocation.disk.watermark.flood_stage": "95%",
    "cluster.info.update.interval": "1m"
  }
}

How to avoid the issue

There are various mechanisms to optimize shard size, such as:

Apply ILM (index lifecycle management)

Using ILM you can get Elasticsearch to automatically create a new index when your current index size reaches a given maximum size or age. ILM is suitable for documents which require no updates, but is not a viable option when documents require frequent updating, since to do so you need to know which of the previous index volumes contains the document you want to update.

Change application strategy

You can design your application to ensure that shards are created of a sensible size:

Date based approach

myindex-yyyy.MM.dd

or myindex-yyyy.MM

If you are suffering from oversharding, then you will most likely need to move from daily indices to monthly or yearly indices.

Attribute based approach

myindex-

This has the advantage of allowing you to know exactly which index to search/update if you know the client ID. 

The disadvantage is that different clients may produce very different volumes of documents (and hence different shard sizes). This can be mitigated to some extent by adjusting the number of shards per client, but you may create the opposite problem (oversharding) if you have a large number of clients. 

ID range based approach

This may be possible where documents carry an ID coming from another system or application (such as a sql ID, or client ID) which enables us to distribute documents evenly.

eg. myindex-

The above would give us 10 indices which will probably be evenly distributed, and also deterministic (we know which index we need to update).



Run the Check-Up to get a customized report like this:

Analyze your cluster
Synonyms:
Red Status