Elasticsearch Split Index API

Opster Team

Last updated: Mar 9, 2023




Split Index API basics

What is the Split Index API used for?

Once an Elasticsearch index has been created, its number of primary shards cannot be changed. However, the Split Index API can split an existing index into a new index with more primary shards.

Why would users want to split an existing index and what are the benefits? 

– The index has already grown beyond the commonly recommended maximum of about 50GB per shard; splitting it into a higher number of shards brings each shard back under that size.
– The index is not time-based and is in danger of exceeding about 50GB per shard (if the index is time-based, it is probably better to roll it over to create a new one).

Before splitting an index – important notes

The index you want to split must be read-only, and the cluster health status must be green. Note that you cannot split the write index of a data stream; the best approach there is to update the index template with the required number of shards and then roll over the data stream.

Make an index read-only with the following command:

PUT /index_to_split/_settings
{
  "settings": {
    "index.blocks.write": true 
  }
}

How the Split Index API works

The split operation creates a new target index based on the original index, but with a multiple of the original number of primary shards. It takes advantage of the underlying on-disk data structures by hard-linking the segments of the original index into the new index, and then deleting from each new shard any documents that belong to a different shard.
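To see why hard-linking followed by deletion is enough, it helps to look at how documents are routed to shards. The sketch below follows the routing formula given in the Elasticsearch `_routing` field documentation, `shard_num = (hash(_routing) % num_routing_shards) / routing_factor`. Elasticsearch uses a Murmur3 hash internally; `zlib.crc32` is only a stand-in here, and the function names are illustrative rather than part of any Elasticsearch client.

```python
import zlib
from typing import Iterable


def shard_for(routing_value: str, num_primary_shards: int, num_routing_shards: int) -> int:
    """Apply the documented routing formula, using a stand-in hash."""
    if num_routing_shards % num_primary_shards != 0:
        raise ValueError("num_routing_shards must be a multiple of num_primary_shards")
    routing_factor = num_routing_shards // num_primary_shards
    # Assumption: crc32 replaces the Murmur3 hash that Elasticsearch really uses.
    hashed = zlib.crc32(routing_value.encode("utf-8"))
    return (hashed % num_routing_shards) // routing_factor


def split_keeps_documents_local(
    source_shards: int,
    target_shards: int,
    num_routing_shards: int,
    doc_ids: Iterable[str],
) -> bool:
    """Check that every document's target shard descends from its source shard."""
    if target_shards % source_shards != 0:
        raise ValueError("target_shards must be a multiple of source_shards")
    factor = target_shards // source_shards
    for doc_id in doc_ids:
        src = shard_for(doc_id, source_shards, num_routing_shards)
        dst = shard_for(doc_id, target_shards, num_routing_shards)
        # Target shards src*factor .. src*factor + factor - 1 all come from src.
        if dst // factor != src:
            return False
    return True
```

With 5 source shards, 10 target shards and 30 routing shards, for example, every document in source shard `s` ends up in target shard `2s` or `2s + 1`. Under this model, each new shard can start from the segments of a single source shard and only needs to drop the documents that now hash to its sibling.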

Limitations

  • The number of shards being created must be a multiple of the original number of shards.
  • The maximum number of shards in the split index is also limited by the index.number_of_routing_shards setting, which by default depends on the number of primary shards in the index. The new shard count must be a factor of this value.
  • The node carrying out the split operation must also have sufficient free disk space to hold a second copy of the index.
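The first two limitations can be checked before calling the API. The following is a minimal sketch, assuming you already know the source index's primary shard count and its index.number_of_routing_shards value; the helper name `valid_split_targets` is hypothetical. It lists every shard count that is a multiple of the source count and also a factor of the routing shard count.

```python
from typing import List


def valid_split_targets(source_shards: int, routing_shards: int) -> List[int]:
    """Return the shard counts a split could target, in ascending order."""
    if source_shards < 1:
        raise ValueError("source_shards must be at least 1")
    if routing_shards % source_shards != 0:
        raise ValueError("routing_shards must be a multiple of source_shards")
    return [
        candidate
        # Step through multiples of the source count, skipping the source itself.
        for candidate in range(2 * source_shards, routing_shards + 1, source_shards)
        # The routing shard count must divide evenly by the new shard count.
        if routing_shards % candidate == 0
    ]
```

For an index with 5 primary shards and 30 routing shards, `valid_split_targets(5, 30)` returns `[10, 15, 30]`, so a request for 20 shards would be rejected.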

How to split an index

The _split API can be used like this:

POST /index_to_split/_split/new_index
{
  "settings": {
    "index.number_of_shards": 5 
  },
  "aliases": {
    "my_search_indices": {}
  }
}

Users can set the number of shards in the new index (subject to the restrictions above) along with any other settings or aliases for the index. However, mappings cannot be specified in the _split request, so the new index keeps the mappings of the original.

By default, the split command will return as soon as the new index has been created, but the new index will not be available for search until all of the shards involved have been assigned, initialized, and recovered into active state. Check progress using the following command:

GET _cat/recovery?v=true
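If you prefer to check progress from a script, the cat APIs can return JSON when called with `format=json`. The sketch below assumes the response of `GET _cat/recovery?format=json` has already been parsed into a list of dictionaries with `index`, `shard` and `stage` keys; the helper name `pending_recoveries` is hypothetical, and fetching the response is left out.

```python
from typing import Dict, List, Tuple


def pending_recoveries(rows: List[Dict[str, str]], index_name: str) -> List[Tuple[str, str]]:
    """Return (shard, stage) pairs for recoveries of index_name that are not done."""
    return [
        (row.get("shard", "?"), row.get("stage", "?"))
        for row in rows
        if row.get("index") == index_name and row.get("stage") != "done"
    ]


# Example rows shaped like a parsed _cat/recovery?format=json response.
sample_rows = [
    {"index": "new_index", "shard": "0", "stage": "done"},
    {"index": "new_index", "shard": "1", "stage": "index"},
    {"index": "other_index", "shard": "0", "stage": "translog"},
]
print(pending_recoveries(sample_rows, "new_index"))
```

An empty result only means that no listed recovery is still in progress. Shards that have not been assigned yet do not appear in the recovery output, so it is worth also confirming that the number of listed shards matches the number you requested.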
