Elasticsearch OpenSearch Split Index API

By Opster Team

Updated: Apr 2, 2023


Split Index API basics

What is the Split Index API used for?

Once an OpenSearch index has been created, the number of primary shards in that index cannot be changed. However, the Split Index API can split an existing index into a new index with more primary shards.

Why would users want to split an existing OpenSearch index and what are the benefits? 

– The index has already grown beyond the recommended size of roughly 50GB per shard. Splitting it into more shards brings each shard back under that guideline.
– The index is not time-based and is in danger of growing beyond the recommended 50GB per shard (if the index is time-based, it is usually better to roll it over to a new index).
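As a rough illustration of the sizing arithmetic, the following Python sketch estimates how many primary shards a split would need to bring each shard under the 50GB guideline. The function name and the assumption that data is spread evenly across shards are ours, and the result still has to respect the limitations described later in this guide.

```python
import math

TARGET_SHARD_SIZE_GB = 50  # the guideline figure used in this guide


def shards_needed(index_size_gb, current_shards, target_gb=TARGET_SHARD_SIZE_GB):
    """Return the smallest multiple of current_shards that keeps every
    shard at or below target_gb, assuming data is spread evenly."""
    minimum = math.ceil(index_size_gb / target_gb)
    # A split can only produce a multiple of the current shard count.
    factor = max(1, math.ceil(minimum / current_shards))
    return current_shards * factor


if __name__ == "__main__":
    # Hypothetical index: 400GB of primary data in 2 shards.
    print(shards_needed(400, 2))
```

If the function returns the current shard count, the index is already within the guideline and no split is needed.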

Before splitting an index – important notes

The index you want to split must be read-only, and the cluster health status must be green. Note that the write index of a data stream cannot be split. The best approach in that case is to update the index template with the required number of shards and then roll over the data stream.

Make an index read-only with the following command:

PUT /index_to_split/_settings
{
  "settings": {
    "index.blocks.write": true 
  }
}

How the Split Index API works

The split operation creates a new target index based on the original index, with a shard count that is a multiple of the original number of shards. It takes advantage of the underlying on-disk data structures by hard-linking the segments of the original index into the new index, and then removing from each new shard any documents that belong to other shards.

Limitations

  • The number of shards being created must be a multiple of the original number of shards.
  • The maximum number of shards in the split index is also limited by the index.number_of_routing_shards setting, which by default depends on the number of shards in the original index.
  • The node carrying out the split operation must have sufficient free disk space to hold a second copy of the index.
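The first two limitations can be checked before calling the API. The sketch below is a minimal illustration, assuming the target shard count must be both a multiple of the source shard count and a factor of index.number_of_routing_shards. The function name is hypothetical, and the routing shard value should be read from the settings of your own index rather than guessed.

```python
def valid_split_targets(source_shards, routing_shards):
    """List the shard counts an index could be split into.

    A target is accepted when it is a larger multiple of source_shards
    and also divides routing_shards evenly.
    """
    if routing_shards % source_shards != 0:
        raise ValueError("routing_shards must be a multiple of source_shards")
    return [
        n
        for n in range(source_shards * 2, routing_shards + 1, source_shards)
        if routing_shards % n == 0
    ]


if __name__ == "__main__":
    # Illustrative values: 5 primary shards, 30 routing shards.
    print(valid_split_targets(5, 30))
```

An empty list means the index cannot be split any further with its current routing shard setting.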

How to split an index

The _split API can be used like this:

POST /index_to_split/_split/new_index
{
  "settings": {
    "index.number_of_shards": 5 
  },
  "aliases": {
    "my_search_indices": {}
  }
}

Users can set the number of shards in the new index (subject to the limitations above) along with any other settings or aliases for the index. However, mappings cannot be specified in the split request; the new index takes its mappings from the original index.
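For scripted use, the same request can be assembled in Python with only the standard library. This is a sketch under stated assumptions: the base URL points at a local cluster without authentication, and the helper name build_split_request is ours. The request is built but not sent, so the example runs without a cluster.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local cluster, no authentication


def build_split_request(source, target, number_of_shards, aliases=None,
                        base_url=BASE_URL):
    """Build (but do not send) a POST /<source>/_split/<target> request."""
    body = {"settings": {"index.number_of_shards": number_of_shards}}
    if aliases:
        # Each alias is given an empty options object, as in the example above.
        body["aliases"] = {name: {} for name in aliases}
    return urllib.request.Request(
        url=f"{base_url}/{source}/_split/{target}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_split_request("index_to_split", "new_index", 5,
                          aliases=["my_search_indices"])
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would submit it. That step is left out here on purpose.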

By default, the split command will return as soon as the new index has been created, but the new index will not be available for search until all of the shards involved have been assigned, initialized, and recovered into active state. Check progress using the following command:

GET _cat/recovery?v=true
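When the recovery output is requested as JSON (by adding format=json to the request), the check can be automated. The sketch below assumes each row carries an index name and a stage field that reads done once recovery has completed. The sample rows are made up for illustration, and the function name is hypothetical.

```python
import json


def split_finished(recovery_rows, index_name):
    """Return True when every listed shard of index_name reports stage 'done'."""
    rows = [row for row in recovery_rows if row.get("index") == index_name]
    # No rows at all means nothing is known yet, so report "not finished".
    return bool(rows) and all(row.get("stage") == "done" for row in rows)


# Made-up sample resembling the JSON output of the recovery command.
sample = json.loads("""
[
  {"index": "new_index", "shard": "0", "stage": "done"},
  {"index": "new_index", "shard": "1", "stage": "index"},
  {"index": "other_index", "shard": "0", "stage": "done"}
]
""")

print(split_finished(sample, "new_index"))
```

The function only looks at shards that appear in the output, so it is worth confirming the expected shard count separately.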
