Cannot move any shard in the cluster due to cluster concurrent recoveries getting breached – How to solve this OpenSearch error

Opster Team

Aug-23, Version: 1-1.1

Briefly, this message is logged at INFO level when the number of concurrent shard recoveries in an OpenSearch cluster reaches the configured limit, so the balancer skips moving shards for the current pass. Shard recoveries can be triggered by node failure, shard reallocation, or index creation. To resolve this, you can raise the limit by adjusting the ‘cluster.routing.allocation.node_concurrent_recoveries’ setting. Alternatively, you can reduce the number of shard recoveries by optimizing your shard allocation, for example by reducing the number of shards per index or increasing the number of nodes in your cluster.
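
As an illustration of adjusting the limit, the sketch below builds a request for the cluster settings API using the Java 11+ HTTP client. It is a minimal example under assumptions, not an official client: the class name RecoveryLimitRequest, the base URL http://localhost:9200, and the value 4 are placeholders chosen for this example, and the request is only built, never sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class RecoveryLimitRequest {
    // Setting name as given in this guide.
    static final String SETTING = "cluster.routing.allocation.node_concurrent_recoveries";

    // Builds the JSON body for a persistent cluster settings update.
    static String buildBody(int limit) {
        if (limit < 0) {
            throw new IllegalArgumentException("limit must be non-negative");
        }
        return "{\"persistent\":{\"" + SETTING + "\":" + limit + "}}";
    }

    // Builds (but does not send) a PUT request against the _cluster/settings endpoint.
    // baseUrl is a placeholder; point it at your own cluster.
    static HttpRequest buildRequest(String baseUrl, int limit) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/_cluster/settings"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(buildBody(limit)))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildRequest("http://localhost:9200", 4);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(buildBody(4));
    }
}
```

To apply the change, pass the request to HttpClient.send with whatever authentication your cluster requires. Raising the limit allows more recoveries to run in parallel and therefore puts more load on disk and network, so increase it gradually.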

This guide will help you check for common problems that cause the log “Cannot move any shard in the cluster due to cluster concurrent recoveries getting breached” to appear. To understand the issues related to this log, read the explanation below about the following OpenSearch concepts: cluster, allocation, routing, shard, recoveries.

Log Context

The log “Cannot move any shard in the cluster due to cluster concurrent recoveries getting breached” is emitted from the class BalancedShardsAllocator (BalancedShardsAllocator.java).
We extracted the following from the OpenSearch source code for those seeking in-depth context:

                checkAndAddInEligibleTargetNode(currentNode.getRoutingNode());
            }
            for (Iterator<ShardRouting> it = allocation.routingNodes().nodeInterleavedShardIterator(); it.hasNext(); ) {
                //Verify if the cluster concurrent recoveries have been reached.
                if (allocation.deciders().canMoveAnyShard(allocation).type() != Decision.Type.YES) {
                    logger.info("Cannot move any shard in the cluster due to cluster concurrent recoveries getting breached"
                                    + ". Skipping shard iteration");
                    return;
                }
                //Early terminate node interleaved shard iteration when no eligible target nodes are available
                if(sorter.modelNodes.length == inEligibleTargetNode.size()) {
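
The early return in the excerpt above can be modeled with a small standalone sketch. This is not OpenSearch code: the class MoveShardsSketch, its methods, and the simple counter for in-flight recoveries are hypothetical stand-ins for the allocation deciders, and the sketch assumes that every move it starts counts as one more recovery in progress.

```java
import java.util.ArrayList;
import java.util.List;

class MoveShardsSketch {
    // Hypothetical stand-in for the decider check:
    // true while recoveries in flight are below the limit.
    static boolean canMoveAnyShard(int inFlight, int limit) {
        return inFlight < limit;
    }

    // Walks the shards in order and starts a move for each one.
    // The whole pass stops as soon as the limit is reached,
    // mirroring the early return in the excerpt.
    static List<String> planMoves(List<String> shards, int inFlight, int limit) {
        List<String> started = new ArrayList<>();
        for (String shard : shards) {
            if (!canMoveAnyShard(inFlight, limit)) {
                System.out.println("Limit reached, skipping remaining shards");
                return started;
            }
            started.add(shard);
            inFlight++; // assumption: each started move counts as one recovery
        }
        return started;
    }

    public static void main(String[] args) {
        // One recovery already running, limit of three: two moves fit.
        System.out.println(planMoves(List.of("s0", "s1", "s2", "s3"), 1, 3));
    }
}
```

In this model the remaining shards are not rejected permanently; they are simply left for a later pass, once in-flight recoveries have completed and the count has dropped below the limit.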

 
