All Shards Failed – How to solve this Elasticsearch error

Opster Team

February-21, Version: 1.7-8.0

 

To understand why all shards failed, we recommend you run the Elasticsearch Error Check-Up which can resolve issues that cause many errors related to shards and help prevent this error from happening again.

The Check-Up will analyze your ES deployment to pinpoint the cause of your shard failure and provide you with actionable recommendations to resolve the issue. The tool is free and requires no installation.

What this error means

The exception “all shards failed” is raised when a search request fails on every shard it was sent to, so that no shard can contribute a result. This can happen for various reasons, such as: text fields are being used for bucket aggregations or metric aggregations; a given search failed on the shard in an unrecoverable way, so no response could be given for that shard (though the shard itself is fine); or some special aggregations (like global and reverse nested aggregations) are not nested in the proper order.
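
The per-shard reasons in the error body are usually the quickest way to tell these causes apart. Below is a minimal Python sketch, assuming the error body has the usual search_phase_execution_exception shape with a failed_shards list. The sample response, index name, and node name are made up for illustration.

```python
from typing import List

# Hand-written sample error body (illustrative values, not real output).
SAMPLE_ERROR = {
    "error": {
        "type": "search_phase_execution_exception",
        "reason": "all shards failed",
        "phase": "query",
        "grouped": True,
        "failed_shards": [
            {
                "shard": 0,
                "index": "products",
                "node": "node-1",
                "reason": {
                    "type": "illegal_argument_exception",
                    "reason": "Text fields are not optimised for operations "
                              "that require per-document field data",
                },
            }
        ],
    },
    "status": 400,
}


def shard_failure_reasons(response: dict) -> List[str]:
    """Return one 'index[shard]: type: reason' line per failed shard."""
    failed = response.get("error", {}).get("failed_shards", [])
    lines = []
    for entry in failed:
        reason = entry.get("reason", {})
        lines.append(
            f"{entry.get('index')}[{entry.get('shard')}]: "
            f"{reason.get('type')}: {reason.get('reason')}"
        )
    return lines


for line in shard_failure_reasons(SAMPLE_ERROR):
    print(line)
```

The exception type and reason reported for each shard usually point to one of the causes listed below.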

Possible causes

Below are 5 reasons why this error may appear, including troubleshooting steps for each.

1. Text fields are not optimized for operations

This error sometimes occurs because text fields are not optimized for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default.

Quick troubleshooting steps

To get rid of the error, you can enable fielddata on the text field, but beware: fielddata is held in heap memory, so it can cause performance issues.
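
As a rough sketch of this option, the Python snippet below builds the body of a mapping update (PUT <index>/_mapping) that turns on fielddata for a text field. The field name "name" and the helper name are only illustrative assumptions.

```python
import json


def fielddata_mapping(field_name: str) -> dict:
    """Build a mapping-update body that enables fielddata on a text field."""
    return {
        "properties": {
            field_name: {
                "type": "text",
                "fielddata": True,  # loads the field's terms into heap on first use
            }
        }
    }


# "name" is an assumed field name used only for this example.
print(json.dumps(fielddata_mapping("name"), indent=2))
```

Because of the memory cost, this is usually a last resort compared with the keyword-based options described next.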

If you are not using an explicit index mapping definition, string fields are dynamically mapped as text with a .keyword sub-field (the default in recent versions), so you can simply run the aggregation on the .keyword sub-field.

However, if you have defined an index mapping and the field has no keyword sub-field, you can use a multi-field, which is useful to index the same field in different ways for different purposes. You can also change the data type of the name field from text to keyword in the index mapping definition (to enable aggregation on it). Because the type of an existing field cannot be changed in place, this means creating a new index with a mapping like the one below and reindexing the data:

{
  "mappings": {
    "properties": {
      "name":{
        "type": "keyword"
      }
    }
  }
}
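
The multi-field option mentioned above can be sketched in the same way. This Python snippet builds a mapping that keeps name as text and adds a keyword sub-field, plus a terms aggregation that targets that sub-field. The aggregation name top_names and the helper names are hypothetical.

```python
import json


def multi_field_mapping(field_name: str) -> dict:
    """Mapping that indexes a field as text and as a keyword sub-field."""
    return {
        "mappings": {
            "properties": {
                field_name: {
                    "type": "text",  # analyzed, for full-text search
                    "fields": {
                        "keyword": {"type": "keyword"}  # exact values, for aggregations
                    },
                }
            }
        }
    }


def terms_on_keyword(field_name: str, agg_name: str) -> dict:
    """Search body that aggregates on the .keyword sub-field, not the text field."""
    return {
        "size": 0,
        "aggs": {agg_name: {"terms": {"field": f"{field_name}.keyword"}}},
    }


print(json.dumps(multi_field_mapping("name"), indent=2))
print(json.dumps(terms_on_keyword("name", "top_names"), indent=2))
```

The same search body also works for dynamically mapped string fields, since they get a .keyword sub-field by default.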

2. Metric aggregations can’t be performed on text fields

Metric aggregations mainly perform mathematical calculations (such as sum, min, or max) on the documents present in their parent buckets, so they expect numeric values. Therefore, you cannot perform these metric aggregations on text fields. If they are performed on a text field, you will get the “all shards failed” exception.

Quick troubleshooting steps

  1. A metric aggregation such as sum, max, or min can work on a script instead of a field. The script would transform the text into a numeric value (e.g. Integer.parseInt(doc.cost.value)). Alternatively, starting with ES 7.11, you can define a runtime field and use it in queries and aggregations.
  2. If you want to avoid scripts in the search query, you can change the data type of the cost field to a numeric type. The index mapping definition will look like this:
{
  "mappings": {
    "properties": {
      "cost":{
        "type": "integer"
      }
    }
  }
}
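
For the runtime field option (ES 7.11 and later), the following Python sketch builds a search body that defines a runtime field in runtime_mappings and sums it. It assumes the cost values in _source are whole numbers stored as strings and that every document has the field. The names cost_as_number and total_cost are hypothetical.

```python
import json


def sum_with_runtime_field(source_field: str, runtime_name: str) -> dict:
    """Search body that sums a text field through a numeric runtime field."""
    # Painless script: read the raw value from _source and parse it as a long.
    # Assumes the value is always present and is a whole number.
    script = f"emit(Long.parseLong(params._source['{source_field}'].toString()))"
    return {
        "size": 0,
        "runtime_mappings": {
            runtime_name: {"type": "long", "script": {"source": script}}
        },
        "aggs": {
            f"total_{source_field}": {"sum": {"field": runtime_name}}
        },
    }


print(json.dumps(sum_with_runtime_field("cost", "cost_as_number"), indent=2))
```

The script runs for every matching document at query time, so changing the mapping to a numeric type remains the cheaper option for frequent queries.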

3. At least one shard has failed

The aforementioned exception may arise when at least one shard has failed. Upon restarting the remote server, some shards may not recover, causing the cluster to stay red. You can check the health status of the cluster by using the Elasticsearch Check-Up or the cluster health API:

GET _cluster/health
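
To read the result, the status and unassigned_shards fields of the response are the most relevant ones here. The Python sketch below interprets an already-parsed health response. The sample values are made up, and describe_health is a hypothetical helper.

```python
def describe_health(health: dict) -> str:
    """Summarize a parsed _cluster/health response in one line."""
    status = health.get("status", "unknown")
    unassigned = health.get("unassigned_shards", 0)
    if status == "red":
        return (f"red: at least one primary shard is unassigned "
                f"({unassigned} unassigned in total)")
    if status == "yellow":
        return f"yellow: all primaries are assigned, {unassigned} replica shard(s) are not"
    if status == "green":
        return "green: all shards are assigned"
    return f"unexpected status: {status}"


# Illustrative response fragment, not real cluster output.
sample = {"cluster_name": "demo", "status": "red", "unassigned_shards": 3}
print(describe_health(sample))
```

When the status is red, the cluster allocation explain API (GET _cluster/allocation/explain) can show why a particular shard is not being assigned.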

One way to resolve the error is to delete the index completely, but this is not an ideal solution because all data in that index is lost.

4. Global aggregations are not defined as top-level

Global aggregation is a special kind of aggregation that is executed globally on all the documents without being influenced by the query. If global aggregations are not defined as top-level aggregations, then you’ll get the “all shards failed” exception.

Quick troubleshooting steps

To avoid this error, you should ensure that global aggregations are defined only as top-level aggregations and not as sub-aggregations.

For example, a search query with the global aggregation defined as a top-level aggregation looks like this:

{
 "size": 0,
 "aggs": {
   "all_products": {
     "global": {},
     "aggs": {
       "genres": {
         "terms": {
           "field": "cost"
         }
       }
     }
   }
 }
}
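
If search bodies are built programmatically, this rule can be checked before the request is sent. Below is a minimal Python sketch that walks an aggs tree and reports any global aggregation that is not at the top level. The function name and the sample aggregation names are hypothetical.

```python
from typing import List


def misplaced_global_aggs(aggs: dict, depth: int = 0, path: str = "") -> List[str]:
    """Return the paths of global aggregations that are not top-level."""
    found = []
    for name, body in aggs.items():
        here = f"{path} > {name}" if path else name
        if depth > 0 and "global" in body:
            found.append(here)
        # Sub-aggregations may be declared under "aggs" or "aggregations".
        sub = body.get("aggs") or body.get("aggregations") or {}
        found.extend(misplaced_global_aggs(sub, depth + 1, here))
    return found


good = {"all_products": {"global": {}, "aggs": {"genres": {"terms": {"field": "cost"}}}}}
bad = {"by_cost": {"terms": {"field": "cost"}, "aggs": {"all_products": {"global": {}}}}}

print(misplaced_global_aggs(good))
print(misplaced_global_aggs(bad))
```

An empty list means every global aggregation in the body is defined at the top level.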

5. Reverse nested aggregation is not used inside a nested aggregation

Reverse nested aggregation is a single-bucket aggregation that enables aggregating on parent documents from nested documents. The reverse_nested aggregation must be defined inside a nested aggregation; if it is not, you’ll see this exception.

Quick troubleshooting steps

To avoid this error, you should ensure that the reverse_nested aggregation is defined inside a nested aggregation.

With the reverse_nested aggregation placed inside a nested aggregation, the search query will be:

{
 "aggs": {
   "comments": {
     "nested": {
       "path": "comments"
     },
     "aggs": {
       "top_usernames": {
         "terms": {
           "field": "comments.username"
         },
         "aggs": {
           "comment_issue": {
             "reverse_nested": {},
             "aggs": {
               "top_tags": {
                 "terms": {
                   "field": "tags"
                 }
               }
             }
           }
         }
       }
     }
   }
 }
}
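
The placement rule can also be checked in code before sending a request. This Python sketch walks an aggs tree and reports every reverse_nested aggregation that has no nested aggregation among its ancestors. It is a simplified check under that assumption, and the function and aggregation names are hypothetical.

```python
from typing import List


def misplaced_reverse_nested(aggs: dict, inside_nested: bool = False,
                             path: str = "") -> List[str]:
    """Return the paths of reverse_nested aggs that have no nested ancestor."""
    found = []
    for name, body in aggs.items():
        here = f"{path} > {name}" if path else name
        if "reverse_nested" in body and not inside_nested:
            found.append(here)
        # Children are inside a nested scope if this agg or any ancestor is nested.
        child_inside = inside_nested or "nested" in body
        sub = body.get("aggs") or body.get("aggregations") or {}
        found.extend(misplaced_reverse_nested(sub, child_inside, here))
    return found


good = {
    "comments": {
        "nested": {"path": "comments"},
        "aggs": {
            "top_usernames": {
                "terms": {"field": "comments.username"},
                "aggs": {"comment_issue": {"reverse_nested": {}}},
            }
        },
    }
}
bad = {
    "top_tags": {
        "terms": {"field": "tags"},
        "aggs": {"comment_issue": {"reverse_nested": {}}},
    }
}

print(misplaced_reverse_nested(good))
print(misplaced_reverse_nested(bad))
```

An empty list means every reverse_nested aggregation sits somewhere below a nested aggregation, as in the query above.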

Log Context

The log “All Shards Failed” is generated in the class SearchScrollAsyncAction.java. We extracted the following from the Elasticsearch source code for those seeking in-depth context:

addShardFailure(new ShardSearchFailure(failure, searchShardTarget));
int successfulOperations = successfulOps.decrementAndGet();
assert successfulOperations >= 0 : "successfulOperations must be >= 0 but was: " + successfulOperations;
if (counter.countDown()) {
    if (successfulOps.get() == 0) {
        listener.onFailure(new SearchPhaseExecutionException(phaseName, "all shards failed", failure, buildShardFailures()));
    } else {
        SearchPhase phase = nextPhaseSupplier.get();
        try {
            phase.run();
        } catch (Exception e) {
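
The counting logic in this excerpt can be modeled in a few lines. The Python sketch below is a simplified illustration, not the Elasticsearch implementation: it assumes the success counter starts at the number of shards and is decremented on each failure, and that the outcome is decided once every shard has reported.

```python
from typing import List


def phase_outcome(shard_results: List[bool]) -> str:
    """Model the excerpt: True is a shard success, False is a shard failure."""
    successful_ops = len(shard_results)  # assumed to start at the shard count
    remaining = len(shard_results)       # stands in for the CountDown object
    for ok in shard_results:
        if not ok:
            successful_ops -= 1          # mirrors successfulOps.decrementAndGet()
        remaining -= 1
        if remaining == 0:               # mirrors counter.countDown() returning true
            return "all shards failed" if successful_ops == 0 else "next phase"
    return "no shards"


print(phase_outcome([False, False, False]))
print(phase_outcome([True, False, False]))
```

In this model the exception is only produced when no shard succeeded. If at least one shard succeeded, the search moves on to the next phase.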
