How To Solve Issues Related to Log – Cluster state applier task took above the warn threshold of


Last update: Jan-20

Elasticsearch Error Guide In Page Navigation (click to jump):

Troubleshooting Background – start here to get the full picture
Related Issues – selected resources on related issues
Log Context – useful for experts
About Opster – offering a different approach to troubleshooting Elasticsearch



Troubleshooting background

To troubleshoot the Elasticsearch log “Cluster state applier task took above the warn threshold of”, it’s important to understand a few common problems related to the Elasticsearch concepts involved: cluster, task and threshold. See the detailed explanations below, complete with common problems, examples and useful tips.
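When this warning appears, a reasonable first step (a suggestion, not part of the original log analysis) is to check whether cluster state updates are queuing up, since slow applier tasks on any node delay cluster state processing. The pending cluster tasks API lists cluster state update tasks that are still waiting to be processed:

GET _cluster/pending_tasks

A long or steadily growing list usually means cluster state changes are being produced faster than the nodes can apply them.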

Cluster in Elasticsearch

What it is

In Elasticsearch, a cluster is a collection of one or more nodes (servers / VMs). A cluster can consist of an unlimited number of nodes. The cluster provides an interface for indexing and storing data, and search capability across all of the data stored in the data nodes.

Each cluster has a single master node that is elected by the master-eligible nodes. In cases where the master is not available, the other connected master-eligible nodes elect a new master. Clusters are identified by a unique name, which defaults to “elasticsearch”.
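As a quick illustration (not from the original text), the cluster health API returns the cluster name, the number of nodes and the overall status in one call:

GET _cluster/health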

Task in Elasticsearch

What it is

A task is equivalent to an Elasticsearch operation, which can be any request performed on an Elasticsearch cluster, for example a delete-by-query request, a search request and so on. Elasticsearch provides a dedicated Task API for task management, which includes various actions, from retrieving the status of currently running tasks to canceling any long-running task.

Examples
Get all currently running tasks on all nodes of the cluster

Apart from other information, the response of the below request contains the task IDs of all tasks, which can be used to get detailed information about a particular task.

GET _tasks
Get detailed information about a particular task

clQFAL_VRrmnlRyPsu_p8A:1132678759 is the ID of the task in the below request.

GET _tasks/clQFAL_VRrmnlRyPsu_p8A:1132678759
Get all the current tasks running on particular nodes
GET _tasks?nodes=nodeId1,nodeId2
Cancel a long-running task

clQFAL_VRrmnlRyPsu_p8A:1132678759 is the ID of the task in the below request.

POST /_tasks/clQFAL_VRrmnlRyPsu_p8A:1132678759/_cancel?pretty
Notes
  • The Task API is most useful when you want to investigate a spike in resource utilization in the cluster or want to cancel a long-running operation (see the filtering example below).
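As a sketch of how such an investigation might start, the Task API supports the detailed and actions query parameters, which narrow the list down to, for example, currently running search tasks (adjust the action filter to your own workload):

GET _tasks?detailed=true&actions=*search*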

Threshold in Elasticsearch

What it is

Elasticsearch uses several parameters to enable it to manage hard disk storage across the cluster. 

What it’s used for
  • Elasticsearch will actively try to relocate shards away from nodes which exceed the disk watermark high threshold.
  • Elasticsearch will NOT allocate new shards or relocate shards onto nodes which exceed the disk watermark low threshold.
  • Elasticsearch will prevent all writes to an index which has any shard on a node that exceeds the disk.watermark.flood_stage threshold.
  • The info update interval is the frequency at which Elasticsearch re-checks disk usage on each node.
Examples
PUT _cluster/settings
{
  "transient": {
   
    "cluster.routing.allocation.disk.watermark.low": "85%",
    "cluster.routing.allocation.disk.watermark.high": "90%",
    "cluster.routing.allocation.disk.watermark.flood_stage": "95%",
    "cluster.info.update.interval": "1m"
  }
}
Notes and good things to know:
  • You can use absolute values, e.g. ”100gb”, or percentages, e.g. ”90%”, but you cannot mix the two on the same cluster.
  • In general, it is recommended to use percentages, since this will work in cases where disks are resized.
  • You can put these cluster settings in the elasticsearch.yml file on each node, but it is recommended to use the PUT _cluster/settings API because it is easier to manage and ensures that the settings are consistent across the cluster.
  • Elasticsearch comes with sensible defaults for these settings, so think twice before modifying them.  If you find you are spending a lot of time fine-tuning these settings, then it is probably time to invest in new disk space.
  • In the event of the flood_stage threshold being exceeded, once you delete data, Elasticsearch should detect automatically that the block can be released (bearing in mind the update interval, which could be, for instance, a minute). However, if you want to accelerate this process, you can unblock the index manually with the following call:
PUT /my_index/_settings
{
  "index.blocks.read_only_allow_delete": null
}
Common problems

Inappropriate cluster settings (for example, if disk.watermark.low is set too low) can make it impossible for Elasticsearch to allocate shards on the cluster. In particular, bear in mind that these parameters work in combination with other cluster settings (for example, shard allocation awareness) which impose further constraints on how Elasticsearch can allocate shards.
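If shards are failing to allocate, the cluster allocation explain API (a standard Elasticsearch API, suggested here as a next step rather than taken from the original guide) reports which allocation deciders, including the disk threshold decider, are preventing allocation:

GET _cluster/allocation/explain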


To help troubleshoot related issues, we have gathered selected Q&A from the community and issues from GitHub. Please review the following for further information:

Lots Of Cluster State Update Task Zen-Disco-Receive-From-Master Above The Warn Threshold Of 30s
discuss.elastic.co/t/lots-of-cluster-state-update-task-zen-disco-receive-from-master-above-the-warn-threshold-of-30s/175961

 

Timed Out Waiting For All Nodes To Process Published State And Cluster Unavailability
discuss.elastic.co/t/timed-out-waiting-for-all-nodes-to-process-published-state-and-cluster-unavailability/138590

 


Log Context

The log ”Cluster state applier task took above the warn threshold of” is generated by the class ClusterApplierService.java.
We have extracted the following from the Elasticsearch source code to provide in-depth context:

         }
    }

    protected void warnAboutSlowTaskIfNeeded(TimeValue executionTime, String source) {
        if (executionTime.getMillis() > slowTaskLoggingThreshold.getMillis()) {
            logger.warn("cluster state applier task [{}] took [{}] above the warn threshold of {}", source, executionTime,
                slowTaskLoggingThreshold);
        }
    }

    class NotifyTimeout implements Runnable {
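
The slowTaskLoggingThreshold used in this snippet is controlled by the dynamic cluster setting cluster.service.slow_task_logging_threshold, which defaults to 30s in recent Elasticsearch versions (verify the exact setting name and default against your version). Raising it only hides the warning rather than fixing the underlying slowness, but as a sketch:

PUT _cluster/settings
{
  "transient": {
    "cluster.service.slow_task_logging_threshold": "60s"
  }
}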






About Opster

Incorporating deep knowledge and a broad history of Elasticsearch issues, Opster’s solution identifies and predicts root causes of Elasticsearch problems, provides recommendations, and can automatically perform various actions to manage, troubleshoot and prevent issues.

We are constantly updating our analysis of Elasticsearch logs, errors, and exceptions, sharing best practices and providing troubleshooting guides.

Learn more: Glossary | Blog | Troubleshooting guides | Error Repository

Need help with any Elasticsearch issue? Contact Opster.
