How to Correctly Identify Slow and Heavy Searches in Elasticsearch

A heavy search is a search that consumes many resources in the cluster (such as memory, disk bandwidth, CPU and network) and causes latency in Elasticsearch as a result. A heavy search is not to be confused with a search that is slow only because of existing latency in the system.

Heavy searches affect all of the cluster’s operations: they increase indexing and search latency and degrade API responsiveness. That is why it is so important to detect heavy searches and prevent them from running.

The challenge is that finding heavy searches is not as simple as it sounds. Your system can be loaded for any number of reasons and searches will be slower as a result. Node disconnection, heavy merges due to indexing activity, backups being stored in snapshots – all of these and more can slow down your searches, without heavy searches themselves having occurred.

Adding to this challenge, if you assume that heavy searches are the root cause, you may end up with a false positive: flagging a search that merely looks heavy.

So how can you identify when searches are slow due to factors truly related to searches and not due to other elements in your system? There are several clear markers.

Firstly, heap usage and circuit breakers can indicate that searches in your system have been requiring a lot of resources, for example when a breaker has tripped or its estimated memory usage is high or rising.
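As a rough illustration of the breaker check, the sketch below scans a parsed response from the node stats API (`GET _nodes/stats/breaker`) and reports breakers that have tripped or whose estimated usage is close to their limit. The function name `flag_breakers`, the 80% threshold and the sample data are assumptions made for this example, not values recommended by Elasticsearch.

```python
def flag_breakers(node_stats, usage_threshold=0.8):
    """Return (node, breaker, usage_ratio, tripped) for breakers worth a closer look.

    node_stats is the parsed JSON body of GET _nodes/stats/breaker.
    usage_threshold is an arbitrary cut-off chosen for this sketch.
    """
    flagged = []
    for node in node_stats.get("nodes", {}).values():
        node_name = node.get("name", "unknown")
        for breaker_name, stats in node.get("breakers", {}).items():
            limit = stats.get("limit_size_in_bytes", 0)
            estimated = stats.get("estimated_size_in_bytes", 0)
            tripped = stats.get("tripped", 0)
            ratio = estimated / limit if limit > 0 else 0.0
            if tripped > 0 or ratio >= usage_threshold:
                flagged.append((node_name, breaker_name, round(ratio, 2), tripped))
    return flagged


# Made-up sample in the shape of a node stats response (most fields omitted).
sample = {
    "nodes": {
        "node-id-1": {
            "name": "data-1",
            "breakers": {
                "request": {"limit_size_in_bytes": 1000, "estimated_size_in_bytes": 900, "tripped": 0},
                "fielddata": {"limit_size_in_bytes": 1000, "estimated_size_in_bytes": 100, "tripped": 3},
                "parent": {"limit_size_in_bytes": 2000, "estimated_size_in_bytes": 500, "tripped": 0},
            },
        }
    }
}

for row in flag_breakers(sample):
    print(row)
```

Note that the `tripped` counter is cumulative, so to tell whether usage is rising you would need to compare two snapshots taken some time apart rather than rely on a single reading.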

Secondly, you can check whether your searches include expensive query types, such as regexp, wildcard, script and prefix queries. If expensive queries were running, it’s a good indication that searches are slow for that reason and not due to other elements.
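One way to check for expensive query types is to walk the search body and look for the relevant query names. The following is a minimal sketch: the `EXPENSIVE_QUERY_TYPES` set and the function name are assumptions for illustration, and the match is purely by key name, so a document field that happens to be called `prefix` or `script` would be flagged by mistake.

```python
# Query names treated as expensive in this sketch; adjust the set to your needs.
EXPENSIVE_QUERY_TYPES = {"regexp", "wildcard", "prefix", "script", "script_score", "fuzzy"}


def find_expensive_clauses(body, path=""):
    """Recursively collect the paths of expensive-looking clauses in a search body."""
    found = []
    if isinstance(body, dict):
        for key, value in body.items():
            child_path = f"{path}.{key}" if path else key
            if key in EXPENSIVE_QUERY_TYPES:
                found.append(child_path)
            found.extend(find_expensive_clauses(value, child_path))
    elif isinstance(body, list):
        for index, item in enumerate(body):
            found.extend(find_expensive_clauses(item, f"{path}[{index}]"))
    return found


# Hypothetical search body used only to exercise the function.
search_body = {
    "query": {
        "bool": {
            "must": [{"match": {"title": "error"}}],
            "filter": [
                {"wildcard": {"host": {"value": "*prod*"}}},
                {"regexp": {"path": ".*checkout.*"}},
            ],
        }
    }
}

print(find_expensive_clauses(search_body))
# ['query.bool.filter[0].wildcard', 'query.bool.filter[1].regexp']
```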

Once you’ve determined that heavy searches are indeed occurring on your system, the “culprit” searches need to be identified in order to be addressed. This is harder than it sounds: while a single heavy search is running on the cluster, every search that follows it runs more slowly as a result and can be wrongly suspected as a “culprit”.

Unfortunately, there’s no shortcut when trying to locate a heavy search on your own – you need to manually scan each and every slow search that ran when the slowing down began. You might think you located it, only to discover that the culprit you suspected was not the culprit at all. This can usually be identified by seeing if the search ran quickly at other times, aside from the specific time you were looking at. 
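The "did it run quickly at other times?" check can be partly automated. The sketch below assumes you have already collected search records, each with a parsed `body` and a `took_ms` duration (both field names are hypothetical, as is the 1000 ms threshold). It reduces each body to a "shape" by blanking out the values, groups durations by shape, and separates shapes that are slow most of the time from shapes that were slow only occasionally.

```python
import json
from collections import defaultdict
from statistics import median


def query_shape(body):
    """Replace every leaf value with "?" so searches differing only in values share a shape."""
    if isinstance(body, dict):
        return {key: query_shape(value) for key, value in body.items()}
    if isinstance(body, list):
        return [query_shape(item) for item in body]
    return "?"


def classify_shapes(records, slow_ms=1000):
    """Label each query shape by how often it crosses the slow threshold."""
    took_by_shape = defaultdict(list)
    for record in records:
        shape = json.dumps(query_shape(record["body"]), sort_keys=True)
        took_by_shape[shape].append(record["took_ms"])

    verdicts = {}
    for shape, tooks in took_by_shape.items():
        if median(tooks) >= slow_ms:
            verdicts[shape] = "consistently slow"   # likely heavy in itself
        elif max(tooks) >= slow_ms:
            verdicts[shape] = "occasionally slow"   # likely slowed by something else
        else:
            verdicts[shape] = "fast"
    return verdicts


# Made-up records for illustration.
records = [
    {"body": {"query": {"wildcard": {"host": "*prod*"}}}, "took_ms": 4200},
    {"body": {"query": {"wildcard": {"host": "*stage*"}}}, "took_ms": 3900},
    {"body": {"query": {"wildcard": {"host": "*dev*"}}}, "took_ms": 4100},
    {"body": {"query": {"match": {"title": "error"}}}, "took_ms": 35},
    {"body": {"query": {"match": {"title": "timeout"}}}, "took_ms": 2500},
    {"body": {"query": {"match": {"title": "login"}}}, "took_ms": 40},
]

for shape, verdict in classify_shapes(records).items():
    print(verdict, shape)
```

Blanking out values is a deliberate simplification: it hides differences that matter, such as a leading wildcard versus a trailing one, so a shape labelled "occasionally slow" still deserves a look at the specific slow instance.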

In order to accurately pinpoint when searches are slow due to heavy searches, Opster’s Search Gateway scans each search and assigns it a score based on the search’s features, to distinguish genuinely heavy searches from searches that are merely slow.

In addition, the Search Gateway keeps track of historic search execution statistics, grouping searches by their patterns. This allows the Gateway to detect when a search is slow due to its costly attributes, as opposed to when it runs slowly at a specific time, likely due to other factors such as those mentioned above. 

To begin optimizing your searches and improving your performance, you can use Opster’s free Search Log Analyzer. With Opster’s Analyzer, you can easily locate slow searches and understand what led to them adding additional load to your system. You’ll receive customized recommendations for how to reduce search latency and improve your search performance. The tool is free and takes just 2 minutes to run.