Async Search in Elasticsearch

Average Read Time

2 Mins

Opster Expert Team - Gustavo

Dec-2021

Opster Team

October 2021


In addition to reading this guide, we recommend you run the Elasticsearch Health Check-Up. It will detect issues and improve your Elasticsearch performance by analyzing your shard sizes, threadpools, memory, snapshots, disk watermarks and more.

The Elasticsearch Check-Up is free and requires no installation.

To evaluate your use of async search in Elasticsearch, we recommend you run the Elasticsearch Configuration Check-Up. The Check-Up will also help you optimize other important settings and processes in Elasticsearch to improve performance and ensure high availability for your crucial data.

What is async search?

When you query large amounts of data, waiting for the full response to reach the client can take a very long time.

Async search, released in Elasticsearch 7.7, addresses this. The API runs the search in the background and makes results available incrementally, rather than delivering everything in a single blocking request.

Instead of waiting for the query to finish, you can retrieve partial results while it is still collecting them.

The query returns an ID and other status indicators, so you can close your Kibana DevTools console or terminal and come back later to check the query’s progress and the results fetched so far.

Running an async search query

The async search query receives the same parameters as a regular search.

Let’s index some documents and run a query. 

POST test_async/_doc
{
  "text": "Doc1"
}

POST test_async/_doc
{
  "text": "Doc2"
}

POST test_async/_doc
{
  "text": "Example doc"
}

POST test_async/_async_search

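The response will look like this (the id field is only present if the search did not finish within wait_for_completion_timeout, or if keep_on_completion is set to true):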

{
  "id" : "SOME_ID",
  "is_partial" : false,
  "is_running" : false,
  "start_time_in_millis" : 1636010235096,
  "expiration_time_in_millis" : 1636442235096,
  "response" : {
    "took" : 719,
    "timed_out" : false,
    "_shards" : {
      "total" : 1,
      "successful" : 1,
      "skipped" : 0,
      "failed" : 0
    },
    "hits" : {
      "total" : {
        "value" : 3,
        "relation" : "eq"
      },
      "max_score" : 1.0,
      "hits" : [
        {
          "_index" : "test_async",
          "_type" : "_doc",
          "_id" : "0JjG6XwBpL6RE1SX6qi6",
          "_score" : 1.0,
          "_source" : {
            "text" : "Example doc"
          }
        },
        {
          "_index" : "test_async",
          "_type" : "_doc",
          "_id" : "0ZjO6XwBpL6RE1SX0Kgt",
          "_score" : 1.0,
          "_source" : {
            "text" : "Doc1"
          }
        },
        {
          "_index" : "test_async",
          "_type" : "_doc",
          "_id" : "0pjO6XwBpL6RE1SX1KgU",
          "_score" : 1.0,
          "_source" : {
            "text" : "Doc2"
          }
        }
      ]
    }
  }
}
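Outside of Kibana, the same request can be sent from any HTTP client. The following is a minimal sketch using only the Python standard library. The cluster address (`http://localhost:9200`, with security disabled) and the helper names `build_async_search_request` and `submit_async_search` are assumptions made for illustration, not part of the original example.

```python
import json
import urllib.request

# Assumption: a local test cluster without authentication or TLS.
ES_URL = "http://localhost:9200"


def build_async_search_request(index, body=None, base_url=ES_URL):
    """Build (without sending) a POST <index>/_async_search request."""
    # An empty JSON object is sent when no query body is given.
    data = json.dumps(body or {}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_async_search",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_async_search(index, body=None, base_url=ES_URL):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_async_search_request(index, body, base_url)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `submit_async_search("test_async")` against a running cluster should return a dictionary with the same fields as the response above.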

Important properties in async search queries

| Field | Description |
| --- | --- |
| id | If the query takes longer than the time set in wait_for_completion_timeout, an ID is generated so the query status and results can be retrieved later. |
| is_partial | Always true while the query is running. Once the query stops, it indicates whether the query failed (true) or completed on all shards (false). |
| is_running | Indicates whether the query is still running or has completed. |
| _shards.total | Total number of shards the query will be executed against. |
| _shards.successful | Number of shards on which the query has executed successfully so far. |
| hits.total.value | Number of documents matching the query so far. These documents belong to the shards counted in _shards.successful. |
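To show how these fields fit together, here is a small sketch that turns an async search response (already parsed into a Python dict) into a one-line progress summary. The function name `summarize_progress` and the wording of the summary are illustrative assumptions.

```python
def summarize_progress(response):
    """Return a one-line progress summary for a parsed async search response."""
    shards = response["response"]["_shards"]
    hits = response["response"]["hits"]["total"]["value"]

    if response["is_running"]:
        state = "running"
    elif response["is_partial"]:
        # Not running but still partial: the search failed or
        # did not complete on every shard.
        state = "stopped with partial results"
    else:
        state = "complete"

    return (
        f"{state}: {shards['successful']}/{shards['total']} shards, "
        f"{hits} hits so far"
    )
```

For the sample response shown earlier (1 of 1 shards, 3 hits, not running, not partial) this would produce `complete: 1/1 shards, 3 hits so far`.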

How to retrieve status and hits

To retrieve the status and hits of our async query we just need to run a GET request: 

GET /_async_search/SOME_ID

The current status and hits of the async query will be returned.
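In a script, this GET request is usually repeated until the query finishes. The sketch below shows one way to do that. It assumes a caller-supplied `fetch` function (hypothetical) that performs `GET /_async_search/<id>` and returns the parsed JSON; the polling interval and the maximum number of polls are arbitrary illustrative values.

```python
import time


def wait_for_async_search(fetch, search_id, interval_seconds=2.0, max_polls=30):
    """Poll an async search until is_running is false, then return the response.

    `fetch` is any callable that takes the search ID and returns the parsed
    JSON body of GET /_async_search/<id>.
    """
    for _ in range(max_polls):
        response = fetch(search_id)
        if not response["is_running"]:
            return response
        # Partial hits are available in `response` here if needed.
        time.sleep(interval_seconds)
    raise TimeoutError(
        f"async search {search_id} still running after {max_polls} polls"
    )
```

Injecting `fetch` keeps the loop independent of the HTTP client and makes it easy to exercise without a cluster.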

How to retrieve status alone

If we don’t need the hits of the query and only want to check the status, we can call the status endpoint:

GET /_async_search/status/SOME_ID

The response will look like this: 

{
  "id" : "FmRldE8zREVEUzA2ZVpUeGs2ejJFUFEaMkZ5QTVrSTZSaVN3WlNFVmtlWHJsdzoxMDc=",
  "is_running" : true,
  "is_partial" : true,
  "start_time_in_millis" : 1583945890986,
  "expiration_time_in_millis" : 1584377890986,
  "_shards" : {
      "total" : 562,
      "successful" : 188, 
      "skipped" : 0,
      "failed" : 0
  }
}

The “successful” property indicates the number of shards on which the query has completed so far.

For an async search that has been completed, the status response has an additional completion_status field that shows the HTTP status code of the completed async search.

For example, if the query executed correctly: 

"completion_status" : 200

If the query had errors: 

"completion_status" : 503
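The status response can be interpreted along these lines. This is a sketch only: the function name `describe_status` and the output wording are assumptions, and any completion_status other than 200 is treated as an error.

```python
def describe_status(status):
    """Describe a parsed response from GET /_async_search/status/<id>."""
    if status["is_running"]:
        shards = status["_shards"]
        total = shards["total"]
        # Integer percentage of shards that have completed so far.
        percent = 100 * shards["successful"] // total if total else 0
        return f"running: {shards['successful']}/{total} shards ({percent}%)"

    # completion_status is only present once the search has completed.
    code = status.get("completion_status")
    if code == 200:
        return "completed successfully"
    return f"completed with errors (HTTP {code})"
```

With the running example above (188 of 562 shards) this returns `running: 188/562 shards (33%)`.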

How to delete a query

To cancel an async query, send a DELETE request with its ID. If the query is still running, it is canceled; otherwise, its saved results are deleted.

DELETE /_async_search/SOME_ID

If Elasticsearch security features are enabled, only two types of users can delete a query:

1. The authenticated user that submitted the query

2. A user that has the cancel_task cluster privilege

Additional parameters

| Field | Description |
| --- | --- |
| wait_for_completion_timeout | How long the request blocks while waiting for the query to finish before returning. Defaults to 1 second. If the query finishes within this time, the results are not stored (no id field). |
| keep_on_completion | Stores the results even if the query finishes within wait_for_completion_timeout. |
| keep_alive | How long the async query and its results remain available. Defaults to 5 days. After this time, ongoing queries and saved results are deleted. |
| batched_reduce_size | Number of shard responses collected before a partial reduction, which determines how often partial results become available. Defaults to 5. |
| request_cache | Enables or disables caching on a per-request basis. Defaults to true. |
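These parameters are passed in the query string of the async search request. The sketch below builds such a path with the Python standard library. The helper name `async_search_path` and the example values (`2s`, `1d`) are illustrative assumptions.

```python
from urllib.parse import urlencode


def async_search_path(index, **params):
    """Build the request path for POST <index>/_async_search with parameters."""
    # Render Python booleans as lowercase "true"/"false" in the query string.
    normalized = {
        key: str(value).lower() if isinstance(value, bool) else value
        for key, value in params.items()
    }
    query = urlencode(normalized)
    path = f"/{index}/_async_search"
    return f"{path}?{query}" if query else path


print(async_search_path(
    "test_async",
    wait_for_completion_timeout="2s",
    keep_on_completion=True,
    keep_alive="1d",
))
```

With keep_on_completion set to true, the response should include an id even when the query finishes within the timeout.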

The following parameters cannot be changed but are worth mentioning:

| Field | Description |
| --- | --- |
| pre_filter_shard_size | Set to 1. Enforces the execution of a pre-filter roundtrip so that shards that cannot contain matching documents are skipped. |
| ccs_minimize_roundtrips | Indicates whether network round-trips should be minimized when executing cross-cluster search requests. Set to false. |

By default, Elasticsearch 7.x does not limit the size of async search responses. Storing huge responses might destabilize the cluster. To cap the response size, change the search.max_async_search_response_size cluster setting.
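As a rough sketch, the body for such a change could be generated as follows. This assumes the setting can be updated dynamically through the cluster settings API (`PUT /_cluster/settings`); the `10mb` limit and the helper name `max_response_size_settings` are arbitrary choices for illustration.

```python
import json


def max_response_size_settings(limit, persistent=True):
    """Build a cluster settings body that caps stored async search responses."""
    # Assumption: the setting is dynamic and accepted by PUT /_cluster/settings.
    scope = "persistent" if persistent else "transient"
    return {scope: {"search.max_async_search_response_size": limit}}


# Print the request in Kibana console form for review before applying it.
print("PUT /_cluster/settings")
print(json.dumps(max_response_size_settings("10mb"), indent=2))
```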

Conclusion

Async search is a good fit when you need to run resource-intensive queries and want to retrieve partial results instead of waiting for the query to finish.


