Before you begin reading this guide, we recommend you try running the Elasticsearch Error Check-Up which analyzes 2 JSON files to detect many configuration errors.
To easily locate the root cause and resolve this issue try AutoOps for Elasticsearch & OpenSearch. It diagnoses problems by analyzing hundreds of metrics collected by a lightweight agent and offers guidance for resolving them.
This guide will help you check for common problems that cause the log "Encountered bulk failures during reindex process" to appear. To understand the issues related to this log, read the explanation below about the following Elasticsearch concepts: reindex, plugin, bulk.
Overview
Reindex is the process of copying existing data from a source index to a destination index, which can be in the same cluster or a different one. Elasticsearch provides a dedicated _reindex endpoint for this purpose. Reindexing is most often required when you need to change an index's mappings or settings, since many of these cannot be modified on an existing index.
Examples
Reindex data from a source index to a destination index in the same cluster:
POST /_reindex?pretty
{
  "source": {
    "index": "news"
  },
  "dest": {
    "index": "news_v2"
  }
}
Notes
- The Reindex API does not copy settings or mappings from the source index to the destination index. You need to create the destination index with the desired settings and mappings before you begin the reindexing process (see the first sketch after this list).
- The API exposes an extensive list of configuration options for fetching data from the source index, such as query-based reindexing and selecting multiple indices as the source (see the second sketch after this list).
- The Reindex API is not useful in scenarios where reindexing requires complex data processing and modification based on application logic. In such cases, you can write a custom script that uses the Elasticsearch scroll API to fetch data from the source index and the bulk API to index it into the destination index (see the last sketch after this list).
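For example, the destination index from the request above could be created with explicit settings and mappings before reindexing. This is only a sketch; the field names, types, and settings values are illustrative assumptions, not part of the original guide:
PUT /news_v2
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "published_at": { "type": "date" }
    }
  }
}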
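As a sketch of query-based reindexing, the request below copies only a subset of documents from the source index; the field name and date value are assumptions for illustration:
POST /_reindex
{
  "source": {
    "index": "news",
    "query": {
      "range": {
        "published_at": { "gte": "2023-01-01" }
      }
    }
  },
  "dest": {
    "index": "news_v2"
  }
}
To reindex from multiple source indices, "index" under "source" can also be given as an array, for example ["news", "news_archive"].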
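Below is a minimal sketch of the scroll-and-bulk approach, assuming the same news and news_v2 indices as above; in a real script, your application logic would transform each batch of hits before sending the bulk request:
# Open a scroll context and fetch the first batch from the source index
POST /news/_search?scroll=1m
{
  "size": 1000,
  "query": { "match_all": {} }
}

# Fetch subsequent batches using the scroll_id returned by the previous call
POST /_search/scroll
{
  "scroll": "1m",
  "scroll_id": "<scroll_id from the previous response>"
}

# Index the (transformed) documents into the destination index in bulk
POST /_bulk
{ "index": { "_index": "news_v2", "_id": "1" } }
{ "title": "transformed document", "published_at": "2023-01-01" }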
Overview
In Elasticsearch, the Bulk API makes it possible to perform many write operations in a single API call, which increases indexing speed. Using the Bulk API is more efficient than sending multiple separate requests. It supports the following four actions:
- Index
- Update
- Create
- Delete
Examples
The bulk request below will index a document, delete another document, and update an existing document.
POST _bulk
{ "index" : { "_index" : "myindex", "_id" : "1" } }
{ "field1" : "value" }
{ "delete" : { "_index" : "myindex", "_id" : "2" } }
{ "update" : { "_id" : "1", "_index" : "myindex" } }
{ "doc" : { "field2" : "value5" } }
Notes
- The Bulk API is useful when you need to index data streams that can be queued up and indexed in batches of hundreds or thousands, such as logs.
- There is no single correct number of actions to perform in one bulk call; you will need to find the optimal batch size by experimentation, given the cluster size, number of nodes, hardware specs, etc. An example of a bulk response containing failures is shown below.
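When individual actions in a bulk request fail, Elasticsearch still returns a successful HTTP response, but sets "errors": true and reports a per-item error. The values below are illustrative only; per-item failures like these are the kind of bulk failures that the log discussed in this guide refers to:
{
  "took": 30,
  "errors": true,
  "items": [
    {
      "index": {
        "_index": "myindex",
        "_id": "1",
        "status": 400,
        "error": {
          "type": "mapper_parsing_exception",
          "reason": "failed to parse field [field1] of type [long]"
        }
      }
    }
  ]
}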
Log Context
The log "Encountered bulk failures during reindex process" comes from the class EnrichPolicyRunner.java. We extracted the following from the Elasticsearch source code for those seeking in-depth context:
                ),
                failure.getCause()
            );
        }
    }
    delegate.onFailure(new ElasticsearchException("Encountered bulk failures during reindex process"));
} else if (bulkByScrollResponse.getSearchFailures().size() > 0) {
    logger.warn(
        "Policy [{}]: encountered [{}] search failures. Turn on DEBUG logging for details.",
        policyName,
        bulkByScrollResponse.getSearchFailures().size()
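As the warning message in the extracted code suggests, turning on DEBUG logging reveals more detail about the individual failures. One possible way to raise the log level dynamically is through the cluster settings API; the logger package name below is an assumption based on the class name above:
PUT /_cluster/settings
{
  "persistent": {
    "logger.org.elasticsearch.xpack.enrich": "DEBUG"
  }
}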