Learn how to improve your Elasticsearch indexing rate by following these 11 tips:
- Tune Refresh Interval
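Tune refresh_interval (default 1s) according to your system requirements. Every refresh creates a new segment, so a longer interval (for example 30s) reduces refresh overhead during heavy indexing, at the cost of new documents becoming searchable later.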
- Disable Replicas
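For a large initial load, you can disable replicas while indexing and then set the replica count back according to your requirements once the load is done. The official guide describes the procedure. Keep in mind that an index without replicas has no redundancy if a node is lost during the load.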
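The two settings above are usually changed together around a bulk load. The sketch below only builds the request bodies you would send to the index settings endpoint (`PUT /my-index/_settings`); the index name `my-index`, the helper names, and the chosen values are assumptions for illustration, not recommendations.

```python
import json

def bulk_load_settings():
    """Settings to apply before a large one-off load (illustrative values)."""
    # "-1" disables periodic refresh; 0 replicas avoids indexing every doc twice.
    return {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}

def restore_settings(refresh_interval="1s", replicas=1):
    """Settings to apply once the load has finished."""
    return {"index": {"refresh_interval": refresh_interval,
                      "number_of_replicas": replicas}}

# Bodies for: PUT /my-index/_settings  (my-index is a hypothetical index name)
print(json.dumps(bulk_load_settings()))
print(json.dumps(restore_settings(refresh_interval="30s", replicas=1)))
```

Sending the bodies is left to whichever client you already use.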
- Automatic ID Field
Do not set the “_id” field of the document unless you need to. When Elasticsearch generates the “_id” automatically, it can skip the check for an existing document with the same ID, which makes indexing faster.
- Use Multiple Workers/Threads
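Use multiple workers / threads to index. A single thread sending requests is unlikely to use the full indexing capacity of a cluster. Increase concurrency gradually, and back off if the cluster starts rejecting requests with 429 (Too Many Requests) responses.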
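A minimal sketch of the worker pattern, using only the Python standard library. `send_batch` is a hypothetical stand-in for whatever call actually sends one bulk request; here it only counts documents so the example runs without a cluster.

```python
from concurrent.futures import ThreadPoolExecutor

def send_batch(batch):
    """Hypothetical stand-in for one bulk request; returns how many docs it was given."""
    return len(batch)

def index_in_parallel(batches, workers=4, send=send_batch):
    """Send batches from several threads and return the total number of docs sent."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(send, batches))

docs = [{"n": i} for i in range(10)]
batches = [docs[i:i + 3] for i in range(0, len(docs), 3)]  # batches of up to 3 docs
total = index_in_parallel(batches)
print(total)
```

The worker count of 4 is an arbitrary starting point; the right value depends on the cluster and should be found by measurement.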
- Use Official Clients
Use the official Elasticsearch clients, since they are designed to handle connection pooling and keep-alives efficiently.
- Avoid Frequent Updates
Avoid frequent updates to the same document. Every update creates a new document in Elasticsearch and marks the old document as deleted. Frequent updates therefore leave many deleted documents behind and inflate segment sizes, and those deleted documents are not always purged when segments merge. To work around this, you may be able to collect the updates in the application that calls the index API (such as a search service written in Java or Python) in order to 1) remove unnecessary updates (like multiple updates to counter fields), and 2) send only a few updates to Elasticsearch.
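One way to collect updates in the application is to buffer them and collapse them per document before anything is sent. The sketch below assumes updates arrive as `(doc_id, fields)` pairs and counter changes as `(doc_id, field, delta)` triples; both shapes and the function names are hypothetical.

```python
def coalesce_updates(updates):
    """Collapse (doc_id, fields) updates so each document is sent once.

    Later values for the same field overwrite earlier ones.
    """
    merged = {}
    for doc_id, fields in updates:
        merged.setdefault(doc_id, {}).update(fields)
    return merged

def coalesce_increments(increments):
    """Sum (doc_id, field, delta) counter increments per document and field."""
    totals = {}
    for doc_id, field, delta in increments:
        per_doc = totals.setdefault(doc_id, {})
        per_doc[field] = per_doc.get(field, 0) + delta
    return totals

updates = [("a", {"title": "x"}), ("b", {"views": 1}), ("a", {"title": "y", "tag": "t"})]
print(coalesce_updates(updates))        # three buffered updates become two

increments = [("a", "views", 1), ("a", "views", 1), ("b", "views", 5), ("a", "likes", 2)]
print(coalesce_increments(increments))  # four increments become one update per document
```

How long to buffer before flushing is a trade-off between write volume and how stale the indexed data may be.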
- Design Index Mapping Carefully
Be careful while designing your index mapping. Don’t index fields that are not used for search: this reduces the size of the inverted index and saves the analysis cost on the field. The “index” mapping parameter controls this, and it defaults to true.
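As an illustration, the mapping below keeps one field searchable and turns indexing off for another. The field names (`title`, `thumbnail_url`) and the index name in the comment are made up for this sketch; the code only builds the JSON body.

```python
import json

mapping = {
    "mappings": {
        "properties": {
            # Used in search queries, so it stays indexed (the default).
            "title": {"type": "text"},
            # Only returned with results, never searched: keep it out of the inverted index.
            "thumbnail_url": {"type": "keyword", "index": False},
        }
    }
}

# Body for: PUT /my-index  (my-index is a hypothetical index name)
print(json.dumps(mapping, indent=2))
```

The field is still returned as part of the document source; it just cannot be searched through the inverted index.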
- Use Analyzers Carefully
Use analyzers carefully on your fields. Some analyzers (ngram, etc.) are resource-intensive: they can slow down indexing and significantly increase the index size for large text fields.
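To see why ngram analysis inflates the index, it helps to count the terms produced for a single token. The function below is a plain-Python imitation of ngram generation, not Elasticsearch's implementation; `min_gram` and `max_gram` are named after the usual tokenizer parameters.

```python
def ngrams(token, min_gram, max_gram):
    """All substrings of token whose length is between min_gram and max_gram."""
    return [token[i:i + n]
            for n in range(min_gram, max_gram + 1)
            for i in range(len(token) - n + 1)]

token = "elasticsearch"                      # 13 characters
print(len(ngrams(token, 2, 3)))              # 12 bigrams + 11 trigrams = 23 terms
print(len(ngrams(token, 1, len(token))))     # 13 + 12 + ... + 1 = 91 terms
```

A standard analyzer would emit one term for this token, so even a narrow gram range multiplies the number of terms to index.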
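- Use the wait_for Parameter
If you need an indexed document to be searchable as soon as the request returns, set the refresh parameter to wait_for on the indexing request instead of forcing an explicit refresh. The request then waits for the next scheduled refresh rather than triggering one of its own.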
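On the wire, this is just a query parameter on the indexing request. A small sketch that builds the request URL; the host, the index name, and the helper `index_url` are assumptions for illustration.

```python
from urllib.parse import urlencode

def index_url(base_url, index, refresh=None):
    """URL for indexing one document with an auto-generated ID (POST <index>/_doc)."""
    url = f"{base_url}/{index}/_doc"
    if refresh is not None:
        url += "?" + urlencode({"refresh": refresh})
    return url

print(index_url("http://localhost:9200", "my-index", refresh="wait_for"))
# http://localhost:9200/my-index/_doc?refresh=wait_for
```

Leaving `refresh` unset keeps the default behaviour, where the request returns without waiting for a refresh.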
- Use Bulk API
Use the bulk API to index multiple documents in one request instead of sending many single-document requests. Bulk performance depends on the size of the request in bytes, not on the number of documents in it, so benchmark with your own data to find a good request size.
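Because request size matters more than document count, it can make sense to cut batches by bytes. The sketch below builds newline-delimited bulk bodies (one action line plus one source line per document) and starts a new body whenever the size budget would be exceeded. The 5 MB default is an arbitrary starting point and `bulk_bodies` is a hypothetical helper.

```python
import json

def bulk_bodies(docs, index, max_bytes=5 * 1024 * 1024):
    """Yield NDJSON bodies for the bulk endpoint, each within max_bytes where possible.

    A single document larger than max_bytes is still sent, in a body of its own.
    """
    # No "_id" in the action line, so Elasticsearch generates one.
    action = json.dumps({"index": {"_index": index}})
    lines, size = [], 0
    for doc in docs:
        pair = action + "\n" + json.dumps(doc) + "\n"
        pair_size = len(pair.encode("utf-8"))
        if lines and size + pair_size > max_bytes:
            yield "".join(lines)
            lines, size = [], 0
        lines.append(pair)
        size += pair_size
    if lines:
        yield "".join(lines)

docs = [{"n": i} for i in range(5)]
bodies = list(bulk_bodies(docs, "my-index", max_bytes=100))
print(len(bodies))  # 3 bodies: two docs, two docs, one doc
```

Each body is meant to be sent as one `POST /_bulk` request with the content type `application/x-ndjson`.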
- Use SSD
Use SSDs instead of magnetic disks for a faster segment merge process.