How to Improve your Elasticsearch Indexing Performance

By Opster Team

Updated: Jan 28, 2024


Learn how to improve your Elasticsearch indexing rate for better Elasticsearch performance by following these 11 useful tips:

  • Tune Refresh Interval

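    Tune refresh_interval (default 1 sec) according to your system requirements. A longer interval means fewer refreshes and fewer small segments to create and merge, at the cost of new documents taking longer to become searchable.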

  • Disable Replicas

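    For a large initial load, set number_of_replicas to 0 so that each document is indexed only once, then restore the replica count your requirements call for when the load finishes. The official Elasticsearch guide on tuning for indexing speed describes this approach. Keep in mind that an index without replicas has no redundancy if a node fails during the load.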

  • Automatic ID Field

    Do not set the “_id” field of the document unless you need to. It is better to let Elasticsearch generate the “_id” automatically, because it can then skip the check for an existing document with the same ID.

  • Use Multiple Workers/Threads

    Use multiple workers / threads to index. A single thread sending requests is unlikely to use all of the indexing capacity of a cluster.

  • Use Official Clients

    Use the official Elasticsearch clients, since they have been designed to optimize connection pooling and keep-alives.

  • Avoid Frequent Updates

    Avoid frequent updates to the same document, as every update creates a new document in Elasticsearch and marks the old document as deleted. This can leave many deleted documents behind and make segments larger overall, and these are not always cleaned up during the segment merging process. To work around this, you may be able to collect these updates in the application that calls the index API (such as a search service written in Java or Python) in order to 1) remove unnecessary updates (like multiple updates to counter fields), and 2) send only a few updates to Elasticsearch.

  • Design Index Mapping Carefully

    Be careful while designing your index mapping. Don’t index fields that are not used for search. The index mapping option controls this and defaults to true, so set it to false on such fields. This reduces the size of the inverted index and saves the analysis cost for the field.

  • Use Analyzers Carefully

    Use analyzers carefully on your fields; some analyzers (ngram, etc.) take significant resources and can slow down the indexing speed and significantly increase the index size of large text fields.

  • Use Wait_For Param

    If you need to search a document immediately after indexing it, set the refresh parameter to wait_for on the indexing request instead of forcing an explicit refresh.

  • Use Bulk API

    Use the bulk API to index multiple documents in one request instead of indexing them one at a time. Bulk API performance depends on the size of the request in bytes, not on the number of documents in it.

  • Use SSD

    Use SSDs instead of magnetic disks for a faster segment merge process.
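Several of the tips above (refresh interval, replicas, automatic IDs and the bulk API) can be sketched together in a few lines of Python. This is a minimal illustration, not a definitive implementation: it only builds the request bodies you would send to `PUT /<index>/_settings` and `POST /<index>/_bulk` and makes no network calls. The helper names, the specific setting values and the byte limit are assumptions chosen for the example.

```python
import json


def index_settings(refresh_interval, replicas):
    """Build the body for PUT /<index>/_settings."""
    return {
        "index": {
            "refresh_interval": refresh_interval,
            "number_of_replicas": replicas,
        }
    }


# Assumed values for illustration: disable refresh and replicas while
# loading, then go back to the defaults afterwards.
LOAD_SETTINGS = index_settings("-1", 0)
NORMAL_SETTINGS = index_settings("1s", 1)


def build_bulk_body(docs):
    """Build the NDJSON body for POST /<index>/_bulk.

    The action line carries no "_id", so Elasticsearch generates one
    for each document.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {}}))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


def chunk_by_bytes(docs, max_bytes):
    """Group docs into batches by serialized size, not by document count."""
    batch, size = [], 0
    for doc in docs:
        doc_size = len(build_bulk_body([doc]).encode("utf-8"))
        # Start a new batch when adding this doc would exceed the limit.
        if batch and size + doc_size > max_bytes:
            yield batch
            batch, size = [], 0
        batch.append(doc)
        size += doc_size
    if batch:
        yield batch
```

Each batch produced by `chunk_by_bytes` can be passed to `build_bulk_body` and sent by a separate worker, ideally through an official client. A suitable `max_bytes` depends on your documents and cluster, so it is worth finding by experiment.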
