How to Improve Your OpenSearch Indexing Performance

By Opster Team

Updated: Jan 12, 2023

Learn how to improve your OpenSearch indexing rate for better OpenSearch performance by following these 11 useful tips:

  • Tune Refresh Interval

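    Increase refresh_interval (default 1s) as far as your search-freshness requirements allow. Each refresh writes a new segment, so refreshing less often leaves more resources for indexing.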

  • Disable Replicas

    Set the number of replicas to 0 during a large initial load, then restore the replica count your requirements call for once the load has finished.

  • Automatic ID Field

    Do not set the “_id” field of the document unless you need to. When OpenSearch generates the “_id” automatically, it can skip checking whether a document with that ID already exists.

  • Use Multiple Workers/Threads

    Send indexing requests from multiple workers or threads; a single thread is rarely enough to use the full indexing capacity of a cluster.

  • Use Official Clients

    Use the official OpenSearch clients, since they are designed to handle connection pooling and keep-alives efficiently.

  • Avoid Frequent Updates

    Avoid frequent updates to the same document. Every update creates a new document in OpenSearch and marks the old one as deleted. Frequent updates therefore leave many deleted documents behind and inflate segment sizes, and the segment merging process does not always clean them up. To work around this, collect the updates in the application that calls the index API (such as a search service written in Java or Python) so that you can 1) drop unnecessary updates (such as repeated updates to a counter field) and 2) send only a few updates to OpenSearch.

  • Design Index Mapping Carefully

    Design your index mapping carefully. Do not index fields that are never used for search: setting the index mapping parameter (default true) to false for such fields reduces the size of the inverted index and saves the analysis cost for those fields.

  • Use Analyzers Carefully

    Use analyzers carefully on your fields; some analyzers (ngram, etc.) take significant resources and can slow down the indexing speed and significantly increase the index size of large text fields.

  • Use Wait_For Param

    If you need to search a document immediately after indexing it, pass refresh=wait_for on the indexing request instead of triggering an explicit refresh.

  • Use Bulk API

    Use the bulk API to index multiple documents per request instead of indexing them one at a time. Bulk API performance depends on the size of the request in bytes, not on the number of documents it contains.

  • Use SSD

    Use SSDs instead of magnetic disks for a faster segment merge process.
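
Several of the tips above (refresh interval, replicas, automatic IDs, mapping, and bulk sizing) can be combined when preparing a large initial load. The following is a minimal sketch in Python, not a definitive implementation: it only builds the JSON request bodies and leaves sending them to whichever client you use. The index name, the field names (title, raw_payload), the 30s refresh interval, and the 5 MB batch limit are all illustrative assumptions; measure against your own cluster before settling on values.

```python
import json
from typing import Iterable, Iterator

# Body for creating the index before a bulk load (all values are illustrative).
CREATE_INDEX_BODY = {
    "settings": {"index": {"refresh_interval": "30s", "number_of_replicas": 0}},
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            # Hypothetical field that is stored but never searched on.
            "raw_payload": {"type": "keyword", "index": False},
        }
    },
}

# Body for the index settings API once the load is done; choose the
# replica count and refresh interval that your requirements call for.
STEADY_STATE_SETTINGS = {"index": {"refresh_interval": "1s", "number_of_replicas": 1}}


def bulk_bodies(index: str, docs: Iterable[dict], max_bytes: int = 5_000_000) -> Iterator[str]:
    """Yield newline-delimited bulk bodies, batched by byte size rather than doc count.

    A single document larger than max_bytes is still sent, in a body of its own.
    """
    # No "_id" in the action line, so OpenSearch generates one automatically.
    action = json.dumps({"index": {"_index": index}})
    lines = []
    size = 0
    for doc in docs:
        entry = f"{action}\n{json.dumps(doc)}\n"
        entry_size = len(entry.encode("utf-8"))
        if lines and size + entry_size > max_bytes:
            yield "".join(lines)
            lines, size = [], 0
        lines.append(entry)
        size += entry_size
    if lines:
        yield "".join(lines)
```

Each string yielded by bulk_bodies is meant to be posted to the _bulk endpoint as one request, and independent bodies can be sent from several workers in parallel. Batching by bytes follows the tip that bulk performance depends on request size; the right limit depends on your documents and hardware.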

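The advice on avoiding frequent updates can be illustrated with a small application-side buffer. This is a sketch under stated assumptions rather than a prescribed approach: it merges partial updates per document ID with "last write wins" semantics, which suits fields that are overwritten; counters that are incremented would need to be summed instead. The function names and the articles index in the test data are hypothetical.

```python
import json
from typing import Dict, Iterable, Tuple


def coalesce_updates(updates: Iterable[Tuple[str, dict]]) -> Dict[str, dict]:
    """Merge successive partial updates per document ID; later values win."""
    merged: Dict[str, dict] = {}
    for doc_id, partial in updates:
        merged.setdefault(doc_id, {}).update(partial)
    return merged


def bulk_update_body(index: str, merged: Dict[str, dict]) -> str:
    """Build one bulk request body holding a single update action per document."""
    lines = []
    for doc_id, partial in merged.items():
        lines.append(json.dumps({"update": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps({"doc": partial}))
    # The bulk format requires every line, including the last, to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""
```

With this in place, three buffered updates to the same document result in one update action being sent instead of three. If the updated document must be searchable as soon as the request returns, the request can carry the refresh=wait_for parameter rather than being followed by an explicit refresh.
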
To easily improve your indexing and search performance, we recommend you try AutoOps for OpenSearch. AutoOps detects issues and improves OpenSearch performance by analyzing shard sizes, threadpools, memory, snapshots, disk watermarks, and more. Try it for free.
