Last updated on: November 2022
OpenSearch is an open-source, full-text search engine that stores, searches, and analyzes massive volumes of data in near real time. It typically runs behind the scenes as part of the backend infrastructure, providing the underlying technology that powers applications.
The OpenSearch maintainers have made a tremendous effort in designing OpenSearch so that it can be set up quickly and reliably, without having to invest much thought in its initial configuration. When a new cluster is first created, the scale is usually small, and everything runs smoothly out of the box.
However, unforeseen complications begin to arise once the OpenSearch cluster begins to scale. As the cluster is loaded with more and more data, and indexing and searches are run more frequently, companies begin to experience severe problems such as outages, degraded performance, data loss, and security breaches. Too often, by the time a company realizes that OpenSearch requires additional resources, time, and/or expertise, it has already become a central component of their operations.
At Opster, we’ve seen many potentially disastrous mistakes made when working with OpenSearch. In this blog post, we present five major concerns that should be addressed before your OpenSearch deployment, whether already in production or not, can be considered truly production-ready.
Neglecting to Look Inside
It’s enticing to deploy OpenSearch and then forget about its inner workings. But OpenSearch can suddenly slow down, nodes can get disconnected, and systems can even crash unexpectedly. Without proper monitoring and observability, you won’t know why this happened, how it can be fixed, or how to avoid the problem in the future.

Monitoring and observability are critical, not just for when things break down, but also for the relentless optimization required of enterprises that wish to maintain their competitive edge. While monitoring reveals whether or not a system is operating as expected, it can’t improve current performance, and it doesn’t explain why something isn’t working the way it should. This is where observability comes in.
Observability gives an end-to-end view of processes, detecting undesirable behavior (such as downtime, errors, and slow response time) and identifying the root causes of problems.
Observability is achieved using logs, metrics, and traces—three powerful tools that are often referred to as the three pillars of observability.
When complex distributed systems start to malfunction, good visibility is crucial for pinpointing the root of the problem and significantly reducing time to resolution. The OpenSearch community provides free open-source monitoring tools that can help enhance visibility, such as Cerebro.
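Even before adopting a dedicated tool, OpenSearch itself exposes the raw metrics that monitoring dashboards are built on. A minimal sketch, assuming a cluster reachable at localhost:9200 without authentication:

```shell
# Overall cluster status (green/yellow/red), node count, and shard counts
curl -s "localhost:9200/_cluster/health?pretty"

# Per-node JVM heap, garbage collection, and thread pool metrics --
# the numbers most monitoring dashboards are derived from
curl -s "localhost:9200/_nodes/stats/jvm,thread_pool?pretty"

# Search tasks currently executing on the cluster, useful for
# spotting long-running queries
curl -s "localhost:9200/_tasks?detailed=true&actions=*search*&pretty"
```

Polling these endpoints on a schedule and alerting on heap pressure or a non-green status is the simplest possible observability baseline.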
Misconfigured Circuit Breakers
In OpenSearch, circuit breakers are used to limit memory usage so that operations do not cause an OutOfMemoryError. Sometimes, a modest adjustment to your circuit breakers can make the difference between high-performing clusters and detrimental downtime. OpenSearch queries, whether initiated directly by users or by applications, can become extremely resource-intensive. While the default circuit breaker settings may be adequate in some cases, often adjusting breaker limits is absolutely necessary to ensure that queries do not impede performance or cause outages due to running out of memory (OOM).
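As an illustration, breaker usage can be inspected and limits adjusted through the cluster settings API. A sketch, assuming an unauthenticated cluster at localhost:9200 (the 40% value is an example, not a recommendation for every workload):

```shell
# Inspect current breaker usage per node (estimated vs. limit, trip counts)
curl -s "localhost:9200/_nodes/stats/breaker?pretty"

# Tighten the request breaker so a single aggregation cannot consume
# more than 40% of the heap (the default limit is 60%)
curl -s -X PUT "localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "indices.breaker.request.limit": "40%"
  }
}'
```

Watching the `tripped` counters in the breaker stats over time shows whether a limit is doing its job or is set too aggressively.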
Poorly Configured Security Settings
It’s dangerously easy to misconfigure OpenSearch security settings. If you are not proactive about security, your OpenSearch data can be exposed or leaked. Common security oversights include exposing the OpenSearch REST API to the public internet, not changing default passwords, and neglecting to encrypt data in transit or at rest. These oversights can leave OpenSearch servers vulnerable to malware or ransomware and subject data to theft or corruption.
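Two of these oversights are quick to address with the security plugin’s REST API. A sketch, assuming the security plugin is enabled and the default admin credentials are still in place (the user name and password below are hypothetical placeholders):

```shell
# Confirm the cluster only answers over TLS with authentication --
# an unauthenticated plain-HTTP response here is a red flag
curl -s -ku admin:admin "https://localhost:9200"

# Create a dedicated user with a strong password instead of relying
# on the built-in admin account for day-to-day access
curl -s -ku admin:admin -X PUT \
  "https://localhost:9200/_plugins/_security/api/internalusers/ops_user" \
  -H 'Content-Type: application/json' -d'
{
  "password": "a-long-random-passphrase",
  "backend_roles": ["admin"]
}'
```

Binding `network.host` in opensearch.yml to a private address, rather than relying on a firewall alone, closes off the public-internet exposure described above.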
Even if your OpenSearch is configured properly with optimal security settings, unprotected OpenSearch Dashboards instances can still compromise your data. OpenSearch Dashboards is an open-source project that performs data analytics and visualization of OpenSearch data. The platform performs advanced analytics on data that it pulls from OpenSearch databases, which it presents graphically through charts, tables, and maps. The problem is that OpenSearch Dashboards has no comprehensive built-in security of its own unless the security plugin is installed and properly configured.
Disks and Data Loss
Developer forums are filled with confusion about lost data nodes and unassigned shards in OpenSearch, which calls attention to the necessity of handling disks mindfully to avoid losing data. If you’re not careful when selecting disks for your data nodes, you might find that shards are unassigned and that data is lost after a restart. Ensure that data and master-eligible nodes are using persistent storage.
In the case of ephemeral disks, however, this is not enough. It is common to select ephemeral disks for their high performance and cost-efficiency, but without the proper precautions, this choice can lead to data loss. When using ephemeral disks, you must keep more than one copy of each shard and have a reliable procedure in place to restore data in case all copies are gone.
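Keeping extra shard copies is a one-line index setting. A sketch, assuming an unauthenticated cluster at localhost:9200 and a hypothetical index named my-index:

```shell
# Keep three copies of each shard (1 primary + 2 replicas) so that
# losing a single ephemeral disk cannot take out all copies of the data
curl -s -X PUT "localhost:9200/my-index/_settings" \
  -H 'Content-Type: application/json' -d'
{
  "index": { "number_of_replicas": 2 }
}'
```

One replica is the minimum for surviving a single disk failure; how many more you add depends on how many simultaneous node losses you need to tolerate.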
Neglecting Backup and Restore
Although everyone agrees that backup and restoration are important, many companies do not have sufficient backup and restore strategies in place for their OpenSearch clusters.
There’s a lot to take into account when protecting data in OpenSearch. For starters, you should make sure that all your important information is backed up. This may seem obvious, but, because indices are added constantly, you may not have snapshots of all your vital indices, backup may not run as often as it should, and backup processes may fail silently—oversights that you may only discover after it’s too late. Keep in mind that running backup procedures is resource-intensive, so it should be done when the cluster is less loaded.
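The building blocks for backups are the snapshot APIs. A sketch, assuming an unauthenticated cluster, a shared-filesystem repository, and a hypothetical mount path (for the `fs` repository type, the path must also be listed under `path.repo` in opensearch.yml on every node):

```shell
# Register a snapshot repository backed by a shared filesystem
curl -s -X PUT "localhost:9200/_snapshot/my_backup" \
  -H 'Content-Type: application/json' -d'
{
  "type": "fs",
  "settings": { "location": "/mnt/opensearch-backups" }
}'

# Take a snapshot of all indices; snapshots are incremental, so
# subsequent runs only copy new segments
curl -s -X PUT \
  "localhost:9200/_snapshot/my_backup/snapshot-2022-11-01?wait_for_completion=true"

# Verify the snapshot completed and which indices it contains
curl -s "localhost:9200/_snapshot/my_backup/snapshot-2022-11-01?pretty"
```

Checking the snapshot’s reported `state` after each run is one way to catch the silent failures mentioned above.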
Even if your backup appears to be running perfectly, you should periodically execute restore procedures to make sure that the data is truly restorable. This can be very time-consuming, so it is advisable to predetermine the order of restoration, ensuring that the most vital data is taken care of first.
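A restore test does not have to overwrite live data. A sketch using the rename options of the restore API, with a hypothetical index and the repository and snapshot names from the example above:

```shell
# Restore a critical index under a new name so the test cannot
# clobber the live index
curl -s -X POST \
  "localhost:9200/_snapshot/my_backup/snapshot-2022-11-01/_restore" \
  -H 'Content-Type: application/json' -d'
{
  "indices": "critical-index",
  "rename_pattern": "(.+)",
  "rename_replacement": "restored-$1"
}'

# Spot-check that the restored copy has the expected document count
curl -s "localhost:9200/restored-critical-index/_count?pretty"
```

Running this for the most vital indices first, per the predetermined order, confirms both that the data is restorable and roughly how long a real restore would take.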
Sometimes it’s wiser not to use backup and restore at all. When OpenSearch mirrors another data source, i.e., it is not the single source of truth, it might be advisable to reconstruct the indices from scratch by reindexing data from that source of truth. This might take longer, depending on the nature of the data, but it can take the load off your OpenSearch backup processes, mitigating costs and reducing storage space.
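When the source of truth is itself an OpenSearch cluster, this rebuild can be done with the reindex-from-remote API. A sketch with hypothetical host and index names (the remote host must also be listed under `reindex.remote.whitelist` in opensearch.yml):

```shell
# Rebuild an index by pulling documents from the source-of-truth cluster
curl -s -X POST "localhost:9200/_reindex" \
  -H 'Content-Type: application/json' -d'
{
  "source": {
    "remote": { "host": "http://source-cluster:9200" },
    "index": "orders"
  },
  "dest": { "index": "orders" }
}'
```

If the source of truth is a relational database or message queue instead, the same idea applies: replay the data through your normal ingestion pipeline rather than restoring a snapshot.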
OpenSearch is a powerful and widely-used search engine that is at the core of many of today’s technological platforms. It may be easy to manage at first, but as your business scales, you will encounter serious problems if you have not taken some necessary precautions. To ensure that your OpenSearch is fully prepared for production, it’s imperative that you avoid the major pitfalls detailed above.
To detect and resolve OpenSearch errors, we recommend you try the AutoOps platform. AutoOps diagnoses issues in OpenSearch based on hundreds of metrics pulled by a lightweight agent. Once diagnosed, the system not only provides root cause analysis, but also resolves the issues. Try it for free.