- Which open-source/free tools should you use for OpenSearch monitoring?
- Monitoring OpenSearch with open-source tools
- So which tool should you choose?
- Why standard monitoring tools aren’t enough
Which open-source/free tools should you use for OpenSearch monitoring?
Last update on: Jan 2023
Observability is a critical aspect of operating any system, exposing its inner workings, and facilitating the detection and resolution of problems. Monitoring tools serve as the first and most basic layer in system observability. In OpenSearch, the search engine that powers so many of today’s applications, reliable monitoring is an absolute must and is the primary building block of a successful operation.
OpenSearch infrastructure can be quite complex, requiring the monitoring of many performance parameters that are often interlinked. These include memory, CPU, cluster health, node availability, indexing rates, and JVM metrics (e.g., heap usage, pool size, and garbage collection). There are multiple open-source monitoring tools available for OpenSearch, each with its advantages and limitations. While these tools can be extremely useful, as operations scale, it is common to encounter issues that aren’t easily resolved with the standard tools.
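To make one of these metrics concrete, here is a minimal Python sketch that computes per-node JVM heap usage from a dictionary shaped like the response of `GET _nodes/stats/jvm`. The sample data, the function names, and the 85% threshold are assumptions made for illustration; in practice you would fetch the JSON from your own cluster.

```python
from typing import Dict, List

# Illustrative sample shaped like a `GET _nodes/stats/jvm` response.
# Node IDs, names, and byte counts are made up for this sketch.
SAMPLE_NODE_STATS = {
    "nodes": {
        "a1": {
            "name": "node-1",
            "jvm": {"mem": {"heap_used_in_bytes": 600, "heap_max_in_bytes": 1000}},
        },
        "b2": {
            "name": "node-2",
            "jvm": {"mem": {"heap_used_in_bytes": 900, "heap_max_in_bytes": 1000}},
        },
    }
}


def heap_usage_percent(node_stats: Dict) -> Dict[str, float]:
    """Map each node name to its heap usage as a percentage of max heap."""
    usage = {}
    for node in node_stats.get("nodes", {}).values():
        mem = node["jvm"]["mem"]
        usage[node["name"]] = (
            100.0 * mem["heap_used_in_bytes"] / mem["heap_max_in_bytes"]
        )
    return usage


def nodes_over_threshold(node_stats: Dict, threshold_percent: float = 85.0) -> List[str]:
    """Return the names of nodes whose heap usage is at or above the threshold."""
    usage = heap_usage_percent(node_stats)
    return sorted(name for name, pct in usage.items() if pct >= threshold_percent)


if __name__ == "__main__":
    print(heap_usage_percent(SAMPLE_NODE_STATS))
    print(nodes_over_threshold(SAMPLE_NODE_STATS))
```

A single reading like this says little on its own, which is why the tools below focus on collecting and displaying such values over time.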
This blog post will explore four popular open-source tools for OpenSearch monitoring, their defining features, and their key differences. It will also explain where such standard monitoring tools fall short and how Opster can help you achieve optimal OpenSearch performance.
Monitoring OpenSearch with open-source tools
1. Cerebro
An open-source, MIT-licensed web admin tool, Cerebro enables OpenSearch users to monitor and manipulate indexes and nodes, while also providing an overall view of cluster health. It has over a million downloads on Docker and 5k stars on GitHub. Cerebro is similar to Kopf, an older monitoring tool that was installed as a plugin on earlier versions of Elasticsearch, the project from which OpenSearch was forked. When web applications could no longer run as plugins, Kopf was discontinued and replaced by Cerebro, a standalone application with similar capabilities and UI.
Built with Scala, AngularJS, Play Framework, and Bootstrap, Cerebro can be set up in just a few steps. It also offers built-in capabilities for tracking and overseeing operations in OpenSearch, including resyncing corrupted shards to another node, a dashboard showing the replication process in real time, configuring backups using snapshots, and activating a selected index with a single click.
The Cerebro community is relatively small, resulting in less frequent updates and fewer features. Its documentation is sparse and it doesn’t support data from logs. In addition, while it is an excellent tool for tracking real-time processes, Cerebro does not provide graphs with historic/time-based node statistics and, thus, doesn’t offer anomaly detection or troubleshooting capabilities.
2. Prometheus and Grafana
Prometheus is a powerful metric-collection system capable of scraping metrics from OpenSearch. Grafana is a tool that, when coupled with Prometheus, can be used to visualize OpenSearch data. Both Prometheus and Grafana have larger communities and more contributors than Cerebro and, therefore, provide more features and capabilities. Prometheus and Grafana have 46k stars and 53.1k stars on GitHub respectively, and both have over 10 million downloads on Docker.
Able to display data over long periods of time, Grafana features versatile visual capabilities, including flexible charts, heat maps, tables, and graphs. It also provides built-in dashboards that can display information taken from multiple data sources. There are a large number of ready-made dashboards created by the Grafana community, which can be imported and used in your environment. For example, Grafana’s OpenSearch time-based graphs can display meaningful statistics on nodes. These capabilities make Grafana a good solution for visualizing and analyzing metrics, enabling users to add conditional rules to dashboard panels that can trigger notifications.
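The idea of a conditional rule that triggers a notification can be sketched in a few lines of Python. This is not how Grafana implements alerting; it is a simplified illustration of a common pattern in which a value must stay above a threshold for several consecutive samples before the rule fires, so that short spikes are ignored. The function name, the sample values, and the parameters are hypothetical.

```python
from typing import Iterable, List


def firing_points(samples: Iterable[float], threshold: float, for_samples: int) -> List[int]:
    """Return the indexes at which the rule is firing.

    The rule fires once the value has been above `threshold` for at least
    `for_samples` consecutive samples, and stops as soon as it drops back.
    """
    firing = []
    streak = 0  # number of consecutive samples above the threshold so far
    for i, value in enumerate(samples):
        streak = streak + 1 if value > threshold else 0
        if streak >= for_samples:
            firing.append(i)
    return firing


if __name__ == "__main__":
    # Made-up heap usage percentages sampled at a fixed interval.
    heap_percent = [70, 88, 91, 60, 90, 92, 95, 97]
    print(firing_points(heap_percent, threshold=85, for_samples=3))
```

In this sample series, the two-sample spike near the start does not fire the rule, while the sustained climb at the end does.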
A major drawback of Grafana is that it doesn’t support full-text data querying. Moreover, it doesn’t support data from logs.
3. Opster Management Console (OMC)
Opster Management Console (OMC) provides the orchestration, monitoring and management capabilities that are offered by managed services, completely for free. By using the OMC, a single interface, users can: upgrade versions automatically, scale cluster resources, manage certificates & back-ups, monitor resources & costs, and more.
In addition, OMC routinely analyzes the connected system and provides alerts when there are signs of performance degradation. It offers recommendations on how to improve configuration & resolve issues, optimize templates, improve search performance & resource utilization, and reduce needed hardware.
OMC easily runs on any Kubernetes environment (on cloud and on-premise) and supports all versions of OpenSearch. Although the tool is relatively new, it has gained popularity among OpenSearch users due to its capabilities and ease of use. You can install the OMC from here.
So which tool should you choose?
Before you go straight for the OpenSearch monitoring tool with the greatest functionality, there are a few things to consider.
First, Cerebro is easy to set up and operate. Nevertheless, it has fewer features, its documentation is sparse, it doesn’t support data from logs, and it does not provide graphs with historic/time-based node statistics.
Second, as generic monitoring tools, Prometheus and Grafana enable you to monitor everything, but they aren’t tailored to OpenSearch specifically. This can be quite limiting. Although users can plot many different kinds of graphs in Grafana, they cannot display which nodes are connected to the cluster and which have been disconnected. In addition, Grafana does not support an index or shard view, making it impossible to see where shards are located or to track the progress of shard relocation.
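For comparison, the node and shard information described above is available from OpenSearch itself through the `_cat` APIs. The following Python sketch assumes you have already fetched `GET _cat/nodes?format=json` and `GET _cat/shards?format=json` and works on the parsed rows; the list of expected node names, the sample rows, and the function names are assumptions for illustration.

```python
from typing import Dict, Iterable, List, Tuple


def missing_nodes(expected_names: Iterable[str], cat_nodes_rows: List[Dict]) -> List[str]:
    """Return expected node names that do not appear in the `_cat/nodes` rows."""
    seen = {row["name"] for row in cat_nodes_rows}
    return sorted(set(expected_names) - seen)


def relocating_shards(cat_shards_rows: List[Dict]) -> List[Tuple[str, str, str]]:
    """Return (index, shard, prirep) for every shard currently relocating."""
    return [
        (row["index"], row["shard"], row["prirep"])
        for row in cat_shards_rows
        if row["state"] == "RELOCATING"
    ]


if __name__ == "__main__":
    # Made-up rows shaped like the JSON output of the `_cat` APIs,
    # where values are returned as strings.
    nodes = [{"name": "node-1"}, {"name": "node-3"}]
    shards = [
        {"index": "logs", "shard": "0", "prirep": "p", "state": "STARTED"},
        {"index": "logs", "shard": "1", "prirep": "r", "state": "RELOCATING"},
    ]
    print(missing_nodes(["node-1", "node-2", "node-3"], nodes))
    print(relocating_shards(shards))
```

The comparison against a list of expected nodes is a design choice of this sketch: the API reports only the nodes that are currently part of the cluster, so detecting a disconnected node requires knowing which nodes should be there.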
Finally, Opster Management Console (OMC) is relatively new but shows great promise and has gained popularity among OpenSearch users due to its advanced capabilities and ease of use.
Why standard monitoring tools aren’t enough
When it comes to OpenSearch, even with reliable monitoring tools in place, you may still encounter sudden, unexpected, and serious downtime episodes. Let’s take a closer look at why this is the case.
There are several reasons monitoring tools alone aren’t enough. For starters, it’s bad practice to install a monitoring tool and forget about it. Rather, you should keep up with the latest configuration guidelines and best practices and know how to implement them correctly.
Second, choosing which metrics to monitor and knowing how to analyze them is no small feat, as OpenSearch infrastructure can become quite complex. With so many metrics interacting with each other, even the smallest change can adversely impact performance. A monitoring tool may indicate you’ve run out of memory, for example, but this information alone isn’t enough to identify the underlying cause, let alone to resolve the issue and prevent recurrence.
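As a small illustration of why a single metric is rarely enough, the sketch below adds one piece of context to a memory alert: the share of wall-clock time each garbage collector consumed between two samples. It assumes two snapshots of the `jvm.gc.collectors` section of `GET _nodes/stats/jvm` for the same node, taken a known number of seconds apart with no node restart in between; the function name and the sample numbers are hypothetical. A high share points toward memory pressure but still does not identify the underlying cause.

```python
from typing import Dict


def gc_time_share(earlier: Dict, later: Dict, interval_seconds: float) -> Dict[str, float]:
    """Return, per collector, the fraction of the interval spent in GC.

    `earlier` and `later` are `jvm.gc.collectors` sections from two samples
    of the same node. The counters are cumulative, so the difference between
    samples is the time spent in GC during the interval.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    interval_millis = interval_seconds * 1000.0
    shares = {}
    for collector, stats in later.items():
        before = earlier.get(collector, {}).get("collection_time_in_millis", 0)
        delta = stats["collection_time_in_millis"] - before
        shares[collector] = delta / interval_millis
    return shares


if __name__ == "__main__":
    # Made-up cumulative counters from two samples taken 60 seconds apart.
    sample_1 = {
        "young": {"collection_count": 10, "collection_time_in_millis": 500},
        "old": {"collection_count": 2, "collection_time_in_millis": 1000},
    }
    sample_2 = {
        "young": {"collection_count": 16, "collection_time_in_millis": 800},
        "old": {"collection_count": 5, "collection_time_in_millis": 7000},
    }
    print(gc_time_share(sample_1, sample_2, interval_seconds=60))
```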
Also, while traditional commercial monitoring tools are useful for event correlation and providing alerts, they still lack the capabilities needed to truly get to the bottom of your OpenSearch issues. Despite claims of providing root-cause analysis, these solutions generally provide basic event correlation analysis while failing to identify the root cause, which is critical for forecasting and avoiding future issues.
OpenSearch performance is crucial, especially as operations scale or when applications that affect end users are on the line. Successfully operating OpenSearch requires much more than improved monitoring and alerts: Teams must have access to tools with advanced prediction and problem-solving capabilities in order to tackle the complicated issues that may arise.
Ensuring visibility is critical for successfully managing complex systems. While there are many tools available for monitoring OpenSearch, not all are created equal. Most standard tools offer only basic analysis and do not get to the heart of the problem. Given the complexity of OpenSearch, this is inadequate in production, especially for operations at scale or those affecting customer experience.
No matter which monitoring tool you decide to use and how you’re hosting your OpenSearch deployment, you can benefit from Opster’s complete solution for OpenSearch. With an Opster AutoOps subscription, your database administration will be taken care of from start to finish, including advanced monitoring with proactive incident prevention. You’ll benefit from complete resolution of issues in your infrastructure & data layers, end-to-end support, and constant optimization of your clusters. Try Opster AutoOps for free.