Before you begin reading this guide, we recommend you run the free OpenSearch Error Check-Up which analyzes 2 JSON files to detect many configuration errors.
Node disconnection is another issue that can be prevented and resolved automatically using AutoOps for OpenSearch. AutoOps will also help you optimize other important settings and processes in OpenSearch to improve performance and ensure high availability for your crucial data. Try it for free.
There are a number of possible reasons for a node to become disconnected from a cluster. Bear in mind that node disconnection is often a symptom of an underlying problem, which must be investigated and resolved.
How to diagnose
The best way to understand what is going on in your cluster is to:
- Look at monitoring data
- Look at OpenSearch logs
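As a minimal sketch of acting on that monitoring data, the snippet below interprets a response from the `_cluster/health` API. The field names (`status`, `number_of_nodes`, `unassigned_shards`) come from the cluster health API; the sample payload and the `expected_nodes` threshold are illustrative assumptions.

```python
# Sketch: flag warning signs in a _cluster/health response.
# Field names match the cluster health API; sample data is illustrative.
def summarize_health(health, expected_nodes):
    """Return "OK" or a summary of warning signs in a health response."""
    notes = []
    if health.get("status") != "green":
        notes.append("status is %s" % health.get("status"))
    nodes = health.get("number_of_nodes", 0)
    if nodes < expected_nodes:
        notes.append("only %d of %d expected nodes present" % (nodes, expected_nodes))
    if health.get("unassigned_shards", 0) > 0:
        notes.append("%d unassigned shards" % health["unassigned_shards"])
    return "OK" if not notes else "; ".join(notes)

sample = {"status": "yellow", "number_of_nodes": 2, "unassigned_shards": 5}
print(summarize_health(sample, expected_nodes=3))
# prints: status is yellow; only 2 of 3 expected nodes present; 5 unassigned shards
```

A check like this can run on a schedule against the monitoring cluster, so a missing node is noticed before queries start failing.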
Excessive garbage collection from JVM
If you can see that the JVM heap is not following a regular sawtooth pattern but is instead showing an irregular curve upwards, or if you see many log lines like this:
[2020-04-10T13:35:51,628][WARN ][o.o.m.j.JvmGcMonitorService] [ES2-QUERY] [gc] overhead, spent [615ms] collecting in the last [1s]
Then you almost certainly have a JVM garbage collection issue. This in turn is likely to be caused by configuration issues or the type and intensity of queries or indexing on the cluster, as explained on the following page: Heap Size Usage in OpenSearch – A Detailed Guide with Pictures
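To make the log line above concrete, this sketch extracts the collection time and window from a `JvmGcMonitorService` "[gc] overhead" message and computes the fraction of time spent in GC; the regex and the sample line follow the format shown above, but any alerting threshold you apply is your own choice.

```python
import re

# Sketch: parse the "spent [Xms] collecting in the last [Ys]" part of a
# JvmGcMonitorService overhead log line and return the GC time fraction.
GC_LINE = re.compile(r"spent \[(\d+)(ms|s)\] collecting in the last \[(\d+)(ms|s)\]")

def _to_ms(value, unit):
    return int(value) * (1000 if unit == "s" else 1)

def gc_overhead(line):
    """Fraction of the window spent collecting, or None if not a GC line."""
    m = GC_LINE.search(line)
    if not m:
        return None
    spent = _to_ms(m.group(1), m.group(2))
    window = _to_ms(m.group(3), m.group(4))
    return spent / window

line = ("[2020-04-10T13:35:51,628][WARN ][o.o.m.j.JvmGcMonitorService] "
        "[ES2-QUERY] [gc] overhead, spent [615ms] collecting in the last [1s]")
print(gc_overhead(line))
# prints: 0.615
```

Spending over 60% of a one-second window in garbage collection, as in this example, is a strong sign the heap is undersized or under pressure.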
Configuration issues
Configuration issues typically appear as soon as a node is started or restarted, or when nodes are added to or removed from the cluster. However, some configuration issues only come to the surface when the cluster is under stress (see excessive garbage collection above) or loses one or more nodes. Learn more here: Cluster Manager Not Discovered in OpenSearch – An In Depth Explanation
Other common causes of node disconnection include:
- Intentional node restart/reboot
- Intentional increase or reduction in the number of nodes
- Hardware / networking issues
How to prevent node disconnection
It is highly recommended to monitor your cluster on an independent OpenSearch cluster, so that the monitoring data is available when you need it. The last thing you want is not to be able to see your monitoring data because your OpenSearch cluster has gone down.
Look out for warnings and errors in your OpenSearch logs which may indicate issues that could bring a node and possibly your entire cluster down. Proactively acting upon these issues can result in them being solved before they cause more serious problems.
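As a sketch of that kind of proactive log review, the snippet below tallies WARN and ERROR lines so recurring problems stand out. The log format follows the log4j pattern OpenSearch uses (note the padded level field, e.g. `[WARN ]`), but the sample lines themselves are illustrative.

```python
from collections import Counter

# Sketch: count WARN/ERROR lines in an OpenSearch log.
# The "[WARN" prefix check tolerates log4j's padded level field ("[WARN ]").
def count_levels(lines):
    counts = Counter()
    for line in lines:
        for level in ("WARN", "ERROR"):
            if "[%s" % level in line:
                counts[level] += 1
    return counts

sample_log = [  # illustrative lines in the OpenSearch log4j pattern
    "[2020-04-10T13:35:51,628][WARN ][o.o.m.j.JvmGcMonitorService] [node-1] [gc] overhead ...",
    "[2020-04-10T13:36:02,101][INFO ][o.o.c.s.ClusterApplierService] [node-1] added node ...",
    "[2020-04-10T13:36:15,412][ERROR][o.o.t.TransportService] [node-1] connection error ...",
]
print(dict(count_levels(sample_log)))
# prints: {'WARN': 1, 'ERROR': 1}
```

Feeding an hour of logs through a tally like this makes it obvious when GC warnings or transport errors start to spike on a particular node.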
Optimize your search queries based on this guide: 10 Important Tips to Improve Search in OpenSearch.
Optimize your indexing performance based on this guide: Improve OpenSearch Indexing Speed with These Tips.
To easily detect and resolve the node disconnection issue in your deployment, try AutoOps for OpenSearch. AutoOps detects and resolves issues in OpenSearch, cuts down administration time and reduces hardware costs.