Nodes with voting rights are the master-eligible nodes, i.e. any node configured with:

node.master: true

A master-eligible node can additionally be configured with:

node.voting_only: true

in which case it participates in master elections but can never become the elected master.
It does not matter whether the node is a dedicated master node or not.
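To see which nodes currently hold voting rights, you can inspect the cluster state. The following is a sketch assuming a cluster reachable at localhost:9200; adjust the host for your setup:

```shell
# List the node IDs in the last committed voting configuration
curl -s "localhost:9200/_cluster/state?filter_path=metadata.cluster_coordination.last_committed_config&pretty"

# Cross-reference those IDs with node names and roles
curl -s "localhost:9200/_cat/nodes?v&h=id,name,node.role,master"
```

The `master` column of `_cat/nodes` marks the currently elected master with an asterisk.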
Quorum can be lost for one or more of the following reasons:
- Bad configuration (insufficient nodes configured with voting rights)
- Nodes are deliberately removed from the cluster
- Networking issues causing nodes to disconnect from the cluster
- Performance issues causing nodes to crash
You should not remove more than one node with voting rights at a time. The quorum of the cluster is maintained in the cluster state, and the cluster takes some time (by default 30 seconds) to adjust the quorum to the number of available nodes with voting rights.
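The arithmetic behind this rule can be sketched in a few lines (illustrative only; Elasticsearch manages the voting configuration internally):

```python
def quorum(voting_nodes: int) -> int:
    """Majority of the voting configuration: floor(n / 2) + 1."""
    return voting_nodes // 2 + 1

# With 3 voting nodes, 2 must agree to elect a master.
assert quorum(3) == 2

# Losing one of 3 voting nodes is safe: the 2 remaining live
# nodes still satisfy the quorum of 2.
assert 2 >= quorum(3)

# Losing 2 of 3 voting nodes at once leaves 1 live node, below
# the quorum of 2: no master can be elected until a node returns.
assert 1 < quorum(3)
```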
This article is based on Elasticsearch version 7, which completely changed the way master nodes and voting rights are configured. In older versions this situation was known as the split-brain problem – read more about older versions here: Elasticsearch split brain explained.
How to resolve this issue
Make sure that at least 3 nodes with voting rights are alive in your cluster. If you don't have them, you will need to add new master-eligible nodes.
If you believe you have 3 or more nodes with voting rights, check the logs to see whether nodes are unexpectedly leaving the cluster, which could indicate networking or other performance issues.
How to avoid this issue
For large clusters it is advisable to configure 3 dedicated master nodes and to configure all remaining nodes as:

node.master: false
node.voting_only: false
The dedicated master nodes are not involved in high-load activities, so there is less risk of harming their stability. This also avoids the risk that adding or removing data nodes (for example, when scaling the cluster up and down) temporarily changes the quorum required to elect a new master node.
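A dedicated master node in version 7 can be configured along these lines in elasticsearch.yml (a sketch; note that node.data and node.ingest default to true and must be disabled explicitly):

```yaml
# Dedicated master: eligible to be elected, holds no data,
# runs no ingest pipelines
node.master: true
node.data: false
node.ingest: false
```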
When carrying out this change, do not change the configuration on all nodes at once. Allow at least one minute between each node restart to give the cluster time to adjust the quorum to the reduced number of master-eligible nodes.
Alternatively, you can use the voting configuration exclusions API to manually remove a node from the voting configuration.
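On recent 7.x versions this looks as follows (a sketch assuming the cluster is reachable at localhost:9200 and the node to remove is named node-3; adjust both for your setup):

```shell
# Exclude node-3 from the voting configuration before taking it down
curl -s -X POST "localhost:9200/_cluster/voting_config_exclusions?node_names=node-3"

# Once the node has been removed from the cluster, clear the exclusions list
curl -s -X DELETE "localhost:9200/_cluster/voting_config_exclusions?wait_for_removal=false"
```

Clearing the exclusions list afterwards matters: excluded nodes stay excluded even if they rejoin the cluster.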