How To Solve Issues Related to Log – Failed to reconnect to node

Updated: Feb-20

Troubleshooting background

To troubleshoot the Elasticsearch log “Failed to reconnect to node”, it helps to understand the Elasticsearch concepts involved: cluster and node. The explanations below cover common problems, examples and useful tips.

Nodes in Elasticsearch

What it is

Simply explained, a node is a single server that is part of a cluster. Each node is assigned one or more roles, which describe the node’s responsibilities and operations. Data nodes store the data and participate in the cluster’s indexing and search capabilities, while master nodes are responsible for managing cluster activities and storing the cluster state, including the metadata.

While it’s possible to run several node instances of Elasticsearch on the same hardware, it’s considered a best practice to limit a server to a single running instance of Elasticsearch.

Nodes connect to each other and form a cluster by using a discovery method. 
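Because nodes must be able to open connections to each other, a basic network check is often a reasonable first step when this warning appears. The following is a minimal sketch, not part of Elasticsearch: the class name TransportPortProbe and the method isReachable are hypothetical, and the sketch assumes the node uses the default transport port 9300. Adjust the host and port to your own configuration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper for illustration only; it is not an Elasticsearch API.
class TransportPortProbe {

    // Returns true if a TCP connection to host:port succeeds within timeoutMillis.
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts and unresolvable host names.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: 9300 is the transport port; pass another host/port as arguments if needed.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9300;
        System.out.println(host + ":" + port + " reachable: " + isReachable(host, port, 2000));
    }
}
```

A successful TCP connection only shows that the port is open from the machine running the check. It does not prove that the node is healthy or that it has joined the cluster.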

Roles
Master node

Master nodes are in charge of cluster-wide settings and changes, such as deleting or creating indices and fields, adding or removing nodes, and allocating shards to nodes. Each cluster has a single master node, which is elected from the master-eligible nodes using a distributed consensus algorithm. A new master is elected if the current master node fails.

Coordinator Node (aka client node)

A coordinator node is a node that does not hold any configured role: it doesn’t hold data, is not part of the master-eligible group, and does not execute ingest pipelines. It serves incoming search requests and acts as the query coordinator, running the query and fetch phases and sending requests to every node that holds a shard being queried. It also distributes bulk indexing operations and routes queries to shard copies based on the nodes’ responsiveness.


To help troubleshoot related issues, we have gathered selected Q&A from the community and issues from GitHub. Please review the following for further information:

1 Node1 Failed To Reconnect To Node N  

2 Message Warn Cluster Service Node1  


Log Context

The log “failed to reconnect to node {}” is emitted by the class InternalClusterService (InternalClusterService.java).
We have extracted the following from the Elasticsearch source code for in-depth context:

                                }
                                // log every 6th failure
                                if ((nodeFailureCount % 6) == 0) {
                                    // reset the failure count...
                                    nodeFailureCount = 0;
                                    logger.warn("failed to reconnect to node {}", e, node);
                                }
                                failureCount.put(node, nodeFailureCount);
                            }
                        }
                    }
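The excerpt above throttles the warning by counting failures per node and logging only every sixth one. To make that pattern easier to follow in isolation, here is a minimal, self-contained sketch. It is not the Elasticsearch implementation: the class ReconnectFailureTracker, the method recordFailure and the use of a plain String as the node key are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the "log every Nth failure" pattern; not Elasticsearch code.
class ReconnectFailureTracker {
    private final int logEvery;
    private final Map<String, Integer> failureCount = new HashMap<>();

    ReconnectFailureTracker(int logEvery) {
        if (logEvery <= 0) {
            throw new IllegalArgumentException("logEvery must be positive");
        }
        this.logEvery = logEvery;
    }

    // Records one failed reconnect attempt; returns true when this failure should be logged.
    boolean recordFailure(String node) {
        int count = failureCount.getOrDefault(node, 0) + 1;
        boolean shouldLog = (count % logEvery) == 0;
        if (shouldLog) {
            count = 0; // reset so the next warning comes after another logEvery failures
        }
        failureCount.put(node, count);
        return shouldLog;
    }

    public static void main(String[] args) {
        ReconnectFailureTracker tracker = new ReconnectFailureTracker(6);
        for (int attempt = 1; attempt <= 12; attempt++) {
            if (tracker.recordFailure("node1")) {
                // Prints for attempts 6 and 12 only.
                System.out.println("failed to reconnect to node node1 (attempt " + attempt + ")");
            }
        }
    }
}
```

Read this way, each occurrence of the warning in the log stands for six failed reconnect attempts to that node rather than a single one.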






About Opster

Opster’s solution incorporates deep knowledge and a broad history of Elasticsearch issues. It identifies and predicts root causes of Elasticsearch problems, provides recommendations, and can automatically perform various actions to manage, troubleshoot and prevent issues.
