How to configure all Elasticsearch node roles (master, data, coordinating, etc.)

Opster Expert Team

Nov 2, 2022 | 5 min read



This guide will cover how to configure node roles in Elasticsearch. If you’d like to perform this automatically, you can use Opster’s Management Console to configure node roles with the press of a button. The instructions in this guide refer to manual processes in Elasticsearch.

Background – what are nodes?

Every Elasticsearch instance we run is called a node, and multiple nodes comprise a cluster. Each node in a cluster is aware of all other nodes and forwards the requests accordingly. Clusters can consist of only a single node, though this isn’t recommended for production. In this article, we will review the different types of node roles and how to configure these roles in Elasticsearch to enable efficient full text search. 

What are node roles?

A node's roles define its purpose and responsibilities. We can configure multiple roles for each node based on the cluster configuration. If we don’t explicitly specify a node’s roles, Elasticsearch automatically assigns all roles to that node. This default is the same across the different versions of Elasticsearch.

The cluster details of such nodes will appear as:

[Figure: Node role cluster details]

Types of node roles

  1. Master
  2. Data (data_cold, data_hot, data_frozen, data_warm, data_content)
  3. Coordinating
  4. Ingest
  5. Machine learning
  6. Remote eligible
  7. Transform
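
As noted above, a single node can hold several of these roles at once. The line below is a minimal sketch, assuming a small cluster in which every node is master-eligible, stores data and runs ingest pipelines. The particular combination is an illustrative assumption, not a recommendation from this guide:

```yaml
# elasticsearch.yml (illustrative combination of roles for a small cluster)
# This node is master-eligible, holds data and can run ingest pipelines.
node.roles: ["master", "data", "ingest"]
```

Any role that is left out of the list is disabled on that node, so the list should name every role the node is expected to perform.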

Master node

A node to which we assign the master role is master-eligible, and the node that wins the election is called the “master” node. The master node manages cluster-wide operations such as creating or deleting an index, and it keeps track of all available nodes in the cluster. When shards are created, the master node decides which node each shard is allocated to. A dedicated master node (one that has only the master role) should not be used to handle user requests.

When will the master election happen? The election process happens during startup or when the current master node goes down. Any master-eligible node except the “Voting-only” node can become a master node during the master election process. 

To set this node role, edit the node’s “elasticsearch.yml” and add the following line:

node.roles: ["master"]
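
The “Voting-only” node mentioned above is configured by combining the master role with the voting_only role. A minimal sketch, assuming you want a node that takes part in master elections but is never elected as master itself:

```yaml
# elasticsearch.yml (voting-only master-eligible node)
# The node votes in master elections but cannot become the elected master.
# voting_only is only valid together with the master role.
node.roles: ["master", "voting_only"]
```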

Data node

The node to which we assign a data role is called a “data” node. A data node holds the indexed data and it takes care of CRUD, search and aggregations (operations related to the data).

Without at least one data node, a cluster has nowhere to store its indices. Because the operations carried out by data nodes are I/O-, memory- and CPU-intensive, it is important to monitor these nodes and to provision enough of them.

There are specialized data roles like data_content, data_hot, data_cold, data_warm and data_frozen which can be used in a multi-tier deployment architecture. In general it is NOT necessary to configure all of the specific roles, and you can just use the data role. If you want to configure hot-cold architecture, please see the hot-cold architecture guide linked near the end of this article.

To set this node role, edit the node’s “elasticsearch.yml” and add the following line:

node.roles: ["data"]

Data content node

Data content nodes are part of the content tier. These nodes are mainly used to store archive and catalog data, which, unlike logs, is not indexed in real time or very frequently.

Even though this data is not indexed frequently, searches on it still need to return results quickly, so these nodes are optimized for search performance. They prioritize query processing over I/O throughput, so complex searches and aggregations are processed quickly.

To set this node role, edit the node’s “elasticsearch.yml” and add the following line:

node.roles: ["data_content"]

Data hot node

Data hot nodes are part of the hot tier. This role is not necessary unless you want to configure hot-cold architecture.

Hot tier nodes are mainly used to store the most frequently updated and recent data. These types of data nodes should be fast during both search and indexing. Therefore, they require more RAM, CPU and fast storage.

To set this node role, edit the node’s “elasticsearch.yml” and add the following line:

node.roles: ["data_hot"]
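
A node can belong to more than one tier. In a tiered deployment, the content tier is often placed on the same nodes as the hot tier, because indices that are not part of a data stream (including system indices) are allocated to the content tier. A minimal sketch, assuming a hot node that should also serve as a content node:

```yaml
# elasticsearch.yml (assumed setup: hot tier and content tier on the same node)
# The node receives new time series data (hot) and also holds
# regular, non-data-stream indices (content).
node.roles: ["data_hot", "data_content"]
```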

Data warm node

Data warm nodes are part of the warm tier. This role is not necessary unless you want to configure hot-cold architecture.

Warm tier nodes are used for storing time series data that are less frequently queried and rarely updated. Warm nodes will typically have larger storage capacity in relation to their RAM and CPU.

To set this node role, edit the node’s “elasticsearch.yml” and add the following line:

node.roles: ["data_warm"]

Data cold node

Data cold nodes are part of the cold tier. This role is not necessary unless you want to configure hot-cold architecture.

Time series data that no longer needs to be searched regularly will be moved from the warm tier to the cold tier.

Since search performance is not a priority, these nodes are usually configured to have higher storage capacity for a given RAM and CPU.

To set this node role, edit the node’s “elasticsearch.yml” and add the following line:

node.roles: ["data_cold"]

Data frozen node

Data frozen nodes are part of the frozen tier. This role is not necessary unless you want to configure hot-cold architecture.

Data that is queried rarely and never updated will be moved from the cold tier to the frozen tier.

This type of node may reduce storage and operating costs, while still allowing the user to search on frozen data.

To set this node role, edit the node’s “elasticsearch.yml” and add the following line:

node.roles: ["data_frozen"]

Coordinating node

Coordinating-only nodes act as load-balancers. This type of node routes requests to data nodes and handles bulk indexing by distributing the requests.

These types of nodes are used in larger clusters. Like every other node, a coordinating-only node holds a copy of the cluster state published by the elected master, and it uses that state to route each request to the appropriate nodes.

In small clusters, it is usually not necessary to use a coordinating node, since the same role will be handled by data nodes, and the greater complexity is not justified on a small cluster.

To make a node a “coordinating-only” node, add the following configuration to the “elasticsearch.yml” file:

node.roles: []

Ingest node

If documents need pre-processing by ingest pipelines before they are indexed, dedicated ingest nodes can be configured to handle it.

Dedicating separate nodes to pre-processing relieves the other nodes of this work and of the resources it requires.

To make a node an “ingest” node, add the following configuration to the “elasticsearch.yml” file:

node.roles: ["ingest"]

Machine learning node

Machine learning nodes are used to handle machine learning API requests. The machine learning flag (xpack.ml.enabled) is enabled by default, and machine learning requires a CPU that supports SSE4.2 instructions.

To configure a machine learning node, add the following configuration to the “elasticsearch.yml” file:

node.roles: ["ml"]

If you use the remote_cluster_client functionality for machine learning (see below), you should also configure this role on the ML nodes:

node.roles: ["ml", "remote_cluster_client"]

It is also recommended not to use a dedicated master or coordinating node as a machine learning node.

Remote eligible node

Remote clusters are clusters that are located in different data centers or different regions, where indices are replicated with cross-cluster replication and searched using cross-cluster search. A node with the remote_cluster_client role can connect to such remote clusters as a client.

To configure a remote eligible node, add the following configuration to “elasticsearch.yml”:

node.roles: ["remote_cluster_client"]

Transform node

Transform APIs are mainly used to convert existing indices and provide insights and analytics on the summarized data. Transform nodes handle these transform API requests.

To run transforms, it is mandatory to have at least one transform node in the cluster. Similar to the ML node, it is recommended to configure it as both remote_cluster_client and a transform node in the event that you use remote cluster functionality.

To configure a transform node, add the following configuration to “elasticsearch.yml”:

node.roles: ["transform", "remote_cluster_client"]

For a full discussion on hot-cold architecture in Elasticsearch, please see https://opster.com/guides/elasticsearch/capacity-planning/elasticsearch-hot-warm-cold-frozen-architecture/.
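
To tie the individual roles together, the sketch below shows how the node.roles line might differ between the nodes of a larger cluster. The layout (dedicated master nodes, generic data nodes, a coordinating-only node and an ingest node) is a hypothetical example rather than a sizing recommendation. Each YAML document, separated by "---", stands for the “elasticsearch.yml” of a different node:

```yaml
# Dedicated master-eligible node (assumed: repeated on three nodes)
node.roles: ["master"]
---
# Generic data node (no tiering, so the plain data role is enough)
node.roles: ["data"]
---
# Coordinating-only node: an empty list disables every role
node.roles: []
---
# Dedicated ingest node for pipeline pre-processing
node.roles: ["ingest"]
```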

Conclusion

To build a high-performance and fault-tolerant search application, it is vital to configure the Elasticsearch cluster and its nodes according to your requirements. We hope this article has given you a clear idea of what nodes are and what each node role does, so that you can configure your cluster accordingly and build an effective search application.

