Opster Team
In addition to reading this guide, we recommend you run the Elasticsearch Template Optimizer to fix problems in your data modeling.
It will analyze your templates to detect issues and improve search performance, reduce indexing bottlenecks and optimize storage utilization. The Template Optimizer is free and requires no installation.
This guide will help you check for common problems that cause the log to appear. To understand the issues behind the log, start with the general overview of common issues and tips for the related Elasticsearch concepts below. Advanced users may want to skip straight to the common problems section in each concept or run the Template Optimizer.
Background
A mapping type used to be a separate collection of documents inside the same index. _doc is also a mapping type. For example, before ES version 6, a forum index could have two types: user and messages. Both types belonged to the same index, and you could search across them within that single index.
Since ES version 6 began phasing out mapping types, an index can contain only one mapping type, so you would have to keep either user or messages. In ES version 7, _doc became a permanent part of the request path.
For more information on the deprecation of mapping types, refer to this explanation.
How to reproduce this log in Elasticsearch version 7.x
Create index:
PUT /my-index
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      }
    }
  }
}
Index data:
POST /my-index/_doctype/1?pretty
{
  "title": "hello world"
}
The response will be:
{
  "error": {
    "root_cause": [
      {
        "type": "invalid_type_name_exception",
        "reason": "mapping type name [_doctype] can't start with '_' unless it is called [_doc]"
      }
    ],
    "type": "invalid_type_name_exception",
    "reason": "mapping type name [_doctype] can't start with '_' unless it is called [_doc]"
  },
  "status": 400
}
The log generated is:
[INFO ][o.e.a.b.TransportShardBulkAction] [my-index][0] mapping update rejected by primary org.elasticsearch.indices.InvalidTypeNameException: mapping type name [_doctype] can't start with '_' unless it is called [_doc]
What this error means
This INFO-level log message means that the primary shard rejected a mapping update because the request used a mapping type that is not allowed. Elasticsearch indices now support only a single document type, _doc.
You can index a new JSON document or update an existing document only with the _doc mapping type.
Quick troubleshooting steps
Considering the above example, you need to use _doc instead of _doctype, as shown below:
POST /my-index/_doc/1?pretty
{
  "title": "hello world"
}
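The rule quoted in the error message can be sketched in a few lines of Python. This is only an illustration of that one rule, not Elasticsearch's actual validation code; the helper names `is_valid_type_name` and `type_from_path` are hypothetical and exist only for this example.

```python
def is_valid_type_name(name: str) -> bool:
    """Apply the single rule quoted in the error message (illustrative only)."""
    # A type name may not start with '_' unless it is exactly '_doc'.
    return name == "_doc" or not name.startswith("_")


def type_from_path(path: str) -> str:
    """Return the second path segment of a path such as /my-index/_doc/1."""
    segments = path.split("?", 1)[0].strip("/").split("/")
    return segments[1]


for path in ("/my-index/_doctype/1?pretty", "/my-index/_doc/1?pretty"):
    name = type_from_path(path)
    verdict = "ok" if is_valid_type_name(name) else "rejected"
    print(f"{path} -> {verdict}")
```

Running a check like this on request paths before they are sent can catch a typo such as _doctype early, although the cluster remains the final authority on what it accepts.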
Overview
In Elasticsearch, when using the Bulk API it is possible to perform many write operations in a single API call, which increases the indexing speed. Using the Bulk API is more efficient than sending multiple separate requests. This can be done for the following four actions:
- Index
- Update
- Create
- Delete
Examples
The bulk request below will index a document, delete another document, and update an existing document.
POST _bulk
{ "index" : { "_index" : "myindex", "_id" : "1" } }
{ "field1" : "value" }
{ "delete" : { "_index" : "myindex", "_id" : "2" } }
{ "update" : { "_id" : "1", "_index" : "myindex" } }
{ "doc" : { "field2" : "value5" } }
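The bulk body is newline-delimited JSON: each action sits on its own line, followed by a source line where the action needs one, and the body ends with a newline. The following Python sketch builds the same body as the request above using only the standard library. The function name `build_bulk_body` is hypothetical, and sending the body to a cluster is left out.

```python
import json


def build_bulk_body(operations):
    """Serialize (action, source) pairs into newline-delimited JSON."""
    lines = []
    for action, source in operations:
        lines.append(json.dumps(action))
        if source is not None:  # delete actions carry no source line
            lines.append(json.dumps(source))
    # The Bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


operations = [
    ({"index": {"_index": "myindex", "_id": "1"}}, {"field1": "value"}),
    ({"delete": {"_index": "myindex", "_id": "2"}}, None),
    ({"update": {"_index": "myindex", "_id": "1"}}, {"doc": {"field2": "value5"}}),
]

body = build_bulk_body(operations)
print(body, end="")
```

Keeping every JSON object on a single line matters here, because pretty-printed JSON would break the line-based format.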
Notes
- Bulk API is useful when you need to index data streams that can be queued up and indexed in batches of hundreds or thousands, such as logs.
- There is no single correct number of actions for a bulk call. You will need to find the optimum batch size by experimentation, given the cluster size, number of nodes, hardware specs, etc.
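One way to experiment with batch sizes is to split a queued stream into fixed-size batches and vary the size between runs. The sketch below assumes a hypothetical helper named `batches` and a made-up list of 2,500 log lines; the batch size of 1,000 is only a starting point for experimentation, not a recommendation.

```python
from itertools import islice


def batches(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Hypothetical queued log lines waiting to be indexed.
log_lines = [f"log line {i}" for i in range(2500)]
sizes = [len(batch) for batch in batches(log_lines, 1000)]
print(sizes)  # [1000, 1000, 500]
```

Each batch would then become one bulk request, so changing the `size` argument is all that is needed to compare indexing throughput at different batch sizes.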
Overview
Mapping is similar to a database schema: it defines the properties of each field in the index. These properties include the data type of each field and how the field is tokenized and indexed. The mapping may also contain advanced properties for each field that control the options exposed by Lucene and Elasticsearch. You can create the mapping of an index using the _mapping REST endpoint. The first time Elasticsearch encounters a field whose mapping is not predefined in the index, it tries to guess the data type and analyzer of that field and applies default settings. For example, if you index an integer field without predefining the mapping, Elasticsearch maps that field as long.
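The guessing behaviour described above can be approximated in Python. This is a simplified sketch, not Elasticsearch's implementation: it ignores date and numeric detection in strings as well as dynamic templates, and the function name `guess_field_type` is hypothetical.

```python
def guess_field_type(value):
    """Roughly approximate the type dynamic mapping picks for a JSON value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "text"  # Elasticsearch also adds a keyword sub-field
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # Arrays take the type of their first non-null element.
        for item in value:
            if item is not None:
                return guess_field_type(item)
        return None
    return None  # null values add no field


document = {"name": "Ann", "age": 42, "score": 9.5, "active": True}
guessed = {field: guess_field_type(value) for field, value in document.items()}
print(guessed)
```

Note that the integer 42 is guessed as long rather than integer, which matches the example in the paragraph above.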
Examples
Create an index with predefined mapping:
PUT /my_index?pretty
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      },
      "age": {
        "type": "integer"
      }
    }
  }
}
Create mapping in an existing index:
PUT /my_index/_mapping?pretty
{
  "properties": {
    "email": {
      "type": "keyword"
    }
  }
}
View the mapping of an existing index:
GET my_index/_mapping?pretty
View the mapping of an existing field:
GET /my_index/_mapping/field/name?pretty
Notes
- It is not possible to change the data type of an existing field. If a field is mapped with the wrong type, re-creating the index with the updated mapping and re-indexing the data is the only option available.
- In version 7.0, Elasticsearch deprecated the document type and set the default document type to _doc. Document types were removed completely in version 8.0.
Common problems
- One of the most common problems in Elasticsearch is an incorrectly defined mapping, which limits the functionality of the field. For example, if a string field is mapped as text, you cannot use that field for aggregations, sorting, or exact-match filters by default. Similarly, if a string field is indexed dynamically without a predefined mapping, Elasticsearch creates two fields internally: a text field for full-text search and a keyword sub-field for exact matching. In most cases only one of them is needed, so the other is a waste of space.
- In versions before 6.0, Elasticsearch automatically creates an _all field inside the mapping and copies the values of every field of a document into it. This field is used to search text without specifying a field name. On those versions, make sure to disable the _all field in production environments to avoid wasting space. Note that support for the _all field was removed in version 7.0.
- In versions lower than 6.0, it was possible to create multiple document types inside an index, similar to creating multiple tables inside a database. In those versions, data type conflicts across document types were more likely when different types contained the same field name with different data types.
- The mapping of each index is part of the cluster state and is managed by the master nodes. If the mapping is too big, meaning there are thousands of fields in the index, the cluster state grows too large to handle efficiently. This problem is known as mapping explosion, and it slows down the cluster.
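To spot a mapping that is growing toward an explosion, it can help to count the fields it defines. The sketch below walks a mapping dictionary (for example, the JSON returned by the _mapping endpoint, loaded into Python) and counts fields, objects, and multi-fields. The function name `count_fields` and the sample mapping are hypothetical, and Elasticsearch's own accounting for the index.mapping.total_fields.limit setting may differ in detail, so treat the result as an approximation.

```python
def count_fields(mapping):
    """Count entries under 'properties' and 'fields', recursing into objects."""
    total = 0
    for definition in mapping.get("properties", {}).values():
        total += 1  # the field or object itself
        total += len(definition.get("fields", {}))  # multi-fields such as .keyword
        total += count_fields(definition)  # nested 'properties', if any
    return total


# Hypothetical mapping used only for this example.
mapping = {
    "properties": {
        "name": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
        "age": {"type": "integer"},
        "address": {"properties": {"city": {"type": "keyword"}}},
    }
}

print(count_fields(mapping))  # 5
```

Tracking this number over time for dynamically mapped indices gives an early warning before the field count reaches the thousands.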
What are shards? + A common issue - 2 min
Overview
Data in an Elasticsearch index can grow to massive proportions. In order to keep it manageable, it is split into a number of shards. Each Elasticsearch shard is an Apache Lucene index, with each individual Lucene index containing a subset of the documents in the Elasticsearch index. Splitting indices in this way keeps resource usage under control. An Apache Lucene index has a limit of 2,147,483,519 documents.
Examples
The number of shards is set when an index is created, and this number cannot be changed later without reindexing the data. When creating an index, you can set the number of shards and replicas as properties of the index using:
PUT /sensor
{
  "settings": {
    "index": {
      "number_of_shards": 6,
      "number_of_replicas": 2
    }
  }
}
The ideal number of shards should be determined based on the amount of data in an index. Generally, an optimal shard should hold 30-50GB of data. For example, if you expect to accumulate around 300GB of application logs in a day, having around 10 shards in that index would be reasonable.
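The arithmetic behind this rule of thumb is a simple division rounded up. The following sketch, with the hypothetical function name `estimate_shard_count`, shows the range that the 30-50GB guideline gives for 300GB of data. It is a rough starting point only, since the right number also depends on query patterns and hardware.

```python
import math


def estimate_shard_count(index_size_gb, target_shard_size_gb):
    """Return the number of primary shards needed to stay at or under the target size."""
    if index_size_gb <= 0 or target_shard_size_gb <= 0:
        raise ValueError("sizes must be positive")
    return max(1, math.ceil(index_size_gb / target_shard_size_gb))


for target in (30, 50):
    shards = estimate_shard_count(300, target)
    print(f"{target}GB per shard -> {shards} shards")
```

With a 30GB target the estimate is 10 shards, and with a 50GB target it is 6, so anything in that range fits the guideline for this example.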
During their lifetime, shards can go through a number of states, including:
- Initializing: An initial state before the shard can be used.
- Started: A state in which the shard is active and can receive requests.
- Relocating: A state that occurs when shards are in the process of being moved to a different node. This may be necessary under certain conditions, such as when the node they are on is running out of disk space.
- Unassigned: The state of a shard that has failed to be assigned. A reason is provided when this happens, for example that the node hosting the shard is no longer in the cluster (NODE_LEFT) or that the shard is being restored into a closed index (EXISTING_INDEX_RESTORED).
In order to view all shards, their states, and other metadata, use the following request:
GET _cat/shards
To view shards for a specific index, append the name of the index to the URL. For example, for the sensor index:
GET _cat/shards/sensor
This command produces output like the following example. By default, the columns shown include the name of the index, the name (i.e. number) of the shard, whether it is a primary shard or a replica, its state, the number of documents, the size on disk, the IP address, and the node ID.
sensor 5 p STARTED    0 283b  127.0.0.1 ziap
sensor 5 r UNASSIGNED
sensor 2 p STARTED    1 3.7kb 127.0.0.1 ziap
sensor 2 r UNASSIGNED
sensor 3 p STARTED    3 7.2kb 127.0.0.1 ziap
sensor 3 r UNASSIGNED
sensor 1 p STARTED    1 3.7kb 127.0.0.1 ziap
sensor 1 r UNASSIGNED
sensor 4 p STARTED    2 3.8kb 127.0.0.1 ziap
sensor 4 r UNASSIGNED
sensor 0 p STARTED    0 283b  127.0.0.1 ziap
sensor 0 r UNASSIGNED
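Output in this shape is easy to summarize with a short script. The sketch below parses a few rows of the example output and counts shards per state. The column names in `COLUMNS` and the function name `parse_cat_shards` are assumptions made for this example, and rows for relocating shards, which carry extra columns, are not handled.

```python
from collections import Counter

# Assumed column order, following the description of the default output above.
COLUMNS = ("index", "shard", "prirep", "state", "docs", "store", "ip", "node")


def parse_cat_shards(text):
    """Turn whitespace-separated _cat/shards rows into dictionaries."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        # Unassigned rows have only four columns; zip stops at the shorter side.
        rows.append(dict(zip(COLUMNS, parts)))
    return rows


sample = """\
sensor 5 p STARTED 0 283b 127.0.0.1 ziap
sensor 5 r UNASSIGNED
sensor 2 p STARTED 1 3.7kb 127.0.0.1 ziap
sensor 2 r UNASSIGNED
"""

rows = parse_cat_shards(sample)
states = Counter(row["state"] for row in rows)
print(dict(states))
```

In the example output, every replica is unassigned while every primary is started, which is the pattern such a count makes visible at a glance.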
Notes and good things to know
- Having shards that are too large is simply inefficient. Moving huge indices across machines is both a time- and labor-intensive process. Lucene merges take longer to complete and require more resources. Moving the shards across nodes for rebalancing also takes longer, and recovery time is extended. By splitting the data and spreading it across a number of machines, you keep it in manageable chunks and minimize these risks.
- Having the right number of shards is important for performance, so it is wise to plan in advance. A query that runs across several shards in parallel executes faster than the same query against an index composed of a single shard, but only if each shard is located on a different node and there are sufficient nodes in the cluster. At the same time, shards consume memory and disk space, both in terms of indexed data and cluster metadata. Having too many shards can slow down queries, indexing requests, and management operations, so maintaining the right balance is critical.

Log Context
The log “{} mapping update rejected by primary” is emitted by the class TransportShardBulkAction.java.
We extracted the following from the Elasticsearch source code for those seeking in-depth context:
try {
    primary.mapperService().merge(context.getRequestToExecute().type(),
        new CompressedXContent(result.getRequiredMappingUpdate(), XContentType.JSON, ToXContent.EMPTY_PARAMS),
        MapperService.MergeReason.MAPPING_UPDATE_PREFLIGHT);
} catch (Exception e) {
    logger.info(() -> new ParameterizedMessage("{} mapping update rejected by primary", primary.shardId()), e);
    onComplete(exceptionToResult(e, primary, isDelete, version), context, updateResult);
    return true;
}
mappingUpdater.updateMappings(result.getRequiredMappingUpdate(), primary.shardId(),