Before you begin reading this guide, we recommend you try running the Elasticsearch Error Check-Up which can resolve issues that cause many errors.
This guide will help you check for common problems that cause the log ” Missing definition for aggregation ” + aggregationName + ” ” to appear. It’s important to understand the issues related to the log, so to get started, read the general overview on common issues and tips related to the Elasticsearch concepts: search and aggregations.
Advanced users might want to skip right to the common problems section in each concept or try running the Check-Up which analyses ES to pinpoint the cause of many errors and provides suitable actionable recommendations how to resolve them (free tool that requires no installation).
Overview
Search refers to the searching of documents in an index or multiple indices. The simple search is just a GET request to the _search endpoint. The search query can either be provided in query string or through a request body.
Examples
When looking for any documents in this index, if search parameters are not provided, every document is a hit and by default 10 hits will be returned.
GET my_documents/_search
A JSON object is returned in response to a search query. A 200 response code will mean the request completed successfully.
{ "took" : 1, "timed_out" : false, "_shards" : { "total" : 2, "successful" : 2, "failed" : 0 }, "hits" : { "total" : 2, "max_score" : 1.0, "hits" : [ ... ] } }
Notes and good things to know
- Distributed search is challenging and every shard of the index needs to be searched for hits, and then those hits are combined into a single sorted list as a final result.
- There are two phases of search: the query phase and the fetch phase.
- In the query phase, the query is executed on each shard locally and top hits are returned to the coordinating node. The coordinating node merges the results and creates a global sorted list.
- In the fetch phase, the coordinating node brings the actual documents for those hit IDs and returns them to the requesting client.
- A coordinating node needs enough memory and CPU in order to handle the fetch phase.
Aggregations in Elasticsearch
What is an Elasticsearch aggregation?
In Elasticsearch, an aggregation is a collection or the gathering of related things together. The aggregation framework collects data based on the documents that match a search request which helps in building summaries of the data. Below are the different types of aggregations:
Types of aggregations
- Bucket aggregations: Bucket aggregations create buckets or sets of documents based on values of fields in the documents. When the aggregation is performed, the documents are placed in the respective bucket(s). This way you can divide a set of invoices into several buckets, one for each customer, system logs can be divided into “error”,”warning” and “info”, or CPU performance data divided into hourly buckets. The output consists of a list of buckets, each with a key and a count of documents. Here are some examples of bucket aggregations: Histogram Aggregation, Range Aggregation, Terms Aggregation, Filter(s) Aggregations, Geo Distance Aggregation and IP Range Aggregation.
- Metric aggregations: Metric aggregation mainly refers to the mathematical calculations performed across a set of documents, usually based on the values of a numerical field present in the document, such as COUNT, SUM, MIN, MAX, AVERAGE etc. Metrics may be carried out at top level, but are often more useful as a sub aggregation to calculate values for a bucket aggregation.
- Pipeline aggregations: These aggregations allow you to aggregate based on the result of another aggregation rather than from document sets. Typically this aggregation is used to find the average number of documents in a bucket, or to sort buckets based upon a metric produced by a metric aggregation.
Aggregation syntax
GET /logs-000001/_search { "aggs": { "errors": { "terms": { "field": "log_type" } } } }
The above query does two things: The “query” part selects a number of documents from the index (the number of documents in the “hits” output, not the actual documents returned), while the aggregation (which is at the same level in the json) will split the document into buckets determined by the contents of the log_type field.
Nesting aggregations
It is possible to nest aggregations inside one another (nothing to do with nested fields), so as to divide the buckets into sub buckets, or to calculate metrics from the sub buckets. The below aggregation will separate out all exam results by gender of the pupil and then calculate the average results for each gender. In this case, the important thing to understand is that the second aggregation will be calculated on the individual set of the bucket rather than the document set as a whole.
POST exam_results*/_search { "size": 0, "aggs": { "genders": { "terms": { "field": "gender" }, "aggs": { "avg_grade": { "avg": { "field": "grades" } } } } } }
Aggregation performance
Aggregations are typically carried out in RAM memory, and require a different document access structure than a search query that is obtained from the inverted index, so it is important to consider the implication of performance when constructing your aggregations. The most important considerations are:
Number of buckets
This would be controlled by the “size” parameter in a terms aggregation, or the “calendar interval” in a date histogram. Bear in mind that where you have bucket aggregations nested at more than one level, then the total number of buckets will be multiplied for each level of aggregation.
Number of documents
When running an aggregation,it is preferable (if possible) to adjust the query so that your aggregation is only performed on a restricted set of those documents that you are interested in, instead of using a match_all query. This will reduce the memory required to run the aggregation.
Fielddata
Aggregations as a rule should always be run on keyword type fields, not analysed text. It is possible to run on analyzed text by using the mapping setting “fielddata”:”true” but this is highly memory intensive and should be avoided if possible.
Log Context
Log”Missing definition for aggregation [” + aggregationName + “]”classname is AggregatorFactories.java We extracted the following from Elasticsearch source code for those seeking an in-depth context :
#ERROR!
Run the Check-Up to get customized recommendations like this:

Heavy merges detected in specific nodes

Description
A large number of small shards can slow down searches and cause cluster instability. Some indices have shards that are too small…

Recommendations Based on your specific ES deployment you should…
Based on your specific ES deployment you should…
X-PUT curl -H [a customized recommendation]