How to Model Relationships Between Documents in Elasticsearch Using Nesting

By Opster Expert Team

Updated: Mar 2, 2023

This article is part 2 of a 3-part series on modeling relationships between documents in Elasticsearch.

Background and overview

Elasticsearch uses a variety of methods for defining relationships between documents, including object types, nested documents, parent-child relationships, and denormalizing.

The nested data type is a specialized version of the object type that lets arrays of objects be indexed so that each object in the array can be queried independently of the others.

Article 1 of this series discussed both the possibilities and limitations of using Object Type fields inside Elasticsearch mappings. This article will go through how to use nested type objects to retain relationships between inner objects contained as document subfields.

Article 3 will discuss an alternative method using parent-child relationships.

How to use nested field type

With the nested type, each object in an array is indexed independently, while at the document level the source looks identical to one that uses inner objects.

Utilizing the same example from Part 1 of this series, the following represents a nested document:

PUT books/_doc/1
{
  "title": "Machine Learning",
  "author": [
    {
      "first_name": "John",
      "last_name": "Stefan"
    },
    {
      "first_name": "Sandy",
      "last_name": "Naily"
    }
  ]
}

As seen in Part 1, if the document above is indexed using default dynamic mapping, the author field will be mapped as an “object” type field, and the relationship between first and last name will be lost.  
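To make the loss of the relationship concrete, here is a minimal Python sketch of what flattening does to an array of objects. It is a simplified model, not Elasticsearch code: the helper names flatten_object_field and object_match are hypothetical, and values are compared exactly rather than being analyzed as text.

```python
# Simplified model of how an "object" field is flattened at index time.
# Illustration only: real indexing also analyzes text values.

def flatten_object_field(doc, field):
    """Collapse an array of objects into one multi-valued list per subfield."""
    flat = {}
    for obj in doc[field]:
        for key, value in obj.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat


def object_match(flat, conditions):
    """True if each condition is met by any value of the flattened field."""
    return all(value in flat.get(path, []) for path, value in conditions.items())


book = {
    "title": "Machine Learning",
    "author": [
        {"first_name": "John", "last_name": "Stefan"},
        {"first_name": "Sandy", "last_name": "Naily"},
    ],
}

flat = flatten_object_field(book, "author")
print(flat)

# "Sandy Stefan" is not an author of this book, yet the flattened form
# matches because the two values are no longer tied to one object.
print(object_match(flat, {"author.first_name": "Sandy", "author.last_name": "Stefan"}))
```

Once the values sit in two independent lists, nothing records which first name belonged to which last name, which is why the cross-object match succeeds.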

However, when “nested” is specified at the mapping level, those relationships are preserved.

Explicit mapping for the document above looks like this:

PUT books
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      },
      "author": {
        "type": "nested",
        "properties": {
          "first_name": {
            "type": "text"
          },
          "last_name": {
            "type": "text"
          }
        }
      }
    }
  }
}

Nested queries in Elasticsearch

Because Elasticsearch internally indexes each object in a nested array as a separate hidden document, each object can be queried independently using a nested query.

Users must specify the path argument when running a nested query or filter to tell Elasticsearch where the nested objects are located in the Lucene block. The nested query or filter must also wrap a standard query or filter, as appropriate.

Consider the following two queries:

GET books/_search
{
  "query": {
    "nested": {
      "path": "author",
      "query": {
        "bool": {
          "must": [
            {
              "match": {
                "author.first_name": "Sandy"
              }
            },
            {
              "match": {
                "author.last_name": "Stefan"
              }
            }
          ]
        }
      }
    }
  }
}

GET books/_search
{
  "query": {
    "nested": {
      "path": "author",
      "query": {
        "bool": {
          "must": [
            {
              "match": {
                "author.first_name": "John"
              }
            },
            {
              "match": {
                "author.last_name": "Stefan"
              }
            }
          ]
        }
      }
    }
  }
}

The first query will not match the document above because “Sandy” and “Stefan” are not inside the same nested object. However, the second query, for “John Stefan”, will match because John and Stefan are in the same nested object. Nested mapping and queries provide users with more intuitively correct results than object mapping.
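The difference between the two queries can be modeled in a few lines of Python. This is a sketch of the matching rule only: nested_match is a hypothetical helper, and exact value comparison stands in for the match query's text analysis.

```python
# Simplified model of a nested query: every condition must be satisfied
# by the SAME object under the nested path.

def nested_match(doc, path, conditions):
    """True if a single object under `path` satisfies every condition."""
    prefix = path + "."
    # Strip the "author." prefix so keys line up with the inner objects.
    wanted = {field[len(prefix):]: value for field, value in conditions.items()}
    return any(
        all(obj.get(key) == value for key, value in wanted.items())
        for obj in doc.get(path, [])
    )


book = {
    "title": "Machine Learning",
    "author": [
        {"first_name": "John", "last_name": "Stefan"},
        {"first_name": "Sandy", "last_name": "Naily"},
    ],
}

first = nested_match(book, "author", {"author.first_name": "Sandy", "author.last_name": "Stefan"})
second = nested_match(book, "author", {"author.first_name": "John", "author.last_name": "Stefan"})
print(first, second)
```

The any/all pairing carries the whole idea: all conditions are evaluated per object, and the document matches if any one object passes.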

Elasticsearch manages the nested relation internally, giving the impression of a nested hierarchy even though the underlying index is still flat. When indexing a document with a nested field, Elasticsearch actually indexes several distinct Lucene documents (the root object plus one hidden document per nested object) and relates them internally. Read performance remains fast because all of these documents are kept in the same Lucene block on the same shard.
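As a rough mental model, the block written for one document can be pictured as a list of flat documents with the root placed last. The function below is a hypothetical sketch under that assumption; it ignores internal metadata fields and handles only a single nested path.

```python
# Rough model of the flat Lucene block built for one Elasticsearch document
# that has a single nested field. Internal metadata fields are ignored.

def to_lucene_block(doc, nested_path):
    """Return the hidden nested documents followed by the root document."""
    block = [
        {f"{nested_path}.{key}": value for key, value in obj.items()}
        for obj in doc.get(nested_path, [])
    ]
    # The root keeps only the non-nested fields and closes the block.
    block.append({key: value for key, value in doc.items() if key != nested_path})
    return block


book = {
    "title": "Machine Learning",
    "author": [
        {"first_name": "John", "last_name": "Stefan"},
        {"first_name": "Sandy", "last_name": "Naily"},
    ],
}

block = to_lucene_block(book, "author")
for lucene_doc in block:
    print(lucene_doc)
```

For the sample book, this produces three flat documents: two hidden author documents and one root document holding the title.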

Nested fields accept the following parameters:

  • dynamic: determines whether new properties are added dynamically to an existing nested object. Accepted values are true (the default), false, and strict.
  • properties: the fields inside the nested object, which can be of any data type, including nested. New properties can be added to an existing nested object.
  • include_in_parent: an optional boolean. If set to true, all fields in the nested object are also added to the parent document as standard (flat) fields. The default value is false. In the following mapping, it is set on the nested name field inside author:

PUT books
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      },
      "author": {
        "type": "nested",
        "properties": {
          "age": {
            "type": "float"
          },
          "name": {
            "type": "nested",
            "include_in_parent": true,
            "properties": {
              "first": {
                "type": "text"
              },
              "last": {
                "type": "text"
              }
            }
          }
        }
      }
    }
  }
}

  • include_in_root: an optional boolean. If set to true, all fields in the nested object are also added to the root document as standard (flat) fields. The default value is false.

When include_in_root is added to the nested mapping, the inner author objects are indexed twice: once as nested documents and once as flattened object fields within the root document.

With the following mapping, users can utilize nested queries for nested documents and regular queries for cross-object matches:

PUT books
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      },
      "author": {
        "type": "nested",
        "include_in_root": true,
        "properties": {
          "first_name": {
            "type": "text"
          },
          "last_name": {
            "type": "text"
          }
        }
      }
    }
  }
}
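With this mapping in place, the same pair of match clauses can be sent in two shapes. The Python sketch below only builds the request bodies as dictionaries; the helper names are hypothetical, and sending the bodies to a cluster is left out.

```python
import json

# Build the two request bodies that the include_in_root mapping supports.
# Only the bodies are constructed here; no cluster is contacted.

def author_clauses(first_name, last_name):
    return [
        {"match": {"author.first_name": first_name}},
        {"match": {"author.last_name": last_name}},
    ]


def nested_author_query(first_name, last_name):
    """Both names must occur inside the same nested author object."""
    return {
        "query": {
            "nested": {
                "path": "author",
                "query": {"bool": {"must": author_clauses(first_name, last_name)}},
            }
        }
    }


def flat_author_query(first_name, last_name):
    """Names may come from different authors; relies on include_in_root."""
    return {"query": {"bool": {"must": author_clauses(first_name, last_name)}}}


print(json.dumps(flat_author_query("Sandy", "Stefan"), indent=2))
```

The only structural difference is the nested wrapper with its path. Without it, the clauses run against the flattened copies in the root document, so a cross-object pair such as “Sandy” and “Stefan” would be expected to match.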

Nested aggregation in Elasticsearch

The nested aggregation is a special single-bucket aggregation that makes it possible to aggregate over nested documents. The following mapping defines an index of books in which each book has a list of authors, each with a name and an age:

PUT books
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      },
      "author": {
        "type": "nested",
        "properties": {
          "name": {
            "type": "text"
          },
          "age": {
            "type": "double"
          }
        }
      }
    }
  }
}

The following request returns the age of the youngest author among the books that match “Machine Learning”:

GET /books/_search
{
  "query": {
    "match": {
      "title": "Machine Learning"
    }
  },
  "aggs": {
    "authors": {
      "nested": {
        "path": "author"
      },
      "aggs": {
        "min_age": {
          "min": {
            "field": "author.age"
          }
        }
      }
    }
  }
}

A nested aggregation requires the path of the nested documents within the top-level document. Any kind of aggregation can then be defined over these nested documents.
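In plain Python terms, the request above does roughly the following. The sample ages, the second book, and the helper nested_min are made up for illustration, and the title match is reduced to a simple predicate.

```python
# Simplified model of a nested aggregation with a min sub-aggregation.
# Sample data (ages, second book) is invented for illustration.

def nested_min(books, matches_query, path, field):
    """Collect nested objects from matching books, then take the minimum."""
    nested_docs = [
        obj
        for book in books
        if matches_query(book)
        for obj in book.get(path, [])
    ]
    values = [obj[field] for obj in nested_docs if field in obj]
    return {"doc_count": len(nested_docs), "min": min(values) if values else None}


books = [
    {
        "title": "Machine Learning",
        "author": [
            {"name": "John Stefan", "age": 45.0},
            {"name": "Sandy Naily", "age": 38.0},
        ],
    },
    {"title": "Gardening Basics", "author": [{"name": "Sample Author", "age": 29.0}]},
]

result = nested_min(books, lambda b: b["title"] == "Machine Learning", "author", "age")
print(result)
```

The bucket's doc_count here counts nested author objects rather than books, which mirrors what a single-bucket aggregation over nested documents reports.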

Reverse nested aggregation in Elasticsearch

Reverse nested aggregation is a special single-bucket aggregation that allows aggregating on parent documents from within nested documents. It effectively breaks out of the nested block structure and links to other nested structures or to the root document, which makes it possible to nest further aggregations on fields that are not part of the nested object.

The reverse_nested aggregation must be defined inside a nested aggregation.

The path option of the reverse_nested aggregation specifies which nested object field to join back to. It is empty by default, which means the aggregation joins back to the root (main document) level. The path cannot refer to a nested object field that lies outside the nested structure of the nested aggregation that contains the reverse_nested aggregation.

For example, the following mapping is used for a books index:

PUT /books
{
  "mappings": {
    "properties": {
      "title": {
        "type": "keyword"
      },
      "author": {
        "type": "nested",
        "properties": {
          "first_name": {
            "type": "keyword"
          },
          "last_name": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

The following aggregations return the top author first names and, for each of them, the top titles of the books those authors have written:

GET /books/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "author": {
      "nested": {
        "path": "author"
      },
      "aggs": {
        "top_first_name": {
          "terms": {
            "field": "author.first_name"
          },
          "aggs": {
            "author_to_book": {
              "reverse_nested": {},
              "aggs": {
                "top_book_per_author": {
                  "terms": {
                    "field": "title"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

The reverse_nested aggregation is placed inside a nested aggregation, as seen above, because this is the only place in the DSL where it can be used. Its sole purpose is to join back to a parent document higher up in the nested structure. Because no path is defined here, the reverse_nested aggregation joins back to the root (main document) level. If the mapping defines multiple levels of nested object types, the reverse_nested aggregation can join back to a different level through the path option.
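The join back to the root can be sketched in Python as two counters per bucket: one over nested author objects and one over the books they belong to. The second book and the function name are hypothetical, and bucket ordering and size limits of the terms aggregation are not modeled.

```python
from collections import Counter, defaultdict

# Simplified model of nested -> terms -> reverse_nested -> terms.
# The second book is invented sample data.

def titles_per_first_name(books):
    buckets = defaultdict(lambda: {"doc_count": 0, "titles": Counter()})
    for book in books:
        seen = set()
        for author in book.get("author", []):
            name = author["first_name"]
            # The terms bucket counts nested author documents.
            buckets[name]["doc_count"] += 1
            # reverse_nested joins back to the root: each book is
            # counted once per bucket, however many authors matched.
            if name not in seen:
                buckets[name]["titles"][book["title"]] += 1
                seen.add(name)
    return {
        name: {"doc_count": b["doc_count"], "titles": dict(b["titles"])}
        for name, b in buckets.items()
    }


books = [
    {
        "title": "Machine Learning",
        "author": [
            {"first_name": "John", "last_name": "Stefan"},
            {"first_name": "Sandy", "last_name": "Naily"},
        ],
    },
    {"title": "Data Pipelines", "author": [{"first_name": "John", "last_name": "Doe"}]},
]

result = titles_per_first_name(books)
print(result)
```

The title field lives on the root document, so it is only reachable after stepping out of the nested context. That step is what the seen set and the book["title"] lookup stand in for.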

Nested sorting

Elasticsearch also allows sorting by fields inside one or more nested objects. The nested sort option supports the following properties:

  • path: defines which nested object to sort on. The actual sort field must be a direct field inside this nested object. This property is mandatory when sorting by a nested field.
  • filter: a filter that the inner objects under the nested path must match for their field values to be taken into account when sorting. A common pattern is to repeat the query or filter from the main query inside the nested filter. By default, no filter is active.
  • max_children: the maximum number of children to consider per root document when picking the sort value. Unlimited by default.
  • nested: same as the top-level nested option, but applies to another nested path within the current nested object.

The example below is based on the books index with the following mapping:

PUT /books
{
  "mappings": {
    "properties": {
      "title": {
        "type": "keyword"
      },
      "author": {
        "type": "nested",
        "properties": {
          "age": {
            "type": "double"
          },
          "first_name": {
            "type": "keyword"
          },
          "last_name": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

In the following example, author is a field of type nested. The nested path must be specified; otherwise, Elasticsearch does not know at which nested level the sort values should be captured.

POST books/_search
{
  "query": {
    "term": {
      "title": "Machine Learning"
    }
  },
  "sort": [
    {
      "author.age": {
        "mode": "avg",
        "order": "asc",
        "nested": {
          "path": "author",
          "filter": {
            "term": {
              "author.last_name": "Naily"
            }
          }
        }
      }
    }
  ]
}

If a nested field is defined in a sort without a nested context, Elasticsearch will return an error.
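How the sort value is derived for each book can be sketched as follows. The data and helper names are invented for illustration, and the sketch assumes that books with no matching nested object sort last, which to our understanding is the default behavior for missing values.

```python
from statistics import mean

# Simplified model of sorting by a nested field with a filter and mode "avg".
# Sample titles and ages are invented for illustration.

def nested_sort_value(book, path, field, nested_filter, mode=mean):
    """Reduce the values of matching nested objects to one sort value."""
    values = [
        obj[field]
        for obj in book.get(path, [])
        if nested_filter(obj) and field in obj
    ]
    return mode(values) if values else None


def sort_by_nested(books, path, field, nested_filter, mode=mean):
    def key(book):
        value = nested_sort_value(book, path, field, nested_filter, mode)
        # Assumption: books without a sort value are placed last.
        return (value is None, value if value is not None else 0.0)
    return sorted(books, key=key)


books = [
    {"title": "C", "author": [{"last_name": "Stefan", "age": 20.0}]},
    {"title": "B", "author": [{"last_name": "Naily", "age": 30.0},
                              {"last_name": "Naily", "age": 50.0}]},
    {"title": "A", "author": [{"last_name": "Naily", "age": 38.0},
                              {"last_name": "Stefan", "age": 45.0}]},
]

ordered = sort_by_nested(books, "author", "age", lambda a: a["last_name"] == "Naily")
print([book["title"] for book in ordered])
```

The filter decides which nested objects contribute values, and the mode collapses them to a single number per book. Only then are the books compared with each other.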

Nested inner hits

Users can highlight the matching nested documents using inner_hits, shown in the following example:

GET books/_search
{
  "query": {
    "nested": {
      "path": "author",
      "query": {
        "bool": {
          "must": [
            {
              "match": {
                "author.first_name": "John"
              }
            },
            {
              "match": {
                "author.last_name": "Stefan"
              }
            }
          ]
        }
      },
      "inner_hits": {
        "highlight": {
          "fields": {
            "author.first_name": {}
          }
        }
      }
    }
  }
}

In addition, the matching nested inner objects can be included in the search hits as inner hits by adding inner_hits to the nested query, as shown below:

POST books/_search
{
  "query": {
    "nested": {
      "path": "author",
      "query": {
        "match": {
          "author.first_name": "Elie"
        }
      },
      "inner_hits": {}
    }
  }
}

Only the inner_hits definition is required in the nested query; no other options need to be specified.
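Conceptually, inner hits report which nested objects caused the match and where they sit in the array. The sketch below models that with a field name and an offset per hit. The response shape is simplified, the helper is hypothetical, and the sample last name is a placeholder.

```python
# Simplified model of inner hits for a nested query: return the matching
# nested objects together with their position in the source array.

def nested_inner_hits(doc, path, predicate):
    return [
        {"_nested": {"field": path, "offset": offset}, "_source": obj}
        for offset, obj in enumerate(doc.get(path, []))
        if predicate(obj)
    ]


book = {
    "title": "Machine Learning",
    "author": [
        {"first_name": "John", "last_name": "Stefan"},
        {"first_name": "Elie", "last_name": "Placeholder"},  # invented sample author
    ],
}

hits = nested_inner_hits(book, "author", lambda a: a["first_name"] == "Elie")
print(hits)
```

The offset makes it possible to map an inner hit back to the exact element of the author array in the root document's source.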

Advantages and disadvantages of nested field type

Advantages:

  • Nested types are aware of object boundaries.
  • Nested queries and aggregations join the parent and child parts, so users can query across both.
  • Because all the Lucene documents that make up an Elasticsearch document are present in the same block and segment, query-time joins are quick.
  • If the full functionality of object fields is also required, the nested (child) documents can be included in their parent documents through include_in_parent or include_in_root. This functionality is transparent to your application.

Disadvantages:

  • A special nested query is required to access nested documents for search, aggregation, and highlighting, yielding complex queries.
  • Updating a single field in a nested document forces a reindex of the entire document, including the root and all other nested objects, whether or not they were modified. This is because all of these documents are stored in the same Lucene block, and Lucene segments do not permit in-place writes. Internally, Elasticsearch marks the old document as deleted, applies the update, and reindexes everything into a new Lucene block. Nested documents can therefore incur non-negligible reindexing costs if data changes frequently.
  • Finally, “cross-referencing” between nested documents is not possible. The properties of one nested document cannot be seen by another. It is possible to work around this with include_in_root, which copies the fields of the nested documents into the root; however, this reintroduces the cross-object matching issues of inner objects.

Objects and nested mapping limitations

As mentioned previously, each nested object is indexed as a separate Lucene document. In keeping with the previous examples, indexing a single document that contains 100 author objects creates 101 Lucene documents: one for the parent document and one for each nested object.

Given the expense involved with nested mappings, Elasticsearch provides the following parameter settings to prevent performance problems:

  • index.mapping.nested_fields.limit: the maximum number of distinct nested mappings in an index. The nested type should only be used in special cases, such as when arrays of objects need to be queried independently of one another. To safeguard against poorly designed mappings, this setting limits the number of distinct nested types per index. The default value is 50.

The author mapping in the previous examples counts as only one toward this limit.

  • index.mapping.nested_objects.limit: the maximum number of nested JSON objects, across all nested types, that a single document may contain. This limit helps prevent out-of-memory errors when a document contains too many nested objects. The default value is 10,000.
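To see what each limit counts, the following Python sketch walks a mapping and a document. The two helper functions are hypothetical and only approximate the bookkeeping. The default values 50 and 10,000 are the ones quoted above.

```python
# Approximate what the two limits count: distinct nested fields in a
# mapping, and nested objects inside one document.

NESTED_FIELDS_LIMIT = 50        # index.mapping.nested_fields.limit default
NESTED_OBJECTS_LIMIT = 10_000   # index.mapping.nested_objects.limit default


def count_nested_fields(properties):
    """Count fields of type nested, including nested-within-nested."""
    count = 0
    for field in properties.values():
        if field.get("type") == "nested":
            count += 1
        count += count_nested_fields(field.get("properties", {}))
    return count


def count_nested_objects(doc, properties):
    """Count the objects stored under nested fields in one document."""
    count = 0
    for name, field in properties.items():
        value = doc.get(name)
        if value is None:
            continue
        objects = value if isinstance(value, list) else [value]
        if field.get("type") == "nested":
            count += len(objects)
        for obj in objects:
            if isinstance(obj, dict):
                count += count_nested_objects(obj, field.get("properties", {}))
    return count


properties = {
    "title": {"type": "text"},
    "author": {
        "type": "nested",
        "properties": {
            "first_name": {"type": "text"},
            "last_name": {"type": "text"},
        },
    },
}
doc = {
    "title": "Machine Learning",
    "author": [{"first_name": f"First{i}", "last_name": f"Last{i}"} for i in range(100)],
}

nested_fields = count_nested_fields(properties)
nested_objects = count_nested_objects(doc, properties)
print(nested_fields, nested_objects, nested_objects + 1)  # last value: Lucene documents
```

For the 100-author document, the mapping contributes one nested field, and the document contributes 100 nested objects plus one root, both well under the defaults.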

Summary

  • If users need to create an array of objects where each object must be handled as a unique entity, the nested datatype is a great solution to consider. Nested documents are kept together in the same Lucene block, improving read/query performance.
  • A nested document can be read quicker than a parent/child equivalent.
  • The entire nested document must be reindexed when a single field in a nested document (parent or nested children) is changed. For large nested documents, this can be very expensive.
  • Specific queries and aggregations are required for nested mappings.
  • “Cross-referencing” nested documents isn’t feasible.
  • Data that does not change frequently is best suited for nested field types.
  • When users need both behaviors (object fields for cross-object matching and nested documents to prevent it), Elasticsearch makes this possible with the include_in_root and include_in_parent mapping options.
