How to Use Runtime Fields in Elasticsearch

Average Read Time

5 Mins

Opster Expert Team - Gustavo

Sep-2021

Opster Team

October 2021


Overview

Elasticsearch 7.12 introduced a feature called runtime fields. A runtime field is evaluated at query time instead of index time, which lets us modify our schema at the query stage. Below we’ll review the index and query phases, and how, when and why you should (or shouldn’t) use runtime fields.

Index time vs query time

When we talk about index time, we mean everything that happens before we run our queries, for example ingesting documents or setting up mappings. Query time is, understandably, when we run the queries themselves.

Indexing speed depends on the amount of data to be indexed (the more fields we index, the bigger each document is) and on CPU usage (the busier the CPU, the slower the indexing). Using runtime fields instead of indexed fields reduces the work done at index time, but it results in slower queries, because the values have to be computed every time they are read.

When should we calculate fields?

The rule of thumb is to “do as much processing as possible at index time”, that is, to store calculations in the index as precomputed fields. This reduces the load at query time, resulting in potentially faster queries.

Let’s illustrate this with an example.

A runner took part in a competition consisting of 3 races, and the results were stored as follows (note that we use objects to store the participations):

{
 "participant": "Fast Runner",
 "participations": {
   "race1": {
     "place": 2,
     "time_secs": 55.2
   },
   "race2": {
     "place": 4,
     "time_secs": 49.22
   },
   "race3": {
     "place": 1,
     "time_secs": 54.25
   }
 }
}

We want to be able to search through runners and show individual race times but also an average of race times. The question is: Do we generate this piece of data at query time, or index time? 

The rule of thumb says at index time. Just calculate the average before indexing the document, and store this number under a new field. There are many ways to achieve this, such as by using an ingest pipeline. We can also obtain an average at query time. This means calculating the average of the races on the fly on each query.
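
As a rough sketch of the index-time option, an ingest pipeline with a script processor could compute the average before the document is stored. The field name times_average is an assumption chosen for illustration, and the simulate API is used here so that nothing is actually indexed:

```
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "Illustrative only: store the average race time in a new field",
    "processors": [
      {
        "script": {
          "source": "def p = ctx.participations; ctx.times_average = (p.race1.time_secs + p.race2.time_secs + p.race3.time_secs) / 3.0;"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "participant": "Fast Runner",
        "participations": {
          "race1": { "place": 2, "time_secs": 55.2 },
          "race2": { "place": 4, "time_secs": 49.22 },
          "race3": { "place": 1, "time_secs": 54.25 }
        }
      }
    }
  ]
}
```

The simulated document should come back with a times_average of roughly 52.89. Saving the same pipeline under a name of your choice and indexing with the pipeline query parameter would apply it for real.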

There are certain situations in which you would want to calculate fields at query time, such as if: 

  • You don’t have access to the ingestion process.
  • You want to test the field before ingesting it.
  • You are not sure the mapping is finalized yet.
  • You want to see the data without having to re-ingest it.
  • You need to see quickly whether transformation is worth it or not.
  • Disk space is a priority.

Script fields

Before runtime fields, the way to go was script fields. Script fields use Painless scripts to compute a value from the fields of a document.

Let’s put our data in Kibana Dev Tools.

First create the index:

PUT runtime_test

Now index some documents:

PUT runtime_test/_doc/1
{
  "participant": "Fast Runner",
  "participations": {
    "race1": {
      "place": 2,
      "time_secs": 55.2
    },
    "race2": {
      "place": 4,
      "time_secs": 49.22
    },
    "race3": {
      "place": 1,
      "time_secs": 54.25
    }
  }
}
PUT runtime_test/_doc/2
{
  "participant": "Slow Runner",
  "participations": {
    "race1": {
      "place": 10,
      "time_secs": 115.50
    },
    "race2": {
      "place": 8,
      "time_secs": 99.54
    },
    "race3": {
      "place": 10,
      "time_secs": 100.11
    }
  }
}

This is how we generate our average field with script_fields at the same time we query for all documents:

GET runtime_test/_search
{
  "query": {
    "match_all": {}
  },
  "script_fields": {
    "avg": {
      "script": {
        "source": "(doc['participations.race1.time_secs'].value + doc['participations.race2.time_secs'].value + doc['participations.race3.time_secs'].value)/3;"
      }
    }
  }
}

There are many operations we can do using Painless, but remember that scripts can affect query performance.

The big limitation of script fields is that they are applied after the query is made, in the fetch phase. This phase occurs after N results are selected from the query. In this case the script field acts more as a “decorator”, adding extra information to the documents, and not affecting how this document had been queried. 

Script fields, moreover, have nothing to do with the mappings. They exist only within the request that defines them, so they cannot be queried, sorted on or aggregated like regular fields.
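
For context, before runtime fields the usual workaround for filtering on a computed value was a script query, which repeats the script inside the query itself. A minimal sketch against the same index (the min parameter name is just an illustration):

```
GET runtime_test/_search
{
  "query": {
    "bool": {
      "filter": {
        "script": {
          "script": {
            "source": "(doc['participations.race1.time_secs'].value + doc['participations.race2.time_secs'].value + doc['participations.race3.time_secs'].value) / 3 >= params.min",
            "params": {
              "min": 100
            }
          }
        }
      }
    }
  }
}
```

This should match only Slow Runner. The computed value is still not a field, though: returning it or sorting on it means repeating the script in other parts of the request.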

Runtime fields

Runtime fields “enhance” the old script fields by introducing an interesting concept called “schema on read”. You can define fields not only in the index mappings, but also at query time, on the fly, and get (almost) all the advantages of regular fields. This is possible because, unlike script fields, runtime fields are applied from the beginning of the query, meaning you can search for documents based on these fields.

As we just learned, runtime fields can be defined in the index mappings or in the search request. Going back to our example, we will first declare our race average field in the mappings and test it out.

Runtime fields are very flexible: we can add them to our mappings and remove them again easily. We can keep using the same index and just add the field:

PUT runtime_test/_mapping
{
  "runtime": {
    "times_average": {
      "type": "double",
      "script": {
        "source": "emit((doc['participations.race1.time_secs'].value + doc['participations.race2.time_secs'].value + doc['participations.race3.time_secs'].value)/3);"
      }
    }
  }
}

* Note the emit call that wraps the expression. A runtime field script must call emit to produce the field’s value.

Runtime fields are not indexed and are not part of _source, so they will not appear in the _source block of the search response. You can easily add them to the response with the “fields” clause in the body of the query.

GET runtime_test/_search
{
  "query": {
    "match_all": {}
  },
  "fields" : ["times_average"]
}

For querying, you treat runtime fields just like regular fields. Let’s filter our records by the new times_average runtime field.

GET runtime_test/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "times_average": {
              "gte": 100,
              "lte": 200
            }
          }
        }
      ]
    }
  }
}

Our Fast Runner average is 52.89, so it will not show. Slow Runner averages 105.05, so it will. If you change the range to, for example, gte 50 and lte 100, Fast Runner will show up and Slow Runner will not.
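
Since the runtime field behaves like a regular field, it should also work in sorting and aggregations. A small sketch, assuming the times_average runtime field from the mapping above is still in place (the aggregation name overall_average is arbitrary):

```
GET runtime_test/_search
{
  "sort": [
    { "times_average": "asc" }
  ],
  "fields": ["times_average"],
  "aggs": {
    "overall_average": {
      "avg": { "field": "times_average" }
    }
  }
}
```

Fast Runner should be listed first, and the aggregation returns the mean of both runners’ averages.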

Alternatively, you can skip adding the runtime field to the mappings and define it in the search request itself, which gives even more flexibility.

To try this, first remove our runtime field from the index mappings by setting it to null:

PUT runtime_test/_mapping
{
  "runtime": {
    "times_average": null
  }
}

Make sure the field is not there by retrieving the index mappings:

GET runtime_test/_mapping

You can use the same kind of PUT request to update a runtime field. Just PUT a runtime field with the same name and the new script will replace the old one.
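
For example, the script could be replaced with a more defensive version that skips races with no recorded time instead of failing on them. This is only a sketch: the loop over the three race names is an assumption based on the sample documents, and the triple-quoted script syntax is specific to the Kibana Dev Tools console:

```
PUT runtime_test/_mapping
{
  "runtime": {
    "times_average": {
      "type": "double",
      "script": {
        "source": """
          double total = 0;
          int count = 0;
          for (def race : ['race1', 'race2', 'race3']) {
            def field = 'participations.' + race + '.time_secs';
            // Only use races that have a value in this document
            if (doc.containsKey(field) && doc[field].size() > 0) {
              total += doc[field].value;
              count++;
            }
          }
          // Emit nothing when the document has no race times at all
          if (count > 0) {
            emit(total / count);
          }
        """
      }
    }
  }
}
```

The field can be removed again with the same null request shown earlier.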

To use a runtime field for a single query, move the runtime block from the mappings into the search request.

Repeat the query, now with the runtime field defined in the request:

GET runtime_test/_search
{
  "runtime_mappings": {
    "times_average": {
      "type": "double",
      "script": {
        "source": "emit((doc['participations.race1.time_secs'].value + doc['participations.race2.time_secs'].value + doc['participations.race3.time_secs'].value)/3);"
      }
    }
  },
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "times_average": {
              "gte": 50,
              "lte": 200
            }
          }
        }
      ]
    }
  },
  "fields": [
    "times_average"
  ]
}

Note we use runtime_mappings in the search query instead of the runtime section declaration in the mappings, but the rest of the block remains the same. 

This way we can query our runtime fields before even touching the mappings. 

Override fields

Another interesting feature of runtime fields is that you can override an existing field simply by defining a runtime field with the same name as the indexed field.

Let’s assume that in our example, our Fast Runner received a penalty of 250 seconds for cheating in the first race, and we want to start querying against that change right away. Just define a runtime field with the same name as the indexed field and it will shadow it.

The override applies to every document the query evaluates, so we also filter by document ID to limit the results to the penalized participant:

GET runtime_test/_search
{
  "runtime_mappings": {
    "times_average": {
      "type": "double",
      "script": {
        "source": "emit((doc['participations.race1.time_secs'].value + doc['participations.race2.time_secs'].value + doc['participations.race3.time_secs'].value)/3);"
      }
    },
    "participations.race1.time_secs": {
      "type": "double",
      "script": "emit(doc['participations.race3.time_secs'].value + 250.00)"
    }
  },
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "times_average": {
              "gte": 50,
              "lte": 100
            }
          }
        },
        {
          "ids": {
            "values": [
              "1"
            ]
          }
        }
      ]
    }
  },
  "fields": [
    "times_average", "participations.race1.time_secs"
  ]
}

If we run this query, it will return no results. The override script takes the indexed race3 time (54.25) as its base and adds 250 seconds, so the shadowed race1 time becomes 304.25 and Fast Runner’s average goes from 52.89 to about 135.9, which is outside the 50 to 100 range. Slow Runner is excluded by the ids filter, and its average is above 100 anyway.

As we can see, the query uses the runtime field, not the indexed one, which is exactly what we want. You can simulate many scenarios without reindexing data or losing the original indexed values.

Dynamic runtime fields

Elasticsearch dynamic mapping automatically indexes new fields found in documents. This is useful because sometimes we need to query fields whose names we don’t know beforehand. The downside is that we don’t control how many fields get created. Let’s say a document with 3000 fields arrives and we allow Elasticsearch to index these fields dynamically. This is likely to cause a mapping explosion.

We can set Elasticsearch to create runtime fields dynamically instead. As runtime fields are not indexed, new fields add no index structures or indexing work, which limits the cost of a mapping explosion. Keep in mind that, depending on your Elasticsearch version, runtime fields defined in the mappings may still count toward index.mapping.total_fields.limit, so check the documentation for your release. These fields will still be searchable, sortable, aggregable and filterable. Later on, you can decide which fields should be indexed to improve query performance.

To have new fields created as runtime fields, set the dynamic option to runtime, for example when you create the index:

PUT runtime_index
{
 "mappings": {
   "dynamic": "runtime",
   "properties": {
     "not_runtime_field": {
       "type": "text"
     }
   }
 }
}

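Let’s repeat the index creation, now including an explicitly declared runtime field. Delete the index first (DELETE runtime_index), since it already exists. Note that runtime fields do not support the text type, so the runtime field is declared as keyword:

PUT runtime_index
{
 "mappings": {
   "dynamic": "runtime",
   "runtime": {
     "runtime_field": {
       "type": "keyword"
     }
   },
   "properties": {
     "not_runtime_field": {
       "type": "text"
     }
   }
 }
}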

Now every new field in the documents you ingest will be set up as a runtime field. These fields add nothing to the index on disk (their values are read from _source at query time), yet they are fully functional for queries!
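
To see this in action, we could index a document that contains fields missing from the mappings and then inspect the result. The field names new_label and attempts below are made up for the example:

```
PUT runtime_index/_doc/1?refresh
{
  "not_runtime_field": "this one is indexed as text",
  "new_label": "trail",
  "attempts": 3
}

GET runtime_index/_mapping

GET runtime_index/_search
{
  "query": {
    "term": {
      "new_label": "trail"
    }
  }
}
```

With dynamic set to runtime, the mapping should now list new_label as a runtime keyword field and attempts as a runtime long field, while not_runtime_field stays under properties. The term query should match even though new_label was never indexed.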

Notes

Runtime fields are an excellent option for ephemeral operations, experiments, and mocking up schemas on top of your data in a very flexible way, without reindexing.

It’s up to you to strike a balance between resource usage and query performance. After experimenting with your fields, decide which ones to keep as runtime fields and which ones to index.


