Elasticsearch Strings – Keyword VS Text VS Wildcard

Average Read Time

2 Mins

Opster Expert Team - Saskia

May-2022

Opster Team

October 2021

When to use keyword, wildcard or text field types in Elasticsearch

Overview

String values in Elasticsearch can be mapped in several ways. The keyword, wildcard and text field types have different features and suit different use cases. Below is an explanation of the differences between them and of when to use each type for your string fields.

Text vs. Keyword

By default, recent versions of Elasticsearch map every dynamically detected string field as a text field with a keyword sub-field, so the value is indexed both ways.

The difference between text and keyword

In early Elasticsearch versions there was a single field type called “string”, which was used to enable full-text search. Values of these fields go through an analysis pipeline that performs operations such as lowercasing, removing punctuation, splitting the text into single tokens and, depending on the analyzer, filtering out stopwords.

This process works well for searching larger documents, but it isn’t always the ideal behavior. When you want to filter by exact values or list them all using aggregations, you need a field whose input does not go through an analysis pipeline, so that the value stays not analyzed.
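
The contrast between an analyzed and a not-analyzed value can be illustrated with a toy sketch in Python. This is not Elasticsearch's analyzer: the functions `toy_analyze` and `toy_not_analyzed`, the regular expression and the tiny stopword list are assumptions made up for this illustration, loosely imitating the steps described above.

```python
import re

# Hypothetical, tiny stopword list used only for this illustration.
STOPWORDS = {"the", "a", "an", "is", "in", "of"}


def toy_analyze(value: str) -> list:
    """Imitate a simple analysis pipeline: lowercase, drop punctuation,
    split into tokens, and filter out stopwords."""
    lowered = value.lower()
    tokens = re.findall(r"[a-z0-9]+", lowered)  # keeps ASCII letters and digits only
    return [token for token in tokens if token not in STOPWORDS]


def toy_not_analyzed(value: str) -> list:
    """A not-analyzed field keeps the whole input as one single term."""
    return [value]


document = "The Quick-Brown Fox is in the Garden!"
print(toy_analyze(document))       # ['quick', 'brown', 'fox', 'garden']
print(toy_not_analyzed(document))  # ['The Quick-Brown Fox is in the Garden!']
```

A search for "fox" can match one of the analyzed tokens, while the not-analyzed term only matches a query for the complete original string.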

So if you wanted to use a field for exact filtering or terms aggregations, you had to configure a field of type “string” with “index” : “not_analyzed”, for example as a sub-field:

"old_string_field" : {
  "type" : "string",
  "fields" : {
    "keyword" : {
      "type" : "string",
      "index" : "not_analyzed"
    }
  }
}

This was exactly how you could differentiate between text and keyword behavior. Since this was not very intuitive for users unfamiliar with information retrieval, two new types were created: text and keyword.

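As of Elasticsearch version 5, the default dynamic mapping for string values is a text field with a keyword sub-field. The ignore_above setting means that values longer than 256 characters are not indexed in the keyword sub-field:

"new_string_field" : {
  "type" : "text",
  "fields" : {
    "keyword" : {
      "type" : "keyword",
      "ignore_above" : 256
    }
  }
}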

So the differences are: 

  • Text is analyzed into tokens and supports full-text search, so a query can match individual words or phrases within the value.
  • Keyword is indexed as is, without any modification. It’s ideal for terms aggregations and for filtering on exact values.
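
In query terms, this difference usually shows up as a match query against the text field and a term query against the keyword sub-field. The following is a minimal sketch in Python that only builds the two request bodies. The sample values are made up, and sending the bodies to a cluster is left out.

```python
import json

# Field names follow the default mapping shown above.
TEXT_FIELD = "new_string_field"
KEYWORD_FIELD = "new_string_field.keyword"

# Full-text search: the query string is analyzed like the indexed text,
# so a document containing "Quick brown foxes!" can match "quick".
match_body = {"query": {"match": {TEXT_FIELD: "quick"}}}

# Exact filtering: the value must equal the stored keyword as a whole,
# including case and punctuation.
term_body = {"query": {"term": {KEYWORD_FIELD: "Quick brown foxes!"}}}

for body in (match_body, term_body):
    print(json.dumps(body, indent=2))
```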

Keyword vs. Wildcard

When you’re planning to run many wildcard queries you should use the wildcard type. It works well for machine-generated content like log messages that you would typically grep through in the terminal. 

Performance is usually poor if you run wildcard queries on regular text or keyword fields, especially with patterns that start with a wildcard. If you already know your users will run wildcard queries, use the wildcard field type to maintain cluster stability.

The wildcard type was introduced in Elasticsearch version 7.9. 
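
As a rough illustration, the body of a wildcard query can be built like this in Python. The helper `wildcard_query`, the field name `log_line` and the pattern are hypothetical. In a wildcard pattern, `*` matches any sequence of characters (including an empty one) and `?` matches any single character.

```python
import json


def wildcard_query(field: str, pattern: str) -> dict:
    """Build the body of a search request that uses the wildcard query."""
    return {"query": {"wildcard": {field: {"value": pattern}}}}


# Hypothetical field mapped as type "wildcard". The leading "*" makes this
# the grep-like kind of search that the wildcard type is meant for.
body = wildcard_query("log_line", "*connection re?used*")
print(json.dumps(body, indent=2))
```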

Text vs. Match Only Text

The “match_only_text” type is very similar to “text”, but it saves disk space by sacrificing granular scoring: it supports the same full-text queries, but every matching document gets the same score.

Code samples

Create a multi-field mapping to enable all string types on the field message:

PUT string-types
{
  "mappings": {
    "properties": {
      "message": {
        "type": "text",
        "analyzer": "standard",
        "fields": {
          "keyword": {
            "type": "keyword"
          },
          "wildcard_field": {
            "type": "wildcard"
          }
        }
      }
    }
  }
}
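
In queries, each sub-field of a multi-field is addressed by its full path, for example message.keyword. The following Python sketch lists the paths that result from the mapping in the request above. `field_paths` is a hypothetical helper written for this illustration, not part of any Elasticsearch client, and it only follows multi-fields, not nested object properties.

```python
def field_paths(properties: dict, prefix: str = "") -> dict:
    """Return {field path: field type} for fields and their multi-field sub-fields."""
    paths = {}
    for name, definition in properties.items():
        path = f"{prefix}{name}"
        if "type" in definition:
            paths[path] = definition["type"]
        # Multi-fields live under "fields" and are addressed as "parent.child".
        paths.update(field_paths(definition.get("fields", {}), f"{path}."))
    return paths


# The same mapping as in the request above, written as a Python dictionary.
mapping = {
    "properties": {
        "message": {
            "type": "text",
            "analyzer": "standard",
            "fields": {
                "keyword": {"type": "keyword"},
                "wildcard_field": {"type": "wildcard"},
            },
        }
    }
}

for path, field_type in field_paths(mapping["properties"]).items():
    print(f"{path}: {field_type}")
# message: text
# message.keyword: keyword
# message.wildcard_field: wildcard
```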

Which Elasticsearch string type should I use?

Use the text field if:

  • You’re planning to perform regular full-text search, for example searching for a specific word or phrase
  • The content is regular written text that a person could easily read

Use the keyword type if: 

  • You’re planning to filter exact values 
  • You’re planning to filter on prefix character sequences
  • You’re planning to perform terms aggregations, for example for faceted navigation on a website
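
The keyword use cases from this list can be sketched as request bodies. The Python snippet below only builds the bodies and assumes the multi-field mapping from the code sample above. The value "ERROR" and the aggregation name `top_messages` are made up for illustration.

```python
import json

# Assumes the multi-field mapping from the code sample above, where
# "message.keyword" holds the unmodified string.
KEYWORD_FIELD = "message.keyword"

# Prefix filtering: matches values that start with the given characters.
prefix_body = {"query": {"prefix": {KEYWORD_FIELD: {"value": "ERROR"}}}}

# Terms aggregation: counts the most frequent exact values, for example to
# build facets. "size": 0 skips the hits and returns only the buckets.
terms_agg_body = {
    "size": 0,
    "aggs": {
        "top_messages": {  # hypothetical aggregation name
            "terms": {"field": KEYWORD_FIELD, "size": 10}
        }
    },
}

for body in (prefix_body, terms_agg_body):
    print(json.dumps(body, indent=2))
```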

Use the wildcard type if:

  • You’re trying to find a needle in poorly tokenized or machine-generated text
  • You do not intend to use queries that rely on word positions

Use match_only_text if:

  • You intend to run full-text search but granular scores are not very important to you

