By Opster Team - May 2023

Updated: Jun 19, 2023

Elasticsearch Shingles Example: Boosting Relevance with N-Grams

Introduction

What are shingles in Elasticsearch?

Shingles, also known as word N-grams, are overlapping groups of consecutive words produced when text is analyzed. Indexing them lets Elasticsearch match on word sequences rather than on individual terms alone, so documents that contain the query words in the same order can be ranked above documents that merely contain the same words.
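To make the idea concrete, here is a minimal Python sketch of word shingling. It is an illustration only, not how Elasticsearch implements the filter: the function name `word_shingles` is hypothetical, and splitting on whitespace is a simplifying assumption that stands in for a real tokenizer.

```python
def word_shingles(text, size=2):
    """Return overlapping groups of `size` consecutive words from `text`."""
    # Naive lowercasing and whitespace split, used here only for illustration.
    words = text.lower().split()
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]


print(word_shingles("quick brown fox jumps"))
# ['quick brown', 'brown fox', 'fox jumps']
```

Each shingle shares a word with its neighbor, which is what preserves word order in the index.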

In this article, we will explore how to use shingles in Elasticsearch with a practical example.

Creating a Custom Analyzer with Shingles

To use shingles, we need to create a custom analyzer that includes a shingle token filter. The following example demonstrates how to create an index with a custom analyzer that generates 2-word shingles:

PUT /shingle_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "shingle_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "shingle_filter"
          ]
        }
      },
      "filter": {
        "shingle_filter": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 2,
          "output_unigrams": false
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "analyzer": "shingle_analyzer"
      }
    }
  }
}

In this example, we define a custom analyzer called “shingle_analyzer” that uses the standard tokenizer and includes a lowercase filter and a custom shingle filter. The shingle filter is configured to generate 2-word shingles and not output unigrams.
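The effect of these three settings can be approximated in a few lines of Python. This is a rough sketch under stated assumptions, not the actual filter: the function `shingle_filter` and its parameter names are hypothetical stand-ins that mirror `min_shingle_size`, `max_shingle_size`, and `output_unigrams`, and the input is assumed to be already tokenized and lowercased.

```python
def shingle_filter(tokens, min_size=2, max_size=2, output_unigrams=False):
    """Approximate a shingle filter over an already tokenized, lowercased stream."""
    out = []
    for start in range(len(tokens)):
        if output_unigrams:
            # Keep the single word alongside the shingles that start at it.
            out.append(tokens[start])
        for size in range(min_size, max_size + 1):
            if start + size <= len(tokens):
                out.append(" ".join(tokens[start:start + size]))
    return out


tokens = "the quick brown fox".split()

print(shingle_filter(tokens))
# ['the quick', 'quick brown', 'brown fox']

print(shingle_filter(tokens, output_unigrams=True))
# ['the', 'the quick', 'quick', 'quick brown', 'brown', 'brown fox', 'fox']
```

With unigrams switched off, as in the index above, only word pairs are emitted, so a match requires two adjacent words to line up.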

Indexing Documents with Shingles

Now that we have created an index with a custom shingle analyzer, let’s index some sample documents:

POST /shingle_example/_doc
{
  "text": "The quick brown fox jumps over the lazy dog"
}

POST /shingle_example/_doc
{
  "text": "The quick brown dog jumps over the lazy fox"
}

It’s easy to visualize the shingles that have been generated by running the following command:

POST /shingle_example/_analyze
{
  "analyzer": "shingle_analyzer",
  "text": "The quick brown fox jumps over the lazy dog"
}

Searching with Shingles

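To search for documents using shingles, we can use the match query. Because the text field is mapped with shingle_analyzer, that analyzer is applied to the query string by default; the analyzer parameter below simply makes this explicit: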

GET /shingle_example/_search
{
  "query": {
    "match": {
      "text": {
        "query": "quick brown fox",
        "analyzer": "shingle_analyzer"
      }
    }
  }
}

The search results will rank the first document above the second. The query is analyzed into the shingles “quick brown” and “brown fox”: the first document contains both, while the second contains only “quick brown”. Without shingles, both documents would receive essentially the same relevance score, as they contain the same individual words.
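The difference can be checked by hand with a small Python sketch. It only counts shared terms and is not Elasticsearch's scoring: the helper names `words`, `bigrams`, and `overlap` are hypothetical, and whitespace splitting again stands in for the standard tokenizer.

```python
def words(text):
    # Naive lowercasing and whitespace split, for illustration only.
    return text.lower().split()


def bigrams(text):
    w = words(text)
    return {" ".join(w[i:i + 2]) for i in range(len(w) - 1)}


def overlap(query, doc):
    """Return (number of shared single words, sorted list of shared 2-word shingles)."""
    shared_words = set(words(query)) & set(words(doc))
    shared_shingles = bigrams(query) & bigrams(doc)
    return len(shared_words), sorted(shared_shingles)


query = "quick brown fox"
doc1 = "The quick brown fox jumps over the lazy dog"
doc2 = "The quick brown dog jumps over the lazy fox"

print(overlap(query, doc1))  # (3, ['brown fox', 'quick brown'])
print(overlap(query, doc2))  # (3, ['quick brown'])
```

Both documents share all three query words, but only the first shares both shingles, which is why it comes out ahead when shingles are indexed.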

Conclusion

Using shingles in Elasticsearch can help improve the relevance of search results by considering the order of words and matching phrases more accurately. By creating a custom analyzer with a shingle token filter, you can easily implement this technique in your Elasticsearch setup. Experiment with different shingle sizes and configurations to find the best approach for your specific use case.
