How to Handle Circuit Breakers in Elasticsearch

Opster Team

Nov 2020


What are circuit breakers?

As explained in Opster’s Elasticsearch Memory Usage Guide, 50% of memory on an Elasticsearch node is generally used for the JVM (Java Virtual Machine) heap, while the other half of the memory is used for other requirements such as cache.

In order to prevent “Out of Memory” (OOM) errors, Elasticsearch implements circuit breakers. If a certain request could cause errors in the node because of memory issues, Elasticsearch will throw a “CircuitBreakerException” and reject the request rather than risk crashing the entire node.

A circuit breaker exception is usually a symptom: it alerts us that something else needs to be fixed to reduce memory usage. Circuit breakers generally come with sensible defaults. Simply increasing the circuit breaker limit is likely to increase the risk that your node crashes due to an OutOfMemoryError.

If you get a circuit breaking exception, you should check what type of circuit breaker it is, and then look at your monitoring data and Elasticsearch logs to diagnose what caused it. Remember that the event or query that appears in the log may just be the “straw that broke the camel’s back”. There may be other causes of high memory usage, and the event in the log is just the very last one which pushed Elasticsearch over the limit. Possible causes are discussed in each section below.
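As a rough illustration of checking what type of circuit breaker tripped, the sketch below pulls the breaker name out of an error body. The sample body is illustrative only: its field names and the "[breaker] Data too large" reason format are assumptions based on how these errors commonly look, and describe_breaker_error is a hypothetical helper, not part of any client library.

```python
import re

# Illustrative error body (assumed shape, not copied from a real cluster).
SAMPLE_ERROR = {
    "error": {
        "type": "circuit_breaking_exception",
        "reason": "[parent] Data too large, data for [<http_request>] would be "
                  "[123848638/118.1mb], which is larger than the limit of "
                  "[123273216/117.5mb]",
        "bytes_wanted": 123848638,
        "bytes_limit": 123273216,
    },
    "status": 429,
}


def describe_breaker_error(body):
    """Return (breaker_name, bytes_wanted, bytes_limit), or None if the
    body does not describe a circuit breaker exception."""
    error = body.get("error")
    if not isinstance(error, dict):
        return None
    if error.get("type") != "circuit_breaking_exception":
        return None
    # The breaker name is assumed to be the first bracketed word of the reason.
    match = re.match(r"\[(\w+)\]", error.get("reason", ""))
    name = match.group(1) if match else "unknown"
    return name, error.get("bytes_wanted"), error.get("bytes_limit")


print(describe_breaker_error(SAMPLE_ERROR))
```

Knowing the breaker name tells you which of the sections below to read first.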

Finding out your current circuit breaker status

Get your current settings

GET /_cluster/settings?include_defaults=true

Find out your current memory usage and breakers

GET _nodes/stats/breaker

This returns output like the following for each node:

"breakers" : {
  "request" : {
    "limit_size_in_bytes" : 20574004838,
    "limit_size" : "19.1gb",
    "estimated_size_in_bytes" : 0,
    "estimated_size" : "0b",
    "overhead" : 1.0,
    "tripped" : 0
  },
  "fielddata" : {
    "limit_size_in_bytes" : 13716003225,
    "limit_size" : "12.7gb",
    "estimated_size_in_bytes" : 0,
    "estimated_size" : "0b",
    "overhead" : 1.03,
    "tripped" : 0
  },
  "in_flight_requests" : {
    "limit_size_in_bytes" : 34290008064,
    "limit_size" : "31.9gb",
    "estimated_size_in_bytes" : 6254164,
    "estimated_size" : "5.9mb",
    "overhead" : 2.0,
    "tripped" : 0
  },
  "accounting" : {
    "limit_size_in_bytes" : 34290008064,
    "limit_size" : "31.9gb",
    "estimated_size_in_bytes" : 282771278,
    "estimated_size" : "269.6mb",
    "overhead" : 1.0,
    "tripped" : 0
  },
  "parent" : {
    "limit_size_in_bytes" : 32575507660,
    "limit_size" : "30.3gb",
    "estimated_size_in_bytes" : 13431618584,
    "estimated_size" : "12.5gb",
    "overhead" : 1.0,
    "tripped" : 0
  }
}
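To make this output easier to scan, the following sketch ranks the breakers of one node by how close each is to its limit. It assumes you have already parsed the response and extracted the "breakers" object for a single node; summarize_breakers is a hypothetical helper, and the sample values are copied from the output above.

```python
def summarize_breakers(node_breakers):
    """Return (name, used_fraction, tripped) rows, highest usage first."""
    rows = []
    for name, stats in node_breakers.items():
        limit = stats["limit_size_in_bytes"]
        used = stats["estimated_size_in_bytes"]
        # Guard against a zero or negative limit (breaker disabled).
        fraction = used / limit if limit > 0 else 0.0
        rows.append((name, round(fraction, 3), stats["tripped"]))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Values taken from the sample output in this guide.
sample = {
    "request": {"limit_size_in_bytes": 20574004838,
                "estimated_size_in_bytes": 0, "tripped": 0},
    "fielddata": {"limit_size_in_bytes": 13716003225,
                  "estimated_size_in_bytes": 0, "tripped": 0},
    "in_flight_requests": {"limit_size_in_bytes": 34290008064,
                           "estimated_size_in_bytes": 6254164, "tripped": 0},
    "accounting": {"limit_size_in_bytes": 34290008064,
                   "estimated_size_in_bytes": 282771278, "tripped": 0},
    "parent": {"limit_size_in_bytes": 32575507660,
               "estimated_size_in_bytes": 13431618584, "tripped": 0},
}

for name, fraction, tripped in summarize_breakers(sample):
    print(f"{name:20s} {fraction:6.1%} of limit, tripped {tripped} times")
```

A non-zero "tripped" count or a breaker that sits persistently close to its limit is the one to investigate first.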

Fielddata circuit breaker

indices.breaker.fielddata.limit  (default=40% JVM heap)

indices.breaker.fielddata.overhead (default=1.03)

The limit is set as a proportion of the JVM heap set in jvm.options, while the “overhead” setting is a fixed ratio which Elasticsearch uses to multiply the theoretical calculations to estimate the circuit breaker memory requirement.
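The arithmetic behind the limit and the overhead can be sketched as follows. This is a simplified model rather than the actual Elasticsearch implementation: limit_bytes and would_trip are hypothetical helpers, and the heap size is inferred from the sample output earlier in this guide, on the assumption that the 100% limits shown there equal the heap size.

```python
def limit_bytes(heap_bytes, percent):
    """Breaker limit as a whole-number percentage of the JVM heap."""
    return heap_bytes * percent // 100


def would_trip(estimated_bytes, overhead, limit):
    """Simplified check: the estimate is multiplied by the overhead
    before being compared with the limit."""
    return int(estimated_bytes * overhead) > limit


HEAP_BYTES = 34290008064  # assumed heap size, inferred from the sample stats

fielddata_limit = limit_bytes(HEAP_BYTES, 40)
print(fielddata_limit)  # matches the fielddata limit in the sample stats

# With an overhead of 1.03, an estimate just below the limit can still trip.
print(would_trip(13_500_000_000, 1.03, fielddata_limit))
print(would_trip(13_000_000_000, 1.03, fielddata_limit))
```

The point of the overhead is that the breaker trips slightly before the raw estimate reaches the limit, leaving a margin for estimation error.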

The fielddata circuit breaker limits the total amount of memory used by fielddata in your indices. Fielddata is disabled by default on text fields, but it may be in use where you have enabled it in one of your mappings:

"fielddata": true

In general it is recommended to avoid this setting, because loading the individual terms of a text field into memory requires a large amount of heap. If possible, change your mappings to set it to false, and use keyword type mappings rather than text type for aggregations and sorting.

However, if this is not possible and you need to aggregate based on individual terms in a text rather than keywords, then you could also consider setting a fielddata frequency filter on the mapping to limit the amount of fielddata put into memory.

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "need_to_aggregate_individual_terms_on_this_field": {
        "type": "text",
        "fielddata": true,
        "fielddata_frequency_filter": {
          "min": 0.001,
          "max": 0.1,
          "min_segment_size": 500
        }
      }
    }
  }
}

Request circuit breaker

indices.breaker.request.limit (default=60% JVM heap)

indices.breaker.request.overhead (default=1)

The limit is set as a proportion of the JVM heap set in jvm.options, while the “overhead” setting is a fixed ratio which Elasticsearch uses to multiply the theoretical calculations to estimate the circuit breaker memory requirement.

The request circuit breaker takes into account the memory required based on the request structures, in particular aggregations. The most common cause of exceeding this circuit breaker is through the use of aggregations with a large size value. Try reducing the value of “size” in your aggregations.

Inflight requests circuit breaker

network.breaker.inflight_requests.limit (default=100% JVM heap)

network.breaker.inflight_requests.overhead (default=2)

The limit is set as a proportion of the JVM heap set in jvm.options, while the “overhead” setting is a fixed ratio which Elasticsearch uses to multiply the theoretical calculations to estimate the circuit breaker memory requirement.

The in-flight requests circuit breaker considers the size of active transport and HTTP requests on the node, based on the byte size of those requests. Generally this circuit breaker is tripped when batch sizes for bulk requests are too large. Try reducing the size of bulk requests, particularly if those requests contain large documents.
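One way to keep bulk requests small is to cap each request by byte size rather than by document count. The sketch below is a minimal, client-agnostic illustration: chunk_bulk_bodies is a hypothetical helper, and the 200-byte cap is chosen only to make the example visible (a realistic cap would be far larger and depends on your heap and document sizes).

```python
import json


def chunk_bulk_bodies(actions, max_bytes):
    """Yield newline-delimited bulk bodies whose encoded size stays at or
    below max_bytes.

    actions is an iterable of (action_metadata, source) dict pairs.
    A single pair larger than max_bytes is emitted on its own.
    """
    batch, batch_size = [], 0
    for meta, source in actions:
        entry = json.dumps(meta) + "\n" + json.dumps(source) + "\n"
        entry_size = len(entry.encode("utf-8"))
        # Flush the current batch before it would exceed the cap.
        if batch and batch_size + entry_size > max_bytes:
            yield "".join(batch)
            batch, batch_size = [], 0
        batch.append(entry)
        batch_size += entry_size
    if batch:
        yield "".join(batch)


docs = [({"index": {"_index": "my-index-000001"}}, {"n": i}) for i in range(10)]
for body in chunk_bulk_bodies(docs, max_bytes=200):
    print(len(body.encode("utf-8")), "bytes,", body.count("\n") // 2, "documents")
```

Capping by bytes handles the case mentioned above, where a few large documents make an otherwise modest batch too big.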

Script compilation circuit breaker

script.context.$CONTEXT.max_compilations_rate (default=75/5m)

The script compilation circuit breaker is slightly different from the others. Rather than applying a memory limit, it limits the number of script compilations allowed in a given period. If you get this warning, you should use stored scripts with parameters instead of inline scripts with hard-coded values: a parameterized script is compiled once and reused, whereas an inline script must be compiled again every time its source text changes.
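The difference is easiest to see by comparing the script source that each approach sends. In this sketch the field name price, the factor parameter, and both helper functions are hypothetical; it assumes compiled scripts are reused when the source text is identical, so the number of distinct sources approximates the number of compilations.

```python
def hardcoded_script(factor):
    """Anti-pattern: the value is baked into the source, so every distinct
    factor produces a different script to compile."""
    return {"script": {"lang": "painless",
                       "source": f"doc['price'].value * {factor}"}}


def parameterized_script(factor):
    """The source stays constant; only the params change between requests."""
    return {"script": {"lang": "painless",
                       "source": "doc['price'].value * params.factor",
                       "params": {"factor": factor}}}


factors = [1.1, 1.2, 1.3]
hardcoded_sources = {hardcoded_script(f)["script"]["source"] for f in factors}
param_sources = {parameterized_script(f)["script"]["source"] for f in factors}

print(len(hardcoded_sources), "distinct sources with hard-coded values")
print(len(param_sources), "distinct source with params")
```

A stored script goes one step further by referring to the script by id, but the same principle applies: keep changing values in params, not in the source.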

Parent circuit breakers

indices.breaker.total.use_real_memory (default=true)

indices.breaker.total.limit (default=95% JVM heap)

Parent circuit breaker exceptions are caused by the total memory in use across all the different types of circuit breakers. If use_real_memory is left at its default of true, the parent circuit breaker takes real memory usage into account and its limit defaults to 95% of the JVM heap size. In general it is better to base this circuit breaker on real memory usage, since that gives a more accurate picture of what is going on in the instance. If you set use_real_memory to false, usage is instead measured as the sum of the estimates of the other circuit breakers, and the default limit is reduced to 70% of the JVM heap size to allow for the margin of error in using a sum of estimates.
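The two modes can be modelled roughly as follows. This is a deliberately simplified sketch, not the Elasticsearch implementation: both functions are hypothetical, and the heap size is the one implied by the sample stats earlier in this guide.

```python
def default_parent_limit(heap_bytes, use_real_memory=True):
    """Default parent limit: 95% of heap with real memory tracking,
    70% of heap when relying on a sum of estimates."""
    percent = 95 if use_real_memory else 70
    return heap_bytes * percent // 100


def parent_would_trip(limit, use_real_memory, real_heap_used=0, child_estimates=()):
    """Compare either measured heap usage or the summed child estimates
    against the parent limit."""
    measured = real_heap_used if use_real_memory else sum(child_estimates)
    return measured > limit


HEAP_BYTES = 34290008064  # assumed heap size, inferred from the sample stats

print(default_parent_limit(HEAP_BYTES))         # real memory mode
print(default_parent_limit(HEAP_BYTES, False))  # sum-of-estimates mode
```

With real memory tracking, the parent breaker can trip even when every child breaker reports a low estimate, because heap used by anything else on the node also counts.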

Accounting circuit breakers

indices.breaker.accounting.limit (default=100% JVM heap)

indices.breaker.accounting.overhead (default=1)

This circuit breaker protects the node from excessive memory use by things that persist in memory after a request has completed, such as Lucene segment memory. However, the default limit is set at 100% of the JVM heap, so the parent circuit breaker will trip before this limit becomes effective. The accounting overhead setting is a coefficient by which all estimates are multiplied before the limit is applied.

Adjusting circuit breakers

In general, and as warned above, it is usually not advisable to modify circuit breakers from their defaults, since it is far worse to lose a node to an OutOfMemoryError than to drop a few requests. Instead, you should try to understand why you are exceeding the limits and prevent this from happening. Also bear in mind that the default calculations are based on your JVM heap size, which is generally assumed to be 50% of the total available memory. If this is not the case, you may want to reconsider the heap settings in jvm.options before reconfiguring anything else. However, if you still think you need to modify the circuit breakers (or restore the defaults), you can adjust circuit breaker settings just like any other cluster setting:

PUT _cluster/settings
{
  "transient": {"indices.breaker.total.limit":"5GB" }
}

Or, to restore the setting to its default:

PUT _cluster/settings
{
  "transient": {"indices.breaker.total.limit":null }
}
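If you apply these settings from a script rather than a console, the same requests can be built with the Python standard library. This is a sketch under assumptions: the base URL http://localhost:9200 is a placeholder for an unsecured local node, breaker_settings_request is a hypothetical helper, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def breaker_settings_request(base_url, setting, value, persistent=False):
    """Build a PUT _cluster/settings request; a value of None is sent as
    JSON null, which restores the setting to its default."""
    scope = "persistent" if persistent else "transient"
    body = json.dumps({scope: {setting: value}}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/_cluster/settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Placeholder URL; adjust for your own cluster and authentication.
req = breaker_settings_request(
    "http://localhost:9200", "indices.breaker.total.limit", None
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# urllib.request.urlopen(req) would send it; omitted to avoid side effects.
```

Keeping the change transient, as in the examples above, means it does not survive a full cluster restart, which is a reasonable safety net while you are still diagnosing the root cause.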

How to avoid circuit breaker exceptions

To avoid your system throwing circuit breaker exceptions, you can run the Elasticsearch Health Check-Up. It’ll detect issues in your system and provide you with actionable instructions to resolve them, so you won’t reach the point of memory utilization that causes the exception to occur.

If you’re having trouble with circuit breaker exceptions, run the Check-Up now for an accurate analysis of your settings and follow the instructions to ensure your operations continue running smoothly.
