Search Gateway Configuration

This page describes the configuration of Opster’s Search Gateway. It shows an example configuration, which is customized for each individual user, and explains the different terms and parameters.

The default configuration

Default.conf:
{
  opster.mclb: {
    port: 9200,
    healthChecker: {
      threadPool: 1,
      periodInMilliSeconds: 1000
    }
    route: {
      userParameter: "X-User-Id"
    }

    backends: [
      {
        id: 1,
        url: "http://localhost:9200",
        //authInfo: {
        //  type: "BASIC"
        //  credentials: {
        //    user: "shak"
        //    password: "shaked"
        //  }
        //}
        default: true
      }
    ],

  "searchGateway": {
    "heavySearchCostThreshold": 1000,
    "slowSearchTimeInMilliThreshold": 1000,
    "features": {
      "regex": {
        "cost": 100,
        "factors": {
          "HEAVY": 3,
          "MEDIUM": 2,
          "LIGHT": 1
        },
        "classifier": {
          "type": "string",
          "config": {
            "contains": {
              "cost": 10,
              "values": [
                "*",
                ".*"
              ]
            },
            "startsWith": {
              "cost": 100,
              "values": [
                ".*",
                "*"
              ]
            },
            "pattern": {
              "cost": 100,
              "values": [
                ":\\s?\\*[^\\s\"]"
              ]
            }
          },
          "thresholds": {
            "HEAVY": {
              "gte": 100
            },
            "MEDIUM": {
              "gt": 10,
              "lt": 100
            },
            "LIGHT": {
              "lte": 10
            }
          }
        }
      },
      "data": {
        "cost": 1,
        "factors": {
          "HEAVY": 3,
          "MEDIUM": 2,
          "LIGHT": 1
        },
        "classifier": {
          "type": "math",
          "config": {
            "expression": "{{size}}"
          },
          "thresholds": {
            "HEAVY": {
              "gt": 1000
            },
            "MEDIUM": {
              "gt": 100,
              "lte": 1000
            },
            "LIGHT": {
              "lte": 100
            }
          }
        }
      },
      "range": {
        "cost": 1,
        "factors": {
          "HEAVY": 1001,
          "MEDIUM": 500,
          "LIGHT": 1
        },
        "classifier": {
          "type": "math",
          "config": {
            "expression": "{{duration}}"
          },
          "thresholds": {
            "HEAVY": {
              "gte": 86400000
            },
            "MEDIUM": {
              "gt": 21600000,
              "lt": 86400000
            },
            "LIGHT": {
              "lte": 21600000
            }
          }
        }
      },
      "aggregation": {
        "cost": 100,
        "factors": {
          "HEAVY": 3,
          "MEDIUM": 2,
          "LIGHT": 1
        },
        "classifier": {
          "type": "math",
          "config": {
            "expression": "({{level}} * {{size}}) * (1 * (1 + {{hasScripts}}))"
          },
          "thresholds": {
            "HEAVY": {
              "gt": 100000
            },
            "MEDIUM": {
              "gt": 1000,
              "lte": 100000
            },
            "LIGHT": {
              "lte": 1000
            }
          }
        }
      },
      "script": {
        "cost": 100,
        "factors": {
          "HEAVY": 3
        },
        "classifier": {
          "type": "math",
          "config": {
            "expression": "{{hasScript}}"
          },
          "thresholds": {
            "HEAVY": {
              "gte": 1
            }
          }
        }
      }
    }
  }
  }
}

Explaining the default configuration

The table below breaks down each parameter of the configuration. Parameters marked with an asterisk (*) are customized per user.

Param | Mandatory/Optional | Type | Explanation
opster.mclb.port * | Mandatory | int | App listening port
opster.mclb.healthChecker.threadPool | Mandatory | int | The number of threads used to monitor the backend clusters’ health. Usually, no more than 1 is needed
opster.mclb.healthChecker.periodInMilliSeconds | Mandatory | int | Health check interval, in milliseconds
opster.mclb.route.userParameter * | Mandatory | String | A header name, used to tag user params
opster.mclb.cacheConfiguration.expensiveQueriesCacheSize | Optional | int | Number of expensive queries to cache. Default value: 1
opster.mclb.cacheConfiguration.slowQueriesCacheSize | Optional | int | Number of slow queries to cache. Default value: 1
opster.mclb.cacheConfiguration.cacheFetchFromTimeInHours | Optional | int | Cache loads from time to time. Default value: Integer.MAX_VALUE
opster.mclb.cacheConfiguration.cacheBulkFetchSize | Optional | int | Cache query size param. Default value: 0. When left at the default, no cache loading will occur
opster.mclb.cacheConfiguration.maxFetching | Optional | int | Scrolling amount (how many times to scroll to get more cache results). Default value: 0. When left at the default, no cache loading will occur
opster.mclb.backends.id | Mandatory | int | Unique number. Indicates the ID of the user. There must be at least 1 backend with ID 1 that is set as default
opster.mclb.backends.url * | Mandatory | String | Full Elasticsearch URL. For example, http://localhost:9200
opster.mclb.backends.authInfo * | Optional | Object | This section represents Elasticsearch authentication
opster.mclb.backends.authInfo.type * | Mandatory | enum | Available params: BASIC. BASIC represents basic HTTP authentication
opster.mclb.backends.authInfo.credentials.user * | Mandatory | String | Username for authentication
opster.mclb.backends.authInfo.credentials.password * | Mandatory | String | Password for authentication
opster.mclb.tenants | Optional | Object | This section represents the available tenants. If not set, all requests will be routed to the default backend
opster.mclb.tenants. * | Mandatory | Object | Dynamic names for the tenants
opster.mclb.tenants..patterns * | Mandatory | String | Regex pattern to identify index patterns to classify per tenant
opster.mclb.tenants..leader | Mandatory | int | Backend ID to route to by default
opster.mclb.tenants..follower | Optional | int | Backend ID to route to in case the default is not available
opster.mclb.errorHandling | Optional | Object | This section represents error handling functionality. Default is no error handling
opster.mclb.errorHandling.enabled | Mandatory | Boolean | Default is false. When left at the default, there is no error handling
opster.mclb.errorHandling.kafka | Mandatory | Object | This section represents error handling by Kafka
opster.mclb.errorHandling.kafka.bootstrapServers * | Mandatory | String | Kafka server URL
opster.mclb.errorHandling.kafka.groupId | Mandatory | String | Kafka group ID to identify with
opster.mclb.errorHandling.kafka.topic | Mandatory | String | Kafka topic prefix. Topics will be created if necessary with the given prefix and backend IDs
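
The tenant parameters (patterns, leader, follower) describe a routing decision, which the following Python sketch tries to illustrate. It is not the Search Gateway's implementation. The tenant name "logs", the second backend, the function pick_backend and the healthy_ids argument are all hypothetical. The sketch also assumes that the pattern must match the whole index name and that an index matching no tenant goes to the default backend.

```python
import re

# Backends as in the default configuration, plus a hypothetical second cluster.
BACKENDS = [
    {"id": 1, "url": "http://localhost:9200", "default": True},
    {"id": 2, "url": "http://localhost:9201"},  # hypothetical
]

# One hypothetical tenant; the keys mirror the parameter table.
TENANTS = {
    "logs": {"patterns": r"logs-.*", "leader": 2, "follower": 1},
}


def pick_backend(index_name, tenants, backends, healthy_ids):
    """Return the backend ID to route to, or None if no candidate is healthy."""
    default_id = next(b["id"] for b in backends if b.get("default"))
    for tenant in (tenants or {}).values():
        # Assumption: the regex has to match the whole index name.
        if re.fullmatch(tenant["patterns"], index_name):
            candidates = [tenant["leader"], tenant.get("follower")]
            return next((c for c in candidates if c in healthy_ids), None)
    # No tenants configured, or no tenant matched: use the default backend.
    return default_id
```

For instance, pick_backend("logs-2024", TENANTS, BACKENDS, {1}) falls back to the follower (backend 1) because the leader (backend 2) is not in the healthy set.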

Explaining the log configuration 

Logback.xml

    <appender name="elasticsearch" class="com.opster.mclb.infrastructure.logs.ElasticSearchAppender">
        <protocol>http</protocol>
        <!-- The host of the destination Elasticsearch for logs. -->
        <host>localhost</host>
        <!-- The port of the destination Elasticsearch for logs. -->
        <port>9200</port>
        <!-- The index of the destination Elasticsearch for logs. -->
        <index>opster-sg</index>
        <!-- The log bulk size. -->
        <batchSize>10</batchSize>
        <!-- The timeout for bulk insert, in milliseconds. -->
        <batchTimeoutInMilliseconds>5000</batchTimeoutInMilliseconds>
        <!-- The number of bulk insert retries before the log is thrown away. -->
        <retry>3</retry>
        <!-- Optional: if basic authentication is needed, set the username here. -->
        <username>admin</username>
        <!-- Optional: if basic authentication is needed, set the password here. -->
        <password>admin</password>
    </appender>

Explaining the search configuration

After installation, the search configuration (Search_config) is tailored to each user, with the help of the team, according to use case and requirements.

Defining the relevant terms

  • Query – a single search execution.
  • Pattern – the query structure built from its terms and aggregations.
  • Expensive query – any query whose cost is above ‘heavySearchCostThreshold’.
  • Slow query – any query whose execution time is above ‘slowSearchTimeInMilliThreshold’.
  • Heavy query – any pattern that has been both expensive and slow every time it has run so far. The first time the pattern runs below the slow query threshold, it is no longer considered heavy.
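
The "heavy" definition is stateful, so a short Python sketch may help. It is an illustration only, not the Search Gateway's code. The class HeavyPatternTracker and its method names are hypothetical, and it assumes that a run which is not expensive clears a pattern in the same way as a run which is not slow.

```python
from dataclasses import dataclass, field

HEAVY_SEARCH_COST_THRESHOLD = 1000        # 'heavySearchCostThreshold'
SLOW_SEARCH_TIME_MILLIS_THRESHOLD = 1000  # 'slowSearchTimeInMilliThreshold'


@dataclass
class HeavyPatternTracker:
    """Hypothetical helper that remembers which patterns are still heavy."""
    heavy: set = field(default_factory=set)
    cleared: set = field(default_factory=set)

    def record(self, pattern: str, cost: float, took_millis: float) -> None:
        expensive = cost > HEAVY_SEARCH_COST_THRESHOLD
        slow = took_millis > SLOW_SEARCH_TIME_MILLIS_THRESHOLD
        if expensive and slow and pattern not in self.cleared:
            self.heavy.add(pattern)
        else:
            # Either this run was not both expensive and slow, or an earlier
            # run was not: the pattern is not "always expensive and slow".
            self.heavy.discard(pattern)
            self.cleared.add(pattern)

    def is_heavy(self, pattern: str) -> bool:
        return pattern in self.heavy
```

With this sketch, a pattern recorded once with cost 1101 and 2500 ms is heavy. After a later run of the same pattern that takes 400 ms it is no longer heavy, and it stays that way even if subsequent runs are slow again.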

Calculations of each parameter

This is how an expensive query is calculated.

There are 5 features taken into consideration: 

  1. Regex – represents the cost of the query’s regex terms. Calculated by:
    1. contains
    2. startsWith
    3. pattern (regex)

The Search Gateway will only match representative regex terms. For example, a term with analyze_wildcard:false will be ignored, but a prefix term with no wildcard will be calculated as a leading wildcard (startsWith).

  2. Range – represents the cost of the query’s range terms. Calculated by the range duration.
  3. Data – represents the estimated amount of data that the search needs to process. The available params are:
    1. size – the index size
    2. shardsCount – the number of shards that need to be queried
    3. docCount – the number of documents in the searched index
  4. Script – represents whether the query contains script fields that are not under an aggregation.
  5. Aggregation – represents the aggregation part of the search. The available params are:
    1. level – the maximum number of nested aggregation levels
    2. size – the estimated bucket size that will be returned
    3. hasScripts – whether the aggregation terms contain a script field
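
To make the three Regex rules more concrete, here is a rough Python sketch of how a single term could be scored against the default "contains", "startsWith" and "pattern" values. The function regex_classifier_value is hypothetical. The sketch assumes that each rule adds its cost at most once, that "contains" and "startsWith" values are literal strings, and that "pattern" values are regular expressions. It does not model the query-type handling described above (such as analyze_wildcard or prefix terms).

```python
import re

# Rule values and costs copied from the default "regex" classifier config.
REGEX_CLASSIFIER = {
    "contains": {"cost": 10, "values": ["*", ".*"]},
    "startsWith": {"cost": 100, "values": [".*", "*"]},
    "pattern": {"cost": 100, "values": [r':\s?\*[^\s"]']},
}


def regex_classifier_value(term, config=REGEX_CLASSIFIER):
    """Sum the cost of every rule that matches the term (assumed semantics)."""
    total = 0
    if any(value in term for value in config["contains"]["values"]):
        total += config["contains"]["cost"]
    if term.startswith(tuple(config["startsWith"]["values"])):
        total += config["startsWith"]["cost"]
    if any(re.search(p, term) for p in config["pattern"]["values"]):
        total += config["pattern"]["cost"]
    return total
```

Under these assumptions, a trailing or inner wildcard such as "foo*bar" scores 10, while a leading wildcard such as "*foo" scores 110 (contains plus startsWith).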

Each feature has its own classifiers.

All classifiers under a feature are calculated by the formula in the expression field, summed, and then assigned by threshold (the thresholds section under the classifier section) to one of 3 buckets: HEAVY, MEDIUM or LIGHT.

Each bucket has its own factor, which is multiplied by the cost field directly under the feature name to give the feature’s cost.

Finally, the costs of all features are summed and compared to ‘heavySearchCostThreshold’.
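
The calculation above can be sketched in Python for the "math" classifiers. This is one reading of the description, not the Search Gateway's implementation. The function names are hypothetical, and the sketch assumes that a value matching no bucket contributes a cost of 0. The feature settings are copied from the default configuration, trimmed to two features for brevity.

```python
import ast
import operator
import re

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}
_CMP = {"gt": operator.gt, "gte": operator.ge,
        "lt": operator.lt, "lte": operator.le}

FEATURES = {
    "range": {
        "cost": 1,
        "factors": {"HEAVY": 1001, "MEDIUM": 500, "LIGHT": 1},
        "expression": "{{duration}}",
        "thresholds": {"HEAVY": {"gte": 86400000},
                       "MEDIUM": {"gt": 21600000, "lt": 86400000},
                       "LIGHT": {"lte": 21600000}},
    },
    "aggregation": {
        "cost": 100,
        "factors": {"HEAVY": 3, "MEDIUM": 2, "LIGHT": 1},
        "expression": "({{level}} * {{size}}) * (1 * (1 + {{hasScripts}}))",
        "thresholds": {"HEAVY": {"gt": 100000},
                       "MEDIUM": {"gt": 1000, "lte": 100000},
                       "LIGHT": {"lte": 1000}},
    },
}


def evaluate(expression, params):
    """Fill in {{name}} placeholders, then evaluate + - * / arithmetic."""
    filled = re.sub(r"\{\{(\w+)\}\}",
                    lambda m: repr(float(params[m.group(1)])), expression)

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError("unsupported expression")

    return walk(ast.parse(filled, mode="eval"))


def bucket_for(value, thresholds):
    """Return the first bucket whose bounds all hold, or None."""
    for name, bounds in thresholds.items():
        if all(_CMP[op](value, limit) for op, limit in bounds.items()):
            return name
    return None


def feature_cost(feature, params):
    bucket = bucket_for(evaluate(feature["expression"], params),
                        feature["thresholds"])
    # Assumption: no matching bucket means the feature adds nothing.
    return 0 if bucket is None else feature["cost"] * feature["factors"][bucket]


def query_cost(params_by_feature, features=FEATURES):
    return sum(feature_cost(features[name], params)
               for name, params in params_by_feature.items())
```

As a worked example under these assumptions, a query with a range duration of 86400000 and an aggregation with level 2, size 100 and no scripts gets 1 × 1001 for the range (HEAVY) plus 100 × 1 for the aggregation (LIGHT), for a total of 1101. That is above the default ‘heavySearchCostThreshold’ of 1000, so the query would be expensive. With the default factors, a HEAVY range alone (1001) is enough to cross the threshold.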

The default configuration for customization

{
    "heavySearchCostThreshold": 1000, 
    "slowSearchTimeInMilliThreshold": 1000, 
    "features": {
      "regex": {
        "cost": 100,
        "factors": {
          "HEAVY": 3,
          "MEDIUM": 2,
          "LIGHT": 1
        },
        "classifier": {
          "type": "string",
          "config": {
            "contains": {
              "cost": 10,
              "values": [
                "*",
                ".*"
              ]
            },
            "startsWith": {
              "cost": 100,
              "values": [
                ".*",
                "*"
              ]
            },
            "pattern": {
              "cost": 100,
              "values": [
                ":\\s?\\*[^\\s\"]"
              ]
            }
          },
          "thresholds": {
            "HEAVY": {
              "gte": 100
            },
            "MEDIUM": {
              "gt": 10,
              "lt": 100
            },
            "LIGHT": {
              "lte": 10
            }
          }
        }
      },
      "data": {
        "cost": 1,
        "factors": {
          "HEAVY": 3,
          "MEDIUM": 2,
          "LIGHT": 1
        },
        "classifier": {
          "type": "math",
          "config": {
            "expression": "{{size}}"
          },
          "thresholds": {
            "HEAVY": {
              "gt": 1000
            },
            "MEDIUM": {
              "gt": 100,
              "lte": 1000
            },
            "LIGHT": {
              "lte": 100
            }
          }
        }
      },
      "range": {
        "cost": 1,
        "factors": {
          "HEAVY": 1001,
          "MEDIUM": 500,
          "LIGHT": 1
        },
        "classifier": {
          "type": "math",
          "config": {
            "expression": "{{duration}}"
          },
          "thresholds": {
            "HEAVY": {
              "gte": 86400000
            },
            "MEDIUM": {
              "gt": 21600000,
              "lt": 86400000
            },
            "LIGHT": {
              "lte": 21600000
            }
          }
        }
      },
      "aggregation": {
        "cost": 100,
        "factors": {
          "HEAVY": 3,
          "MEDIUM": 2,
          "LIGHT": 1
        },
        "classifier": {
          "type": "math",
          "config": {
            "expression": "({{level}} * {{size}}) * (1 * (1 + {{hasScripts}}))"
          },
          "thresholds": {
            "HEAVY": {
              "gt": 100000
            },
            "MEDIUM": {
              "gt": 1000,
              "lte": 100000
            },
            "LIGHT": {
              "lte": 1000
            }
          }
        }
      },
      "script": {
        "cost": 100,
        "factors": {
          "HEAVY": 3
        },
        "classifier": {
          "type": "math",
          "config": {
            "expression": "{{hasScript}}"
          },
          "thresholds": {
            "HEAVY": {
              "gte": 1
            }
          }
        }
      }
    }
}
