Deleting expired data – How to solve related issues

Opster Team

Feb-21, Version: 1.7-8.0

To understand why your data has been deleted and to control this action in the future, you should run the Elasticsearch Error Check-Up. 29% of people who ran the Check-Up had this issue and the tool will help you configure your system to ensure optimal settings and performance for your use case.

This guide will help you check for common problems that cause the log “Deleting expired data” to appear. To understand the issues related to this log, read the general overview of common issues and tips related to the Elasticsearch concepts: delete and plugin.

What this error means

This is an INFO-level log message letting you know that all machine learning job results, model snapshots and forecast data that exceeded their configured retention period have been deleted.
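Whether data counts as expired is determined by the retention settings on each machine learning job. As a minimal sketch (the job name, bucket span and time field below are placeholder values), retention can be set when creating an anomaly detection job:

PUT _ml/anomaly_detectors/my_job
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [ { "function": "count" } ]
  },
  "data_description": { "time_field": "timestamp" },
  "results_retention_days": 30,
  "model_snapshot_retention_days": 10
}

With these settings, results older than 30 days and model snapshots older than 10 days become eligible for deletion.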

Explanation 

The “delete expired data” API is part of X-Pack and is used to delete expired and unused machine learning data.

DELETE _ml/_delete_expired_data

The response is:

{
  "deleted": true
}
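The same call can be made with curl. The second variant below throttles and bounds the clean-up; to our knowledge the requests_per_second and timeout body parameters were only added around version 7.8, so treat it as a sketch for newer clusters:

curl -X DELETE "localhost:9200/_ml/_delete_expired_data?pretty"

# On 7.8+ clusters: limit deletion throughput and cap the total runtime
curl -X DELETE "localhost:9200/_ml/_delete_expired_data?pretty" \
  -H 'Content-Type: application/json' \
  -d '{ "requests_per_second": 100.0, "timeout": "1h" }'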

When the “delete expired data” API is hit, the following logs are generated. Note that the same clean-up also runs automatically as part of the machine learning nightly maintenance task, so these lines can appear even if no one called the API explicitly:

[INFO ][o.e.x.m.a.TransportDeleteExpiredDataAction] Deleting expired data
[INFO ][o.e.x.m.a.TransportDeleteExpiredDataAction] Completed deletion of expired ML data

Log Context

The log “Deleting expired data” is emitted from TransportDeleteExpiredDataAction.java. We extracted the following from the Elasticsearch source code for those seeking in-depth context:

         this.clock = clock;
    }

    @Override
    protected void doExecute(DeleteExpiredDataAction.Request request, ActionListener<DeleteExpiredDataAction.Response> listener) {
        logger.info("Deleting expired data");
        // Record the instant after which the clean-up should give up early
        Instant timeoutTime = Instant.now(clock).plus(MAX_DURATION);
        Supplier<Boolean> isTimedOutSupplier = () -> Instant.now(clock).isAfter(timeoutTime);
        // Hand the deletion off to the ML utility thread pool so the calling thread is not blocked
        threadPool.executor(MachineLearning.UTILITY_THREAD_POOL_NAME).execute(() -> deleteExpiredData(listener, isTimedOutSupplier));
    }
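In short, the action logs the message, computes a deadline (now plus MAX_DURATION) that the clean-up checks through isTimedOutSupplier, and runs the actual deletion on the machine learning utility thread pool. The companion line “Completed deletion of expired ML data” is logged once that background work finishes.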




 
