Log Cleanup & Maintenance
The log cleanup system provides automated maintenance for historical data, helping you manage database growth and maintain optimal performance by removing old log records while preserving recent data for analysis and monitoring.
How It Works
The log cleanup system identifies and removes old records from log tables based on a configurable retention period:
Retention Calculation: Calculates a cutoff date based on current date minus retention days
Batch Processing: Deletes records in configurable batches to avoid database locks
Cascading Cleanup: Removes related records from dependent tables in the correct order
Progress Tracking: Reports the number of records deleted from each table
Configuration
Input Parameters
retention_days: Number of days to retain data (minimum: 90 days)
batch_size: Number of records to delete per batch (default: 10,000)
API Usage
import httpx
cleanup_data = {
"retention_days": 90,
"batch_size": 5000
}
response = httpx.post(
"http://localhost:8000/log_cleanup",
json=cleanup_data
)
result = response.json()
print(result)
curl -X POST "http://localhost:8000/log_cleanup" \
-H "Content-Type: application/json" \
-d '{
"retention_days": 90,
"batch_size": 5000
}'
package main
import (
"bytes"
"encoding/json"
"fmt"
"net/http"
)
type CleanupRequest struct {
RetentionDays int `json:"retention_days"`
BatchSize int `json:"batch_size"`
}
func main() {
data := CleanupRequest{
RetentionDays: 90,
BatchSize: 5000,
}
jsonData, _ := json.Marshal(data)
resp, _ := http.Post("http://localhost:8000/log_cleanup",
"application/json", bytes.NewBuffer(jsonData))
defer resp.Body.Close()
var result map[string]interface{}
json.NewDecoder(resp.Body).Decode(&result)
fmt.Println(result)
}
import java.net.http.{HttpClient, HttpRequest, HttpResponse}
import java.net.URI
import play.api.libs.json.Json
object LogCleanupExample {
def main(args: Array[String]): Unit = {
val client = HttpClient.newHttpClient()
val json = Json.obj(
"retention_days" -> 90,
"batch_size" -> 5000
).toString()
val request = HttpRequest.newBuilder()
.uri(URI.create("http://localhost:8000/log_cleanup"))
.header("Content-Type", "application/json")
.POST(HttpRequest.BodyPublishers.ofString(json))
.build()
val response = client.send(request,
HttpResponse.BodyHandlers.ofString())
println(response.body())
}
}
Response Example
{
"total_pipeline_executions_deleted": 150,
"total_timeliness_pipeline_execution_logs_deleted": 23,
"total_anomaly_detection_results_deleted": 8,
"total_pipeline_execution_closure_parent_deleted": 45,
"total_pipeline_execution_closure_child_deleted": 45,
"total_freshness_pipeline_logs_deleted": 12
}
Cleanup Process
The system cleans up data from six main log tables in a specific order to maintain referential integrity:
1. Freshness Pipeline Logs
Table: freshness_pipeline_log
Filter: Records with last_dml_timestamp <= retention_date
Purpose: Removes old DML freshness check logs
2. Timeliness Pipeline Execution Logs
Table: timeliness_pipeline_execution_log
Filter: Records with pipeline_execution_id <= max_pipeline_execution_id
Purpose: Removes old timeliness check logs
3. Anomaly Detection Results
Table: anomaly_detection_result
Filter: Records with pipeline_execution_id <= max_pipeline_execution_id
Purpose: Removes old anomaly detection results
4. Pipeline Execution Closure Table (Parent Side)
Table: pipeline_execution_closure
Filter: Records with parent_execution_id <= max_pipeline_execution_id
Purpose: Removes closure table relationships where the execution is a parent
5. Pipeline Execution Closure Table (Child Side)
Table: pipeline_execution_closure
Filter: Records with child_execution_id <= max_pipeline_execution_id
Purpose: Removes closure table relationships where the execution is a child
6. Pipeline Executions (Last)
Table: pipeline_execution
Filter: Records with id <= max_pipeline_execution_id
Purpose: Removes old pipeline execution records (must be last due to foreign key constraints)
Scheduled Cleanup
Automated Cleanup with Cron
Set up automated log cleanup using cron:
# Clean up logs daily (365 days retention)
0 2 * * * curl -X POST http://localhost:8000/log_cleanup -H "Content-Type: application/json" -d '{"retention_days": 365}'
# Clean up logs weekly (90 days retention)
0 2 * * 0 curl -X POST http://localhost:8000/log_cleanup -H "Content-Type: application/json" -d '{"retention_days": 90}'
The log cleanup system helps maintain optimal database performance while preserving the data you need for monitoring and analysis.