Managing your Elasticsearch Indexes

Dec 1, 2022

Elasticsearch is great at handling large amounts of data. With data ingestion components like Beats and Logstash, it is easy to collect all sorts of data, and with Kibana you can explore and analyze that data through dashboards. Many Elastic components set up Index Lifecycle Management and dashboards out of the box. But what if you do not use data streams and still want to manage your own indexes? In that case, you can set up Index Lifecycle Management yourself using the provided mechanisms.

In this blog post, you learn how to create an index managed by Elastic ILM (Index Lifecycle Management).

Create the Index Lifecycle Management Policy

The policy configures what to do with your indexes. Each index can go through several phases. It starts hot: the index is active, current, and open for writing and reading. When the index grows too large, holds too many documents, or gets too old, it rolls over: a new index takes over the writes, and after some time the old index can move to the warm phase. After warm comes cold; after cold, you can archive or delete the data. The reasoning behind hot, warm, and cold is to get the right speed for the right price. You move warm indexes to cheaper hardware than hot indexes.
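The rollover triggers are alternatives: one met condition is enough. A minimal Python sketch of that decision, assuming hypothetical names (IndexStats, should_roll_over) and illustrative default thresholds, and treating a threshold as met once the value reaches it. It only models the idea and is not how Elasticsearch implements it.

```python
from dataclasses import dataclass


@dataclass
class IndexStats:
    """Hypothetical snapshot of an index, for illustration only."""
    age_hours: float
    doc_count: int
    size_gb: float


def should_roll_over(stats, max_age_hours=24, max_docs=10, max_size_gb=5):
    """Return True when at least one rollover condition is met."""
    return (
        stats.age_hours >= max_age_hours
        or stats.doc_count >= max_docs
        or stats.size_gb >= max_size_gb
    )


# A young, small index that already holds 12 documents rolls over.
print(should_roll_over(IndexStats(age_hours=2, doc_count=12, size_gb=0.001)))
# An index below every threshold stays the write index.
print(should_roll_over(IndexStats(age_hours=2, doc_count=6, size_gb=0.001)))
```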

The following code block shows a command to create a policy with the name order_data_policy. The example supports two phases, hot and delete. Notice that for testing, the rollover has a max_docs of 10. That way, you can see the rollover at work.

PUT _ilm/policy/order_data_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "set_priority": {
            "priority": 100
          },
          "rollover": {
            "max_age": "1d",
            "max_docs": 10,
            "max_size": "5gb"
          }
        }
      },
      "delete": {
        "min_age": "7d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

The hot phase defines a max_age of one day. If the index is older than one day, a rollover occurs, resulting in a new index. The delete phase has a min_age of seven days. For an index that has rolled over, ILM counts this age from the moment of the rollover, not from the creation of the index. Seven days after the rollover, ILM moves the index from the hot phase into the delete phase.
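With min_age measured from the rollover, the timing for the policy above comes down to simple date arithmetic. The Python sketch below illustrates it; the function name and the example dates are made up, and the real moment also depends on how often ILM checks the indexes.

```python
from datetime import datetime, timedelta


def earliest_delete_time(rolled_over_at, min_age=timedelta(days=7)):
    """Earliest moment the delete phase can start for a rolled-over index."""
    return rolled_over_at + min_age


created = datetime(2022, 12, 1, 9, 0)
# With a max_age of 1d and too few documents, rollover happens about a day later.
rolled_over = created + timedelta(days=1)
print(earliest_delete_time(rolled_over))  # 2022-12-09 09:00:00
```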

Create the Index Template

An index template is required for ILM. When rolling over an index, a new index is created using the template. A template needs settings for the index and mappings for the fields. The settings contain some essential properties:

index.lifecycle.name — The connection with the policy you have created
index.lifecycle.rollover_alias — The name of the alias to use while interacting with the indexes.
default_pipeline — A pipeline you can use to set a timestamp in your document automatically.

The first code block shows you how to create the pipeline for the timestamp. In the second code block, you create the index template.

PUT _ingest/pipeline/add-current-time
{
  "description" : "automatically add the current time to the documents",
  "processors" : [
    {
      "set" : {
        "field": "@timestamp",
        "value": "{{{_ingest.timestamp}}}"
      }
    }
  ]
}

PUT _index_template/order_data_template
{
  "index_patterns": ["order_data-*"],
  "template": {
    "settings": {
      "number_of_replicas": 0,
      "number_of_shards": 1,
      "index.lifecycle.name": "order_data_policy",
      "index.lifecycle.rollover_alias": "order_data",
      "default_pipeline": "add-current-time"
    },
    "mappings": {
      "properties": {
        "@timestamp": {
          "type": "date"
        },
        "name": {
          "type": "text"
        },
        "gender": {
          "type": "keyword"
        },
        "amount": {
          "type": "integer"
        }
      }
    }
  }
}
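A template only applies to new indexes whose name matches one of its index_patterns. The Python sketch below approximates that match with fnmatchcase, which is an assumption that holds for simple * patterns like this one; the helper name template_applies is hypothetical.

```python
from fnmatch import fnmatchcase

INDEX_PATTERNS = ["order_data-*"]


def template_applies(index_name, patterns=INDEX_PATTERNS):
    """Return True when the name matches at least one pattern."""
    return any(fnmatchcase(index_name, pattern) for pattern in patterns)


print(template_applies("order_data-2022.12.01-000001"))  # True
print(template_applies("order_data"))                    # False, nothing after the name
print(template_applies("orders-000001"))                 # False
```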

Create the first index

You must create the first index to kick off Index Lifecycle Management. You can use a plain number after the index name, but often you want a date in the name as well. The following command looks a bit weird because the index name is URL-encoded date math: decoded, it reads <order_data-{now/d}-000001>. Creating your index this way tells ILM to put a date in the names of the rollover indexes too. Notice the name of the alias, order_data. This is the same alias as the one used for the rollover_alias property in the index template.

PUT /%3Corder_data-%7Bnow%2Fd%7D-000001%3E
{
  "aliases": {
    "order_data": {
      "is_write_index": true
    }
  }
}
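If you want to check what the encoded index name in the request above stands for, a few lines of Python can decode it. The helper resolved_name is a hypothetical approximation of the name Elasticsearch ends up with; Elasticsearch itself resolves now/d in UTC.

```python
from datetime import date
from urllib.parse import quote, unquote

encoded = "%3Corder_data-%7Bnow%2Fd%7D-000001%3E"

# Decoding shows the date math expression Elasticsearch receives.
decoded = unquote(encoded)
print(decoded)  # <order_data-{now/d}-000001>

# Encoding it again, with no characters marked safe, gives the request path back.
assert quote(decoded, safe="") == encoded


def resolved_name(day, sequence=1):
    """Approximate the concrete index name for a given day (hypothetical helper)."""
    return f"order_data-{day:%Y.%m.%d}-{sequence:06d}"


print(resolved_name(date(2022, 12, 1)))  # order_data-2022.12.01-000001
```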

Insert data

Using the Bulk API, you can insert multiple documents in one request. The following code block shows you how to create six documents. You can safely run it a few times. Each run adds six documents, so run it at least twice to pass the max_docs of 10 and kick off the rollover. You can check whether a rollover has taken place using the cat API. Use _refresh to make the newly written documents visible in the index. There is also a command to ask for more information about the current state of ILM for a specific index. All commands are in the following code block.

POST order_data/_bulk
{"index":{}}
{"name": "Jettro","gender": "Male","amount": "3"}
{"index":{}}
{"name": "Kees","gender": "Male","amount": "2"}
{"index":{}}
{"name": "Mary","gender": "Female","amount": "1"}
{"index":{}}
{"name": "Raven","gender": "None","amount": "3"}
{"index":{}}
{"name": "John","gender": "Male","amount": "3"}
{"index":{}}
{"name": "Truus","gender": "Female","amount": "5"}

GET _cat/indices

POST order_data/_refresh

GET order_data-2022.12.01-000001/_ilm/explain
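When you need more test data than is comfortable to type, you can generate the bulk body instead. This is a minimal Python sketch under a few assumptions: the function name bulk_body and the sample orders are illustrative, the amounts are written as numbers to match the integer mapping, and the sketch only prints the body rather than sending it to a cluster.

```python
import json


def bulk_body(documents):
    """Build an NDJSON body for POST order_data/_bulk from a list of dicts."""
    lines = []
    for doc in documents:
        lines.append(json.dumps({"index": {}}))  # action line, the ID is generated
        lines.append(json.dumps(doc))            # document source
    # The Bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


orders = [
    {"name": "Jettro", "gender": "Male", "amount": 3},
    {"name": "Mary", "gender": "Female", "amount": 1},
]
print(bulk_body(orders), end="")
```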

By default, ILM checks every 10 minutes whether any index needs a lifecycle action. For development, you can use the following command to shorten this interval.

PUT _cluster/settings
{
  "persistent": {
    "indices.lifecycle.poll_interval": "1m"
  }
}
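After testing, you probably want the default interval back. Assigning null to a cluster setting removes the override. The Python sketch below builds both request bodies for PUT _cluster/settings; the helper name poll_interval_body is hypothetical.

```python
import json

SETTING = "indices.lifecycle.poll_interval"


def poll_interval_body(value):
    """Body for PUT _cluster/settings; None resets the setting to its default."""
    return json.dumps({"persistent": {SETTING: value}})


print(poll_interval_body("1m"))  # {"persistent": {"indices.lifecycle.poll_interval": "1m"}}
print(poll_interval_body(None))  # {"persistent": {"indices.lifecycle.poll_interval": null}}
```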

Concluding

With ILM, managing your indexes has become a lot easier. You no longer need to write a lot of code in your application for this; a bit of upfront planning and configuration is enough.

Contact me if you need help with your Elastic cluster, application monitoring, improving search on your website, or Data Management. Or visit the website of my employer: https://luminis.eu.
