Overview

Sometimes it is necessary to send alerts based on the appearance of certain events in the Kublr log collection and audit indices (or potentially any other indices stored in the Kublr log collection and management Elastic stack).

One way to achieve this is using Elastic Watchers as described in this support article: elasticsearch watchers.

The downside of this approach is that it requires a commercial Elastic license and is only supported on Kublr 1.21 and later.

This article describes another, lighter-weight method that works with the free Elastic stack version included in Kublr by default and is not limited to a particular Kublr version.

This method is based on the open source Prometheus ES Exporter package, which can be installed into the Kublr Control Plane cluster to run arbitrary ES queries and expose their results as Prometheus metrics.

These metrics can then be used to build Grafana dashboards and Alert Manager alerts as usual.

Deploy Prometheus ES Exporter

Use the following commands to deploy Prometheus ES Exporter.

Note that the queries in the following command are provided only as an example; customize them to fit your specific use case.

# get password required to access Elastic

KIBANA_PASSWORD="$(kubectl -n kublr get secret \
  kublr-logging-searchguard -o jsonpath="{.data.kibana-password}" |
  base64 -d)"

# Deploy Prometheus ES Exporter helm package

helm upgrade \
    elastic-exporter https://braedon.github.io/helm/prometheus-es-exporter-0.1.1.tgz \
    --create-namespace --namespace kublr \
    --install \
    --values - \
<<EOF
image:
  tag: '0.14.0'
container:
  extraArgs:
  - '--basic-user=system.kibanaserver'
  - '--basic-password=${KIBANA_PASSWORD}'
  - '--header=x-forwarded-for: 127.0.0.1'
  - '--header=x-proxy-user: admin'
  - '--header=x-proxy-roles: admin'
  - '--cluster-health-disable'
  - '--nodes-stats-disable'
  - '--indices-aliases-disable'
  - '--indices-mappings-disable'
  - '--indices-stats-disable'
service:
  annotations:
    prometheus.io/path: /metrics
    prometheus.io/port: '9206'
    prometheus.io/scrape: 'true'
elasticsearch:
  cluster: https://kublr-logging-elasticsearch-client.kublr:9200
  queries: |-
    [DEFAULT]
    QueryIntervalSecs = 15
    QueryTimeoutSecs = 30
    QueryIndices = _all
    QueryOnError = drop
    QueryOnMissing = drop

    [query_all]
    QueryJson = {
        "size": 0,
        "track_total_hits": true,
        "query": {
          "match_all": {}
        }
      }

    [query_aggregated]
    QueryJson = {
        "size": 0,
        "query": {
          "bool": {
            "filter": [
              { "range": {"@timestamp": {"gte": "now-1m", "lt": "now"}}},
              { "prefix": {"_index": "kublr"}},
              { "match_phrase": { "log": "hardware error" } }
            ]
          }
        },
        "aggs": {
          "hits_by_cluster": {
            "composite": {
              "sources": [
                { "space": { "terms": { "field": "cluster_space.keyword" } } },
                { "cluster": { "terms": { "field": "cluster_name.keyword" } } }
              ]
            }
          }
        }
      }
EOF
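Once the release is installed, it can be useful to confirm that the exporter is actually publishing the query metrics before building alerts on them. The following is a minimal sketch of one way to do that. The Service name `elastic-exporter-prometheus-es-exporter` is an assumption about how the chart names its resources (check `kubectl -n kublr get svc` first), and `show_query_metrics` is a hypothetical helper, not part of the exporter. Port 9206 is taken from the service annotations in the values above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only the series produced by the example queries
# ([query_all] and [query_aggregated]); comments and other metrics are dropped.
show_query_metrics() {
  grep -E '^(all|aggregated)_' || true
}

# Usage (run manually; the Service name below is an assumption, verify it first):
#   kubectl -n kublr port-forward svc/elastic-exporter-prometheus-es-exporter 9206:9206 &
#   curl -s http://localhost:9206/metrics | show_query_metrics
```

If nothing is printed, the exporter logs are probably the first place to look, since failed queries are dropped with the QueryOnError = drop setting used above.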

Customizing Prometheus ES Exporter configuration

The Helm values file in the example above includes the container.extraArgs parameter with additional arguments that disable a number of global statistics that the exporter can expose as metrics.

Remove the corresponding arguments (e.g. '--cluster-health-disable', '--nodes-stats-disable', '--indices-aliases-disable', '--indices-mappings-disable', and/or '--indices-stats-disable') from the values if you want to re-enable the corresponding statistics in the metrics.

Refer to the Prometheus ES Exporter documentation and the source code for more details on customizing the arguments.

Customizing Prometheus ES Exporter queries

The example configuration above contains two ES queries that demonstrate basic capabilities of Prometheus ES Exporter.

Each query should be specified in a section with a name starting with "query_". The rest of the section name will be used as a basis for the Prometheus metrics exposed by the exporter, so for this example the exporter will expose the following metrics:

all_hits
all_took_milliseconds
aggregated_hits
aggregated_hits_by_cluster_doc_count
aggregated_took_milliseconds

The "all" query is very simple: it contains no aggregations and just returns the total count of all documents in all ES indices, so each of the corresponding metrics contains only one series.
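A query can be tried directly against Elasticsearch before it is added to the exporter configuration. The sketch below is one possible way to do so with curl, reusing the user and proxy headers from the Helm values above. `run_es_query` is a hypothetical helper; `ES_URL` is an assumed environment variable that must point at an address from which Elasticsearch is reachable (for example from a pod inside the cluster), and `--insecure` is an assumption that skips TLS certificate verification and should be dropped if a trusted CA is available.

```shell
#!/usr/bin/env bash
# Hypothetical helper: POST a query body read from stdin to the _search
# endpoint of the given indices (default: _all, as in the [DEFAULT] section).
# Expects ES_URL and KIBANA_PASSWORD in the environment.
run_es_query() {
  local indices="${1:-_all}"
  # --insecure is an assumption: it disables TLS certificate verification.
  curl --silent --insecure \
    --user "system.kibanaserver:${KIBANA_PASSWORD}" \
    --header 'Content-Type: application/json' \
    --header 'x-forwarded-for: 127.0.0.1' \
    --header 'x-proxy-user: admin' \
    --header 'x-proxy-roles: admin' \
    --data-binary @- \
    "${ES_URL}/${indices}/_search"
}

# Usage (ES_URL value shown is the cluster address from the values above):
#   export ES_URL=https://kublr-logging-elasticsearch-client.kublr:9200
#   echo '{"size":0,"track_total_hits":true,"query":{"match_all":{}}}' | run_es_query
```

The total hit count in the JSON response is the value that the exporter would publish as the "hits" metric for that query.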

The "aggregated" query is a more realistic example of what is useful in real-life scenarios.

The query counts log records in Kublr log indices over the last minute that contain a specific phrase ("hardware error" in this example).

It also contains an aggregation named "hits_by_cluster", which aggregates counts by Kublr cluster space and name.

Each aggregation included in an ES query results in an additional Prometheus metric that contains multiple series based on the aggregation result buckets.

In this example it will result in the metric "aggregated_hits_by_cluster_doc_count", which will contain multiple series - one for each separate Kublr cluster - labeled with additional labels "hits_by_cluster_cluster" and "hits_by_cluster_space".
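To make the per-cluster series more concrete, the sketch below reads the exporter output in Prometheus text format and lists the clusters that currently have a non-zero count. `clusters_with_hits` is a hypothetical helper, and the label names are the ones described above; adjust them if the exporter output in your environment differs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read Prometheus text-format metrics on stdin and print
# "<space>/<cluster> <count>" for every series of the aggregated metric whose
# value is greater than zero.
clusters_with_hits() {
  awk '
    /^aggregated_hits_by_cluster_doc_count\{/ {
      space = ""; cluster = ""
      # Extract the label values; the offsets skip the label name plus ="
      if (match($0, /hits_by_cluster_space="[^"]*"/))
        space = substr($0, RSTART + 23, RLENGTH - 24)
      if (match($0, /hits_by_cluster_cluster="[^"]*"/))
        cluster = substr($0, RSTART + 25, RLENGTH - 26)
      # The sample value is the last whitespace-separated field on the line.
      if ($NF + 0 > 0) print space "/" cluster, $NF
    }'
}

# Usage (assuming the exporter port is forwarded to localhost:9206):
#   curl -s http://localhost:9206/metrics | clusters_with_hits
```

This is the same condition that an alert expression comparing the metric to zero would evaluate, which makes it a quick way to preview which clusters would fire.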

Using Prometheus ES Exporter metrics to create alerts

The metrics exported by Prometheus ES Exporter can be used to create alerts as usual.

For example, the "aggregated_hits_by_cluster_doc_count" metric can be used to generate alerts using the following custom alert definition:

alert: HardwareErrorInLogs
expr: aggregated_hits_by_cluster_doc_count > 0
annotations:
  summary: 'Log record including hardware error phrase'
  description: 'Log record including hardware error phrase
    in {{ $labels.hits_by_cluster_cluster }} cluster
    (kublr space: {{ $labels.hits_by_cluster_space }}).'
labels:
  severity: warning

As usual, custom alerts may be added to the Kublr Control Plane via its cluster spec as follows:

spec:
  features:
    monitoring:
      values:
        alertmanager:
          alerts:
          - alert: HardwareErrorInLogs
            expr: aggregated_hits_by_cluster_doc_count > 0
            annotations:
              summary: 'Log record including hardware error phrase'
              description: 'Log record including hardware error phrase
                in {{ $labels.hits_by_cluster_cluster }} cluster
                (kublr space: {{ $labels.hits_by_cluster_space }}).'
            labels:
              severity: warning
