tags: logging, monitoring
TABLE OF CONTENTS
- Overview
- Deploy Prometheus ES Exporter
- Customizing Prometheus ES Exporter configuration
- Customizing Prometheus ES Exporter queries
- Using Prometheus ES Exporter metrics to create alerts
- Reference
Overview
Sometimes it is necessary to send alerts based on the appearance of certain events in the Kublr log collection and audit indices (or potentially any other indices stored in the Kublr log collection and management Elastic stack).
One way to achieve this is to use Elastic Watchers, as described in this support article: elasticsearch watchers.
The downside of this approach is that it requires a commercial Elastic license and is only supported on Kublr 1.21 and later.
This article describes another, lighter-weight method that works with the free version of the Elastic stack included in Kublr by default and is not limited to a particular version of Kublr.
This method is based on the open source Prometheus ES Exporter package, which can be installed into the Kublr Control Plane cluster to run arbitrary ES queries and expose their results as Prometheus metrics.
These metrics can then be used to build Grafana dashboards and Alertmanager alerts as usual.
Deploy Prometheus ES Exporter
Use the following commands to deploy Prometheus ES Exporter.
Note that the queries in the following command are provided only as an example; customize them to fit your specific use case.
# get password required to access Elastic
KIBANA_PASSWORD="$(kubectl -n kublr get secret \
  kublr-logging-searchguard -o jsonpath="{.data.kibana-password}" | base64 -d)"

# Deploy Prometheus ES Exporter helm package
helm upgrade \
  elastic-exporter https://braedon.github.io/helm/prometheus-es-exporter-0.1.1.tgz \
  --create-namespace --namespace kublr \
  --install \
  --values - \
<<EOF
image:
  tag: '0.14.0'
container:
  extraArgs:
  - '--basic-user=system.kibanaserver'
  - '--basic-password=${KIBANA_PASSWORD}'
  - '--header=x-forwarded-for: 127.0.0.1'
  - '--header=x-proxy-user: admin'
  - '--header=x-proxy-roles: admin'
  - '--cluster-health-disable'
  - '--nodes-stats-disable'
  - '--indices-aliases-disable'
  - '--indices-mappings-disable'
  - '--indices-stats-disable'
service:
  annotations:
    prometheus.io/path: /metrics
    prometheus.io/port: '9206'
    prometheus.io/scrape: 'true'
elasticsearch:
  cluster: https://kublr-logging-elasticsearch-client.kublr:9200
  queries: |-
    [DEFAULT]
    QueryIntervalSecs = 15
    QueryTimeoutSecs = 30
    QueryIndices = _all
    QueryOnError = drop
    QueryOnMissing = drop

    [query_all]
    QueryJson = {
        "size": 0,
        "track_total_hits": true,
        "query": {
          "match_all": {}
        }
      }

    [query_aggregated]
    QueryJson = {
        "size": 0,
        "query": {
          "bool": {
            "filter": [
              { "range": {"@timestamp": {"gte": "now-1m", "lt": "now"}}},
              { "prefix": {"_index": "kublr"}},
              { "match_phrase": { "log": "hardware error" } }
            ]
          }
        },
        "aggs": {
          "hits_by_cluster": {
            "composite": {
              "sources": [
                { "space": { "terms": { "field": "cluster_space.keyword" } } },
                { "cluster": { "terms": { "field": "cluster_name.keyword" } } }
              ]
            }
          }
        }
      }
EOF
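After the deployment it can be useful to confirm that the query metrics are actually exposed. The following is a minimal sketch of a helper that filters Prometheus exposition text down to the metrics of one query. The function name `filter_metrics` and the sample values are hypothetical and used for illustration only.

```shell
# Keep only sample lines whose metric name starts with the given prefix;
# "# HELP" and "# TYPE" comment lines are skipped.
filter_metrics() {
  prefix="$1"
  awk -v p="$prefix" '!/^#/ && index($0, p) == 1'
}

# Demonstration on hypothetical sample input (not real exporter output).
printf '%s\n' \
  '# HELP all_hits example help text' \
  'all_hits 5' \
  'other_metric 1' \
  'all_took_milliseconds 3' | filter_metrics all_
```

With the exporter service port-forwarded to the local machine (the service name depends on the Helm release and is not shown here), a call such as `curl -s http://localhost:9206/metrics | filter_metrics aggregated_` would list the series produced by the "aggregated" query, using the port and path from the service annotations above.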
Customizing Prometheus ES Exporter configuration
The Helm values in the example above include the container.extraArgs parameter, which passes additional arguments that disable a number of global statistics the exporter can otherwise expose as metrics.
To re-enable any of these statistics, remove the corresponding arguments ('--cluster-health-disable', '--nodes-stats-disable', '--indices-aliases-disable', '--indices-mappings-disable', and/or '--indices-stats-disable') from the values.
Refer to the Prometheus ES Exporter documentation and source code for more details on customizing the arguments.
Customizing Prometheus ES Exporter queries
The example configuration above contains two ES queries that demonstrate basic capabilities of Prometheus ES Exporter.
Each query should be specified in a section with a name starting with "query_". The rest of the section name will be used as a basis for the Prometheus metrics exposed by the exporter, so for this example the exporter will expose the following metrics:
- all_hits
- all_took_milliseconds
- aggregated_hits
- aggregated_hits_by_cluster_doc_count
- aggregated_took_milliseconds
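The naming rule for the per-query metrics can be sketched in shell as follows. This is an illustration only: the helper name `query_metric_names` is hypothetical, the abbreviated config is a stand-in for the real one, and metrics derived from aggregations (such as "aggregated_hits_by_cluster_doc_count") are intentionally not covered.

```shell
# Read an exporter config from stdin and print the base metric names
# expected for every "[query_<name>]" section.
query_metric_names() {
  # Match section headers such as "[query_all]" and strip the "query_" prefix.
  sed -n 's/^[[:space:]]*\[query_\(.*\)][[:space:]]*$/\1/p' |
  while IFS= read -r name; do
    printf '%s_hits\n%s_took_milliseconds\n' "$name" "$name"
  done
}

# Abbreviated stand-in for the configuration used in this article.
query_metric_names <<'EOF'
[DEFAULT]
QueryIntervalSecs = 15
[query_all]
QueryJson = { "size": 0 }
[query_aggregated]
QueryJson = { "size": 0 }
EOF
```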
The "all" query is very simple: it contains no aggregations and just returns the total count of all documents in all ES indices, so each of the corresponding metrics will contain only one series.
The "aggregated" query is a more realistic example of the kind of query used in practice.
The query counts log records in Kublr log indices over the last minute that contain a specific phrase ("hardware error" in this example).
It also contains an aggregation named "hits_by_cluster", which groups the counts by Kublr cluster space and name.
Each aggregation included in an ES query results in an additional Prometheus metric that contains multiple series based on the aggregation result buckets.
In this example it will result in the metric "aggregated_hits_by_cluster_doc_count", which will contain multiple series - one for each separate Kublr cluster - labeled with the additional labels "hits_by_cluster_cluster" and "hits_by_cluster_space".
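To watch for a different phrase, the "aggregated" query can be repeated under a new section name. The sketch below prints such a section for a given name and phrase. The helper `render_query_section` and the example name and phrase are hypothetical, and the phrase is assumed to contain no double quotes or backslashes because it is inserted into the JSON without escaping.

```shell
# Print an exporter config section that counts log records containing the
# given phrase over the last minute, aggregated by Kublr cluster space and name.
# Assumption: the phrase needs no JSON escaping.
render_query_section() {
  name="$1"
  phrase="$2"
  cat <<EOF
[query_${name}]
QueryJson = {
    "size": 0,
    "query": { "bool": { "filter": [
      { "range": { "@timestamp": { "gte": "now-1m", "lt": "now" } } },
      { "prefix": { "_index": "kublr" } },
      { "match_phrase": { "log": "${phrase}" } }
    ] } },
    "aggs": { "hits_by_cluster": { "composite": { "sources": [
      { "space": { "terms": { "field": "cluster_space.keyword" } } },
      { "cluster": { "terms": { "field": "cluster_name.keyword" } } }
    ] } } }
  }
EOF
}

# Hypothetical example: a section for the phrase "disk failure".
render_query_section disk_failure "disk failure"
```

The continuation lines of QueryJson, including the closing brace, are kept indented so that they are read as part of the same value. Following the naming pattern described above, a section named "query_disk_failure" would be expected to yield metrics prefixed with "disk_failure_".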
Using Prometheus ES Exporter metrics to create alerts
The metrics exported by Prometheus ES Exporter can be used to create alerts as usual.
For example, the "aggregated_hits_by_cluster_doc_count" metric can be used to generate alerts using the following custom alert definition:
alert: HardwareErrorInLogs
expr: aggregated_hits_by_cluster_doc_count > 0
annotations:
  summary: 'Log record including hardware error phrase'
  description: 'Log record including hardware error phrase in {{ $labels.hits_by_cluster_cluster }} cluster (kublr space: {{ $labels.hits_by_cluster_space }}).'
labels:
  severity: warning
As usual, custom alerts may be added to the Kublr Control Plane via its cluster spec as follows:
spec:
  features:
    monitoring:
      values:
        alertmanager:
          alerts:
          - alert: HardwareErrorInLogs
            expr: aggregated_hits_by_cluster_doc_count > 0
            annotations:
              summary: 'Log record including hardware error phrase'
              description: 'Log record including hardware error phrase in {{ $labels.hits_by_cluster_cluster }} cluster (kublr space: {{ $labels.hits_by_cluster_space }}).'
            labels:
              severity: warning
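The alert rule can also be checked offline before it is added to the cluster spec. The sketch below wraps the rule in a standalone Prometheus rule file and validates it with `promtool check rules` when promtool is installed. The group name is hypothetical, and the description is quoted because it contains a colon followed by a space.

```shell
# Write the rule into a standalone Prometheus rule file. The quoted 'EOF'
# keeps the shell from expanding $labels inside the templates.
rules_file="$(mktemp)"
cat > "$rules_file" <<'EOF'
groups:
- name: es-exporter-example   # hypothetical group name
  rules:
  - alert: HardwareErrorInLogs
    expr: aggregated_hits_by_cluster_doc_count > 0
    labels:
      severity: warning
    annotations:
      summary: 'Log record including hardware error phrase'
      description: 'Log record including hardware error phrase in {{ $labels.hits_by_cluster_cluster }} cluster (kublr space: {{ $labels.hits_by_cluster_space }}).'
EOF

# Validate only if promtool is on PATH; otherwise just report the file location.
if command -v promtool >/dev/null 2>&1; then
  promtool check rules "$rules_file"
else
  echo "promtool not found; rule file written to $rules_file"
fi
```

Note the difference from the deployment command earlier in this article, where the heredoc delimiter is left unquoted on purpose so that ${KIBANA_PASSWORD} is expanded by the shell.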
Reference
- Prometheus ES Exporter source code and documentation
- Prometheus ES Exporter docker image
- Prometheus ES Exporter helm chart
- Elasticsearch Query DSL
- Prometheus Querying
- Prometheus Alerting Rules