Tags: kcp, monitoring, rules
TABLE OF CONTENTS
- Change/increase Prometheus scrape timeout
- Customize Alert rules
- Add custom Prometheus recording rules via config map
- Customize OIDC Roles name for access
- Add custom scrape jobs
- Add inhibition rules
Change/increase Prometheus scrape timeout
spec:
  features:
    monitoring:
      values:
        prometheus:
          config:
            scrapeTimeout: 60s

Customize Alert rules
spec:
  features:
    monitoring:
      values:
        alertmanager:
          alerts:
          - alert: PrometheusTargetDown
            expr: '100 * (up{job=~"cluster-.*"} == 0) / (up{job=~"cluster-.*"}) > 0'
            for: 10m
            labels:
              kublr_space: demo
              kublr_cluster: kcp-demo
              severity: warning
            annotations:
              summary: 'Prometheus targets down!'
              description: '{{ "{{" }} printf "%0.00f" $value {{ "}}" }}% of the Prometheus job are down!'

Add custom Prometheus recording rules via config map
You can store additional rule files in a custom config map.
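As a rough sketch, such a config map might look like the following. The namespace, the file name, and the recording rule itself are assumptions made for illustration and are not taken from the Kublr documentation; only the config map name and the `*.yaml` file mask come from the specification in this section.

```yaml
# Hypothetical config map holding one Prometheus rule file.
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-rules-configmap   # must match the name referenced in the cluster spec
  namespace: kublr                   # assumption: the namespace where Prometheus runs
data:
  custom-recording.yaml: |           # assumption: any file name matching the '*.yaml' mask
    groups:
    - name: custom-recording-rules
      rules:
      # Illustrative rule: average value of `up` per scrape job.
      - record: job:up:avg
        expr: avg by (job) (up)
```

The cluster specification then references the config map by name: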
spec:
  features:
    monitoring:
      values:
        prometheus:
          config:
            extraRulesConfigmaps:
            - name: prometheus-rules-configmap
              fileMask: '*.yaml'

Customize OIDC Roles name for access
spec:
  features:
    monitoring:
      values:
        grafana:
          authentication:
            oidc:
              role: "admin"
        prometheus:
          authentication:
            oidc:
              role: "admin"
        alertmanager:
          authentication:
            oidc:
              role: "admin"

Add custom scrape jobs
Note that scrape jobs must be added to the specification of the cluster whose metrics are to be scraped, not to the Kublr Control Plane (KCP) cluster specification.
Scrape jobs added to the KCP cluster spec affect only the KCP cluster, not the managed clusters.
spec:
  features:
    monitoring:
      values:
        prometheus:
          config:
            scrapeJobs: |
              - job_name: istiod
                kubernetes_sd_configs:
                - namespaces:
                    names:
                    - istio-system
                  role: endpoints
                relabel_configs:
                - action: keep
                  regex: istiod;http-monitoring
                  source_labels:
                  - __meta_kubernetes_service_name
                  - __meta_kubernetes_endpoint_port_name

Add inhibition rules
You can customize inhibition rules as well. Depending on your needs, you can add or override inhibition rules in the Kublr Control Plane cluster specification:

values:
  alertmanager:
    config:
      default_receiver: info
      inhibit_rules:
      - equal:
        - silence_group
        - kublr_space
        - kublr_cluster
        source_matchers:
        - severity=~"warning|info"
        - kublr_space=~".+"
        - kublr_cluster=~".+"
        target_matchers:
        - severity!=critical
      receivers: |
        - name: info
          slack_configs:
          - api_url: 'https://hooks.slack.com/services/T28RX3E2K/B05C2FHL0PL/AuxIpnnlIwDYklzLIxAQM8U'
            channel: '#alerts-inhibit-rules-test'
        - name: critical
          slack_configs:
          - api_url: 'https://hooks.slack.com/services/T28RX3E2K/B05BN0KQJ0P/If3voMqToCasjW4PExOuiT7'
            channel: '#alerts-inhibit-rules-critical'
      routes: |
        - matchers:
          - severity=~critical
          receiver: critical
          continue: true
        - matchers:
          - severity=~"warning|info"
          receiver: info
          continue: true
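
For comparison, a common inhibition pattern is to mute lower-severity alerts while a critical alert with the same identity is firing. The sketch below is illustrative only: the choice of `alertname` in the `equal` list and the idea of adding this rule next to the one above are assumptions, not part of the original configuration.

```yaml
# Hypothetical extra entry for alertmanager.config.inhibit_rules.
inhibit_rules:
- source_matchers:
  - severity="critical"        # a firing alert matching this is the "source"
  target_matchers:
  - severity=~"warning|info"   # alerts matching this are muted while the source fires
  equal:                       # source and target must share these label values
  - alertname
  - kublr_space
  - kublr_cluster
```

An inhibited alert is still evaluated and still shown in Alertmanager; only its notifications are suppressed while a matching source alert is active.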