Tags: kcp, monitoring, rules
TABLE OF CONTENTS
- Change/increase Prometheus scrape timeout
- Customize Alert rules
- Add custom Prometheus recording rules via config map
- Customize OIDC Roles name for access
- Add custom scrape jobs
- Add inhibition rules
Change/increase Prometheus scrape timeout
spec:
  features:
    monitoring:
      values:
        prometheus:
          config:
            scrapeTimeout: 60s
Customize Alert rules
spec:
  features:
    monitoring:
      values:
        alertmanager:
          alerts:
            - alert: PrometheusTargetDown
              expr: '100 * (up{job=~"cluster-.*"} == 0) / (up{job=~"cluster-.*"}) > 0'
              for: 10m
              labels:
                kublr_space: demo
                kublr_cluster: kcp-demo
                severity: warning
              annotations:
                summary: 'Prometheus targets down!'
                description: '{{ "{{" }} printf "%0.00f" $value {{ "}}" }}% of the Prometheus job are down!'
Add custom Prometheus recording rules via config map
You can use a custom config map to store additional rule files:
spec:
  features:
    monitoring:
      values:
        prometheus:
          config:
            extraRulesConfigmaps:
              - name: prometheus-rules-configmap
                fileMask: '*.yaml'
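For illustration, a config map that the extraRulesConfigmaps entry could reference might look like the sketch below. The namespace (kublr), the file name (custom-recording.yaml), the group name, and the recorded metric name are assumptions made for this example and are not taken from the Kublr documentation; the file name only needs to match fileMask. The file body follows the standard Prometheus rule file format.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-rules-configmap   # must match the name given in extraRulesConfigmaps
  namespace: kublr                   # assumption: namespace where the monitoring feature runs
data:
  # Hypothetical file name; it only has to match fileMask ('*.yaml').
  custom-recording.yaml: |
    groups:
      - name: custom-recording-rules   # hypothetical group name
        rules:
          # Precompute the fraction of healthy targets per job.
          - record: job:up:avg
            expr: avg by (job) (up)
```

Each key under data becomes one rule file, so several rule files can live in the same config map as long as their names match the mask.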
Customize OIDC Roles name for access
spec:
  features:
    monitoring:
      values:
        grafana:
          authentication:
            oidc:
              role: "admin"
        prometheus:
          authentication:
            oidc:
              role: "admin"
        alertmanager:
          authentication:
            oidc:
              role: "admin"
Add custom scrape jobs
Note that scrape rules must be added to the specification of the cluster whose metrics are to be scraped, not to the Kublr Control Plane (KCP) cluster specification.
If the scrape rules are added to the KCP cluster spec, they affect only the KCP cluster, not the managed clusters.
spec:
  features:
    monitoring:
      values:
        prometheus:
          config:
            scrapeJobs: |
              - job_name: istiod
                kubernetes_sd_configs:
                  - namespaces:
                      names:
                        - istio-system
                    role: endpoints
                relabel_configs:
                  - action: keep
                    regex: istiod;http-monitoring
                    source_labels:
                      - __meta_kubernetes_service_name
                      - __meta_kubernetes_endpoint_port_name
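As a further sketch of the same mechanism, a job with a fixed target list could be added in place of, or alongside, a service-discovery job. The job name my-app and the target address are hypothetical; static_configs, metrics_path, and scrape_interval are standard Prometheus scrape configuration fields. It is assumed here that scrapeJobs accepts any valid list of Prometheus scrape configs.

```yaml
spec:
  features:
    monitoring:
      values:
        prometheus:
          config:
            scrapeJobs: |
              # Hypothetical job: scrape one fixed endpoint instead of using service discovery.
              - job_name: my-app
                metrics_path: /metrics
                scrape_interval: 30s
                static_configs:
                  - targets:
                      - my-app.default.svc.cluster.local:8080
```

As with the istiod job, this belongs in the specification of the cluster that runs the target workload.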
Add inhibition rules
You can customize inhibition rules as well. Depending on your needs, you can add or override inhibition rules in the Kublr Control Plane cluster specification:
values:
  alertmanager:
    config:
      default_receiver: info
      inhibit_rules:
        - equal:
            - silence_group
            - kublr_space
            - kublr_cluster
          source_matchers:
            - severity=~warning|info
            - kublr_space=~".+"
            - kublr_cluster=~".+"
          target_matchers:
            - severity!=critical
      receivers: |
        - name: info
          slack_configs:
            - api_url: 'https://hooks.slack.com/services/T28RX3E2K/B05C2FHL0PL/AuxIpnnlIwDYklzLIxAQM8U'
              channel: '#alerts-inhibit-rules-test'
        - name: critical
          slack_configs:
            - api_url: 'https://hooks.slack.com/services/T28RX3E2K/B05BN0KQJ0P/If3voMqToCasjW4PExOuiT7'
              channel: '#alerts-inhibit-rules-critical'
      routes: |
        - matchers:
            - severity=~critical
          receiver: critical
          continue: true
        - matchers:
            - severity=~warning|info
          receiver: info
          continue: true
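To make the mechanics of an inhibition rule easier to see, here is a minimal sketch in native Alertmanager configuration syntax (not the Kublr values layout). It mutes warning and info alerts while a critical alert with the same alertname and kublr_cluster labels is firing. The choice of labels in equal is an assumption for this example; adjust it to the labels your alerts actually carry.

```yaml
# Native Alertmanager configuration fragment, shown for illustration only.
inhibit_rules:
  - source_matchers:
      # Alerts that cause inhibition while they are firing.
      - severity="critical"
    target_matchers:
      # Alerts that are muted while a matching source alert is firing.
      - severity=~"warning|info"
    equal:
      # Source and target must agree on these labels for the rule to apply.
      - alertname
      - kublr_cluster
```

A target alert is muted only while at least one source alert is firing with identical values for every label listed in equal; once the source alert resolves, notifications for the target resume.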