Tags: kcp, monitoring, rules


TABLE OF CONTENTS

- Change/increase Prometheus scrape timeout
- Customize alert rules
- Add custom Prometheus recording rules via config map
- Customize OIDC role name for access
- Add custom scrape jobs
- Add inhibition rules


Change/increase Prometheus scrape timeout


spec:
  features:
    monitoring:
      values:
        prometheus:
          config:
            scrapeTimeout: 60s
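

After the spec change is applied, you can check how close each job's scrape duration comes to the new timeout using Prometheus's built-in scrape_duration_seconds metric, for example:


max by (job) (scrape_duration_seconds)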


Customize alert rules


spec:
  features:
    monitoring:
      values:
        alertmanager:
          alerts:
          - alert: PrometheusTargetDown
            expr: '100 * (up{job=~"cluster-.*"} == 0) / (up{job=~"cluster-.*"}) > 0'
            for: 10m
            labels:
              kublr_space: demo
              kublr_cluster: kcp-demo
              severity: warning
            annotations:
              summary: 'Prometheus targets down!'
              description: '{{ printf "%0.0f" $value }}% of the Prometheus job targets are down!'


Add custom Prometheus recording rules via config map


You can use a custom ConfigMap to store additional rule files; a sample ConfigMap manifest is shown after the spec snippet below.


spec:
  features:
    monitoring:
      values:
        prometheus:
          config:
            extraRulesConfigmaps:
              - name: prometheus-rules-configmap
                fileMask: '*.yaml'
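

For reference, a matching ConfigMap might look like the following; the namespace, file name, and recording rule are illustrative and should be adapted to your environment:


apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-rules-configmap
  # Assumption: the ConfigMap is created in the namespace where Kublr's
  # Prometheus runs (the kublr namespace in a default installation).
  namespace: kublr
data:
  # The file name must match the fileMask above ('*.yaml').
  recording-rules.yaml: |
    groups:
      - name: example.rules
        rules:
          # Illustrative recording rule: per-instance non-idle CPU rate.
          - record: instance:node_cpu:rate5m
            expr: sum by (instance) (rate(node_cpu_seconds_total{mode!="idle"}[5m]))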


Customize OIDC role name for access


spec:
  features:
    monitoring:
      values:
        grafana:
          authentication:
            oidc:
              role: "admin"
        prometheus:
          authentication:
            oidc:
              role: "admin"
        alertmanager:
          authentication:
            oidc:
              role: "admin"


Add custom scrape jobs

Note that scrape jobs must be added to the specification of the cluster whose metrics you want to scrape, not to the Kublr Control Plane (KCP) cluster specification.

If you add the scrape jobs to the KCP cluster spec, they will only affect the KCP cluster itself, not the managed clusters.


spec:
  features:
    monitoring:
      values:
        prometheus:
          config:
            scrapeJobs: |
              - job_name: istiod
                kubernetes_sd_configs:
                - namespaces:
                    names:
                      - istio-system
                  role: endpoints
                relabel_configs:
                - action: keep
                  regex: istiod;http-monitoring
                  source_labels:
                    - __meta_kubernetes_service_name
                    - __meta_kubernetes_endpoint_port_name
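

Once the updated spec is applied and Prometheus reloads its configuration, you can verify that the new job's targets are being scraped, for example with the query:


up{job="istiod"}
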

Add inhibition rules

You can customize inhibition rules as well. Depending on your needs, you can add or override inhibition rules in the Kublr Control Plane cluster specification:

spec:
  features:
    monitoring:
      values:
        alertmanager:
          config:
            default_receiver: info
            inhibit_rules:
              - equal:
                  - silence_group
                  - kublr_space
                  - kublr_cluster
                source_matchers:
                  - severity=~"warning|info"
                  - kublr_space=~".+"
                  - kublr_cluster=~".+"
                target_matchers:
                  - severity!=critical
            receivers: |
              - name: info
                slack_configs:
                  - api_url: 'https://hooks.slack.com/services/T28RX3E2K/B05C2FHL0PL/AuxIpnnlIwDYklzLIxAQM8U'
                    channel: '#alerts-inhibit-rules-test'
              - name: critical
                slack_configs:
                  - api_url: 'https://hooks.slack.com/services/T28RX3E2K/B05BN0KQJ0P/If3voMqToCasjW4PExOuiT7'
                    channel: '#alerts-inhibit-rules-critical'
            routes: |
              - matchers:
                - severity=~critical
                receiver: critical
                continue: true
              - matchers:
                - severity=~"warning|info"
                receiver: info
                continue: true
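

For the equal clause above to have any effect, the alerts involved must carry the silence_group label. A hypothetical alert rule that sets this label, following the format from the "Customize alert rules" section, might look like:


spec:
  features:
    monitoring:
      values:
        alertmanager:
          alerts:
          - alert: HighPodRestarts
            expr: 'increase(kube_pod_container_status_restarts_total[1h]) > 5'
            for: 10m
            labels:
              severity: warning
              # Hypothetical grouping label referenced by the inhibition rule above.
              silence_group: workload-health
            annotations:
              summary: 'Pod is restarting frequently'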