In some high-availability (HA) systems, it is essential to deploy certain components, such as Prometheus, RabbitMQ, or other critical services, on dedicated nodes for optimal performance and isolation. Kublr lets you manage these deployments by using taints and tolerations to restrict where specific pods can be scheduled, so that key services run on the appropriate nodes without interference from non-critical components.


In this guide, we will demonstrate how to apply taints and tolerations specifically for Kublr components like Prometheus, Alertmanager, and Grafana, ensuring these components run only on dedicated monitoring nodes.


Step 1: Defining Taints for the Nodes


To isolate your monitoring components (like Prometheus, Alertmanager, and Grafana) on specific nodes, you first need to define taints for these nodes. This ensures that non-monitoring pods are not scheduled on them unless they have the appropriate tolerations.


For example, let’s define a taint on the worker nodes where the monitoring components will run:

nodes:
  - autoscaling: true
    initialNodes: 1
    kublrVariant: aws-ubuntu-22.04
    locations:
      - aws:
          availabilityZones:
            - us-west-2c
          groupType: asg-lt
          instanceType: t3.xlarge
          sshKey: kublr
        locationRef: aws1
    name: monitoring
    kublrAgentConfig:
      taints:
        # taint format: <key>=<value>:<effect>; the value is left empty here
        node-role.kublr/monitoring: node-role.kublr/monitoring=:NoSchedule

In this configuration, we add a taint to the monitoring node group to prevent non-monitoring pods from being scheduled on its nodes unless they have a matching toleration.
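Once the node group is provisioned, the Kublr agent registers this taint on each node in the group. As a sketch of the expected result (which you can confirm with kubectl describe node or kubectl get node <node-name> -o yaml on one of the monitoring nodes), the taint appears in the Node object roughly as follows:

spec:
  taints:
    # rendered from "node-role.kublr/monitoring=:NoSchedule"
    - key: node-role.kublr/monitoring
      effect: NoSchedule   # the taint value is empty, so no value field is shown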


Step 2: Adding Tolerations for Monitoring Components


Next, to ensure that Prometheus, Alertmanager, and Grafana are deployed on the dedicated nodes, we need to add tolerations for these components. This allows them to be scheduled on nodes with the taint node-role.kublr/monitoring.


Below is the configuration for adding tolerations to the monitoring components:

monitoring:
  selfHosted:
    alertmanager:
      enabled: true
      persistent: true
      size: 2G
    grafana:
      enabled: true
      persistent: true
      size: 2G
    prometheus:
      enabled: true
      persistent: true
      size: 32G
  values:
    alertmanager:
      nodeSelector:
        kublr.io/node-group: monitoring
      tolerations:
        - effect: NoSchedule
          key: node-role.kublr/monitoring
    grafana:
      nodeSelector:
        kublr.io/node-group: monitoring
      tolerations:
        - effect: NoSchedule
          key: node-role.kublr/monitoring
    prometheus:
      nodeSelector:
        kublr.io/node-group: monitoring
      tolerations:
        - effect: NoSchedule
          key: node-role.kublr/monitoring

Here, the tolerations field allows the monitoring components to run on nodes carrying the node-role.kublr/monitoring taint, while the nodeSelector field pins them to the nodes of the monitoring group via the kublr.io/node-group label that Kublr assigns to each node group. Note that a toleration only permits scheduling on a tainted node; it does not force it, which is why both settings are used together.
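For reference, these settings are rendered into the pod specifications in standard Kubernetes form. Because operator is omitted, Kubernetes defaults it to Equal with an empty value, which is exactly what the empty-value taint from Step 1 requires. The resulting pod spec fragment looks roughly like this:

tolerations:
  - key: node-role.kublr/monitoring
    operator: Equal     # Kubernetes default when operator is omitted
    value: ""           # matches the taint's empty value
    effect: NoSchedule
nodeSelector:
  kublr.io/node-group: monitoring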


Step 3: Applying the Configuration


Once the taints and tolerations are defined, apply the updated cluster specification to your Kublr cluster. Prometheus, Alertmanager, and Grafana will then be scheduled only on the dedicated monitoring nodes, keeping them isolated from non-monitoring workloads.
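For orientation, the two fragments above live in different parts of the full cluster specification. The outline below is a sketch based on common Kublr cluster specs; the exact paths may vary between Kublr versions, so check them against your cluster's specification:

spec:
  nodes:
    # node groups, including the tainted "monitoring" group from Step 1
    - name: monitoring
      # ...
  features:
    monitoring:
      # self-hosted monitoring configuration from Step 2
      selfHosted:
        # ...
      values:
        # ...

After the spec is applied and the nodes are provisioned, you can verify placement with kubectl get pods -o wide: the Prometheus, Alertmanager, and Grafana pods should all land on nodes of the monitoring group.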


Conclusion


Using taints and tolerations in Kublr allows you to isolate critical components such as Prometheus and RabbitMQ on dedicated nodes. This gives important services in high-availability systems the resources and isolation they need, preventing interference from other workloads and keeping performance predictable.