In some high-availability (HA) systems, it is essential to deploy certain components, such as Prometheus, RabbitMQ, or other critical services, on dedicated nodes to ensure optimal performance and isolation. Kublr lets you manage these deployments by using taints and tolerations to restrict where specific pods can be scheduled, so you can separate workloads and ensure that key services run on the appropriate nodes without interference from non-critical components.
In this guide, we will demonstrate how to apply taints and tolerations specifically for Kublr components like Prometheus, Alertmanager, and Grafana, ensuring these components run only on dedicated monitoring nodes.
Step 1: Defining Taints for the Nodes
To isolate your monitoring components (Prometheus, Alertmanager, and Grafana) on specific nodes, you first need to define taints for these nodes. A taint ensures that no pods are scheduled on the tainted nodes unless they carry a matching toleration.
For example, let’s define a taint on the worker nodes where the monitoring components will run:
nodes:
  - autoscaling: true
    initialNodes: 1
    kublrVariant: aws-ubuntu-22.04
    locations:
      - aws:
          availabilityZones:
            - us-west-2c
          groupType: asg-lt
          instanceType: t3.xlarge
          sshKey: kublr
        locationRef: aws1
    name: monitoring
    kublrAgentConfig:
      taints:
        node-role.kublr/monitoring: node-role.kublr/monitoring=:NoSchedule
In this configuration, the kublrAgentConfig.taints entry applies a NoSchedule taint to every node in the monitoring group, preventing pods without a matching toleration from running there. The taint value follows the Kubernetes key=value:effect syntax; the value here is empty, which is why the string reads node-role.kublr/monitoring=:NoSchedule.
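As a sanity check, it helps to see how this agent-level setting surfaces in Kubernetes. Below is a minimal sketch of the resulting Node object, assuming Kublr labels each node with its node group via kublr.io/node-group (the label referenced by the node selectors in Step 2); the node name is a placeholder, so inspect your real nodes with kubectl describe node:

apiVersion: v1
kind: Node
metadata:
  name: monitoring-node-1             # placeholder; actual names are provider-assigned
  labels:
    kublr.io/node-group: monitoring   # node-group label used by the nodeSelector in Step 2
spec:
  taints:
    - key: node-role.kublr/monitoring
      effect: NoSchedule              # the taint value is empty, hence "=:NoSchedule" above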
Step 2: Adding Tolerations for Monitoring Components
Next, to ensure that Prometheus, Alertmanager, and Grafana are deployed on the dedicated nodes, we need to add tolerations for these components. This allows them to be scheduled on nodes with the taint node-role.kublr/monitoring.
Below is the configuration for adding tolerations to the monitoring components:
monitoring:
  selfHosted:
    alertmanager:
      enabled: true
      persistent: true
      size: 2G
    grafana:
      enabled: true
      persistent: true
      size: 2G
    prometheus:
      enabled: true
      persistent: true
      size: 32G
  values:
    alertmanager:
      nodeSelector:
        kublr.io/node-group: monitoring
      tolerations:
        - effect: NoSchedule
          key: node-role.kublr/monitoring
    grafana:
      nodeSelector:
        kublr.io/node-group: monitoring
      tolerations:
        - effect: NoSchedule
          key: node-role.kublr/monitoring
    prometheus:
      nodeSelector:
        kublr.io/node-group: monitoring
      tolerations:
        - effect: NoSchedule
          key: node-role.kublr/monitoring
Here, the tolerations field allows the monitoring components to be scheduled onto nodes carrying the node-role.kublr/monitoring taint, while the nodeSelector pins them to nodes labeled kublr.io/node-group: monitoring, the label corresponding to the node group defined in Step 1. Both settings are needed: a toleration only permits scheduling onto tainted nodes, it does not direct pods to them. The node selector is what keeps the monitoring components on the monitoring group, and the NoSchedule taint is what keeps all other pods off those nodes unless they also have a matching toleration.
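To make the mechanics concrete, here is a simplified sketch of the scheduling fields these values render into each monitoring pod's spec (chart-specific details omitted):

spec:
  nodeSelector:
    kublr.io/node-group: monitoring    # attracts the pod to the monitoring node group
  tolerations:
    - key: node-role.kublr/monitoring  # permits the pod despite the NoSchedule taint
      effect: NoSchedule               # operator defaults to Equal with an empty value,
                                       # matching the empty-valued taint from Step 1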
Step 3: Applying the Configuration
Once the taints and tolerations are defined, apply the updated specification to your Kublr cluster. After the change is reconciled, Prometheus, Alertmanager, and Grafana will be scheduled only on the dedicated monitoring nodes, keeping them isolated from non-monitoring workloads.
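Note that the two fragments above live in different parts of the cluster specification. The skeleton below sketches their approximate placement, assuming the usual Kublr spec layout (spec.nodes for node groups and spec.features.monitoring for the monitoring feature); verify the exact paths against your cluster's existing specification:

metadata:
  name: my-cluster            # placeholder cluster name
spec:
  locations:
    - name: aws1              # the location referenced by locationRef in Step 1
      # ...region, credentials, and other location settings...
  nodes:
    - name: monitoring
      # ...node group with the taint from Step 1...
  features:
    monitoring:
      # ...selfHosted settings and tolerations/values from Step 2...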
Conclusion
Using taints and tolerations in Kublr allows you to isolate critical components such as Prometheus or RabbitMQ on dedicated nodes. This gives high-availability systems the resources and isolation that important services need, preventing interference from other workloads and keeping performance predictable.