In most cases, a Kublr Control Plane installation runs its databases in non-HA mode, as single-replica deployments. Each database needs a persistent volume (PV) for data storage, and in a cloud environment each PV is created in a specific availability zone (AZ).

By default, Kublr uses the RollingUpdate strategy for each node group and drains each node to shut down its workloads gracefully before the node is upgraded.

Known issue and resolution

If the MongoDB or PostgreSQL database stops, the Kublr cluster controller cannot continue the upgrade, because it needs to save state to the database. The upgrade process stalls while the databases are down.

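If a cloud AZ contains only one worker node, the database pod cannot start while that node is drained: its PV is bound to that AZ, so the pod cannot be scheduled on a node in another AZ and waits until the node is uncordoned.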

How to identify:

The upgrade process stops, and the cluster-controller pod is in the CrashLoopBackOff state.
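
One way to check for both symptoms from the command line is sketched below. The helper names crashlooping_pods and cordoned_nodes are hypothetical and are not part of Kublr. They only filter the standard tabular output of kubectl, and they assume the default column layout of "kubectl get pods --all-namespaces" and "kubectl get nodes".

```shell
# Hypothetical helper (not part of Kublr): print namespace/name of every pod
# whose STATUS column is CrashLoopBackOff.
# Reads `kubectl get pods --all-namespaces` output on stdin
# (assumed columns: NAMESPACE NAME READY STATUS RESTARTS AGE).
crashlooping_pods() {
  awk 'NR > 1 && $4 == "CrashLoopBackOff" { print $1 "/" $2 }'
}

# Hypothetical helper (not part of Kublr): print the name of every cordoned node.
# Reads `kubectl get nodes` output on stdin
# (assumed columns: NAME STATUS ROLES AGE VERSION).
cordoned_nodes() {
  awk 'NR > 1 && $2 ~ /SchedulingDisabled/ { print $1 }'
}
```

Typical usage would be "kubectl get pods --all-namespaces | crashlooping_pods" followed by "kubectl get nodes | cordoned_nodes". A cluster-controller pod in the first list together with a cordoned worker node in the second is consistent with the issue described here.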

How to fix:

Use the skip-drain strategy for the worker node groups in the Control Plane cluster:

  - name: {{ GROUP_NAME }}
         skip: true