[Since Kublr 1.20]


TABLE OF CONTENTS

  • Overview and preparation
  • Change address space for a managed cluster (not KCP)
  • Change address space for KCP


Overview and preparation


Changing the cluster physical network CIDR may be necessary, for example, if the cluster VNet needs to be peered with another network and the address spaces of the two networks overlap.


The following considerations must be taken into account:

  • This operation requires full cluster downtime and takes from about 20 minutes for a small cluster to 30-50 minutes for the Kublr Control Plane.
  • Public IP addresses will not change: the master API public address, the ingress address, and the addresses of any public LoadBalancer Services will stay the same.
  • IP addresses of Services of type LoadBalancer allocated on a private load balancer will change (as the private address space is modified). In most cases this does not pose a problem, as the services are addressed by their DNS names.
  • Applications running in the cluster will not be lost and do not need to be redeployed.
  • Application data will not be lost as long as it is stored on external storage (e.g. Azure managed disks or external NFS).
  • Application data on local storage (stored on the cluster nodes) will be lost, so special planning is required to back up and restore such applications (note that this is not a best practice; consider changing your deployment architecture to store data on external storage).


To prepare for the migration and to be able to troubleshoot any unexpected situations, make sure that you have the following (a quick verification sketch follows the list):

  1. access to the Azure account and the resource group with the cluster
  2. the private SSH key to access the cluster instances
  3. the admin kubeconfig file for the cluster
  4. a copy of the cluster specification
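
To quickly verify these prerequisites, you can run checks like the following (admin.kubeconfig, cluster-ssh-key, and the SSH user and master IP are placeholders for your actual files and instance details):

# verify that the admin kubeconfig works
kubectl --kubeconfig admin.kubeconfig get nodes

# verify that the SSH key opens a session on a master instance
ssh -i cluster-ssh-key <user>@<master-ip> hostname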

It is recommended to back up the cluster before the migration, for example using Velero.
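
A minimal backup sketch, assuming Velero is already installed and configured in the cluster (the backup name pre-cidr-change is an example):

velero backup create pre-cidr-change --wait
velero backup describe pre-cidr-change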


Change address space for a managed cluster (not KCP)


1. Prepare the applications running in the cluster for downtime as necessary. You may want to gracefully shut down applications that require it to avoid data loss or corruption.


2. Open the Azure portal, find the cluster resource group, and manually remove the following resources (note that not all disks should be removed; an Azure CLI sketch follows the list):

  • delete all Virtual Machines
    VM names follow one of the patterns:
    • ${cluster-name}-master${i} - for example my-cluster-master2
    • ${cluster-name}-agent-${group-name}-${i} - for example my-cluster-agent-group1-2
    • ${cluster-name}-${group-name}${i} - for example my-cluster-master2 or my-cluster-group12
  • delete all Virtual Machine Scale Sets
    VMSS names follow one of the patterns:
    • ${cluster-name}-group-vmss-${group-name}-${i} - for example my-cluster-group-vmss-group1-1
    • ${cluster-name}-group-vmss-${group-name} - for example my-cluster-group-vmss-group1
  • delete all Network Interfaces
    NIC names follow one of the patterns:
    • ${cluster-name}-masterNic${i} - for example my-cluster-masterNic2 
    • ${cluster-name}-agentNic-${group-name}-${i} - for example my-cluster-agentNic-group1-2
    • ${cluster-name}-${group-name}Nic${i} - for example my-cluster-group1Nic2
  • delete all managed OS disks
    OS disk names follow one of the patterns:
    • ${cluster-name}-master${i}-osDisk - for example my-cluster-master2-osDisk
    • ${cluster-name}-agent-${group-name}-${i}-osDisk - for example my-cluster-agent-group1-2-osDisk
    • ${cluster-name}-${group-name}${i}-osDisk - for example my-cluster-group12-osDisk
  • NB! Very important! DO NOT DELETE the managed data disks with names following the patterns:
    • ${cluster-name}-master${i}-dataDisk - for example my-cluster-master2-dataDisk
    • ${cluster-name}-dynamic-pvc-${pvc-id} - for example my-cluster-dynamic-pvc-c8a2673c-dc13-4a28-99ad-6522b510578d
  • delete all internal load balancers
    Internal load balancer names follow one of the patterns:
    • ${cluster-name}-internal - for example my-cluster-internal
    • ${cluster-name}-LoadBalancer-private - for example my-cluster-LoadBalancer-private
  • delete all NAT Gateways
    NAT Gateway names follow the pattern ${cluster-name}-NatGateway - for example my-cluster-NatGateway.
    Before deleting, the NAT Gateway must be disassociated from the subnets.

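The same deletions can be scripted with the Azure CLI. The sketch below is illustrative rather than exhaustive: the resource group, VNet, subnet, and resource names are example values following the patterns above, and every name must be reviewed before deletion:

RG=my-cluster-rg    # example resource group name

az vm delete          -g "$RG" -n my-cluster-master2 --yes
az vmss delete        -g "$RG" -n my-cluster-group-vmss-group1
az network nic delete -g "$RG" -n my-cluster-masterNic2
az disk delete        -g "$RG" -n my-cluster-master2-osDisk --yes   # OS disks only, never data disks
az network lb delete  -g "$RG" -n my-cluster-internal

# disassociate the NAT Gateway from the subnet, then delete it
# (my-cluster-vnet and my-cluster-subnet are placeholder names)
az network vnet subnet update -g "$RG" --vnet-name my-cluster-vnet -n my-cluster-subnet --remove natGateway
az network nat gateway delete -g "$RG" -n my-cluster-NatGateway
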

3. Check the Virtual Network resource and verify that there are no attached devices left.

If there are still attached devices, make sure that the corresponding resources are removed or detached.

The Virtual Network resource must have no attached devices for the subnet CIDR change to succeed.
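
One way to check for remaining attached devices with the Azure CLI is to list the IP configurations attached to the VNet subnets (my-cluster-rg and my-cluster-vnet are placeholder names); an empty result means nothing is attached:

az network vnet show -g my-cluster-rg -n my-cluster-vnet --query "subnets[].ipConfigurations[].id" -o tsv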


4. Go to the Kublr UI, change spec.locations[0].azure.virtualNetworkSubnetCidrBlock to the desired CIDR value, and update the cluster.
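
For example, to move the cluster to the 10.10.0.0/16 address space (an example value; use your own non-overlapping CIDR), the relevant fragment of the cluster specification would look like this:

spec:
  locations:
    - azure:
        virtualNetworkSubnetCidrBlock: 10.10.0.0/16   # new, non-overlapping CIDR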


In addition, for multi-master clusters make sure that a simultaneous update policy is configured for the masters as follows.

The update policy can also be changed for the other worker groups and for the cluster as a whole to speed up the update:


spec:
  updateStrategy:
    type: ResetUpdate # default value RollingUpdate

  master:
    updateStrategy:
      rollingUpdate:
        maxUnavailable: 100% # default value 1
      drainStrategy:
        skip: true # default value false

  nodes:
    - ...
      updateStrategy:
        rollingUpdate:
          maxUnavailable: 100% # default value 1
        drainStrategy:
          skip: true # default value false


After the cluster update finishes, the update policy may be changed back to the default.


The Azure resources will be automatically updated and/or recreated, and the cluster should start back up.
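
Once the nodes rejoin the cluster, you can confirm that they received addresses from the new CIDR by checking the node internal IPs:

kubectl --kubeconfig admin.kubeconfig get nodes -o wide

The INTERNAL-IP column should show addresses within the new address space.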


Change address space for KCP


1. Run a Kublr Box container of the same version as the KCP that needs to be updated.
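
A sketch of the command, assuming Docker is available on the host and using 1.20.0 as an example image tag (use the tag matching your KCP version; see the Kublr installation documentation for the exact command):

docker run --name kublr -d --restart=unless-stopped -p 9080:9080 kublr/kublr:1.20.0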


2. Register the KCP in the Kublr Box.


3. Follow the procedure for the managed cluster described above.