[Affects Kublr 1.22.3 and earlier]

[Tags: azure, staticip]




Overview


Kublr 1.23 and later uses a newer default Azure API version, '2022-01-01', for Azure resources.

Due to changes in the default availability zone settings between different versions of the Azure resource API, some resources cannot be updated when an Azure cluster created in Kublr 1.22 or earlier is updated in Kublr 1.23 or later. Azure rejects the attempt to change the availability zone settings of an existing resource and interrupts the update.


The issue can be resolved by explicitly specifying the current availability zone settings for the problematic resources in the Kublr cluster specification.


Prerequisites


1. Kublr Control Plane upgraded to, or running, v1.23.0 or later

2. Azure cluster created in Kublr Control Plane v1.22.3 or earlier


Issue


When the Azure cluster update process is started, the user may see an Azure Deployment error in the Events tab in the UI, with the following or a similar message:


Azure Location deployment failed
Failed.
{
  "code": "DeploymentFailed",
  "message": "At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-deployment-operations for usage details.",
  "details": [
    {
      "code": "ResourceAvailabilityZonesCannotBeModified",
      "message": "Resource /subscriptions/****/resourceGroups/****/providers/Microsoft.Network/publicIPAddresses/****-MasterIp has an existing availability zone constraint 1, 2, 3 and the request has availability zone constraint NoZone, which do not match. Zones cannot be added/updated/removed once the resource is created. The resource cannot be updated from regional to zonal or vice-versa."
    }
  ]
}


In some cases, due to a known issue in the UI, the error message may be missing the details:


Azure Location deployment failed
failed to update location: 'Azure Location deployment failed: Failed.
{}'

In this case, the error details can be confirmed directly in the Azure portal, in the Deployment Azure resource view of the corresponding cluster.


Root cause


Starting with v1.23.0, Kublr uses Azure apiVersion '2022-01-01' for Azure resources in Azure Deployments.

Azure clusters created by earlier Kublr Control Plane versions deploy their public IP addresses with apiVersion '2018-08-01'; with that apiVersion, the public IP addresses end up with an availability zone constraint by default (zones 1, 2, 3 in the error above).

When such a cluster is redeployed with apiVersion '2022-01-01', the deployment attempts to reconfigure the public IP addresses with 'NoZone' availability, and Azure rejects the change because the availability zones of an existing resource cannot be modified.


Solutions


Solution 1: stay on the old apiVersion


Update the cluster specification so that the old apiVersion is used for the impacted components: loadBalancerPublicIP, loadBalancerPrivateFrontendIPConfig, and natGatewayPublicIP.

Use the following cluster specification changes:


spec:
  locations:
    - azure:
        armTemplateExtras:
          loadBalancerPrivate:
            apiVersion: '2018-08-01'
          loadBalancerPublicIP:
            apiVersion: '2018-08-01'
          natGatewayPublicIP:
            apiVersion: '2018-08-01'


Solution 2: use availability zone constraints


You can explicitly specify zone constraints for the affected components in the cluster specification:


spec:
  locations:
    - azure:
        armTemplateExtras:
          loadBalancerPrivateFrontendIPConfig:
            zones: ['1', '2', '3']
          loadBalancerPublicIP:
            zones: ['1', '2', '3']
          natGatewayPublicIP:
            zones: ['1', '2', '3']


Make sure to check the specific list of zones configured on the resources in the Azure portal: different Azure regions may have different sets of zones, or no zones at all.
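
For example, if the Azure portal shows that the public IP addresses and the load balancer frontend were created in a single zone rather than being zone-redundant, the zone constraint in the specification must match what Azure reports. The values below are illustrative only, assuming the resources reside in zone 1:


spec:
  locations:
    - azure:
        armTemplateExtras:
          loadBalancerPrivateFrontendIPConfig:
            zones: ['1']   # illustrative value; use the zones reported by Azure
          loadBalancerPublicIP:
            zones: ['1']
          natGatewayPublicIP:
            zones: ['1']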


Solution 3: recreate the affected resources


Important Note! This is not a recommended solution: the public IP addresses used by the cluster will change, which may result in cluster and/or workload downtime!


Open the Azure portal https://portal.azure.com/ and navigate to the cluster's resource group:

  • Delete the LoadBalancers named cluster-name and cluster-name-internal
  • Detach the NATGateway cluster-name-NatGateway from all networks
  • Delete the PublicIP addresses cluster-name-MasterIp and cluster-name-NatIP
  • Delete the NATGateway cluster-name-NatGateway
  • Change the master group update policy in the cluster specification as follows and run the cluster update:


spec:
  master:
    updateStrategy:
      drainStrategy:
        skip: true
      rollingUpdate:
        maxUnavailable: 100%


  • Wait for the cluster to recover and become healthy
  • Change the master group update policy back to its previous values (see the example below)
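

Restoring the policy means reverting drainStrategy.skip and rollingUpdate.maxUnavailable to the values the master group used before the change. The following is a minimal sketch, assuming the group previously drained nodes and updated one master at a time; the exact values may differ in your cluster, so check the original specification:


spec:
  master:
    updateStrategy:
      drainStrategy:
        skip: false          # re-enable node draining (assumed previous value)
      rollingUpdate:
        maxUnavailable: 1    # assumed previous value; restore what your spec used before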