Tags: ingress, cert-manager


Overview


Older versions of the cert-manager component included in Kublr may contain a known issue where cert-manager cannot be updated because of a circular dependency involving the cert-manager Kubernetes webhooks.


The issue is tracked in the cert-manager GitHub project at https://github.com/cert-manager/cert-manager/issues/4771


The issue manifests when, during a cluster update, Kublr tries to update the ingress feature and reports the following (or a similar) error for the feature in the cluster Events and Status views:


Cluster update is in process: Unable to deploy helm package: kublr-ingress (kublr-feature-ingress:1.17.1-12): could not execute command '"helm --debug upgrade --install --namespace kube-system kublr-ingress /tmp/downloads/repo.kublr.com/repository/helm/kublr-feature-ingress-1.17.1-12.tgz --reset-values --values /tmp/helm/kublr-feature-ingress110152841 --values /tmp/helm/kublr-feature-ingress017703508"': Error: UPGRADE FAILED: Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post https://kubernetes.default.svc:443/apis/webhook.cert-manager.io/v1beta1/mutations?timeout=30s: x509: certificate has expired or is not yet valid: exit status 1
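To tell this issue apart from other failed ingress updates, the error text can be checked for its two distinguishing parts: the failing webhook name and the x509 expiry message. The following is a minimal sketch, assuming the error message has been copied into a file or piped in; the function name is hypothetical, and because the wording may differ slightly ("or a similar" error), a non-match does not rule the issue out.

```shell
#!/bin/sh
# Hypothetical helper: reads an error message on stdin and returns 0 when it
# contains both fragments of the error shown above, non-zero otherwise.
is_certmanager_webhook_issue() {
    msg=$(cat)
    # -F treats the pattern as a fixed string, -q suppresses output.
    printf '%s\n' "$msg" | grep -qF 'failed calling webhook "webhook.cert-manager.io"' &&
        printf '%s\n' "$msg" | grep -qF 'x509: certificate has expired or is not yet valid'
}
```

For example, `is_certmanager_webhook_issue < error.txt && echo "matches the known issue"`, where error.txt is an assumed file holding the copied message.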


Mitigation


The problem is fixed in newer versions of cert-manager and Kublr.


For older versions, use the following procedure as a workaround:


1. Run the following commands in the affected cluster:


# delete pre-upgrade hook resources potentially left over after
# unsuccessful feature update

kubectl delete clusterroles cert-manager-crd-init-kube-system
kubectl delete clusterrolebinding cert-manager-crd-init-kube-system
kubectl delete configmap -n kube-system kublr-ingress-crd
kubectl delete sa -n kube-system cert-manager-crd-init-kube-system
kubectl delete job -n kube-system kublr-ingress-certmanager-crd-job

# delete cert-manager Kubernetes API hooks causing the issue (they will
# be restored on successful feature update)

kubectl delete MutatingWebhookConfiguration kublr-ingress-certmanager-webhook
kubectl delete ValidatingWebhookConfiguration kublr-ingress-certmanager-webhook


2. Run the cluster update again.
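Between steps 1 and 2, it can be useful to confirm that both webhook configurations are actually gone, since a leftover one would make the update fail the same way. The following is a minimal sketch, assuming kubectl is configured for the affected cluster; the function name and its return codes are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 when neither webhook configuration deleted in
# step 1 exists, 1 when one still exists, and 2 when kubectl itself fails.
webhooks_removed() {
    for kind in MutatingWebhookConfiguration ValidatingWebhookConfiguration; do
        # With --ignore-not-found, a missing object yields exit code 0 and
        # empty output, so a non-zero exit means a real error (for example,
        # the API server is unreachable).
        out=$(kubectl get "$kind" kublr-ingress-certmanager-webhook \
            --ignore-not-found -o name) || return 2
        if [ -n "$out" ]; then
            echo "still present: $out" >&2
            return 1
        fi
    done
    return 0
}
```

For example, `webhooks_removed && echo "safe to rerun the cluster update"`.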