Tags: security, cert-manager, CRD




Overview


In some situations a Kubernetes CRD may get stuck in a state where its CR objects are inaccessible and, at the same time, cannot be removed.


One example of conditions that may lead to this situation: a CRD is upgraded to a newer version, the storage format of the corresponding CR objects is migrated to the newer version, and then the CRD definition is downgraded for one reason or another while the conversion webhook is not working. The objects become completely inaccessible (with an error message similar to "conversion webhook ... failed ..."), and the corresponding CR and CRD objects can be neither removed nor edited.
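
To see which versions are involved, it can help to compare the versions recorded as stored with the versions the CRD currently defines. The following is a minimal sketch, assuming jq is installed; the function name crd_versions is hypothetical, and it only reads the CRD JSON passed on stdin.

```shell
# Summarize the versions of a CRD from its JSON representation on stdin:
# versions recorded as persisted in etcd, the conversion strategy, and
# every version defined in the spec with its served/storage flags.
crd_versions() {
    jq -r '
        "stored: \((.status.storedVersions // []) | join(","))",
        "conversion: \(.spec.conversion.strategy // "None")",
        (.spec.versions[] | "version \(.name): served=\(.served) storage=\(.storage)")
    '
}

# Usage (against a live cluster):
#   kubectl get crd clusterissuers.cert-manager.io -o json | crd_versions
```

If the conversion strategy is Webhook and more than one version is listed as stored, reading the objects depends on the conversion webhook being reachable.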


This situation often arises with cert-manager when the installation of a newer cert-manager version goes awry and cert-manager needs to be rolled back to the version built into Kublr.


This article describes tactics that may help to recover in this and similar situations.


1. Recover the CRD new version webhooks


The simplest way to recover the CRD functionality is to restore the webhooks that work with the stored objects.


The downside of this method is that it does not solve the problem of rolling back, and it might not be possible in some cases.
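
Before restoring a webhook, it is useful to know which service the CRD expects to call for conversion. The sketch below is one possible way to find it, assuming jq is installed and the webhook is configured through a service reference rather than a URL; the function name crd_conversion_service is hypothetical.

```shell
# Print "<namespace>/<name>" of the service that the CRD's conversion
# webhook points to, reading the CRD JSON on stdin.
# Prints nothing if the CRD has no service-based conversion webhook.
crd_conversion_service() {
    jq -r '.spec.conversion.webhook.clientConfig.service // empty
           | "\(.namespace)/\(.name)"'
}

# Usage (against a live cluster):
#   kubectl get crd clusterissuers.cert-manager.io -o json | crd_conversion_service
# Then check whether that service has ready endpoints, for example:
#   kubectl get endpoints -n <namespace> <name>
```

If the service does not exist or has no endpoints, conversion requests fail, which matches the "conversion webhook ... failed ..." symptom described above.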


2. Clean up CRDs


In this situation CRDs may be neither accessible nor removable because of finalizers that require removal of the child objects, which in turn cannot be removed because of the hook failure.


If removing a CRD as shown below does not succeed and hangs, the most probable cause is the Kubernetes API server trying to remove the corresponding CR objects as required by the CRD finalizer(s):


CRD=clusterissuers.cert-manager.io

kubectl delete crd "${CRD}"
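

While the delete command hangs, the state of the CRD can be checked from another terminal. The following sketch, assuming jq is installed, prints the deletion timestamp and the finalizers still present on the object; the function name crd_deletion_state is hypothetical.

```shell
# Print the deletion timestamp and remaining finalizers of an object,
# reading its JSON representation on stdin.
crd_deletion_state() {
    jq -r '"deletionTimestamp=\(.metadata.deletionTimestamp // "none") finalizers=\((.metadata.finalizers // []) | join(","))"'
}

# Usage (against a live cluster):
#   kubectl get crd clusterissuers.cert-manager.io -o json | crd_deletion_state
```

A CRD that shows a deletion timestamp together with a non-empty list of finalizers has been marked for deletion but is still waiting for the cleanup to finish.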


The finalizers can still be removed from the CRD using raw kubectl queries, which makes it possible to remove the CRD without cleaning up its CR objects from etcd. The following command clears the finalizers from a CRD (or from any other object, given the corresponding API path):


CRD=clusterissuers.cert-manager.io

kubectl get \
    --raw "/apis/apiextensions.k8s.io/v1/customresourcedefinitions/${CRD}" |
    jq 'del(.metadata.finalizers)' |
    kubectl replace \
        --raw "/apis/apiextensions.k8s.io/v1/customresourcedefinitions/${CRD}" -f -


Removing the finalizers makes it possible for the Kubernetes API server to complete the CRD removal.


3. Clean up leftover CRs


Even after the CRDs are force-removed, the corresponding CR objects remain in the etcd database, which may lead to the same issue after the CRDs are installed again.


These CRs can be removed from the etcd database directly using etcdctl as follows:


CRD_GROUP=cert-manager.io

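# Pick the first etcd pod (there may be several on multi-master clusters)
# and delete every key stored under the API group prefix.
kubectl exec -it -n kube-system \
    "$(kubectl get pods -n kube-system -o json |
        jq -r 'first(.items[].metadata.name|select(startswith("k8s-etcd")))')" -- \
    etcdctl del --prefix "/registry/${CRD_GROUP}/"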
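

Since deleting keys directly from etcd cannot be undone, it may be worth listing the affected keys first. The sketch below assumes jq is installed and that etcdctl inside the etcd pod needs no additional connection flags, as in the command above; the function names etcd_pod_name and list_cr_keys are hypothetical.

```shell
# Print the name of the first pod whose name starts with "k8s-etcd",
# reading `kubectl get pods -o json` output on stdin.
etcd_pod_name() {
    jq -r 'first(.items[].metadata.name | select(startswith("k8s-etcd")))'
}

# List, without deleting, the etcd keys stored under an API group.
# $1 is the API group, for example cert-manager.io.
list_cr_keys() {
    kubectl exec -n kube-system \
        "$(kubectl get pods -n kube-system -o json | etcd_pod_name)" -- \
        etcdctl get --prefix --keys-only "/registry/$1/"
}

# Usage (against a live cluster):
#   list_cr_keys cert-manager.io
```

The printed keys are the ones that the etcdctl del --prefix command would remove for the same group.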