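A few months ago, while uninstalling a Ceph cluster (deployed with Rook), I ran into trouble deleting its namespace.
The basics
I thought I had properly deleted the Ceph objects in the cluster, so as a last step I simply deleted the namespace: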
kubectl --context=sandbox delete ns rook-ceph
However, when I tried to check that it had actually been deleted:
kubectl --context=sandbox get ns rook-ceph
NAME STATUS AGE
rook-ceph Terminating 88d
The namespace still exists and is stuck in the "Terminating" state.
Let's give it a moment...
Fine, let me try deleting it again. No luck:
kubectl --context=sandbox delete ns rook-ceph
Error from server (Conflict): Operation cannot be fulfilled on namespaces "rook-ceph": The system is ensuring all content is removed from this namespace. Upon completion, this namespace will automatically be purged by the system
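To find out what the system is "ensuring", you can read the conditions the namespace controller writes into the namespace status. Recent Kubernetes versions (1.16 and later, if I remember correctly) set conditions such as NamespaceContentRemaining or NamespaceFinalizersRemaining. What follows is a minimal sketch. The function name ns_conditions is hypothetical, and it assumes kubectl already points at the right context:

```shell
# Hypothetical helper: print the status conditions of a namespace stuck in Terminating.
# Assumes kubectl is on PATH and the current context targets the affected cluster.
ns_conditions() {
  local ns="$1"
  # One line per condition, formatted as "<type>=<status>: <message>"
  kubectl get namespace "$ns" \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status}: {.message}{"\n"}{end}'
}
```

For example, `ns_conditions rook-ceph` should indicate which kind of content or finalizer is still holding the namespace.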
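Forcing the deletion
A quick web search suggests adding the --force flag, which has to be combined with the --grace-period=0 flag (if you leave it out, kubectl tells you to add it...):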
kubectl --context=sandbox delete ns rook-ceph --force --grace-period=0
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
Error from server (Conflict): Operation cannot be fulfilled on namespaces "rook-ceph": The system is ensuring all content is removed from this namespace. Upon completion, this namespace will automatically be purged by the system.
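Checking that no Kubernetes objects remain in the namespace or linked to it
Usually, when I can't delete a namespace, it's because a PVC is still hanging around, itself bound to a PV. But not this time: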
kubectl --namespace=rook-ceph get pvc
No resources found in rook-ceph namespace.
kubectl get pv
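PVs are cluster-scoped, so `kubectl get pv` lists every volume in the cluster. A quicker approach is to keep only the PVs whose claim points into the namespace being deleted. What follows is a minimal sketch that uses a kubectl JSONPath filter on spec.claimRef.namespace. The function name pvs_for_namespace is hypothetical:

```shell
# Hypothetical helper: list PersistentVolumes whose claimRef points into the given namespace.
# PVs are cluster-scoped, so they are queried without -n.
pvs_for_namespace() {
  local ns="$1"
  # The JSONPath filter keeps only items whose spec.claimRef.namespace equals $ns
  kubectl get pv \
    -o jsonpath="{range .items[?(@.spec.claimRef.namespace==\"$ns\")]}{.metadata.name}{\"\\n\"}{end}"
}
```

Empty output from `pvs_for_namespace rook-ceph` means no PV is still tied to the namespace.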
Check for leftover CRDs!
With Rook, another thing to check is whether StorageClasses or CRDs (CustomResourceDefinition) are still lying around and blocking the operation:
kubectl delete storageclass rook-ceph-block
Error from server (NotFound): storageclasses.storage.k8s.io "rook-ceph-block" not found
kubectl delete storageclass rook-cephfs
Error from server (NotFound): storageclasses.storage.k8s.io "rook-cephfs" not found
kubectl --context=sandbox get crd
volumes.rook.io 2024-11-24T09:46:08Z
kubectl --context=sandbox delete crd volumes.rook.io
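On a cluster that runs other operators, `kubectl get crd` can return a long list. Rook's CRD names end in rook.io (volumes.rook.io above, or cephclusters.ceph.rook.io). A minimal sketch that narrows the list on that suffix follows. Both the function name rook_crds and the suffix filter are my own assumptions:

```shell
# Hypothetical helper: list CRDs whose name ends in "rook.io".
# "kubectl get crd -o name" prints entries like
#   customresourcedefinition.apiextensions.k8s.io/volumes.rook.io
rook_crds() {
  # "|| true" keeps the exit status at 0 when no Rook CRD is left
  kubectl get crd -o name | grep -E '\.rook\.io$' || true
}
```

I would review the output before passing any entry to `kubectl delete`.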
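Using a script to unblock objects stuck in "Terminating"
This was starting to get annoying. Obviously I'm not the first person to hit this problem, and people have written scripts that make it easier to delete objects stuck in the Terminating phase. These scripts handle most of the common cases. But use them with care (make sure you understand what they do)!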
git clone https://github.com/thyarles/knsk
cd knsk
kubectl config use-context sandbox
Switched to context "sandbox".
chmod +x knsk.sh
./knsk.sh
Deleting rook-ceph... done!
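Listing every object in the rook-ceph namespace
Last but not least: the solution I finally found starts with the following command, which lists every namespaced resource type in the cluster that supports the list verb: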
kubectl api-resources --verbs=list --namespaced -o name
From there, I piped that list into a small xargs to query each of those resource types in the rook-ceph namespace. And then, surprise:
kubectl api-resources --verbs=list --namespaced -o name | xargs -n 1 kubectl get -n rook-ceph
No resources found in rook-ceph namespace.
No resources found in rook-ceph namespace.
No resources found in rook-ceph namespace.
[...]
No resources found in rook-ceph namespace.
NAME DATADIRHOSTPATH MONCOUNT AGE STATE HEALTH
rook-ceph /var/lib/rook 1 88d Created HEALTH_OK
No resources found in rook-ceph namespace.
[...]
No resources found in rook-ceph namespace.
Error from server (NotAcceptable): the server was unable to respond with a content type that the client supports (get pods.metrics.k8s.io)
No resources found in rook-ceph namespace.
[...]
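The raw xargs output is mostly noise, and it does not say which resource type the surviving line belongs to. What follows is a minimal sketch of the same loop that discards stderr (the "No resources found" lines and the metrics.k8s.io error) and prints only objects that actually exist. The function name list_ns_objects is hypothetical:

```shell
# Hypothetical helper: print only the objects that really exist in a namespace,
# across every namespaced resource type that supports "list".
list_ns_objects() {
  local ns="$1" resource out
  for resource in $(kubectl api-resources --verbs=list --namespaced -o name); do
    # "-o name" prints "<kind>.<group>/<name>"; stderr noise is discarded
    out=$(kubectl get "$resource" -n "$ns" -o name 2>/dev/null)
    if [ -n "$out" ]; then
      printf '%s\n' "$out"
    fi
  done
}
```

With `-o name`, the leftover object appears as something like cephcluster.ceph.rook.io/rook-ceph, so its type is visible straight away.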
Oops! There was a CephCluster object (a Rook custom resource) that I had forgotten to delete! It just didn't show up in the usual queries.
kubectl -n rook-ceph get cephcluster
NAME DATADIRHOSTPATH MONCOUNT AGE STATE HEALTH
rook-ceph /var/lib/rook 1 88d Created HEALTH_OK
kubectl -n rook-ceph delete cephcluster rook-ceph
cephcluster.ceph.rook.io "rook-ceph" deleted
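Not over yet!
Unfortunately, it's not quite over! The delete command reported success, yet the namespace is still stuck in Terminating, and running the delete on the CephCluster again just hangs until I interrupt it: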
kubectl -n rook-ceph delete cephcluster rook-ceph
cephcluster.ceph.rook.io "rook-ceph" deleted
^C
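In that case, you can re-run the knsk script, or patch the object by hand to empty its "finalizers" metadata and unblock the deletion. The result is the same, but I'm showing the manual way so that you understand what you're doing:
kubectl -n rook-ceph patch cephclusters.ceph.rook.io rook-ceph -p '{"metadata":{"finalizers": []}}' --type=merge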
cephcluster.ceph.rook.io/rook-ceph patched
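Emptying finalizers skips whatever cleanup the owning controller (here, presumably the Rook operator) was waiting to perform, so it is worth looking at them before wiping them. What follows is a minimal sketch that prints an object's finalizers and then applies the same merge patch as above. The function name clear_finalizers and its argument order are hypothetical:

```shell
# Hypothetical helper: show the finalizers on one namespaced object, then clear them.
# Arguments: <namespace> <resource> <name>
clear_finalizers() {
  local ns="$1" resource="$2" name="$3"
  # Print the current finalizers so you know which cleanup is being skipped
  echo "finalizers on $resource/$name: $(kubectl -n "$ns" get "$resource" "$name" -o jsonpath='{.metadata.finalizers}')"
  # Same merge patch as the manual command: replace the finalizer list with an empty one
  kubectl -n "$ns" patch "$resource" "$name" --type=merge -p '{"metadata":{"finalizers":[]}}'
}
```

For the case above, the call would be `clear_finalizers rook-ceph cephclusters.ceph.rook.io rook-ceph`.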
Your namespace is now deleted. Have a nice day!