This Knowledge Base article should work on any Kubernetes distribution, and likely with any storage provider. In this example we are using OpenShift 4 integrated with vSphere.
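As an aside: if your StorageClass supports online expansion (allowVolumeExpansion: true), you may be able to skip this whole procedure and simply patch the PVC's requested size. Whether that works depends on your Kubernetes version and storage provisioner, which is why this article walks through the manual route. A minimal sketch, assuming expansion is supported:

# Sketch: only works if the StorageClass sets allowVolumeExpansion: true
$ oc -n openshift-image-registry patch pvc image-registry-storage \
    --type merge -p '{"spec":{"resources":{"requests":{"storage":"600Gi"}}}}'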
Confirm you have run out of storage
Here is an example of the OpenShift container registry filling up:
$ oc get po -n openshift-image-registry
NAME                                               READY   STATUS      RESTARTS   AGE
cluster-image-registry-operator-684cdfddf7-828kw   2/2     Running     1          23d
image-pruner-1626134400-gkc8f                      0/1     Completed   0          2d9h
image-pruner-1626220800-7j2s8                      0/1     Completed   0          33h
image-pruner-1626307200-hwn6w                      0/1     Completed   0          9h
image-registry-58d95dc7b5-65wmx                    1/1     Running     0          13d
manualrun2-x97z7                                   0/1     Completed   0          21d
node-ca-8lztl                                      1/1     Running     0          23d
node-ca-ch2d4                                      1/1     Running     0          23d
node-ca-dnc8s                                      1/1     Running     0          23d
node-ca-gn22v                                      1/1     Running     0          23d
node-ca-k4ss9                                      1/1     Running     0          23d
node-ca-mkhxq                                      1/1     Running     0          23d
node-ca-mmcpj                                      1/1     Running     0          23d
node-ca-ptpmp                                      1/1     Running     0          23d
node-ca-qd4m9                                      1/1     Running     0          23d
node-ca-rjsbz                                      1/1     Running     0          23d
node-ca-s72ts                                      1/1     Running     0          23d
node-ca-v65p2                                      1/1     Running     0          23d
quickrun3-9qwdc                                    0/1     Completed   0          9d
Shell into the pod
$ oc -n openshift-image-registry rsh image-registry-58d95dc7b5-65wmx
sh-4.2$ df -h
Filesystem                             Size  Used  Avail  Use%  Mounted on
overlay                                120G   79G    42G   66%  /
tmpfs                                   64M     0    64M    0%  /dev
tmpfs                                  7.9G     0   7.9G    0%  /sys/fs/cgroup
shm                                     64M     0    64M    0%  /dev/shm
tmpfs                                  7.9G  6.4M   7.9G    1%  /etc/passwd
/dev/sdd                               492G  491G   364M  100%  /registry
tmpfs                                  7.9G  8.0K   7.9G    1%  /etc/secrets
/dev/mapper/coreos-luks-root-nocrypt   120G   79G    42G   66%  /etc/hosts
tmpfs                                  7.9G  4.0K   7.9G    1%  /var/lib/kubelet
tmpfs                                  7.9G   28K   7.9G    1%  /run/secrets/kubernetes.io/serviceaccount
tmpfs                                  7.9G     0   7.9G    0%  /proc/acpi
tmpfs                                  7.9G     0   7.9G    0%  /proc/scsi
tmpfs                                  7.9G     0   7.9G    0%  /sys/firmware
Here we can see that /registry is full:
/dev/sdd 492G 491G 364M 100% /registry
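If you want a rough breakdown of what is consuming that space, du from the same rsh session can help (a sketch; the directory layout under /registry depends on your registry configuration):

# Summarize usage per top-level directory under /registry
sh-4.2$ du -sh /registry/*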
Determine the location of the VMDK
Next, find the details of the persistent volume (PV) by first looking at the persistent volume claim (PVC):
$ oc get pvc -n openshift-image-registry
NAME                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
image-registry-storage   Bound    pvc-73d4242f-cc86-480f-b5a5-a4ff7c5b19b3   250Gi      RWO            thin           160d
That tells us the PV name is pvc-73d4242f-cc86-480f-b5a5-a4ff7c5b19b3
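If you prefer a one-liner, a jsonpath query pulls out just the volume name (a sketch using the claim from this example):

# Print only the bound PV name for the image-registry-storage claim
$ oc -n openshift-image-registry get pvc image-registry-storage \
    -o jsonpath='{.spec.volumeName}'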
Let's get the details of that PV:
$ oc get pv pvc-73d4242f-cc86-480f-b5a5-a4ff7c5b19b3 -oyaml
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    kubernetes.io/createdby: vsphere-volume-dynamic-provisioner
    pv.kubernetes.io/bound-by-controller: "yes"
    pv.kubernetes.io/provisioned-by: kubernetes.io/vsphere-volume
  creationTimestamp: "2021-02-05T09:50:37Z"
  finalizers:
  - kubernetes.io/pv-protection
  name: pvc-73d4242f-cc86-480f-b5a5-a4ff7c5b19b3
  resourceVersion: "246560361"
  selfLink: /api/v1/persistentvolumes/pvc-73d4242f-cc86-480f-b5a5-a4ff7c5b19b3
  uid: 9a60e7cf-ae3e-446e-95a3-64b864300aed
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 250Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: image-registry-storage
    namespace: openshift-image-registry
    resourceVersion: "246556792"
    uid: 73d4242f-cc86-480f-b5a5-a4ff7c5b19b3
  persistentVolumeReclaimPolicy: Delete
  storageClassName: thin
  volumeMode: Filesystem
  vsphereVolume:
    fsType: ext4
    volumePath: '[G15_T1_R1-R4_OpenShift_LNX4D_2419] kubevols/dev-01-rb-8gps5-dynamic-pvc-73d4242f-cc86-480f-b5a5-a4ff7c5b19b3.vmdk'
status:
  phase: Bound
This is the important part
vsphereVolume:
  fsType: ext4
  volumePath: '[G15_T1_R1-R4_OpenShift_LNX4D_2419] kubevols/dev-01-rb-8gps5-dynamic-pvc-73d4242f-cc86-480f-b5a5-a4ff7c5b19b3.vmdk'
That tells us the filesystem (ext4) and the location in vSphere.
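Again, a jsonpath sketch can pull out just those two fields without the full YAML:

# Print the filesystem type and vSphere volume path of the PV
$ oc get pv pvc-73d4242f-cc86-480f-b5a5-a4ff7c5b19b3 \
    -o jsonpath='{.spec.vsphereVolume.fsType}{"\n"}{.spec.vsphereVolume.volumePath}{"\n"}'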
Contact your friendly vSphere admin and ask them to increase the size of the VMDK in the datastore "G15_T1_R1-R4_OpenShift_LNX4D_2419", at the exact path "kubevols/dev-01-rb-8gps5-dynamic-pvc-73d4242f-cc86-480f-b5a5-a4ff7c5b19b3.vmdk", to your new size.
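If your vSphere admin wants a CLI starting point, one common way to extend a standalone VMDK is vmkfstools from an ESXi host that can see the datastore. This is a sketch only: the path is taken from the volumePath above, the disk should not be in active use while it is extended, and your environment may require doing this through the vSphere UI instead.

# Sketch, run on an ESXi host: extend the VMDK to the new 600G size
vmkfstools -X 600G "/vmfs/volumes/G15_T1_R1-R4_OpenShift_LNX4D_2419/kubevols/dev-01-rb-8gps5-dynamic-pvc-73d4242f-cc86-480f-b5a5-a4ff7c5b19b3.vmdk"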
Resize / Grow the filesystem
Once your friendly vSphere administrator has increased the size of the disk (we went from 500G to 600G), we can continue.
Force the pod to start on a new node
I have seen that even after the friendly vSphere administrator grows the VMDK, the underlying operating system sometimes does not detect the change. To force it to, we have to get the pod to start up on a new node. If we just delete the pod, it might start up on the exact same node, and the disk will remain the same size.
Determine what node the pod is running on
$ oc -n openshift-image-registry get po -owide
NAME                                               READY   STATUS      RESTARTS   AGE    IP               NODE              NOMINATED NODE   READINESS GATES
cluster-image-registry-operator-684cdfddf7-828kw   2/2     Running     1          23d    192.168.2.13     cts-rbocpmstd03   <none>           <none>
image-pruner-1626134400-gkc8f                      0/1     Completed   0          2d9h   192.168.8.127    cts-rbocpinfd03   <none>           <none>
image-pruner-1626220800-7j2s8                      0/1     Completed   0          33h    192.168.8.134    cts-rbocpinfd03   <none>           <none>
image-pruner-1626307200-hwn6w                      0/1     Completed   0          9h     192.168.6.113    cts-rbocpinfd02   <none>           <none>
image-registry-58d95dc7b5-65wmx                    1/1     Running     0          13d    192.168.8.77     cts-rbocpinfd03   <none>           <none>
manualrun2-x97z7                                   0/1     Completed   0          21d    192.168.8.28     cts-rbocpinfd03   <none>           <none>
node-ca-8lztl                                      1/1     Running     0          23d    192.168.20.92    cts-rbocpwrkd05   <none>           <none>
node-ca-ch2d4                                      1/1     Running     0          23d    192.168.0.30     cts-rbocpmstd01   <none>           <none>
node-ca-dnc8s                                      1/1     Running     0          23d    192.168.8.4      cts-rbocpinfd03   <none>           <none>
node-ca-gn22v                                      1/1     Running     0          23d    192.168.14.19    cts-rbocpwrkd02   <none>           <none>
node-ca-k4ss9                                      1/1     Running     0          23d    192.168.10.16    cts-rbocpinfd01   <none>           <none>
node-ca-mkhxq                                      1/1     Running     0          23d    192.168.16.97    cts-rbocpwrkd03   <none>           <none>
node-ca-mmcpj                                      1/1     Running     0          23d    192.168.22.31    cts-rbocpwrkd06   <none>           <none>
node-ca-ptpmp                                      1/1     Running     0          23d    192.168.6.25     cts-rbocpinfd02   <none>           <none>
node-ca-qd4m9                                      1/1     Running     0          23d    192.168.18.251   cts-rbocpwrkd04   <none>           <none>
node-ca-rjsbz                                      1/1     Running     0          23d    192.168.13.61    cts-rbocpwrkd01   <none>           <none>
node-ca-s72ts                                      1/1     Running     0          23d    192.168.4.30     cts-rbocpmstd02   <none>           <none>
node-ca-v65p2                                      1/1     Running     0          23d    192.168.2.14     cts-rbocpmstd03   <none>           <none>
quickrun3-9qwdc                                    0/1     Completed   0          9d     192.168.8.95     cts-rbocpinfd03   <none>           <none>
Here we can see the pod is running on node: cts-rbocpinfd03
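Alternatively, to print just the node name (a sketch using the pod name from this example):

# Print only the node the registry pod is scheduled on
$ oc -n openshift-image-registry get po image-registry-58d95dc7b5-65wmx \
    -o jsonpath='{.spec.nodeName}'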
Cordon that node, delete the pod, and uncordon the node
$ oc adm cordon cts-rbocpinfd03
$ oc -n openshift-image-registry delete po image-registry-58d95dc7b5-65wmx
$ oc adm uncordon cts-rbocpinfd03
This forces the pod to start up on another node, which causes the disk to detach and re-attach.
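You can watch the replacement pod get scheduled (it should land on a different node while the original node is cordoned):

# Watch the new registry pod come up on another node
$ oc -n openshift-image-registry get po -owide -w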
Grow the filesystem
Determine the new location of the pod
$ oc -n openshift-image-registry get po -owide
NAME                                               READY   STATUS      RESTARTS   AGE     IP               NODE              NOMINATED NODE   READINESS GATES
cluster-image-registry-operator-684cdfddf7-828kw   2/2     Running     1          23d     192.168.2.13     cts-rbocpmstd03   <none>           <none>
image-pruner-1626134400-gkc8f                      0/1     Completed   0          2d9h    192.168.8.127    cts-rbocpinfd03   <none>           <none>
image-pruner-1626220800-7j2s8                      0/1     Completed   0          33h     192.168.8.134    cts-rbocpinfd03   <none>           <none>
image-pruner-1626307200-hwn6w                      0/1     Completed   0          9h      192.168.6.113    cts-rbocpinfd02   <none>           <none>
image-registry-58d95dc7b5-cjcp7                    1/1     Running     0          2m13s   192.168.10.36    cts-rbocpinfd01   <none>           <none>
manualrun2-x97z7                                   0/1     Completed   0          21d     192.168.8.28     cts-rbocpinfd03   <none>           <none>
node-ca-8lztl                                      1/1     Running     0          23d     192.168.20.92    cts-rbocpwrkd05   <none>           <none>
node-ca-ch2d4                                      1/1     Running     0          23d     192.168.0.30     cts-rbocpmstd01   <none>           <none>
node-ca-dnc8s                                      1/1     Running     0          23d     192.168.8.4      cts-rbocpinfd03   <none>           <none>
node-ca-gn22v                                      1/1     Running     0          23d     192.168.14.19    cts-rbocpwrkd02   <none>           <none>
node-ca-k4ss9                                      1/1     Running     0          23d     192.168.10.16    cts-rbocpinfd01   <none>           <none>
node-ca-mkhxq                                      1/1     Running     0          23d     192.168.16.97    cts-rbocpwrkd03   <none>           <none>
node-ca-mmcpj                                      1/1     Running     0          23d     192.168.22.31    cts-rbocpwrkd06   <none>           <none>
node-ca-ptpmp                                      1/1     Running     0          23d     192.168.6.25     cts-rbocpinfd02   <none>           <none>
node-ca-qd4m9                                      1/1     Running     0          23d     192.168.18.251   cts-rbocpwrkd04   <none>           <none>
node-ca-rjsbz                                      1/1     Running     0          23d     192.168.13.61    cts-rbocpwrkd01   <none>           <none>
node-ca-s72ts                                      1/1     Running     0          23d     192.168.4.30     cts-rbocpmstd02   <none>           <none>
node-ca-v65p2                                      1/1     Running     0          23d     192.168.2.14     cts-rbocpmstd03   <none>           <none>
quickrun3-9qwdc                                    0/1     Completed   0          9d      192.168.8.95     cts-rbocpinfd03   <none>           <none>
We can see it is now running on node: cts-rbocpinfd01
Next determine the disk name by shelling into the pod
$ oc -n openshift-image-registry rsh image-registry-58d95dc7b5-cjcp7
sh-4.2$ df -h
Filesystem                             Size  Used  Avail  Use%  Mounted on
overlay                                120G   96G    24G   81%  /
tmpfs                                   64M     0    64M    0%  /dev
tmpfs                                  7.9G     0   7.9G    0%  /sys/fs/cgroup
shm                                     64M     0    64M    0%  /dev/shm
tmpfs                                  7.9G  5.6M   7.9G    1%  /etc/passwd
/dev/sdc                               492G  491G   386M  100%  /registry
tmpfs                                  7.9G  8.0K   7.9G    1%  /etc/secrets
/dev/mapper/coreos-luks-root-nocrypt   120G   96G    24G   81%  /etc/hosts
tmpfs                                  7.9G  4.0K   7.9G    1%  /var/lib/kubelet
tmpfs                                  7.9G   28K   7.9G    1%  /run/secrets/kubernetes.io/serviceaccount
tmpfs                                  7.9G     0   7.9G    0%  /proc/acpi
tmpfs                                  7.9G     0   7.9G    0%  /proc/scsi
tmpfs                                  7.9G     0   7.9G    0%  /sys/firmware
The important line is this one
/dev/sdc 492G 491G 386M 100% /registry
That tells us the disk: /dev/sdc
And that it still thinks it is 500G.
Now that we know where the pod is running (cts-rbocpinfd01) and the device name of the disk (/dev/sdc), we can resize.
SSH into that node, or debug into it and become root
$ oc debug node/cts-rbocpinfd01
Starting pod/cts-rbocpinfd01devocp-debug ...
To use host binaries, run `chroot /host`
Pod IP: 172.28.37.20
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
Confirm the disk is correct
sh-4.4# df -h /dev/sdc
Filesystem      Size  Used  Avail  Use%  Mounted on
/dev/sdc        492G  491G   386M  100%  /var/lib/kubelet/pods/7416b263-4827-4405-80d0-cfb6d38d0b60/volumes/kubernetes.io~vsphere-volume/pvc-73d4242f-cc86-480f-b5a5-a4ff7c5b19b3
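Before resizing, you can also confirm the kernel now sees the larger block device; the filesystem will keep reporting the old size until we grow it. A sketch:

# Block device size as seen by the kernel; should now report the grown ~600G
sh-4.4# lsblk /dev/sdc
# Or the raw device size in bytes
sh-4.4# blockdev --getsize64 /dev/sdc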
Now, since we know it is ext4, we can issue the resize command
sh-4.4# resize2fs /dev/sdc
resize2fs 1.45.4 (23-Sep-2019)
Filesystem at /dev/sdc is mounted on /var/lib/kubelet/plugins/kubernetes.io/vsphere-volume/mounts/[G15_T1_R1-R4_OpenShift_LNX4D_2419] kubevols/dev-01-rb-8gps5-dynamic-pvc-73d4242f-cc86-480f-b5a5-a4ff7c5b19b3.vmdk; on-line resizing required
old_desc_blocks = 63, new_desc_blocks = 75
The filesystem on /dev/sdc is now 157286400 (4k) blocks long.
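As an aside: this PV uses ext4, so resize2fs is the right tool. If your PV had been formatted with XFS instead, the equivalent would be xfs_growfs against the mount point rather than the device (a sketch only, not applicable to this example):

# Sketch for an XFS-backed PV: xfs_growfs takes the mount point, not the device
sh-4.4# xfs_growfs /var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~vsphere-volume/<pv-name>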
Confirm new size
sh-4.4# df -h /dev/sdc
Filesystem      Size  Used  Avail  Use%  Mounted on
/dev/sdc        590G  491G    99G   84%  /var/lib/kubelet/pods/7416b263-4827-4405-80d0-cfb6d38d0b60/volumes/kubernetes.io~vsphere-volume/pvc-73d4242f-cc86-480f-b5a5-a4ff7c5b19b3
And that is it. The filesystem has been resized, and the pod will automatically pick up the new storage.
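If you want to double-check from the pod's point of view, the same rsh and df commands from earlier confirm the new capacity (using the new pod name from this example):

# Verify the registry pod now sees the grown filesystem on /registry
$ oc -n openshift-image-registry rsh image-registry-58d95dc7b5-cjcp7 df -h /registry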