fix --wait's failure to work on coredns pods #19748

Open
wants to merge 1 commit into
base: master

ComradeProgrammer
Member

@ComradeProgrammer ComradeProgrammer commented Oct 3, 2024

FIX #19288
Before: minikube start --wait=all may finish before coredns is ready
After: minikube start --wait=all waits until coredns is completely ready

The situation described in #19288 was actually introduced by the HA-cluster PR. By default, the coredns deployment consists of 2 coredns pods. However, in pkg/minikube/node/start.go:158 it is manually scaled down to 1. This happens before we start waiting for the essential nodes (minikube waits for nodes at line 236).

When minikube waits for system pods, two checks inspect their status:

  • WaitExtra (pkg/minikube/bootstrapper/bsutil/kverify/pod_ready.go) lists all pods with the given labels and checks whether they are ready.
  • ExpectAppsRunning (pkg/minikube/bootstrapper/bsutil/kverify/system_pods.go:91) lists all pods in the kube-system namespace and checks whether there is at least 1 running pod for each of some essential labels. The bug is that it only checks the Running state and does not check the Ready state.

After the HA-cluster PR was introduced, when minikube runs the WaitExtra function (the 1st check), one of the coredns pods can be in the Succeeded phase. WaitExtra doesn't recognize this state, so it prints an error and breaks out of the checking loop. This logic is in pkg/minikube/bootstrapper/bsutil/kverify/pod_ready.go at line 99 and line 69.

The error I see is

E1003 22:37:04.296076   17140 pod_ready.go:66] WaitExtra: waitPodCondition: pod "coredns-7db6d8ff4d-nljst" in "kube-system" namespace has status phase "Succeeded" (skipping!): {Phase:Succeeded Conditions:[{Type:PodReadyToStartContainers Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2024-10-03 22:37:04 +0200 CEST Reason: Message:} {Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2024-10-03 22:36:53 +0200 CEST Reason:PodCompleted Message:} {Type:Ready Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2024-10-03 22:36:53 +0200 CEST Reason:PodCompleted Message:} {Type:ContainersReady Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2024-10-03 22:36:53 +0200 CEST Reason:PodCompleted Message:} {Type:PodScheduled Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2024-10-03 22:36:53 +0200 CEST Reason: Message:}] Message: Reason: NominatedNodeName: HostIP:192.168.49.2 HostIPs:[{IP:192.168.49.2}] PodIP: PodIPs:[] StartTime:2024-10-03 22:36:53 +0200 CEST InitContainerStatuses:[] ContainerStatuses:[{Name:coredns State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:0,Signal:0,Reason:Completed,Message:,StartedAt:2024-10-03 22:36:54 +0200 CEST,FinishedAt:2024-10-03 22:37:04 +0200 CEST,ContainerID:docker://281a8c3106510cdc16a6d3f91ad2e9f5d7aa1609fac4f0e8f7494af67cb8b5d6,}} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:false RestartCount:0 Image:registry.k8s.io/coredns/coredns:v1.11.1 ImageID:docker-pullable://registry.k8s.io/coredns/coredns@sha256:1eeb4c7316bacb1d4c8ead65571cd92dd21e27359f0d4917f1a5822a73b75db1 ContainerID:docker://281a8c3106510cdc16a6d3f91ad2e9f5d7aa1609fac4f0e8f7494af67cb8b5d6 Started:0x14001e58e00 AllocatedResources:map[] Resources:nil VolumeMounts:[]}] QOSClass:Burstable EphemeralContainerStatuses:[] Resize: ResourceClaimStatuses:[]}

Then, when minikube reaches ExpectAppsRunning (the 2nd check), it doesn't check the Ready state, so it believes that all pods are OK. This causes #19288.

So the fix is to make ExpectAppsRunning (the 2nd check) check the Ready state as well.
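
To illustrate the difference between the two checks, here is a minimal sketch. It is not the minikube code: the Pod and PodCondition types below are simplified stand-ins for the Kubernetes API types, and isRunning / isReady are hypothetical helper names. It only shows why a phase-only check can pass while the pod is still not Ready.

```go
package main

import "fmt"

// PodCondition and Pod are simplified stand-ins (assumptions) for the
// Kubernetes core/v1 types, so the sketch compiles without dependencies.
type PodCondition struct {
	Type   string // e.g. "Ready"
	Status string // "True", "False" or "Unknown"
}

type Pod struct {
	Name       string
	Phase      string // "Pending", "Running", "Succeeded", ...
	Conditions []PodCondition
}

// isRunning is a phase-only check: the pod's containers have been started.
func isRunning(p Pod) bool {
	return p.Phase == "Running"
}

// isReady additionally requires the pod's Ready condition to be "True".
func isReady(p Pod) bool {
	if p.Phase != "Running" {
		return false
	}
	for _, c := range p.Conditions {
		if c.Type == "Ready" {
			return c.Status == "True"
		}
	}
	return false // no Ready condition reported yet
}

func main() {
	// A pod that is Running but has not passed its readiness probe yet.
	coredns := Pod{
		Name:       "coredns",
		Phase:      "Running",
		Conditions: []PodCondition{{Type: "Ready", Status: "False"}},
	}
	fmt.Println(isRunning(coredns), isReady(coredns)) // prints: true false
}
```

A check built on isRunning alone accepts this pod, which matches the behaviour described above for the 2nd check.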

(The reason I didn't make the 1st check recognize the Succeeded state is this: if for some reason there is a job (e.g. an init job for some containers) in the kube-system namespace, and we change WaitExtra's logic to reject the Succeeded state, there will be problems.)

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ComradeProgrammer
Once this PR has been reviewed and has the lgtm label, please assign spowelljr for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Oct 3, 2024
@ComradeProgrammer
Member Author

/ok-to-test

@k8s-ci-robot k8s-ci-robot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Oct 3, 2024
@prezha
Contributor

prezha commented Oct 6, 2024

hey @ComradeProgrammer thanks for looking into this

here's a bit more context that might help you:

wait for system-critical pods' Ready condition implementation was intentionally made alongside the Running checks, ie, the former would be called only if wait was explicitly requested and would add some startup delay, whereas the latter would almost always be called for quickest startup time

so, as the ExpectAppsRunning is called by WaitForAppsRunning, which, in turn, is called by WaitForNode that is always called (with the exception for the first ha control plane), by adding the Ready check to the ExpectAppsRunning, we'd effectively always wait, which is not the intention

on the other hand, only if wait is required, WaitExtra is called by WaitForNode (but before the WaitForAppsRunning) or by restartPrimaryControlPlane

also, for ha + wait, iirc the idea was that we'd be ok with waiting for at least one coredns pod to be ready, as the kube-dns service would take care of routing requests to the pod(s) that can process them

example from #19288 shows only one coredns, so it's not a ha cluster, but makes a very good point:

sometimes the list of pods is pulled before some of the pods have even been created, resulting in them not being in the waiting check

i think that the fix should be made in WaitExtra, and there we could, eg, invert the logic: instead of listing all pods once and then looping through that list, waiting for each pod that's also on a system-critical list to become Ready (as we do now), we'd wait until all system-critical pods (which is a fixed list: kverify.CorePodsLabels) become Ready, re-fetching the pods' status as needed

as for the Succeeded status phase, that means that "All containers in the Pod have terminated in success, and will not be restarted", so it is handled - by skipping it (ie, it will never become Ready, so no point waiting for it)

Member

@spowelljr spowelljr left a comment

I just tried this but coredns still wasn't ready.

$ minikube start --wait=all
😄  minikube v1.34.0 on Debian rodete (kvm/amd64)
✨  Automatically selected the docker driver
📌  Using Docker driver with root privileges
👍  Starting "minikube" primary control-plane node in "minikube" cluster
🚜  Pulling base image v0.0.45-1727731891-master ...
🔥  Creating docker container (CPUs=2, Memory=26100MB) ...
🐳  Preparing Kubernetes v1.31.1 on Docker 27.3.1 ...
    ▪ Generating certificates and keys ...
    ▪ Booting up control plane ...
    ▪ Configuring RBAC rules ...
🔗  Configuring bridge CNI (Container Networking Interface) ...
🔎  Verifying Kubernetes components...
    ▪ Using image gcr.io/k8s-minikube/storage-provisioner:v5
🌟  Enabled addons: storage-provisioner, default-storageclass
🏄  Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default

$ kubectl get pods -A
NAMESPACE     NAME                               READY   STATUS    RESTARTS   AGE
kube-system   coredns-7c65d6cfc9-qjrwh           0/1     Running   0          9s
kube-system   etcd-minikube                      1/1     Running   0          14s
kube-system   kube-apiserver-minikube            1/1     Running   0          14s
kube-system   kube-controller-manager-minikube   1/1     Running   0          15s
kube-system   kube-proxy-c8mdp                   1/1     Running   0          9s
kube-system   kube-scheduler-minikube            1/1     Running   0          16s
kube-system   storage-provisioner                1/1     Running   0          9s

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Dec 11, 2024
@minikube-pr-bot

kvm2 driver with docker runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 19748) |
+----------------+----------+---------------------+
| minikube start | 49.0s    | 49.7s               |
| enable ingress | 16.8s    | 16.5s               |
+----------------+----------+---------------------+

Times for minikube ingress: 16.5s 18.5s 15.5s 18.5s 15.0s
Times for minikube (PR 19748) ingress: 18.5s 15.5s 18.5s 15.5s 14.5s

Times for minikube start: 46.7s 51.8s 49.4s 47.9s 49.3s
Times for minikube (PR 19748) start: 50.6s 49.7s 48.2s 47.3s 52.8s

docker driver with docker runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 19748) |
+----------------+----------+---------------------+
| minikube start | 21.8s    | 23.0s               |
| enable ingress | 13.2s    | 13.0s               |
+----------------+----------+---------------------+

Times for minikube start: 24.4s 21.6s 20.5s 21.4s 21.0s
Times for minikube (PR 19748) start: 24.1s 24.2s 24.2s 20.9s 21.3s

Times for minikube ingress: 12.8s 13.8s 13.8s 12.3s 13.3s
Times for minikube (PR 19748) ingress: 13.3s 12.8s 13.3s 12.3s 13.3s

docker driver with containerd runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 19748) |
+----------------+----------+---------------------+
| minikube start | 22.2s    | 20.6s               |
| enable ingress | 38.9s    | 38.2s               |
+----------------+----------+---------------------+

Times for minikube ingress: 39.3s 38.8s 38.8s 38.8s 38.8s
Times for minikube (PR 19748) ingress: 40.3s 39.8s 33.3s 38.8s 38.8s

Times for minikube start: 20.6s 23.4s 20.3s 23.2s 23.7s
Times for minikube (PR 19748) start: 19.3s 20.5s 23.1s 20.3s 19.7s

@minikube-pr-bot

Here are the top 10 failed tests in each environment with the lowest flake rate.

+------------------------------------------+--------------------------------------------------------------+------------+
| ENVIRONMENT                              | TEST NAME                                                    | FLAKE RATE |
+------------------------------------------+--------------------------------------------------------------+------------+
| Docker_macOS (1 failed)                  | TestMultiControlPlane/serial/StartCluster                    | 0.00%      |
| Docker_Linux (1 failed)                  | TestMultiControlPlane/serial/StartCluster                    | 0.00%      |
| Docker_Linux_containerd (1 failed)       | TestMultiControlPlane/serial/StartCluster                    | 0.00%      |
| Docker_Linux_crio (3 failed)             | TestMultiControlPlane/serial/StartCluster                    | 0.00%      |
| Docker_Linux_crio_arm64 (5 failed)       | TestMultiControlPlane/serial/StartCluster                    | 0.00%      |
| Docker_Linux_crio_arm64 (5 failed)       | TestFunctional/parallel/PersistentVolumeClaim                | 1.10%      |
| Docker_Linux_crio_arm64 (5 failed)       | TestScheduledStopUnix                                        | 2.20%      |
| KVM_Linux_crio (10 failed)               | TestMultiControlPlane/serial/StartCluster                    | 0.00%      |
| KVM_Linux_crio (10 failed)               | TestStartStop/group/newest-cni/serial/SecondStart            | 0.00%      |
| Docker_Linux_docker_arm64 (1 failed)     | TestMultiControlPlane/serial/StartCluster                    | 0.00%      |
| KVM_Linux_containerd (3 failed)          | TestMultiControlPlane/serial/StartCluster                    | 0.00%      |
| KVM_Linux_containerd (3 failed)          | TestStartStop/group/no-preload/serial/SecondStart            | 0.00%      |
| KVM_Linux_containerd (3 failed)          | TestStartStop/group/default-k8s-diff-port/serial/SecondStart | 0.00%      |
| Hyper-V_Windows (9 failed)               | TestMultiControlPlane/serial/StartCluster                    | 0.00%      |
| Hyper-V_Windows (9 failed)               | TestPause/serial/VerifyDeletedResources                      | 0.00%      |
| Docker_Linux_containerd_arm64 (1 failed) | TestMultiControlPlane/serial/StartCluster                    | 0.00%      |
+------------------------------------------+--------------------------------------------------------------+------------+

if err != nil {
	return err
}
if len(corednsPods.Items) == 1 {
Member

we have a flag called disable-optimizations (or something like that) that would make coredns run with 2 replicas; in that case this check would not work
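
one way to avoid depending on the exact pod count, sketched under assumptions: the Pod type, countReady and enoughReady are hypothetical names, not code from this PR. the idea is to count Ready pods and compare against a required minimum, so the same check works whether 1 or 2 coredns replicas are listed.

```go
package main

import "fmt"

// Pod is a simplified, hypothetical type for this sketch.
type Pod struct {
	Name  string
	Ready bool // value of the pod's Ready condition
}

// countReady returns how many of the listed pods are Ready.
func countReady(pods []Pod) int {
	n := 0
	for _, p := range pods {
		if p.Ready {
			n++
		}
	}
	return n
}

// enoughReady does not branch on len(pods); it only requires that at
// least minReady pods are Ready, however many replicas were listed.
func enoughReady(pods []Pod, minReady int) bool {
	return minReady > 0 && countReady(pods) >= minReady
}

func main() {
	// Two replicas, only one of them Ready so far.
	pods := []Pod{
		{Name: "coredns-a", Ready: true},
		{Name: "coredns-b", Ready: false},
	}
	fmt.Println(enoughReady(pods, 1), enoughReady(pods, 2)) // prints: true false
}
```

whether minReady should be 1 or the full replica count is a policy choice that this thread leaves open.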

Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

--wait flag sometimes misses pods
6 participants