Runner to workflow pods take 3 minutes to start on RWX & containerMode: Kubernetes #3834
Open
4 tasks done
Labels
bug
Something isn't working
gha-runner-scale-set
Related to the gha-runner-scale-set mode
needs triage
Requires review from the maintainers
Checks
Controller Version
0.9.3
Deployment Method
Helm
To Reproduce
Describe the bug
After the runner pod initializes (which is nearly immediate), the GitHub Actions jobs (6 of them) seem to get stuck polling for 2-3 minutes while waiting for the workflow pod to spin up so the job can continue.
The runner pod logs show the job polling every 5-10 seconds for 2-3 minutes before the container hook is called and the workflow pod is spun up.
See lines 6-52 in the scaleset logs gist below; the following line is logged every few seconds:
[WORKER 2024-12-03 19:21:58Z INFO HostContext] Well known directory 'Root': '/home/runner'
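To put a number on the delay, the repeated log line can be used as a marker. The following is a minimal sketch that assumes every worker log line follows the same `[WORKER <timestamp>Z <level> <component>] <message>` layout as the line above. The function name `polling_window` and the second timestamp in the usage example are hypothetical and only there for illustration.

```python
import re
from datetime import datetime, timezone

# Assumed layout, based on the single log line quoted in this issue:
# [WORKER 2024-12-03 19:21:58Z INFO HostContext] Well known directory 'Root': '/home/runner'
LINE_RE = re.compile(
    r"^\[WORKER (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})Z \w+ \w+\] (.*)$"
)


def polling_window(lines, needle="Well known directory 'Root'"):
    """Return (first, last, seconds) for log lines containing `needle`.

    Returns None when no matching line is found.
    """
    stamps = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and needle in match.group(2):
            # The trailing 'Z' in the log marks the timestamp as UTC.
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            stamps.append(stamp.replace(tzinfo=timezone.utc))
    if not stamps:
        return None
    first, last = min(stamps), max(stamps)
    return first, last, (last - first).total_seconds()


if __name__ == "__main__":
    sample = [
        "[WORKER 2024-12-03 19:21:58Z INFO HostContext] Well known directory 'Root': '/home/runner'",
        # Hypothetical later occurrence, made up for illustration only.
        "[WORKER 2024-12-03 19:24:28Z INFO HostContext] Well known directory 'Root': '/home/runner'",
    ]
    print(polling_window(sample))
```

Running this over the runner pod log from the gist (one line per list entry) would give the time between the first and last occurrence of the repeated line, which should roughly match the 2-3 minute wait described here.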
This bug started occurring when we switched to RWX with a new storage class using NFS-based Azure Files. I suspect that provisioning a PVC on Azure Files may be slower than on the traditional disk-based setup with RWO.
Describe the expected behavior
After the runner pod initializes for a new GitHub Actions job, the workflow pods should spin up almost immediately to process the Docker builds from each GHA job.
Additional Context
Controller Logs
ARC Controller & Scaleset Logs: https://gist.github.com/jonathan-fileread/fd0978bef66784e20d6b50bce50cd3b9
Runner Pod Logs
ARC Controller & Scaleset Logs: https://gist.github.com/jonathan-fileread/fd0978bef66784e20d6b50bce50cd3b9