Occasional slow startup times during Booting builder step
#283
Comments
Bump, also experiencing this, and it fails my workflows.
Same here.
I'm also seeing this, but for me it just fails the workflows instead of being slow. It doesn't happen with every build, but when it fails it forces us to rebuild. My logs are exactly the same as above and show up under the "Booting builder" dropdown of the docker/setup-buildx-action@v3 action.
@crazy-max I wonder if you might have some expertise in how Docker and/or the buildx plugin interact, and what we might look for when debugging the slowness.
Bump, we also face issues with docker/setup-buildx-action@v3; on v2 it worked great, so we had to roll back :(
Also seeing this fail on our self-hosted runners; it fails after running for ~6-7 minutes. We also use Actions Runner Controller on AWS EKS. It is intermittent: some builds eventually work, some do not.
We had this problem, and we found the solution: you must set resource requests and limits for the ARC runners. There are no default requests/limits, which means that kube is free to schedule pods anywhere. This is why the error happens at random: if a build comes in that uses lots of containers at once, kube is probably killing containers to maintain node health. You need requests and limits configured for kube to schedule intelligently. The bug here is that ARC should set a sensible default, or it should error. Their Helm chart has a comment saying they don't set a default because the user should be setting it. That's a reasonable impulse, but instead of leaving requests/limits unset, the Helm chart should fail if the user doesn't set them.
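One way to check whether your runner pods are in the situation described above is to inspect the pod specs. In Kubernetes, a pod where no container sets any CPU or memory request or limit gets the QoS class BestEffort, and such pods are among the first candidates for eviction when a node runs short of resources. The Python sketch below assumes you have saved the output of `kubectl get pods -o json` for your runner namespace to a file; the function names are hypothetical helpers, not part of ARC or kubectl. It only reports what the specs say and cannot by itself prove that eviction caused a particular failed build.

```python
import json


def _pod_name(pod):
    # Fall back to a placeholder if metadata.name is absent.
    return pod.get("metadata", {}).get("name", "<unknown>")


def containers_missing_resources(pod_list):
    """Return (pod, container, missing_keys) for every container in
    spec.containers that has no resources.requests and/or no resources.limits.

    `pod_list` is the parsed JSON of `kubectl get pods -o json`
    (a List object with an "items" array). Init containers are not checked.
    """
    problems = []
    for pod in pod_list.get("items", []):
        for container in pod.get("spec", {}).get("containers", []):
            resources = container.get("resources") or {}
            missing = [key for key in ("requests", "limits") if not resources.get(key)]
            if missing:
                problems.append((_pod_name(pod), container.get("name", "<unnamed>"), missing))
    return problems


def report_lines(pod_list):
    """Human-readable findings: containers without requests/limits first,
    then pods whose status.qosClass is BestEffort."""
    lines = [
        f"{pod}/{container}: no {' or '.join(missing)} set"
        for pod, container, missing in containers_missing_resources(pod_list)
    ]
    for pod in pod_list.get("items", []):
        if pod.get("status", {}).get("qosClass") == "BestEffort":
            lines.append(f"{_pod_name(pod)}: QoS class BestEffort")
    return lines


def report_from_file(path):
    """Load a file produced by `kubectl get pods -o json` and report on it."""
    with open(path, encoding="utf-8") as handle:
        return report_lines(json.load(handle))
```

For example, run `kubectl get pods -n <runner-namespace> -o json > pods.json` (the namespace is a placeholder for wherever your ARC runners live), then `print("\n".join(report_from_file("pods.json")))`. Any runner or Docker-in-Docker container listed there is running without the requests/limits the comment above recommends.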
@devonjones, those errors do pop up when a node's drive is slowed down by runner overload, but I don't think that's the root cause here, since this seems similar to the issue that just mentioned this ticket ^. I've also seen this error on our GitHub-hosted runners.
I just got the exact same error on a GitHub-hosted runner. It had worked without any issues over the last few days.
Docker info
Buildx version
Booting builder
@devonjones: Is there a way to configure/provide the limits within the GitHub Actions file?
Contributing guidelines
I've found a bug, and:
Description
I see that, about 25% of the time, the Booting builder step takes quite some time to execute. I am running on Amazon EKS, and we are using self-hosted runners deployed using https://github.com/actions/actions-runner-controller. Before my runner becomes available to pick up a workflow, I see that the Docker daemon has started successfully:
Expected behaviour
Consistent startup time
Actual behaviour
Startup time is sometimes very slow.
Repository URL
No response
Workflow run URL
No response
YAML workflow
Workflow logs
The delay seems to be in the step that produces this log line:
BuildKit logs
No response
Additional info
No response