ARC in Kubernetes mode issue with workflow nodes scheduling #3376
Replies: 3 comments 4 replies
-
One of the ideas to address this issue and get rid of PVs is to push required data from
-
I am also running into this same issue when using Kubernetes mode. There are some alternative options I thought of (some are not great, though):
If I had to choose an option, 3 would be ideal.
-
@DenisPalnitsky thanks for the nice description of the issue. I think most people solved this issue with ReadWriteMany volumes. Yet it's not optimal: ReadWriteMany is either slow or expensive. So actions/runner-container-hooks#160 makes a lot of sense.
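For reference, the ReadWriteMany workaround typically means backing the runner's work directory with a shared PVC that any node can mount, so the workflow pod no longer has to co-locate with the runner. This is a sketch, not ARC's shipped configuration: the claim name is hypothetical and the storage class depends on your cluster's CSI driver (e.g. an NFS or CephFS provisioner):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: runner-work-dir        # hypothetical name
spec:
  accessModes:
    - ReadWriteMany            # mountable from any node, so no node-affinity trap
  resources:
    requests:
      storage: 10Gi
  storageClassName: nfs-client # assumption: an RWX-capable class exists in the cluster
```

With RWX storage the scheduler is free to place the workflow pod on a different node than the runner, which sidesteps the failure mode described below at the cost of slower or pricier storage.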
-
I'm testing ARC in Kubernetes mode, and I can't figure out how it could work reliably without failing jobs. Here's the problem I'm facing.
Input:
Imagine that we have small nodes that can each accommodate only three pods. In that scenario, if ARC needs to schedule two jobs, it will place "Runner Pod 1" and "Workflow Pod 1" on the first node. Then, to run the second job, it will place "Runner Pod 2" on the first node, exhausting its capacity. "Workflow Pod 2" therefore cannot be scheduled on Node1 (no resources left) and cannot be scheduled on Node2 because its PV is attached to Node1. This causes the job to fail.
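The deadlock above can be reproduced with a toy scheduler. This is only an illustration of the described scenario, assuming each node fits at most three pods and that a workflow pod must land on the same node as its runner's node-affine PV:

```python
# Toy reproduction of the scheduling deadlock described above.
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    capacity: int = 3               # assumption: each node fits 3 pods
    pods: list = field(default_factory=list)

    def fits(self) -> bool:
        return len(self.pods) < self.capacity

def schedule(pod: str, nodes: list, required_node=None):
    """Place a pod on a node; honor PV node affinity when required_node is set."""
    candidates = [required_node] if required_node else nodes
    for node in candidates:
        if node is not None and node.fits():
            node.pods.append(pod)
            return node
    return None  # unschedulable

nodes = [Node("node1"), Node("node2")]

# Job 1: runner and workflow pod both land on node1 (the PV binds there).
n_runner1 = schedule("runner-1", nodes)
n_workflow1 = schedule("workflow-1", nodes, required_node=n_runner1)

# Job 2: runner-2 takes node1's last free slot...
n_runner2 = schedule("runner-2", nodes)

# ...so workflow-2 neither fits on node1 nor may move to node2 (PV affinity).
n_workflow2 = schedule("workflow-2", nodes, required_node=n_runner2)
print(n_workflow2)  # None -> the job fails
```

The point of the sketch is that node2 sits completely idle while workflow-2 is unschedulable, because the PV affinity pins it to the full node1.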
With larger nodes, the situation may get even worse: Kubernetes can pack multiple Runner Pods onto one node, leaving no capacity to schedule the corresponding Workflow Pods there.
The fundamental problem is that when a job is scheduled, Kubernetes needs to know in advance the resources that will be used by both pods (runner and workflow), but there is no way to tell the scheduler that ahead of time, because the second pod is created by the first one.
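A partial mitigation, which declares the workflow pod's footprint but does not solve the up-front reservation problem, is to give the workflow pod explicit resource requests through the runner container hook's pod template (ARC's Kubernetes mode reads it from the file pointed to by the `ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE` environment variable on the runner). The names and values below are illustrative assumptions:

```yaml
# Hypothetical ConfigMap holding a pod template that the container hook
# merges into the workflow pod spec; mount it into the runner pod and set
# ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE to the mounted file's path.
apiVersion: v1
kind: ConfigMap
metadata:
  name: hook-extension        # hypothetical name
data:
  content: |
    spec:
      containers:
        - name: $job          # the hook matches the job container by this name
          resources:
            requests:
              cpu: "1"        # assumed values; size to your workloads
              memory: 1Gi
```

Even with requests declared, the scheduler only sees the workflow pod once the runner creates it, so the race for node capacity described above remains.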
Is there anything I'm missing that could address this issue?