Self-hosted runner does not pick up next job in concurrency group #3611

Open
juhhov opened this issue Dec 5, 2024 · 2 comments
Labels
bug Something isn't working

juhhov commented Dec 5, 2024

Describe the bug
Self-hosted runner does not pick up the next job in concurrency group.

To Reproduce
Steps to reproduce the behavior:

  1. Add one self-hosted runner to repo
  2. Add workflow to repo:
name: Concurrency test

concurrency:
  group: 'test'

on:
  workflow_dispatch:

jobs:
  status_check:
    runs-on: ['self-hosted']
    steps:
      - name: Sleep some
        run: sleep 60
  3. Trigger two jobs as fast as possible
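
The last reproduction step (dispatching two runs back-to-back) could be scripted roughly like this. It is only a sketch: it assumes the GitHub CLI (`gh`) is installed and authenticated, and the helper names `dispatch_command` and `trigger_twice` are made up for illustration. The workflow name comes from the `name:` field of the workflow above.

```python
import subprocess
from typing import Callable, List, Optional


def dispatch_command(workflow: str, repo: Optional[str] = None) -> List[str]:
    """Build the `gh workflow run` command line for one workflow_dispatch."""
    cmd = ["gh", "workflow", "run", workflow]
    if repo:
        # --repo takes OWNER/REPO; without it gh uses the current directory's repo.
        cmd += ["--repo", repo]
    return cmd


def trigger_twice(
    workflow: str,
    repo: Optional[str] = None,
    launcher: Callable = subprocess.Popen,
) -> List[int]:
    """Start two dispatches without waiting in between, then collect exit codes."""
    # Both processes are started before either is waited on, so the two
    # dispatch requests go out as close together as possible.
    procs = [launcher(dispatch_command(workflow, repo)) for _ in range(2)]
    return [p.wait() for p in procs]
```

Calling `trigger_twice("Concurrency test")` from a checkout of the repo should queue both runs; the second one is then expected to sit in "Waiting for a runner to pick up this job..." as described below.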

Expected behavior
Both jobs run, one after the other.

Runner Version and Platform

2.320.0.

Ubuntu 22.04 Host.

What's not working?

The first job runs OK, but the second is never run.

Job Log Output

Requested labels: self-hosted
Job defined at: xxx
Waiting for a runner to pick up this job...

Runner and Worker's Diagnostic Logs

Runner log:

[2024-12-05 11:34:58Z INFO HostContext] Well known directory 'Bin': '/actions-runner/bin'
[2024-12-05 11:34:58Z INFO ProcessInvokerWrapper] Starting process:
[2024-12-05 11:34:58Z INFO ProcessInvokerWrapper]   File name: '/actions-runner/bin/Runner.Worker'
[2024-12-05 11:34:58Z INFO ProcessInvokerWrapper]   Arguments: 'spawnclient 107 112'
[2024-12-05 11:34:58Z INFO ProcessInvokerWrapper]   Working directory: '/actions-runner/bin'
[2024-12-05 11:34:58Z INFO ProcessInvokerWrapper]   Require exit code zero: 'False'
[2024-12-05 11:34:58Z INFO ProcessInvokerWrapper]   Encoding web name:  ; code page: ''
[2024-12-05 11:34:58Z INFO ProcessInvokerWrapper]   Force kill process on cancellation: 'True'
[2024-12-05 11:34:58Z INFO ProcessInvokerWrapper]   Redirected STDIN: 'False'
[2024-12-05 11:34:58Z INFO ProcessInvokerWrapper]   Persist current code page: 'False'
[2024-12-05 11:34:58Z INFO ProcessInvokerWrapper]   Keep redirected STDIN open: 'False'
[2024-12-05 11:34:58Z INFO ProcessInvokerWrapper]   High priority process: 'True'
[2024-12-05 11:34:58Z INFO ProcessInvokerWrapper] Process started with process id 320, waiting for process exit.
[2024-12-05 11:34:58Z INFO JobDispatcher] Send job request message to worker for job 2f62c294-a6f4-5256-0c19-e95207e7538d.
[2024-12-05 11:34:58Z INFO ProcessChannel] Sending message of length 23043, with hash '146d8a10f069098b8edb8d868c8cdde13c56d34358238390887c61c3c75f5f1c'
[2024-12-05 11:34:58Z INFO JobNotification] Entering JobStarted Notification
[2024-12-05 11:34:58Z INFO JobNotification] Entering StartMonitor
[2024-12-05 11:34:58Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-12-05 11:35:49Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-12-05 11:35:58Z INFO JobDispatcher] Successfully renew job request 662, job is valid till 12/5/2024 11:46:05 AM
[2024-12-05 11:36:04Z INFO ProcessInvokerWrapper] STDOUT/STDERR stream read finished.
[2024-12-05 11:36:04Z INFO ProcessInvokerWrapper] STDOUT/STDERR stream read finished.
[2024-12-05 11:36:04Z INFO ProcessInvokerWrapper] Finished process 320 with exit code 100, and elapsed time 00:01:05.8564459.
[2024-12-05 11:36:04Z INFO JobDispatcher] Worker finished for job 2f62c294-a6f4-5256-0c19-e95207e7538d. Code: 100
[2024-12-05 11:36:04Z INFO JobDispatcher] finish job request for job 2f62c294-a6f4-5256-0c19-e95207e7538d with result: Succeeded
[2024-12-05 11:36:04Z INFO Terminal] WRITE LINE: 2024-12-05 11:36:04Z: Job status_check completed with result: Succeeded
[2024-12-05 11:36:04Z INFO JobDispatcher] Stop renew job request for job 2f62c294-a6f4-5256-0c19-e95207e7538d.
[2024-12-05 11:36:04Z INFO JobDispatcher] job renew has been cancelled, stop renew job request 662.
[2024-12-05 11:36:04Z INFO JobNotification] Entering JobCompleted Notification
[2024-12-05 11:36:04Z INFO JobNotification] Entering EndMonitor
[2024-12-05 11:36:04Z INFO MessageListener] Received job status event. JobState: Online
[2024-12-05 11:36:39Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-12-05 11:37:29Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...

First job worker log, there is no log for second:

[2024-12-05 11:36:03Z INFO JobServerQueue] Actions upload time: 1244 ms, Result upload time: 2455 ms
[2024-12-05 11:36:03Z INFO TempDirectoryManager] Cleaning runner temp folder: /var/cache/runner/_temp
[2024-12-05 11:36:03Z INFO HostContext] Well known directory 'Bin': '/actions-runner/bin'
[2024-12-05 11:36:03Z INFO HostContext] Well known directory 'Root': '/actions-runner'
[2024-12-05 11:36:03Z INFO HostContext] Well known directory 'Diag': '/actions-runner/_diag'
[2024-12-05 11:36:03Z INFO HostContext] Well known config file 'Telemetry': '/actions-runner/_diag/.telemetry'
[2024-12-05 11:36:03Z INFO JobRunner] Raising job completed event
[2024-12-05 11:36:03Z INFO Worker] Job completed.
juhhov added the bug label Dec 5, 2024

coleplx commented Dec 5, 2024

Probably related to #3609 🤔


tkellen commented Dec 5, 2024

Same behavior on both supported versions on OSX as well. Worked around it for the time being with this script:

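#!/usr/bin/expect -f
# Restart run.sh after every finished job so the runner reconnects.
set timeout -1
proc run {} {
    spawn ./run.sh
    set pid [exp_pid]
    puts "Running run.sh with PID $pid"
    expect {
        -re ".*with result.*" {
            # The runner printed a job result line: stop it with SIGTERM.
            puts "Restart phrase detected. Restarting run.sh..."
            exec kill -15 $pid
            wait
            return 1
        }
        eof {
            puts "run.sh finished. Restarting..."
            return 1
        }
    }
}
# run always returns 1, so this loops until the script itself is stopped.
while {1} {
    set result [run]
    if {$result != 1} {
        break
    }
}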

...pretty slow though because sometimes a pending job makes it in before the shutdown and it takes a minute to reconnect.
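
For anyone without `expect` installed, the same restart-on-completion idea might look roughly like this in Python. This is a sketch, not a tested replacement: `run_until_match` and `supervise` are hypothetical names, and unlike `expect` it reads the runner's output through a pipe rather than a pty, so output buffering in the child could delay the match.

```python
import re
import subprocess
from typing import Optional, Sequence


def run_until_match(cmd: Sequence[str], pattern: str = r"with result") -> str:
    """Run cmd, echo its output, and stop it once a line matches pattern.

    Returns "matched" if the process was terminated after a match,
    or "eof" if it exited on its own first.
    """
    regex = re.compile(pattern)
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    try:
        for line in proc.stdout:
            print(line, end="")
            if regex.search(line):
                proc.terminate()  # sends SIGTERM on POSIX, like `kill -15`
                return "matched"
        return "eof"
    finally:
        proc.stdout.close()
        proc.wait()  # reap the child before the next restart


def supervise(cmd: Sequence[str], max_restarts: Optional[int] = None) -> int:
    """Restart cmd after every match or exit; loop forever if max_restarts is None."""
    runs = 0
    while max_restarts is None or runs < max_restarts:
        run_until_match(cmd)
        runs += 1
    return runs
```

Running `supervise(["./run.sh"])` from the runner directory would mirror the loop above. It has the same drawback: a pending job can be picked up just before the shutdown, and reconnecting still takes time.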
