This repository has been archived by the owner on Nov 6, 2019. It is now read-only.

WPT runner timeout #564

Open
jugglinmike opened this issue May 18, 2018 · 2 comments

@jugglinmike
Collaborator

A recent collection failed due to an as-yet unknown error originating in the WPT CLI.

The collection was for Safari as provided by Sauce Labs, specifically "chunk" number 84 of 100.
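For context, WPT's chunking is selected via command-line flags. The flag names `--total-chunks` and `--this-chunk` are real wpt options, but the project's actual command line is not shown in the logs, so the following invocation is a hypothetical sketch:

```shell
# Hypothetical: run chunk 84 of 100 against Safari
./wpt run --total-chunks=100 --this-chunk=84 safari
```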

We have previously experienced unexpected closure of the Sauce Labs Connect tunnel (see gh-544). At first glance, the logs from that process seemed to indicate that the latest error was due to the same problem:

20180517 220229.293 [16324] Secure remote tunnel VM provisioned.
20180517 220229.293 [16324] Tunnel ID: 49a19b33c8a349d59dc353c8411ccfee
20180517 220229.293 [16324] Using no proxy for connecting to tunnel VM.
20180517 220229.294 [16324] Starting Selenium listener...
20180517 220229.294 [16324] Establishing secure TLS connection to tunnel...
20180517 220229.294 [16324] [EVDNS] Added nameserver 10.101.0.2:53 as 0x7fa79800dc50
20180517 220229.295 [16324] MAIN connecting to maki82227.miso.saucelabs.com:443
20180517 220229.392 [16324] MAIN connecting to KGP server maki82227.miso.saucelabs.com took 96 ms
20180517 220229.544 [16324] Selenium listener started on port 4445.
20180517 220230.548 [16324] Sauce Connect is up, you may start your tests.

(about 18,000 lines elided)

20180517 222103.273 [16324] PROXY 127.0.0.1:36670 (172.20.47.221) <- 200 web-platform.test:8000 (537 bytes)
20180517 222104.613 [16324] PROXY resolved 'web-platform.test' to '127.0.0.1'
20180517 222104.613 [16324] PROXY 127.0.0.1:36684 (172.20.47.221) -> GET http://web-platform.test:8000/css/vendor-imports/mozilla/mozilla-central-reftests/ib-split/emptyspan-4.html (618 bytes)
20180517 222104.615 [16324] PROXY 127.0.0.1:36684 (172.20.47.221) <- 200 web-platform.test:8000 (711 bytes)
20180517 222106.466 [16324] PROXY resolved 'web-platform.test' to '127.0.0.1'
20180517 222106.466 [16324] PROXY 127.0.0.1:36696 (172.20.47.221) -> GET http://web-platform.test:8000/css/vendor-imports/mozilla/mozilla-central-reftests/ib-split/emptyspan-4-ref.html (622 bytes)
20180517 222106.469 [16324] PROXY 127.0.0.1:36696 (172.20.47.221) <- 200 web-platform.test:8000 (537 bytes)
20180517 225236.457 [16324] Got signal terminated
20180517 225236.460 [16324] Cleaning up.
20180517 225236.460 [16324] Removing tunnel 49a19b33c8a349d59dc353c8411ccfee.
20180517 225236.460 [16324] Waiting for any active jobs using this tunnel to finish.
20180517 225236.460 [16324] Press CTRL-C again to shut down immediately.
20180517 225236.460 [16324] Note: if you do this, tests that are still running will fail.

However, in this case, there were 30 minutes of inactivity prior to the closure. The Sauce Labs documentation is pretty clear about this:

As a safety measure to prevent tests from running indefinitely, Sauce limits
the duration of tests to 30 minutes by default. You can adjust this limit on
a per-job basis, and the maximum value is 10800 seconds.

https://wiki.saucelabs.com/display/DOCS/Test+Configuration+Options
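Per the documentation quoted above, the 30-minute default can be raised per job. As a hedged sketch (the capability name `maxDuration` comes from that same Sauce Labs page, but how this project actually configures its sessions is not shown here):

```python
# Sketch: raising the Sauce Labs per-job time limit via desired
# capabilities. The exact wiring into the WPT CLI is an assumption.
capabilities = {
    "browserName": "safari",
    # Sauce Labs per-job limit in seconds; default is 1800 (30 minutes),
    # documented maximum is 10800 (3 hours).
    "maxDuration": 10800,
}
```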

This shifts the focus to the WPT CLI: why was the test hanging? Here are the WPT CLI logs at the time of the Sauce Labs Connect closure:

2018-05-17 22:20:41,996 INFO validate-wpt-results wpt-run:stdout Starting runner
2018-05-17 22:20:56,539 INFO validate-wpt-results wpt-run:stdout TEST-START | /css/vendor-imports/mozilla/mozilla-central-reftests/ib-split/emptyspan-1.html
2018-05-17 22:20:56,540 INFO validate-wpt-results wpt-run:stdout Test requires OS-level window focus
2018-05-17 22:20:59,191 INFO validate-wpt-results wpt-run:stdout TEST-PASS | /css/vendor-imports/mozilla/mozilla-central-reftests/ib-split/emptyspan-1.html | took 2651ms
2018-05-17 22:20:59,191 INFO validate-wpt-results wpt-run:stdout TEST-START | /css/vendor-imports/mozilla/mozilla-central-reftests/ib-split/emptyspan-2.html
2018-05-17 22:20:59,192 INFO validate-wpt-results wpt-run:stdout Test requires OS-level window focus
2018-05-17 22:21:01,710 INFO validate-wpt-results wpt-run:stdout TEST-PASS | /css/vendor-imports/mozilla/mozilla-central-reftests/ib-split/emptyspan-2.html | took 2519ms
2018-05-17 22:21:01,711 INFO validate-wpt-results wpt-run:stdout TEST-START | /css/vendor-imports/mozilla/mozilla-central-reftests/ib-split/emptyspan-3.html
2018-05-17 22:21:01,712 INFO validate-wpt-results wpt-run:stdout Test requires OS-level window focus
2018-05-17 22:21:04,114 INFO validate-wpt-results wpt-run:stdout TEST-PASS | /css/vendor-imports/mozilla/mozilla-central-reftests/ib-split/emptyspan-3.html | took 2404ms
2018-05-17 22:21:04,115 INFO validate-wpt-results wpt-run:stdout TEST-START | /css/vendor-imports/mozilla/mozilla-central-reftests/ib-split/emptyspan-4.html
2018-05-17 22:21:04,116 INFO validate-wpt-results wpt-run:stdout Test requires OS-level window focus
2018-05-17 22:21:26,352 INFO validate-wpt-results wpt-run:stdout TEST-UNEXPECTED-TIMEOUT | /css/vendor-imports/mozilla/mozilla-central-reftests/ib-split/emptyspan-4.html | expected PASS
2018-05-17 22:21:26,352 INFO validate-wpt-results wpt-run:stdout TEST-INFO took 22238ms
2018-05-17 22:21:36,353 INFO validate-wpt-results wpt-run:stdout Forcibly terminating runner process
2018-05-17 22:21:36,362 INFO validate-wpt-results wpt-run:stdout Starting runner
2018-05-17 22:26:46,364 INFO validate-wpt-results wpt-run:stdout Forcibly terminating runner process
2018-05-17 22:26:46,371 INFO validate-wpt-results wpt-run:stdout u'stop': ()
2018-05-17 22:26:46,376 INFO validate-wpt-results wpt-run:stdout Starting runner
2018-05-17 22:31:56,378 INFO validate-wpt-results wpt-run:stdout Forcibly terminating runner process
2018-05-17 22:31:56,385 INFO validate-wpt-results wpt-run:stdout u'stop': ()
2018-05-17 22:31:56,388 INFO validate-wpt-results wpt-run:stdout Starting runner
2018-05-17 22:37:06,389 INFO validate-wpt-results wpt-run:stdout Forcibly terminating runner process
2018-05-17 22:37:06,396 INFO validate-wpt-results wpt-run:stdout u'stop': ()
2018-05-17 22:37:06,400 INFO validate-wpt-results wpt-run:stdout Starting runner
2018-05-17 22:42:16,401 INFO validate-wpt-results wpt-run:stdout Forcibly terminating runner process
2018-05-17 22:42:16,408 INFO validate-wpt-results wpt-run:stdout u'stop': ()
2018-05-17 22:42:16,411 INFO validate-wpt-results wpt-run:stdout Starting runner
2018-05-17 22:47:26,413 INFO validate-wpt-results wpt-run:stdout Forcibly terminating runner process
2018-05-17 22:47:26,422 INFO validate-wpt-results wpt-run:stdout u'stop': ()
2018-05-17 22:47:26,423 INFO validate-wpt-results wpt-run:stdout Starting runner
2018-05-17 22:52:36,427 INFO validate-wpt-results wpt-run:stdout Forcibly terminating runner process
2018-05-17 22:52:36,434 INFO validate-wpt-results wpt-run:stdout u'stop': ()
2018-05-17 22:52:36,434 INFO validate-wpt-results wpt-run:stdout Max restarts exceeded
2018-05-17 22:52:36,436 INFO validate-wpt-results wpt-run:stdout Got 492 unexpected results
2018-05-17 22:52:36,440 INFO validate-wpt-results wpt-run:stdout SUITE-END | took 3005s
2018-05-17 22:52:36,457 INFO validate-wpt-results wpt-run:stdout 17 May 22:52:36 - Got signal terminated
2018-05-17 22:52:36,461 INFO validate-wpt-results wpt-run:stdout 17 May 22:52:36 - Cleaning up.
2018-05-17 22:52:36,461 INFO validate-wpt-results wpt-run:stdout 17 May 22:52:36 - Removing tunnel 49a19b33c8a349d59dc353c8411ccfee.

With 5 minutes of silence between each "Starting runner" and "Forcibly terminating runner process" message, it appears as though the TestRunnerManager instance is timing out.
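The pattern in the logs can be sketched as a manager loop that waits for runner output and, on prolonged silence, forcibly restarts the runner until a restart budget is exhausted. This is a hypothetical illustration of that behaviour, not the actual wptrunner code; the function name, queue protocol, and restart limit are all assumptions:

```python
import queue

def manage_runner(results, timeout, max_restarts=5):
    """Wait for messages from a runner; on silence longer than
    `timeout` seconds, restart it, giving up after `max_restarts`."""
    restarts = 0
    while True:
        try:
            msg = results.get(timeout=timeout)  # block until runner speaks
        except queue.Empty:
            # Corresponds to "Forcibly terminating runner process"
            # followed by "Starting runner" in the logs above.
            restarts += 1
            if restarts > max_restarts:
                return "Max restarts exceeded"
            continue
        if msg == "SUITE-END":
            return "SUITE-END"
```

With a 5-minute timeout and a restart budget of 5, a runner that never produces output would yield exactly the cadence seen in the logs: roughly 5 minutes between each restart, then "Max restarts exceeded".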

To learn more, we would have to increase the logging verbosity. I am reluctant to do that given the significant increase in storage it would require and the very low frequency of this event. I'm filing this issue so we have a place to record future occurrences and reevaluate whether further investigation is warranted.

Finally, while the system is capable of retrying collection in response to failure, recovery failed in this case due to a previously-reported bug in the WPT CLI: web-platform-tests/wpt#10724.

@jugglinmike
Collaborator Author

We experienced this bug again today. It again occurred on a Safari trial, but this time it was "chunk" number 77 of 100. This suggests that the problem is not triggered by any particular test. It may still be a Safari-specific problem, though.

We run 100 "chunks" for each of two browsers every 24 hours, and we hadn't experienced this failure in 11 days. Even if we assume it is Safari-specific, that's still a failure rate of only about 0.2%. I'm inclined to continue handling this manually.
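The 0.2% figure above can be reconstructed under one assumption about how it was computed: two known occurrences (today's and the one 11 days prior) over the Safari-only chunk runs in that interval.

```python
chunks_per_day = 100   # Safari chunks per 24-hour collection cycle
days = 11              # interval between the two known failures
failures = 2           # this occurrence plus the one 11 days earlier

rate = failures / (chunks_per_day * days)
print(f"{rate:.1%}")   # prints "0.2%"
```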

@gsnedders
Member

It's probably something related to "restarting" the browser and that not working?
