With the RT Startup App disabled, Restart Robot Code on the DS permanently breaks comms between DS and Rio #70

Open
chauser opened this issue Dec 10, 2024 · 5 comments
Labels
bug Something isn't working NI - roboRIO Image

Comments

@chauser

chauser commented Dec 10, 2024

Describe the bug
When the roboRIO setting "Disable RT Startup App" is true and a robot program is running, pressing "Restart Robot Code" on the DS causes communications with the roboRIO to be lost until it is rebooted.

To Reproduce
Steps to reproduce the behavior:

  1. Enable "Disable RT Startup App" either using the roboRIO imaging tool (Edit Startup Settings) or by pressing and holding the User button on the roboRIO for 5 seconds.
  2. Reboot the roboRIO
  3. Once communication is established by the DS, press the "Restart Robot Code" button -- the code should start
  4. At any time later, press the "Restart Robot Code" button again. The Robot Code and Communications bars on the DS will turn red and the DS will no longer communicate with the roboRIO. Interestingly, other network communications such as SSH, code deploy, etc. still work.

Expected behavior
In 2024, performing these steps caused communications to be lost only briefly (<15 s), and restarting the robot code required pressing the "Restart Robot Code" button a second time. That seems reasonable. One could argue that needing to press "Restart Robot Code" a second time is undesirable, but it has its uses, for example when you want to explore the system state without the code running. (Note: when I tried the 2024 code again today, it behaved the same as 2025, which is what I'd expect because the script that kills the running code is unchanged since last year.)

Desktop (please complete the following information if applicable):

  • OS: DS on Windows 10 and Windows 11
  • OS Language: English
  • Project Information: WPILib Information:
    Project Version: 2025.1.1-beta-2
    VS Code Version: 1.94.2
    WPILib Extension Version: 2025.1.1-beta-2
    C++ Extension Version: 1.22.9
    Java Extension Version: 1.36.2024092708
    Java Debug Extension Version: 0.58.2024090204
    Java Dependencies Extension Version: 0.24.0
    Java Version: 17
    Java Location: /home/hauser/wpilib/2025/jdk
    Vendor Libraries:
    PathplannerLib (2025.0.0-beta-5)
    CTRE-Phoenix (v6) (25.0.0-beta-3)
    REVLib (2025.0.0-beta-3)
    Studica (2025.1.1-beta-3)
    WPILib-New-Commands (1.0.0)
    photonlib (v2025.0.0-beta-5)
    I also tested with just the Arcade Drive example project and saw the same behavior.

roboRIO (please complete the following information if applicable):

  • roboRIO 1 or 2?: roboRIO 1; I have not tested on a 2.
  • Image version: FRC_roboRIO_2025_v1.0
@chauser chauser added the bug Something isn't working label Dec 10, 2024
@chauser
Author

chauser commented Dec 11, 2024

Additional info: Deploy Robot Code from the WPILib menu in VSCode also kills communication if a robot program is already running (and RT Startup App is disabled).

@ThadHouse
Member

ThadHouse commented Dec 11, 2024

This actually doesn’t surprise me at all. When you try to deploy with the RT Startup App disabled, the deploy sometimes hangs as well. This is why GradleRIO prints a warning when it detects this setting. My guess is that the DS hangs the same way, since it tries to run the same command that robot deploys run.

The answer here is pretty much going to be don’t do this.

You can actually stop the robot program by sshing in as admin and running

frcKillRobot -t

@chauser
Author

chauser commented Dec 12, 2024

More info and a couple of proposals for changes.
The reason the DS loses communication in this scenario is that if automatic app startup is disabled and the robot program is started by "Restart Robot Code" on the DS, then the robot code is in the same process group as the NetCommDaemon. The kill script kills the robot code by killing its process group. So, poof, no NetCommDaemon.
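
The shared-process-group behavior described above can be observed with a small sketch. It is not taken from frcKillRobot.sh. It assumes only a Linux system with /proc mounted, where field 5 of /proc/self/stat is the process group ID of the process reading the file. The variable names are illustrative: the script stands in for NetCommDaemon and the child shell stands in for robot code started from it.

```shell
#!/bin/sh
# Illustrative sketch only; this is not code from frcKillRobot.sh.
# Linux-specific assumption: field 5 of /proc/self/stat is the process
# group ID (PGID) of whichever process reads the file (here, cut).

# PGID inherited from this script, which stands in for NetCommDaemon.
launcher_pgid=$(cut -d ' ' -f 5 /proc/self/stat)

# PGID of a child started the ordinary way, standing in for robot code.
child_pgid=$(sh -c 'cut -d " " -f 5 /proc/self/stat')

echo "launcher PGID: $launcher_pgid"
echo "child PGID:    $child_pgid"

if [ "$launcher_pgid" = "$child_pgid" ]; then
    echo "same group: a signal sent to the child's group also reaches the launcher"
fi
```

When run as a plain script (no job control), both values should match, which is the situation in which killing the robot code's process group would also take down the process that launched it.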

Possible fixes (so NetCommDaemon doesn't get killed): the proposed changes are only in code that is used when automatic app startup is disabled, thus hopefully relatively low risk.

  1. Prefix line 52 of frcKillRobot.sh with setsid. This makes the robot code run in a new process group so NetCommDaemon is not killed by a subsequent "Restart Robot Code" or deploy from VS Code. This is probably the most user-friendly choice as it allows easy control of when robot code is started using the "Restart Robot Code" button. Examples: deploy from VS Code and later start the code from the DS; observe the robot code crash, examine log windows at your leisure, and restart the code from the DS when ready. I have tested this.
  2. Remove lines 51 and 52 of frcKillRobot.sh entirely. This means that when automatic app startup is disabled, robot code cannot be started except by ssh-ing in and using frcRunRobot.sh. It seems to me this could be satisfactory for the kind of users who are likely to turn off automatic app startup. I have not tested this, but could if it seemed desirable.
  3. No fix. The theory behind this is that anyone who disables automatic app startup knows what they are doing and will avoid using the "Restart Robot Code" button. The problem arises if the automatic-startup-disabled state is entered inadvertently and users end up killing DS communication, which could be quite surprising. The message about enabling automatic startup that the deploy process prints is helpful, but it didn't trigger, for me at least, a realization that what I was doing with the "Restart Robot Code" button was really outside the realm of what was expected to work, especially since it seemed to work the first time I tried it.
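
The effect that option 1 relies on can be sketched as follows. This is not the actual change to frcKillRobot.sh, only an illustration that a command started through setsid runs in a new session and therefore a new process group. It assumes a Linux system with /proc mounted and the util-linux setsid command available; the variable names are illustrative.

```shell
#!/bin/sh
# Illustrative sketch only; not the actual frcKillRobot.sh change.
# Linux-specific assumption: field 5 of /proc/self/stat is the process
# group ID (PGID) of whichever process reads the file (here, cut).

# Child started the ordinary way: it inherits this script's process group.
default_pgid=$(sh -c 'cut -d " " -f 5 /proc/self/stat')

# Child started through setsid: it becomes the leader of a new session
# and of a new process group.
isolated_pgid=$(setsid sh -c 'cut -d " " -f 5 /proc/self/stat')

echo "default PGID:  $default_pgid"
echo "isolated PGID: $isolated_pgid"

if [ "$default_pgid" != "$isolated_pgid" ]; then
    echo "different groups: killing the isolated child's group spares the launcher"
fi
```

Because the two IDs differ, a kill aimed at the isolated child's process group would not reach the process that started it, which is the property the setsid prefix is meant to provide.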

@chauser
Author

chauser commented Dec 19, 2024

I don't think this is NI's issue. frcKillRobot.sh is in the GradleRIO project.

@ThadHouse
Member

That’s still an NI script. We just do one modification to work around a Java issue from a few years ago.
