Agents become offline if build is stopped during an SSH command execution

ktopaz · November 9, 2023, 2:15pm

Hi Guys,

We sometimes have issues where agents go offline and have to be killed and restarted to come back online.
I haven’t run the agents in debug mode yet because the problem is sporadic,
but I think I’ve identified the cause:
Our iOS developers sometimes stop their builds while they are still running.
These specific builds trigger workflows on Mac machines via SSH, and these take a long time to complete.
I noticed that agents go offline after a user stops one of these builds while the “SSH Run Script” action is executing on them.
I’ll try to enable debug mode soon and gather more information for you on this.

Vincent · November 9, 2023, 9:11pm

Hi

Agents periodically contact the server to let it know they are still available - so for an agent to go offline it will have stopped doing that. We would definitely need to see an agent debug log to figure out if the agent is crashing.

When you say they take a long time to complete, how long? We don’t do iOS builds here, but we could certainly create the same scenario with SSH calling out to a Linux machine to do something long-running.
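
As a rough illustration of the periodic check-in described in this reply, here is a minimal Python sketch of how a server might decide that an agent is offline. The class name `AgentRegistry`, its methods, and the 30-second timeout are hypothetical and chosen only for this example; the actual agent/server protocol is not shown in this thread.

```python
import time


class AgentRegistry:
    """Hypothetical server-side record of when each agent last checked in."""

    def __init__(self, timeout_seconds):
        # An agent is treated as offline once this many seconds pass
        # without a check-in (the value is an assumption for illustration).
        self.timeout_seconds = timeout_seconds
        self._last_seen = {}

    def heartbeat(self, agent_name, now=None):
        """Record a check-in from an agent."""
        self._last_seen[agent_name] = time.monotonic() if now is None else now

    def is_online(self, agent_name, now=None):
        """Return True if the agent checked in within the timeout window."""
        now = time.monotonic() if now is None else now
        last = self._last_seen.get(agent_name)
        return last is not None and (now - last) <= self.timeout_seconds


registry = AgentRegistry(timeout_seconds=30)
registry.heartbeat("agent-1", now=0.0)
print(registry.is_online("agent-1", now=10.0))  # True: checked in 10 s ago
print(registry.is_online("agent-1", now=45.0))  # False: no check-in for 45 s
```

Under this model, an agent whose process is still running but stuck (for example, blocked or spinning inside an action) would stop checking in and appear offline, which is why a debug log from the agent is needed to tell a hang from a crash.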

Sparky · November 17, 2023, 6:26am

Hi Arik,

We’ve identified an issue where the SSH Run Script action could enter a loop with high CPU usage. This is likely to be the cause of your issue. The fix for this is in version 1.9.2.1234.

ktopaz · March 28, 2024, 1:45pm

Hi Guys,

Somehow this issue resurfaced for us after a while (not sure exactly when).
It’s the same deal -
if someone stops a build during an “SSH Run Script” action execution,
some of the time this causes the agent to become unresponsive and show as offline under /administration/ci/agents/viewall.

When this happens - although the agent service appears to be running on the agent VM - we’re unable to restart it and have to kill the process from Task Manager before it can be started correctly.

We’re running the latest release, 1.9.2.1293.
Could you kindly help us check it again?

Thanks!

Sparky · April 2, 2024, 4:12am

Hi Arik,

We are unable to reproduce this or find any possible cause in the code. We would need an agent debug log and details of the SSH Run Script action settings to investigate this further.
