Error Synching Workspace

This was reported last year, but we’re still seeing the same issue, almost 100% of the time.

The first stage of a job builds the Delphi DUnit test suite, runs the tests and then imports them via the Import/Evaluate DUnitX action. The stage then does an agent-to-server workspace synch and registers some artifacts.
This part works fine.

The next stage starts with a server to agent workspace synch and fails. The file it cannot copy is the Delphi Unit Test exe from the first stage, and an error is returned.

An error occurred while syncing files from the server to the agent. Details: Exception: UnauthorizedAccessException
Message: (5) Access is denied: [\PEGBUILDP2\Continua\Ws\HealthOne_Integration\5690\Output\FHIRServerTests.exe]

Do you have any suggestions on how we can fix this? It is breaking our build.

Hi @dev.licensing,

Usually an access denied error implies that the executable is still running. Are you able to check whether it is listed as a running process on the agent when the error occurs?
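One way to script the running-process check is sketched below. It shells out to the Windows `tasklist` command with an image-name filter and parses the CSV output. The function names `parse_tasklist_csv` and `running_pids` are made up for illustration, and the script is an assumption about how you might automate the check, not something Continua provides.

```python
import csv
import io
import subprocess


def parse_tasklist_csv(output: str, image_name: str) -> list[int]:
    """Return the PIDs of rows in `tasklist /FO CSV /NH` output matching image_name."""
    pids = []
    for row in csv.reader(io.StringIO(output)):
        # Data rows look like: "name.exe","1234","Console","1","10,000 K"
        # The "INFO: No tasks are running..." message parses as a
        # single-column row and is skipped by the length check.
        if len(row) >= 2 and row[0].lower() == image_name.lower() and row[1].isdigit():
            pids.append(int(row[1]))
    return pids


def running_pids(image_name: str) -> list[int]:
    """Ask tasklist for processes with the given image name (Windows only)."""
    result = subprocess.run(
        ["tasklist", "/FI", f"IMAGENAME eq {image_name}", "/FO", "CSV", "/NH"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_tasklist_csv(result.stdout, image_name)
```

Calling `running_pids("FHIRServerTests.exe")` on the agent at the moment the sync fails would return an empty list if the executable has really exited.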

Another possible reason is that it is being locked by real-time anti-virus protection. We generally recommend that the server and agent workspace folders are excluded from real-time anti-virus processing as this can significantly impact performance as well as lock files. Run scheduled anti-virus scans instead at times when the build server is not busy.

Otherwise, can you send us details of your build configuration so that we can see if there is any other reason why the file might be locked? Preferably, export the configuration from the Import/Export page in the Administration section, or alternatively generate a diagnostics report on the Event Log page. Send the file(s) to me via direct message on these forums or via email to support at finalbuilder.com.

Meanwhile, if the file is not required for the second stage, you should be able to work around the issue by excluding it from the Workspace Rules for that stage.
e.g. by adding the line
- Output\FHIRServerTests.exe >

Thanks for the hints Dave.

I checked in the Task Manager and could see Delphi firing up, building the Test application. Then Delphi closed down, the Test app started, did its thing and then disappeared from the task manager.

I then did an export of the Continua config (in json format) and opened it up for a quick review. I found these details in the action that runs the Test app

"Comments": [
  "Timeout set to 120 seconds because sometimes the unit tests run but do not terminate"
],
"PlugIn": {
  "Name": "Execute Program",
  "Properties": {
    "ExecutablePath": "$Workspace$\\Output\\FHIRServerTests.exe",
    "WorkingDirectory": "$Workspace$\\Output",
    "CheckExitCode": "false",
    "TimeoutInSeconds": "120",
    "SendSystemEnvironmentVariables": "false"
  }
}

I think I set that timeout back in the day because we were having issues with the Test app not completing. However, I think there might be a side effect: Continua may not be releasing a lock on the exe file until the timeout expires. So even though the next action was to delete the *.exe file (which reported success), the actual file was still on the disk.
So I then added a 125 second delay after the NUnit Import / Evaluate step.

This seems to have made the problem go away (have run it twice) - I won’t use the word fixed as injecting a 2 min delay into the build will drive us nuts.

My next step is to understand the reason the timeout was set in the first place.
Do you have any thoughts on this ?

Cheers

David

And reducing the timeout to 10 seconds, and the corresponding delay to 15 seconds, resulted in a failure at the end of the next stage during the Agent to Workspace sync.

An error occurred while syncing files from the agent 'pegbuildp2' to the server 'localhost'. Details: Exception: UnauthorizedAccessException
Message: (5) Access is denied: [E:\CI_AWS\Ws\5699\Output\FHIRServerTests.exe]
Stack Trace:    at Alphaleonis.Win32.NativeError.ThrowException(UInt32 errorCode, String readPath, String writePath)    at Alphaleonis.Win32.NativeError.ThrowException(Int32 errorCode, Nullable`1 isFolder, String readPath)    at Alphaleonis.Win32.Filesystem.NativeMethods.CloseHandleAndPossiblyThrowException(SafeHandle handle, Int32 lastError, Nulla

I’ll have a go at implementing some of your other suggestions.

Hi David,

Which version of Continua are you using? There’s no place I can see where a lock is held on a process that Continua spawns, and the process runner code has not changed for some time. The TimeoutInSeconds parameter is used to kill a process that has not completed within the specified time interval, not to delay the process in any way.
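To illustrate the kill-on-timeout behaviour described above, here is a generic Python sketch. It is not Continua's process runner code, and `run_with_timeout` is a hypothetical name. The point it shows is that a process that exits early returns immediately, and the timeout only comes into play when the process overruns.

```python
import subprocess
import time


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (exit_code, timed_out, elapsed_seconds)."""
    start = time.monotonic()
    proc = subprocess.Popen(cmd)
    try:
        # Returns as soon as the process exits; it does not wait out the timeout.
        exit_code = proc.wait(timeout=timeout_s)
        timed_out = False
    except subprocess.TimeoutExpired:
        # Only reached when the process is still running after timeout_s.
        proc.kill()
        exit_code = proc.wait()
        timed_out = True
    return exit_code, timed_out, time.monotonic() - start
```

Under this model, a test run that finishes in a couple of seconds should release the executable after a couple of seconds, whatever the timeout is set to.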

We will do some tests, however, to see if we can reproduce the issue.

It may be worth using something like LockHunter or File Locksmith in Windows PowerToys to see what is holding on to the lock. Also, try running the FHIRServerTests.exe in a command prompt to see if it completes. Perhaps there is a loop in one of your unit tests?

Hi David

Do any of your DUnitX test methods have the

[MaxTime]

attribute applied? I'm looking through DUnitX at the moment and found there may be an issue with the way the timeout is implemented. It does appear to be leaving a thread handle open. Whether that is enough to cause this issue (or any issue) is unknown; I will test this tomorrow.

Hi Vincent

Do any of your DUnitX test methods have the MaxTime attribute applied ?

We’re using the older DUnit framework. But I did check the code base and could not see any attributes applied to the tests.

Server Version: 1.9.2.983
Agent Version: 1.9.2.983

We interactively run the tests as part of the development process that the developers run when making code changes, although this is the debug build that uses the interactive GUI runner.
When we then get Continua to build and run the tests, it uses the release config which builds for the console mode that outputs the XML file. I have run this release version in a command prompt on the agent machine (although logged in using my domain credentials, not the service account that Continua uses) and it ran fine, taking a couple of seconds and closing cleanly.

I could also delete the exe file via the Windows explorer immediately after running it interactively.
When run via Continua, if I insert a delay action after running the tests, I can interactively delete the exe file. However, if I remove the delay action and instead go straight to the NUnit import action followed by a Delete action (targeting the exe) followed by a delay action (to give me time to manually delete the file) then I cannot delete the exe. I get prompted to allow admin confirmation and then cannot delete interactively (try again prompt etc).

Interestingly, while typing the previous paragraph and then looking at the agent machine again, the delete action has succeeded (approx. 2 mins later) and the exe file has gone. This is consistent with my post yesterday with the 125 second timeout. So between executing and deleting the exe, something (Continua, AV, file system?) is locking that file.
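A rough way to measure how long the file stays locked, rather than watching it by hand, is sketched below. It assumes that a locked file on Windows refuses to open for writing and raises `PermissionError`; the function names are invented for this example. Note that a read-only file or an ACL restriction raises the same error, so this is only a best-effort probe.

```python
import time


def is_locked(path):
    """Best-effort check: True if the file exists but cannot be opened for writing."""
    try:
        with open(path, "r+b"):
            return False
    except FileNotFoundError:
        # The file has already been deleted, so nothing is holding it.
        return False
    except PermissionError:
        # Sharing violations and access-denied errors both land here on Windows.
        return True


def seconds_until_released(path, timeout_s=180.0, interval_s=1.0):
    """Poll until the file can be opened or is gone; return elapsed seconds, or None on timeout."""
    start = time.monotonic()
    while True:
        if not is_locked(path):
            return time.monotonic() - start
        if time.monotonic() - start >= timeout_s:
            return None
        time.sleep(interval_s)
```

Running `seconds_until_released(r"E:\CI_AWS\Ws\5699\Output\FHIRServerTests.exe")` straight after the test step would give a number to compare against the 120 second timeout.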

I’ll keep digging.

Hi David,

I think the most likely culprit is AV, although 2 mins is quite a long time for AV to be locking a file. Which AV are you using, and can you add exclusions?

We have done some testing and have not been able to reproduce any locking issue. Can you provide more details about your workflow? So far, we know that you have an Execute Program action running the DUnit test executable, followed by an NUnit Import action, then a Delete action, followed by a temporary Delay action. Is there anything else? It’s odd that you say that the Delete action succeeded 2 mins later, is it running in a loop?

I’ve also noticed from your export that your Execute Program action is set to ignore the exit code. It might help to know what the exit code was?

We’ve got CloudStrike AV installed on the build machine. Our infrastructure team have put an exclusion on the E:\CI_AWS and E:\Continua folders (server and agent on the same machine for the initial build).

Re the workflow, if we did not do the delete then it would fail on the beginning of the next stage when doing the server to agent synch. Probably because the agent for the second stage is the same as the first stage and the temporary FHIRServerTests.exe file is still there (and somehow locked) so the server cannot copy it.
So I put the Delete exe action into the first stage so that the file would not be copied back to the server but for some reason the exe file is locked so the delete would “succeed” but the file was still in the folder.

My next alteration will be to alter the workspace rules on the first stage to exclude that exe. Fingers crossed.

P.S. thanks for the top notch support too :slight_smile:

We understand why you’ve put the Delete action there, but we’ve not been able to replicate any of the issues that you are seeing. This is why we have asked for some more details of your workflow, so we can attempt to reproduce.

Perhaps, you can also send us a build log, so we can have a detailed view of what’s happening?

Regarding the delete action, is the “Log each deleted item” option enabled? This may shed some light on the situation.