Triggering two builds of the same project at the same time caused Repository clone error

fixed

(Aszuul) #1

I haven’t seen this error before, but today two changesets came in within seconds of each other and triggered builds of the same configuration. The log for the failure is below; its first timestamp is 3:26:34 and it is for changeset 51313. The other build, for changeset 51312, was successful. It should have come in earlier but has a timestamp of 3:26:47. I assume that both were caught by the same polling interval, but I’m not sure how the error occurred.

I’ve got this particular configuration set up with no quiet period and to use the latest single changeset.
We are on version 1.9.0.374.

Because the error occurred with the Mercurial clone command, I’m wondering if upgrading to the latest release will fix it, but I wanted to see if anyone else had seen this occur previously.

Thank you for any information you can provide.


3:26:34 PM Stage: Build
3:26:34 PM	Server To Agent Workspace Sync
3:26:34 PM		Started syncing files from the server 'localhost' to the agent 'phus-tfsagt01'
3:26:34 PM		Repository Syncing
3:26:34 PM			Using UNC transport.
3:26:34 PM			Repository 'Implementation-Databases'. Branch 'Archive'. Revision '51313'. Comment 'merge from DEV Branch'
3:26:40 PM				[Warning] No repositories synced.
3:26:40 PM			[Fatal] An error occurred while syncing files from the server to the agent. Details: Exception: ProcessException

Message: Running C:\Program Files\VSoft Technologies\ContinuaCI Agent\hg\hg.exe 
with arguments "clone --noupdate \\PHUS-TFSAGT01\ContinuaCISHare\Rc\b36e5ccf 
D:\CI_AWS\Repos\b36e5ccf --config ui.username=Continua --config 
"web.cacerts=C:\Program Files\VSoft Technologies\ContinuaCI Agent\hg\hgrc.d\cacert.pem" 
--noninteractive --encoding cp1252" on agent failed with return code 255 and error output: 
"abort: repository \\PHUS-TFSAGT01\ContinuaCISHare\Rc\b36e5ccf not found!"

Stack Trace:
   at Continua.Shared.Utils.Mercurial.Run(ProcessArguments args, String workingFolder, Func`2 checkResult, Boolean runRecoverIfRequired, Boolean allowTermination, Nullable`1 timeoutInSecs)
   at Continua.Shared.Utils.Mercurial.Clone(String remoteUrl, String localUrl)
   at Continua.Modules.Builds.Agent.FileSync.AgentRepositoryCache.<>c__DisplayClass18_0.<SyncFrom>b__1()
   at Continua.Shared.Utils.ReadWriteLockList`1.WithWriteLock(TId id, CancellationTokenSource cancelTokenSource, Action action)
   at Continua.Modules.Builds.Agent.FileSync.AgentRepositoryCache.SyncFrom(String cacheRevision)
   at Continua.Modules.Builds.Agent.AgentRepositoryHelper.SyncCache(String cacheRevision)
   at Continua.Modules.Builds.Agent.AgentBuildHelper.SyncRepoCache(BuildRepositoryDTO buildRepositoryDTO, AgentRepositoryHelper repositoryHelper)
   at Continua.Modules.Builds.Agent.AgentBuildHelper.SyncSourceFromServer(IEnumerable`1 rules, AgentWorkspaceSyncContext workspaceCtx)
   at Continua.Modules.Builds.Agent.AgentBuildHelper.InitialiseWorkspaceOnAgent(IAgentCallbackProxy proxy, TransportContextDTO source, Guid callId)



(Dave Sparks) #2

Hi,

We haven’t seen this issue before and have not been able to reproduce it in initial tests.

The hg clone error message implies that the path \\PHUS-TFSAGT01\ContinuaCISHare\Rc\b36e5ccf has been deleted, that the .hg folder within it has been deleted, or that the agent does not have access to it. Because this is a clone rather than a pull, this is the first time the repository has been used on this agent since it was created or last reset. Is this a new agent or a new repository?
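If it helps to narrow down which of those three cases applies, here is a minimal sketch of a check you could run on the agent machine. It is not part of Continua CI; the function name `diagnose_hg_repo` and the result labels are made up for illustration, and it assumes Python is available on the agent. It should be run under the same account as the agent service, and note that `os.access` only gives a rough answer on Windows because it does not fully evaluate ACLs.

```python
import os


def diagnose_hg_repo(repo_path: str) -> str:
    """Return a rough guess at why 'hg clone' might report 'repository not found'.

    Hypothetical helper for illustration only; the labels are not Mercurial output.
    """
    # Case 1: the folder itself is gone or the share cannot be reached.
    # os.path.isdir returns False (rather than raising) on any OS error.
    if not os.path.isdir(repo_path):
        return "missing: repository folder does not exist or is unreachable"

    # Case 2: the folder is there but is no longer a Mercurial repository.
    hg_dir = os.path.join(repo_path, ".hg")
    if not os.path.isdir(hg_dir):
        return "no-hg: folder exists but has no .hg subfolder"

    # Case 3: the repository is there but this account cannot read it.
    # On Windows this is only an approximation of the real permissions.
    if not os.access(hg_dir, os.R_OK):
        return "no-access: .hg exists but is not readable by this account"

    return "ok: folder looks like a readable Mercurial repository"
```

For example, calling `diagnose_hg_repo(r"\\PHUS-TFSAGT01\ContinuaCISHare\Rc\b36e5ccf")` while a build is syncing would show whether the folder from the log is visible at that moment. A result of "ok" outside of a build would suggest the folder only goes missing temporarily.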

One reason the folder may have been deleted on the server is a repository reset. Is it possible that changes were made to the repository settings while the build was running? Editing some properties, such as the repository URL, can cause the repository to be reset.

Which repository type are you using? Can you send us full details of the repository settings (with private URLs etc. redacted) so that we can set up a similar scenario to test with?


(Aszuul) #3

Certainly.

No changes were made to the repository yesterday. This server is currently acting as both the Continua client and an agent, so the repository and both temp repositories would have been on the same machine. The repository has existed for a month or so, and has been used without issue until this error.

Here are the repository settings:

image

TFS Version: ‘TFS 2017 and later’

image

Thanks!


(Vincent Parrett) #4

Did you check that the PHUS-TFSAGT01\ContinuaCISHare share still exists on the server, and that the user the agent service runs under has permission to access it? Is it possible your network admins changed a group permission that could affect this?


(Dave Sparks) #5

Hi,

We have done some testing with the TFS repository and cannot find any way that a changeset would cause the Mercurial repository folder to be cleared or deleted.

Is this error reproducible? If so, can you enable debug logging on both the server and agent and restart both services? Once the error occurs, send a copy of the log files to me as a direct message or via email to support at finalbuilder.com.


(Aszuul) #6

Sorry for the delay. I am able to reproduce the error. It occurs every time I trigger two separate builds within the same repository polling window. I’ve sent the logs, and look forward to your response.


(Dave Sparks) #7

Hi,

Thank you for sending the log files. This has helped us to find the cause of your issue. We have been working on a fix today and plan to release a new version on Wednesday.


(Aszuul) #8

Fantastic! Thank you!