Understanding repository sync mechanism

Hi,

I’m trying to understand the repository sync mechanism from Server to Agent.

How I thought it would work
The Server monitors external repositories and updates its local repositories in [Server storage]\Rc[Repository ID]. When a build starts and the Agent is determined, the relevant Continua (Mercurial) repositories are synced from the Server to the Agent (I guess using some hg pull mechanism). Each agent has a similar storage location at [Agent storage (CI_WS)]\Repos[Repository ID]. The Agent can then retrieve source files from its local cache during any build action/step. This is efficient because a DVCS is built for exactly this purpose: subsequent builds should start almost immediately, with either no repository sync at all or only the new changesets sent from Server to Agent.
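The expected behaviour described above is the usual DVCS incremental pull: only changesets the agent does not already have are transferred. The sketch below models that with plain lists of changeset IDs; the IDs and function names are hypothetical, and this is not Continua CI code.

```python
from typing import List


def changesets_to_pull(server_changesets: List[str], agent_changesets: List[str]) -> List[str]:
    """Return the changesets the agent is missing, in server order."""
    have = set(agent_changesets)
    return [cs for cs in server_changesets if cs not in have]


def pull(server_changesets: List[str], agent_changesets: List[str]) -> List[str]:
    """Append only the missing changesets to the agent's history (in place)."""
    missing = changesets_to_pull(server_changesets, agent_changesets)
    agent_changesets.extend(missing)
    return missing


if __name__ == "__main__":
    server = ["c1", "c2", "c3", "c4"]
    agent = ["c1", "c2"]
    print(pull(server, agent))  # ['c3', 'c4']
    print(pull(server, agent))  # [] (nothing new, so nothing is transferred)
```

The second call returning an empty list is the "subsequent builds should start immediately" case.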

How it appears to work
Subsequent builds take some time to start, even when little or nothing has changed in a repository. We do have a large (Subversion) repository, and if it is used in a build, each build’s “Server To Agent Workspace Sync” takes at least one minute. When enabling “Log repository files copied”, it appears that all source files for this repository’s branch are copied from Server to Agent. When restarting the build, the same files are copied again, even though no files have changed (i.e. no new changesets have been added to the branch since the last build).

A few questions come to mind:

  • What’s the purpose of the local Agent repository cache [Agent storage (CI_WS)]\Repos[Repository ID]? When looking at the Repository Sync log, I’d conclude that it’s not being used, or otherwise that it’s not being updated efficiently.
  • Wouldn’t it be more efficient to have the Agent sync its Mercurial repository with the Server (e.g. pull all new changes from all, or only the relevant/required, branches) instead of copying all source files each time?

I’m probably overlooking something here (I wouldn’t be surprised), but I’m interested in how it works and how it should work. We’re sometimes struggling with the repository sync mechanism, and I came across this while analysing some of those issues.

Kind regards,
Remko Seelig

I’m using TFS, and there could be differences in our setups, but this is what I see:

- Server monitors and stores as you described.
- Agents monitor in the background and update their cache from the Server.
- When you do a build, the Agent will use whatever it has in its cache unless you force a repository check.
- When you do a build, the Agent copies files from its cache to its workspace repository. If you don’t restrict what gets copied, you may be copying more files than you need between these directories.
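The cache behaviour in the list above (background refresh, and a build using whatever the cache holds unless a repository check is forced) can be modelled as below. This is a hypothetical sketch: the class and method names are made up, and revisions are plain integers (as TFS changeset numbers are) purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class AgentCache:
    """Hypothetical model of an agent-side repository cache."""
    cached_revision: int = 0

    def refresh(self, server_revision: int) -> None:
        # Background update from the server, independent of any build.
        self.cached_revision = server_revision

    def revision_for_build(self, server_revision: int, force_check: bool = False) -> int:
        # A forced repository check refreshes the cache before the build;
        # otherwise the build uses whatever the cache already holds.
        if force_check:
            self.refresh(server_revision)
        return self.cached_revision


if __name__ == "__main__":
    cache = AgentCache()
    cache.refresh(41)
    # The server has since seen revision 42, but no background refresh has run yet.
    print(cache.revision_for_build(42))                    # 41
    print(cache.revision_for_build(42, force_check=True))  # 42
```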

Hi,

Sorry for the delay in replying, Remko, and thank you for your input, EB_Build. The repository sync does actually work as you first thought it would, Remko, and as described by EB_Build.

Although Continua CI logs that it is copying files from the server to the agent, it is not actually copying all the files for each build. 

It maintains a repository cache in the form of a Mercurial repository on the server and on each agent. When an agent is selected for a build stage, Continua CI compares the latest revision in the agent repository cache to the latest revision on the server. If they differ, it pulls the changes from the server to the agent using the “hg pull” command. It then exports the files for the required revision to the agent workspace using the “hg archive” command. The list of files exported from the Mercurial repository is written to the build log.
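The steps described above can be sketched as a function that plans the Mercurial commands for one sync without running them. This is an illustration under assumptions, not Continua CI's actual code: the function names, repository paths and node IDs are hypothetical, and only standard hg options (`-R`, `-r`, `--template`) are used.

```python
from typing import List


def tip_command(repo: str) -> List[str]:
    """hg command line that prints the node ID of a repository's tip."""
    return ["hg", "log", "-R", repo, "-r", "tip", "--template", "{node}"]


def plan_agent_sync(agent_tip: str, server_tip: str, server_repo: str,
                    agent_repo: str, revision: str, workspace: str) -> List[List[str]]:
    """Return the hg command lines for one sync, without running them."""
    commands: List[List[str]] = []
    # Pull only when the agent cache's latest revision differs from the server's.
    if agent_tip != server_tip:
        commands.append(["hg", "pull", "-R", agent_repo, server_repo])
    # Always export the requested revision's files into the build workspace.
    commands.append(["hg", "archive", "-R", agent_repo, "-r", revision, workspace])
    return commands


if __name__ == "__main__":
    # Hypothetical repository locations and node IDs, for illustration only.
    for argv in plan_agent_sync("3f2a9c", "8d41be", "https://ci-server/repos/1",
                                "/agent/Repos/1", "8d41be", "/agent/ws/build-7"):
        print(" ".join(argv))
```

On a machine with Mercurial installed, each list could be passed to `subprocess.run(argv, check=True)`. Note that rebuilding an unchanged revision skips the pull but still runs the archive, which is why the log still lists every exported file.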

The repository rules define the include and exclude patterns which are passed to the “hg archive” command. To increase performance, we recommend that you use the repository rules to limit the files which are exported to those required for each build stage. 
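As a rough illustration of how include and exclude patterns narrow the export, the sketch below adds them to an `hg archive` command line via its `-I`/`-X` options, using Mercurial's `glob:` pattern syntax. How Continua CI translates its repository rules into these patterns is an assumption here, and the function name, paths and patterns are hypothetical.

```python
from typing import Iterable, List


def archive_command(agent_repo: str, revision: str, workspace: str,
                    includes: Iterable[str] = (), excludes: Iterable[str] = ()) -> List[str]:
    """Build an hg archive command line that exports only matching files."""
    argv = ["hg", "archive", "-R", agent_repo, "-r", revision]
    for pattern in includes:
        argv += ["-I", pattern]  # export only files matching this pattern
    for pattern in excludes:
        argv += ["-X", pattern]  # skip files matching this pattern
    argv.append(workspace)
    return argv


if __name__ == "__main__":
    print(archive_command("/agent/Repos/1", "tip", "/agent/ws/build-7",
                          includes=["glob:src/**"], excludes=["glob:**/*.pdb"]))
```

Narrower patterns mean fewer files written to the workspace on every build, even when the pull step is skipped.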

Thanks for the clarification. This makes sense. The next time I’m analysing build times, I’ll take this into consideration.