Continuous Integration Servers are often underspecified when it comes to hardware. In the early days of Automated Builds, the build server was quite often that old PC in the corner of the office, or an old server in the data center that no one else wanted.
Developers weren't doing many builds per day, so it worked. It was probably slow, but that didn't seem to matter much.
Fast forward 20 years, and the Continuous Integration Server is now a critical service. The volume and frequency of builds have increased dramatically, and a slow CI server can be a real problem in an environment where we want fast feedback on the code
we just committed (even though it "worked on my machine"). Continuous Deployment only adds to the workload of the CI server.
In this post, I'm going to cover some ideas that will hopefully improve the performance of your CI server. I'm not going to cover compilation, unit tests etc. (which can be where a lot of the time is spent). Instead, I'll focus on the environment, machine
configuration and some settings on your Continua CI configurations.
It's impossible to provide hard and fast specs for hardware or virtual machines, as it varies greatly depending on the expected load.
There are a bunch of things you can tweak that may improve performance. I will touch on some key points for virtual hosts, but I'm not going to go too deep into tuning them, as that's not my area of expertise. Of course, dedicated physical machines
would be the ideal, but these days, even if you do get dedicated hardware for CI/CD, it's most likely going to be as a virtual host (Hyper-V or VMware) rather than an OS installed on bare metal (do companies still provision a single OS on bare metal
servers these days?). Virtualisation brings a whole bunch of benefits, but it also brings some limitations that cannot be ignored.
Continuous Integration environments are mostly I/O bound and Continua CI is no different in that regard. So let's look at the various resources used by CI/CD.
It's unlikely that CPU will be a limiting factor in the performance of your CI server, unless you are running other CPU intensive tasks on your server. If that's the case, then move your CI server to dedicated hardware, or at least a dedicated virtual
You should have at least 2 cores on the server. On our production server, which is a virtual machine (on Hyper-V 2012 R2) with 4 virtual cores and dynamic RAM, the Windows Resource Monitor shows that average CPU usage usually sits around 2%
when idle (no running builds, measured on the guest OS using Resource Monitor on the Continua CI server service). With 10 concurrent builds running, the Continua CI server service was using around 6% CPU.
Adding another 4 cores made very little difference. The Hyper-V host machine, which is also running a bunch of agent VMs, has plenty of CPU capacity, with average CPU usage around 5-7%. Cutting the number of cores down to 2 did make a slight difference,
with the VM showing slightly higher CPU usage, but there was no discernible difference in build times.
This is obviously not very scientific, but it did demonstrate (well, to me at least) that CPU is not the limiting factor. I set the server VM back to 4 cores and left it at that. Our Hyper-V host machines are a few years old now, and have 7200 rpm SAS
hard drives (in RAID 10) rather than SSDs (they were still too expensive when we bought the machines).
On a Continua CI Agent, we recommend at least 2 CPU cores, and limiting the concurrent builds running on the agent to 1 per core. This isn't a hard and fast rule, just a convention we adhere to here (based on some performance testing). You may want to add
extra cores depending on what compilers or tools you are running during your build process. The only way to know if this is needed is to monitor CPU on the agent machine while a build is running.
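As a rough illustration of the one-build-per-core convention and of how you might read CPU samples taken during a build, here is a small Python sketch. The function names, the 90% "busy" threshold and the sample readings are all assumptions made up for this example; they are not Continua CI settings or measurements from our servers.

```python
import os


def max_concurrent_builds(cpu_cores):
    """One concurrent build per core, with a floor of one build."""
    return max(1, cpu_cores)


def cpu_looks_saturated(samples, busy_percent=90.0, busy_fraction=0.5):
    """Return True if at least `busy_fraction` of the CPU samples
    (percentages, 0-100) are at or above `busy_percent`.

    Both thresholds are arbitrary starting points, not recommendations.
    """
    if not samples:
        return False
    busy = sum(1 for s in samples if s >= busy_percent)
    return busy / len(samples) >= busy_fraction


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    print(f"{cores} cores -> up to {max_concurrent_builds(cores)} concurrent builds")
    # Hypothetical CPU readings (percent), taken once a second during a build.
    readings = [35, 60, 95, 97, 99, 96, 40]
    print("consider adding cores:", cpu_looks_saturated(readings))
```

If the agent spends most of a build pinned near 100%, extra cores are worth testing; if it only spikes briefly, the time is probably going on I/O instead.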
The most used resources are disk read/write and network read/write. Poor I/O performance will really slow down your builds.
It goes without saying, but use the fastest disks you have available to you. If you can afford it, new generation NVMe/PCIe SSDs are the way to go. They are still quite expensive for larger capacities though. At the very least, use a separate disk for
the operating system and software installation, and another disk for your Continua CI Server's share folder (or the agent's workspace folder on agent machines). This is where most of the I/O happens during builds. This recommendation applies whether
running on dedicated hardware or in a virtual machine.
If you are running the server and agent machines on the same virtual host (as we do for our production environment) then this is very important to get right. Poor I/O performance in virtualised environments is not uncommon - having agents and the server
fighting for a slice of the same I/O pie is not a good idea.
On the agent machines, good disk performance is critical. When a build is started on the agent, the first thing it does is create a workspace folder. It then exports the source code from the repository cache(s) (a Mercurial repo which was cloned
from the server) to that folder, using the repository rules (more on this later). This workspace initialisation phase can be very slow if you have poor I/O performance.
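If you want a quick feel for how a candidate workspace drive performs, a crude sequential write/read test is easy to put together. The sketch below uses only the Python standard library; the function name and default sizes are my own choices, and a dedicated disk benchmarking tool will give far more trustworthy numbers (especially for the many-small-files pattern of a source export).

```python
import os
import tempfile
import time


def measure_disk_throughput(folder, size_mb=64, chunk_mb=1):
    """Write then read back a temporary file in `folder`.

    Returns (write_mb_per_s, read_mb_per_s). The read figure is usually
    flattered by the operating system's file cache, so treat it as an
    upper bound rather than the real speed of the disk.
    """
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    chunks = size_mb // chunk_mb
    fd, path = tempfile.mkstemp(dir=folder)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(chunks):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # push the data through to the disk
        write_secs = max(time.perf_counter() - start, 1e-9)

        start = time.perf_counter()
        with open(path, "rb") as f:
            while f.read(len(chunk)):
                pass
        read_secs = max(time.perf_counter() - start, 1e-9)
    finally:
        os.remove(path)  # always clean up the temporary file
    total_mb = chunks * chunk_mb
    return total_mb / write_secs, total_mb / read_secs


if __name__ == "__main__":
    write_speed, read_speed = measure_disk_throughput(tempfile.gettempdir())
    print(f"write: {write_speed:.0f} MB/s, read: {read_speed:.0f} MB/s")
```

Run it against the folder on the drive you intend to use for the agent workspace, and compare the result with the OS drive before deciding where the workspace should live.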
Continua CI uses networking to transfer files, repository changes etc. between the server and the agents. Poor network performance will impact build initialisation times (updating the agent's repo cache, build workspace) and build completion times
(transferring workspace changes back to the server). Logging between the agent and the server will also be impacted by poor network performance.
By default, Continua CI uses SMB to transfer files and source code (repository caches) between the server and the agents. When the server's share folder is not accessible by SMB, Continua CI will try to use SSH/SFTP (Continua CI installs its own specialised
SSH service). In high latency networks (for example if the agent is remote from the server), SSH/SFTP may perform better than SMB.
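To get a rough idea of whether an agent sits on a high latency link, you can time how long it takes to open a TCP connection to the server. The sketch below is a generic standard-library example, not a Continua CI tool; the function name is made up, and connection time is only a proxy for round-trip latency.

```python
import socket
import statistics
import time


def tcp_connect_latency_ms(host, port, samples=5, timeout=3.0):
    """Median time in milliseconds to open a TCP connection to host:port."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        # Open the connection and close it again straight away.
        with socket.create_connection((host, port), timeout=timeout):
            pass
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)
```

For example, `tcp_connect_latency_ms("ci-server.example.local", 445)` would time connections to the standard SMB port on a hypothetical server name. A result of a millisecond or less is typical of a local network, while tens of milliseconds suggests a remote agent for which it is worth comparing SMB and SSH/SFTP transfer times.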
You can force an agent to use SSH/SFTP by setting the agent's ServerFileTransport.ForceSSH property to true.
Continua CI supports PostgreSQL (the default) or Microsoft SQL Server. If you choose to use MSSQL, we recommend running it on a separate, well specified machine. MSSQL is quite heavy in its use of RAM and disk I/O - it's best run on a machine that has
been tuned to run it properly. I'm not going to go into that here; that's a whole other topic, and one in which I'm definitely not an expert.
The PostgreSQL database server that is installed by default (unless you select otherwise) with Continua CI is much more frugal when it comes to resources. On our main Continua CI server, PostgreSQL is typically using around 60MB of RAM. Contrast that
with SQL Server running on my dev machine, not used or touched for weeks, and it's using 800MB! PostgreSQL can also be tuned; we have tried to provision it with sensible defaults that strike a balance between performance and resource usage. If you
need to tune PostgreSQL, then we recommend installing your own PostgreSQL instance and pointing Continua CI at it.
Currently the Continua CI installer doesn't provide any options for the database install location (C:\ProgramData\VSoft\ContinuaCI\PostgreSQLDB). This is something we are looking at for a future release, which will make it possible to put the database
on its own drive. For now, it's possible to move the database to another location by using a symlink; we have a few customers who have done this successfully. Contact support if you need help with this.
Virtual CPU Cores
In a virtual environment, it's very important not to overload your virtual host. Note that there is a difference between overloading and over-allocating virtual cores. It's common practice to allocate more virtual cores across the virtual machines than
there are physical/logical cores (logical when Hyper-Threading is enabled), but this has to be done with the knowledge and understanding of the load on the host machine. Overloading happens when so many cores are allocated and in use that the hypervisor
is unable to schedule a core to a virtual machine when needed. This results in pauses and poor performance.
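The difference between over-allocating and overloading can be put into numbers with a back-of-the-envelope calculation. The sketch below is a deliberately simple model with made-up function names and figures: it works from average CPU usage measured inside each guest, so it ignores peaks and hypervisor overhead, and it is no substitute for monitoring the host itself.

```python
def vcpu_allocation_ratio(vm_vcpus, host_logical_cores):
    """Allocated virtual cores divided by the host's logical cores.

    A value above 1.0 means the host is over-allocated, which is not
    necessarily a problem in itself.
    """
    return sum(vm_vcpus) / host_logical_cores


def estimated_core_demand(vms):
    """Rough number of host cores kept busy on average.

    `vms` is a list of (vcpus, average_cpu_percent) pairs, with the
    percentage measured inside each guest.
    """
    return sum(vcpus * percent / 100.0 for vcpus, percent in vms)


if __name__ == "__main__":
    # Hypothetical host: 16 logical cores, one server VM and five agent VMs.
    vms = [(4, 6.0)] + [(4, 40.0)] * 5
    ratio = vcpu_allocation_ratio([vcpus for vcpus, _ in vms], 16)
    demand = estimated_core_demand(vms)
    print(f"allocation ratio: {ratio:.2f}")
    print(f"average demand: {demand:.2f} of 16 cores")
```

In this invented example the host is over-allocated (24 virtual cores on 16 logical cores) but, on average, only about half of its cores are busy. It starts to become overloaded when the demand figure approaches the number of logical cores, particularly if the busy periods of the VMs coincide.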
In a clustered environment this is even more important, because when a cluster node dies, or is removed for upgrades etc, virtual machines will move to another node in the cluster - if that node is already overloaded then you will soon start hearing the
complaints from users!
The best explanation I have found on how hypervisors allocate cores is this article - https://www.altaro.com/hyper-v/hyper-v-virtual-cpus-explained/ - it's Hyper-V specific (we
use Hyper-V here) but much of the information also applies to VMware.
When creating separate virtual disk volumes for your virtual machines, try to put those virtual drives on different physical drives, so they are not competing for the same I/O. Use fixed size virtual disks.
Continua CI Configuration Tuning
Continua CI is not immune to performance problems, and we're always working to make it faster and consume fewer resources. There are, however, a few things that can be tuned in Continua CI to improve performance.
Repository Branch Settings
Use specific branch patterns to narrow down the number of repository files and folders which are monitored and downloaded. With repositories which use folder-based branches, such as Subversion and TFS, consider moving old branches to a separate archive
folder in your repository which will not match the branch patterns. Note that you can use more than one Continua CI repository per actual repository. Some users will have multiple projects in one repository, but only need to build a single one for
each configuration. Make use of relative paths, where supported by your repository type, to limit your repository to a single project folder. This can significantly speed up repository initialisation and changeset updating.
Continua CI polls repositories periodically to detect new commits. Each time this occurs, Continua CI invokes the command line client for that repo, and parses the output of that process. Some clients use a surprising amount of CPU. The git client, for
example, uses around 8% CPU per instance on our production server while checking for commits. Most of the time, these processes only run for a very short amount of time (when no changes are detected), however if you have a lot of repositories, these
small CPU spikes can add up.
There are a couple of options to keep this under control.
1) Set the appropriate polling interval for your repositories. If changes to a repository occur infrequently, then there's no point polling frequently.
2) Set the Server.RepoMonitor.MaxCheckers server property. This controls how many version control client processes are spawned concurrently. The default (5) is quite conservative, so you should only need to lower this on a very low spec system.
If you have plenty of spare CPU capacity, then you can increase this value, but if you do, monitor CPU usage to make sure you don't overload the server.
3) Manual polling, using post-commit hooks. This reduces CPU usage on the server by only polling for repository changes when requested, and has the added benefit of reducing the load on your version control server. This does take some setting up, and
depends very much on the capabilities of your version control system. I'll take a look at post-commit hooks in a future blog post.
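To see how the polling interval and the checker limit interact, it helps to do the arithmetic. The sketch below is a simple model, not a description of how Continua CI schedules its checks: the 8% per-check figure and the default of 5 concurrent checkers come from the text above, while the repository count, check duration and function names are invented for the example.

```python
def average_concurrent_checks(repo_count, check_seconds, interval_seconds):
    """Average number of version control client processes that would be
    running at once if every repository is polled once per interval."""
    return repo_count * check_seconds / interval_seconds


def average_polling_cpu_percent(repo_count, check_seconds, interval_seconds,
                                cpu_percent_per_check, max_checkers):
    """Average CPU spent on polling, capped by the concurrent checker limit."""
    wanted = average_concurrent_checks(repo_count, check_seconds,
                                       interval_seconds)
    running = min(wanted, max_checkers)
    return running * cpu_percent_per_check


if __name__ == "__main__":
    # Hypothetical: 200 repositories, each check taking about 2 seconds.
    for interval in (60, 300):
        cpu = average_polling_cpu_percent(200, 2, interval, 8, 5)
        print(f"polling every {interval}s -> about {cpu:.1f}% CPU on average")
```

With these invented numbers, polling every 60 seconds would want more than six checks in flight at once, so the limit of 5 is permanently saturated and polling costs around 40% CPU. Stretching the interval to 300 seconds brings that down to roughly 11%.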
Repository Path Filtering
Repository Path Filtering is an option on all repository types, with the exception of Mercurial (*I'll explain why shortly). What this filtering does is allow you to limit which files get added to the server's repository cache. This filtering has a few
benefits: less disk space used on the server and the agents, less network I/O when transferring the changes from the server to the agent, and less I/O when checking out the source into the build workspace.
A typical use case for these rules is when you have files in your repository that rarely change and are not needed for the build process (design docs, deployment notes etc). No point adding them to the repo cache if you don't use them.
Changes to these rules won't affect files that are already in the repository cache, but they will stop changes to the filtered-out files from being committed to the repo cache. The best bang for buck with these filters will come if the repository is reset (the
cache is rebuilt, so filtered-out files are never committed to the cache). However, that can be an expensive operation, so unlike other repository settings, changing these rules will not force a reset.
* These filters don't apply to Mercurial repositories, as we use Mercurial for our repository cache. When you point Continua CI at a Mercurial repository, it just clones it to the server (repo cache), and then clones it to the agents (repo cache) without
Each Stage has a settings tab called Repository Rules. These rules apply when checking out the source from the agent's repository cache(s) to the build workspace. Only check out the source you need, as this will improve performance. If a stage doesn't need
the source at all (for example, it's only working with artifacts from previous stages), then just blank out the Repository Rules field.
Don't leave logging of the repository rules turned on unless you are debugging the rules. Logging the files exported to the workspace can be a real performance killer.
Similar to Repository Rules, these rules control which files are transferred between the server and agent's build workspace folders, and back again. Only transfer files back to the server's workspace that you actually need, like build artifacts, reports
Don't leave logging of the workspace rules turned on unless you are debugging the rules. Logging the files transferred can be a real performance killer.
Avoid logging too much information. For example, verbose logging on MSBuild should be avoided unless debugging build issues. Output logged from actions is queued and sent back to the server to be written to the build log, this causes high network and
Disk space is quite often at a premium (especially with SSDs), and it's important to keep on top of it. This is where the Clean up Policies come into play. Continua CI allows you to specify a global clean up policy for both the server and the agents,
and it can be overridden at the Project or Configuration level. The clean up policy controls how long to keep old builds and their associated workspaces around. The clean up policy is highly configurable - use it to keep control over disk space.
Bear in mind that the work of cleaning up old builds is quite I/O and database intensive, so be sure to schedule it to run during a quiet period.
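Alongside the clean up policy, a simple free-space check on the drives that hold the server's share folder and the agent workspaces can warn you before a build fails for lack of disk. This is a generic standard-library sketch; the function names and the 15% threshold are arbitrary choices for illustration, not Continua CI settings.

```python
import shutil


def free_space_percent(path):
    """Percentage of the volume containing `path` that is still free."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return usage.free / usage.total * 100.0


def is_low_on_space(path, minimum_free_percent=15.0):
    """True when the volume holding `path` has less free space than the
    given percentage (the default of 15% is an arbitrary example)."""
    return free_space_percent(path) < minimum_free_percent


if __name__ == "__main__":
    # Point this at the share or workspace folder you care about.
    folder = "."
    print(f"{free_space_percent(folder):.1f}% free")
    if is_low_on_space(folder):
        print("Low on disk space - consider tightening the clean up policy")
```

Run from a scheduled task, a check like this gives early warning that the clean up policy needs to be made more aggressive.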
Anti-virus software can be a major performance killer and, in some instances, an application killer. If I had a dollar for every time anti-virus software turned out to be the cause of a problem with Continua CI or FinalBuilder, well, that would be some serious
beer money at least!
If you have anti-virus software installed on your server or agents, be sure to add exclusions from real-time scanning for the server's share folder, and the agent's workspace folder. Add scheduled scans on those folders instead. Also, when using the bundled
PostgreSQL database, add an exclusion for C:\ProgramData\VSoft\ContinuaCI\PostgreSQLDB - otherwise you may experience database corruption.
You should also consider adding an exclusion for hg.exe in the "C:\Program Files\VSoft Technologies\ContinuaCI Agent\hg" folder. We found in testing that this will speed up the processing of the repository rules substantially (testing with windows
Version Control Clients
Avoid installing tools like TortoiseSVN or TortoiseHg on your server or agent machines, as these programs do background indexing (for icon overlays) and can also cause file/folder access issues.
I intend to revise this post as I learn more about performance tuning, especially in a virtual environment. If you have any techniques or tweaks that helped speed up your CI Server please feel free to share them with us (and fellow users).