Clean up Continua CI server workspace?

We’ve got over 600 GB of space used in Continua\WS folders, which I believe is the server workspace?

Our cleanup policy is pretty tight with a 2 week build age, 1 min build and 2 max builds. All options are checked. There are folders in continua\ws that are months old and as far as I can tell should have been cleaned up.

It’s entirely possible that we’re doing something wrong… I’ve asked developers to stop builds that are pending promotion when no longer needed… as I think that would keep stuff around.

I’ve reviewed projects/configs and the ‘recently completed builds’ list seems to reflect cleanup policy.

What situation(s) can keep server workspace from being cleaned up? Any handy queries I could do on the DB to help identify the problem?

Thanks…

Hi Brendan,

What version of Continua CI are you using? Earlier versions did have some timeout issues with build cleanup, which have been fixed in later versions. Check if there are any errors on the Event Log page in the Administration section.

Note that pinned builds and builds which have not finished, including those pending promotion, are not cleaned up. Also note that the global cleanup policy may be overridden at the project and configuration level. Check that the relevant workspace items are ticked under What to cleanup.

If you are using PostgreSQL for your database you can run the SQL query in the attached file to get a report of the cleanup policies for each configuration. CleanedUpBuildsQuery.sql.txt (7.5 KB)

This will also show various build counts for each configuration. The last column Number Of Non Pinned Terminated Builds Over Maximum shows how many builds are left which should be cleaned up but haven’t. You would expect there to be a few builds depending on the time since the last scheduled cleanup.

Tools such as WizTree, TreeSize and WinDirStat that show the size of each folder are useful to identify which workspace folders are taking up the most space. The workspace folders are named by project_slug and build_id, so once you have identified a bloated folder you can go to http://(server_host_name)/(project_slug)/ci/builds/view/(build_id) to view the build details.
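If installing a disk usage tool on the server is not an option, the same check can be roughly approximated with a short script. This is only a minimal sketch: the function names are made up for illustration, and it assumes each top-level folder under the workspace root corresponds to one build workspace. It does not parse the folder names, since the exact naming format is not spelled out here; `build_url` simply fills in the URL pattern quoted above.

```python
import os
import sys
from pathlib import Path


def folder_size_bytes(folder: Path) -> int:
    """Sum the sizes of all files under folder, skipping unreadable ones."""
    total = 0
    for root, _dirs, files in os.walk(folder):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # File vanished or is locked; ignore it for this estimate.
                pass
    return total


def largest_workspaces(ws_root: Path, top: int = 10) -> list[tuple[str, int]]:
    """Return (folder name, size in bytes) for the biggest top-level folders."""
    sizes = [(p.name, folder_size_bytes(p)) for p in ws_root.iterdir() if p.is_dir()]
    sizes.sort(key=lambda item: item[1], reverse=True)
    return sizes[:top]


def build_url(server_host_name: str, project_slug: str, build_id: int) -> str:
    """Fill in the build details URL pattern quoted above."""
    return f"http://{server_host_name}/{project_slug}/ci/builds/view/{build_id}"


if __name__ == "__main__":
    # Pass the workspace root as the first argument, e.g. your Continua\WS path.
    root = Path(sys.argv[1]) if len(sys.argv) > 1 else None
    if root is not None and root.is_dir():
        for name, size in largest_workspaces(root):
            print(f"{size / 1024**3:8.2f} GB  {name}")
```

Run it with the workspace root as the only argument, then look up the project slug and build id of the largest folders with `build_url` or directly in the browser.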

1.9.0.374. I’ve already hunted down the projects/configs that overrode cleanup and turned that off. Cleanup policy has everything checked.

I’ve already narrowed it down to a specific project. The query you provided sounds like just what I need, AFAIK we are not pinning anything however getting devs to stop un-needed builds that are pending promotion has been a challenge.

I’ll reply to this thread when I know more, thanks!

I think I found the problem. We have one project/config reporting 253 for ‘Number of Builds Pending Promotion’.

Just took a look and the history has 11 pages of builds pending promotion. Used ‘Stop All’ on everything except the most recent 50 builds (I’ll clean those up manually). Kicked off a cleanup and pretty quickly recovered a bunch of free space.

Thanks again!

I noticed that the stage we stop at has promote options set to ‘Enable stage promotion timeout’ with 72 hours in the duration. Auto promote is not checked, and there are no conditions specified. I would think that should be terminating the build and not leave it in a waiting for promotion state?

Hi Brenden,

The stage promotion timeout is working in our tests. Note that the relevant stage promotion options are those on the stage before the one marked pending promotion, rather than those on the stage marked pending promotion which has not yet run.

Are there any errors related to promotion or cleanup on the Event Log page in the Administration section? Otherwise, can you generate and download a copy of the diagnostics report using the links at the top of the Event Log page. Send it to me via direct message or email to support at finalbuilder.com.

Sorry it took so long to get back to this. What we really need is a way to stop the build if it’s not promoted in x weeks/days/hours. We’re using up gigs of workspace and keep running out.

Our second to last stage is set to NOT auto promote, with a timeout of 336 hours (14 days).
Last stage (release) is set to NOT auto promote, with the timeout set to 1 hour.

All builds sit waiting for promotion perpetually as far as I can tell. I’m not sure that the auto-promotion logic is what we need anyway.

In short we want builds to be available for release (the release stage tags, handles distribution to local locations, etc.) for a period of time. Manually stopping builds (or using Stop All) works but requires someone to remember to do it…

Also, when using ‘Stop All’, if I select all but the last 10 builds, is that 10 builds per build configuration, or per build configuration and branch? We use a single build configuration and have dozens of branches building from that. If it keeps the last X for each distinct configuration + branch, I think that’ll work fine.

Thanks.

All builds except the last 10 for the current configuration are stopped. The branch is not taken into account. Taking it into account would require us to add controls to select a branch for each configuration repository. This is something that we will consider in the future and I have added this task to our to-do list.
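To make the difference concrete, here is a small illustrative model of the two selection rules. It is a sketch only, not Continua CI code: the `Build` class and both functions are hypothetical, and it assumes build ids increase with each new build. The first function models the Stop All behaviour described above; the second models the per-branch variant that was asked about and does not exist today.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Build:
    build_id: int  # assumed to increase with each new build
    branch: str


def to_stop_per_configuration(builds, keep):
    """Described behaviour: keep the newest `keep` builds overall, ignoring branch."""
    newest_first = sorted(builds, key=lambda b: b.build_id, reverse=True)
    return newest_first[keep:]


def to_stop_per_branch(builds, keep):
    """Hypothetical alternative: keep the newest `keep` builds of each branch."""
    by_branch = defaultdict(list)
    for build in builds:
        by_branch[build.branch].append(build)
    stopped = []
    for branch_builds in by_branch.values():
        branch_builds.sort(key=lambda b: b.build_id, reverse=True)
        stopped.extend(branch_builds[keep:])
    return stopped


builds = [Build(1, "main"), Build(2, "feature"), Build(3, "main"), Build(4, "main")]
print([b.build_id for b in to_stop_per_configuration(builds, keep=2)])
print([b.build_id for b in to_stop_per_branch(builds, keep=2)])
```

With one configuration and many branches, the per-configuration rule can stop the only pending build of a quiet branch, because busier branches fill the "last 10" slots.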

If this is happening then we would like to find out why. We use the promotion timeout feature on our own build server and it is working fine. This is why we asked if there were any errors listed on the Event Log page, and requested a diagnostics report, so we can work out the cause of the failure in your environment.

emailed requested info.

Hi Brenden,

We haven’t received any email from you. Did you send it to support at finalbuilder.com?

Hi Brenden,

Thank you for sending the diagnostics report. As you noted, there are no errors listed which are related to a failure with the stage promotion timeout.

Is it possible that the builds which are stuck on pending promotion were run before the stage promotion timeout was set? The configuration (including the stage details) is versioned, so each build takes a copy of the settings when it starts. When Continua CI checks whether a stage pending promotion should time out, it looks at the copy of the timeout settings from the time the build was run, not the current settings.
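The effect of the versioned settings can be shown with a toy model. This is a sketch under assumptions, not the actual Continua CI implementation: the class and field names are invented for illustration, and the only behaviour taken from the explanation above is that the check reads the settings copy stored with the build rather than the current configuration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class StageSettings:
    # None means the stage promotion timeout was not enabled.
    promotion_timeout_hours: Optional[int]


@dataclass
class Build:
    pending_since: datetime
    settings_snapshot: StageSettings  # copy taken when the build started


def should_time_out(build: Build, now: datetime) -> bool:
    """Decide using the build's own settings copy, never the current settings."""
    timeout = build.settings_snapshot.promotion_timeout_hours
    if timeout is None:
        return False
    return now - build.pending_since >= timedelta(hours=timeout)


now = datetime(2024, 1, 31)
# Started before the timeout was enabled: stays pending however long it waits.
old_build = Build(datetime(2024, 1, 1), StageSettings(None))
# Started after a 72 hour timeout was enabled and has waited 4 days.
new_build = Build(datetime(2024, 1, 27), StageSettings(72))
print(should_time_out(old_build, now), should_time_out(new_build, now))
```

In this model, enabling a timeout today changes nothing for builds that are already pending, which would match a backlog of old builds that never time out.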

We are now considering whether it makes sense to change this so that the timeout can be applied retrospectively.

I made the changes several months back and waited before posting this.

Something else that’s ‘bit’ us in the past, we started using CI very early on and our system has been in continuous use since then. In the past we’ve had some DB cruft that had caused weird behaviour, maybe this is another case.

I’ll try to schedule updating to LAG, we’re a bit behind at this point.