Crash starting build after service restart

fixed

(Simon Kennedy) #1

We had to take down our internal servers last Friday due to blackouts (the systems were on UPS). Unfortunately, since bringing them back up, starting a build in Continua crashes the Continua CI service (v1.8.1.899).

I see the following messages in the event log:

Application: Continua.Server.Service.exe
Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.
Exception Info: exception code c0000005, exception address 000007FF9F425522
Stack:
at System.Data.SqlServerCe.UnmanagedLibraryHelper+SafeLibrary_NativeMethods.LoadLibrary(System.String)
at System.Data.SqlServerCe.UnmanagedLibraryHelper+SafeLibrary_NativeMethods.LoadLibrary(System.String)
at System.Data.SqlServerCe.NativeMethodsHelper..ctor(System.String)
at System.Data.SqlServerCe.NativeMethods.LoadValidLibrary(System.String)
at System.Data.SqlServerCe.NativeMethods.LoadNativeBinariesFromPrivateFolder(System.String)
at System.Data.SqlServerCe.NativeMethods.LoadNativeBinaries()
at System.Data.SqlServerCe.SqlCeEngine..ctor(System.String)
at Continua.Modules.Builds.Logging.SqlCeLogSession.EnsureDatabase(System.String, System.String, Boolean)
at Continua.Modules.Builds.Logging.SqlCeLogSession.Open(Continua.Modules.Builds.Logging.LogAccessType)
at System.RuntimeMethodHandle.InvokeMethod(System.Object, System.Object[], System.Signature, Boolean)
at System.Reflection.RuntimeConstructorInfo.Invoke(System.Reflection.BindingFlags, System.Reflection.Binder, System.Object[], System.Globalization.CultureInfo)
at System.RuntimeType.CreateInstanceImpl(System.Reflection.BindingFlags, System.Reflection.Binder, System.Object[], System.Globalization.CultureInfo, System.Object[], System.Threading.StackCrawlMark ByRef)
at System.Activator.CreateInstance(System.Type, System.Reflection.BindingFlags, System.Reflection.Binder, System.Object[], System.Globalization.CultureInfo, System.Object[])
at System.Activator.CreateInstance(System.Type, System.Object[])
at Continua.Modules.Builds.Logging.LogSessionManager.GetAvailableSession[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]](System.String, Boolean)
at Continua.Modules.Builds.Logging.LogSessionManager.OpenSessionForWriting(System.String)
at Continua.Modules.Builds.Logging.BuildLogWriter.EndUnfinishedStages()
at Continua.Modules.Builds.Logging.BuildLogWriter.OnStageBegin(Continua.Modules.Builds.Stage)
at Continua.Modules.Builds.BuildRunner.OnExecutingStage(Continua.StateMachine.Transition`1<Continua.Modules.Builds.BuildRunnerState>)
at Continua.StateMachine.StateMachine`1[[Continua.Modules.Builds.BuildRunnerState, Continua.Modules.Builds, Version=1.8.1.899, Culture=neutral, PublicKeyToken=null]].Execute(Continua.StateMachine.Transition`1<Continua.Modules.Builds.BuildRunnerState>)
at Continua.Modules.Builds.BuildRunner.StartStage(System.Guid)
at Continua.Modules.Builds.BuildController.OnTaskExecute(System.Object)
at System.Threading.Tasks.Task.Execute()
at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
at System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef)
at System.Threading.Tasks.Task.ExecuteEntry(Boolean)
at Continua.Shared.Utils.Threading.LimitedConcurrencyLevelTaskScheduler.b__6_0(System.Object)
at System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()
at System.Threading.ThreadPoolWorkQueue.Dispatch()

I also see a bunch of PostgreSQL errors:

The description for Event ID 0 from source PostgreSQL cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

If the event originated on another computer, the display information had to be saved with the event.

_The following information was included with the event:_

LOG: unexpected EOF on client connection with an open transaction

And:

The description for Event ID 0 from source PostgreSQL cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

If the event originated on another computer, the display information had to be saved with the event.

_The following information was included with the event:_

LOG: could not receive data from client: An existing connection was forcibly closed by the remote host.

The configuration state seems OK (we can view configurations without any problem); it’s just that when we kick off a build, it dies almost immediately.

What steps should I take in this scenario?


(Vincent Parrett) #2

Hi Simon

I would start off by checking the server’s integrity, i.e. using sfc /scannow etc. Also make sure there is sufficient free disk space on the server.
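The free-space check can be scripted. Below is a minimal Python sketch using the standard library’s `shutil.disk_usage`. The 5 GB threshold is an illustrative assumption, not a Continua CI requirement.

```python
import shutil


def free_space_gb(path):
    """Return the free space, in GiB, on the volume that contains `path`."""
    return shutil.disk_usage(path).free / (1024 ** 3)


def check_free_space(path, minimum_gb=5.0):
    """Return (ok, free_gb) for the volume containing `path`.

    minimum_gb is an illustrative threshold (an assumption), not a
    documented Continua CI requirement.
    """
    free = free_space_gb(path)
    return free >= minimum_gb, free
```

On the server you would call it with the drive that holds the Continua CI data and share folders, e.g. `check_free_space("C:\\")`.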

If that all checks out, then run the Continua CI installer again (use the same version as is installed) to make sure the files are all there. I say this because the error occurs when trying to load the SQL CE DLLs we use for logging. Also check that no anti-virus software is removing files from the install.

Also, 1.8.1.899 is 7 months old, and there have been a number of updates since then. 1.9 was a significant improvement (new notifications, improved performance); bear in mind it’s x64 only and requires .NET 4.7.2.


(Simon Kennedy) #3

Thanks Vincent,

The scan reported nothing, so I’m re-installing 1.8.1.899 :crossed_fingers:

Do I need to install .NET 4.7.2 manually before upgrading?


(Vincent Parrett) #4

Yes, you will need to install .NET 4.7.2 manually; the installer will stop (the Next button is disabled) if it’s not installed.


(Simon Kennedy) #5

I’ve re-installed 1.8.1.899 (64-bit), which went OK. I managed to start a build, which completed, but then the service crashed with a different error:

System.OutOfMemoryException: Exception of type ‘System.OutOfMemoryException’ was thrown.
at System.Text.StringBuilder.ToString()
at Continua.Shared.Utils.ProcessRunner.HandleProcessOutput(Object data, IEnumerable`1 ioHandlers, Process process, StreamWriter standardInput, Object stdinLock, ILogger log, CancellationTokenSource cancellationTokenSource, Timer cancellationTimer, Boolean isErrorStream, Boolean requireOutput)
at Continua.Shared.Utils.ProcessRunner.<>c__DisplayClass10_3.b__1(Object data)
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.ThreadHelper.ThreadStart(Object obj)

I restarted the service and started another build for a different configuration, but the ‘Continua CI’ service is still crashing and I’m seeing similar PostgreSQL errors (the other .NET errors seem to be resolved).

As it only happens on builds, should I try resetting the repositories? Would this help?

Otherwise, should I try updating to the latest version, given its current state?


(Vincent Parrett) #6

Looking at the stack trace, it would seem that something is generating a huge amount of output. How much memory does your server have? Is it shared with other applications?
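The stack trace shows the crash inside `StringBuilder.ToString()` while handling process output. This suggests the output is being accumulated in memory. The Python sketch below does not show how Continua CI’s ProcessRunner works. It illustrates the general alternative: stream a child process’s output line by line, count the bytes, and keep only a bounded tail. It can also be used to measure how much output a suspect command actually produces.

```python
import collections
import subprocess
import sys


def run_and_tail(args, max_lines=1000):
    """Run a command, streaming its combined stdout/stderr.

    Only the last `max_lines` lines are kept in memory, so memory use stays
    bounded however much output the command produces.
    Returns (return_code, total_bytes, tail_lines).
    """
    tail = collections.deque(maxlen=max_lines)
    total_bytes = 0
    with subprocess.Popen(args, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT) as proc:
        for raw_line in proc.stdout:  # bytes, one line at a time
            total_bytes += len(raw_line)
            tail.append(raw_line.decode("utf-8", errors="replace").rstrip("\r\n"))
        return_code = proc.wait()
    return return_code, total_bytes, list(tail)


if __name__ == "__main__":
    # Demo: a child process prints 100,000 lines; only the last 5 are kept.
    code, size, last = run_and_tail(
        [sys.executable, "-c", "for i in range(100000): print(i)"], max_lines=5)
    print(code, size, last)
```

The trade-off is that you lose the full log. In exchange, peak memory is bounded by `max_lines` rather than by the size of the output.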

I can’t say for sure that updating to the latest version will fix that particular issue; I’m not seeing any commits in that area that deal with memory issues. That said, I think it’s still worth doing.


(Simon Kennedy) #7

It’s odd, considering that it’s the same version on the same hardware and these configurations have built fine before.

The ‘Continua.Server.Service.exe’ is consuming 1.5GB on startup but I’m not sure if that’s typical or not.


(Simon Kennedy) #8

I’m going to wait until it’s finished checking for repository updates; it seems to be taking a while. Maybe that’s related to the memory usage?

Update: The ‘checking for changes’ lasted about 30 minutes and then crashed with the same ‘OutOfMemoryException’. This worked fine before Friday.


(Vincent Parrett) #9

No, that’s not at all typical. Our production server and our main test server consume around 300MB (running the latest version), and they both have a lot of different repos and projects.

If this still occurs after upgrading then I guess we need to see a debug log and a diagnostics report (which you can get from the admin event log in Continua CI).

Enable Debug logging, restart the service, and wait until the CPU usage settles down. If memory usage is stable, stop the service and send us the log (zipped). If the log is too large to email (I think 20MB is the max), put it somewhere and send us the link.
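Zipping the log and checking it against the mail limit can be scripted. Below is a minimal Python sketch using the standard `zipfile` module. The thread doesn’t say where the debug log is written, so the paths are placeholders. The 20MB figure is only the estimate given above.

```python
import os
import zipfile

# Estimate from the thread ("I think 20MB is the max"); treat as approximate.
EMAIL_LIMIT_BYTES = 20 * 1024 * 1024


def zip_log(log_path, zip_path=None):
    """Compress one log file with DEFLATE; return (zip_path, compressed_size)."""
    zip_path = zip_path or log_path + ".zip"
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # Store the file under its base name rather than its full path.
        archive.write(log_path, arcname=os.path.basename(log_path))
    return zip_path, os.path.getsize(zip_path)


def fits_in_email(size_bytes, limit=EMAIL_LIMIT_BYTES):
    """True if the archive is within the (approximate) mail size limit."""
    return size_bytes <= limit
```

If `fits_in_email` returns False, upload the archive somewhere and send the link instead.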

If that doesn’t tell us anything, then we’ll need to talk you through creating a dump that we can analyse here with WinDbg.


(Vincent Parrett) #10

Yes, it might be related. What repository types are they?


(Vincent Parrett) #11

Was any other software updated (version control clients?) on the server?


(Simon Kennedy) #12

These are all Subversion repositories. We have two:

  • trunk
  • branches - we filter out specific sub-directories, as including everything has been resource intensive in the past

NOTE: I have not upgraded to the latest version yet. Perhaps I should just go ahead?


(Simon Kennedy) #13

We have not updated any software on the machine recently.

The machine does have Slik Subversion 1.8.10 (x64) installed, but it has had it since 2014.


(Vincent Parrett) #14

OK, I’ve just been talking to Dave (who has the day off), and we suspect it may be a corrupt database.

I’m just looking for the instructions for reindexing etc.; in the meantime, I would recommend stopping the Continua CI server service.


(Vincent Parrett) #15

OK, here are the instructions for working on PostgreSQL.

First, stop the PostgreSQL service and take a file-level backup of the database, which typically lives in

C:\ProgramData\VSoft\ContinuaCI\PostgreSQLDB

Then restart the PostgreSQL service.
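The stop/backup/restart sequence above can be scripted. Below is a minimal Python sketch. It uses Windows’ built-in `net stop` / `net start` commands and `shutil.copytree`. The thread does not give the PostgreSQL service name or a backup location, so both are placeholders you would need to fill in.

```python
import datetime
import os
import shutil
import subprocess


def stop_service(name):
    """Stop a Windows service with the built-in `net stop` command (Windows only)."""
    subprocess.run(["net", "stop", name], check=True)


def start_service(name):
    """Start a Windows service with the built-in `net start` command (Windows only)."""
    subprocess.run(["net", "start", name], check=True)


def backup_data_dir(src, backup_root):
    """Copy `src` into a new timestamped folder under `backup_root`; return its path."""
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    name = os.path.basename(os.path.normpath(src))
    dest = os.path.join(backup_root, f"{name}-{stamp}")
    # copytree refuses to write into an existing folder, so an older backup
    # is never overwritten; missing parent folders are created.
    shutil.copytree(src, dest)
    return dest
```

Usage on the server, with a hypothetical service name and backup folder: `stop_service("<PostgreSQL service name>")`, then `backup_data_dir(r"C:\ProgramData\VSoft\ContinuaCI\PostgreSQLDB", r"D:\Backups")`, then `start_service("<PostgreSQL service name>")`.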

You can access the database using

%ProgramFiles%\VSoft Technologies\ContinuaCI\Server\PostgreSQL\bin\pgAdmin3.exe. 

You can find the connection details in

%ProgramFiles%\VSoft Technologies\ContinuaCI\Server\Continua.Server.Service.exe.config under configuration -> hibernate-configuration -> session-factory -> property name="connection.connection_string".
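The connection string can also be read programmatically by following the element path above. Below is a minimal Python sketch using `xml.etree.ElementTree`. It assumes the `hibernate-configuration` element may carry an XML namespace (NHibernate configs usually declare one), so namespaces are ignored when matching tag names. The sample connection string in the usage is made up.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Drop the '{namespace}' prefix ElementTree adds to namespaced tag names."""
    return tag.rsplit("}", 1)[-1]


def connection_string_from_root(root):
    """Follow configuration -> hibernate-configuration -> session-factory -> property."""
    for hib in root:
        if _local(hib.tag) != "hibernate-configuration":
            continue
        for factory in hib:
            if _local(factory.tag) != "session-factory":
                continue
            for prop in factory:
                if (_local(prop.tag) == "property"
                        and prop.get("name") == "connection.connection_string"):
                    return (prop.text or "").strip()
    return None


def connection_string_from_file(config_path):
    """Parse the .exe.config file and return the connection string, or None."""
    return connection_string_from_root(ET.parse(config_path).getroot())


def parse_connection_string(conn):
    """Split 'Key=Value;Key=Value' into a dict with lower-cased keys."""
    pairs = {}
    for item in conn.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            pairs[key.strip().lower()] = value.strip()
    return pairs
```

On Windows, `os.path.expandvars` will expand `%ProgramFiles%` in the path before it is passed to `connection_string_from_file`.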

In pgAdmin you will need to connect to localhost, port 9001.

Once connected, navigate to

Servers -> localhost -> Databases -> Continua CI

Right click on Continua CI, then click on Maintenance.

Select Vacuum, check Analyze (and Verbose messages), and click OK. This can take a while depending on how big your database is (it took 5 minutes on our main server). Once that’s done, scan through the output looking for errors. If no errors are reported, you are probably OK. If there are any errors about corrupted indexes, then the next step is to perform a REINDEX, which will rebuild all indexes.
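The same maintenance can probably be run from a command prompt with `psql`. This assumes the bundled PostgreSQL `bin` folder also contains `psql.exe`; the thread only mentions `pgAdmin3.exe`. Below is a minimal Python sketch. The user and database names are assumptions to be taken from the connection string. The password can be typed at the prompt or supplied through the standard `PGPASSWORD` environment variable.

```python
import subprocess

# Command-line equivalent of pgAdmin's Vacuum with Analyze and Verbose ticked.
VACUUM_SQL = "VACUUM (VERBOSE, ANALYZE);"


def reindex_sql(dbname):
    """REINDEX DATABASE must name the database you are connected to.

    The name is quoted because it may contain spaces.
    """
    return 'REINDEX DATABASE "{}";'.format(dbname.replace('"', '""'))


def psql_args(sql, dbname, user, host="localhost", port=9001, psql="psql"):
    """Build a psql command line; port 9001 is from the thread, user/dbname are assumptions."""
    return [psql, "-h", host, "-p", str(port), "-U", user, "-d", dbname, "-c", sql]


def run_psql(args):
    """Run psql and return (return_code, combined output).

    VERBOSE progress (INFO lines) goes to stderr, so both streams are combined.
    """
    result = subprocess.run(args, capture_output=True, text=True)
    return result.returncode, result.stdout + result.stderr


def problem_lines(output):
    """Return the ERROR and WARNING lines from psql output for a closer look."""
    return [line for line in output.splitlines()
            if line.lstrip().startswith(("ERROR:", "WARNING:"))]
```

Each `-c` string holds a single statement on purpose, because VACUUM cannot run inside a transaction block. Following the order described above, run the vacuum first. Only move on to `reindex_sql(...)` if `problem_lines` reports index errors.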

If no db issues are found then I would suggest running the upgrade.


(Vincent Parrett) #16

I’ve been looking through the version history for subversion related changes, and there have been a few so definitely worth upgrading.


(Simon Kennedy) #17

Unfortunately, even after upgrading to 1.9.0.300 the same behavior occurs, and it still crashes with the ‘OutOfMemoryException’ after about 30 minutes.

I tried resetting the repository, but the behavior was the same.

Should I try cloning the repository? It sounds like it should never use this much memory, is that right?

One additional thing I noticed in the logs is an error that occurred halfway through:

An error occurred while listing working folder ‘C:\Users\Public\ContinuaShare\Rc\91ad9960’ and relative path ‘/newlook/branches/’. Message: Error running SVN : Running C:\Program Files\SlikSvn\bin\svn.exe from svn process with arguments “list http://looksubversion/svn/looksoftware/newlook/branches/ --non-interactive --no-auth-cache --username LOOKNET\Finalbuilder ********************* -R” failed with return code -1 and error output: “”. Standard output="1.0-server/
1.0-server/Documentation/
1.0-server/Documentation/BK2005002 - Server configuration.doc
1.0-server/Documentation/MK08072004 - Server Dependency Hierarchy.doc
1.0-server/Documentation/SK09072004 - High Level Architecture.sxd
1.0-server/Documentation/Specification Template.doc
1.0-server/Documentation/Specifications.doc
1.0-server/Documentation/looksoftware development manual.doc
etc

I was surprised this was using SlikSvn, as I assumed Continua CI used its own internal client. I guess I should update that install?


(Vincent Parrett) #18

Continua CI doesn’t ship with version control clients (other than mercurial, which is used internally) so it uses the property collectors to find the clients installed on the machine.

It’s odd that it’s failing after 30 minutes; the default timeout is 60 minutes. It’s also odd that it is taking that long. Is this a particularly large repo? When the repo is reset, the only way in Subversion to detect branches is to literally list the contents of the repo (there’s no “list branches” client command).
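To illustrate why a full recursive listing is expensive for branch detection: only the entries at the branch level matter, and everything deeper is read and then discarded. Below is a minimal Python sketch. It assumes one branch per top-level directory under `branches/`; the thread doesn’t say how Continua CI actually matches branches. It relies on `svn list` marking directories with a trailing `/`, as in the log output above.

```python
def branch_candidates(listing_lines, depth=1):
    """Return directory entries exactly `depth` levels deep in `svn list -R` output.

    '1.0-server/' is a top-level directory; '1.0-server/Documentation/' is one
    level deeper; entries without a trailing '/' are files.
    """
    found = []
    for line in listing_lines:
        entry = line.rstrip("\r\n")
        if entry.endswith("/") and entry.count("/") == depth:
            found.append(entry.rstrip("/"))
    return found
```

Given the listing from the log, only `1.0-server` survives at depth 1. Every document path beneath it was still transferred and held in memory.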

It would certainly be worth running the command from the command line and seeing what happens.

Also, did you check disk space on the drive where the continua share lives?


(Simon Kennedy) #19

It’s failing after 30 minutes because it’s running out of memory. The server itself only has 2GB allocated, but we weren’t expecting it to ever require that much.

When I execute the same command on the command line it lists the branches as expected; it does NOT list all the sub-directories and files as it does so above.

Oops, I left out ‘-R’. When you add that it does recurse all sub-directories.

The server has ~30GB free.


(Simon Kennedy) #20

It seems odd that it would have to recurse the entire tree, as the repository has a filter set to (try to) avoid this. Is this a Subversion limitation?
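On the Subversion side, `svn list` accepts a `--depth` option, and `-R` is equivalent to `--depth infinity`. A client could therefore list one level at a time and skip excluded directories before descending into them. The Python sketch below shows that pruning approach. It is not how Continua CI implements its repository filter. The `include` predicate and `max_depth` are hypothetical stand-ins for a filter. The trade-off is one `svn list` round trip per visited directory instead of one large response.

```python
import subprocess


def svn_list_immediates(url, svn="svn"):
    """List the direct children of `url` (plain `svn list` behaviour, made explicit)."""
    result = subprocess.run(
        [svn, "list", "--depth", "immediates", "--non-interactive", url],
        capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def walk_filtered(url, include, max_depth, lister=svn_list_immediates, prefix=""):
    """Yield directory paths under `url` that pass `include`, pruning as it goes.

    A directory rejected by `include` is never listed, so nothing beneath it
    is fetched, unlike a single `svn list -R` that returns the whole tree.
    """
    if max_depth <= 0:
        return
    for entry in lister(url.rstrip("/") + "/" + prefix):
        if not entry.endswith("/"):
            continue  # files are never branches
        path = prefix + entry
        if include(path):
            yield path
            yield from walk_filtered(url, include, max_depth - 1, lister, path)
```

Passing a fake `lister` makes the walk easy to try without a server, e.g. one that returns entries from a dictionary keyed by relative path.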