MS SQL Server, RCSI and Continua database

bruce_at_ds · January 9, 2022, 10:04pm

Hi,

We are using Continua with the database running on MS SQL Server 2019 (hosted in an availability group running on AWS).

Last week we upgraded to Continua 1.9.2.771 (from 729).

Over the weekend I received the following communication from our database administrator.

The new Continua version seems to have what is called a “rotating workload” which is a transaction that never closes by design and is constantly updating rows. This caused tempdb to fill up and throw alerts because Continua uses RCSI. I have turned off for Continua to mitigate future problems with tempdb so see how it goes with blocking/timeout errors on builds. The session had to be killed to recover the tempdb space prior to changing the RCSI state so you may have an error or process to restart.

Can you please confirm if our database admin’s analysis is correct, and also if there is any issues with turning off RCSI for the Continua database?

Thanks,
Bruce Palmer

Vincent · January 9, 2022, 10:25pm

Hi Bruce

Continua CI uses transactions, even for selects. This is a best practice when using NHibernate (the ORM that we use). Every request, or background task creates a session, which opens a transaction, does it’s selects etc and then the transactions are either committed or rolled back before the session is disposed. I will have to check with the team lead (who is away today) on this but there should not be any long running transactions - if there are then that is something we need to look at (likely unintentional).

Continua CI does use READ_COMMITTED_SNAPSHOT isolation, this was introduced in 1.9.2.554 (so well before .729) - the release note for this says

We have not had any other reports of tempdb issues.

The application is coded with the RSCI in mind - so turning it off may or may not cause problems.

bruce_at_ds · January 9, 2022, 10:36pm

Hi Vincent,

Thanks for your reply.

The following screen shot from our database admin may help you with your investigation.

The query text is:

(@p0 uniqueidentifier)select sharedreso0_.Id as id1_19_, sharedreso0_.Label as label2_19_, sharedreso0_.Quota as quota3_19_, sharedreso0_.ForAgent as foragent4_19_, sharedreso0_.sharedresourceid as sharedresourceid5_19_ from core_sharedresourceitem sharedreso0_ where sharedreso0_.Id in (@p0)

Vincent · January 9, 2022, 11:05pm

Hi Bruce

Thanks, that is useful information, we’ll look into it.

Vincent · January 9, 2022, 11:22pm

Hi Bruce

I believe we have identified the cause of the issue, hopefully we should be able to come up with a fix for this in the next few days.

Sparky · January 11, 2022, 6:15am

Hi Bruce,

Version 1.9.2.781 ensures that the transaction does not remain open for the query you identified. Please let us know if you find any other database issues.

bruce_at_ds · January 11, 2022, 9:35pm

Hi Dave,

Thanks for the fast action on this.

I’ve installed the update this morning, and asked our database admin to re-enable RCSI on the Continua database.

Will let you know how it goes in a day or two.

bruce_at_ds · January 19, 2022, 12:10am

Hi Dave,

Our database admin is satisfied the issue is now resolved.

Thanks again for the fast turn around on the fix.

bruce_at_ds · June 7, 2022, 5:14am

Hi Dave,

Our database admin may have discovered another RCS issue:

I think their RCS bug is back. I’m going to need to kill the session stuck and remove RCS on that DB

You can send them this code:
(@p0 nvarchar(128),@p1 nvarchar(128))select serverprop0_.Id as id1_18_, serverprop0_.Namespace as namespace2_18_, serverprop0_.Name as name3_18_, serverprop0_.Value as value4_18_, serverprop0_.CollectedValue as collectedvalue5_18_, serverprop0_.ElementType as elementtype6_18_, serverprop0_.IsUserSupplied as isusersupplied7_18_, serverprop0_.ReadOnly as readonly8_18_, serverprop0_.Description as description9_18_, serverprop0_.CollectorId as collectorid10_18_ from core_serverproperty serverprop0_ where serverprop0_.Namespace=@p0 and serverprop0_.Name=@p1 ORDER BY CURRENT_TIMESTAMP OFFSET 0 ROWS FETCH FIRST 1 ROWS ONLY

this code is not completing or not committed

We are using Continua 1.9.2.902 (updated this morning) on MS SQL Server 2019.

Can you look into this please?

Thanks.

Sparky · June 7, 2022, 6:41am

Hi Bruce,

Thank you for sending the SQL query, we’ve found a possible cause but need to investigate further. Hopefully we’ll be able to provide a fix in the next day or two.

bruce_at_ds · June 7, 2022, 11:25pm

Hi Dave,

Thanks for looking at this so quickly.

Our database admin is concern that this problem has reoccurred. In his words:

Yeah, maybe tell them its twice now (I hope not the same code) and it can literally take down an entire server as tempdb saturates all space. Luckily we monitor a lot of resources and can see it ahead of time but many companies would be blind to it, and few companies probably have massive tempdbs like we do.

I’m glad we’ve been able to help you zero in on the cause. Looking forward to getting the fix.

Cheers.

Sparky · June 8, 2022, 4:38am

Hi Bruce,

We have uploaded a hotfix version 1.9.2.904. Despite the two build number increments, this version includes only one change over v1.9.2.902 to hopefully fix this issue.

We were not able to detect any log running transactions in v1.9.2.902 either using the Activity Monitor or with SQL queries against sys.dm_exec_query_stats, but while reviewing the code found that we had inadvertently referenced a database session in a separate thread. Perhaps your database admin could share the SQL query that he uses to detect this issue?

Note that you do not need to upgrade the agents

bruce_at_ds · June 8, 2022, 6:01am

Hi Dave,

Thanks for the hotfix. I will install it tomorrow morning.

I’ve also passed your question on to our database admin. I will let you know of any reply.

Cheers.

bruce_at_ds · June 8, 2022, 9:54pm

Hi Dave,

Our database admin indicated he used this SQL:

SELECT * FROM sys.dm_tran_active_snapshot_database_transactions

Cheers.

bruce_at_ds · June 8, 2022, 11:20pm

Hi Dave,

We’ve installed the hotfix and reinstated RCSI on the database. Monitoring to see if the issue reoccurs.

Cheers.

bruce_at_ds · June 9, 2022, 5:12am

Hi Dave,

Our database admin has given the all clear. The hotfix has fixed the the issue.

Thanks again for responding so quickly.

Cheers.

Sparky · June 9, 2022, 6:32am

It’s good to hear that it work. Thanks for sending the query to the sys.dm_tran_active_snapshot_database_transactions table - it does show up the stuck transaction id for us in v1.9.2.902.

MS SQL Server, RCSI and Continua database

Products

Support

Resources

Company