Friday, December 20, 2019

Recovering from Severe Database corruption error 9100

In my previous post I outlined very dangerous SQL Server problem, caused by usually not very harmful commands "PARSE" and "TRY_PARSE": Having "NaN" value for REAL and FLOAT producing severe error.

This post will be about a major issue it causes and on how to fight it.

Will start from generating the problem.

Attention: Do not run that in production!!!
Here is a script to generate the issue:
DROP TABLE IF EXISTS tbl_Real_Test;
GO
CREATE TABLE tbl_Real_Test (ID UNIQUEIDENTIFIER, R REAL, F FLOAT);
GO
INSERT INTO tbl_Real_Test (ID, R, F)
SELECT NewID(), TRY_PARSE(r as REAL), TRY_PARSE(f as FLOAT) 
FROM (VALUES
       ('0','434534776544')
       ,('0.0000323', 'NaN')
       ,('3.3881317890172E+17','')
       ,('','000000000000x000000000000')
       ,('NULL','ABCDEFG')
       ,('Null','--------------------')
       ,('null','!!!!!!!!!!!!!!!!!!!!')
       ,('ABC','345435467655665676545')
       ,('NaN','5.3881317890172E+47')
       ,('Nan','0.0000000000345')
       ,('nan','434.34543543')
) as R(r,f);
GO
SELECT * FROM tbl_Real_Test
WHERE R Is not Null ;

GO

That script will generate following error:
Msg 9100, Level 23, State 2, Line 57
Possible index corruption detected. Run DBCC CHECKDB.

Troubleshooting:

You might try to run "DBCC CHECKDB" against that database like this:
DBCC CHECKDB('Slava_Temp_Test');

DBCC CHECKDB resulted couple of errors and besides of that it made an error Dump into an ERRORLOG file and created four other DUMP files:
Msg 2570, Level 16, State 3, Line 62
Page (1:16200), slot 1 in object ID 2002106173, index ID 0, partition ID 72057594042843136, alloc unit ID 72057594048479232 (type "In-row data"). Column "F" value is out of range for data type "float".  Update column to a legal value.
Msg 2570, Level 16, State 3, Line 62
Page (1:16200), slot 8 in object ID 2002106173, index ID 0, partition ID 72057594042843136, alloc unit ID 72057594048479232 (type "In-row data"). Column "R" value is out of range for data type "real".  Update column to a legal value.
DBCC results for 'tbl_Real_Test'.
There are 9 rows in 1 pages for object "tbl_Real_Test".

CHECKDB found 0 allocation errors and 2 consistency errors in table 'tbl_Real_Test' (object ID 2002106173).

Fixing attempt:

To fix the issue you might try following commands
DBCC CHECKDB ('Slava_Temp_Test' , REPAIR_REBUILD) WITH ALL_ERRORMSGS;
GO
DBCC CHECKDB ('Slava_Temp_Test' , REPAIR_ALLOW_DATA_LOSS) WITH ALL_ERRORMSGS;

GO

However, in order to perform the fix DBCC CHECKDB requires the database to be in a single user mode:
Msg 7919, Level 16, State 3, Line 141
Repair statement not processed. Database needs to be in single user mode.

The problem is that it might be not possible to switch database into a single user mode, because it is part of an Availability Group and experiencing very heavy user activity. The use of conventional SQL Server tool would cause disruption in Production for who knows for how much time.

Solution:

To solve that problem we do not have to switch database in single mode or restore it from a previous backup to eliminate the issue. You just run an update script over your problematic table and columns to replace bogus values, which SQL Server can't interpret correctly, by NULLs:
UPDATE tbl_Real_Test
SET R = CASE CAST(R as BINARY(4))
       WHEN CAST( PARSE('NaN' as REAL) as BINARY(4))
       THEN Null Else R END
       , F = CASE CAST(F as BINARY(8))
       WHEN CAST( PARSE('NaN' as FLOAT) as BINARY(8))
       THEN Null Else F END;
GO
SELECT * FROM tbl_Real_Test
WHERE R Is not Null OR F Is not Null;

GO

The result of this query should be like this:

Disclaimer and Advertisement:

There are probably a lot of other cases, which might cause "Database corruption error 9100" and if that particular solution does not apply to your environment you can contact me for the resolution.


No comments:

Post a Comment