Wednesday, July 17, 2019

SSIS Data Profiling Bug uncovered by SQL Server glitch

I was doing Data Profiling in my environment using SSIS and hit very unusual error:
SSIS package "Package 2.dtsx" starting.
Error: 0x2 at Data Profiling Task, Data Profiling Task: Error occurred when profiling the data. The error message is: System.ArgumentOutOfRangeException: Specified argument was out of the range of valid values.
Parameter name: value
   at Microsoft.DataDebugger.DataProfiling.CandidateKeyProfile.set_KeyStrength(Single value)
   at Microsoft.DataDebugger.DataProfiling.KeyProfileGroupingSetTask.GenerateKeyProfile(TableQName parentTable, String countColumnName, Double keyStrength, Boolean isExactKey)
   at Microsoft.DataDebugger.DataProfiling.KeyProfileGroupingSetTask.PostProcessGroupBy(TableQName parentTable, String countColumnName, Double keyStrength, Boolean isExactKey)
   at Microsoft.DataDebugger.DataProfiling.KeyProfileGroupingSetTask.ComputeProfile(Boolean isGroupByQueryRun, TableQName parentTable, String countColumnName, Boolean isParentTempTable, Int64 parentRowCount)
   at Microsoft.DataDebugger.DataProfiling.GroupingSetWorkItem.Visit(GroupingPlanTreeNode node, TableQName parentTable, String countColumnName, Boolean isParentTempTable, Int64 parentRowCount)
   at Microsoft.DataDebugger.DataProfiling.GroupingSetWorkItem.ExecutePlan(List`1 plan)
   at Microsoft.DataDebugger.DataProfiling.GroupingSetWorkItem.DoWork()
   at Microsoft.DataDebugger.DataProfiling.TablePartitionedBatch.ComputeProfiles()
   at Microsoft.DataDebugger.DataProfiling.BuiltInProfiler.Profile().
Task failed: Data Profiling Task
SSIS package "Package 2.dtsx" finished: Success.

As you can see, there is no actual problem description or any failed values.
Here are the circumstances under which I could reproduce that error:
1. I have only "Data profiling task" in my SSIS package.
2. Chosen only "Column Value Distribution Profile" and "Candidate Key Profile" for problematic table.
3. Left all other parameters as Defaults and execute the package.

That produced the error.

I've tried following to reproduce it but it hasn't worked:
1. Single research on "Column Value Distribution Profile" or "Candidate Key Profile" - single items have not produced errors, only that pair together.
2. Copying that table another database has not produced the error.
3. When I've tried to specify single columns, not all columns returned the error, but only the one, which is unique.

I've found similar question on Microsoft forum and even posted my question and research there, but obviously, nobody could reproduce that error too. (https://social.msdn.microsoft.com/Forums/sqlserver/en-US/d0a8716b-54ae-4298-a03c-b9374ee1e0dc/systemargumentoutofrangeexception-error?forum=sqlintegrationservices)

So, then I've tried to look deeper at the data. I've noticed that when I run following queries I get different result:
EXEC sp_spaceused '[dbo].[ProgramLocking_JN]';
GO
SELECT COUNT(*)
FROM [dbo].[ProgramLocking_JN]
GO

As you can see, "sp_spaceused" procedure returns 336 rows, while in reality table has 337 rows.

DBCC CHECKDB returned no errors and following info on the table.
DBCC results for 'ProgramLocking_JN'.
There are 337 rows in 9 pages for object "ProgramLocking_JN".

Then I've tried to use DBCC UPDATEUSAGE as following:
DBCC UPDATEUSAGE(0,'ProgramLocking_JN') WITH COUNT_ROWS;
And got following result:
DBCC UPDATEUSAGE: Usage counts updated for table 'ProgramLocking_JN' (index 'ProgramLocking_JN', partition 1):
        ROWS count: changed from (336) to (337) rows.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.

After that little fix the SSIS error has gone.

Lesson learned:
Before running SSIS Data Profiling it is nice to run "DBCC UPDATEUSAGE(0);" command against all databases you are doing your research.

No comments:

Post a Comment