Tuesday, July 23, 2019

Capturing FileGrowth Event using SQL Agent Job

Sometimes unpredictable file growth becomes a real pain for a DBA, and you want answers to the questions "What happened?" and "When did it happen?"
You might capture it using Extended Events, but I want to talk about an old-fashioned way of capturing these kinds of events: via SQL Agent Alerts and Jobs.
As always, we do it in steps. Just a few easy steps.

Step 1.

To capture file growth events, you first need to create a log table "tbl_FileGrowthEvents" in your "DBA_Util" database:
USE [DBA_Util];
GO
DROP TABLE IF EXISTS tbl_FileGrowthEvents;
GO
CREATE TABLE tbl_FileGrowthEvents
(
FileGrowthEvent_ID INT IDENTITY(1,1)
, FileGrowthEvent_DT DATETIME NOT NULL
CONSTRAINT Def_FileGrowthEvents_FileGrowthEvent_DT DEFAULT GetDate()
, FileType CHAR(3) NOT NULL 
, DatabaseName SYSNAME NOT NULL
, [FileName] SYSNAME NOT NULL
, FileGrowth_XML XML NOT NULL
);
GO

Step 2.

You need to create a job named "Alert_File_Growth_Logging". Make sure to specify "sa" or another system account with appropriate access as the owner of that job.


Create a single step in that job called "Logging Event" and paste the following script into it:
INSERT INTO DBA_Util.dbo.tbl_FileGrowthEvents(
FileType, DatabaseName, [FileName], FileGrowth_XML)
SELECT LEFT([type_desc],3), DB_NAME(database_id), [name]
, '<FileGrowth_Event>
<FileGrowth_Event_Type>' + [type_desc] + '</FileGrowth_Event_Type>
<Post_Time>' + IsNull(CAST($(ESCAPE_NONE(WMI(PostTime))) as VARCHAR),'') + '</Post_Time>
<Duration>' + IsNull(CAST($(ESCAPE_NONE(WMI(Duration))) as VARCHAR),'') + '</Duration>
<Database_Name>' + DB_NAME(database_id) + '</Database_Name>
<File_Name>' + [name] + '</File_Name>
<Physical_Name>' + IsNull(physical_name,'') + '</Physical_Name>
<File_Size_Mb>' + IsNull(CAST([size] / 128 as VARCHAR),'') + '</File_Size_Mb>
<NT_Domain_Name>' + IsNull('$(ESCAPE_SQUOTE(WMI(NTDomainName)))','') + '</NT_Domain_Name>
<Login_Name>' + IsNull('$(ESCAPE_SQUOTE(WMI(LoginName)))','') + '</Login_Name>
<Session_Login_Name>' + IsNull('$(ESCAPE_SQUOTE(WMI(SessionLoginName)))','') + '</Session_Login_Name>
<Host_Name>' + IsNull('$(ESCAPE_SQUOTE(WMI(HostName)))','') + '</Host_Name>
<Application_Name>' + IsNull('$(ESCAPE_SQUOTE(WMI(ApplicationName)))','') + '</Application_Name>
</FileGrowth_Event>'
FROM master.sys.master_files
WHERE [name] = N'$(ESCAPE_SQUOTE(WMI(FileName)))' 
and database_id = $(ESCAPE_SQUOTE(WMI(DatabaseID)));
You should get something like this:

Step 3.

Now you need to create two alerts: "Alert Data file auto growth" and "Alert Log file auto growth".
In both of them, specify the following:
Type: "WMI event Alert"
Namespace: "\\.\root\Microsoft\SqlServer\ServerEvents\MSSQLSERVER"
Here "MSSQLSERVER" is the instance name of your SQL Server; for a named instance, use its name instead.
Query for Data File Alert: SELECT * FROM DATA_FILE_AUTO_GROW
Query for Log File Alert: SELECT * FROM LOG_FILE_AUTO_GROW
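By the way, the alert queries are WQL, not T-SQL, so they can be narrowed with a WHERE clause if you only care about particular databases (a sketch; "MyImportantDB" is a placeholder name, and DatabaseName is a property of the event class):

```sql
-- WQL (not T-SQL): the alert fires only for the named database
SELECT * FROM DATA_FILE_AUTO_GROW WHERE DatabaseName = 'MyImportantDB'
```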
You should get this:
On the "Response" page, check "Execute job" and select the previously created job "Alert_File_Growth_Logging" from the list:
On the "Options" page, you can increase "Delay between responses" to one minute.
After saving both Alerts you are ready to test.

Step 4.

If you do not have it set already, check the box "Replace tokens for all job responses to alerts" under SQL Server Agent -> Properties -> Alert System -> Token replacement.


Test:

Create a test database and run the following script against it (just make sure the database name and file names match yours):
use SLAVA_TEST;
GO
SET NOCOUNT ON
GO
DROP TABLE IF EXISTS tbl_Copy_Messages;
GO
DBCC SHRINKFILE ('Slava_Test_Data') WITH NO_INFOMSGS;
GO
DBCC SHRINKFILE ('Slava_Test_Log') WITH NO_INFOMSGS;
GO
SELECT * INTO tbl_Copy_Messages FROM sys.messages;
GO
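After the load completes, you can peek at the newly logged events with a simple query against the table from Step 1 (newest first):

```sql
SELECT FileGrowthEvent_ID, FileGrowthEvent_DT, FileType, DatabaseName, [FileName]
FROM DBA_Util.dbo.tbl_FileGrowthEvents
ORDER BY FileGrowthEvent_DT DESC;
```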

If you've done all of the building steps without errors, the contents of your "tbl_FileGrowthEvents" table should look like this:

From that point you can tell which file has grown and when, so you can correlate it to your internal processes.
If you need deeper information, you can click on the XML content and get more details associated with the transaction that caused the file growth event:
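For instance, the individual values can be pulled out of the stored XML with XQuery (a sketch using the element names from the job step above):

```sql
SELECT FileGrowthEvent_DT
     , DatabaseName
     , [FileName]
     , FileGrowth_XML.value('(/FileGrowth_Event/Login_Name)[1]', 'NVARCHAR(128)') AS LoginName
     , FileGrowth_XML.value('(/FileGrowth_Event/Host_Name)[1]', 'NVARCHAR(128)') AS HostName
     , FileGrowth_XML.value('(/FileGrowth_Event/Application_Name)[1]', 'NVARCHAR(256)') AS ApplicationName
FROM DBA_Util.dbo.tbl_FileGrowthEvents;
```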


Friday, July 19, 2019

SSIS Data Profiling of Multiple data sets in multiple databases

I assume you hit my post because you already know how to do SSIS Data Profiling for a single table, but now you want to know how to do it for all tables in your database, or even more: profile ALL data in ALL databases on your server, and maybe profile the whole data universe of your SQL environment.

If you just need to profile several of your tables, then there is a pretty good old solution on the Internet: http://agilebi.com/jwelch/2008/03/11/using-the-data-profiling-task-to-profile-all-the-tables-in-a-database/
In this post I'll go further, using a "step-by-step" methodology.

Step 1. Preparing variables.

After you create a new SSIS package, set the necessary variables first.
Most of the variables are regular strings, some are calculated by expressions, and one is an object:

Strings: 

"SQLServer" - Name of your target SQL Server - has to be set;
"SourceDatabase" - Name of a currently targeted Database;
"ResultPath" - Folder, where all XML results will be placed. In my Example that is:
                "C:\Projects\DataProfiling\";
"ProfileXMLHeader" - XML Header of Data Profiling Request - must be set;
"ProfileXMLFooter" - XML Footer of Data Profiling Request - must be set;
"ProfileXMLBody" - XML Body of Current Data Profiling Request;

Calculated (by expressions):

"ResultFolder" - Folder, which will be created for your SQL Server instance. The formula is:
                "@[User::ResultPath] +  @[User::SQLServer]"
"ResultFile" - File name, which will be created for an individual database. The formula is:
                "@[User::ResultFolder] + "\\" +  @[User::SourceDatabase] + ".xml""
"ProfileXML" - Current XML Data Profiling Request. The formula is:
                "@[User::ProfileXMLHeader] +  @[User::ProfileXMLBody] +  @[User::ProfileXMLFooter]"

Object:

"DatabaseList" - A collection of database names on your server to loop through;

Variable settings:

Besides "SQLServer" and "ResultPath", you have to set up the Header and Footer of your XML request. For Visual Studio version 16.0.1 these should be the following values, but you can extract them from your own requests:
"ProfileXMLHeader"
<?xml version="1.0" encoding="utf-16"?>
<DataProfile xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://schemas.microsoft.com/sqlserver/2008/DataDebugger/">
  <ProfileVersion>1.0</ProfileVersion>
  <DataSources />
  <DataProfileInput>
    <ProfileMode>Exact</ProfileMode>
    <Timeout>0</Timeout>
    <Requests>
"ProfileXMLFooter"
    </Requests>
  </DataProfileInput>
  <DataProfileOutput>
    <Profiles />
  </DataProfileOutput>
</DataProfile>
As you can see, the Body of your XML Data Profiling Request will be placed between the "<Requests>" and "</Requests>" tags.

As a result, your variables' settings should look like this:

Step 2. Preparing Connections

You will need only two connections in that project: to your SQL Server and to an output file.
The first connection is "SearchedDataSource" - a SQL Server connection:

After you create the connection, go to its properties and set the Expressions as follows:
"InitialCatalog" = @[User::SourceDatabase]
"ServerName" = @[User::SQLServer]
Setting "ServerName" as a variable gives you the freedom to point to different servers and possibly loop through a list of them if you wish.

The second connection is "ResearchResult" - a File connection:
 
For that connection you also set an Expression in its properties:
"ConnectionString" = @[User::ResultFile]

Step 3. Building SSIS Package

Here is what you are going to build:

"Create Folder for SQL Server" - a File System Task that creates a new folder for your data files:

"Get List of databases" - an "Execute SQL Task" that fills the "DatabaseList" object with the list of databases to process.
The most important Points are:
- Connection type: ADO.NET
- ResultSet: Full result set
- Result Set: 0 = User::DatabaseList
- SQL Statement: List of your databases. In this example I've excluded system databases, but you can do your own exclusions or build custom lists.

SELECT name FROM sys.databases WHERE database_id > 4
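If you also want to skip offline, restoring, or read-only databases, the statement could be extended like this (a sketch; the columns come from sys.databases):

```sql
SELECT name
FROM sys.databases
WHERE database_id > 4        -- skip system databases
  AND state_desc = 'ONLINE'  -- skip offline/restoring/suspect databases
  AND is_read_only = 0;      -- skip read-only databases
```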

"Foreach Loop Container" - loops through the list of your databases. There you use your "DatabaseList" object as the looping source.
Besides looping, you need to extract the current database name into the "SourceDatabase" variable:

Inside of the loop you need to build the SQL Task "Get List of tables" as follows.
The most important Points are:
- Connection type: ADO.NET
- ResultSet: Single Row
- Result Set: 0 = User::ProfileXMLBody
- SQL Statement: the most crucial part of the project; I'll explain it below.


Here is a not too complicated, but very important SQL script.
First, it runs "DBCC UPDATEUSAGE(0);" for your database. That is important to prevent the SSIS bug described in my previous post (https://slavasql.blogspot.com/2019/07/ssis-Data-Profiling-UPDATEUSAGE.html).
Second, it builds a list of all tables in the database. As you can see, in the "WHERE" clause I exclude system tables and empty ones. It might also be necessary to exclude tables that are too big and won't fit into memory.
The third part just loops through the tables and builds the XML body of the request.
SET NOCOUNT ON
DBCC UPDATEUSAGE(0);

DECLARE @xml NVARCHAR(MAX)=N'';
DECLARE  @i INT, @m INT;
DECLARE @tbls TABLE (ID INT IDENTITY(1,1), [schema] SYSNAME, [name] SYSNAME)
INSERT INTO @tbls([schema], [name])
SELECT DISTINCT SCHEMA_NAME(t.schema_id), t.name 
FROM sys.dm_db_partition_stats as s
INNER JOIN sys.tables as t on t.object_id = s.object_id
WHERE t.type = 'U' and s.row_count > 0;

SELECT @m = MAX(ID), @i = MIN(ID) FROM @tbls;
WHILE @i <= @m
BEGIN
SELECT TOP 1 @xml += N'
<ColumnStatisticsProfileRequest ID="StatisticsReq' + CAST(@i as NVARCHAR) + N'"><DataSourceID>SearchedDataSource</DataSourceID><Table Schema="' + [schema] + N'" Table="' + [name] + N'" /><Column IsWildCard="true" /></ColumnStatisticsProfileRequest>
<ColumnValueDistributionProfileRequest ID="ValueDistReq' + CAST(ID as NVARCHAR) + N'"><DataSourceID>SearchedDataSource</DataSourceID><Table Schema="' + [schema] + N'" Table="' + [name] + N'" /><Column IsWildCard="true" /><Option>FrequentValues</Option><FrequentValueThreshold>0.001</FrequentValueThreshold></ColumnValueDistributionProfileRequest>
<ColumnPatternProfileRequest ID="PatternReq' + CAST(ID as NVARCHAR) + N'"><DataSourceID>SearchedDataSource</DataSourceID><Table Schema="' + [schema] + N'" Table="' + [name] + N'" /><Column IsWildCard="true" /><MaxNumberOfPatterns>20</MaxNumberOfPatterns><PercentageDataCoverageDesired>95</PercentageDataCoverageDesired><CaseSensitive>false</CaseSensitive><Delimiters> \t\r\n</Delimiters><Symbols>,.;:-"''`~=&amp;/\\@!?()&lt;&gt;[]{}|#*^%</Symbols><TagTableName /></ColumnPatternProfileRequest>
<ColumnNullRatioProfileRequest ID="NullRatioReq' + CAST(@i as NVARCHAR) + N'"><DataSourceID>SearchedDataSource</DataSourceID><Table Schema="' + [schema] + N'" Table="' + [name] + N'" /><Column IsWildCard="true" /></ColumnNullRatioProfileRequest>
<ColumnLengthDistributionProfileRequest ID="LengthDistReq' + CAST(@i as NVARCHAR) + N'"><DataSourceID>SearchedDataSource</DataSourceID><Table Schema="' + [schema] + N'" Table="' + [name] + N'" /><Column IsWildCard="true" /><IgnoreLeadingSpace>false</IgnoreLeadingSpace><IgnoreTrailingSpace>true</IgnoreTrailingSpace></ColumnLengthDistributionProfileRequest>'
+ '<CandidateKeyProfileRequest ID="KeyReq' + CAST(@i as NVARCHAR) + N'"><DataSourceID>SearchedDataSource</DataSourceID><Table Schema="' + [schema] + N'" Table="' + [name] + N'" /><KeyColumns><Column IsWildCard="true" /></KeyColumns><ThresholdSetting>Specified</ThresholdSetting><KeyStrengthThreshold>0.95</KeyStrengthThreshold><VerifyOutputInFastMode>false</VerifyOutputInFastMode><MaxNumberOfViolations>100</MaxNumberOfViolations></CandidateKeyProfileRequest>'
FROM @tbls
WHERE ID = @i;
SET @i += 1;
END
SELECT ProfileXMLBody = @xml;
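As mentioned above, very large tables may need to be excluded. One way to do that is to cap the table size right in the INSERT that fills @tbls (a sketch; the 131072-page threshold is an arbitrary example, about 1 GB at 8 KB per page):

```sql
INSERT INTO @tbls([schema], [name])
SELECT SCHEMA_NAME(t.schema_id), t.name
FROM sys.tables AS t
INNER JOIN sys.dm_db_partition_stats AS s ON s.object_id = t.object_id
WHERE t.type = 'U'
GROUP BY t.schema_id, t.name
HAVING SUM(s.row_count) > 0            -- skip empty tables
   AND SUM(s.used_page_count) < 131072; -- skip tables over ~1 GB
```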

The last part is the "Data Profiling Task". As the Destination, you have to specify the File Connection "ResearchResult". The most important thing here is to set the "ProfileInputXML" expression to the "@[User::ProfileXML]" variable. It is also very important to set the "DelayValidation" property to True.



Step 4. Run the Package!

That might take a while or even error out.

Step 5. Analyze the results

You can analyze individual Database Profiling results using Microsoft's "DataProfileViewer.exe" utility, which can be found in "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\Common7\IDE\CommonExtensions\Microsoft\SSIS\150\Binn\" folder. 
The result should look like this:

Caveats of that method:

1. If you have a lot of big databases on your server, it might take forever to scan all of them. The only way to monitor the progress is to look in the newly created folder for that server for the separate XML files of each already-scanned database.
2. If data sets are too big for your computer to handle, you might get a 'System.OutOfMemoryException' error. Separating the work into a single table per task does not help. The only solution I've found is to run the package on a computer with a lot of free memory.
3. SSIS does not scan BLOB columns like VARCHAR(MAX), NVARCHAR(MAX), XML, etc. It simply ignores them. So, if you still need to scan those columns, you have to use another profiling tool.


Wednesday, July 17, 2019

SSIS Data Profiling Bug uncovered by SQL Server glitch

I was doing Data Profiling in my environment using SSIS and hit a very unusual error:
SSIS package "Package 2.dtsx" starting.
Error: 0x2 at Data Profiling Task, Data Profiling Task: Error occurred when profiling the data. The error message is: System.ArgumentOutOfRangeException: Specified argument was out of the range of valid values.
Parameter name: value
   at Microsoft.DataDebugger.DataProfiling.CandidateKeyProfile.set_KeyStrength(Single value)
   at Microsoft.DataDebugger.DataProfiling.KeyProfileGroupingSetTask.GenerateKeyProfile(TableQName parentTable, String countColumnName, Double keyStrength, Boolean isExactKey)
   at Microsoft.DataDebugger.DataProfiling.KeyProfileGroupingSetTask.PostProcessGroupBy(TableQName parentTable, String countColumnName, Double keyStrength, Boolean isExactKey)
   at Microsoft.DataDebugger.DataProfiling.KeyProfileGroupingSetTask.ComputeProfile(Boolean isGroupByQueryRun, TableQName parentTable, String countColumnName, Boolean isParentTempTable, Int64 parentRowCount)
   at Microsoft.DataDebugger.DataProfiling.GroupingSetWorkItem.Visit(GroupingPlanTreeNode node, TableQName parentTable, String countColumnName, Boolean isParentTempTable, Int64 parentRowCount)
   at Microsoft.DataDebugger.DataProfiling.GroupingSetWorkItem.ExecutePlan(List`1 plan)
   at Microsoft.DataDebugger.DataProfiling.GroupingSetWorkItem.DoWork()
   at Microsoft.DataDebugger.DataProfiling.TablePartitionedBatch.ComputeProfiles()
   at Microsoft.DataDebugger.DataProfiling.BuiltInProfiler.Profile().
Task failed: Data Profiling Task
SSIS package "Package 2.dtsx" finished: Success.

As you can see, there is no actual problem description or any failing values.
Here are the circumstances under which I could reproduce the error:
1. I have only the "Data Profiling Task" in my SSIS package.
2. I chose only "Column Value Distribution Profile" and "Candidate Key Profile" for the problematic table.
3. I left all other parameters at their defaults and executed the package.

That produced the error.

I've tried the following variations, and none of them reproduced the error:
1. Running "Column Value Distribution Profile" or "Candidate Key Profile" alone - single items did not produce errors, only that pair together.
2. Copying the table to another database did not produce the error.
3. When I tried to specify single columns, not all columns returned the error - only the one that is unique.

I found a similar question on a Microsoft forum and even posted my question and research there, but obviously nobody else could reproduce the error either. (https://social.msdn.microsoft.com/Forums/sqlserver/en-US/d0a8716b-54ae-4298-a03c-b9374ee1e0dc/systemargumentoutofrangeexception-error?forum=sqlintegrationservices)

So, then I tried to look deeper at the data. I noticed that the following queries return different results:
EXEC sp_spaceused '[dbo].[ProgramLocking_JN]';
GO
SELECT COUNT(*)
FROM [dbo].[ProgramLocking_JN]
GO

As you can see, the "sp_spaceused" procedure reports 336 rows, while in reality the table has 337 rows.

DBCC CHECKDB returned no errors and the following info on the table:
DBCC results for 'ProgramLocking_JN'.
There are 337 rows in 9 pages for object "ProgramLocking_JN".

Then I tried DBCC UPDATEUSAGE as follows:
DBCC UPDATEUSAGE(0,'ProgramLocking_JN') WITH COUNT_ROWS;
And got the following result:
DBCC UPDATEUSAGE: Usage counts updated for table 'ProgramLocking_JN' (index 'ProgramLocking_JN', partition 1):
        ROWS count: changed from (336) to (337) rows.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.

After that little fix, the SSIS error was gone.

Lesson learned:
Before running SSIS Data Profiling, it is a good idea to run the "DBCC UPDATEUSAGE(0);" command against every database you are researching.
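One quick way to do that across all databases is the undocumented (but long-standing) sp_MSforeachdb procedure:

```sql
-- Runs DBCC UPDATEUSAGE in every database; '?' is replaced with each database name
EXEC sp_MSforeachdb N'USE [?]; DBCC UPDATEUSAGE(0) WITH NO_INFOMSGS;';
```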