Wednesday, August 6, 2014

SQL Server Data Compression. Hidden gem or the smart way to compress a table.

SQL Server Data Compression feature is available only in Enterprise edition since version 2008, but this post won't cover how it works, you can research it by yourself going to following links:

1. Data Compression
2. Page Compression
3. Row Compression

Here I will uncover only one hidden feature that can be easily missing from the view.

 Script #1. Initiate test table:
USE AdventureWorks2008R2;

GO
SELECT *, NEWID() as ID_Field
INTO Test_Compression
FROM sys.messages;
GO
ALTER TABLE Test_Compression ADD CONSTRAINT PK_Test_Compression PRIMARY KEY (Message_ID, Language_Id)
GO
CREATE INDEX NCIX_Test_Compression_1 ON Test_Compression (Message_ID, Language_Id, Severity)
GO
CREATE INDEX NCIX_Test_Compression_2 ON Test_Compression (Message_ID, Language_Id, Severity) INCLUDE (text)
GO
CREATE UNIQUE INDEX NCIX_Test_Compression_3 ON Test_Compression (ID_Field)
GO

The easiest way to implement table partitioning is to do one of the following scripts:
Script #2 Simple Table compression
--Test row compression.

ALTER TABLE Test_Compression REBUILD WITH (data_compression = row)

To check the result run following script
Script #3. Simple compression check for one table:
SELECT SCHEMA_NAME(t.schema_id) AS [SchemaName]

       ,OBJECT_NAME(t.object_id) AS [ObjectName]
       ,i.name as [Index Name]
       ,p.[data_compression_desc] as Compression_Type
FROM sys.partitions as p
INNER JOIN sys.tables as t ON p.object_id = t.object_id
INNER JOIN sys.indexes as i ON i.object_id = t.object_id and i.index_id = p.index_id
WHERE t.name = 'Test_Compression';


As you can see by doing simple table compression you can compress only Clustered index or a Heap.
Now I will show you the smarter way to do compression.
Please revert your changes back:
Script #4. Uncompress the test table

ALTER TABLE Test_Compression REBUILD WITH (data_compression = none);

At first we have to do some preparations:
 Script #5. Create temporary repository for compression estimation results
CREATE TABLE  #Compression_results(

       object_name   SYSNAME NOT NULL,
       schema_name   SYSNAME NOT NULL,
       index_id INT  NOT NULL,
       partition_number INT  NOT NULL,
       Compression_Type CHAR(4) NULL,
       Gain_Percentage as 100 - [size_with_requested_compression_setting(KB)]*100./[size_with_current_compression_setting(KB)],
       [size_with_current_compression_setting(KB)] INT  NOT NULL,
       [size_with_requested_compression_setting(KB)] INT  NOT NULL,
       [sample_size_with_current_compression_setting(KB)] INT  NOT NULL,
       [sample_size_with_requested_compression_setting(KB)] INT  NOT NULL
);

Script #6. Collect compression estimation data for the test table:
DECLARE @Schema SYSNAME = 'dbo', @Table SYSNAME = 'Test_Compression';


INSERT INTO #Compression_results(
       object_name, schema_name,
       [index_id], [partition_number],
       [size_with_current_compression_setting(KB)],
       [size_with_requested_compression_setting(KB)],
       [sample_size_with_current_compression_setting(KB)],
       [sample_size_with_requested_compression_setting(KB)])
exec sys.sp_estimate_data_compression_savings  @Schema,@Table,NULL,NULL,ROW;

UPDATE #Compression_results SET Compression_Type = 'ROW' WHERE Compression_Type is NULL;
 
INSERT INTO #Compression_results(
       object_name, schema_name,
       [index_id], [partition_number],
       [size_with_current_compression_setting(KB)],
       [size_with_requested_compression_setting(KB)],
       [sample_size_with_current_compression_setting(KB)],
       [sample_size_with_requested_compression_setting(KB)])
exec sys.sp_estimate_data_compression_savings  @Schema,@Table,NULL,NULL,PAGE;

UPDATE #Compression_results SET Compression_Type = 'PAGE' WHERE Compression_Type is NULL;

Script #7 Getting Compression estimation results
SELECT r.schema_name

       ,r.object_name as [Table Name]
       ,i.name as [Index Name]
       ,r.Compression_Type
       ,r.Gain_Percentage
       ,r.[size_with_current_compression_setting(KB)]
       ,CASE i.index_id WHEN 0 THEN 'ALTER TABLE [' + r.schema_name + '].[' + r.object_name + '] REBUILD WITH (DATA_COMPRESSION = ' + r.Compression_Type + ')'
              ELSE 'ALTER INDEX [' + i.name + '] ON [' + r.schema_name + '].[' + r.object_name + '] REBUILD PARTITION = '
                     + /*CAST(r.[partition_number] as VARCHAR)*/'ALL' + ' WITH (DATA_COMPRESSION = ' + r.Compression_Type + ')'
              END + ';
GO' as Compression_Script
FROM #Compression_results as r
INNER JOIN sys.schemas as s on s.name = r.schema_name
INNER JOIN sys.tables as t ON t.name = r.object_name and t.schema_id = s.schema_id
INNER JOIN sys.indexes as i ON i.object_id = t.object_id and i.index_id = r.index_id
WHERE [size_with_current_compression_setting(KB)] > 0 and [size_with_requested_compression_setting(KB)] > 0
--     and r.Gain_Percentage > 1
ORDER BY r.Gain_Percentage DESC;



As you can see from these estimation results we can easily make a plan what to compress and how:

For instance for indexes PK_Test_Compression and NCIX_Test_Compression_2, estimation results for ROW and PAGE compression are approximately the same. That means for them I'd choose ROW compression.


Estimation result for index NCIX_Test_Compression_1 show that PAGE compression will provide much better results than ROW compression. So, I'd compress that index on the PAGE level.
Index NCIX_Test_Compression_3 shows negative compression results, that means we won't even touch it.

Accordingly to estimation results I'm going to gain about 36.7% on clustered index and about 33.9% on indexes. (Just simple calculation: "SELECT 1-(21704 * (1-0.3833394765) + 1288 * (1-0.329192546) + 2784) /  (21704 + 1288 + 2784)")

Now, we just need to extract: created scripts from the very last column.
Please note that I put compression of Clustered index first. Also note I put stored sp_spaceused procedure to measure an impact of my compression.
Script #8.Compresses three indexes on test table.
EXEC sp_spaceused Test_Compression;
GO
ALTER INDEX [PK_Test_Compression] ON [dbo].[Test_Compression] REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = ROW );
GO
ALTER INDEX [NCIX_Test_Compression_2] ON [dbo].[Test_Compression] REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = ROW );
GO
ALTER INDEX [NCIX_Test_Compression_1] ON [dbo].[Test_Compression] REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = PAGE);
GO
EXEC sp_spaceused Test_Compression;
GO


As a result of our compression the size of clustered index decreased by 36.9% and all other indexes by 35%. That means my compression for non-clustered indexes was little bit underestimated (was 33.9%), but clustered index compression estimation was very accurate.


At the end, want to point that Data Compression in SQL Server is simple, but you have to plan for it and know exactly what you are doing.
Also keep in mind that by using compression you put more pressure on your CPU. So, be reasonable fighting compression percentages.