Monday, June 10, 2019

Importance of Clustered Index

Everybody know that it is almost always better to have Clustered Index on a table for better performance, but I want to emphasize the Importance of Clustered Index Sizewise.

As most of the developers working in "Agile", which means without any proper planning, I've created several tables without proper indexing, just couple of foreign keys.
Tables were very narrow with just a few columns and my expectations for data growth were very modest. However, after just a little while I was very surprised when my database showed huge unexpected growth and size of the data became multiple times higher than I've expected.

After very little research I've found and fixed the problem. In this post I'll describe how I've done it.

1. Identifying the Problem.

At firs I run following script to see the biggest tables in my database:
SELECT OBJECT_SCHEMA_NAME(st.object_id) + '.' + OBJECT_NAME(st.object_id) as ObjectName,
SUM(CASE WHEN st.index_id < 2 THEN st.row_count ELSE 0 END) as row_count,
SUM(CASE WHEN st.index_id < 2 THEN st.reserved_page_count ELSE 0 END)*8 as Data_Size_KB,
SUM(CASE WHEN st.index_id < 2 THEN st.used_page_count ELSE 0 END)*8 as Used_Data_Space_KB
FROM sys.dm_db_partition_stats st
LEFT JOIN sys.tables as t ON t.object_id = st.object_id
LEFT JOIN sys.indexes i ON i.OBJECT_ID = st.OBJECT_ID AND i.index_id = 1
WHERE OBJECT_SCHEMA_NAME(st.object_id) != 'sys'
GROUP BY OBJECT_SCHEMA_NAME(st.object_id), OBJECT_NAME(st.object_id), t.name
ORDER BY Data_Size_KB DESC, row_count DESC

OPTION (RECOMPILE);

Here is the Top of what I've got:

2. Data Research

As you can see from previous results, the biggest table is "tbl_Instance_WAIT_Data", where I collect data for waits in my systems. Lets take a look at the table's structure using "sp_help":

EXEC sp_help '[Data_Collection].[tbl_Instance_WAIT_Data]';


As you can see, that table has only 7 columns and data size for one row is only 44 bytes. With simple calculation we can determine that on one SQL Server database page, which is 8 Kb in size, can fit about 150 records. For the current row count of 692096 rows, expected data size is supposed to be no more than 37 Mb, but data usage exceeds 370 Mb (10 times more!) and total space usage is almost 3 Gb (80 times more!!!)

3. Simple Fix

To fix that space over usage problem we just need to create a clustered index like this:

CREATE CLUSTERED INDEX CLIX_Instance_WAIT_Data

ON Data_Collection.tbl_Instance_WAIT_Data 
([Timestamp], Instance_ID, Wait_ID);

*Just note that I use default database Fill Factor. In my case data will be sorted by Timestamp and 100% page fill factor will be OK, but in other cases you might need to reduce it to 70%-90%.

And now lets check the space size usage:

EXEC sp_spaceused '[Data_Collection].[tbl_Instance_WAIT_Data]';


As you can see, the data size has reduced to the expected number of ~37 Mb.