In this post I want to go through the basic operations on a Clustered Columnstore Index (CCSI) and show their internals through system views:
Creation

We will create a test table with a CCSI:
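The original DDL is not reproduced in this version of the post; a minimal sketch of a four-column test table with a clustered columnstore index could look like the following (the table name appears later in the post, but the column names, types, and index name are placeholders):

```sql
-- Sketch only: a four-column test table (column names/types assumed)
-- with a clustered columnstore index on it.
CREATE TABLE dbo.tbl_Test_Columnstore (
    ID   INT           NOT NULL,
    Col1 VARCHAR(50)   NULL,
    Col2 DATETIME      NULL,
    Col3 DECIMAL(18,2) NULL
);

CREATE CLUSTERED COLUMNSTORE INDEX CCI_tbl_Test_Columnstore
    ON dbo.tbl_Test_Columnstore;
```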
After the table is created, we can immediately see its internals:
Here we can see that ALL four columns of our table are "columnstored", and even though Columnstore Indexes (CSI) do not support "Included" columns, all of them are marked as "included".
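The catalog query used here is not shown; one way to list the index's columns and their "included" flag is a sketch like this over sys.index_columns (the table name is assumed from later in the post):

```sql
-- Sketch: list the columnstore index's columns and their
-- "included" flag from the catalog views.
SELECT i.name AS index_name,
       c.name AS column_name,
       ic.is_included_column
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
     ON ic.object_id = i.object_id AND ic.index_id = i.index_id
JOIN sys.columns AS c
     ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID('dbo.tbl_Test_Columnstore');
```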
Along with the CCSI itself, SQL Server creates a "COLUMN_STORE_DELETE_BITMAP" for the new index:
"COLUMN_STORE_DELETE_BITMAP" tracks deleted rows; since nothing has been deleted yet, its row count is still zero.
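A sketch of how to see that internal rowset, using sys.internal_partitions (available on SQL Server 2016 and later):

```sql
-- Sketch: internal rowsets created for the columnstore index;
-- the delete bitmap shows rows = 0 so far.
SELECT internal_object_type_desc,
       rows
FROM sys.internal_partitions
WHERE object_id = OBJECT_ID('dbo.tbl_Test_Columnstore');
```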
Inserting rows into the CSI Delta Store

First, we will try inserting just a few rows into the table:
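Script #3 itself is not reproduced here; a row-by-row insert loop of roughly this shape, with @i controlling the row count, behaves the same way (column names are placeholders):

```sql
-- Sketch of a trickle-insert loop; @i controls how many rows go in.
DECLARE @i INT = 10, @n INT = 0;
WHILE @n < @i
BEGIN
    INSERT INTO dbo.tbl_Test_Columnstore (ID, Col1, Col2, Col3)
    VALUES (@n, 'Row ' + CAST(@n AS VARCHAR(10)), GETDATE(), RAND() * 100);
    SET @n += 1;
END;
```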
Then immediately run Script #2 again and get the following result:
SQL Server created a "Delta Store" for our CCSI and inserted 10 new uncompressed rows.
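Script #2 is not reproduced either; a row-group report along these lines, based on sys.column_store_row_groups, surfaces the same information:

```sql
-- Sketch: state, row counts, and deleted rows per row group.
SELECT row_group_id,
       state_description,
       total_rows,
       deleted_rows,
       size_in_bytes
FROM sys.column_store_row_groups
WHERE object_id = OBJECT_ID('dbo.tbl_Test_Columnstore')
ORDER BY row_group_id;
```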
Let's change the variable @i to 100,000 in Script #3 and run it again. Script #2 will then show the following:
If we try a different method of inserting records, we can get pretty interesting results:
(Run script #2 right after)
SQL Server estimated the amount of data to be inserted and decided to use parallelism for the CSI insert.
The parallel CSI insert used all four CPUs and loaded the data evenly into four Delta Store row groups.
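The insert that triggered the parallel plan is not shown; a set-based insert like this sketch can produce that behavior (the source query and column names are illustrative; on SQL Server 2016, the WITH (TABLOCK) hint is what typically allows the insert itself to run in parallel):

```sql
-- Sketch: one large set-based insert; the optimizer may parallelize it.
INSERT INTO dbo.tbl_Test_Columnstore WITH (TABLOCK) (ID, Col1, Col2, Col3)
SELECT TOP (100000)
       ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
       'Bulk row',
       GETDATE(),
       1.0
FROM sys.all_columns AS a
CROSS JOIN sys.all_columns AS b;
```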
Inserting rows into Column Store using Bulk Insert
First, clean out the CSI:
According to the MSDN "Columnstore Indexes Data Loading" document, to avoid placing your data in the Delta Store and instead move it directly into compressed storage, your batch size has to be at least 102,400 rows.
Let's test that. I've modified Script #4 to insert 102,400 records into the CSI:
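The modified Script #4 is not shown; the same set-based pattern with the batch size raised to 102,400 rows illustrates the test (column names are placeholders):

```sql
-- Sketch: a single 102,400-row batch inserted in one statement.
INSERT INTO dbo.tbl_Test_Columnstore WITH (TABLOCK) (ID, Col1, Col2, Col3)
SELECT TOP (102400)
       ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
       'Bulk row',
       GETDATE(),
       1.0
FROM sys.all_columns AS a
CROSS JOIN sys.all_columns AS b;
```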
Run Script #2 after that:
As you can see, SQL Server split the records across the CPUs; each thread received fewer than 102,400 rows, so as a result the records still ended up in the Delta Store.
There are 2 ways to overcome that situation:
1. Make the batch size bigger than the number of CPUs used by your SQL Server multiplied by 102,400. In my case that number is 4 x 102,400 = 409,600;
2. Lower the degree of parallelism. Keep in mind that loading data in smaller chunks makes the process slower.
So, let's test the second scenario. Delete all records from the "tbl_Test_Columnstore" table again and run the following modified script:
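The modified script is not reproduced; the key change is an OPTION (MAXDOP 1) hint, sketched here, which keeps the whole 102,400-row batch on one thread:

```sql
-- Sketch: same batch size, but forced onto a single scheduler so the
-- batch stays at the 102,400-row direct-compression threshold.
INSERT INTO dbo.tbl_Test_Columnstore (ID, Col1, Col2, Col3)
SELECT TOP (102400)
       ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
       'Bulk row',
       GETDATE(),
       1.0
FROM sys.all_columns AS a
CROSS JOIN sys.all_columns AS b
OPTION (MAXDOP 1);
```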
Now we expect to have some data in our columnstore, so we need a new script to report on it:
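Script #6 itself is not shown; on SQL Server 2016 and later, a report like this sketch over sys.dm_db_column_store_row_group_physical_stats covers both compressed row groups and the Delta Store:

```sql
-- Sketch: per-row-group state, rows, size, and why a group was trimmed.
SELECT row_group_id,
       state_desc,
       total_rows,
       deleted_rows,
       size_in_bytes,
       trim_reason_desc
FROM sys.dm_db_column_store_row_group_physical_stats
WHERE object_id = OBJECT_ID('dbo.tbl_Test_Columnstore')
ORDER BY row_group_id;
```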
Here is the result:
As you can see, this time all 102,400 rows went directly into the columnstore. You can also see that the compressed columnstore uses almost ten times less space than the same data took in the Delta Store.
Deleting rows from CSI

We will use a very simple script to delete rows:
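The delete script is not reproduced; any simple predicate works for the experiment (this sketch assumes the placeholder ID column). Note that against a compressed row group, a DELETE only marks the rows in the delete bitmap; it does not physically remove them:

```sql
-- Sketch: delete a small slice of rows (the predicate is illustrative).
DELETE FROM dbo.tbl_Test_Columnstore
WHERE ID <= 100;
```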
After running Script #6, we notice the following:
Let's delete more rows and run Script #6 again:
Now try to re-insert records using Script #5 and see what happens:
SQL Server created a new columnstore row group to hold the newly inserted records.
Updating data in CSI

We will do a very simple update and see what happens:
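The update script is not shown; a minimal sketch (placeholder columns) would be something like the following. Against a columnstore, an UPDATE is executed as a delete (a mark in the delete bitmap) plus an insert of the new row version into the Delta Store:

```sql
-- Sketch: a trivial update; internally it becomes delete + insert.
UPDATE dbo.tbl_Test_Columnstore
SET Col1 = 'Updated'
WHERE ID = 1;
```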
Moving Data from Delta Store to Column Store

Moving the data is a pretty simple "ALTER INDEX" operation:
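Assuming the placeholder index name used in the earlier sketches, the operation looks like this; the COMPRESS_ALL_ROW_GROUPS option forces open Delta Store row groups to be compressed:

```sql
-- Force the Delta Store contents into compressed columnstore segments.
ALTER INDEX CCI_tbl_Test_Columnstore
    ON dbo.tbl_Test_Columnstore
    REORGANIZE WITH (COMPRESS_ALL_ROW_GROUPS = ON);
```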
After the data have been moved out of the Delta Store, it looks like this:
As you can see, the data from Delta Store row group #3 was moved to columnstore row group #6, and row groups 3, 4, and 5 were marked for system deletion with the "TOMBSTONE" state.
Some time later, the marked row groups disappear:
Merging two Column Stores

To generate a new columnstore row group, we will run Script #5 again:
To merge two small row groups into a larger one, we can simply run a reorganize without any parameters:
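With the placeholder index name from the earlier sketches:

```sql
-- Reorganize with no options: small compressed row groups that qualify
-- are merged into a larger one (SQL Server 2016 and later).
ALTER INDEX CCI_tbl_Test_Columnstore
    ON dbo.tbl_Test_Columnstore
    REORGANIZE;
```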
Reporting CSI columns' space allocation

Script #7:
If you sum up the sizes of all the columns, you'll get the same number as the one reported by Script #6.
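Script #7 is not reproduced; a per-column space report can be sketched from sys.column_store_segments (the direct column_id join to sys.columns is an assumption that holds for a simple table like this one):

```sql
-- Sketch: per-column segment count and on-disk size.
SELECT c.name              AS column_name,
       COUNT(*)            AS segment_count,
       SUM(s.on_disk_size) AS on_disk_size_bytes
FROM sys.column_store_segments AS s
JOIN sys.partitions AS p
     ON p.partition_id = s.partition_id
JOIN sys.columns AS c
     ON c.object_id = p.object_id
    AND c.column_id = s.column_id
WHERE p.object_id = OBJECT_ID('dbo.tbl_Test_Columnstore')
GROUP BY c.name
ORDER BY on_disk_size_bytes DESC;
```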