Thursday, February 19, 2015

Collation on a single column

You specify collation when you install SQL Server.

What is "COLLATION"? - That is the way how SQL Server interprets ASCII symbols and character codes. It is applicably not only for "English", but for any supported Unicode languages.

Most of the time the default collation always works and nobody even look at it. All new databases will have the same collation and all CHAR/NCHAR/VARCHAR/NCHAR fields.

However, sometimes you have to pay attention to your collation.
There might be a case when you attach database from another server with different collation.
Sometimes your business rules dictate that values have to be case sensitive and should be sorted in certain order. Then you have to change your collation.

At first, how do you know what is default collation on your SQL Server?
Run following command:
SELECT SERVERPROPERTY(N'Collation');
My SQL Express returns "SQL_Latin1_General_CP1_CI_AS" - the regular default value.

If you want, you can change collation for any database.
Following script demonstrates different results of the same query with different database collation and two different ways to retrieve default collation information for a database:
USE [TestDB];
GO 
SELECT collation_name FROM sys.databases WHERE Name = 'TestDB'; GO
SELECT 'A','A' UNION 
SELECT 'a','a' UNION
SELECT 'B','B' UNION
SELECT 'b','b';
GO 

ALTER DATABASE [TestDB] COLLATE SQL_Latin1_General_CP1_CS_AS;
GO
SELECT DATABASEPROPERTYEX('TestDB', 'Collation') as collation_name;

GO
SELECT 'A','A' UNION 
SELECT 'a','a' UNION
SELECT 'B','B' UNION
SELECT 'b','b';
GO 
 When we have default case insensitive collation UNION statement collapses upper and lover case values, but when we switch database to case sensitive collation query returned all four rows.

Here is a demonstration how collation effects a table:
USE [TestDB];
GO 
CREATE TABLE tbl_test_Collation(
    A CHAR(1),
    B CHAR(1)
);
GO 
INSERT INTO tbl_test_Collation(A, B) 
SELECT 'A','A' UNION 
SELECT 'a','a' UNION
SELECT 'B','B' UNION
SELECT 'b','b';
GO
SELECT Name, collation_name FROM sys.columns
WHERE object_id = OBJECT_ID('tbl_test_Collation');

GO
As you can see, both fields in the table have default collation for the database:

The beauty of SQL Server collation is in ability to set individual collation on every column:

ALTER TABLE tbl_test_Collation  
ALTER COLUMN A CHAR(1) COLLATE SQL_Latin1_General_CP1_CI_AS;
GO
ALTER TABLE
tbl_test_Collation
 
ALTER COLUMN B CHAR(1) COLLATE Latin1_General_BIN;
GO
SELECT
Name, collation_name FROM
sys.columns
WHERE object_id = OBJECT_ID('tbl_test_Collation'); 
GO
As a result we have two columns in a table with different collation:

Now will play around with that table. As you noticed, values in column "A" and "B" are identical:
SELECT * FROM tbl_test_Collation WHERE A = 'a';
GO
 
SELECT * FROM tbl_test_Collation WHERE B = 'a';
GO

We can see that first query, when we filtered by case insensitive column, returned two records and filter on case sensitive column returned only one record.
Ordering, using different collation, is even more interesting:
SELECT * FROM tbl_test_Collation ORDER BY A; 
GO
SELECT * FROM tbl_test_Collation ORDER BY B; 
GO
First and the second queries have completely different order:
      
Ordering by case insensitive column sorts in alphabetical order. Ordering by "Binary sort" collation sorts by characters' ASCII codes.

Sometimes we need to JOIN, Link or concatenate character fields with different collation.
Here are couple of examples which will generate errors:

SELECT A + B FROM tbl_test_Collation;
GO
SELECT * FROM tbl_test_Collation WHERE A = B;GO

Here is the way to fix these queries:
SELECT A + B COLLATE SQL_Latin1_General_CP1_CI_AS  
FROM tbl_test_Collation;GO
SELECT * FROM tbl_test_Collation 
WHERE A COLLATE Latin1_General_BIN = B;GO

However, be very careful joining columns, which do not have the same collation.
For example two queries below looks exactly the same, but return different sets of records:
SELECT * FROM tbl_test_Collation as a
INNER JOIN tbl_test_Collation
as b
    ON a.A
COLLATE Latin1_General_BIN = b.B;
GO
SELECT * FROM tbl_test_Collation as a INNER JOIN tbl_test_Collation as b
   
ON a.A = b.B COLLATE SQL_Latin1_General_CP1_CI_AS;GO

To prevent discrepancy between queries you can use "DATABASE_DEFAULT" option instead of specifying particular collation:
SELECT * FROM tbl_test_Collation as a INNER JOIN tbl_test_Collation as b
   
ON a.A COLLATE DATABASE_DEFAULT = b.B COLLATE DATABASE_DEFAULT;GO

I want to finish this post by the simple query to retrieve list of all available collations.
In SQL Server 2014 is almost 4K collations. To see only applicable to your language you can use following script
SELECT name, [description] FROM sys.fn_helpcollations()
WHERE name like 'Greek%';

No comments:

Post a Comment