Will start from UTF8 and how it is better than UNICODE.
Here is a simple script to demonstrate abilities of newest SQL Server UTF8 feature:
GO
USE tempdb
GO
DROP TABLE IF EXISTS dbo.tbl_Use_Of_UTF8
GO
CREATE TABLE dbo.tbl_Use_Of_UTF8 (
ID INT IDENTITY (10,10)
, UnicodeColumn NVARCHAR(4000)
, UTF8Column VARCHAR(4000) collate LATIN1_GENERAL_100_CI_AS_SC_UTF8
, UTF8_Ukr_Column VARCHAR(4000) collate Ukrainian_100_CI_AS_SC_UTF8
)
GO
INSERT dbo.tbl_Use_Of_UTF8 (UnicodeColumn) VALUES
(N'༈༉ШЧݪݫݬݭܕܖܗܘܙऔकखഈ㔃㔄刀刁⠴⠵ȀȁȂȃꔂꔃՈՉ')
,(N'abcde༈༉ШЧݪݫݬݭܕܖܗܘܙऔकखഈ㔃㔄刀刁⠴⠵ȀȁȂȃꔂꔃՈՉ')
,(N'абвгдеАБВГДЕ')
,(N'abcdefghABCDEFGHабвгдеАБВГДЕ')
,(N'abcdefghABCDEFGH');
GO
UPDATE tbl_Use_Of_UTF8 SET UTF8Column = UnicodeColumn, UTF8_Ukr_Column = UnicodeColumn;
GO
SELECT Chars = LEN(UnicodeColumn)
, UnicodeSize = DATALENGTH(UnicodeColumn)
, UTF_Size = DATALENGTH(UTF8Column)
, Ukr_Size = DATALENGTH(UTF8_Ukr_Column)
, UTF8Column,
UTF8_Ukr_Column
FROM dbo.tbl_Use_Of_UTF8;
GO
|
The result will be like this:
From that result we can make following observations:
1. Use of multiple languages in UTF8 column takes more space than Unicode column. (First and Second rows)
2. Use of single non-Latin language in UTF8 field does not provide any size reduction at all. (Third row)
3. Use of a combination of Latin and a single non-Latin language bring some size reduction.(Fourth row)
4. Use of only Latin characters in UTF8 column gives us 50% size reduction over Unicode.
Conclusion
UTF8 is not suitable to store purely non-Latin languages.However, it can be extremely useful when you store only Latin characters, but have in mind a possibility of storing small number of records, which contain non-Latin characters.
The most useful case for that can be database's "Address" field to store names of streets and cities in the US.
99.9% of which, will be in Latin, but names like "Doña Ana", "Lindström" or "Utqiaġvik" would require use of Unicode. With UTF8 you can save exactly 50% of your disk space and eventually have some performance improvements.
No comments:
Post a Comment