Thursday, March 24, 2016

DBCC CHECKDB vs Evil

DBCC CHECKDB is very resource intensive and long operation, but it is extremely important to perform against your databases.
Kendra Little has a wonderful blog about that topic: DBCC CheckDB: Frequently Asked Questions.

In my post I won't talk about "Importance", "Necessity" and "Resume changing events" associated with DBCC CHECKDB. I do not want to talk about job security at all, I want to talk about professional etiquette.

The DBA in an organization is the last line of defense for the company's data. Only you can protect it from loss and corruption. And I hope along with making regular verified backups you are doing regular DBCC CHECKDB against all of your databases.

So, will draw the line: that operation is important and DBA must do that.
Now, the question is, how to make it less painful.

Option 1.

Run DBCC CHECKDB against secondary server. That can be a mirror, secondary AlwaysOn server or any other working copy of your environment. That is pretty easy to set and produces good consistent results. You always can switch replicas and do the checkup against the primary one.
Cons: You must pay additional licensing. If your primary server has 64 Cores, you have to pay for additional 64 SQL Server core licenses on the secondary, where you perform DBCC CHECKDB. So, it is extremely expensive option.

Option 2.

Run DBCC CHECKDB against restored copy of the database. That is much cheaper option. You are using one, not very expensive server, not only to verify data integrity for multiple databases from several production servers, but in parallel with that you also verify quality of your backups.
That is wonderful option, but requires dedicated server and advanced procedure with a lot of moving parts.

Option 3.

Use DBCC CHECKFILEGROUP. That is almost the same as to use DBCC CHECKDB, but you can perform it against only one file group at the time. That means if you have huge partitioned database with "Sliding Window" partitioning structure, you need to run DBCC CHECKFILEGROUP only against your Primary group and the most recent/active group. There is no necessity to run it against old and well backed up data, which you can restore at any time in the case of corruption.
If you do not use "Sliding Window", you have an option to redistribute your checkup operation over time and do only one partition at a time to reduce overall pressure on the system.
In order to use that option, you have to have a partitioning in your huge database.

Option 4.

Split DBCC CHECKDB execution over its constituencies.
If you take a look at DBCC CHECKDB command description in MSDN you can find that this command contains three major parts: DBCC CHECKALLOC, DBCC CHECKCATALOG and DBCC CHECKTABLE. To reduce the pressure on your system you can run these portions separately. Even more, you can run DBCC CHECKTABLE against every table and even every index on your server.
Looks like the great idea: you simply load a service table with requests and service broker executes those DBCC checkups at any idle time of your server.
That approach looks so good that I decided to test it.

Option 4. Testing.

Test #1. I've created a script to collect all tables and indexes on a server. That script collected the data and in parallel run DBCC CHECKALLOC and DBCC CHECKCATALOG against each database on the server. As the second step it looped against the list of indexes and run DBCC CHECKTABLE against each of them.
Test #2. I've created another script just to loop over list of databases and run DBCC CHECKDB against each of them.

Option 4. Testing Results.

On my workstation Test #1 finished in 37 minutes and 23 seconds. Test #2 finished in just 7 minutes and 7 seconds.
The five times difference makes no sense of split DBCC CHECKDB on its constituencies. Even worse,
besides of doing DBCC CHECKALLOC, DBCC CHECKCATALOG and DBCC CHECKTABLE, it also validates the Service Broker data in the database and link-level consistency between table metadata and file system directories and files the filestream.

As the result of my research I can conclude that Option 4 is not really an option anymore.

Option 4.

*During the preparation of this post I also used materials from Paul Randal's blog post "Importance of how you run consistency checks"

No comments:

Post a Comment