Is a Successful Backup Really a Successful Backup?

There are two common types of backups performed on SQL Server databases. The first is the native backup type that is commonly taken using either the out-of-the-box database maintenance plans or a custom backup script. The second type is that which is taken by third-party backup tools – be it an enterprise-wide backup tool or a SQL Server-specific backup tool. What I typically see in majority of the customers that I work with is that they simply take the database backups and rely on the success or failure of the backup jobs. However, I also tell my customers that I don’t believe in backups, let alone successful backups. I think this is worth repeating, so I’ll say it again:

I don’t believe in backups, let alone successful backups.

The reason I don’t believe in successful backups is because it creates a false sense of security, especially when you are running daily checks where your point of reference for a PASS/FAIL check is the status of a job execution. This proved a point several years ago when one of my customers reported that they are seeing corruption errors in the error log. Upon further investigation, I confirmed that they have had a handful of page corruptions and that they are on SQL Server 2008 Enterprise Edition. They told me that they have the latest full database backups and the log backups, even going as far back as 4 months prior. Thinking to myself, I guess this is good enough since I can do a page-level restore for the corrupted pages. I was wrong – very wrong. When I started restoring the pages from backup, I encountered errors after errors pertaining to the corrupted pages that I have been trying to restore. It turns out that those same corrupted pages have been on all of the backups as well. I’ll save the story of how we recovered the corrupted pages in a future blog post.

This is where the CHECKSUM option in the BACKUP command play a very important role. Here’s a definition of the CHECKSUM option from Books Online.

“The CHECKSUM option specifies that the backup operation will verify each page for checksum and torn page, if enabled and available, and generate a checksum for the entire backup.”

Basically, when this option is used, the backup process will test the page checksums that exist on the data file pages being backed up. If a bad page checksum is found, the backup process will stop. Simply put, the backup process will stop and report errors when bad page checksums are found. This means that your SQL Server Agent job that runs the backup or that enterprise backup software that you use will report a failed backup, causing you to investigate further (I would assume that you do read the alerts that you get and not just hit the Shift+Delete button on your e-mail client.)

To illustrate, I’ll take a backup of two databases – one of which I “intentionally” corrupted. I’ll use the standard BACKUP DATABASE command that majority of the third-party backup tools use. It’s also what’s being generated by the out-of-the-box Back Up Database Task in the database maintenance plan, with some other weird options.

BACKUP DATABASE [TestDB_Corrupted]
TO DISK='C:\Backups\TestDB_Corrupted.bak'
WITH INIT, STATS
GO
BACKUP DATABASE [TestDB]
TO DISK='C:\Backups\TestDB.bak'
WITH INIT, STATS
GO

BackupNoChecksum

Notice that while the first database is corrupted, SQL Server still reported the backup process as successful. What about if we include the CHECKSUM option in the BACKUP DATABASE command? Let’s see.

BACKUP DATABASE [TestDB_Corrupted]
TO DISK='C:\Backups\TestDB_Corrupted.bak'
WITH INIT, STATS, CHECKSUM
GO
BACKUP DATABASE [TestDB]
TO DISK='C:\Backups\TestDB.bak'
WITH INIT, STATS, CHECKSUM
GO


BackupWithChecksum

Now, we’re seeing a better picture of what the state of our backups look like. Technically, we won’t be having a backup file because the backup process will terminate when bad page checksums are found. Just by simply adding the CHECKSUM option in our BACKUP DATABASE command, we can be notified about whether or not there is possible corruption on our databases and not have a false sense of security about successful backups. The problem, of course, is when we don’t have any control of the database backups, for instance, when organizations standardize on an enterprise backup solution. Good thing there’s Trace Flag 3023. This trace flag enables the CHECKSUM option on all backups taken on the SQL Server instance. This saves me from trying to recall whether or not a backup taken using either the SQL Server native backup or third-party backup tools did use the CHECKSUM option. Microsoft KB article 2656988 describes this trace flag in detail and how to turn it on a SQL Server instance. This is one of those “set-and-forget” settings that I recommend to my customers, much like the backup compression option on SQL Server 2008 and higher instances (only for Standard and higher editions.)

Now, let’s not rush and make assumptions about using the CHECKSUM option in our backups. Just because it is checking for page checksums and torn pages during the backup process doesn’t mean you don’t have corruptions on your database. There is still no substitute for running regular database consistency checks to be sure that you are aware of whether or not your databases do have corruption. It’s just my way of knowing whether or not I can really rely on my backups.

You might be asking, “so, what do you believe in as far as backups are concerned?I believe in successfully testing your backups by restoring them on a regular basis. This is the only guarantee that my backups do work.

Comments

comments

Please note: I reserve the right to delete comments that are offensive or off-topic.

Leave a Reply

Your email address will not be published. Required fields are marked *