Why Performance Optimization Should Be Included In Every High Availability Strategy

A couple of years ago, when my friend SQL Server MVP and Microsoft Certified Master Amit Bansal (Twitter | Facebook) delivered a high availability training in Singapore, I asked if I could take a look at the course outline so I can justify attending. Back then, I was responsible for managing several SQL Server 2000 clusters and have just rolled out SQL Server 2005. I wanted to learn how I can provide higher availability to our new SQL Server 2005 instances beyond the usual failover clustering implementation.

As I was looking at the course outline, I can’t help but notice the inclusion of two topics that I felt should be in the “performance tuning” category rather than high availability – table partitioning and read committed snapshot isolation (also known as row-versioning.) Now, understand that this was before SQL Server 2008 was released and these two features were relatively new during this time. How could they possibly fall under the “high availability” category together with log shipping, failover clustering and database mirroring?

What SQL Server Books Online Does Not Mention

If you read the SQL Server Books Online entry on table partitioning, you would notice that the theme of the document revolves around optimizing performance. Nowhere would you find anything regarding availability. Take a look at the benefits of table partitioning in the list below, emphasis mine:

  • You can transfer or access subsets of data quickly and efficiently
  • You can perform maintenance operations on one or more partitions more quickly
  • You may improve query performance

So, if I’m a developer or a SQL Server DBA, I would instinctively put table partitioning under the “performance tuning” category, not high availability. Common sense, right?

Unfortunately, if we look at every other SQL Server feature (or even anything unrelated to technology) the way the creator/developer wants us to look at it, we’re probably limiting ourselves to a lot of potential creative ideas that we can use it for. I’m sure you’ve seen one of those crazy, yet creative ideas on what you can use your clothes hanger for (9 Incredible Things You Can Do With Cheap Plastic Hangers.)

When Performance Translates to Availability

I always start every high availability and disaster recovery project with the definition of the recovery objective (RPO/RTO) and service level agreements (SLAs.) I also emphasize that SLA is not just how quickly you can restore your database back online but how the end users can interact with it. It doesn’t matter if you managed to restore database operations within 5 minutes and have met your SLAs but if the end users cannot access the application or the performance is affecting response time, it is technically offline. It’s the end users’ perception of availability that defines whether or not something is available.

Then, it hit me. The reason why Amit included those two topics in the high availability category is because he fully understands what availability means from the point of view of the end users, not the IT team. It means making sure that the application and the database is available for the end users – even if it is just a perception of it.

During this time, I haven’t really thought about index maintenance as an availability strategy. All I knew was that we schedule index maintenance every Sunday since our databases are accessed globally. The only time that we can schedule downtime was during times of minimal activity. When we upgraded to SQL Server 2005, we managed to take advantage of the online index operation feature to further minimize downtime on our larger databases. But still, nobody ever complained that they cannot access the database because of the index maintenance. That further reinforced the fact that index maintenance is more of a performance optimization strategy and not of high availability.

But then I started experiencing application timeouts while performing index maintenance during one of my testing. I thought it was just a bug in the application code. Until I realized that my application is getting blocked by the index maintenance. Back then, all I saw from my application was the nasty HTTP 404 code in the browser. Being a former developer myself, my first instinct was to take a look at the application code. Good thing my colleague (who wasn’t even a DBA) told me to look at the database. I think that’s where the term “DBA stands for default blame acceptor” got started.

Call To Action

Take a look at some of your performance optimization strategies and understand how they are affecting your database availability. It could mean redefining your table schema to reduce the amount of transaction log records generated. It could also mean re-writing a stored procedure code  to improve query response time. Or maybe partition your large tables to isolate highly critical data from archived data. Don’t treat your performance optimization strategies separately from your high availability ones. And when you do so, you can achieve both performance and availability requirements.

Additional Resources

 

Please note: I reserve the right to delete comments that are offensive or off-topic.

Leave a Reply

Your email address will not be published. Required fields are marked *