>Change what? – Change Data Capture in SQL Server 2008

>It must have been tough trying to capture INSERT, UPDATE and DELETE activities in a SQL Server database. Usually, developers may resort to creating DML triggers just to solve this problem in previous versions. The problem with this approach is that it introduces unnecessary overhead in the database engine. Another approach is to read the transaction log using either the undocumented DBCC LOG and fn_dblog or using third-party tools like Lugiment Log Explorer, ApexSQL Log, or RedGate SQL Log Rescue. In SQL Server 2005, the concept of DDL triggers were introduced, thus making it easier to audit DDL activities. This new feature of SQL Server 2008 called Change Data Capture(CDC), system stored procedures are used to mark which types of objects you want to audit, and the stored procedures take care of how the auditing occurs. How it works is that it captures DML statements (this is quite interesting) asynchronously, reading the transaction logs (so much for third-party tools) and populates the CDC table with the row’s data which change, and expose the changes through a relational interface which can be consumed easily. The downside of this is that the feature is only available on Enterprise and Developer Editions of SQL Server 2008. To use this feature, you have to enable it on the database level. Again, I’ll use the very familiar Northwind database for this sample.

USE Northwind
EXEC sys.sp_cdc_enable_db

Once CDC is enabled, it creates the corresponding objects are created: cdc schema, cdc user, metadata tables and other system objects. The thing about this is that you have to enable it on a per table basis (I don’t know why Microsoft chose to do it this way but if you decide to do this on all user-defined tables, you definitely need to write a script to accomplish this)

EXEC sp_cdc_enable_table ‘dbo’, ‘EmployeeTerritories’, @role_name = ‘cdc_role’;

Just to explain the parameters passed to the stored procedure, the dbo happens to be the schema in which the table belongs, EmployeeTerritories is the name of the table which will be enabled for CDC while the @role_name parameter is the name of the database role used to have access to change data. If the role already exists, permissions are given to that role. If it doesn’t, it will be created with the specified name. Now, let’s simulate changes in the dbo.EmployeeTerritories table (don’t worry about referential integrity as this has been taken cared of)

INSERT dbo.EmployeeTerritories VALUES (4,98004), (4,98052);
WAITFOR DELAY ’00:00:05′;
UPDATE dbo.EmployeeTerritories SET TerritoryID =80202
WHERE employeeid = 4 AND TerritoryID=98052;


These information is stored in a new table created named schema_table_CT in the cdc schema. In this case, it would be the cdc.dbo_EmployeeTerritories_CT table. Querying this table is similar to executing the DBCC LOG command in previous versions of SQL Server – it just doesn’t make sense. Good thing Microsoft came up with table-valued functions which makes querying this table a lot easier (but not straight-forward). One function which we can look at is the cdc.fn_cdc_get_all_changes__ function. This function returns one row for each change applied to the source table within the specified log sequence number (LSN) range. We don’t really care about LSN ranges but this is how SQL Server tracks all changes in the transaction log. This is also part of the information stored in our database backups. Since we have this data in our cdc.dbo_EmployeeTerritories_CT table, we can use this data and pass it as a parameter in this function. Below is a sample script to use this function to query the cdc.dbo_EmployeeTerritories_CT table but would return a more meaningful resultset


DECLARE
@begin_time datetime, @end_time datetime, @from_lsn binary(10), @to_lsn binary(10);

— Obtain the beginning of the time interval.
SET @begin_time = GETDATE()-1
;

— Obtain the end of the time interval.
SET @end_time = GETDATE();

— Map the time interval to a change data capture query range.
SELECT @from_lsn = sys.fn_cdc_map_time_to_lsn(‘smallest greater than or equal’, @begin_time);
SELECT @to_lsn = sys.fn_cdc_map_time_to_lsn(‘largest less than or equal’, @end_time
);

— Return the changes occurring within the query window.
SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_EmployeeTerritories(@from_lsn, @to_lsn, ‘all’);
GO

You might notice the new function introduced – fn_cdc_map_time_to_lsn. This just returns the log sequence number (which we don’t really care about) corresponding to the specified time (which we are more familiar with). The resultset would be something we could understand. What we need to really look at is the __$operation column which returns code values for the operations done for the specified table: 1 = delete, 2 = insert, 3 = update (before image), and 4 = update (after image). Imagine an UPDATE statement to be like a DELETE followed by an INSERT that’s why you have a before and after images. Another way to translate the results of the cdc.dbo_EmployeeTerritories_CT table is to use the query below

SELECT
operation = CASE __$operation
WHEN 1 THEN ‘DELETE’
WHEN 2 THEN ‘INSERT’
WHEN 3 THEN ‘UPDATE(BEFORE)’
WHEN 4 THEN ‘UPDATE(AFTER)’
ELSE ‘UNDEFINED’ END,
*
FROM cdc.dbo_EmployeeTerritories_CT


Bear in mind though that the data in the cdc tables are available within 3 days expressed in minutes (4320). You can clean this up when necessary using the sp_cdc_cleanup_change_table stored procedure.

This gives you an overview about change data capture and how you can use it in SQL Server 2008. Download the RC0 and start playing around with it. Check out my SQL Server 2008 videos at BlogCastRepository.com

Please note: I reserve the right to delete comments that are offensive or off-topic.

Leave a Reply

Your email address will not be published. Required fields are marked *