21 Sep 2011

Comparing Master Data Services instances

As I have mentioned in previous posts, I've been working on a data warehouse project. Within the project we decided to use SQL Server 2008 R2 Master Data Services (MDS) to store all the warehouse specific reference data. Here are some reasons why MDS was used:
  • Allowed us to manage the reference data.
  • Reference data could be loaded into the warehouse like a source system
  • Would allow the client’s data governance team to easily update the reference data and keep the warehouse up-to-date
For active development of entities and data loads we use a sandpit instance. When the ETL was ready to use the new entities or loaded data, a cut of the sandpit instance would be promoted to the development environment. We came across a problem when we needed to identify some changes which were accidentally made on the development instance.

I came up with a method which helped to identify the changes fairly easily, below is the method I used. I will say that it’s not a perfect solution and might not work for everyone or continue to work when MDS updates have been applied.

Run the following SQL script on both instance of the MDS database servers:

   1:  CREATE DATABASE MDSCompare
   2:   
   3:  DECLARE @vColList AS VARCHAR(MAX)
   4:  DECLARE @vViewName AS SYSNAME
   5:  DECLARE @vSQL AS VARCHAR(MAX)
   6:  DECLARE @vEntityID AS INT
   7:  DECLARE @vModelID AS INT = (SELECT id FROM mdm.tblModel where Name = '<ModelName,Char,Master Data>')
   8:   
   9:  DECLARE EntityID_Cursor CURSOR LOCAL FORWARD_ONLY FAST_FORWARD READ_ONLY
  10:  FOR SELECT ID FROM mdm.tblEntity e WHERE model_id= @vModelID order by id
  11:   
  12:  OPEN EntityID_Cursor
  13:   
  14:  FETCH NEXT FROM EntityID_Cursor
  15:  INTO @vEntityID
  16:   
  17:  WHILE @@FETCH_STATUS = 0
  18:  BEGIN
  19:   
  20:      SELECT @vViewName = REPLACE(e.Name,' ','')
  21:           , @vColList  = COALESCE(@vColList + ', [' + a.name + ISNULL('_' + la.name,'') +']', '[' + a.name + ISNULL('_' + la.name,'') +']')
  22:      FROM mdm.tblAttribute a
  23:          INNER JOIN mdm.tblEntity e
  24:              ON e.id = a.entity_id
  25:          LEFT OUTER JOIN mdm.tblAttribute la
  26:              ON a.domainEntity_Id = la.entity_id
  27:              AND la.attributeType_id = 1
  28:              AND la.IsSystem = 1
  29:      WHERE a.entity_id = @vEntityID   
  30:      AND a.attributeType_id <> 3
  31:   
  32:      SET @vSQL = 'SELECT ' + @vColList + ' INTO MDSCompare.dbo.' + @vViewName + ' FROM mdm.' + @vViewName
  33:   
  34:      EXEC (@vSQL)
  35:     
  36:      FETCH NEXT FROM EntityID_Cursor
  37:      INTO @vEntityID
  38:   
  39:      SELECT @vColList = null
  40:          , @vViewName = null
  41:          , @vSQL = null
  42:         
  43:  END
  44:     
  45:  CLOSE EntityID_Cursor
  46:  DEALLOCATE EntityID_Cursor

Then to test that all the entity which are required have been created use the following script:

   1:  DECLARE @vModelID AS INT = (SELECT id FROM mdm.tblModel where Name = '<ModelName,Char,Master Data>')
   2:   
   3:  ;WITH MDSEntity(Entityname)
   4:  AS
   5:  (
   6:      select REPLACE(e.Name,' ','')  Entityname
   7:      from mdm.tblEntity e WHERE model_id= @vModelID
   8:  )
   9:  , CompareTables (TableName)
  10:  AS
  11:  (
  12:      SELECT Table_Name
  13:      FROM MDSCompare.INFORMATION_SCHEMA.TABLES t
  14:      where t.table_schema = 'dbo'
  15:  )
  16:  SELECT *
  17:  FROM MDSEntity e
  18:  LEFT OUTER JOIN CompareTables c
  19:      ON e.Entityname = c.TableName
  20:  WHERE c.TableName IS NULL

Then use the visual studio 2010 schema compare tool against the two instances of the MDSCompare databases to highlight any structural changes which have been made to the entities.

To find data changes use the following script on the MDSCompare databases to create primary keys:

   1:  USE MDSCompare
   2:   
   3:  SELECT 'ALTER TABLE [' + table_schema + '].[' + table_name + '] WITH NOCHECK ADD CONSTRAINT PK_' + table_name+'_Code PRIMARY KEY CLUSTERED (Code)'
   4:  FROM INFORMATION_SCHEMA.COLUMNS
   5:  WHERE column_name = 'code'

Then use the visual studio 2010 data compare tool to highlight any data differences.

While using the above methods to do the comparisons I found some differences which I had to ignore. The reason we had to ignore them was because of the following scenario:

While loading data through the batch staging process we had to set the default name attribute of the entities to be an empty string. This is because it would not allow nulls and we did not want to use this attribute. However the development instance has the default name attribute set with nulls. I believe this was because the MDS deployment tool had converted the empty element of the deployment package, which was created because of the empty string, to null while uploading.

No comments: