Posts

Showing posts from 2011

A Business Intelligence project ALM's

Image
Table of Contents 1   Introduction. 2   Automation process components   2.1 The Build.   2.2 The Deployment script.   2.3 Automated deployment and execution. 3   Visual studio tips.   3.1 Reducing unresolve reference errors.   3.2 Composite Projects. 4   Resources. 1 Introduction As I have mentioned in previous posts I have been working on a data warehouse project. One of my main roles in this project was to work on the build, deployment and development processes. Within this post, I would like to show how I, with help from my colleagues, was able to automate a build, deployment and execute the data warehouse code. The reason for developing this automation was to try and find problems before we released to user acceptance testing (UAT). Once this automation started running it did capture errors. One error in particular, which would only appeared in release, was missing permissions. As the developers a...

Integrating a rules engine with integration services

Image
Introduction As I have mentioned in several previous posts, I have been working on a data warehouse project. One of my main roles within this project was to work with our clients JRules / Java consultants to help integrate IBM JRules, a rules engine, with Microsoft SQL Server 2008 R2: Integration Services (SSIS). The goal of the integration was to have the rules engine as part of the ETL process to help evaluate business rules. The reasons to use a rules engine within the ETL process were as follows: To allow business people to write the business rules for the data warehouse in a language they know: English To give the business people the responsibility for maintaining the rules to ensure that the data warehouse is up to date. To be able to update the rules without having to deploy the entire ETL solution. IBM JRules was used within the project because it was already in use within their organisation. The challenge was how to cross the divide between Java and Microsoft .Net...

Comparing Master Data Services instances

As I have mentioned in previous posts, I've been working on a data warehouse project. Within the project we decided to use SQL Server 2008 R2 Master Data Services (MDS) to store all the warehouse specific reference data. Here are some reasons why MDS was used: Allowed us to manage the reference data. Reference data could be loaded into the warehouse like a source system Would allow the client’s data governance team to easily update the reference data and keep the warehouse up-to-date For active development of entities and data loads we use a sandpit instance. When the ETL was ready to use the new entities or loaded data, a cut of the sandpit instance would be promoted to the development environment. We came across a problem when we needed to identify some changes which were accidentally made on the development instance. I came up with a method which helped to identify the changes fairly easily, below is the method I used. I will say that it’s not a perfect solution and migh...

Choosing the right CDC tool for the job

Introduction I have, as mentioned in a previous blog , been working on a data warehouse project using CDC for extracting the source system data. I would like to share some of the experiences and evaluation criteria used for selecting a CDC tool for our project. The reason for using a specialist tool was as follows: Couldn't move all the source databases to SQL Server 2008 Needed the capture change data to be sent to another server instance A consistent management experience for CDC between all our source systems. The two database vendors the tool needed to support was SQL Server and IBM AS400 Timescales First we underestimated how long it would take to select a CDC tool. We plan for 1 month, for both evaluation criteria and running the tests, and it took us about 2/3 months. The time was taken in evaluating the tools against the criteria and gaining access to database systems from other projects. However our development of the ETL solution was able to continue, while t...

Source system data to warehouse via CDC.

Image
Introduction I have been working on a data warehouse project with a difference. The difference is that the ETL is not doing the classic extract of source system data, instead the source system is going to send its data to the data warehouse by using change data capture (CDC). The decision for using CDC was as follows: To have the ETL only process the data that has changed within the source system between each run of the ETL. This would help the ETL perform as it would only have to process the change data and not work out what the changes were first. Not to have the ETL processes impact the central source system database. The CDC would be responsible for delivering the changes of the source system to another database server. Then the ETL could be run at any time within the day and wouldn't be responsible for blocking the transactional source system. As there was plans to have the ETL run at the end of the day for two time zones e.g. US and UK. I would like to share some o...