17 Oct 2011

A Business Intelligence project's ALM

Table of Contents

1   Introduction
2   Automation process components
  2.1 The Build
  2.2 The Deployment script
  2.3 Automated deployment and execution
3   Visual Studio tips
  3.1 Reducing unresolved reference warnings
  3.2 Composite Projects
4   Resources

1 Introduction

As I have mentioned in previous posts, I have been working on a data warehouse project. One of my main roles in this project was to work on the build, deployment and development processes. Within this post, I would like to show how I, with help from my colleagues, was able to automate the build, deployment and execution of the data warehouse code.

The reason for developing this automation was to try to find problems before we released to user acceptance testing (UAT). Once this automation started running it did capture errors. One error in particular, which would only have appeared at release, was missing permissions. Because the developers and the testing team all have different elevated rights, so that they can do their jobs, the problem had been masked and we never knew it existed.

2 Automation process components


2.1 The Build

To develop our data warehouse solution the project used the Microsoft development tools: Team Foundation Server (TFS) 2010 and Visual Studio (VS) 2010 / 2008. TFS 2010 was used as a source control repository and automated build platform (Team build). VS 2010 was used to develop our database code and VS 2008 - Business Intelligence Studio (BIDS) for creating the integration packages and analysis services cubes.

As we were using the above tools I was able to create, using my company's IP, a build that would drop deployable artefacts. This was then wrapped into a TFS build which would be triggered by a number of check-ins. This gave us our continuous build for the project, which was helpful for finding problems like incompatible code between team members and incomplete check-ins with files missing from the change set.

However, a successful build didn't mean that the build was deployable. An example of this is an unresolved reference warning, which shouldn't have been a warning but an error. This happened to us: one of our builds had several unresolved reference warnings, and when we tried to deploy it the deployment failed because a view referenced a column within a table that was invalid.


2.2 The Deployment script

We used a PowerShell script to manage the process of deploying our databases, SSIS packages and analysis cubes in the correct order. The script, which my colleagues had created, used a configuration file to list the artefacts to deploy along with how and where to deploy them. The script's interaction used a text menu interface, based on the example at the following location: http://mspowershell.blogspot.com/2009/02/cli-menu-in-powershell.html

This was helpful in reducing the work we had to do when writing a release process, as all we had to write was: select menu option blah, select menu option blah, etc. The script helped reduce deployment errors caused by mistyped commands or artefacts deployed in the wrong order.
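
To give a flavour of the approach, below is a minimal sketch of a configuration-driven, menu-based deployment script. The configuration layout, file names, server names and tool arguments are my own illustration of the idea, not the project's actual script:

    # deploy-config.csv (illustrative layout, not the project's real file):
    #   Order,Type,Name,Server
    #   1,Database,Staging,SQL01
    #   2,Database,Warehouse,SQL01
    #   3,SsisPackage,LoadWarehouse.dtsx,SQL01
    #   4,SsasCube,Warehouse.asdatabase,SSAS01

    function Deploy-Artefact([psobject]$Artefact) {
        switch ($Artefact.Type) {
            # Database projects built by VS 2010 can be deployed with vsdbcmd.exe from the build drop.
            'Database'    { & vsdbcmd.exe /a:Deploy /dd+ "/manifest:$($Artefact.Name).deploymanifest" "/cs:Data Source=$($Artefact.Server);Integrated Security=True" }
            # SSIS packages can be copied to the target server with dtutil.exe.
            'SsisPackage' { & dtutil.exe /FILE $Artefact.Name /DestServer $Artefact.Server /COPY "SQL;$($Artefact.Name)" }
            # Analysis Services databases can be deployed with the deployment utility in silent mode.
            'SsasCube'    { & 'Microsoft.AnalysisServices.Deployment.exe' $Artefact.Name /s }
        }
    }

    # Read the configured artefacts and present a simple text menu.
    $artefacts = Import-Csv 'deploy-config.csv' | Sort-Object { [int]$_.Order }
    Write-Host '1) Deploy all artefacts in order'
    Write-Host '2) Deploy a single artefact'
    switch (Read-Host 'Select an option') {
        '1' { $artefacts | ForEach-Object { Deploy-Artefact $_ } }
        '2' { $name = Read-Host 'Artefact name'
              $artefacts | Where-Object { $_.Name -eq $name } | ForEach-Object { Deploy-Artefact $_ } }
    }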


2.3 Automated deployment and execution

To create the automated build, deployment and execution process I reused the build and deployment components. The build was refactored and reused with a scheduled trigger for out-of-office hours. The deployment script was extended to support being called by an automation tool. The automation tool was called TFSDeployer: a service which can be configured to listen for certain build events and then run a PowerShell script when one is captured. For more details about the application, please use the following link: http://tfsdeployer.codeplex.com/.

The script, which TFSDeployer would execute when the out-of-office build completed, was configured to do the following steps (a minimal sketch of such a script follows the list):

  • Get the TFS build details.
  • Copy the contents of the build drop location to a deployment area.
  • Deploy the data warehouse solution using the deployment script.
  • If the deployment failed, then update the TFS build to failed and the status to broken.
  • If the deployment was successful, then execute a SQL Server job which was configured to run the ETL.
  • If the SQL Server job reported failure, then the TFS build was set as failed and the status to broken.
  • If the SQL Server job reported success, then the TFS build was set as successful and the status set to ready for testing.
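
The sketch below shows the kind of script TFSDeployer might run. The variable exposed by TFSDeployer, the paths, script name, job name and build quality values are assumptions for illustration rather than the project's actual code, and waiting for the SQL Agent job outcome is omitted for brevity:

    # Assumed: TFSDeployer makes the completed build's IBuildDetail available to the script.
    $buildDetail = $TfsDeployerBuildDetail

    # 1. Get the TFS build details and copy the drop to a deployment area.
    $dropLocation   = $buildDetail.DropLocation
    $deploymentArea = "\\deployserver\deployments\$($buildDetail.BuildNumber)"
    Copy-Item -Path (Join-Path $dropLocation '*') -Destination $deploymentArea -Recurse -Force

    # 2. Deploy the data warehouse solution using the deployment script (assumed to exit non-zero on failure).
    & (Join-Path $deploymentArea 'Deploy-Warehouse.ps1') -Unattended
    $deployOk = ($LASTEXITCODE -eq 0)

    # 3. If the deployment succeeded, start the SQL Agent job that runs the ETL
    #    (requires the SQL Server PowerShell snap-in for Invoke-Sqlcmd).
    $etlOk = $false
    if ($deployOk) {
        Invoke-Sqlcmd -ServerInstance 'SQL01' -Database 'msdb' -Query "EXEC dbo.sp_start_job @job_name = N'Warehouse ETL';"
        $etlOk = $true   # in reality, poll the msdb job history for the outcome before deciding
    }

    # 4. Record the outcome against the TFS build so the team can see whether it is testable.
    if ($deployOk -and $etlOk) {
        $buildDetail.Quality = 'Ready for Initial Test'
    } else {
        $buildDetail.Status  = [Microsoft.TeamFoundation.Build.Client.BuildStatus]::Failed
        $buildDetail.Quality = 'Broken'
    }
    $buildDetail.Save()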

This gave us the ability to test our builds and, as mentioned before, allowed us to find many problems before we went into UAT. It also gave us a platform on which to perform and automate regression testing.

3 Visual Studio tips

3.1 Reducing unresolved reference warnings

Here are some tips on how we managed to reduce some of our unresolved reference warnings.

  • Converting some of our derived tables to use common table expressions with the column definition syntax: WITH <CTEName> (Col1, Col2, Col3) AS
  • Adding server and database variables for references which were going to be made via a linked server. Within the code we also wrapped the reference variables in square brackets.
  • Including objects which were created by code, for example SELECT INTO tables and indexes. With indexes we also took another approach, where possible, which was to disable and then re-enable (rebuild) them; a hypothetical example follows this list.
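
For the last point, here is a hypothetical example of the disable/re-enable approach, scripted in the same style as our deployment scripts. The table, index and server names are invented for illustration:

    # Hypothetical: disable an index before the code that loads the table runs, then rebuild it
    # afterwards, instead of the database project having to own an index created by that code.
    $conn = @{ ServerInstance = 'SQL01'; Database = 'Warehouse' }

    Invoke-Sqlcmd @conn -Query 'ALTER INDEX IX_FactSales_DateKey ON dbo.FactSales DISABLE;'
    # ... run the load that populates dbo.FactSales here ...
    Invoke-Sqlcmd @conn -Query 'ALTER INDEX IX_FactSales_DateKey ON dbo.FactSales REBUILD;'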

3.2 Composite Projects

During one of our early development cycles we came across a circular reference problem between two databases, which caused a problem at deployment time: I had to deploy the two databases about three times each to get a complete deployment.

The method I used to reduce the chance of a circular reference was composite projects. The way I chose to split the databases into composite projects was to put the storage objects, like tables, into one project, and then create another project for the code objects, like views, which referenced the storage project. The idea for this was based on information I read at: http://vsdatabaseguide.codeplex.com/

4 Resources

  • CLI menu in PowerShell: http://mspowershell.blogspot.com/2009/02/cli-menu-in-powershell.html
  • TFSDeployer: http://tfsdeployer.codeplex.com/
  • Visual Studio database guide: http://vsdatabaseguide.codeplex.com/

Integrating a rules engine with integration services


As I have mentioned in several previous posts, I have been working on a data warehouse project. One of my main roles within this project was to work with our client's JRules / Java consultants to help integrate IBM JRules, a rules engine, with Microsoft SQL Server 2008 R2 Integration Services (SSIS).
The goal of the integration was to have the rules engine as part of the ETL process to help evaluate business rules. The reasons to use a rules engine within the ETL process were as follows:
  • To allow business people to write the business rules for the data warehouse in a language they know: English
  • To give the business people the responsibility for maintaining the rules to ensure that the data warehouse is up to date.
  • To be able to update the rules without having to deploy the entire ETL solution.
IBM JRules was used within the project because it was already in use within the client's organisation. The challenge was how to cross the divide between Java and Microsoft .Net.

Options to consider for integrating with the rules engine

Below is a list of options which I and some of my colleagues considered for how to integrate JRules and SSIS:
  • How the rules engine will access the data.
    • Receiving the data: should the data be supplied with the request to execute the rules engine?
    • Reading the data: is the rules engine responsible for reading the data when it is requested to run?
  • Which processing method to use
    • Batch: Process a batch of data within the rules engine.
    • Row by Row: Make separate requests to the rules engine row by row.
  • The communication / execution method
    • Using a web service method or a RESTful service method.
    • Directly coding against the rules engine API libraries. For our integration we would have had to use some sort of interoperability bridge such as JNBridge or JIntegra.
  • How to apply the results of the rules engine: does the rules engine apply the results directly, or are they returned to the ETL for processing?
  • Error handling
    • Rules engine errors: how to record the errors from the rule engine.
    • Integration errors: how to handle errors caused by a failure of the integration. 

Our approach

Our approach to the integration was to use a batch process model with a SOAP web method for triggering the rule processing. The rule processing would be responsible for reading, processing and recording the results of the rules execution. The SSIS packages would prepare the batch of data, trigger the rule processing and then process the results. Below is a diagram to illustrate the integration approach:


Web service method
The web service method, which was written by our client's JRules/Java consultants, was defined with the following parameters: the batch identifier and the name of the collection of rules to evaluate.
This method would then start running the rule processing procedure, which was as follows:
  • start reading the data
  • execute the rules over each complete source row once fetched
  • store the results back to the database
After the completion of the rule processing, the web service method would give a response indicating the success of the rules execution. The web service method was optimised to use multiple threads and to stream read the data. The way that the rules are implemented within JRules also helps to optimise the rule processing.

Data Integration
The way that we passed data between SSIS and JRules was to use a couple of tables which used a name/value pair structure. One table, Inbound, was used to store the prepared data for rule processing. The other table, Outbound, was used to store the results from rule processing, which SSIS would then process.
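
As a hedged illustration, such name/value pair tables might look something like the following; the table names, column names and types are my assumption, as the post does not show the actual schema:

    # Illustrative schema only; the project's real table definitions are not shown in the post.
    $ddl = '
    CREATE TABLE dbo.RulesInbound (
        BatchId        int          NOT NULL,  -- batch prepared by the SSIS packages
        RowId          int          NOT NULL,  -- identifies a source row within the batch
        AttributeName  varchar(128) NOT NULL,  -- the "name" part of the name/value pair
        AttributeValue varchar(max) NULL       -- the "value" part of the name/value pair
    );
    CREATE TABLE dbo.RulesOutbound (
        BatchId        int          NOT NULL,
        RowId          int          NOT NULL,
        AttributeName  varchar(128) NOT NULL,  -- results written back by the rules engine
        AttributeValue varchar(max) NULL
    );'
    Invoke-Sqlcmd -ServerInstance 'SQL01' -Database 'Warehouse' -Query $ddl
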
The reasons to use a generic structure are as follows:
  • If the rule parameters changed, then only the ETL process would need to be updated.
  • The rule integration was easier to code.
  • Having the integration use only two tables helped with debugging.

SSIS Integration
SSIS packages would be responsible for creating the batch data, then triggering the rules engine processing method and waiting for the response. Once the response was received, the results would be applied and/or any errors logged.
The method I used to call the web method was to write a .Net control flow component. The main reason I chose to write a component was easier code reuse. Maintenance was also easier, as I only had to update one component and all packages would pick up the new version.
The approach I took with my custom component was to separate the SSIS integration from the actual call to the web service: I created a code library that handled the interaction with the web service, which was then wrapped by another code library that handled the SSIS integration.
Taking this approach also allowed me to write a console application to test the interaction with the rules engine using the same calling code as the packages would use. This console application allowed us to debug any integration problems that we came across.
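
The test harness we actually used was the .Net console application described above. As a hedged alternative illustration of the same kind of smoke test, the SOAP method could also be exercised from PowerShell; the service address, method name and parameter values below are assumptions based only on the description of the method's parameters:

    # Hypothetical smoke test of the rules web service. The post only states that the method
    # takes a batch identifier and the name of the collection of rules to evaluate.
    $proxy = New-WebServiceProxy -Uri 'http://rulesserver/RuleExecutionService?wsdl'

    # Trigger rule processing for a prepared batch and report the response.
    $response = $proxy.ExecuteRules(12345, 'CustomerValidationRules')
    Write-Host "Rules execution result: $response"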

Integration results

Currently we are able to process a batch of 100,000 source rows within a five minute period. This is adequate for normal daily runs, where we only have to process a day's worth of changes. However, when there is a requirement to re-run the rules against the entire warehouse the process can take a while, because a number of batches are required to complete the rules processing. An improvement is currently being investigated, by another team member, to reduce the amount of data sent to the rules engine for processing. The method being looked at is as follows: find the unique rows, send only those to the rules engine for processing, and then apply the results back to all the rows that need them.