17 Oct 2011

Integrating a rules engine with integration services


As I have mentioned in several previous posts, I have been working on a data warehouse project. One of my main roles within this project was to work with our clients JRules / Java consultants to help integrate IBM JRules, a rules engine, with Microsoft SQL Server 2008 R2: Integration Services (SSIS).
The goal of the integration was to have the rules engine as part of the ETL process to help evaluate business rules. The reasons to use a rules engine within the ETL process were as follows:
  • To allow business people to write the business rules for the data warehouse in a language they know: English
  • To give the business people the responsibility for maintaining the rules to ensure that the data warehouse is up to date.
  • To be able to update the rules without having to deploy the entire ETL solution.
IBM JRules was used within the project because it was already in use within their organisation. The challenge was how to cross the divide between Java and Microsoft .Net.

Options to consider for integrating with the rules engine

Below is a list of options which I and some of my other colleagues thought about on how to integrate JRules and SSIS
  • How the rules engine will access the data.
    • Receiving the data: Should the data be supplied with the request to execute the rule engine.
    • Reading the data: Is the rules engine responsible for reading the data when requested to run.
  • Which processing method
    • Batch: Process a batch of data within the rules engine.
    • Row by Row: Make separate requests to the rules engine row by row.
  • The communication / execution method
    • Using Web Services method, Restful Services method.
    • Directly coding against the rules engine API libraries. For our integration we would have to have used some sort of interoperability classes like: JNBridge, JIntegra.
  • How to apply the results of the rule engine. Does the rule engine apply the results directly or are they returned to the ETL for processing.
  • Error handling
    • Rules engine errors: how to record the errors from the rule engine.
    • Integration errors: how to handle errors caused by a failure of the integration. 

Our approach

Our approach to the integration was to use a batch process model with a SOAP Web method for triggering the rule processing. The rule processing would be responsible for reading, processing and recording the results of the rules execution. The SSIS packages would prepare the batch of data; trigger the rule processing and then process the results. Below is a diagram to illustrate the integration approach:


Web service method
The web service method, which was written by our clients JRules/Java consultants, was defined with the following parameters: the batch identifier; the name of the collection of rules for evaluation.
This method would then start running the rules processing procedures which are as follows:
  • start reading the data
  • execute the rules over each complete source row once fetched
  • store the results back to the database
After the completion of the rule processing the web service method would give a response indicating the success of the rules execution. The web service method was optimise to use multiple threads and to stream read the data. Also the way that the rules are implemented within JRules also helps to optimise the rule processing.

Data Integration
The way that we passed data between SSIS and JRules was to use a couple of tables, which use a name value pair structure. One table, Inbound, was used to store the prepared data for rule processing. The other table, Outbound, to store the results from rule processing which the SSIS would process.
The reasons to use a generic structure are as follows:
  • If the rules parameters changed then only the ETL process would need to be updated
  • The rule integration was easier to code
  • To have the integration only use two tables to help debugging
SSIS Integration
SSIS packages would be responsible for creating the batch data. Afterwards trigger the rules engine processing method and wait for the response. Once the response was received the results would be applied or/and log any errors.
The method I use to call the Web method was to write a .Net control flow component. The main reason I chose to write a component was to have easier code reuse. Also maintenance was easier as I only had to update one component and all packages would pick up the new component.
The approach I took with my custom component was to separate the SSIS integration and the actual call to the web service. Creating a code library that handled the interaction with the web service. This was then wrapped with another code library that handled the SSIS integration.
Taking this approach also allowed me to write a console application to test the interaction with the rules engine using the same calling code as the packages would use. This console application allowed us to debug any integration problems that we came across.

Integration results

Currently we are able to process a batch of 100,000 source rows within a five minute period. This is adequate for a normal day runs where we would only have to process changes for a day. However when there is a requirement to re-run the rules against entire warehouse data the process can take a little while. This is because a number of batches are required to complete the rules processing. There is an improvement currently being investigated, by another team member, to reduce the amount of data sent to the rule engine for processing. The method which is currently being looked at is as follows: Find the unique rows to send to the rules engine for processing and then apply the results back to the required rows.

No comments: