Thursday, 6 October 2011

Project: Simple ETL for MS CRM


In many of my projects, we need to create recurring ETL jobs to keep data in MS CRM updated. For example:
  • In a large enterprise system, Accounts records are usually retrieved from a separate system and needs to be synchronised frequently
  • In multi-tenancy solutions, data like Country, Region, State should be easily synchronized across the different MS CRM organizations frpm a master source.


We can use 3rd party tools, such as Scribe and SSIS, or use Messaging Queues.  But since MS CRM comes with a workflow engine and has ways to schedule recurring jobs, I thought that it would be a natural fit to build an ETL platform using the workflow engine. The results probably may not be as good as commercial products, but it should be good enough for simpler tasks.


So here is what I have set out to do in this project:
  • Allow users to create an ETL job that can be run immediately, or at a future date/time, or every x minutes, or every day at HH:mm time, or every week or every month.
  • The ETL job will fetch the data from a Local CSV file or from an SQL query
  • It should allow developers to easily extend the solution to cater for more complex ETL requirements (e.g. some EAI servers expects acknowledgement files to be create at certain locations after the data is consumed, or if the external system is using other protocols like SOAP for retrieving data.)
  • It will allow user to define the field mapping.
  • Based on a certain field in the source data, it should check if the data exist. If it does, the record will be updated. If it doesn't existing, a record will be created.
  • Configuration and monitoring of the job status is self-contained within MS CRM.
  • You do not need a separate server or software to do the ETL

What it doesn't do:
  • This solution is heavily based on the CRM workflow engine and CRM Web Service. It is probably not be as fast or robust as commercial ETL or EAI. 
  • Also, when many records are imported, it may slow down the processing of other system jobs, compared to dedicated servers for ETL.

This project contains several solutions:

1. Recurring Workflow Scheduler Solution
  • This is a stand-alone solution that allows users to creates a recurring jobs that will trigger an On-Demand CRM workflow at regular intervals. 
  • Users can use this solution as a stand-alone solution to trigger their own custom workflows for other purposes, or they can use it to trigger the workflows in the ETL Solutions below.
  • Details on this solution is found in this post here: http://crmjanitor.blogspot.com/2013/08/recurring-workflow-scheduler-solution.html

2. ETL Solutions for Different Import Types
Each ETL Solution contains
  • an On-Demand workflow, which can be triggered by the Recurring Workflow Scheduler Solution above.
  • a Configuration table which contains the settings for each ETL job used by the On-Demand ETL workflow. E.g. the file location and separator type for a CSV file, the connection string for a SQL import, etc.

For this project, we will only create the solution for:
  • a simple CSV file import: (links coming soon)
  • a simple SQL Query import:(links coming soon)