Deploying SSIS projects with Octopus Deploy

In my previous blog post I outlined how to configure TeamCity to build SQL Server Integration Services projects and then package the ispac file into a NuGet package.

In this post I’m going to show how I set up Octopus Deploy to deploy the ispac to SQL Server 2012 and configure SSIS DB projects and environments.

Deployment Overview

Here is an overview of the process as is currently defined in Octopus Deploy.

1-process-overview

We are doing multi-tenanted releases for this project, which is why the Client DB (Step 4) is repeated multiple times. In a future post I’m going to cover off details about how we are doing multi-tenanted releases.

Setting up SSIS DB

The first step in the release process is to ensure that the SSIS DB is up to date for use with our SSIS projects, within this step we ensure that:

  • A project folder exists within SSIS DB for our ispac to be deployed into
  • An environment exists
  • Environment variables are up to date

All of this is achieved with a little bit of PowerShell

$folderName = $OctopusParameters['SSISDB FolderName']
$environmentName = $OctopusParameters['SSISDB EnvironmentName']

$loadStatus = [Reflection.Assembly]::Load("Microsoft.SqlServer.Management.IntegrationServices, Version=11.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91")

# Store the IntegrationServices Assembly namespace to avoid typing it every time            
$ISNamespace = "Microsoft.SqlServer.Management.IntegrationServices"            
            
Write-Host "Connecting to server ..."            
            
# Create a connection to the server            
$constr = "Data Source=localhost;Initial Catalog=master;Integrated Security=SSPI;"            

$con = New-Object System.Data.SqlClient.SqlConnection $constr

# Create the Integration Services object            
$ssis = New-Object $ISNamespace".IntegrationServices" $con

if ($ssis.Catalogs.Count -eq 0)
{
    Write-Error "SSISDB doesn't exist"
    throw "SSISDB doesn't exist"
}

$cat = $ssis.Catalogs["SSISDB"]

if ($cat.Folders[$folderName] -eq $null)
{
    Write-Host "Creating new folder" $folderName
    $newfolder = New-Object $ISNamespace".CatalogFolder" ($cat, $folderName, "Description")     
    $newfolder.Create()
}

$folder = $cat.Folders[$folderName]

if ($folder.Environments[$environmentName] -eq $null)
{
    Write-Host "Creating environment" $environmentName
    $newEnv = New-Object $ISNamespace".EnvironmentInfo" ($folder, $environmentName, "Descriptoin")
    $newEnv.Create()
}

$env = $folder.Environments[$environmentName]

Write-Host "creating variables"

if ($env.Variables["Local"] -eq $null)
{
    $env.Variables.Add("Local", [System.TypeCode]::String, $OctopusParameters['Client - Local Conn String'], $false, "Our local client db")
}
else
{
    $env.Variables["Local"].Value = $OctopusParameters['Client - Local Conn String']
}

# create other environment variables here...

#save changes to environment
$env.Alter()

(NB: This script and the subsequent one for deploying the ispac are based off this example by Matt Masson)

I’ve tried to write this script to be idempotent so that it can be run multiple times without causing any problems.

Here is what you will see in SQL Management Studio after this step has run:

2-ssisdb-with-folder

Deploying SSIS Projects

The next step is to actually deploy the Integration Services project into the SSIS DB. In this step we use a PowerShell script to:

  • Deploy the ispac into the project folder that was created in the previous step
  • Ensure that the project has a reference to the environment that was created in the previous step
### Static Variables

$localToLocalETLFullPath = ".\DrDoctorClientETL.ispac"
$folderName = $OctopusParameters['SSISDB FolderName']
$environmentName = $OctopusParameters['SSISDB EnvironmentName']
#the projectName needs to match the output name that has been specified on the project, you can see this in the properties in SQL Server Data Tools
$projectName = "DrDoctor Client ETL"

###

$loadStatus = [Reflection.Assembly]::Load("Microsoft.SqlServer.Management.IntegrationServices, Version=11.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91")

# Store the IntegrationServices Assembly namespace to avoid typing it every time            
$ISNamespace = "Microsoft.SqlServer.Management.IntegrationServices"            
            
Write-Host "Connecting to server ..."            
            
# Create a connection to the server            
$constr = "Data Source=localhost;Initial Catalog=master;Integrated Security=SSPI;"            
            
$con = New-Object System.Data.SqlClient.SqlConnection $constr      
# Create the Integration Services object            
$ssis = New-Object $ISNamespace".IntegrationServices" $con
$cat = $ssis.Catalogs["SSISDB"]
$folder = $cat.Folders[$folderName]

Write-Host "Deploying project ..."            
[byte[]] $projectFile = [System.IO.File]::ReadAllBytes($localToLocalETLFullPath)            
$folder.DeployProject($projectName, $projectFile)

#####

Write-Host "adding reference to environment"

$project = $folder.Projects[$projectName]
$env = $folder.Environments[$environmentName]

if ($project.References.Count -eq 0)
{
    $project.References.Add($env.Name, $folderName)
}
else
{
    Write-Host "assuming that the correct environment has been configured"
}

####

Write-Host "applying environment variables to project"

$targetVar = $env.Variables["Local"]
$sourceVar = $env.Variables["Source"]

Write-Host "Updating Extract package parameters"

$extractPackage = $project.Packages["Extract.dtsx"]

$extractPackage.Parameters["CM.Target.ConnectionString"].Set([Microsoft.SqlServer.Management.IntegrationServices.ParameterInfo+ParameterValueType]::Referenced, $targetVar.Name)
$extractPackage.Parameters["CM.Source.ConnectionString"].Set([Microsoft.SqlServer.Management.IntegrationServices.ParameterInfo+ParameterValueType]::Referenced, $sourceVar.Name)

$project.Alter()

After this step has run the Projects folder will contain the first of our SSIS projects:

3-ssisdb-with-project

Deploying a database project

In a previous blog post I showed how you can fiddle with the dtproj and add a reference to the Octopack NuGet package to have TeamCity turn your database project into a NuGet package for you.

However, since discovering how to use NuSpec files in TeamCity I’d now recommend against changing your dtproj file and instead moving to using a NuSpec to define the package.

Here is a sample NuSpec file that I started using this week

<?xml version="1.0" encoding="utf-8"?>
<package xmlns="http://schemas.microsoft.com/packaging/2010/07/nuspec.xsd">
  <metadata>
    <id>ClientDB</id>
    <version>1.0.0</version>
    <authors>DrDoctor</authors>
    <requireLicenseAcceptance>false</requireLicenseAcceptance>
    <description>Client database</description>
  </metadata>
  <files>
    <file src="bin\Release\ClientDB.dacpac" target="ClientDB.dacpac" /> 
    <file src="bin\Release\Reference.dacpac" target="Reference.dacpac" /> 
  </files> 
</package> 

(You can adapt this as you need for your requirements)

As shown in my previous post on building SSIS projects in TeamCity you will need to add this NuSpec file the list of packages to build within the NuGet Packaging step to have TeamCity build the NuGet package.

Once TeamCity is building the NuGet package, we can add a new Deploy NuGet Package step in our Octopus Deploy project, with the follow PowerShell script as a custom deployment script.

Here is the PowerShell script which will generate an upgrade script using SqlPackage.exe

$dbName = $OctopusParameters['Client Staging DB']
$server = $OctopusParameters['National DB Server']

$releaseNumber = $OctopusParameters['Octopus.Release.Number']

& 'C:\Program Files (x86)\Microsoft SQL Server\110\DAC\bin\SqlPackage.exe' /Action:Script /OutputPath:Upgrade.$dbName.$releaseNumber.sql /SourceFile:DrDoctor_STG.dacpac /TargetServerName:$server /TargetDatabaseName:$dbName /p:IncludeCompositeObjects=true

New-OctopusArtifact -Path "Upgrade.$dbName.$releaseNumber.sql"

This will compare the dacpac to the database, if the database doesn’t exist then the upgrade script will create it, otherwise the upgrade script will contain a bunch of sql commands to upgrade the database to match the dacpac.

The New-Artifact command is a useful Octopus Deploy function that tells the tentecle to grab the file and make it part of the release, so it’s then available from within the Octopus Deploy web ui.

release-artifacts

Alternatively, if you’re game you could configure SqlPackage to automatically upgrade the database by using Action:Publish instead.

In the next blog post I’ll be covering multi-tenanted releases with Octopus Deploy and using the Offline Package Drop feature which was new in 3.0.

Disclaimer: the PowerShell could definitely do with some tidying up, don’t hold that against me 🙂

Advertisements

Building and packaging SQL Server Integration Services projects in TeamCity

At DrDoctor we’ve been using Octopus Deploy for about a year and half now, and in that time we have developed a very robust way of releasing the core of our system. That is, two IIS websites and a bunch of windows services. Release automation has vastly increased the frequency at which we release software into production and in turn the speed at which the business can operate.

The missing part in all of this are the integration projects, when DrDoctor begins working with a new hospital we do a bunch of work integrating their PAS system into our system. This takes the form of a couple of staging databases and a couple of SQL Server Integration Services (SSIS) projects.

Whenever we are discussing changes that need to be made we always screw up our faces at the thought of doing the release. We also have no idea which clients have which version, or when we last did a release.

In this and a few subsequent posts I’m going to outline the various steps that I took to automate our deployments, so we can get on doing the interesting work and let the computers handle the boring deployment work.

Building SSIS Projects in TeamCity

The first step required is to have TeamCity build our Integration Services (SSIS) projects in TeamCity. In reality all it’s really doing is taking all the .dtsx packages that are part of the project and turning them into an ispac file.

The ispac is basically a zip file that contains all the packages and some other metadata which is used when deploying into SQL Server SSISDB.

There are two methods that come up from a Google search on getting TeamCity building SSIS projects, the first involves using a community MSBuild extension and a bunch of other bits, you can read more about it here or here, you can find the MSBuild extension on CodePlex. I tried this method, but didn’t have immediate success.

The second option is to use Visual Studio (SQL Server Data Tools) devenv.exe to build your project, this was already installed on our build server as it is required when building database projects.

To do this in the TeamCity project’s build configuration I added a new build step, which was a Command Line runner:

1-building-ssis-devenv

In the custom script I put in the full path to devenv.exe, our SSIS projects are built in SSDT BI 2012 (Visual Studio 2012 shell) so I’m using that version of devenv.exe on the build server.

Packaging SSIS Projects for deployment

Now that TeamCity is building the project it time to setup TeamCity to package the ispac file into a NuGet package. As mentioned in the intro, we use Octopus Deploy as our deployment system, it in turn uses the NuGet package format as the container for application files (more details).

Setting up TeamCity to automate the creation of NuGet packages turns out to be really easy.

Creating a NuSpec file

The first step is to create a NuSpec file, this is the manifest file that TeamCity will use when building the NuGet package (it tells TeamCity which files to include in the package).

To set this up I added a new file called Local_ETL.nuspec into my project folder with the following content

<?xml version="1.0" encoding="utf-8"?>
<package xmlns="http://schemas.microsoft.com/packaging/2010/07/nuspec.xsd">
  <metadata>
    <id>SSIS.Local_ETL</id>
    <version>1.0.0</version>
    <authors>DrDoctor</authors>
    <requireLicenseAcceptance>false</requireLicenseAcceptance>
    <description>Sync client to local</description>
  </metadata>
  <files>
    <file src="bin\Development\DrDoctor Client ETL.ispac" target="DrDoctorClientETL.ispac" />
  </files>
</package>

This as you can specifies a NuGet package ID, a version and a bunch of files that should be included.

Add build step in TeamCity

The next and final step is to configureTeamCity to create the NuGet package using the NuSpec file.

In TeamCity go into the Build Steps of your project, and click Add build step. From the runner type select NuGet Pack and configure the step with the following:

1-teamcity-nuget-packaging

Make sure that you have ticked “Publish created packages to build artifacts”. This will ensure that the packages are available through the built-in NuGet repository.

In the next post I’ll show how I setup my deployment project in Octopus Deploy to deploy the SSIS ispac to SQL Server SSISDB.

Stay tuned.

More Reasons to Adopt Biml

SSIS packages aren’t backwards compatible – that means if you start building packages targeting SQL Server 2012 and then find out you need to also support and earlier version (like what happened to me this week), you will need to start over. However, if you used Biml to generate your packages you could spin up an instance of SQL Server 2008 in Azure, load your Biml project in BIDS and regenerate your packages targeting a different version.

You can change quickly – I recently found myself in the situation where I had developed an Extract, Transform and Load package and when it came to test them realize that they were wrong. I need to load data after it had been transformed because down stream Transforms depended on data being loaded. Here’s a picture

1-Before

What I had to do was combine the Transform and Load packages into a single one. I could have gone through and copy, pasted, joined up and fixed any problems for each and every one (there were about 30+ transforms), but that would have taken a while and been a very boring task. Instead I made a very minor change to my Biml package and then regenerated the package.

2-After

You can refactor quickly – this is in a similar vein to the point above. In another project I did my development against SQL Server 2012 when it came time to deploy I found out the client site I had to deploy to was still running SQL Server 2008. Which meant I had to say good bye to the Project Deployment model, along with the nice and easy configuration experience you get with 2012 and go back to using XML configuration files – this was a bit of a pain to figure out. Given that I had about 40 packges, the effort of going through each one to enable Package Configurations would have been immense (and boring), instead after I figured out how to do it in Biml all I needed to do was regenerate the packages.

Drag and drop doesn’t scale – it works fine for small ad-hoc packages that you end up throwing away, however I strongly hold the view that if you’re doing any serious development, building packages by hand just isn’t going to scale. This is especially true when all you’re really doing is implementing design patterns. I don’t remember the last time I dragged and dropped something in SSDT BI, and I have been doing a lot of SSIS development recently!

In conclusion and the main reason you should adopt Biml, your time scales better. All the things I have outlined above I could have done by hand, but it would have taken me a while and would have been boring. I probably would have made mistakes along the way too. Instead, by using Biml to create the packages I can concentrate on more important things (like writing this blog post, automating our database deploytment or automating our release process).

BIML – Passing Variables to Child Packages

A project I’ve been working on recently has consisted of lots of little packages which are called from a single coordinating package. I started off developing against SQL Server 2012, and thankfully used BIML to generate all my packages. The main controller package passes a variable down to the sub-packages, which is then subsequently used when executing a stored proc. This all worked nicely until I was told that it needed to work against SQL Server 2008 😦 Thanks to the ease with which BIML allows changes to be made I had it fixed and working again within minutes.

This post will show how to pass variables from a parent package to a child package for both SQL Server 2012 and SQL Server 2008.

SQL Server 2012

This is actually really easy for SQL Server 2012, thanks to the introduction of package parameters. The below BIML will generate two SSIS packages.

Parent – This package has a variable defined as ParentVar, the Execute Package data flow task will pass this variable as the value for the ChildVar parameter defined in the Child package.

Child – This package has a single parameter defined, ChildVar which will have the value filled in by the Parent package.

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Packages>
    <Package ConstraintMode="Linear" Name="Parent">
      <Variables>
        <Variable DataType="String" Name="ParentVar">FromParentPackage</Variable>
      </Variables>
      <Tasks>
        <ExecutePackage Name="Execute Child">
          <ParameterBindings>
            <ParameterBinding VariableName="User.ParentVar" Name="ChildParam"></ParameterBinding>
          </ParameterBindings>
          <ExternalProjectPackage Package="Child.dtsx"></ExternalProjectPackage>
        </ExecutePackage>
      </Tasks>
    </Package>
    <Package ConstraintMode="Linear" Name="Child">
      <Parameters>
        <Parameter DataType="String" Name="ChildParam"></Parameter>
      </Parameters>
      <Tasks>
        <Container ConstraintMode="Linear" Name="DoStuff"></Container>
      </Tasks>
    </Package>
  </Packages>
</Biml>

SQL Server 2008

In SQL Server 2008 we don’t have the luxury of package parameters so we have to fall back to Package Configurations. To pass a variable to a child package we need to define a “Parent package variable” configuration, which tells the Child package to expect a given variable from the Parent package and which local variable it should be mapped to.

Once again the following BIML will generate two packages.

Parent – This package has the variable defined and set, as well as an Execute Package data flow task which will execute the Child package

Child – This package has a Package configuration which will map the ParentVar variable to the local ChildVar variable. You can see the Package Configurations by right clicking anywhere in the package control flow and then clicking Package Configurations, this will bring up the following:

Package-configurations

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Packages>
    <Package ConstraintMode="Linear" Name="Parent">
      <Variables>
        <Variable DataType="String" Name="ParentVar">FromParentPackage</Variable>
      </Variables>
      <Tasks>
        <ExecutePackage Name="Execute Child">
          <Package PackageName="Child"></Package>
        </ExecutePackage>
      </Tasks>
    </Package>
    <Package ConstraintMode="Linear" Name="Child">
      <Variables>
        <Variable DataType="String" Name="ChildVar" InheritFromPackageParentConfigurationString="ParentVar"></Variable>
      </Variables>
      <Tasks>
        <Container ConstraintMode="Linear" Name="DoStuff"></Container>
      </Tasks>
    </Package>
  </Packages>
</Biml>

(To use this sample, make sure that the project you generate the packages in is set to the “Package Deployment Model”)

Biml From the Trenches – Convention over Configuration

Biml from what I have observed is still in it’s infancy. Don’t get me wrong, what can actually be done with Biml is impressive, however it seems to me how we are using it and how we are talking about it still needs to mature before it can be fully and effectively embraced by the community at large.

There is a growing number of people in the SQL Server community that are contributing excellent guidance and driving forward its maturity, as well as BimlScript.com which has an ever growing set of examples to learn from. I firmly believe that the future of Biml is bright, if you don’t believe me check out this interview with the creator Scott Currie.

What I would like to see more of, is people talking about implementing design patterns in Biml, let’s have less examples and more theory.

An area that there is still little guidance on is convention based Biml.

Talk to any experienced software developer and they will tell you all about “Convention over Configuration” and how it makes their life easier, but if you’re a BI developer this might be brand new concept to you.

Wikipedia has a great description:

Convention over configuration is a software design paradigm which seeks to decrease the number of decisions that developers need to make, gaining simplicity, but not necessarily losing flexibility.

And

The phrase essentially means a developer only needs to specify unconventional aspects of the application.

In my mind there are a few different types of conventions:

  1. Those that require strict adhered
  2. Those that can be overridden
  3. On/Off by default

In this post I’m going to give two examples of the first type, those I class as “strict adherence required” from a recent ETL project I was working on. I’ll cover the other types in subsequent posts.

NB: I’m deliberately not including any Biml or screenshots of SSIS packages in this post. The reason for this is that there are numerous ways to implement design patterns, so rather than showing you how I do it, I want to introduce the concepts and get you thinking about it.

Use a convention to Extract data

What I want to achieve: I want to move data from a source system into my target database, but only a subset of tables in the source system.

My convention: For each table in the Import schema of my Target database, there is a table with the same name and structure in the source system. Then, for each table in the Import schema generate Data Flow task which will read from the source table and copy the data into the target table.

The advantage: When I need to start importing more data from the source system all I need to do is create a new table in my import schema that matches the source and then regenerate the extract package. No configuration is needed, I don’t need to maintain a list of tables which needed to be imported, the presence of the table in the Import schema is enough. Simple.

Use convention to Transform data

The ETL project that I working on already had a number of queries written which would transform the source system data to match our table structure. We had a number of tables in the Stage schema which the transformed data needed to go before being loaded into the live tables.

What I want to achieve: Transform data from the Import schema tables and load into tables in the Stage schema

My convention: For each table in the Stage schema, there is a corresponding stored procedure with a name like: [Stage table name]_Transform

The advantage: Once again, when I need to start transforming more data, only two things need be done, create new table in Stage schema, create transform stored procedure. Then I just regenerate my SSIS package, no changes required.

What’s the point?

Hopefully you’ve figured it out, but the point of “convention over configuration”, or “coding to a convention” is to remove the number of decisions that you need to make during the development effort. This allows you to spend more time on the hard stuff and let the computer take care of the rest.

BIML Tip – Use LinqPad to write C#

To see any real benefit of using BIML you will need to combine it with C# to generate dynamic packages. The only problem is, unless you fork out $1000’s for MIST then your only alternative is BIDSHelper. This is a great tool, but one of the biggest problems is that you loose IntelliSense for C#. Lately I’ve been using *LinqPad to prototype and test C# code which I then paste into my BIML files.

In this tip I’m going to show you how you can use LinqPad to speed up your BIML Scripting experience.

Just a quick note before we get started: To get the added benefit of IntelliSense in LinqPad you have to fork out $39USD for the premium edition, however as you will see this is definitely a worthwhile expense. Even if you don’t purchase the premium version you can still follow along, you’ll just miss out on the auto-complete. (I have not being paid for this endorsement).

Setting Up LinqPad

You can download LinqPad from http://www.linqpad.net/

Once you have installed and launched LinqPad you will see an empty query window.

1-main-query-window

To be able to use the BIML engine and Varigence extensions we’ll need to add a reference to the required assemblies and their namespaces. To do that, press F4, this will bring up the Query Properties window.

The first thing that you will need to do is add the assembly references. Click the Browse button and locate BimlEngine.dll (for me this is in C:\Program Files (x86)\BIDS Helper 2012). I also added a reference to WindowsBase.dll which you can do by clicking the Add button then searching for that assembly.

2-additional-references

The next thing to do is setup the namespaces, click the “Additional Namespace Imports” tab, and paste the following:


System.Data
System.Data.SqlClient
Varigence.Hadron.CoreLowerer.SchemaManagement
Varigence.Hadron.Extensions.SchemaManagement
Varigence.Languages.Biml
Varigence.Languages.Biml.Connection
Varigence.Languages.Biml.Table
Varigence.Hadron.Extensions

3-additional-namespace-imports

Click OK to close the Query Properties window.

Language Modes in LinqPad

LinqPad gives you a few different options of writing C#.

4-language-modes

 

I’ve found the most useful modes to write C# to be used with BIML is either the “C# Program” or “C# Statements” mode.

The “C# Program” mode is useful when you want to write different methods which can be reused in your BIML. Whereas “C# Statements” mode is useful when you just want to write some C# to directly paste into your BIML, you can’t break this into different methods.

Writing C# – Example 1

In this first example I’ll show you how you could use LinqPad to write some very simple C# code that will use the Varigence extension methods to query the metadata of your database.

This is useful in BIML when you want to generate either a number of different packages to perform over a series of tables, or multiple sequences in a single package over a series of tables.

Set LinqPad to use “C# Statements” mode and paste in the following code:

string connectionstring = "Data Source=.;Initial Catalog=AdventureWorksDW2012;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;";

var targetConnection = SchemaManager.CreateConnectionNode("SchemaProvider", connectionstring); 

IEnumerable<AstTableNode> sourceTables = targetConnection.ImportDB("dbo", "Dim%", ImportOptions.ExcludeViews).TableNodes; 

sourceTables.ToList().ForEach(x=>x.Name.Dump());

This will simply use the SchemaManager and ImportDB extension methods provided by Varigence to dump the names of all the tables in the AdventureWorksDW2012 dbo schema that start with Dim

5-example1-1

 

 

The next thing to do is make this a bit more useful, paste the following code:


string connectionstring = "Data Source=.;Initial Catalog=AdventureWorksDW2012;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;";

var targetConnection = SchemaManager.CreateConnectionNode("SchemaProvider", connectionstring);

IEnumerable<AstTableNode> sourceTables = targetConnection.ImportDB("dbo", "Dim%", ImportOptions.ExcludeViews).TableNodes;

foreach (var table in sourceTables)
{
table.Name.Dump();
table.GetColumnList().Dump();
}

This will now loop over all the tables and this time dump the table name and then the column list.

6-example1-2

 

These two snippets demonstrate how to get started using LinqPad.

We can now take the second snippet and plug it into a BIML file:


<!--
<#@ import namespace="System.Data" #>
<#@ import namespace="Varigence.Hadron.CoreLowerer.SchemaManagement" #>
<#@ import namespace="Varigence.Hadron.Extensions.SchemaManagement" #>

<#
	string connectionstring = "Data Source=.;Initial Catalog=AdventureWorksDW2012;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;";

	var targetConnection = SchemaManager.CreateConnectionNode("SchemaProvider", connectionstring); 

	IEnumerable<AstTableNode> sourceTables = targetConnection.ImportDB("dbo", "Dim%", ImportOptions.ExcludeViews).TableNodes;
#>
-->

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
	<Connections>
		<OleDbConnection ConnectionString="Data Source=.;Initial Catalog=AdventureWorksDW2012;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;" Name="Source"></OleDbConnection>
		<OleDbConnection ConnectionString="Data Source=.;Initial Catalog=MyDW;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;" Name="Target"></OleDbConnection>
	</Connections>
	<Packages>
		<Package ConstraintMode="Linear" Name="Extract">
			<Connections>
				<Connection ConnectionName="Source"></Connection>
				<Connection ConnectionName="Target"></Connection>
			</Connections>
			<Tasks>
			<!--
			<# foreach(var table in sourceTables)
			{
			#>
			-->
			<ExecuteSQL ConnectionName="Target" Name="Truncate Import Table <#=table.Name#>">
				<DirectInput>
					TRUNCATE TABLE Import.<#=table.Name#>
				</DirectInput>
			</ExecuteSQL>
				<Dataflow Name="Copy data into Source <#=table.Name#>">
					<Transformations>
						<OleDbSource ConnectionName="Source" Name="Source - <#=table.Name#>">
							<DirectInput>
								SELECT
									<#=table.GetColumnList()#>
								FROM
									<#=table.Name#>
							</DirectInput>
						</OleDbSource>
						<OleDbDestination ConnectionName="Target" Name="Target - <#=table.Name#>">
							<ExternalTableOutput Table="Import.<#=table.Name#>"></ExternalTableOutput>
						</OleDbDestination>
					</Transformations>
				</Dataflow>
			<!--
			<#
			}
			#>
			-->
			</Tasks>
		</Package>
	</Packages>
</Biml>

This BIML script will then generate a package which will use the AdventureWorksDW2012 database as a source database and copy all the data from each of the Dim tables into an equivalent table in the Import schema of the MyDW database.

Writing C# – Example 2

In this example I will show you how you could use LinqPad to write a reusable methods which can then be in your BIML files.

The example that I will use is the GetSqlServerExtendedProperties method which was provided by Dave Stein (b | t), it simply queries the extended properties on a given table.

Set LinqPad to use “C# Program” mode and paste in the following code:


AstOleDbConnectionNode targetConnection;

void Main()
{
	targetConnection = SchemaManager.CreateConnectionNode("SchemaProvider", "Data Source=.;Initial Catalog=AdventureWorksDW2012;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;"); 

	var sourceTables = targetConnection.ImportDB("dbo", "Dim%", ImportOptions.ExcludeViews).TableNodes; 

	foreach (var element in sourceTables)
	{
		GetSqlServerExtendedProperties(element);
	}

	foreach (var element in sourceTables)
	{
		var skip = element.Annotations.FirstOrDefault (a => a.Tag.Equals("ETL_Should_Skip"));

		if (annot.Text.Equals("1"))
		{
			//skip the table as it is marked to skip
			continue;
		}
		else
		{

		}
	}
}

AstNode GetSqlServerExtendedProperties(AstTableNode table)
{
	AstNode returnValue = (AstNode)table;

	var query = string.Format("SELECT name, value FROM fn_listextendedproperty(NULL, 'schema', '{0}', 'table', '{1}', NULL, NULL)", table.SchemaName, table.Name);

	var extendedProperties = ExternalDataAccess.GetDataTable(targetConnection.ConnectionString, query);

	foreach (DataRow ep in extendedProperties.Rows)
	{
		returnValue.AddAnnotation(AnnotationType.Tag, ep["value"].ToString(), ep["name"].ToString());
	}

	return returnValue;
}

This snippet will query the extended properties of all the tables in the dbo schema that starts with Dim and then loop through each of them. If it finds one called ETL_Should_Skip (lines 16 and 18) then it will skip it and go to the next one in the sequence.

This could be useful for example if you wanted to control which tables should go into your package without having this hard coded, or having to add a “control table” to your database.

The BIML would look like this:

<!--
<#@ import namespace="System.Data" #>
<#@ import namespace="Varigence.Hadron.CoreLowerer.SchemaManagement" #>
<#@ import namespace="Varigence.Hadron.Extensions.SchemaManagement" #>

<#+
AstOleDbConnectionNode conn;
#>

<#
	string connectionstring = "Data Source=.;Initial Catalog=AdventureWorksDW2012;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;";

	conn = SchemaManager.CreateConnectionNode("SchemaProvider", connectionstring); 

	IEnumerable<AstTableNode> sourceTables = conn.ImportDB("dbo", "Dim%", ImportOptions.ExcludeViews).TableNodes;
#>
-->

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
	<Connections>
		<OleDbConnection ConnectionString="Data Source=.;Initial Catalog=AdventureWorksDW2012;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;" Name="Source"></OleDbConnection>
		<OleDbConnection ConnectionString="Data Source=.;Initial Catalog=AdventureWorksDW2012;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;" Name="Target"></OleDbConnection>
	</Connections>
	<Packages>
		<Package ConstraintMode="Linear" Name="Extract">
			<Connections>
				<Connection ConnectionName="Source"></Connection>
				<Connection ConnectionName="Target"></Connection>
			</Connections>
			<Tasks>
			<!--
			<# 

			foreach (var element in sourceTables)
			{
				GetSqlServerExtendedProperties(element);
			}

			foreach(var table in sourceTables)
			{
				var skip = table.Annotations.FirstOrDefault (a => a.Tag.Equals("ETL_Should_Skip"));

				if (skip != null && skip.Text.Equals("1"))
				{
					//skip the table as it is marked to skip
					continue;
				}

			#>
			-->
			<ExecuteSQL ConnectionName="Target" Name="Truncate Import Table <#=table.Name#>">
				<DirectInput>
					TRUNCATE TABLE dbo.<#=table.Name#>
				</DirectInput>
			</ExecuteSQL>
			<Dataflow Name="Copy data into Source <#=table.Name#>">
				<Transformations>
					<OleDbSource ConnectionName="Source" Name="Source -"
						<#=table.Name#>">
						<DirectInput>
							SELECT
							<#=table.GetColumnList()#>
							FROM
							<#=table.Name#>
						</DirectInput>
					</OleDbSource>
					<OleDbDestination ConnectionName="Target" Name="Target -"
						<#=table.Name#>">
						<ExternalTableOutput Table="Import."
							<#=table.Name#>">
						</ExternalTableOutput>
					</OleDbDestination>
				</Transformations>
			</Dataflow>
			<!--
			<#
			}
			#>
			-->
			</Tasks>
		</Package>
	</Packages>
</Biml>

<!--
<#+
AstNode GetSqlServerExtendedProperties(AstTableNode table)
{
	AstNode returnValue = (AstNode)table;

	var query = string.Format("SELECT name, value FROM fn_listextendedproperty(NULL, 'schema', '{0}', 'table', '{1}', NULL, NULL)", table.SchemaName, table.Name);

	var extendedProperties = ExternalDataAccess.GetDataTable(conn.ConnectionString, query);

	foreach (DataRow ep in extendedProperties.Rows)
	{
		returnValue.AddAnnotation(AnnotationType.Tag, ep["value"].ToString(), ep["name"].ToString());
	}

	return returnValue;
}
#>
-->

Conclusion

If you find yourself writing lots of C# code which you’re using within BIML then it is definitely work trying out LinqPad to see if it makes your life easier.

 

BIML – Parameter Binding with SSIS 2012

For any SSIS project that contains more than a few packages chances are you have a master (or coordinator) package that controls execution of various packages within your project. In this post I’ll show you how to use Package Parameters with BIML.

When I first started searching around on how to do this, I can across this sample on the BIMLScript.com site: http://bimlscript.com/Snippet/Details/73

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
 <Packages>
   <Package Name="PackageParameters" ConstraintMode="Parallel">
     <Parameters>
       <Parameter Name="PackageParamStr" DataType="String">Test</Parameter>
     </Parameters>
     <Variables>
       <Variable Name="TestPackageParam" DataType="String" EvaluateAsExpression="true">@[$Package::PackageParamStr]</Variable>
     </Variables>
   </Package>
 </Packages>
</Biml>

In this sample the value of the parameter is being assigned to a package variable, which is then used within the package. Unfortunately I couldn’t get this working so had to keep looking for another solution. Trying to use the parameter directly proved to be an exercise in experimentation. After a few different attempts it turned out to be very simple, here is an example of using a Package Parameter with an OLE DB Source:

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
 <Packages>
  <Package Name="PackageParameters" ConstraintMode="Parallel">
   <Parameters>
    <Parameter Name="PackageParamStr" DataType="String">Test</Parameter>
   </Parameters>
   <Tasks>
    <Dataflow Name="DFT - Move some data around">
     <Transformations>
      <OleDbSource ConnectionName="Source" Name="SRC - Client">
       <DirectInput>EXEC ETL.GetSomeData @sp_param1 = ?</DirectInput>
       <Parameters>
        <Parameter VariableName="PackageParamStr" Name="@sp_param1" />
       </Parameters>
      </OleDbSource>
     </Transformations>
    </Dataflow>
   </Tasks>
  </Package>
 </Packages>
</Biml>

Line 13 shows how to use the package parameter as a parameter for the stored procedure being called by the OLE DB Source.

Building an SSIS Testing Framework

If you compared the Google results for “testing c#” and “testing SSIS” you would quickly realize that testability isn’t one of the strengths of SSIS. I’ve seen a few different frameworks which were either too complicated or didn’t really work, in this post I’m going to give you my take on building an SSIS Testing Framework.

The Goal

The overall goal of what I wanted to achieve from my test framework:

  • Simplicity – writing tests needs to be as simple and easy as possible
  • The ability to load sample data into database tables
  • The ability to run a “setup” sql script
  • The ability to run an assert script to determine if the test has passed/failed
  • Auditing/history of previous test runs

My Solution

I didn’t initially set out to build a testing framework, it just evolved into one over the course of the couple of days I was working on it. I started off by manually testing my SSIS packages and realized I was following the same steps over and over again:

  1. Load sample data
  2. Run SSIS Package
  3. Run some scripts to check if it worked

I could see that this was going to get repetitive so started to think about how to automate it. After a few iterations this is what I’ve ended up with (at time of writing)

1-TestController-package

The TestController package follows the well trodden AAA (assemble, act, assert) approach that is found among the TDD (test driven development) practitioners in the software development industry.

This package does the following:

  1. Gets a list of tests to execute
  2. Then in the foreach loop does the following for each of the test cases:

Assemble:

    1. A script task is used to get a bunch of test parameters from a SQL table, then updates the associated variables in the TestController package, at the moment these are only used to control which tables have data loaded
    2. A pre-data load setup SQL script is then executed
    3. Load Sample Data will then load any sample data for this test run
    4. A post data load setup SQL script is then executed

Act:

  1. Execute the package under test (PUT)

Assert:

  1. Run an assert SQL script

 The Downsides

The solution isn’t perfect, here are a few of the problems that it has at the moment:

  • Unable to pass package parameters to the package under test in the Act phase
  • Makes assumptions about how you want to do your setup routine
  • Is purely focused on packages that touch the database
  • Writing assertions isn’t as easy as I would like, it involves writing lots of T-SQL and copy/pasting code throughout the different assert scripts
  • All your databases need to be on the same server
  • Maintaining the variables and and connection managers is painful

Some of these I’m not terribly bothered about, for example having to have all of your databases on a single server I think is acceptable as this is for testing purposes. Whereas the duplication of assertion code could be easily fixed by creating some reusable stored procedures (e.g. Assert.TableEqual, Assert.TableEmpty etc.) similar to those offered by tSQLt.

One of the more painful areas of the framework is setting up tables to have sample data loaded, the process involves adding variables, setting up connection managers adding components to the data flow task and updating the script component. One way of fixing this issue that I thought of was generating the TestController package with BIML and have all of this automatically generated.

Wrap Up

As a starting point I’m really happy with how the TestController is working at the moment. During the development of it and the use afterwards I picked up a few bugs with the packages that I wanted to test, so it is paying for itself already 🙂

If you’re interested in trying it out for yourself then leave a comment and I’ll put it up on GitHub.

Debugging Custom SSIS Components

In the past I have written extensively about how to build custom components for SQL Server Integration Services, these posts have always been focused on the ‘happy path’, if you’re not familiar with this phrase it refers to the path through your application that works exactly as expected. Often times in development we have to deal with the sad path, or when things aren’t working as we would like or expect.

In this post I will show two techniques that can be used to help with debugging custom SSIS components.

Attaching the Visual Studio Debugger

You will need to have Visual Studio open with the source code for the custom component and have SSDT open with the package that uses the custom component.

From Visual Studio select the Debug menu then Attach to Process

4-debug-menu

This will bring up the following window showing a list of available processes for debugging:

5-attach-debugger-to

In the list of available processes select the instance of devenv.exe that corresponds to your SSIS solution, then click Attach. Once attached any breakpoints that you define will be hit, from where you can debug your component.

This method works just fine for when you want to debug the setup and configuration of your package, that is, anything the users of your component will be doing prior to executing the package.

The disadvantage of this method is that once you actually start executing the package the debugger will detach itself. The reason for this is because execution of the package happens within a different process. Continue reading for a better method.

Launching the Debugger from in Code

Another way to automatically start debugging is to add the following line of code:

Debugger.Launch();

Into the particular method that you want to debug or into the constructor.

1-adding-debugger-launch

Here you can see I’ve added the Debugger.Launch() code into the constructor for my HDFS Task Component, Now when I open a package in SSDT that uses this component, or if I drag a new instance of the task into an existing package the following window will appear:

2-jit-debugger

Here I can select how I want to debug the application. During development of custom components I always have an instance of visual studio open with my component solution and you can see it listed as the third option in the image above. By selecting this option and clicking Yes I will be taken back to that instance of Visual Studio which will then attach itself to my instance of SSDT. It will then break execution.

3-back-in-visual-studio

At this point I can set any breakpoints where I want to break future execution. From here press F5 to continue. Visual Studio will remain attached to SSDT and any breakpoints you’ve defined will interrupt execution and bring you back here.

The disadvantage of this method is that the “Do you want to debug this application” window will keep appearing at various points during package design unless you attach the debugger.

The advantage of using this method is that once you actually start executing your package you will again be prompted to attach the debugger, which means you will be able to debug the component during package execution.

Make sure you either remove this line of code before shipping it for actual use or wrap it around the #if DEBUG / #endif compiler directives as shown above and remember to use the Release configuration for production use!

Automate your SalesForce SSIS Staging Packages with C#

This week I’ve been making some changes in one of my data warehouse projects, with the aim of making it a long lived and easy to maintain solution.

One of the big changes that I made was the way I was staging the data from my different sources, and this meant re-working a whole bunch of SSIS packages. Specifically there were a bunch that pull data from SalesForce which needed to be cleaned up.

So I got thinking, “all of these staging tables are loaded the same way, surely I should be able to automate this”.

In this post I’m going to show you a tool that I wrote in C# which connects to SalesForce, downloads metadata on all your objects, then generates staging tables and finally SSIS packages to load these tables.

The Manual Process

As I was re-working my packages, I was also recreating a lot of the staging tables. This was a very tedious process, as I had to go back and forth looking at the mapping of SalesForce types to their counterpart in SQL Server.

After creating the staging tables, the process of loading them from SalesForce is always the same:

  1. Truncate staging table
  2. Populate staging table with records from SalesForce (for a subset of column)

Down the line I have plans to extend this, so I’m only loading changed objects from SalesForce, which would result in the following process:

  1. Truncate staging table
  2. Lookup (from a meta data store) the last date loaded
  3. Populate staging table with records from SalesForce with CreatedDate or LastModified after the last date loaded
  4. Update the last data loaded value

The Solution

I designed a simple application in C#, this is what it looks like:

1-the-appThe example shown here is looking at the Opportunity object with a number of columns selected.

Clicking Generate Table DDL will pop up the following:

2-table-ddlAnd clicking Yes will then create the object in your database.

Clicking Generate Package will create the SSIS package which follows the above design pattern:

3-package-control-flow 4-package-data-flow

(I’m using the SalesForce Source from PragmaticWorks)

When it comes to extending the design pattern to only load new and/or changed objects all I will need to do is update my package generator and re-create the packages. Saving me the hassle of going through each package and making the change.

Future Plans

The application needs a bit of a tidy up before I can make it available for download, it also uses the PragmaticWorks assemblies for SSIS so I need to find out if I can bundle them for download.

It was an interesting exercise building this application so I’m planning on writing a few posts on how to programmatically build SSIS packages with C#.