BIML Tip – Use LinqPad to write C#

To see any real benefit of using BIML you will need to combine it with C# to generate dynamic packages. The only problem is, unless you fork out $1000’s for MIST then your only alternative is BIDSHelper. This is a great tool, but one of the biggest problems is that you loose IntelliSense for C#. Lately I’ve been using *LinqPad to prototype and test C# code which I then paste into my BIML files.

In this tip I’m going to show you how you can use LinqPad to speed up your BIML Scripting experience.

Just a quick note before we get started: To get the added benefit of IntelliSense in LinqPad you have to fork out $39USD for the premium edition, however as you will see this is definitely a worthwhile expense. Even if you don’t purchase the premium version you can still follow along, you’ll just miss out on the auto-complete. (I have not being paid for this endorsement).

Setting Up LinqPad

You can download LinqPad from http://www.linqpad.net/

Once you have installed and launched LinqPad you will see an empty query window.

1-main-query-window

To be able to use the BIML engine and Varigence extensions we’ll need to add a reference to the required assemblies and their namespaces. To do that, press F4, this will bring up the Query Properties window.

The first thing that you will need to do is add the assembly references. Click the Browse button and locate BimlEngine.dll (for me this is in C:\Program Files (x86)\BIDS Helper 2012). I also added a reference to WindowsBase.dll which you can do by clicking the Add button then searching for that assembly.

2-additional-references

The next thing to do is setup the namespaces, click the “Additional Namespace Imports” tab, and paste the following:


System.Data
System.Data.SqlClient
Varigence.Hadron.CoreLowerer.SchemaManagement
Varigence.Hadron.Extensions.SchemaManagement
Varigence.Languages.Biml
Varigence.Languages.Biml.Connection
Varigence.Languages.Biml.Table
Varigence.Hadron.Extensions

3-additional-namespace-imports

Click OK to close the Query Properties window.

Language Modes in LinqPad

LinqPad gives you a few different options of writing C#.

4-language-modes

 

I’ve found the most useful modes to write C# to be used with BIML is either the “C# Program” or “C# Statements” mode.

The “C# Program” mode is useful when you want to write different methods which can be reused in your BIML. Whereas “C# Statements” mode is useful when you just want to write some C# to directly paste into your BIML, you can’t break this into different methods.

Writing C# – Example 1

In this first example I’ll show you how you could use LinqPad to write some very simple C# code that will use the Varigence extension methods to query the metadata of your database.

This is useful in BIML when you want to generate either a number of different packages to perform over a series of tables, or multiple sequences in a single package over a series of tables.

Set LinqPad to use “C# Statements” mode and paste in the following code:

string connectionstring = "Data Source=.;Initial Catalog=AdventureWorksDW2012;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;";

var targetConnection = SchemaManager.CreateConnectionNode("SchemaProvider", connectionstring); 

IEnumerable<AstTableNode> sourceTables = targetConnection.ImportDB("dbo", "Dim%", ImportOptions.ExcludeViews).TableNodes; 

sourceTables.ToList().ForEach(x=>x.Name.Dump());

This will simply use the SchemaManager and ImportDB extension methods provided by Varigence to dump the names of all the tables in the AdventureWorksDW2012 dbo schema that start with Dim

5-example1-1

 

 

The next thing to do is make this a bit more useful, paste the following code:


string connectionstring = "Data Source=.;Initial Catalog=AdventureWorksDW2012;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;";

var targetConnection = SchemaManager.CreateConnectionNode("SchemaProvider", connectionstring);

IEnumerable<AstTableNode> sourceTables = targetConnection.ImportDB("dbo", "Dim%", ImportOptions.ExcludeViews).TableNodes;

foreach (var table in sourceTables)
{
table.Name.Dump();
table.GetColumnList().Dump();
}

This will now loop over all the tables and this time dump the table name and then the column list.

6-example1-2

 

These two snippets demonstrate how to get started using LinqPad.

We can now take the second snippet and plug it into a BIML file:


<!--
<#@ import namespace="System.Data" #>
<#@ import namespace="Varigence.Hadron.CoreLowerer.SchemaManagement" #>
<#@ import namespace="Varigence.Hadron.Extensions.SchemaManagement" #>

<#
	string connectionstring = "Data Source=.;Initial Catalog=AdventureWorksDW2012;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;";

	var targetConnection = SchemaManager.CreateConnectionNode("SchemaProvider", connectionstring); 

	IEnumerable<AstTableNode> sourceTables = targetConnection.ImportDB("dbo", "Dim%", ImportOptions.ExcludeViews).TableNodes;
#>
-->

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
	<Connections>
		<OleDbConnection ConnectionString="Data Source=.;Initial Catalog=AdventureWorksDW2012;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;" Name="Source"></OleDbConnection>
		<OleDbConnection ConnectionString="Data Source=.;Initial Catalog=MyDW;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;" Name="Target"></OleDbConnection>
	</Connections>
	<Packages>
		<Package ConstraintMode="Linear" Name="Extract">
			<Connections>
				<Connection ConnectionName="Source"></Connection>
				<Connection ConnectionName="Target"></Connection>
			</Connections>
			<Tasks>
			<!--
			<# foreach(var table in sourceTables)
			{
			#>
			-->
			<ExecuteSQL ConnectionName="Target" Name="Truncate Import Table <#=table.Name#>">
				<DirectInput>
					TRUNCATE TABLE Import.<#=table.Name#>
				</DirectInput>
			</ExecuteSQL>
				<Dataflow Name="Copy data into Source <#=table.Name#>">
					<Transformations>
						<OleDbSource ConnectionName="Source" Name="Source - <#=table.Name#>">
							<DirectInput>
								SELECT
									<#=table.GetColumnList()#>
								FROM
									<#=table.Name#>
							</DirectInput>
						</OleDbSource>
						<OleDbDestination ConnectionName="Target" Name="Target - <#=table.Name#>">
							<ExternalTableOutput Table="Import.<#=table.Name#>"></ExternalTableOutput>
						</OleDbDestination>
					</Transformations>
				</Dataflow>
			<!--
			<#
			}
			#>
			-->
			</Tasks>
		</Package>
	</Packages>
</Biml>

This BIML script will then generate a package which will use the AdventureWorksDW2012 database as a source database and copy all the data from each of the Dim tables into an equivalent table in the Import schema of the MyDW database.

Writing C# – Example 2

In this example I will show you how you could use LinqPad to write a reusable methods which can then be in your BIML files.

The example that I will use is the GetSqlServerExtendedProperties method which was provided by Dave Stein (b | t), it simply queries the extended properties on a given table.

Set LinqPad to use “C# Program” mode and paste in the following code:


AstOleDbConnectionNode targetConnection;

void Main()
{
	targetConnection = SchemaManager.CreateConnectionNode("SchemaProvider", "Data Source=.;Initial Catalog=AdventureWorksDW2012;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;"); 

	var sourceTables = targetConnection.ImportDB("dbo", "Dim%", ImportOptions.ExcludeViews).TableNodes; 

	foreach (var element in sourceTables)
	{
		GetSqlServerExtendedProperties(element);
	}

	foreach (var element in sourceTables)
	{
		var skip = element.Annotations.FirstOrDefault (a => a.Tag.Equals("ETL_Should_Skip"));

		if (annot.Text.Equals("1"))
		{
			//skip the table as it is marked to skip
			continue;
		}
		else
		{

		}
	}
}

AstNode GetSqlServerExtendedProperties(AstTableNode table)
{
	AstNode returnValue = (AstNode)table;

	var query = string.Format("SELECT name, value FROM fn_listextendedproperty(NULL, 'schema', '{0}', 'table', '{1}', NULL, NULL)", table.SchemaName, table.Name);

	var extendedProperties = ExternalDataAccess.GetDataTable(targetConnection.ConnectionString, query);

	foreach (DataRow ep in extendedProperties.Rows)
	{
		returnValue.AddAnnotation(AnnotationType.Tag, ep["value"].ToString(), ep["name"].ToString());
	}

	return returnValue;
}

This snippet will query the extended properties of all the tables in the dbo schema that starts with Dim and then loop through each of them. If it finds one called ETL_Should_Skip (lines 16 and 18) then it will skip it and go to the next one in the sequence.

This could be useful for example if you wanted to control which tables should go into your package without having this hard coded, or having to add a “control table” to your database.

The BIML would look like this:

<!--
<#@ import namespace="System.Data" #>
<#@ import namespace="Varigence.Hadron.CoreLowerer.SchemaManagement" #>
<#@ import namespace="Varigence.Hadron.Extensions.SchemaManagement" #>

<#+
AstOleDbConnectionNode conn;
#>

<#
	string connectionstring = "Data Source=.;Initial Catalog=AdventureWorksDW2012;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;";

	conn = SchemaManager.CreateConnectionNode("SchemaProvider", connectionstring); 

	IEnumerable<AstTableNode> sourceTables = conn.ImportDB("dbo", "Dim%", ImportOptions.ExcludeViews).TableNodes;
#>
-->

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
	<Connections>
		<OleDbConnection ConnectionString="Data Source=.;Initial Catalog=AdventureWorksDW2012;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;" Name="Source"></OleDbConnection>
		<OleDbConnection ConnectionString="Data Source=.;Initial Catalog=AdventureWorksDW2012;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;" Name="Target"></OleDbConnection>
	</Connections>
	<Packages>
		<Package ConstraintMode="Linear" Name="Extract">
			<Connections>
				<Connection ConnectionName="Source"></Connection>
				<Connection ConnectionName="Target"></Connection>
			</Connections>
			<Tasks>
			<!--
			<# 

			foreach (var element in sourceTables)
			{
				GetSqlServerExtendedProperties(element);
			}

			foreach(var table in sourceTables)
			{
				var skip = table.Annotations.FirstOrDefault (a => a.Tag.Equals("ETL_Should_Skip"));

				if (skip != null && skip.Text.Equals("1"))
				{
					//skip the table as it is marked to skip
					continue;
				}

			#>
			-->
			<ExecuteSQL ConnectionName="Target" Name="Truncate Import Table <#=table.Name#>">
				<DirectInput>
					TRUNCATE TABLE dbo.<#=table.Name#>
				</DirectInput>
			</ExecuteSQL>
			<Dataflow Name="Copy data into Source <#=table.Name#>">
				<Transformations>
					<OleDbSource ConnectionName="Source" Name="Source -"
						<#=table.Name#>">
						<DirectInput>
							SELECT
							<#=table.GetColumnList()#>
							FROM
							<#=table.Name#>
						</DirectInput>
					</OleDbSource>
					<OleDbDestination ConnectionName="Target" Name="Target -"
						<#=table.Name#>">
						<ExternalTableOutput Table="Import."
							<#=table.Name#>">
						</ExternalTableOutput>
					</OleDbDestination>
				</Transformations>
			</Dataflow>
			<!--
			<#
			}
			#>
			-->
			</Tasks>
		</Package>
	</Packages>
</Biml>

<!--
<#+
AstNode GetSqlServerExtendedProperties(AstTableNode table)
{
	AstNode returnValue = (AstNode)table;

	var query = string.Format("SELECT name, value FROM fn_listextendedproperty(NULL, 'schema', '{0}', 'table', '{1}', NULL, NULL)", table.SchemaName, table.Name);

	var extendedProperties = ExternalDataAccess.GetDataTable(conn.ConnectionString, query);

	foreach (DataRow ep in extendedProperties.Rows)
	{
		returnValue.AddAnnotation(AnnotationType.Tag, ep["value"].ToString(), ep["name"].ToString());
	}

	return returnValue;
}
#>
-->

Conclusion

If you find yourself writing lots of C# code which you’re using within BIML then it is definitely work trying out LinqPad to see if it makes your life easier.

 

Advertisements

Data Mining Resources for SQL Server Developers

Below are a few resources that I’ve discovered recently that show various ways of getting started with the Data Mining capabilities of SQL Server.

Enhancing Applications with SQL Server Data Mining by Peter Myers (TechEd Australia 2013)

In this excellent talk by Peter Myers, he demonstrates how easy it can be to build a data mining model then shows three different ways that they can be incorporated into an application:

  • Embedding data mining visualizations into a WinForms application
  • Using a data mining models as the source for a Reporting Services report, embedding a report into a WinForms application
  • Using the mining model to validate the “likeliness” of data input in a WinForms application

In addition to giving a very practical demonstration of creating and using a data mining model he also gives an excellent overview of some of the fundamentals associated with data mining. In particular he gives an excellent overview of the various models that are available with SQL Server.

Advanced-and Easy!-BI with Excel and SQL Server Data Mining by Rafal Lukawiecki (SharePoint Conference 2014)

In this talk by Rafal Lukawiecki (b | t), he once again gives a general overview of Data Mining, explaining some of the concepts then shows how to get started using the (free) Data Mining plugin for Excel 2013. He then moves onto building models using SQL Server Data Tools.

Most of the talk is spent within Excel, which really does show it is capable of almost anything 🙂

Forecasting with the Microsoft Time Series Data Mining Algorithm by Peter Myers (TechEd 2014)

In this talk from TechEd, Peter Myers focuses on the time series data mining algorithm in depth. He shows how it can be used to do forecasting using historical data.

The Level 300 (Experienced) classification is definitely appropriate, still if you want an in depth example of using the Time Series algorithm this is the video for you.

Data Mining in SQL Server Analysis Services by Brian Knight (SQLPass 2013)

In this excellent entry level presentation Brian Knight (of Pragmatic Works) demonstrates how simple and easy it is to get started working with Data Mining. If you’re looking for an introduction to the art of the possible this is a great place to start.

Kaggle – Data sets

In the two talks above (and plenty others I’ve seen in the past) the data set used for the examples is the well known AdventureWorksDW, if you’re a bit bored of this and fancy something different take a look at Kaggle.com.

Kaggle, if you haven’t come across it already is “…the leading platform for predictive modeling competitions.”, various companies publish interesting data sets as part of competitions, which you could enter to win prizes and money, or use the data sets for experimentation (check the TOC).

BIML – Parameter Binding with SSIS 2012

For any SSIS project that contains more than a few packages chances are you have a master (or coordinator) package that controls execution of various packages within your project. In this post I’ll show you how to use Package Parameters with BIML.

When I first started searching around on how to do this, I can across this sample on the BIMLScript.com site: http://bimlscript.com/Snippet/Details/73

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
 <Packages>
   <Package Name="PackageParameters" ConstraintMode="Parallel">
     <Parameters>
       <Parameter Name="PackageParamStr" DataType="String">Test</Parameter>
     </Parameters>
     <Variables>
       <Variable Name="TestPackageParam" DataType="String" EvaluateAsExpression="true">@[$Package::PackageParamStr]</Variable>
     </Variables>
   </Package>
 </Packages>
</Biml>

In this sample the value of the parameter is being assigned to a package variable, which is then used within the package. Unfortunately I couldn’t get this working so had to keep looking for another solution. Trying to use the parameter directly proved to be an exercise in experimentation. After a few different attempts it turned out to be very simple, here is an example of using a Package Parameter with an OLE DB Source:

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
 <Packages>
  <Package Name="PackageParameters" ConstraintMode="Parallel">
   <Parameters>
    <Parameter Name="PackageParamStr" DataType="String">Test</Parameter>
   </Parameters>
   <Tasks>
    <Dataflow Name="DFT - Move some data around">
     <Transformations>
      <OleDbSource ConnectionName="Source" Name="SRC - Client">
       <DirectInput>EXEC ETL.GetSomeData @sp_param1 = ?</DirectInput>
       <Parameters>
        <Parameter VariableName="PackageParamStr" Name="@sp_param1" />
       </Parameters>
      </OleDbSource>
     </Transformations>
    </Dataflow>
   </Tasks>
  </Package>
 </Packages>
</Biml>

Line 13 shows how to use the package parameter as a parameter for the stored procedure being called by the OLE DB Source.