Available for free consultation work in New Delhi

My wife and I have recently made a big change in life, we’ve moved from the familiar comforts of life in London to the big smoke of New Delhi to explore potential business opportunities!

At the moment we are both taking intensive Hindi lessons, I’m also playing around with a few side projects but have a bit of spare time on my hands.

If you would like some help with any of the following:

  • Software development process
  • Release automation
  • Octopus Deploy / TeamCity
  • C# / SQL Server development
  • SSIS / Biml
  • Splunk
  • Elastic Search

I’m available and would be more than happy to chat through any questions you have or even spend some time helping you out – free of charge. If you’re based in New Delhi or Gurgaon then I’d also be happy to come to your office, once again free of charge.

I’m trying to get involved wit the tech scene in Delhi and see this as an opportunity to learn about what’s going on, rather than trying to make a quick buck (or rupee).

If you want to get in touch then please email me:

delhidev [at] kenpire [dot] com

I’m willing to offer my time free of charge, the only thing I ask is that in your email you give a decent explanation of what you’d like help with.

Setting up SignalR and Nancy in Visual Studio 2015

A few of the side projects I’m working on at the moment are making use of SignalR for real-time communication between the browser and server, they are also using Nancy (a low ceremony web framework).

It took a few different blog posts and reading the Nancy documentation to figure out how to go from an empty ASP.Net project to having Nancy and SignalR playing nicely, so I’ve written this blog post outlining the process – as it turns out the process itself is pretty simple.

  1. Create a ASP.Net project
    1-new-project
  2. Select the Empty template
    2-empty-template
  3. Install the required Nancy NuGet packages:
    • Install-Package Nancy
    • Install-Package Nancy.Owin
  4. Install SignalR by adding a new SignalR Hub class3-add-signalr-hub
    public class HelloHub : Hub
    {
        public void Echo(string str)
        {
            Clients.Others.echo(str);
        }
    }
    
  5. Add a Startup class:
    namespace MyWebApp
    {
        using Owin;
    
        public class Startup
        {
            public void Configuration(IAppBuilder app)
            {
                app
                    .MapSignalR()
                    .UseNancy();
            }
        }
    }
    
  6. Add a new Module
    public class IndexModule : NancyModule
    {
        public IndexModule()
        {
            Get["/"] = _ =>
            {
                return View["Index"];
            };
        }
    }
    
  7. Add the Index.html view
    <!DOCTYPE html>
    <html>
    <head>
        <title>MyWebApp</title>
    </head>
    <body>
    
    <div>
        </div>
    
        <script src="~/Scripts/jquery-1.10.2.js"></script>
        <script src="~/Scripts/jquery.signalR-2.1.2.js"></script>
        <script src="/signalr/hubs"></script>
        <script type="text/javascript">
            var myHub = $.connection.helloHub;
    
            myHub.client.echo = function (text) {
                console.info(text);
            };
    
            $.connection.hub.start();
        </script>
    </body>
    </html>
    
  8. Start debugging with Chrome and open two windows
  9. Open Chrome Developer tools (F12) in both windows, then in one type
    myHub.server.echo("test123");
    
  10. You should see the message appear in developer tools in the other window
    4-cross-window-messaging

Performance tuning stored procedures – when indexes aren’t the problem

Recently I have been doing performance tuning on a few stored procedures that power some of the most important parts of DrDoctor. Whilst I’m confident to say I can read query execution plans much better now that I could 6 months ago that isn’t the biggest takeaway. What I’ve noticed with two stored procedures in particular is that the reason they are running slowly is because they were both reading lots and lots of data from one or two tables which was then slowly whittled away into practically nothing.

Take the most recent stored procedure I had to tune. This stored procedure finds slots for patients to book into for their up and coming hospital appointment.

1-query-plan-1

As you can see the final output of the query is 98 rows, yet if you look at the far right the highlighted table which contains the slots selects out over 1 million rows, which are eventually reduced down to pretty much nothing.

In this post I’m going to show how after making a few minor changes this stored procedure went from taking just over 30 seconds to finishing in under 1 second!

Tuning the performance

The first thing that I did was look at the execution plan, my initial observations were:

  • Final row count is 98
  • Selecting over 1 million rows from the slots table – this just feels wrong!
  • What is that sort operation doing in there
  • Looks like there are a few big and nasty key lookups

Fix attempt 1: I added a new non-clustered index to the slots table to try and reduce the amount of data that is being pulled back, this didn’t work however it did remove the sort operator.

2-sort-operator-gone

You can see from the above that we are still selecting out over a million rows from the slots table. It’s now dawning on me that the indexes isn’t really the problem, but rather it’s the query that needs to be addressed.

Looking more closely: In the query we have a bunch of INNER JOINs one of which joins onto the slots table, in the WHERE clause of the query we do some date filtering on the time range of the slots based on data from another table. What this means is that SQL Server selects out all the slots in this table and then carries all this data around until finally it filters the data down.

Fix attempt 2: Adding date filtering to the INNER JOIN predicate on the slots table to only select future slots

3-future-slots-only

This change has made a noticeable difference as now only ~600,000 rows are being returned. This is a move in the right direction, but I’m sure that more changes could be made.

Looking more closely: Having made some progress in the right direction I went back to the query. I know the solution is to filter that slots table down right at the source, so I started looking for more ways to do that. The query currently has a CTE that selects a row of data about the appointment that is being booked and then this CTE is joined to the main query.

Fix attempt 3: I changed the CTE to join to a configuration table and find for me the MIN/MAX range of slots that should be selected, then instead of joining the CTE to the main query I declared a bunch of variables and used the CTE to give them their values. Then I removed the CTE from the main query and used the variables, I also then changed the slots INNER JOIN to select slots based on the date range using the MIN/MAX calculated previously.

4-more-specific-date-range

As you can see this is still moving in the right direction, the number of rows being returned now is down to 324,000 – still a lot, but significantly less than the million I started with!

Looking more closely: I now started to look more broadly to see if it was possible to reduce the 324,000 rows earlier in the query plan. When deciding if a slot is suitable we have to look at the “session instance” (hospital lingo) and make sure that it is “allowed to be booked”, this involves a join to a reference table.

Fix attempt 4: This time I changed the session instance INNER JOIN predicate to also say where the TypeId IN (SELECT TypeId FROM TypeTable WHERE AFlag = 1)

5-more-innerjoin-predicates

This hasn’t reduced the amount of slots rows that were being selected but it did mean that SQL Server was carrying them around in memory for less time – a good thing! (This is obvious if you compare this query plan to the one above)

Looking more closely: I’m now increasing in confidence and sure that there is more room for improvement. Looking back at the query I see that we also do some filtering on the slots table based on a booking status type, which again involves a join to a reference table

Fix attempt 5: Similar to fix attempt 4 I updated the slot table INNER JOIN predicate

6-nailed-it

Nailed it! Take a look at the query plan above, the SQL Server optimizer is now clever enough to find that there are only 336 session instances, it then does a Nested Loops operator with these 336 rows against the slots table to return 1,145 slots rows! This down from 1,094,032 rows in the first execution plan!

Looking more closely: I haven’t finished with this query yet, looking back at the execution plan I can see that when SQL Server does the key lookup on the slots table it only finds 234 slots that match the final condition.

Fix attempt 6: I added a new non-clustered index on the columns SQL wants to filter on and includes the others it needs to return.

7-finished

With the non-clustered index added when SQL Server goes to the slots table it only returns 234 rows.

The query now finishes in under a second! What a huge improvement. There’s probably more room for improvement, but at this point I’m pretty happy with the improvement.

Reduce the data, then look at Indexes

As you can see from the above working out of my thoughts indexes weren’t the actual problem with the query. The problem was that too much data was being selected out of a very large table. With a little bit of work (this took me about 2.5 hours) I was able to reduce the number of rows from 1,094,032 down to 234!

In the end an index did help, but arguably the query probably performance just as well without an additional non-clustered index.

Getting out of “a variable nightmare” – or how I learnt to let go and embrace conventions

I’ve recently been doing some work automating the release of our integration projects to our various clients, I was reviewing the work that had been done last week and noticed that there were a lot of variables being defined. If we continued on this course as we added new clients we would end up adding between 3-5 new variables, multiple that by say 100 new clients and soon we would have an incredible amount of variables to deal with.

In our project we use Octopus environments to represent our different clients; you can see from the image below that we were defining a lot of different variables for each of our clients.

variables-before

As you can see above with only a handful of clients it’s already getting out of control – there has to be a better way!

Embrace convention, simplify your life

Consistency by convention

It struck me that we had inadvertently created a few conventions in the way we were defining different variables.

For example we had the following variables defined for each of our environments:

  • Client Staging DB, which represents the name of a database used by an ETL process these all took the form Client_[client name]
  • Client Staging User ID which took the form Staging_[client name].

These two variables (along with Client Staging Password) are used in the Client – National Conn String variable.

It dawned on me that with a very simple convention I could get rid of most of these repeated variables.

So instead of continuing on creating scoped variables for each of our clients (environments) I embraced the Octopus Environment Name variable. Now all of these existing variables that were scoped to each environment have been deleted and now there is a single variable shared by all environments.

Client – National Conn String has been replaced with

Data Source=xyz;User Id=Client_#{Octopus.Environment.Name};Password=#{Client Staging Password};Initial Catalog=Staging_#{Octopus.Environment.Name};Provider=SQLNCLI11.1;

This means longer having to define the Client Staging DB and Client Staging User ID variables.

There were a bunch of other variables that I went through and updated to use the same convention. After making this change I went from having 38 variables, down to 22.

variables-after

We now have an opinionated deployment process, which comes with some some big advantages:

  • The process is easier to reason about
  • Everything is consistent
  • Less setup work when on-boarding new clients

But it’s good to be flexible

There is still room for exceptions, take the Client – Local Conn String variable, I was able to standardize this for all of our clients except for the one where we run our database on their cluster. For them I have a variable scoped just to their environment.

Octopus Deploy makes it easy to be flexible with variable scoping.

Variable sets reduce noise

The other trick that I’ve used is putting groups of related variables into their own variable sets. They are great for sharing variables between projects, but I’ve used them to remove noise from the project variables listing.

No pain no gain

It’s worth mentioning there was a little bit of pain, not all of the environments were configured consistently, not all of the SQL users were named according to the convention. Which meant a bit of manual work fixing things up; but in the long run it’ll pay off.

Distributed multi-tenanted releases with Octopus Deploy

TL;DR; read the Octopus guidance, adopt the “environment per tenant” approach, try to make your process consistent across tenants and use scoped variables to make steps re-usable, keep complexity out.

In my last couple of posts I’ve written about the various aspects involved with automating the release of our client integration projects. I’ve shown how to get TeamCity to build Integration Services projects and then how to deploy them into SQL Server SSIS Catalog with Octopus Deploy.

In this post I’m going to explain how we setup Octopus Deploy for distributed multi-tenanted deployments.

octopus-dashboard

What is a distributed multi-tenanted deployment?

A tenant in the context of DrDoctor it is a hospital. As mentioned in a previous post, when DrDoctor starts working with a new hospital we do some bespoke integration work to integrate their PAS system with our system. This ETL process is made up of a couple of staging databases and some SSIS packages which orchestrate the ETL process.

For each hospital that we work with, we are given a virtual machine within their network to which we deploy a database and a set of SSIS packages. We then have a schedule job that:

  • Performs an ETL process on the client machine to transform the data from their PAS system into our schema
  • Transfers the transformed data from the client site to our central database running in our data center.

So in our context we have a mixed deployment topology, some components go to a virtual machine running in a remote data center and some components go to our servers running in another data center.

Setting up Octopus Deploy

Octopus Deploy already has some very helpful guidance around different ways that it can be set up to support multi-tenanted deployments. There are broadly speaking two approaches: environment per tenant or project per tenant.

The process which I opted for was environment per tenant, in this approach there is a single project which contains all the different steps and variables, and multiple environments which represent the various tenants.

The reason I opted for environment per tenant is that our release process is the same for each of our clients, this keep the overhead of maintenance low and the process of setting up a new client easy.

setting up the environments

As per the Octopus guidance (linked to above) I created an environment for each of the hospitals DrDoctor currently works with. At this time they are all configured with an offline package drop deployment target while we work with the various IT departments to get their firewalls to allow the Octopus tentacles to communicate with our Octopus server.

environments

I also added to each environment a listening tentacle deployment target for the tentacle inside our data center, this the same tentacle but with multiple environments applied to it.

shared-tentacle

I also created a new release life cycle for this project, which treats all the environments as equals.

etl-lifecycle

The release process

This is the release process at time of writing.

release-processSteps 2-4 and 6 I covered in my previous post about deploying SSIS projects with PowerShell and Octopus Deploy. All of these steps make use of environment specific variables so they will work uniformly across all of our clients.

The step that is interesting is actually step 5. As I noted above, we have to do bespoke integration work for each of our clients, this involves writing custom transforms (stored procedures) and having a different set of database tables to which we import data from and to.

In Visual Studio we have a SQL database project for each of our clients (Local.ClientX), with all the common objects in a referenced database project. The client specific database project contains the transform stored procedures, import tables and client specific data mappings.

database-projects

Each of these database projects are built by TeamCity and turned into NuGet packages for Octopus Deploy.

database-projects-in-teamcity

In step 5 there is a step for each of our different clients, with each step being scoped to a specific environment. The actual work being done in each of the sub-steps are identical with the exception of the NuGet package which is being deployed.

Configuring step 5 was the only point where I felt that the functionality of Octopus Deploy was lacking elegance. It would be much nicer to have a single step which dynamically picked the correct NuGet package based on the environment variable.

That wasn’t possible as far as I could see, hence why step 5 has lots of duplication.

Roles

There are two roles:

  1. client role – this is for all steps that should be run on the client integration machine
  2. mayden role – this is for the steps that should be run on our central server

Using roles allow us to control where each of the steps are run.

Introducing Offline package drop

At time of writing we haven’t yet negotiated with the various IT departments to allow the tentacles to get through their firewalls. However, with the release of Octopus Deploy 3.x this is no longer a deal breaker thanks to a new feature called offline package drop.

This is one of a number of new deployment targets that have been added in version 3.x. When you deploy a release to an offline package drop the Octopus server will create a local directory for that release/environment and bundle up all the release artifacts (NuGet packages, PowerShell scripts etc) and create a handy script which can be run to deploy the release. This is the perfect feature where you can’t have either a listening or polling tentacle.

So the release process now becomes:

  • Create a new release in Octopus
  • Deploy the release to one or more environments
  • Zip the deployment folder and copy to the client machine
  • Execute the deployment script

Assuming everything worked the client is now on the latest release.

Multiple deployment targets

One of the great features about Octopus Deploy is how the environments can be configured. As I showed above, in our environments we two deployment targets each scoped to specific roles. This becomes really handy when trying to coordinate different steps across different machines.

In our scenario, the way it has been configured is to have all the client specific steps go to the offline package target, and the step for our central server will go to the listening tentacle and therefore be executed without any manual work by the person doing the deployment.

The other nice thing is that once our clients start opening their firewall we can swap out the offline package drop with a listening or polling tentacle and the release process will just work as is.

Thoughts and advice

One of the biggest problems I was trying to address with all this automation was fear or more specifically releasing is really hard…so I’ll avoid it as much as I can. Before we automated the process we would often over point user stories that involved making changes to our ETL process because releasing was so hard.

One of the consequences I believe that has come out of this is that we will as time goes in build confidence in releasing changes, and therefore more easily make changes. There are loads of things we would like to do to our ETL process to make the actually process more efficient, or refactor some of the data structures. My hope is that all those little things we’ll start doing because it’ll no longer be a big scary release.

The main advice I would give to someone looking to use Octopus Deploy for multi-tenanted releases is this: adopt the environment per tenant and only one project approach. This might mean a little bit of pain making everything uniform, but it will reduce your overhead and give you a much simpler process to reason about.

The more I use Octopus the more it shows itself to be extremely flexible and suited to many different scenarios, but at the same time it doesn’t have to be complicated, you could very easily have a simple project up and running in an afternoon.

Go forth and automate.

Deploying SSIS projects with Octopus Deploy

In my previous blog post I outlined how to configure TeamCity to build SQL Server Integration Services projects and then package the ispac file into a NuGet package.

In this post I’m going to show how I set up Octopus Deploy to deploy the ispac to SQL Server 2012 and configure SSIS DB projects and environments.

Deployment Overview

Here is an overview of the process as is currently defined in Octopus Deploy.

1-process-overview

We are doing multi-tenanted releases for this project, which is why the Client DB (Step 4) is repeated multiple times. In a future post I’m going to cover off details about how we are doing multi-tenanted releases.

Setting up SSIS DB

The first step in the release process is to ensure that the SSIS DB is up to date for use with our SSIS projects, within this step we ensure that:

  • A project folder exists within SSIS DB for our ispac to be deployed into
  • An environment exists
  • Environment variables are up to date

All of this is achieved with a little bit of PowerShell

$folderName = $OctopusParameters['SSISDB FolderName']
$environmentName = $OctopusParameters['SSISDB EnvironmentName']

$loadStatus = [Reflection.Assembly]::Load("Microsoft.SqlServer.Management.IntegrationServices, Version=11.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91")

# Store the IntegrationServices Assembly namespace to avoid typing it every time            
$ISNamespace = "Microsoft.SqlServer.Management.IntegrationServices"            
            
Write-Host "Connecting to server ..."            
            
# Create a connection to the server            
$constr = "Data Source=localhost;Initial Catalog=master;Integrated Security=SSPI;"            

$con = New-Object System.Data.SqlClient.SqlConnection $constr

# Create the Integration Services object            
$ssis = New-Object $ISNamespace".IntegrationServices" $con

if ($ssis.Catalogs.Count -eq 0)
{
    Write-Error "SSISDB doesn't exist"
    throw "SSISDB doesn't exist"
}

$cat = $ssis.Catalogs["SSISDB"]

if ($cat.Folders[$folderName] -eq $null)
{
    Write-Host "Creating new folder" $folderName
    $newfolder = New-Object $ISNamespace".CatalogFolder" ($cat, $folderName, "Description")     
    $newfolder.Create()
}

$folder = $cat.Folders[$folderName]

if ($folder.Environments[$environmentName] -eq $null)
{
    Write-Host "Creating environment" $environmentName
    $newEnv = New-Object $ISNamespace".EnvironmentInfo" ($folder, $environmentName, "Descriptoin")
    $newEnv.Create()
}

$env = $folder.Environments[$environmentName]

Write-Host "creating variables"

if ($env.Variables["Local"] -eq $null)
{
    $env.Variables.Add("Local", [System.TypeCode]::String, $OctopusParameters['Client - Local Conn String'], $false, "Our local client db")
}
else
{
    $env.Variables["Local"].Value = $OctopusParameters['Client - Local Conn String']
}

# create other environment variables here...

#save changes to environment
$env.Alter()

(NB: This script and the subsequent one for deploying the ispac are based off this example by Matt Masson)

I’ve tried to write this script to be idempotent so that it can be run multiple times without causing any problems.

Here is what you will see in SQL Management Studio after this step has run:

2-ssisdb-with-folder

Deploying SSIS Projects

The next step is to actually deploy the Integration Services project into the SSIS DB. In this step we use a PowerShell script to:

  • Deploy the ispac into the project folder that was created in the previous step
  • Ensure that the project has a reference to the environment that was created in the previous step
### Static Variables

$localToLocalETLFullPath = ".\DrDoctorClientETL.ispac"
$folderName = $OctopusParameters['SSISDB FolderName']
$environmentName = $OctopusParameters['SSISDB EnvironmentName']
#the projectName needs to match the output name that has been specified on the project, you can see this in the properties in SQL Server Data Tools
$projectName = "DrDoctor Client ETL"

###

$loadStatus = [Reflection.Assembly]::Load("Microsoft.SqlServer.Management.IntegrationServices, Version=11.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91")

# Store the IntegrationServices Assembly namespace to avoid typing it every time            
$ISNamespace = "Microsoft.SqlServer.Management.IntegrationServices"            
            
Write-Host "Connecting to server ..."            
            
# Create a connection to the server            
$constr = "Data Source=localhost;Initial Catalog=master;Integrated Security=SSPI;"            
            
$con = New-Object System.Data.SqlClient.SqlConnection $constr      
# Create the Integration Services object            
$ssis = New-Object $ISNamespace".IntegrationServices" $con
$cat = $ssis.Catalogs["SSISDB"]
$folder = $cat.Folders[$folderName]

Write-Host "Deploying project ..."            
[byte[]] $projectFile = [System.IO.File]::ReadAllBytes($localToLocalETLFullPath)            
$folder.DeployProject($projectName, $projectFile)

#####

Write-Host "adding reference to environment"

$project = $folder.Projects[$projectName]
$env = $folder.Environments[$environmentName]

if ($project.References.Count -eq 0)
{
    $project.References.Add($env.Name, $folderName)
}
else
{
    Write-Host "assuming that the correct environment has been configured"
}

####

Write-Host "applying environment variables to project"

$targetVar = $env.Variables["Local"]
$sourceVar = $env.Variables["Source"]

Write-Host "Updating Extract package parameters"

$extractPackage = $project.Packages["Extract.dtsx"]

$extractPackage.Parameters["CM.Target.ConnectionString"].Set([Microsoft.SqlServer.Management.IntegrationServices.ParameterInfo+ParameterValueType]::Referenced, $targetVar.Name)
$extractPackage.Parameters["CM.Source.ConnectionString"].Set([Microsoft.SqlServer.Management.IntegrationServices.ParameterInfo+ParameterValueType]::Referenced, $sourceVar.Name)

$project.Alter()

After this step has run the Projects folder will contain the first of our SSIS projects:

3-ssisdb-with-project

Deploying a database project

In a previous blog post I showed how you can fiddle with the dtproj and add a reference to the Octopack NuGet package to have TeamCity turn your database project into a NuGet package for you.

However, since discovering how to use NuSpec files in TeamCity I’d now recommend against changing your dtproj file and instead moving to using a NuSpec to define the package.

Here is a sample NuSpec file that I started using this week

<?xml version="1.0" encoding="utf-8"?>
<package xmlns="http://schemas.microsoft.com/packaging/2010/07/nuspec.xsd">
  <metadata>
    <id>ClientDB</id>
    <version>1.0.0</version>
    <authors>DrDoctor</authors>
    <requireLicenseAcceptance>false</requireLicenseAcceptance>
    <description>Client database</description>
  </metadata>
  <files>
    <file src="bin\Release\ClientDB.dacpac" target="ClientDB.dacpac" /> 
    <file src="bin\Release\Reference.dacpac" target="Reference.dacpac" /> 
  </files> 
</package> 

(You can adapt this as you need for your requirements)

As shown in my previous post on building SSIS projects in TeamCity you will need to add this NuSpec file the list of packages to build within the NuGet Packaging step to have TeamCity build the NuGet package.

Once TeamCity is building the NuGet package, we can add a new Deploy NuGet Package step in our Octopus Deploy project, with the follow PowerShell script as a custom deployment script.

Here is the PowerShell script which will generate an upgrade script using SqlPackage.exe

$dbName = $OctopusParameters['Client Staging DB']
$server = $OctopusParameters['National DB Server']

$releaseNumber = $OctopusParameters['Octopus.Release.Number']

& 'C:\Program Files (x86)\Microsoft SQL Server\110\DAC\bin\SqlPackage.exe' /Action:Script /OutputPath:Upgrade.$dbName.$releaseNumber.sql /SourceFile:DrDoctor_STG.dacpac /TargetServerName:$server /TargetDatabaseName:$dbName /p:IncludeCompositeObjects=true

New-OctopusArtifact -Path "Upgrade.$dbName.$releaseNumber.sql"

This will compare the dacpac to the database, if the database doesn’t exist then the upgrade script will create it, otherwise the upgrade script will contain a bunch of sql commands to upgrade the database to match the dacpac.

The New-Artifact command is a useful Octopus Deploy function that tells the tentecle to grab the file and make it part of the release, so it’s then available from within the Octopus Deploy web ui.

release-artifacts

Alternatively, if you’re game you could configure SqlPackage to automatically upgrade the database by using Action:Publish instead.

In the next blog post I’ll be covering multi-tenanted releases with Octopus Deploy and using the Offline Package Drop feature which was new in 3.0.

Disclaimer: the PowerShell could definitely do with some tidying up, don’t hold that against me🙂

Building and packaging SQL Server Integration Services projects in TeamCity

At DrDoctor we’ve been using Octopus Deploy for about a year and half now, and in that time we have developed a very robust way of releasing the core of our system. That is, two IIS websites and a bunch of windows services. Release automation has vastly increased the frequency at which we release software into production and in turn the speed at which the business can operate.

The missing part in all of this are the integration projects, when DrDoctor begins working with a new hospital we do a bunch of work integrating their PAS system into our system. This takes the form of a couple of staging databases and a couple of SQL Server Integration Services (SSIS) projects.

Whenever we are discussing changes that need to be made we always screw up our faces at the thought of doing the release. We also have no idea which clients have which version, or when we last did a release.

In this and a few subsequent posts I’m going to outline the various steps that I took to automate our deployments, so we can get on doing the interesting work and let the computers handle the boring deployment work.

Building SSIS Projects in TeamCity

The first step required is to have TeamCity build our Integration Services (SSIS) projects in TeamCity. In reality all it’s really doing is taking all the .dtsx packages that are part of the project and turning them into an ispac file.

The ispac is basically a zip file that contains all the packages and some other metadata which is used when deploying into SQL Server SSISDB.

There are two methods that come up from a Google search on getting TeamCity building SSIS projects, the first involves using a community MSBuild extension and a bunch of other bits, you can read more about it here or here, you can find the MSBuild extension on CodePlex. I tried this method, but didn’t have immediate success.

The second option is to use Visual Studio (SQL Server Data Tools) devenv.exe to build your project, this was already installed on our build server as it is required when building database projects.

To do this in the TeamCity project’s build configuration I added a new build step, which was a Command Line runner:

1-building-ssis-devenv

In the custom script I put in the full path to devenv.exe, our SSIS projects are built in SSDT BI 2012 (Visual Studio 2012 shell) so I’m using that version of devenv.exe on the build server.

Packaging SSIS Projects for deployment

Now that TeamCity is building the project it time to setup TeamCity to package the ispac file into a NuGet package. As mentioned in the intro, we use Octopus Deploy as our deployment system, it in turn uses the NuGet package format as the container for application files (more details).

Setting up TeamCity to automate the creation of NuGet packages turns out to be really easy.

Creating a NuSpec file

The first step is to create a NuSpec file, this is the manifest file that TeamCity will use when building the NuGet package (it tells TeamCity which files to include in the package).

To set this up I added a new file called Local_ETL.nuspec into my project folder with the following content

<?xml version="1.0" encoding="utf-8"?>
<package xmlns="http://schemas.microsoft.com/packaging/2010/07/nuspec.xsd">
  <metadata>
    <id>SSIS.Local_ETL</id>
    <version>1.0.0</version>
    <authors>DrDoctor</authors>
    <requireLicenseAcceptance>false</requireLicenseAcceptance>
    <description>Sync client to local</description>
  </metadata>
  <files>
    <file src="bin\Development\DrDoctor Client ETL.ispac" target="DrDoctorClientETL.ispac" />
  </files>
</package>

This as you can specifies a NuGet package ID, a version and a bunch of files that should be included.

Add build step in TeamCity

The next and final step is to configureTeamCity to create the NuGet package using the NuSpec file.

In TeamCity go into the Build Steps of your project, and click Add build step. From the runner type select NuGet Pack and configure the step with the following:

1-teamcity-nuget-packaging

Make sure that you have ticked “Publish created packages to build artifacts”. This will ensure that the packages are available through the built-in NuGet repository.

In the next post I’ll show how I setup my deployment project in Octopus Deploy to deploy the SSIS ispac to SQL Server SSISDB.

Stay tuned.

Applying custom settings with Octopus Deploy

If you’re a .Net developer and not currently doing automated deployments then you really should be. If you’d like some help getting started send me a message in the comments and I’ll drop you an email.

The core of any deployment is the ability to easily manage and apply configuration values, across environments and across your different apps/web apps.

Octopus Deploy has first class support for managing variables, including setting different scopes or environments and steps in your deployment process. It also has first class support for substituting variables into your <appSettings> and <connectionStrings>. In this case, all you need to do is create a variable with the same name as the appSetting and Octopus Deploy will swap in this value during deployment. Easy.

So what’s the problem?

The trouble comes when you start using custom settings in your app.config and web.config files. For example, you might have a web.config file that looks like this:

<configuration>
  <configSections>
    <section name="SomeCrazySettings" type="SomeAssembly.SomeCrazySettings, SomeAssembly" />
  </configSections>
  <appSettings>
    <!-- omitted -->
  </appSettings>
  <SomeCrazySettings>
    <CustomThing key1="foo" key2="bar" />
  </SomeCrazySettings>
</configuration>

Octopus Deploy does have a built in way of dealing with this problem, but the downside is that you have to hard-code the values in the Octopus Deploy convention, e.g.

<SomeCrazySettings>
  <CustomThing key1="#{Crazy Key1}" key2="#{Crazy Key2}" /> 
</SomeCrazySettings>

The problem that I have with the above is when I want to hard-code the values for local development. One of the applications I work on has a config item called BaseUri and when I’m working locally I like to have it set to my local machine e.g. http://localhost:44301. So the syntax above doesn’t work unless I’m happy with constantly changing the values.

In this blog post I’m going to show how with a little bit of PowerShell we can update the custom settings shown above.

Deployment Script

PowerShell makes your deployments awesome, especially in the context of Octopus Deploy. Anything that isn’t baked in can easily be implement with a little bit of PowerShell.

$Path = $OctopusParameters["OctopusOriginalPackageDirectoryPath"]

$Config = Get-ChildItem "$Path\*" -Include *.exe.config | Select-Object -First 1
[xml]$ConfigXml = Get-Content $Config

$configNodes = $ConfigXml.SelectNodes("//SomeCrazySettings");

if ($configNodes.Count -eq 1)
{
    $ConfigXml.configuration.SomeCrazySettings.key1=$OctopusParameters['Key1']
    $ConfigXml.configuration.SomeCrazySettings.key2=$OctopusParameters['Key2']

    $ConfigXml.Save($Config)
}

This script does the following:

  • Line 1-4: Finds the config file and then loads it as an XML object
  • Line 6: Using XPath looks for the node with the custom configuration
  • Line 8-14: If the node was found, updates the values of the custom keys, and then saves the file.

Go forth and automate.

How I demo user stories

At DrDoctor we are broadly speaking following the Agile Scrum methodology; we do Sprint planning, work on user stories, hold retrospectives etc…

Where are break away from the traditional ‘sprint’ methodology is how we go about getting stories “accepted” and released. Traditionally in Scrum you demo all the stories at the end of the sprint in the Sprint Review meeting, whereas we tend to demo stories (and have them accepted by the PO) as they are completed and then we aim to have accepted stories released to production within 3 days.

Part of getting a story accepted is demonstrating the new functionality to the product owner.

Why you should strive to give a good demo:

  • Gives the product owner confidence that you have delivered what they requested
  • Gives you an opportunity to explain the nuances/subtleties of how you have implemented the story
  • Gives you an opportunity to impressing the product owner – who doesn’t like impressing people?

How I demo a story (after I consider it code complete):

  1. Prepare and practice – I work out how I’m actually going to demo the story, this might be by writing some SQL scripts to set up the scenario, or just knowing which website to start at and then I do a practice run through. I consider this step critical, a smooth demo means the PO can focus on the functionality and not my clumsiness
  2. Ask the product owner  “can I demo <story x> to you?”, this sets the context so they know exactly what I’m talking about
  3. Read out story requirements from Pivotal Tracker – Once I have the product owner at my desk, I always bring up the story and show them what was requested and talk through the requirements as they are written.
  4. Explain:
    1. The approach taken during development – trying to keep it as high level as possible, whilst also explaining what was done to implement the story
    2. How I’m going to demo the story
  5. Do the demo
  6. Refer back to the story in Pivotal Tracker – after I’ve finished the demo, I go back to Pivotal Tracker to ensure we have covered off all the acceptance criteria from the story, and I explain any issues or subtleties they should be aware of
  7. Call to action – I always ask them to accept the story straightaway. This forces them to either ask for minor changes or to accept the story.

The key thing for me as a developer is to ensure that I’m prepared. This means being prepared to do the demo, but also prepared to explain the details of the implementation and any subtleties the product owner needs to be aware of.

The other key is the call to action – “Can the story be accepted? Great, please mark it as accepted”.

Let me know in the comments: How do you demo stories? What do you think is missing from the list?

2-workflow-one

Visualizing Workflows with Splunk and D3

At DrDoctor we do a lot with SMS, we let patients reschedule appointments, book appointments when they move to the top of a waiting list and we provide patients with the ability to move their appointment forward by sending them lists of slots as capacity becomes available.

In all of this we use workflows to represent what is going on between the patient’s mobile phone and our system. We call them workflows but really they are just state machines built on top of the excellent .Net library Automatonymous.

Like all startups we want to ensure that we are always delivering an excellent experience to all our users, which means keeping an eye out for those who get stuck and changing the workflows accordingly.

Splunking State Changes

We’ve been using Splunk as part of our reporting system for about 4 months now. We put lots of different events into it, including a huge variety of events which are generated by our workflows. Whenever an instance of a workflow changes state we publish an event similar to this:

Listing all the different state transitions can be done with the following Splunk search:


Category=WorkflowStateChanged | stats count by Payload.PreviousState Payload.CurrentState

Which produces the following table of events

1-table-of-states

Splunk and D3

That table of data isn’t really very useful when looking at it by eye, it requires a fair bit of work to trace the path through a given workflow. That’s where D3 comes into the picture – D3 is a JavaScript library used to produce interactive charts. It’s incredibly powerful, but with that power come with a bit of learning curve.

D3 charts aren’t built directly into Splunk, however if you’re happy to learn a bit of JavaScript and a small bit of Python’s Django framework then you can do some pretty cool stuff with the Splunk Web Framework.

To produce the D3 charts shown below I followed the instructions from the Web Framework getting started tutorial, but then used a modified version of the Sankey chart based on this example by Colin Fergus. The reason for this is that I wanted to be able to show loops in the chart.

I then made a few modifications so that any link connecting to the “Final” state that doesn’t go through the second last one is colour red, and those that do coloured green. The red paths highlight to us the sad path through the workflow. So for example if we can’t match a phone number to a patient them we would transition to the final state and colour this red.

Real-world examples

The first example represents the appointment rescheduling workflow.

3-workflow-two

As you can see it can be fairly complicated. Don’t forget that this is showing the aggregation of all the different state changes, given that there are multiple paths through the workflow not all people will go through all the states.

We notices that a fair few people were getting to the point where they were set alternative slots but never selecting one, we theorized that perhaps they needed the option to request more, so we added that functionality in. As you can see from the blue backwards loop not many people have actually used this.

Here is a further example from our partial booking workflow, where we ask patients to book their appointment when the reach the top of a waiting list. What’s interesting about this is that a large number of users (patients) never actually even start the process of booking.
2-workflow-one

Wrapping Up

We have tried out various methods of visualising the paths through our workflows that our users take. We used to do this with an Excel spreadsheet which we filled up with data from a nasty SQL query and then used a column chart to show where people exited, this was before we were using Automatonymous. It worked for a while but wasn’t really sustainable as a solution due to the manual process.

As our workflows have developed and we as an organisation have evolved we have worked hard to improve how we do our reporting. With the main aims being:

  • Accessibility – decision makers and developers alike should have quick and easy access to information
  • Discoverability – new pieces of information should be exposed effortlessly
  • First class experience

As you can see from the charts above, Splunk has certainly delivered.

If you’re interested in seeing the code which makes these charts possible then leave a comment below.