Performance tuning stored procedures – when indexes aren’t the problem

Recently I have been doing performance tuning on a few stored procedures that power some of the most important parts of DrDoctor. While I can confidently say I read query execution plans much better now than I could 6 months ago, that isn't the biggest takeaway. What I've noticed with two stored procedures in particular is that they were running slowly because they both read lots and lots of data from one or two tables, which was then slowly whittled away into practically nothing.

Take the most recent stored procedure I had to tune. This stored procedure finds slots for patients to book into for their upcoming hospital appointment.

1-query-plan-1

As you can see the final output of the query is 98 rows, yet if you look at the far right, the highlighted operator on the slots table selects out over 1 million rows, which are eventually reduced down to pretty much nothing.

In this post I’m going to show how after making a few minor changes this stored procedure went from taking just over 30 seconds to finishing in under 1 second!

Tuning the performance

The first thing that I did was look at the execution plan, my initial observations were:

  • Final row count is 98
  • Selecting over 1 million rows from the slots table – this just feels wrong!
  • What is that sort operation doing in there?
  • Looks like there are a few big and nasty key lookups

Fix attempt 1: I added a new non-clustered index to the slots table to try and reduce the amount of data being pulled back. This didn't reduce the row count, however it did remove the sort operator.

2-sort-operator-gone

You can see from the above that we are still selecting over a million rows from the slots table. It's now dawning on me that indexes aren't really the problem, but rather it's the query that needs to be addressed.

Looking more closely: In the query we have a bunch of INNER JOINs, one of which joins onto the slots table, and in the WHERE clause we do some date filtering on the time range of the slots based on data from another table. What this means is that SQL Server selects every slot in the table and carries all of that data around until it finally filters it down.

Fix attempt 2: Adding date filtering to the INNER JOIN predicate on the slots table to only select future slots
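
To make the shape of that change concrete, here is a rough sketch; the table and column names are made up for illustration and are not the real schema.

-- Sketch only: previously the date filter lived in the WHERE clause, so every
-- slot row was read; moving it into the join predicate trims the rows up front.
SELECT s.SlotId, s.SlotStart
FROM Appointment a
INNER JOIN Slot s
        ON s.ClinicId = a.ClinicId
       AND s.SlotStart >= GETDATE() -- only consider future slots
WHERE a.AppointmentId = @AppointmentId;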

3-future-slots-only

This change has made a noticeable difference as now only ~600,000 rows are being returned. This is a move in the right direction, but I’m sure that more changes could be made.

Looking more closely: Having made some progress in the right direction I went back to the query. I know the solution is to filter that slots table down right at the source, so I started looking for more ways to do that. The query currently has a CTE that selects a row of data about the appointment that is being booked and then this CTE is joined to the main query.

Fix attempt 3: I changed the CTE to join to a configuration table and work out the MIN/MAX date range of slots that should be selected. Then, instead of joining the CTE to the main query, I declared a couple of variables and used the CTE to populate them, removed the CTE from the main query, and changed the slots INNER JOIN to select slots within the MIN/MAX range calculated previously.
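
Roughly, the reworked query looks like the sketch below; the configuration table, columns and variable names are illustrative only, and @AppointmentId stands in for a parameter of the stored procedure.

DECLARE @MinSlotDate datetime, @MaxSlotDate datetime;

-- The CTE now only calculates the booking window and populates the variables,
-- instead of being joined into the main query.
WITH BookingWindow AS
(
    SELECT DATEADD(DAY, c.MinDaysAhead, GETDATE()) AS MinSlotDate,
           DATEADD(DAY, c.MaxDaysAhead, GETDATE()) AS MaxSlotDate
    FROM Appointment a
    INNER JOIN BookingConfiguration c ON c.ClinicId = a.ClinicId
    WHERE a.AppointmentId = @AppointmentId
)
SELECT @MinSlotDate = MinSlotDate, @MaxSlotDate = MaxSlotDate
FROM BookingWindow;

-- ...and in the main query the slots join filters on the calculated range:
-- INNER JOIN Slot s
--         ON s.ClinicId = a.ClinicId
--        AND s.SlotStart BETWEEN @MinSlotDate AND @MaxSlotDate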

4-more-specific-date-range

As you can see this is still moving in the right direction, the number of rows being returned now is down to 324,000 – still a lot, but significantly less than the million I started with!

Looking more closely: I now started to look more broadly to see if it was possible to reduce the 324,000 rows earlier in the query plan. When deciding if a slot is suitable we have to look at the “session instance” (hospital lingo) and make sure that it is “allowed to be booked”; this involves a join to a reference table.

Fix attempt 4: This time I changed the session instance INNER JOIN predicate to also require that the TypeId is in the set of allowed types, i.e. TypeId IN (SELECT TypeId FROM TypeTable WHERE AFlag = 1).
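
In sketch form it looks something like this; SessionInstance and the join columns are stand-in names, while TypeTable and AFlag come from the description above.

SELECT s.SlotId
FROM Slot s
INNER JOIN SessionInstance si
        ON si.SessionId = s.SessionId
       -- only session instances whose type is flagged as bookable
       AND si.TypeId IN (SELECT t.TypeId FROM TypeTable t WHERE t.AFlag = 1)
WHERE s.SlotStart BETWEEN @MinSlotDate AND @MaxSlotDate;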

5-more-innerjoin-predicates

This hasn’t reduced the amount of slots rows that were being selected but it did mean that SQL Server was carrying them around in memory for less time – a good thing! (This is obvious if you compare this query plan to the one above)

Looking more closely: I'm now growing in confidence and sure that there is more room for improvement. Looking back at the query I see that we also do some filtering on the slots table based on a booking status type, which again involves a join to a reference table.

Fix attempt 5: Similar to fix attempt 4, I updated the slots table INNER JOIN predicate to filter on the booking status reference table in the same way.

6-nailed-it

Nailed it! Take a look at the query plan above: the SQL Server optimizer is now clever enough to work out that there are only 336 session instances, and it then uses a Nested Loops operator to join those 336 rows against the slots table, returning 1,145 slot rows! This is down from 1,094,032 rows in the first execution plan!

Looking more closely: I haven't finished with this query yet. Looking back at the execution plan I can see that when SQL Server does the key lookup on the slots table it only finds 234 slots that match the final condition.

Fix attempt 6: I added a new non-clustered index on the columns SQL Server filters on, with the other columns it needs to return as included columns.
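
The index looked something like the sketch below; the key and included columns here are placeholders rather than the real ones.

CREATE NONCLUSTERED INDEX IX_Slot_Clinic_SlotStart
    ON dbo.Slot (ClinicId, SlotStart, BookingStatusId) -- columns the query filters on
    INCLUDE (SlotId, DurationMinutes);                 -- columns only needed in the SELECT list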

7-finished

With the non-clustered index added when SQL Server goes to the slots table it only returns 234 rows.

The query now finishes in under a second! That's a huge improvement. There's probably more that could be done, but at this point I'm pretty happy with the result.

Reduce the data, then look at Indexes

As you can see from the working out of my thoughts above, indexes weren't the actual problem with this query. The problem was that too much data was being selected out of a very large table. With a little bit of work (this took me about 2.5 hours) I was able to reduce the number of rows being read from 1,094,032 down to 234!

In the end an index did help, but arguably the query would probably have performed just as well without the additional non-clustered index.


How I demo user stories

At DrDoctor we are broadly speaking following the Agile Scrum methodology; we do Sprint planning, work on user stories, hold retrospectives etc…

Where we break away from the traditional ‘sprint’ methodology is how we go about getting stories “accepted” and released. Traditionally in Scrum you demo all the stories at the end of the sprint in the Sprint Review meeting, whereas we tend to demo stories (and have them accepted by the PO) as they are completed, and we then aim to have accepted stories released to production within 3 days.

Part of getting a story accepted is demonstrating the new functionality to the product owner.

Why you should strive to give a good demo:

  • Gives the product owner confidence that you have delivered what they requested
  • Gives you an opportunity to explain the nuances/subtleties of how you have implemented the story
  • Gives you an opportunity to impress the product owner – who doesn’t like impressing people?

How I demo a story (after I consider it code complete):

  1. Prepare and practice – I work out how I’m actually going to demo the story; this might be by writing some SQL scripts to set up the scenario, or just knowing which website to start at, and then I do a practice run-through. I consider this step critical: a smooth demo means the PO can focus on the functionality and not my clumsiness
  2. Ask the product owner “can I demo <story x> to you?” – this sets the context so they know exactly what I’m talking about
  3. Read out story requirements from Pivotal Tracker – Once I have the product owner at my desk, I always bring up the story and show them what was requested and talk through the requirements as they are written.
  4. Explain:
    1. The approach taken during development – trying to keep it as high level as possible, whilst also explaining what was done to implement the story
    2. How I’m going to demo the story
  5. Do the demo
  6. Refer back to the story in Pivotal Tracker – after I’ve finished the demo, I go back to Pivotal Tracker to ensure we have covered off all the acceptance criteria from the story, and I explain any issues or subtleties they should be aware of
  7. Call to action – I always ask them to accept the story straightaway. This forces them to either ask for minor changes or to accept the story.

The key thing for me as a developer is to ensure that I’m prepared. This means being prepared to do the demo, but also prepared to explain the details of the implementation and any subtleties the product owner needs to be aware of.

The other key is the call to action – “Can the story be accepted? Great, please mark it as accepted”.

Let me know in the comments: How do you demo stories? What do you think is missing from the list?

JavaScript testing – first impressions of Wallabyjs and Visual Studio

I’ve been a big fan of NCrunch for a long time now, when it comes to .Net test runners I don’t think there are any that beat it. So when I heard about the JavaScript test runner Wallaby I was intrigued enough to give it a try, even if it did mean writing JavaScript tests.

This week I started to write some new AngularJS functionality at DrDoctor and thought that now was as good a time as any to start writing a few JavaScript tests.

This post isn’t about writing Jasmine tests for Angular, there are already plenty of tutorials out on the internet, but rather about highlighting some of the features of Wallabyjs.

1-inline-code-coverage
Inline code coverage
3-red-when-tests-failing
When you have failing tests the inline coverage changes for that block to red
2-lines-covered-by-tests
Show covering tests
4-link-to-failing-test
If a test is failing it will show up in here as well and give you some details
5-failing-test
Wallabyjs will also display the reasons your tests are failing
6-highlight-uncovered-blocks
Highlight uncovered lines of code in a file

Improving .Net Performance

Over the past couple of weeks we’ve noticed that our Reporting service is terribly slow.

splunk-dblogger

The above chart is part of our APM (Application Performance Monitoring) solution built on top of MiniProfiler and Splunk (see this blog post for more details). Our reporting service works by processing messages which are queued up by various other services onto a queue in RabbitMQ. What this chart is showing is that the reporting service is working at a very constant and slow pace. It will eventually get through the backlog of messages, but it takes it a few hours each morning while the CPU sits at around 100%.

Instead of the chart above, what I’d rather see is a big spike in activity that is sustained for a (much) shorter period of time!

In this post I’m going to show how I used the built-in tooling in Visual Studio 2013 to find and diagnose the bottlenecks in our code and then visualize the improvement in performance using Splunk and MiniProfiler.

Finding and fixing performance bottlenecks

I’ve known for a long time that Visual Studio has built-in performance sampling tools, but have rarely used them. I found Sasha Goldshtein’s (blog|Twitter) Pluralsight course on Measuring .Net Performance an excellent overview of how to get started with them.

To measure the performance I first queued up a significant number of messages for the reporting service to process then started the Visual Studio Profiler.

Profiling CPU Usage

I started by setting the project I wanted to profile as the “Startup project”. Then from the Analyze menu I selected Performance and Diagnostics, which brings up the following window:

1-performance-and-diagnostics

I then selected Performance Wizard under Available Tools and clicked the Start button. This brings up a dialog where you can select a few options: I selected CPU sampling as the profiling method, then the Reporting Service project as the profiling target, then clicked Finish. The Reporting service started and Visual Studio began profiling; after about 5 minutes I stopped the profiling.

Viewing Profiler Reports and fixing bottlenecks

This is what the first profiling report looked like

2-first-perf-report

As you can see from the Hot Path, the Infrastructure.PubSub.EzNQ.BusItemTypeNameSerializer.DeSerialize method gets called a lot and is potentially very slow! The good news is that this is our code, which means I can click on this row and Visual Studio will take me to the method, and even highlight the expensive code.

3-first-hot-path

The class in question is an implementation of an interface which is part of EasyNetQ (a very nice .Net API for RabbitMQ which all of our services use). The ITypeNameSerializer interface lets us customize our queue names and tells EasyNetQ what type it should deserialize a message into.

As shown above there is a clear bottleneck in this code. The fix turned out to be pretty straightforward: I created the typesCache Dictionary at the class level, and then updated the DeSerialize method to:

public Type DeSerialize(string serializedName)
{
  // The type name is the second underscore-separated segment of the serialized name.
  var typeName = serializedName.Split('_')[1];

  if (typesCache.ContainsKey(typeName))
  {
    // We've already resolved this type name - return the cached Type.
    return typesCache[typeName];
  }
  else
  {
    // First time we've seen this name: scan the loaded assemblies once,
    // then cache the result so subsequent messages skip the expensive scan.
    var type = AppDomain.CurrentDomain.GetAssemblies()
                .SelectMany(a => a.GetTypes())
                .FirstOrDefault(t => t.Name == typeName);

    typesCache[typeName] = type;

    return type;
  }
}

After making these changes I ran the profiler again to see what difference that had made, here are the profile results for the second run through

4-second-hot-path

As you can see the Infrastructure.PubSub.EzNQ.BusItemTypeNameSerializer.DeSerialize method doesn’t get a mention, so it looks like the fix was successful. But now a call to DataAccess.SimpleDataHelper.GetDatabase in the DbReporter class is in the hot path.

Clicking onto the ReportingService.DbReporter.WriteReportToDb row brings up the code for that method

5-writereporttodb

This c# code probably has any DBA wriggling in their seat right about now (sorry!). This method writes each message that it processes into a corresponding table in our database using the likes of c# dynamic and SimpleData. It’s pretty powerful and very neat as we don’t have to write lots of code to persist all the different reporting events to the database.

Sadly there isn’t much that can really be done about the .Insert call; however, after reading through the SimpleData documentation I realized that I don’t need to continually make calls to .GetDatabase, but rather can do this once in the constructor and SimpleData will take care of opening and closing the connections as needed.
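
A rough sketch of the change is below; the class shape, table name and member names are hypothetical and simplified (the real code picks the table per message type), but it shows the idea of resolving the database object once.

public class DbReporter
{
    // Resolved once in the constructor; SimpleData takes care of opening
    // and closing the underlying connection per operation, so the database
    // object can be reused for every message.
    private readonly dynamic db;

    public DbReporter()
    {
        db = DataAccess.SimpleDataHelper.GetDatabase();
    }

    public void WriteReportToDb(IReportItem item)
    {
        // Previously GetDatabase() was called here for every single message.
        db.ReportEvents.Insert(item);
    }
}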

After making these two fixes the profiler results after a third run don’t indicate anything specific that can be “fixed”, so it was time to push it up to our testing environment and do some real testing.

Comparing Performance

As I mentioned above our custom APM solution is built on top of Splunk and MiniProfiler. There are a number of advantages that we get from this that we wouldn’t get from the likes of New Relic or Stackify. One advantage is that we can add our own meta data to all the performance data we generate.

We put a Version number into all the performance data that we generate; this comes from the NuGet version number generated by TeamCity as part of our build process. Because we include this Version number, comparing performance between versions becomes trivial.

In Splunk I created the following dashboard, which allows me or any of my fellow developers at DrDoctor to select an Application and Transaction and then a baseline and comparison version.

comparing-performance

Judging by the results shown above, it’s working at roughly twice the speed. I’d have hoped for it to be a bit faster than that, but still I’m not complaining!

Unexpected consequences

Our Reporting service does two things: it writes all IReportItem messages which it processes to their respective tables in the database, and it writes all events into a log file for Splunk to index.

While doing some local testing I noticed that even writing to a log file was slow! The maximum speed I saw was ~50 messages/second just to write to a log file! After making the fixes above the processing speed went up to 5,000 messages/second! Not insignificant, that’s for sure.

6-queue speed

Send logs to Splunk with log4net UdpAppender

Last week I deployed a new API hosted as an Azure Web App. We wanted some reporting events to come out of it into our Splunk instance so we could keep an eye on whether it is working as expected. I started off by using the Splunk C# SDK as it looked nice and easy to add into our app.

A very trivial example of using the Splunk SDK would look like this:

public class SplunkLogger
{
	private readonly Splunk.Service service;
	private readonly Splunk.Receiver receiver;

	public SplunkLogger()
	{
		this.service = new Splunk.Service("myawesomesplunkinstance", 8089, "https");
		this.service.Login("", "");
		this.receiver = new Splunk.Receiver(service);
	}

	public void Log(string msg)
	{
		var args = new Splunk.ReceiverSubmitArgs
		{
			Source = "mysource",
			SourceType = "mysource-event",
			Index = "myindex"
		};

		receiver.Submit(args, msg);
	}
}

The Splunk SDK should have worked, but since there was A LOT of traffic at different times throughout the day the vast majority of the HTTP requests to the Splunk API just timed out.


System.Net.WebException: The operation has timed out
  at System.Net.HttpWebRequest.GetRequestStream(TransportContext& context)
  at System.Net.HttpWebRequest.GetRequestStream()
  at Splunk.HttpService.Send(String path, RequestMessage request)
  at Splunk.Service.Send(String path, RequestMessage request)
  at Splunk.Receiver.Submit(String indexName, Args args, String data)
  at Splunk.Receiver.Submit(Args args, String data)
  at Splunk.Receiver.Submit(ReceiverSubmitArgs args, String data)

After I discovered that it constantly just timed out I started to look around for an alternative to using the Splunk REST Api endpoint.

I’ve always noticed that Splunk lets you send event data to it via TCP/UDP but I have never had cause to use it. However, given that our API was hosted as an Azure Web App and logging to the file system wasn’t an option, I thought this would be the perfect time to try it out. As it turns out I was not disappointed.

Setting up Splunk

The first thing you will need to do is configure your Splunk instance to listen out for events from a Udp port.

  1. From the Splunk settings, select Data Inputs
    1-data-inputs
  2. Click “Add new” UDP input
    2-add-new-udp
  3. This will bring up the Add Data wizard; enter the port you want Splunk to listen on (this will also need to go into your log4net config)
    3-add-data-wizard
  4. The next page, “Input Settings”, lets you optionally specify various things about the data you are going to be sending. There are two particularly useful settings here: the sourcetype and index
    4-input-settings
  5. Click Review and then Submit

Splunk is now listening on that UDP port.

Important Note: Make sure that all firewall ports between your producer and Splunk are open to allow UDP traffic on the specified port! Otherwise it will silently fail.

Introducing the log4net UdpAppender

The first step of getting this working from your application is adding the log4net NuGet package to your project. Next, open your log4net configuration and add the following <appender>:

<appender name="SplunkAppender" type="log4net.Appender.UdpAppender">
  <remoteAddress value="<your-splunk-server>" />
  <remotePort value="<your-udp-port>" />
  <layout type="log4net.Layout.PatternLayout">
    <ConversionPattern value="%m%n" />
  </layout>
</appender>
<root>
  <level value="DEBUG"></level>
  <appender-ref ref="SplunkAppender"></appender-ref>
</root>

Now from your c# code you use log4net as you normally would:

var logger = log4net.LogManager.GetLogger("SplunkLogger");
logger.Info("logged to splunk!");

And you should start seeing your events in Splunk.

If you need to verify that log4net is sending events over UDP you can use Wireshark. Here I have set it up to capture packets from my wifi adapter and to monitor port 9977

5-wireshark

Measuring Application Performance with Mini Profiler and Splunk


At DrDoctor we use New Relic for performance monitoring of our web applications. However web only makes up a small part of our overall system. We make extensive use of a messaging based system and have lots of micro-services which communicate over a messaging bus.

After adding New Relic to our Windows services we soon realized that they don’t really do a good job of supporting them. The main reason for this, as far as I can see from reading their docs, is that New Relic depends on an active HttpContext for profiling applications.

Having used New Relic to pinpoint the performance problems in our web app, I started to think about what it would take to build something similar but for a Windows Service. This blog post is showing off a proof-of-concept that I’m currently developing.

Introducing Mini Profiler

Mini Profiler is a lightweight c# library written by the folk at StackExchange. It was designed to show performance metrics for web applications, but is flexible enough that I was able to plumb it into our windows services. Doing this is pretty easy:

MiniProfiler.Settings.ProfilerProvider = new StackExchange.Profiling.SingletonProfilerProvider(); //note: not thread safe
MiniProfiler.Settings.Storage = new Log4NetStorage(container.Resolve<ILogger>()); //this is the castle-windsor wrapper interface
MiniProfiler.Settings.SqlFormatter = new SimpleSqlFormatter();

The next step was to create my own custom storage mechanism which would output the results of a profiling session to a log file. This was not entirely trivial, as the profiler session is a hierarchy of steps. Here is how I implemented the Save method of IStorage:

public void Save(MiniProfiler profiler)
{
    var timings = new Stack<Timing>();

    if (profiler.Root.HasChildren)
    {
        var children = profiler.Root.Children;
        for (var i = children.Count - 1; i >= 0; i--) timings.Push(children[i]);
    }
    else
    {
        timings.Push(profiler.Root);
    }

    var intro = string.Format(@"{{""Timestamp"":""{0}"",""Application"":""{1}"",""Status"":""Starting"",""TransactionId"":""{2}"",""Transaction"":""{3}""}}",
                    profiler.Started.ToString(DATE_FORMAT),
                    applicationName,
                    profiler.Root.Id,
                    profiler.Root.Name);

    logger.Info(intro);

    decimal runningDuration = 0;

    while (timings.Count > 0)
    {
        var timing = timings.Pop();
        var name = timing.Name;

        decimal customTimingDuration = 0;

        StringBuilder sb = new StringBuilder();

        if (timing.HasCustomTimings)
        {
            foreach (var item in timing.CustomTimings)
            {
                customTimingDuration = item.Value.Sum(v => v.DurationMilliseconds ?? 0);

                decimal customOffset = runningDuration;

                foreach (var item2 in item.Value)
                {
                    customOffset += item2.DurationMilliseconds ?? 0;
                    logger.Info(string.Format(@"{{""Timestamp"":""{0}"",""Application"":""{1}"",""Status"":""Processing"",""TransactionId"":""{2}"",""Transaction"":""{3}"",""Action"":""{4}"",""CommandString"":""{5}"",""Duration"":""{6}""}}",
                        profiler.Started.AddMilliseconds(Convert.ToDouble(customOffset)).ToString(DATE_FORMAT),
                        applicationName,
                        profiler.Root.Id,
                        profiler.Root.Name,
                        string.Format("{0}/{1}", name, item.Key),
                        item2.CommandString,
                        item2.DurationMilliseconds));
                }
            }
        }

        var offset = Convert.ToDouble(timing.DurationWithoutChildrenMilliseconds - customTimingDuration);
        var msg = string.Format(@"{{""Timestamp"":""{0}"",""Application"":""{1}"",""Status"":""Processing"",""TransactionId"":""{2}"",""Transaction"":""{3}"",""Action"":""{4}"",""Duration"":""{5}""}}",
            profiler.Started.AddMilliseconds(Convert.ToDouble(runningDuration)).ToString(DATE_FORMAT),
            applicationName,
            profiler.Root.Id,
            profiler.Root.Name,
            name,
            offset);

        logger.Info(msg);
        if (sb.Length > 0)
        {
            logger.Info(sb.ToString());
        }
        if (timing.HasChildren)
        {
            var children = timing.Children;
            for (var i = children.Count - 1; i >= 0; i--) timings.Push(children[i]);
        }

        runningDuration += timing.DurationWithoutChildrenMilliseconds + customTimingDuration;
    }

    var end = string.Format(@"{{""Timestamp"":""{0}"",""Application"":""{1}"",""Status"":""Finished"",""TransactionId"":""{2}"",""Transaction"":""{3}""}}",
                    profiler.Started.AddMilliseconds(Convert.ToDouble(runningDuration)).ToString(DATE_FORMAT),
                    applicationName,
                    profiler.Root.Id,
                    profiler.Root.Name);

    logger.Info(end);

}

This isn’t particularly pretty, and I’ll be the first to admit that using string.Format to create JSON is really bad practice, but I wanted something that would be lightweight and work with log4net, which is our current logging library. Really what I should do is adopt semantic logging and use something like Serilog. But I’ll save that for another day.

The next step was to create my own ISqlFormatter, as the ones that come with Mini Profiler include a lot more detail than I wanted to record. This was very trivial:

public class SimpleSqlFormatter : ISqlFormatter
{
    public string FormatSql(string commandText, List<SqlTimingParameter> parameters)
    {
        return commandText;
    }
}

With these two classes written plumbing it into my application became very easy and I soon had performance results being outputted as JSON into a log file, which I could then import into Splunk.

With the plumbing done, it’s time to actually instrument our code. With Mini Profiler what you do is call MiniProfiler.Start() at the beginning of a session and .Stop() at the end. In addition, you can call .Step() anywhere in between, and this creates a kind of sub-profiler, if for example you wanted to explicitly measure the performance of a particular part of your code.

When using Mini Profiler you will want to break your code into individual sessions, e.g. on a website this would be a single web request, the idea being that you want to be able to measure and compare performance over time.

As mentioned above, the services that I want to add this to all communicate over a messaging bus. So the logical place to call .Start() and .Stop() is in the class that is invoked when a new message arrives on the bus. I also wrapped our database connection code with the Mini Profiler instrumentation code so that it can measure time spent executing SQL and what that SQL was.
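
As a rough sketch, the instrumentation looks something like the code below. The handler class, message type and ProcessMessage method are placeholders rather than our actual code, and I'm assuming the Start overload that accepts a session name.

using StackExchange.Profiling;

public class MessageAHandler
{
    public void Handle(MessageA message)
    {
        // One profiling session per message; the session name becomes the
        // "Transaction" field in the JSON log output.
        MiniProfiler.Start("Processor/" + typeof(MessageA).Name);
        try
        {
            using (MiniProfiler.Current.Step("Dispatcher"))
            {
                // Actual message handling goes here; SQL executed through the
                // profiled connection is captured via the SimpleSqlFormatter.
                ProcessMessage(message);
            }
        }
        finally
        {
            // Stop hands the completed session to the custom IStorage (Log4NetStorage).
            MiniProfiler.Stop();
        }
    }

    private void ProcessMessage(MessageA message) { /* ... */ }
}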

Logger Output

After wiring everything up and running through a few test cases, this is a sample of the events that were produced:

{"Timestamp":"05/12/15 13:41:50.22 Z","Application":"ApplicationA","Status":"Starting","TransactionId":"2c9d0f76-cbca-4d4f-b736-48d09a3b75c1","Transaction":"Processor/MessageA"}
{"Timestamp":"05/12/15 13:41:50.24 Z","Application":"ApplicationA","Status":"Processing","TransactionId":"2c9d0f76-cbca-4d4f-b736-48d09a3b75c1","Transaction":"Processor/MessageA","Action":"Dispatcher/sql","CommandString":"select ######### from ##### WHERE (###### = @p1 AND ##### = @p2)","Duration":"18.9"}
{"Timestamp":"05/12/15 13:41:50.22 Z","Application":"ApplicationA","Status":"Processing","TransactionId":"2c9d0f76-cbca-4d4f-b736-48d09a3b75c1","Transaction":"Processor/MessageA","Action":"Dispatcher","Duration":"669.8"}
{"Timestamp":"05/12/15 13:41:50.93 Z","Application":"ApplicationA","Status":"Processing","TransactionId":"2c9d0f76-cbca-4d4f-b736-48d09a3b75c1","Transaction":"Processor/MessageA","Action":"EasyNetQ/Publish/MesageB","Duration":"34"}
{"Timestamp":"05/12/15 13:41:50.96 Z","Application":"ApplicationA","Status":"Processing","TransactionId":"2c9d0f76-cbca-4d4f-b736-48d09a3b75c1","Transaction":"Processor/MessageA","Action":"EasyNetQ/Publish/MessageC","Duration":"4.3"}
{"Timestamp":"05/12/15 13:41:50.96 Z","Application":"ApplicationA","Status":"Processing","TransactionId":"2c9d0f76-cbca-4d4f-b736-48d09a3b75c1","Transaction":"Processor/MessageA","Action":"EasyNetQ/Publish/MessageD","Duration":"5.2"}
{"Timestamp":"05/12/15 13:41:50.97 Z","Application":"ApplicationA","Status":"Finished","TransactionId":"2c9d0f76-cbca-4d4f-b736-48d09a3b75c1","Transaction":"Processor/MessageA"}

Building Splunk Dashboards

Since the data is being outputted as JSON it is really easy to import into Splunk, so I’m not going to cover that here. With the data in Splunk it’s time to turn the raw events into some useful charts.

The first one is a stacked area chart, which shows the total time spent for a transaction over time. High values on this chart aren’t necessarily indicative of performance problems, as it could just be that we were experiencing high throughput of a certain transaction.

1-transaction-overview

A variation on the chart above would be to show the average duration instead of the total time, that would be more indicative of performance problems.

The next chart shows the total time spent on processing for each different transaction in the selected time period. Once again this chart isn’t necessarily indicative of a performance problem, but it does show where our system spends most of its time.

2-total-time

The next chart shows the average duration of a transaction, this in the context of the previous chart could start to highlight potential problem areas. A variation on this chart would be to show it stacked with the number of times the transaction was executed.

3-average-duration

The next chart shows the maximum time it took to execute a SQL statement. This is very high level but still useful for spotting SQL queries that might need to be tuned. Averages would probably be better values to show here.

4-slowest-queries

The last chart is a stacked area chart of time spent on the different actions that were measured, over time. This is more useful when looked at in the context of a single transaction type.

5-transaction-details

As an added bonus, Splunk lets you click on any of the charts and drill through to the raw events.

4-event-drilldown

And here is what the dashboard looks like with everything in place, note also the filters at the top of the page, so I can drill into the different applications and individual transactions – they also cross filter 🙂

6-dashboard

This is still very much a proof-of-concept, I’m hoping to get this running in production in the next couple of weeks. When it’s running in production I’ll post a follow up.

Storing certificates more securely when using Google APIs, OAuth 2.0 and .Net

Recently, I’ve been replacing bit.ly with the Google URL shortener service for a .Net project. It’s a pretty simple service, and the documentation shows very clearly how to get going.

Google have 3 different ways of authenticating using OAuth 2.0 (details); the method I am using is a “Service Account”. Their code sample is thus:

String serviceAccountEmail = "SERVICE_ACCOUNT_EMAIL_HERE";
var certificate = new X509Certificate2(@"key.p12", "notasecret", X509KeyStorageFlags.Exportable);

ServiceAccountCredential credential = new ServiceAccountCredential(
  new ServiceAccountCredential.Initializer(serviceAccountEmail)
  {
    Scopes = new[] { PlusService.Scope.PlusMe }
  }.FromCertificate(certificate));

// Create the service.
var service = new UrlshortenerService(new BaseClientService.Initializer()
{
  HttpClientInitializer = credential,
  ApplicationName = "API Sample",
});

The key thing that I want to draw your attention to is line 2: this implies (to me) that you are probably going to keep the certificate in source control along with your code. This isn’t good, as it could quite easily lead to mistakenly making it public (e.g. pushing the project to GitHub).

A better solution would be to attempt to load the certificate from the local certificate store on the machine.

So that means that line 2 above, would be replaced with the following:

string certificateIssuerName = "blahblahblah123456.apps.googleusercontent.com";

X509Store store = new X509Store(StoreLocation.LocalMachine);
store.Open(OpenFlags.ReadOnly);
var cert = store.Certificates.Find(X509FindType.FindByIssuerName, certificateIssuerName, false)[0];
store.Close();

Things to know:

  1. NB: Very important – you need to make sure you call the Open method on the store, otherwise you will find that the Find method never returns a certificate.
  2. When importing the certificate into your certificate store, make sure you include the private key, otherwise your code won’t work.

Accessing Google APIs through a proxy with .Net

Recently I’ve been replacing the use of Bit.ly as a link shortener with Google’s link shortener service (goo.gl) in one of our .Net projects.

It’s pretty easy to get going with their handy .Net NuGet package and sample code, however what wasn’t obvious was how to pass the HTTP requests through a proxy. After a bit of digging around, it turned out to be pretty easy:

The first thing that needs to be done is to create our own ProxySupportedHttpClientFactory class, which inherits from the Google APIs HttpClientFactory class and overrides the CreateHandler method to return a different handler:

public class ProxySupportedHttpClientFactory : HttpClientFactory
{
  protected override HttpMessageHandler CreateHandler(CreateHttpClientArgs args)
  {
    var proxy = new WebProxy("http://proxyserver:8080", true, null, null);

    var webRequestHandler = new WebRequestHandler()
    {
      UseProxy = true,
      Proxy = proxy,
      UseCookies = false
    };

    return webRequestHandler;
  }
}

The next step is to then use our new ProxySupportedHttpClientFactory when creating the UrlshortenerService:


var credentials = ....

var linkService = new Google.Apis.Urlshortener.v1.UrlshortenerService(new BaseClientService.Initializer()
{
  HttpClientInitializer = credentials,
  ApplicationName = "MyURLThingy",
  HttpClientFactory = new ProxySupportedHttpClientFactory()
});

And that’s it.

No More Interruptions – Integrating Codealike and HipChat

I’ve recently started using Codealike, a service that tracks various metrics while I’m coding. The data it collects is then presented in a bunch of really useful ways to help determine when I’m being most productive, as well as the places in our code base where I spend most of my time, and various other things.

One of the metrics they calculate as part of this process is how “focused” you are, and from this they determine whether or not someone should interrupt you – they have three different levels: No Activity, Can Interrupt and Cannot Interrupt.

One of the worst things that can happen is being interrupted while you’re “in the zone” or as Codealike put it “on fire”.

1-may-i-interrupt

Codealike does provide a webpage that you could put on a display in your office which shows your current status (you can view my current status), however whilst I have two monitors at work I tend to use both of them.

We use HipChat at work as our IM of choice, so during the last week I’ve started setting my status to Do Not Disturb when Codealike thinks I’m in the zone.

Introducing: Codealike IM Updater

To remove the interruption of having to update my status in HipChat when I’m in the zone, I set out to build the Codealike IM Updater. It’s a simple application that periodically checks my status according to Codealike and updates my status on HipChat accordingly.

2-im-updater

As you can see, it is very simple – all the user needs to do is supply their Codealike username, their HipChat API token and email address, and how they want to map the different levels from Codealike to HipChat. Optionally, they can also specify a message that is displayed next to their name in HipChat.

The code is available on GitHub and if anyone would like to extend it to work with other IM services then I will happily accept pull requests.

Nancy and NCrunch – Playing nicely together…at last

If you haven’t tried NCrunch before then stop reading this and go download the trial and try it out, it will forever change your testing habits* – for the better.

NCrunch is the breakfast of champions

-KenR

At DrDoctor we use Nancy (rather than the more typically used ASP.Net MVC framework); the main reason is summed up by the overall goal of the framework:

The goal of the framework is to stay out of the way as much as possible and provide a super-duper-happy-path to all interactions.

One of the greatest things about Nancy vs ASP.Net MVC is that everything can be tested, and we have tests for pretty much all of our Modules (read: Controllers) as well as our Views. NCrunch does some serious magic when it comes to running your tests intelligently in parallel. Up until a few days ago I couldn’t figure out how to get my Nancy tests to work with it.

Testing Nancy Views

Testing Nancy Views is very easy. For demonstration purposes I created a new Nancy project from the standard ASP.Net and Razor Nancy template.

6-sample-solution

The template creates a module called IndexModule which returns a view called Index.

namespace TestingDemo
{
  using Nancy;

  public class IndexModule : NancyModule
  {
    public IndexModule()
    {
      Get["/"] = parameters =>
      {
        return View["index"];
      };
    }
  }
}

I then created an MS Test project called Specs, and added the following test to make sure that when someone browses to “/” they would be served up the Index view:

[TestMethod]
public void Should_Return_IndexView()
{
  //arrange
  var bootstrapper = new ConfigurableBootstrapper(with =>
  {
    with.ViewFactory<TestingViewFactory>();
    with.RootPathProvider<TestRootPathProvider>();
    with.Module<IndexModule>();
  });

  var browser = new Browser(bootstrapper);

  //act
  var response = browser.Get("/");

  //assert
  Assert.AreEqual("index", response.GetViewName());
}

To make this work I needed to implement my own custom IRootPathProvider:

public class TestRootPathProvider : IRootPathProvider
{
  public string GetRootPath()
  {
    string assemblyFilePath = new Uri(typeof(TestingDemo.IndexModule).Assembly.CodeBase).LocalPath;
    var assemblyPath = System.IO.Path.GetDirectoryName(assemblyFilePath);
    var path = PathHelper.GetParent(assemblyPath, 3);
    path = System.IO.Path.Combine(path, "TestingDemo");

    return path;
  }
}

In the Specs project I referenced the TestingDemo project and then added the Nancy.Testing and Nancy Razor View Engine (since I’m testing Razor views) NuGet packages.

At this point the test passes in the Visual Studio test runner, but fails in NCrunch.

Eating Nancy tests with NCrunch

To get the tests to pass in NCrunch you will need to make a few changes to the NCrunch configuration for the test project.

  1. Open the NCrunch configuration window from the NCrunch menu and select the test project (below mine is called Specs)
  2. Under General select “Additional files to include”
    1-ncrunch-config
  3. Click the button with three dots (highlighted in red above); on the Additional Files to Include dialog click “Add Directory”
  4. Browse to and select the root folder of your Nancy project, then click OK
  5. You will see the folder appear in the Additional Files To Include window, click OK
    2-additional-files-to-include
  6. Back in the NCrunch configuration window, under Build Settings set “Copy referenced assemblies to workspace” to True
    4-copy-references-assemblies
  7. You can now close the NCrunch configuration window

You should see that the test has now passed in the NCrunch Tests window

5-tests-have-passed

* May not actually change your testing habits.