Log File Intelligence – log4net meets Splunk

At DrDoctor we are slowly adopting Splunk as our central reporting repository. We already have most of our application specific events going into it and we are already seeing some great benefits.

In this post I’m going to show the various steps I went through to get our log4net files being ingested in a useful format. Monitoring a file is easy, extracting useful fields is sometimes a challenge especially with log files.

Setting the format string

The first step was to change the format string in the log4net.config file. The main aim here was to make my life easier for when the log files are going into Splunk. By prefixing all the log4net tokens with a name means that I can write some simple, but very reliable regexes in Splunk to turn these into fields.


%utcdate{ISO8601} Release:'!release!' Version:'!version!' Thread:'[%thread]' Level:'%-5level' Logger:'%logger' Message:'%message'%newlineException:'%exception'

Those who are familiar with the log4net configuration options with notice that there are two tokens that don’t exist in the format string above. They are !release! and !version!, these are two very useful values to capture alongside our error messages as we can then start to track when new types of errors are discovered or introduced.

I’m using a custom PowerShell script in our deployment system, Octopus Deploy to set these values during the deployment phase, the release number reflects the Octopus Deploy release and the version number reflects the build number from TeamCity. Arguably we probably don’t need both, but I’m not entirely sure what I’m going to need yet so I’m going to stick with both for now.

Setting up Splunk

In our environment and most others I would assume, we are using the Splunk universal forwarder to send data to Splunk. The first step then is to add a new entry to the inputs.conf file to keep an eye on our logs directory.

[monitor://d:\logs\*.log]
sourcetype=log4net

Well that was easy, now that the Universal Forwarder is tracking the log files directory we should start seeing log entries appearing in Splunk.

This is a good start, but it would be more useful if we could start seeing the breakdown of the various entries. This is the point where we need to extract the various fields from the raw events.

To do this we need to make use of the Splunk field extractions. To extract more fields scroll down and click the link “Extract New Fields”

1-extract-new-fields

Then click “I prefer to enter the regex myself”

2-write-myown-regex

All the extractions follow a similar pattern, here is the regex to extract the log level:

Level:'(?P<Level>(([A-Za-z]*)))'

Enter that into the regex input, then click the Preview button, in the sample events you will see all the different logging levels highlighted, you will also notice a new tab called Level appear.

3-preview

Here is the complete list:

Release:'(?P<Release>(([0-9|.]*)))'
Version:'(?P<Version>(([0-9|.]*)))'
Level:'(?P<Level>(([a-zA-Z]*)))'
Message:'(?P<Message>((.+)))'

Go through the steps above for each one.

Log Intelligence

Now we can start doing some fancy queries.

Example one: number of errors by Release and host

index=logs sourcetype=log4net | stats count by Release, host

4-count-by-release-host

Example two: number of errors over time

index=logs sourcetype=log4net | timechart count

5-timechart

Example three: number of errors by application

index=logs sourcetype=log4net | rex field=source "Error\\\(?<app>(\w|\s|\d)*)" | stats count by Release, host, app | sort -Release, -count 

6-errors-by-application

Next steps

There are many possibilities, here are a couple of ideas:

  • Build a dashboard from the various queries above
  • Create some Splunk Alerts to trigger when a threshold of errors have been triggered
Advertisements

No More Interruptions – Integrating Codealike and HipChat

I’ve recently started using Codealike, a service that tracks various metrics while I’m coding. The data it collects is then presented in a bunch of really useful ways to help determine when I’m being most productive, as well as the places our code base I spend most of my time and various other things.

One of the metrics they calculate as part of this process is how “focused” you are and from this they determine whether or not someone should interrupt you – they have three different levels, No Activity, Can Interrupt and Cannot Interrupt.

One of the worst things that can happen is being interrupted while you’re “in the zone” or as Codealike put it “on fire”.

1-may-i-interrupt

Codealike does provide a webpage that you could put on a display in your office which displays your current status (you can view my current status), however whilst I have two monitors at work I tend you use both of them.

We use HipChat at work as our IM of choice, so, during the last week I’ve started to set my status as Do Not Disturb when Codealike thinks I’m in the zone.

Introducing: Codealike IM Updater

To remove the interruption of having to update my status in HipChat when I’m in the zone I set out to build the Codealike IM Updater. It’s a simple application that periodically checks my status according to Codealike and update my status on HipChat accordingly.

2-im-updaterAs you can see, it is very simple – all the user needs to do is supply their Codealike username, their HipChat API Token and Email address and how you want to map the different levels from Codealike to HipChat. Optionally you can also specify an optional message that is displayed next to your name in HipChat.

The code is available on GitHub and if anyone would like to extend it to work with other IM services then I will happily accept pull requests.