Scraping the Web with Excel 2013 – PowerBI Competition Entry

The Microsoft PowerBI Competition is now in full swing with the voting open to the public for the next week. (Check  out my entry).

As you can see below I just made my submission in time.

I like to cut it fine!
I like to cut it fine!

When I came to building my demo (check it out) I had a few different data sets in mind, but there were two main points that I wanted to highlight from my entry –

  1. How Power Query makes scraping webpages and extracting data into Excel really easy
  2. How Power View can be used to build visually stunning and interactive reports.

I ended up going with the National UFO Reporting Center (http://www.nuforc.org/webreports.html). They have very helpfully made their database available in plain old HTML tables, which makes it a rather trivial to retrieve using Power Query.

The biggest takeaway that I had from doing this is that doing screencasts is difficult! I have a new found appreciation for the Pluralsight authors who do an amazing job of lining up the timing of doing something on screen and narrating it.

Setting up SSDT Database Projects and tSQLt

Previously I’ve written about the database unit testing framework tSQLt, you can read about it here, there is also an excellent Pluralsight course by Dave Green (blog | twitter) which you can find here.

In this post I’m going to show you a method of version controlling your database and unit tests with SQL Server Database Projects in SQL Server Data Tools (SSDT).

Setting up the Solution

In my solution I’ve created two SQL Server Database Projects

1-the-solutionThe first project SimpleDB will represent the actual database project (i.e. the one that is being tested) this will contain all of the tables, stored procedures, functions etc. The second project SimpleDB_Tests will contain all of the tSQLt objects and associated tests.

Once this is done, the next step is to create some database references, right click on References and select Add Database Reference.

The first reference is to SimpleDB, ensure that you select Same Database under the Database Location.

2-simpledb-database-referenceThe second reference is to the master database, ensure that you enter sys as the Database name.

3-master-database-reference

Version Controlling tSQLt

The next step is to install the tSQLt framework into your local database, download the latest release from their website and follow the install instructions.

After you have installed tSQLt go back to SSDT and start a new Schema Comparison, select as the source your local development database and then target as your SimpleDB_Tests project.

To make the results easier to understand I like to group the results by schema, you can do this by clicking on the Group results icon and selecting schema.

Exclude everything except the objects in the tSQLt schema and the tSQLt assembly

4-tsqlt-schema-compare

Then write the results into the SimpleDB_Tests project by clicking the Update button.

5-ssdt-tsqlt-schema-objects

The next step is to create a Pre Deployment Script, right click on the SimpleDB_Tests project and under Add click Script. This will bring up the Add New Item dialog, click Pre-Deployment Script and give it the name Script.PreDeployment.Setup_tSQLt.sql

Into this script copy and paste the contents from the SetClrEnabled.sql script which is included with tSQLt.

Follow the above step, but this time add a Post Deployment Script and add the following

EXEC tSQLt.RunAll

This script will be executed each time the SimpleDB_Tests project is deployed and will run all of the tSQLt tests.

Adding Unit Tests

There are two ways you can go about adding unit tests.

The first and probably easiest is to create them in SQL Server Management Studio, you can read about how to do that in my previous post on tSQLt.

After you have created a new test class and some tests just simply use Schema Compare in SSDT to sync the changes into the project – just make sure you don’t include any of the objects that belong to the main database project. Only objects related to your test should go into the _Tests project!

The second way is to create everything inside of SSDT yourself.

If you do this then notice that when the tSQLt.NewTestClass proc is used to generate a test schema it adds an extended property, so make sure you do the same.

CREATE SCHEMA [stp_create_new_person]
    AUTHORIZATION [dbo];
GO

EXECUTE sp_addextendedproperty @name = N'tSQLt.TestClass', @value = 1, @level0type = N'SCHEMA', @level0name = N'stp_create_new_person';
GO

Writing a test is actually very easy

CREATE PROCEDURE [stp_create_new_person].[test Check that a person is created]
AS
BEGIN
	-- SET NOCOUNT ON added to prevent extra result sets from
	-- interfering with SELECT statements.
	SET NOCOUNT ON;

  --Assemble
  DECLARE @expected int = 1
  EXEC tSQLt.FakeTable 'Person', 'dbo'

  --Act
  EXEC stp_create_new_person @first_name = 'bob', @last_name = 'brown', @email = 'bob.brown@test.com'
  DECLARE @actual int = (SELECT COUNT(*) FROM dbo.Person)

  --Assert
	EXEC tSQLt.AssertEquals @expected = @expected, @actual = @actual, @message = 'Actual didn''t match expected'

END

Notice that since the reference to the SimpleDB was set up as ‘same database’ the test can just refer to the Person table and stp_create_new_person stored procedure directly.

Deploy and Test

Once you’ve made some changes, (e.g. adding new tests, changing stored procs etc…) it is time to deploy and test.

To do this we will be deploying the SimpleDB_Tests project. Right click on SimpleDB_Tests in the solution explorer and click Publish. You will need to set the Target Database Connection and ensure that the database name given doesn’t already exist.

6-publish-database-dialog

Then click Generate Script (or if you are brave you can just go ahead and click Publish).

Once the script has been generated you can give it a quick glance over then click the Play icon to execute it.

If you clicked Generate Script (instead of Publish) then you will see the status being updated in the messages section. Here you can see that one of my tests passed and the other failed so the status of the publish is ‘completed with errors’

7-publish-results-messages

If instead you clicked Publish you will see the status in the Data Tools Operations section being updated. Here you can see that it says ‘an error has occurred’

8-data-tools-operations

To see the details click on the View Results link. This will bring up the script that was executed and a very similar Messages section will appear with the following error

9-failed-messageSome Helpful Tips

If you use Schema Compare a lot to keep your projects in sync, then I suggest saving a schema compare file into each of your database projects with it setup to either, ignore the test objects (for your main database project e.g. SimpleDB) or all the main database objects (this would be for your test project e.g. SimpleDB_Tests). This streamlines the process of keeping your projects in sync.

Likewise save the Publishing Profile, unless you enjoy entering in the connection details over and over again (you can do this from the Save Profile As button on the Publish Database dialog).

One other idea that I have, but haven’t tried yet is having a post deployment script to drop the Tests database after all the tests have been executed.

An Introduction to Database Unit Testing with tSQLt

Last week I was looking through the Recently Published Courses list on Pluralsight and noticed one on Database Unit Testing (check it out). Given that it was quiet at work as not everyone was back from holidays I thought it would be a good time to look into it.

What is tSQLt?

tSQLt (website) is a testing framework for SQL Server, that basically means it provides (almost) everything you need to write tests against your database.

All the tests are written in T-SQL so that means that anyone working with SQL Server can write them and they are executed by a stored procedure. This means you stay working in SQL Server Management Studio with no need to context switch to another tool to write or execute your tests.

What tSQLt provides

tSQLt is a very rich testing framework and provides the following features:

  • Isolation and setup functionality
  • Assertions
  • Expectations (for exceptions)

Anatomy of a Unit Test (The Three A’s)

In case you aren’t familiar with unit testing theory then the three A’s are a very simple way of thinking about how to write easily understood tests.

The three A’s are Arrange, Act, Assert.

Arrange: This is where any setup for running the test is done, including mocking objects if necessary.

Act: This is where the piece of functionality we are testing is executed (this is often referred to the system under test).

Assert: This is where the result from the Act stage is compared to the expectation (i.e. did the test do what we were expecting it to do)

Example tSQLt Unit Test

Time for an example.

Take the following two tables:

create table dbo.[person]
(
  person_id INT Identity (1,1) Primary Key,
  first_name nvarchar(50) not null,
  last_name nvarchar(50) not null
)

create table dbo.[audit_log]
(
  audit_id INT Identity(1,1) Primary Key,
  audit_message nvarchar(MAX) NOT NULL
)

and the following stored procedure

create procedure stp_create_new_person
  @first_name nvarchar(50),
  @last_name nvarchar(50)
as
begin

  insert into dbo.[person] ( first_name, last_name ) values ( @first_name, @last_name )
  insert into dbo.[audit_log] ( audit_message ) values ( 'Created user with first_name ' + @first_name + ' and last_name ' + @last_name )

end

To test that an entry into the audit_log table is made after a user has been created we can write the following stored procedure to act as our tSQLt unit test


create procedure [stp_create_new_person].[test Check that an entry to the audit_log table is made with correct first and last name]
as
begin

  --Assemble
  declare @expected nvarchar(MAX) = 'Created user with first_name bob and last_name brown'

  exec tSQLt.FakeTable @TableName = 'person'
  exec tSQLt.FakeTable @TableName = 'audit_log'

  --Act
  exec dbo.stp_create_new_person @first_name = 'bob', @last_name = 'brown'
  declare @actual nvarchar(MAX) = (SELECT top 1 audit_message from [audit_log])

  --Assert
  exec tSQLt.AssertEqualsString @expected = @expected, @actual = @actual, @message = 'Audit message didn''t match expected result'

end

Let’s dissect this piece by piece. In the Assemble stage, we set our expectation, which will be used in the Assertion section

declare @expected nvarchar(MAX) = 'Created user with first_name bob and last_name brown'

Next is setting up the database, FakeTable is a tSQLt function which under the hoods makes a fresh copy of the table, removing all constraints, triggers, identities etc and then renames the existing one to a temporary name (this is then renamed back after the test has finished).

This allows the test to run in isolation.

exec tSQLt.FakeTable @TableName = 'person'
exec tSQLt.FakeTable @TableName = 'audit_log'

In the Act stage the stored procedure that is being tested is executed.

exec dbo.stp_create_new_person @first_name = 'bob', @last_name = 'brown'

Then from the audit_log table we set the actual result, which will then be compared to the excepted value

declare @actual nvarchar(MAX) = (SELECT top 1 audit_message from [audit_log])

Finally in the Assert stage we use the tSQLt.AssertEqualsString stored procedure to check if the actual result matches the expected value

exec tSQLt.AssertEqualsString @expected = @expected, @actual = @actual, @message = 'Audit message didn''t match expected result'

And that’s all there is to writing a simple unit test.

tSQLt likes you to group all the tests that relate to one database object into it’s own schema as this makes it easier to find all related tests in the object explorer.

tSQLt provides a stored procedure that can be used to create a new schema (test class)

  exec tSQLt.NewTestClass '[stp_create_new_person]'

tSQLt also provides a way to execute only unit test that belong to the same test class (schema), to execute all the stp_create_new_person tests

  exec tSQLt.Run '[stp_create_new_person]'

Wrap Up

As you can see, using tSQLt makes writing unit tests very easy. This should mean that you no longer have an excuse to not write them!

In my next post I’m going to cover off the issue of source control when using SQL Server Data Tools and the SQL Server project type.

Some Useful Resources

In this post I have only just scratched the surface of what can be achieved with tSQLt, I came across a number of useful resources that helped me to apply tSQLt: