Saturday, November 15, 2008

First encounter with T-SQL

I am building a little app for my team that collects weekly status emails and publishes it on a web site. Since I've worked on far lower level stuff most of my career (I only wrote the code that actually has UI at Google), it's fun and educating.

Using ASP.NET is quite fun actually - Microsoft REALLY nailed developer experience on this one. The API is logical, concise, well structured, and a pure joy to use. I started with having no clue whatsoever, and had a fully-functioning web app running in around two hours, and this included reading a book.

This story is about T-SQL, however. There came time where I wanted a stored procedure. On my website, there is an adminitrative user who can make changes in a database, and everybody else can do unlimited reading (anybody can see everybody's status mail), but only very limited types of writing (sign up to receive someone else's status updates in email, turn email reminders for her- or himself on and off, and join and depart a project).

So while everything other type of data access was done using LINQ, it seemed convenient to handle mutations by non-admin users via stored procedures. Specifically, I needed one right off the bat: any authenticated user - know by the alias - could sign up for status email for a person whose status was to be collected.

The task seemed quite obvious:

-- =============================================
-- Author: Sergey Solyanik
-- Create date: 11/14/2008
-- Description: Adds a subscription
-- =============================================
CREATE PROCEDURE [dbo].[AddSubscriptionToPerson]
@alias nvarchar(15),
@personId int
AS
BEGIN
DECLARE @subscriberId int

SET @subscriberId = (SELECT SubscriberId
FROM Subscribers
WHERE SubscriberAlias = @alias)

IF @subscriberId = NULL -- BUGBUG: '=' does not work!
BEGIN
INSERT INTO Subscribers (SubscriberAlias) VALUES(@alias)
SET @subscriberId = @@IDENTITY
END

IF NOT EXISTS(SELECT 1
FROM SubscriptionsPeople
WHERE SubscriberId = @subscriberId AND
PersonId = @personId)
BEGIN
INSERT INTO SubscriptionsPeople (PersonId, SubscriberId)
VALUES(@personId, @subscriberId)
END
END
Right? Wrong! If @subscriberId is NULL, the following statement:

IF @subscriberId = NULL
SELECT 'IS NULL'
else
SELECT 'IS NOT NULL'
prints out 'IS NOT NULL'! The correct way of tesing for NULL is not comparing with NULL, but using IS [NOT] NULL keyword:

IF @subscriberId IS NULL
SELECT 'IS NULL'
else
SELECT 'IS NOT NULL'

So the version of code that actually works is:

-- =============================================
-- Author: Sergey Solyanik
-- Create date: 11/14/2008
-- Description: Adds a subscription
-- =============================================
CREATE PROCEDURE [dbo].[AddSubscriptionToPerson]
@alias nvarchar(15),
@personId int
AS
BEGIN
DECLARE @subscriberId int

SET @subscriberId = (SELECT SubscriberId
FROM Subscribers
WHERE SubscriberAlias = @alias)

IF @subscriberId IS NULL
BEGIN
INSERT INTO Subscribers (SubscriberAlias) VALUES(@alias)
SET @subscriberId = @@IDENTITY
END

IF NOT EXISTS(SELECT 1
FROM SubscriptionsPeople
WHERE SubscriberId = @subscriberId AND
PersonId = @personId)
BEGIN
INSERT INTO SubscriptionsPeople (PersonId, SubscriberId)
VALUES(@personId, @subscriberId)
END
END
Unfortuately, discovering this was not trivial. You see, IS [NOT] NULL is nowhere near IF in the documentation, and neither it is in the book I was reading. Also, EXISTS works if you put your SQL query that returns NULL in the IF itself, like so:

IF NOT EXISTS(SELECT SubscriberId
FROM Subscribers
WHERE SubscriberAlias = @alias)
THEN...
...but not like this:

SET @id = (SELECT SubscriberId
FROM Subscribers
WHERE SubscriberAlias = @alias)
IF EXISTS(@id)...
Well, what should we expect from a language with the syntax and semantics right out of 1950s. I haven't worked with anything like this since FORTRAN...

4 comments:

Илья Казначеев said...

Still I don't understand why people like the idea of running random turing-complete code on the database server.

Why??

Sergey Solyanik said...

There could be two reasons.

First might be efficiency. You might want to do complex processing that involves computing stuff on a lot of data locally, instead of shipping all this data to the client.

Second, it lets you expose the database as an object - the guts are hidden, you (the developer) control the interface. As long as the interface stays the same, you can tweak the guts (rename the table, consolidate them, index them differently) without rewriting your clients.

With LINQ, a stored procedure is a function on your context, so you'd call the one I have in the article as context.AddSubscriptionToPerson("mailalias", 123);

There might be other reasons, these are the ones I've heard.

Anonymous said...

You didn't find it in the ‘if’ semantics because NULL has its own entry in books online: http://msdn.microsoft.com/en-us/library/aa196106.aspx

What you stumbled into is that NULL has special meaning. It points out is that variables are special in tsql. Take this example:

Declare @foo bool

@foo has three possible values: True, False and NULL (or undefined). This basically goes back to relational algebra and set theory. @foo isn’t a variable, it’s a tightly constrained set.

This is a classic case of the user model being decoupled from the underlying implementation. T-SQL and VS make a relational db look like a duck and talk like a duck but it’s still a relational db built on relational algebra.

Илья Казначеев said...

@foo has three possible values - True, False, and FileNotFound!