Tip #3: Record life

A data warehouse may have multiple rows representing one row in the OLTP source.

This is because a warehouse can only provide analysts a record as of October 1, 2008 if and only if INSERT/UPDATE/DELETE operations result in unique record snapshots in the warehouse.

Time box prep-logic should execute as OLTP data is being initially processed in the warehouse from the temporary, intermediary copy/source.

Here's a sample, source OLTP table:

CREATE TABLE People
(
Id int primary key
,Name varchar(50)
)

Some modelers like to build their audit tables like this:

CREATE TABLE People
(
Id int
,Name varchar(50)
,CreatedOn datetime default getdate()
,Version int NOT NULL
)

I like to build my audit tables like this:

CREATE TABLE People
(
Id int primary key
,Name varchar(50)
,LiveFrom datetime default getdate()
,LiveTo datetime
)

The problem with the first approach is that it requires a considerable lookup to calculate the next version number. Another problem with the first approach is that time boxes require a double query to find the timely record and the test for the subsequent version.

The benefit of the second approach is that no lookup ever need occur. The problem with the second approach is that unchanged records require an audit table UPDATE to LiveTo. But, that's straight forward and the resulting queries using LiveFrom and LIveTo are quite simple.

I admit both approaches work. I just prefer the second.

Microsoft’s new Charting Control

At PDC (2008), Microsoft announced a Web Form and Win Form control for charting. That’s right, both! And it’s not a dumbed-down chart either. It’s part of an ongoing effort to extend BI to the end user.

Now, if I were a charting vendor, I would be totally angry with Microsoft. But because I am a developer, I finally have charting options without increasing project cost or complexity with a third party component.

Click here to see the download site on MSDN.

The TSQL to insert an image/blog

I recently needed this syntax.

If you want to insert a local file into a database image field (varbinary(max) field) then this is the syntax:

CREATE TABLE MyTable 
(id int, image varbinary(max))

INSERT INTO MyTable
SELECT 1
,(SELECT * FROM OPENROWSET(
BULK 'C:\file.bmp', SINGLE_BLOB) as x )

Tip #2: Row Identification

If you are designing an OLTP database, subtle changes to your schema can help minimize the complexity and increase efficiency of your data warehouse ETL.

To identify an OLTP table's row, ETL matches the primary key. The problem is: the primary key can be from a long list of data types and can be comprised of multiple columns.

Here's a simple OLTP table:

Create table MyTable
{
Id int identity(1, 1) primary key
,col1 varchar(50)
}

ETL can handle any type of primary key. But we are not doing what we can do, but doing what we should do. Matching keys and data types across systems is cute, but not efficient.

A simple and consistent mechanism to identify a row will help simplify ETL and guarantee the correct identification of a single record across multiple systems.

Here's a better OLTP table:

Create table MyTable
{
Id int identity(1, 1) primary key
,col1 varchar(50)
,RowId uniqueidentifier default newid() NULL
}

So what does this do? First, and foremost, it doesn't change anything in your current transactional applications. However, it allows the ETL to easily match records across systems.

There is a side benefit: replication. SQL replication needs a row identity to function. You get this bonus for free.

This simple schema change doesn't impact your transactional system, makes your ETL less complex and more efficient, and simplifies replication.

So, why not?