SQL Scrapbook

Short and simple SQL Server posts

Archive for the ‘Data Warehousing’ Category

Keys to(in) Data Warehousing

leave a comment »

I was helping a co-worker understand how surrogate keys work in a data warehouse and the messgae from him at the end of the conversation was
“Keys are key to a data warehousing solution”. Thought of blogging about this.
In a OLTP based environment its all about primary keys(PK) and foreign keys(FK). However when it comes to data warehousing, in addition to PK and FK there are a couple of other concepts that are very commonly used.
Natural Key:
This can also be called as business key and is only a synonym of primary key and i think this doesn’t need any further explanation. It is nothing but the primary key from the OLTP system. Period.
Surrogate Key:
Surrogate Key is a common practice in any dimensional model. In simple terms, it is an auto generated identity column and this will act as primary key for the dimension table. The basic script to create a dimension table will look like



CREATE TABLE [HR].[Employee]
   (
       EmployeeKey INT IDENTITY(1,1) PRIMARY KEY  --SurrogateKey
       , EmpLicenseNumber VARCHAR(25) NOT NULL    --NaturalKey
   )

I guess the next question is why do we want to maintain a surrogate key instead of using the same primary key from OLTP. Well, hopefully you will find the answers below.
1) If you notice, the natural key in the above example is a varchar. The joins among facts and dimensions is done on key columns and joining on varchar instead of integer columns severly hurts query performance.
2) For a SCD type 2 dimension we should be able to insert multiple rows for the same employee. This will not be possible if we have EmpLicenseNumber as the primary key since you cannot insert duplicate values into a primary key column.

Advertisements

Written by sqlscrapbook

November 23, 2010 at 4:01 pm

Posted in Data Warehousing

Degenerate Dimensions

leave a comment »

So what are Degenerate dimensions? Where and why do we use them? This post will answer these questions.
Degenerate Dimensions are dimension keys in the fact tables and unlike other dimensions they are not related to any dimension tables. Instead Degenerate Dimensions are used to link back to the source system for data validation. Simply speaking these are the natural keys in the source system like the transaction number, sales order number, shipping tracking number etc., Clearly the business users will not be interested in browsing the facts on Degenerate dimensions because for instance total sales by sales order number doesn’t mean anything.

Written by sqlscrapbook

November 23, 2010 at 1:50 pm

Posted in Data Warehousing