Pimp my Data Warehouse: MDS-parametrized dimension tables

Every once in a while we need a “classification dimension” table in a data warehouse. For example, suppose we have a basic sales fact table with a column like “(Sold)Quantity” that tells how many items were sold in each sales transaction. The data in the column varies between 1 and 10 000 (there is actually no absolute maximum, but let's assume one to keep things clear). Users often want different classifications based on the Quantity column, for example:

  • 1-10: Tiny sales
  • 11-50: Small sales
  • 51-100: Medium sales
  • 101-500: Huge sales
  • 501-nnn: Enormous sales

Accordingly, our dimension table D_SALES_SIZE_CLASS would look like this:

CLASS_ID CLASS_NAME
1 Tiny
2 Small
3 Medium
4 Huge
5 Enormous

The problem often is that, if we go by the book, we need to create a reference id column, e.g. “SALES_SIZE_CLASS_ID”, in the fact table and then reload the whole sales fact table from scratch. In big environments this can be a real problem because the reload could take hours and you might need to arrange a downtime window for the production DW. An alternative is to use a computed (virtual) column with a CASE expression to derive the correct value for SALES_SIZE_CLASS_ID; that way we avoid reloading the fact table.
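As a rough illustration of the CASE idea, here is a minimal sketch in Python using the standard-library sqlite3 module, chosen only so that it runs anywhere. The fact table name F_SALES and its columns are assumptions, and the CASE expression is exposed through a view as a stand-in for a real computed column, whose syntax depends on the database product.

```python
import sqlite3

# Hypothetical fact table; F_SALES and its columns are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE F_SALES (SALES_ID INTEGER PRIMARY KEY, QUANTITY INTEGER NOT NULL);
INSERT INTO F_SALES (QUANTITY) VALUES (1), (10), (11), (75), (500), (501);

-- The CASE expression derives the class id at query time,
-- so no stored column has to be back-filled by a reload.
CREATE VIEW V_SALES AS
SELECT SALES_ID,
       QUANTITY,
       CASE
           WHEN QUANTITY <= 10  THEN 1  -- Tiny
           WHEN QUANTITY <= 50  THEN 2  -- Small
           WHEN QUANTITY <= 100 THEN 3  -- Medium
           WHEN QUANTITY <= 500 THEN 4  -- Huge
           ELSE 5                       -- Enormous
       END AS SALES_SIZE_CLASS_ID
FROM F_SALES;
""")

rows = conn.execute(
    "SELECT QUANTITY, SALES_SIZE_CLASS_ID FROM V_SALES ORDER BY QUANTITY"
).fetchall()
print(rows)
```

The drawback is visible in the sketch itself: the class boundaries are hard-coded in the expression, so every change to the classification means changing the table or view definition.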

A more comprehensive approach, however, is not to alter the sales fact table at all but to do all the magic in the dimension table, by generating as many rows in the dimension table as there are distinct values in the Quantity column of the sales fact table. Like this:

QuantityAmount CLASS_ID CLASS_NAME
1 1 Tiny
2 1 Tiny
3 1 Tiny
4 1 Tiny
5 1 Tiny
6 1 Tiny
7 1 Tiny
8 1 Tiny
9 1 Tiny
10 1 Tiny
11 2 Small
12 2 Small
13 2 Small
14 2 Small
...
48 2 Small
49 2 Small
50 2 Small
51 3 Medium
52 3 Medium
53 3 Medium
...
9999 5 Enormous
10000 5 Enormous

Whenever the sales classification requirements change, we only need to modify the dimension table rather than reload the possibly enormous sales fact table from scratch.
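To make the mechanics concrete, here is a small sketch, again in Python with sqlite3 purely so it can be run as-is. The fact table F_SALES, its QUANTITY column and the changed boundary are hypothetical. The fact table joins to the dimension directly on the quantity value, and a reclassification touches only dimension rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical fact table: no class id column at all.
CREATE TABLE F_SALES (SALES_ID INTEGER PRIMARY KEY, QUANTITY INTEGER NOT NULL);
INSERT INTO F_SALES (QUANTITY) VALUES (3), (12), (52);

-- Dimension keyed by the quantity value itself (only a few rows shown here).
CREATE TABLE D_SALES_SIZE_CLASS (
    QuantityAmount INTEGER PRIMARY KEY,
    CLASS_ID       INTEGER NOT NULL,
    CLASS_NAME     TEXT NOT NULL
);
INSERT INTO D_SALES_SIZE_CLASS VALUES
    (3, 1, 'Tiny'), (12, 2, 'Small'), (52, 3, 'Medium');
""")

QUERY = """
SELECT f.QUANTITY, d.CLASS_NAME
FROM F_SALES AS f
JOIN D_SALES_SIZE_CLASS AS d ON d.QuantityAmount = f.QUANTITY
ORDER BY f.QUANTITY
"""
before = conn.execute(QUERY).fetchall()

# Hypothetical requirement change: quantities 11-20 now count as Tiny.
# Only the dimension is updated; F_SALES is left untouched.
conn.execute(
    "UPDATE D_SALES_SIZE_CLASS SET CLASS_ID = 1, CLASS_NAME = 'Tiny' "
    "WHERE QuantityAmount BETWEEN 11 AND 20"
)
after = conn.execute(QUERY).fetchall()

print(before)
print(after)
```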

Making it customizable and accessible by end-users with MDS

Once this is implemented in the DW, why not make it customizable by end users so they can change the classification whenever they need to? This is where MDS comes in handy once again. Traditionally one would bury this logic deep in the ETL, and every change would need an IT specialist to modify the ETL scripts. Now we just need to set up a special parameter entity in MDS and add some useful attributes like this:

Sales classification parameter entity in MDS

Create a new entity PARAM_SALES_SIZE_CLASS with attributes MIN_VALUE and MAX_VALUE to define the limit values for each class. End users can edit these very easily. Just make sure that the ranges don’t overlap each other.
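Since nothing stops a user from entering overlapping ranges, a small sanity check before loading can help. The following is only a sketch in Python: the SizeClass type and find_overlaps function are hypothetical helpers, not part of MDS, and the upper limit of 10 000 for the last class is an assumption.

```python
from typing import List, NamedTuple, Tuple


class SizeClass(NamedTuple):
    name: str
    min_value: int
    max_value: int


def find_overlaps(classes: List[SizeClass]) -> List[Tuple[str, str]]:
    """Return (earlier, later) class-name pairs whose inclusive ranges intersect.

    Classes are scanned in MIN_VALUE order and each one is compared with the
    class that reaches furthest to the right so far. An empty result means
    no two ranges overlap.
    """
    overlaps = []
    widest = None  # class with the largest max_value seen so far
    for curr in sorted(classes, key=lambda c: c.min_value):
        if widest is not None and curr.min_value <= widest.max_value:
            overlaps.append((widest.name, curr.name))
        if widest is None or curr.max_value > widest.max_value:
            widest = curr
    return overlaps


good = [
    SizeClass("Tiny", 1, 10),
    SizeClass("Small", 11, 50),
    SizeClass("Medium", 51, 100),
    SizeClass("Huge", 101, 500),
    SizeClass("Enormous", 501, 10_000),  # upper limit is an assumption
]
# Same classes, but Small now starts at 10 and collides with Tiny.
bad = [SizeClass("Small", 10, 50) if c.name == "Small" else c for c in good]

print(find_overlaps(good))
print(find_overlaps(bad))
```

A check like this could run as the first step of the load, so that a bad edit is reported instead of producing duplicate rows for the same quantity.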

SQL script to generate the sales classification dimension table rows

Then it is time for some SQL magic. We generate 10 000 rows (or however many are needed to cover the biggest possible sales quantity) and assign each row its class by using the minimum and maximum values defined in the parameter entity.
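One way to sketch this step is shown below, run through Python's sqlite3 module so it is self-contained. It is not the original script: the PARAM_SALES_SIZE_CLASS table stands in for the data exported from MDS, the CLASS_ID and CLASS_NAME columns and the 10 000 upper limit are assumptions, and the number series comes from a recursive common table expression.

```python
import sqlite3

MAX_QUANTITY = 10_000  # assumed biggest possible sales quantity

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Stand-in for the parameter entity; in practice this data would come from MDS.
CREATE TABLE PARAM_SALES_SIZE_CLASS (
    CLASS_ID   INTEGER PRIMARY KEY,
    CLASS_NAME TEXT NOT NULL,
    MIN_VALUE  INTEGER NOT NULL,
    MAX_VALUE  INTEGER NOT NULL
);
INSERT INTO PARAM_SALES_SIZE_CLASS VALUES
    (1, 'Tiny',       1,    10),
    (2, 'Small',     11,    50),
    (3, 'Medium',    51,   100),
    (4, 'Huge',     101,   500),
    (5, 'Enormous', 501, 10000);
""")

rows = conn.execute(
    """
    WITH RECURSIVE numbers(n) AS (
        SELECT 1
        UNION ALL
        SELECT n + 1 FROM numbers WHERE n < ?
    )
    -- Each generated number picks up the class whose range contains it.
    SELECT numbers.n AS QuantityAmount, p.CLASS_ID, p.CLASS_NAME
    FROM numbers
    JOIN PARAM_SALES_SIZE_CLASS AS p
      ON numbers.n BETWEEN p.MIN_VALUE AND p.MAX_VALUE
    ORDER BY numbers.n
    """,
    (MAX_QUANTITY,),
).fetchall()

print(len(rows))
print(rows[:3], rows[-1])
```

On SQL Server the same idea applies, but a recursive CTE this deep would exceed the default recursion limit of 100, so the MAXRECURSION query hint or a separate numbers table would be needed there.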

SQL query results ready for loading to dimension table

This will generate 10 000 rows that are ready to be inserted into our dimension table (e.g. D_SALES_SIZE_CLASS). We now have a fully customizable dimension table that business users can edit whenever they need a different point of view for their analysis.
