Every once in a while we need a “classification dimension” table in a Data Warehouse. For example, suppose we have a basic sales fact table with a column like “(Sold)Quantity” that tells how many items were sold in the sales transaction. The data in the column varies between 1 and 10 000 (there is actually no absolute maximum value, but let's fix one here to keep things clear). Users often want different classifications based on the Quantity column, for example:
- 1-10: Tiny sales
- 11-50: Small sales
- 51-100: Medium
- 101-500: Huge
- 501-nnn: Enormous
Correspondingly, our dimension table D_SALES_SIZE_CLASS would look like this:
CLASS_ID | CLASS_NAME |
1 | Tiny |
2 | Small |
3 | Medium |
4 | Huge |
5 | Enormous |
The problem is often that, if we go by the book, we need to create a reference ID column, e.g. “SALES_SIZE_CLASS_ID”, in the fact table and then reload the whole sales fact table from scratch. In big environments this can be a problem, because the reload could take hours and you might need to arrange a downtime window for the production DW. An alternative is to use a computed column (virtual column) with a CASE expression to derive the correct value for SALES_SIZE_CLASS_ID; that way we avoid reloading the fact table.
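To make the CASE alternative concrete, here is a minimal sketch. It runs against an in-memory SQLite database through Python's `sqlite3` module, purely so that it is self-contained. The table name F_SALES, its columns and the sample rows are assumptions made for illustration, and a view stands in for the computed column. On a real DW platform the same CASE expression would go into the computed column definition, using that platform's syntax.

```python
import sqlite3

# Hypothetical miniature fact table; F_SALES and its columns are assumed names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE F_SALES (SALES_ID INTEGER PRIMARY KEY, QUANTITY INTEGER)")
conn.executemany(
    "INSERT INTO F_SALES (SALES_ID, QUANTITY) VALUES (?, ?)",
    [(1, 7), (2, 50), (3, 51), (4, 500), (5, 9999)],
)

# The CASE expression a computed column would hold, exposed here through a view
# so that the stored fact rows are never rewritten.
conn.execute("""
    CREATE VIEW V_SALES AS
    SELECT SALES_ID,
           QUANTITY,
           CASE
               WHEN QUANTITY <= 10  THEN 1  -- Tiny
               WHEN QUANTITY <= 50  THEN 2  -- Small
               WHEN QUANTITY <= 100 THEN 3  -- Medium
               WHEN QUANTITY <= 500 THEN 4  -- Huge
               ELSE 5                       -- Enormous
           END AS SALES_SIZE_CLASS_ID
    FROM F_SALES
""")

case_rows = conn.execute(
    "SELECT QUANTITY, SALES_SIZE_CLASS_ID FROM V_SALES ORDER BY SALES_ID"
).fetchall()
print(case_rows)
```

The downside is visible in the sketch: the class limits are hard-coded in the expression, so every change to the classification means changing the table or view definition.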
But a more comprehensive way is not to alter the sales fact table at all and instead do all the magic in the dimension table, by generating as many rows in the dimension table as there are distinct values in the Quantity column of the sales fact table. Like this:
QuantityAmount | CLASS_ID | CLASS_NAME |
1 | 1 | Tiny |
2 | 1 | Tiny |
3 | 1 | Tiny |
4 | 1 | Tiny |
5 | 1 | Tiny |
6 | 1 | Tiny |
7 | 1 | Tiny |
8 | 1 | Tiny |
9 | 1 | Tiny |
10 | 1 | Tiny |
11 | 2 | Small |
12 | 2 | Small |
13 | 2 | Small |
14 | 2 | Small |
… | … | … |
48 | 2 | Small |
49 | 2 | Small |
50 | 2 | Small |
51 | 3 | Medium |
52 | 3 | Medium |
53 | 3 | Medium |
… | … | … |
… | … | … |
9999 | 5 | Enormous |
10000 | 5 | Enormous |
Whenever the sales classification requirements change, we only need to modify the dimension table rather than reload the possibly enormous sales fact table from scratch.
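A small sketch of how this plays out in a query, again on an in-memory SQLite database via Python. The names F_SALES and QUANTITY_AMOUNT and the three sample sales are assumptions, and the dimension is only filled for quantities 1 to 100 to keep it short. The fact table joins to the dimension directly on the quantity value, and a reclassification touches only the dimension rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE F_SALES (SALES_ID INTEGER PRIMARY KEY, QUANTITY INTEGER);
    CREATE TABLE D_SALES_SIZE_CLASS (
        QUANTITY_AMOUNT INTEGER PRIMARY KEY,
        CLASS_ID INTEGER,
        CLASS_NAME TEXT
    );
""")
# Three hypothetical sales transactions.
conn.executemany("INSERT INTO F_SALES VALUES (?, ?)", [(1, 7), (2, 30), (3, 75)])

# One dimension row per quantity value (1..100 only, to keep the sketch short).
classes = [(1, "Tiny", 1, 10), (2, "Small", 11, 50), (3, "Medium", 51, 100)]
conn.executemany(
    "INSERT INTO D_SALES_SIZE_CLASS VALUES (?, ?, ?)",
    [(q, cid, name) for cid, name, lo, hi in classes for q in range(lo, hi + 1)],
)

REPORT = """
    SELECT d.CLASS_NAME, COUNT(*)
    FROM F_SALES AS f
    JOIN D_SALES_SIZE_CLASS AS d ON d.QUANTITY_AMOUNT = f.QUANTITY
    GROUP BY d.CLASS_ID, d.CLASS_NAME
    ORDER BY d.CLASS_ID
"""

def sales_per_class():
    """Count sales per class by joining the fact table on the quantity value."""
    return conn.execute(REPORT).fetchall()

before = sales_per_class()

# New requirement: quantities 26..50 now count as Medium.
# Only dimension rows are updated; F_SALES is left untouched.
conn.execute("""
    UPDATE D_SALES_SIZE_CLASS
    SET CLASS_ID = 3, CLASS_NAME = 'Medium'
    WHERE QUANTITY_AMOUNT BETWEEN 26 AND 50
""")
after = sales_per_class()

print(before)
print(after)
```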
Making it customizable and accessible by end-users with MDS
Once it is implemented in the DW, why not make it customizable by end-users so they can change the classification whenever there is a need? For that, MDS once again comes in handy. Traditionally one would bury this deep in the ETL logic, and whenever a change was needed an IT specialist had to modify the ETL scripts. Now we just need to set up a special parameter entity in MDS and add some useful attributes, like this:
Create a new entity PARAM_SALES_SIZE_CLASS with attributes MIN_VALUE and MAX_VALUE to define the limit values for each class. These are very easy for end-users to edit. Just make sure that the ranges don’t overlap each other.
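Since end-users edit the limits by hand, it may be worth checking for overlaps before the values are used. The sketch below is one possible check, not something MDS provides. It assumes the entity's members have been exported to a plain table with the columns CLASS_ID, CLASS_NAME, MIN_VALUE and MAX_VALUE, and uses an in-memory SQLite database via Python as a stand-in. The upper limit of 10 000 for the last class is also an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE PARAM_SALES_SIZE_CLASS (
        CLASS_ID INTEGER PRIMARY KEY,
        CLASS_NAME TEXT,
        MIN_VALUE INTEGER,
        MAX_VALUE INTEGER
    )
""")
conn.executemany(
    "INSERT INTO PARAM_SALES_SIZE_CLASS VALUES (?, ?, ?, ?)",
    [
        (1, "Tiny", 1, 10),
        (2, "Small", 11, 50),
        (3, "Medium", 51, 100),
        (4, "Huge", 101, 500),
        (5, "Enormous", 501, 10000),  # assumed upper limit
    ],
)

def find_overlaps(connection):
    """Return pairs of class names whose [MIN_VALUE, MAX_VALUE] ranges intersect."""
    return connection.execute("""
        SELECT a.CLASS_NAME, b.CLASS_NAME
        FROM PARAM_SALES_SIZE_CLASS AS a
        JOIN PARAM_SALES_SIZE_CLASS AS b
          ON a.CLASS_ID < b.CLASS_ID          -- compare each pair only once
         AND a.MIN_VALUE <= b.MAX_VALUE       -- two closed ranges intersect when
         AND b.MIN_VALUE <= a.MAX_VALUE       -- each starts before the other ends
        ORDER BY a.CLASS_ID, b.CLASS_ID
    """).fetchall()

clean = find_overlaps(conn)

# Simulate an editing mistake: Small now reaches into Medium's range.
conn.execute("UPDATE PARAM_SALES_SIZE_CLASS SET MAX_VALUE = 60 WHERE CLASS_NAME = 'Small'")
clashing = find_overlaps(conn)

print(clean)
print(clashing)
```

An empty result means the ranges are safe to use. Any returned pair names the two classes that need fixing.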
Then it is time for some SQL magic. We generate 10 000 rows (or any other value that represents the biggest possible sales quantity size) and assign the class levels by using the minimum and maximum values defined.
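One way to write such a query is sketched below. It runs on an in-memory SQLite database through Python so that it is self-contained. The layout of the parameter table, the column name QUANTITY_AMOUNT and the upper limit of 10 000 for the Enormous class are assumptions. A recursive CTE produces the numbers 1 to 10 000, and a BETWEEN join picks the class for each of them.

```python
import sqlite3

MAX_QUANTITY = 10000  # assumed biggest possible sales quantity

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE PARAM_SALES_SIZE_CLASS (
        CLASS_ID INTEGER PRIMARY KEY,
        CLASS_NAME TEXT,
        MIN_VALUE INTEGER,
        MAX_VALUE INTEGER
    )
""")
conn.executemany(
    "INSERT INTO PARAM_SALES_SIZE_CLASS VALUES (?, ?, ?, ?)",
    [
        (1, "Tiny", 1, 10),
        (2, "Small", 11, 50),
        (3, "Medium", 51, 100),
        (4, "Huge", 101, 500),
        (5, "Enormous", 501, MAX_QUANTITY),
    ],
)

# Generate 1..MAX_QUANTITY and attach the class whose limits contain each number.
dim_rows = conn.execute("""
    WITH RECURSIVE numbers(n) AS (
        SELECT 1
        UNION ALL
        SELECT n + 1 FROM numbers WHERE n < ?
    )
    SELECT numbers.n AS QUANTITY_AMOUNT, p.CLASS_ID, p.CLASS_NAME
    FROM numbers
    JOIN PARAM_SALES_SIZE_CLASS AS p
      ON numbers.n BETWEEN p.MIN_VALUE AND p.MAX_VALUE
    ORDER BY numbers.n
""", (MAX_QUANTITY,)).fetchall()

print(len(dim_rows))
print(dim_rows[9], dim_rows[10], dim_rows[-1])
```

On SQL Server the same idea needs a small adjustment: a recursive CTE stops after 100 levels by default, so the query would need an `OPTION (MAXRECURSION 10000)` hint or a different number generator.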
This will generate 10 000 rows that are ready to be inserted into our dimension table (e.g. D_SALES_SIZE_CLASS). We now have a fully customizable dimension table that business users can edit whenever they need a different point of view for their analysis.