Category Archives: Master Data Services

MDS 2012: Collections not supported in the new staging process

There were lots of good improvements and new features in the 2012 version of MDS, but during our current project I found out some interesting things about collections. According to Microsoft they have improved the collections GUI (which was crap indeed in MDS 2008R2), but for some reason they didn’t improve importing collections via staging tables when they released the completely new staging process.

I was searching for collection-specific staging tables and, confusingly, found none. Then I found the explanation in the following article: Discontinued Master Data Services Features in SQL Server 2012. It clearly says:

You cannot use the new staging process to:

  • Create or delete collections.
  • Add members to or remove members from collections.
  • Reactivate members and collections.

You can use the SQL Server 2008 R2 staging process to work with collections.

So it seems they ran out of time to build a new staging process for collections and decided to keep supporting collection imports the “2008R2 way”. I bet this will be deprecated in the next version, and hopefully they’ll come up with a new staging process for collections in the next SQL Server version. Until then I’ll stick to the old staging tables and the good old mdm.udpStagingSweep stored procedure 🙂

MDS auto-generated Code value – how to import new members via staging tables?

Creating the business rule is straightforward but problems start when it comes to importing data via staging tables.

Recently I had to set up an entity that needed a unique code across the enterprise. In the SQL Server 2012 version of MDS this is achieved quite easily just by checking the “Create Code values automatically” checkbox when creating the entity. In the 2008 R2 version we have to leverage the business rules feature.

Action "defaults to a generated value" comes handy.

First, create a new business rule, choose the “defaults to a generated value” action and drag the Code attribute into the “Edit action” section. Starting the incremental generation from number 1 is fine. Now we are ready, at least if we only use the MDS web user interface to add new members to our entity. In my case I need to import the data from source system(s), and therefore I need to use staging tables.

Importing via staging tables new members that will have an auto-generated code

First try
One can not insert NULL values into the MemberCode column.

First of all, how the hell am I supposed to insert any new members into the staging table when the MemberCode column is not nullable? Luckily I found this MSDN Forum Thread where I was told to assign the legendary “” (empty string) value to MemberCode. OK, that works, but the problem is that I also want to import some additional attributes (other than the mandatory Name and Code attributes) via the tblStgMemberAttribute table, and there I need the unique MemberCode, which is only assigned after the model is validated or the business rules are executed. How am I supposed to assign the correct attributes to the correct member if I don’t know the unique MemberCode because it’s generated after the model is validated? Now what?

How to import additional attribute values if we don't know the Code value when importing?

Second try

We’re now facing a bit of a “chicken-and-egg problem” here. Luckily I came across another blog post that handled the very same issue. The trick is quite simple now that I think about it afterwards. The point is that when new member records are imported to MDS with an empty string as their MemberCode value, MDS assigns a special system GUID to the newly imported members:

Special "#SYS-"-prefixed GUID Codes are assigned to members that were imported with empty string as membercode.

These “#SYS-”-prefixed GUID values are only temporary, because when the business rules are executed MDS replaces them with the value that we defined in the business rule. So how does this help us, since the #SYS values are still assigned AFTER the import and we need them BEFORE we import the rows via the tblStgMemberAttribute table? We simply “emulate” MDS’s internal process by assigning the system GUIDs ourselves already in the loading phase, generating the unique values with, for example, the newid() function in SQL Server.

Generating the #SYS GUID on our own by newid() function call.

Now, by assigning the system GUID ourselves to the tblStgMember and tblStgMemberAttribute tables we can successfully import new members with additional attributes and get the correct auto-generated code values by validating the model.
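To make the idea concrete, here is a minimal Python sketch of the loading step. It is only an illustration under stated assumptions: the exact format MDS expects after the “#SYS-” prefix is not confirmed here, the row dictionaries show only the MemberCode, MemberName, AttributeName and AttributeValue fields (the real staging tables have more columns), and the sample products are made up. The point it shows is that one generated code per source row is reused for both the member row and its attribute rows.

```python
import uuid


def make_sys_code():
    # Assumption: "#SYS-" followed by a GUID, mimicking what newid() would give us.
    return "#SYS-" + str(uuid.uuid4()).upper()


def build_staging_rows(source_rows, name_field):
    """Return (member_rows, attribute_rows) that share the same temporary code."""
    member_rows = []
    attribute_rows = []
    for row in source_rows:
        code = make_sys_code()  # generated once per member, before loading
        member_rows.append({"MemberCode": code, "MemberName": row[name_field]})
        for attr, value in row.items():
            if attr == name_field:
                continue  # Name goes to the member row, not the attribute rows
            attribute_rows.append(
                {"MemberCode": code, "AttributeName": attr, "AttributeValue": value}
            )
    return member_rows, attribute_rows


# Hypothetical source data, for illustration only.
source = [
    {"Name": "Mountain Bike", "Color": "Red", "Size": "L"},
    {"Name": "Road Bike", "Color": "Blue", "Size": "M"},
]
members, attributes = build_staging_rows(source, "Name")
```

The member rows would go to tblStgMember and the attribute rows to tblStgMemberAttribute; validating the model afterwards replaces the temporary codes with the generated ones.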


Pimp my Data Warehouse: MDS-parametrized dimension tables

Every once in a while we need a “classification dimension” table in the Data Warehouse. For example, we have a basic sales fact table with a column like “(Sold)Quantity” that tells how many items were sold within the sales transaction. The data in the column varies between 1 and 10 000 (actually there is no absolute maximum value, but let’s use this to keep things clear). Users often want to have different classifications depending on the Quantity column, for example:

  • 1-10: Tiny sales
  • 11-50: Small sales
  • 51-100: Medium
  • 101-500: Huge
  • 501-nnn: Enormous

Respectively our dimension table D_SALES_SIZE_CLASS would look like this:

CLASS_ID CLASS_NAME
1 Tiny
2 Small
3 Medium
4 Huge
5 Enormous

The problem often is that if we go by the book we need to create a reference ID column, e.g. “SALES_SIZE_CLASS_ID”, in the fact table and then reload the whole sales fact table from scratch. In big environments this can be a problem because the reload could take hours and you might need to arrange a downtime window for the production DW. An alternative is to use computed columns (virtual columns) and a CASE statement to get the correct value for SALES_SIZE_CLASS_ID; that way we avoid reloading the fact table.

But a more comprehensive way of doing this is not to alter the sales fact table at all and instead do all the magic in the dimension table, by generating as many rows in the dimension table as there are distinct values in the Quantity column of the sales fact table. Like this:

QuantityAmount CLASS_ID CLASS_NAME
1 1 Tiny
2 1 Tiny
3 1 Tiny
4 1 Tiny
5 1 Tiny
6 1 Tiny
7 1 Tiny
8 1 Tiny
9 1 Tiny
10 1 Tiny
11 2 Small
12 2 Small
13 2 Small
14 2 Small
48 2 Small
49 2 Small
50 2 Small
51 3 Medium
52 3 Medium
53 3 Medium
9999 5 Enormous
10000 5 Enormous

Whenever the sales classification requirements change, we only need to modify the dimension table rather than load the possibly enormous sales fact table once again from scratch.

Making it customizable and accessible by end-users with MDS

Once this is implemented in the DW, why not make it customizable by end-users so they can change the classification whenever they need to? For that, MDS once again comes in handy. Traditionally one would bury this deep in the ETL logic, and whenever a change was needed an IT specialist would have to modify the ETL scripts. Now we just need to set up a special parameter entity in MDS and add some useful attributes like this:

Sales classification parameter entity in MDS

Create new entity PARAM_SALES_SIZE_CLASS with attributes MIN_VALUE and MAX_VALUE to define the limit values for each class. Very easily editable by end-users. Just make sure that the values don’t overlap each other.
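Since end-users edit the limits by hand, it can be worth checking for overlaps before the dimension is regenerated. The following Python sketch is one possible way to do that; the function name and the tuple layout (class name, MIN_VALUE, MAX_VALUE) are my own choices for illustration and not something MDS provides.

```python
def find_overlaps(classes):
    """Return pairs of class names whose ranges overlap.

    Each class is a (name, min_value, max_value) tuple with min_value <= max_value.
    After sorting by min_value, any overlap shows up between neighbouring ranges,
    so an empty result means that no ranges overlap at all.
    """
    ordered = sorted(classes, key=lambda c: c[1])
    overlaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur[1] <= prev[2]:  # next range starts before the previous one ends
            overlaps.append((prev[0], cur[0]))
    return overlaps


# Limits as they might be entered in PARAM_SALES_SIZE_CLASS (sample values).
classes = [("Tiny", 1, 10), ("Small", 11, 50), ("Medium", 51, 100)]
overlaps = find_overlaps(classes)
```

With the sample values above the list of overlaps is empty; changing the Small range to start at 10 would report the pair Tiny and Small.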

SQL script to generate the sales classification dimension table rows

Then it is time for some SQL magic. We generate 10 000 rows (or any other value that represents the biggest possible sales quantity size) and assign the class levels by using the minimum and maximum values defined.
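The original script is not reproduced here, so the following is only a sketch of the idea, run against an in-memory SQLite database from Python so that it is self-contained. The column layout of the parameter table, the upper limit of 10 000 for the Enormous class and the recursive number generator are assumptions for illustration. On SQL Server the query would need T-SQL syntax instead (for example, no RECURSIVE keyword, and the default recursion limit of 100 would have to be raised with OPTION (MAXRECURSION ...)).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for the MDS parameter entity; in practice this data would come from MDS.
conn.executescript("""
CREATE TABLE PARAM_SALES_SIZE_CLASS (
    CLASS_ID   INTEGER PRIMARY KEY,
    CLASS_NAME TEXT,
    MIN_VALUE  INTEGER,
    MAX_VALUE  INTEGER
);
INSERT INTO PARAM_SALES_SIZE_CLASS VALUES
    (1, 'Tiny',       1,    10),
    (2, 'Small',     11,    50),
    (3, 'Medium',    51,   100),
    (4, 'Huge',     101,   500),
    (5, 'Enormous', 501, 10000);
""")

# Generate the numbers 1..10000 and attach the class whose range contains each one.
rows = conn.execute("""
WITH RECURSIVE numbers(n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM numbers WHERE n < 10000
)
SELECT numbers.n AS QuantityAmount, p.CLASS_ID, p.CLASS_NAME
FROM numbers
JOIN PARAM_SALES_SIZE_CLASS AS p
  ON numbers.n BETWEEN p.MIN_VALUE AND p.MAX_VALUE
ORDER BY numbers.n
""").fetchall()
```

Because the join uses only MIN_VALUE and MAX_VALUE, changing the limits in the parameter entity and rerunning the query is enough to reclassify every quantity.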

SQL query results ready for loading to dimension table

This will generate 10 000 rows that are ready to insert into our dimension table (e.g. D_SALES_SIZE_CLASS). We now have a fully customizable dimension table that business users can edit whenever they need a different point of view for their analysis.


Master Data Services 2012 is a must – Huge productivity improvements

Compared to the 2008R2 version you can now save as much as 75% of development time.

Microsoft just released the new SQL Server 2012 version. There are lots of new features included, but one major improvement concerns the Master Data Services (MDS) application. It has now reached its second major release and ships with the SQL Server 2012 Enterprise and Business Intelligence licenses. I won’t go into much detail, but what I want to point out here is that the new 2012 version of MDS has remarkable improvements when it comes to user experience and the productivity of development work.

The biggest and maybe the most important improvement is the new Excel add-in, which can be used to modify data but also to create new entities and to import new data from scratch. Where the 2008R2 version relies only on the (clumsy) database staging tables, in 2012 one can now use Excel to import batches of new data. This is of course good news for new users who are already familiar with Excel. The second big improvement is the new Silverlight-based web user interface. It has been completely renewed and I can tell that it is now really ready for end users. Unfortunately the initial 2008R2 version still suffers from nasty bugs and the usability is quite poor.

What I will now show you is a comparison between MDS2008R2 and MDS2012 and how they differ from each other when importing new data and creating entities.

Creating new entities and importing data

Here is the setup: I have a simple Product structure that I grabbed from the AdventureWorks sample database. It consists of 5 entities stored in separate csv files:

  • ProductCategory
  • ProductSubCategory
  • ProductModel
  • Color
  • Product

What we will do next:

  • Create a new Model to store the entities
  • Create 5 entities and the corresponding attributes + relationships: Product – SubCategory – Category, Product – Color and Product – ProductModel
  • Import data into entities
  • Create a derived hierarchy ProductCategory – ProductSubCategory – Product

Master Data Services 2008 R2

Create entities manually

In the old version we have to do it the hard way: first create the new entities in the web UI and then import the data via database staging tables using a proper ETL tool. I preferred SSIS.

This phase took me approximately 6 minutes to complete. Now I have the entities structure ready in MDS so it’s time to import some data.

Import data by using SSIS and the MDS staging tables

This is the most time-consuming part of the process. You have to create a separate data flow for each entity, and you also must do some unpivoting of the data when importing entity attributes. After each data flow you must go to the Import/Export page in the web UI and start the import process manually. (OK, you can do all this automatically by using the web service calls and all that stuff, but this approach is still the fastest way at this point.)

Importing the attributes for each entity is a bit tricky as you need to unpivot the source data into separate rows.
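I did the unpivoting in SSIS, but the transformation itself is simple enough to sketch in a few lines of Python. The function name, the field names and the sample products below are made up for illustration; the output has one row per member and attribute, which is the shape the 2008R2 attribute staging table expects.

```python
def unpivot(rows, code_field, skip_fields=()):
    """Yield one (code, attribute name, attribute value) tuple per attribute."""
    for row in rows:
        for field, value in row.items():
            if field == code_field or field in skip_fields:
                continue  # the code identifies the member, it is not an attribute row
            yield (row[code_field], field, value)


# Hypothetical wide source rows: one row per product, one column per attribute.
products = [
    {"Code": "BK-1", "Name": "Mountain Bike", "Color": "Red", "ProductModel": "M-100"},
    {"Code": "BK-2", "Name": "Road Bike", "Color": "Blue", "ProductModel": "R-200"},
]

# Name is loaded together with the member itself, so it is skipped here.
attribute_rows = list(unpivot(products, "Code", skip_fields=("Name",)))
```

Two products with two extra attributes each turn into four attribute rows, which is why every additional attribute adds to the amount of data flow work in 2008R2.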

After a while and some serious SSIS-work we have 5 entities and a nice Product hierarchy set up in MDS:

The bad thing is that it took almost 30 minutes to accomplish all this. With all the work that you have to do in the MDS web UI we end up with approximately 40 minutes of work in total.

Master Data Services 2012

Create entities and import data

Now we’re talking. With MDS2012 you don’t actually have to separate the phases of creating entities and importing data, because now you can do both at the same time! Starting with the ProductCategory entity, we just import the csv data into our Excel sheet, connect to the MDS server and hit the Create Entity button on the MDS ribbon.

Then we just choose the right columns for code and name and we’re ready. As easy as that!

Handle domain-based attributes

The first thing that came to my mind when I heard about the new Excel add-in was whether it was capable of dealing with domain-based attributes. And guess what, it sure is. We do the same for ProductSubCategory as we did for the ProductCategory entity. Notice that there is a relation between SubCategory and Category, and we have to handle it correctly as well. That can be managed by using the Attribute Properties functionality in the MDS ribbon.

Choose Constrained list (Domain-based) as the attribute type and populate values straight from the recently created ProductCategory entity:

Nice and easy! We have now successfully created the ProductCategory and ProductSubCategory entities and formed a relationship between them, all in about 3 minutes. After repeating the same steps for the rest of the entities (ProductModel, Color and Product) we end up with the same result as in 2008R2, but in only 10 minutes of development time. That saves 75% of the time compared to the process in MDS2008R2.


Step             MDS 2008R2   MDS 2012
Create model     1 min        1 min
Create entities  6 min        (done during import)
Import data      30 min       10 min
Other            4 min        -
Total            41 min       11 min

Extra mentions about MDS2012

When it comes to automating data imports, the staging tables come into the picture in MDS 2012 as well. They are now more user friendly, since you don’t have to import entity rows and the corresponding attributes into separate tables, and you don’t have to unpivot the columns into rows for the tblStgMemberAttribute table. In MDS2012 there is a separate staging table for each entity and it is 1:1 with the entity definition.

What really completes the whole package is the equally brand-new Data Quality Services (DQS) application that works nicely together with MDS. More about DQS later …

Summary

When Microsoft launched the initial version of MDS in May 2010 it was a classic “first version” of the (recently acquired) product: missing features and minor bugs here and there. Now the second major release really finds its place in the hearts of users and developers, since there are lots of really good improvements that make it a better product.

So, if you are planning to start a fresh MDS project, my honest advice is: don’t start with 2008R2, do it with SQL Server 2012.

-GD