I’ve seen a few data quality issues recently in a some client data warehouses/marts where I’ve been asked to carry out some Analysis Services work. These data quality issues have often led to the classic SSAS dimension processing error, namely:
“Errors in the OLAP storage engine: A duplicate attribute key has been found when processing: Table: ‘dbo_DimProduct’, Column: ‘SubCategory’, Value: ‘Other’. The attribute is ‘Sub Category'”
Normally this error is easy to track down, but there are a few scenarios that I’ve encountered where SSAS attribute properties actually make it harder for you to troubleshoot.
Scenario 1 – Trimming
When hitting the above error, the first thing to do would be to run a query to see where the duplicate attribute is. In my example, I’ve just produced a very simple dimension that’s loosely based on Adventure Works – here are the attribute relationships:
Therefore the first place to start would be to write a query that checks if we have any Sub Categories that are duplicated across Categories, such as:
SELECT SubCategory FROM DimProduct GROUP BY SubCategory HAVING COUNT(DISTINCT Category) > 1
In my case, this actually yields no results. So what’s happening here? We can’t find any duplicates in SQL, but SSAS thinks there are duplicates. Well….Analysis Services dimension attributes contain a property called Trimming, which, by default, will remove spaces at the end of an attribute key or name. Needless to say, a dimension shouldn’t actually contain leading or trailing spaces, but in this specific scenario, the ETL wasn’t as robust as it could be. In the actual example I encountered, there was a tab at the end of one of the attribute names. To visualise that in the DimProduct example that I’ve created, the highlighted attribute has a tab at the end:
Scenario 2 – NullProcessing
The next scenario I want to highlight is the same error, but a different dimension. Again I’m going to create a simple example to illustrate what actually happened. This time I’ve got a customer dimension table, as follows:
The error is: A duplicate attribute key has been found when processing: Table: ‘dbo_DimCustomer’, Column: ‘CustomerCountryKey’, Value: ‘0’. The attribute is ‘Customer Country’
A quick query on the above data will reveal that a CustomerCountryKey of 0 doesn’t actually exist twice for the same country, so what’s happened here? Again its due to a dimension property, this time on the attribute key, called NullProcessing. As its default is Automatic, it means that Nulls will get converted to zero during dimension processing. As we already have a member with CustomerCountryKey of 0, then we get an error.
ETL and Data Quality
These sort of issues highlight just why handling data quality during the ETL is so important! For the first scenario, some basic trimming/cleaning can fix the issue before it hits Analysis Services. As for the last issue, this is partly due to data warehouse/mart design. Personally I would never allow a dimension attribute to be null, partly because you shouldn’t present null attributes to a user, but also due to the lack of control that can result as shown above. Therefore a bit of design work up front to ensure that you have a robust data model is essential. Interestingly the properties that I’ve mentioned don’t exist for tables in Analysis Services Tabular. Whereas I think there are a few SSAS MD features that need to make it over into the next version of Tabular, I can personally live without the two properties outlined above.