I recently needed to build a screening pattern based on the Kimball “Screening” concept. One of the desired outputs was a Power BI report that lets a data steward easily identify a failed screen, drill down to the detail and see the row(s) rejected by the data quality rule. To achieve this, I had to mimic in Power BI an SSRS feature called dynamic dataset, so that a single matrix shows different source columns depending on the selected screen. Feeling curious already? Then let’s dive into the details.
A screen is designed to operate on a single input file or database table and contains the data quality condition to check. For example, there could be a Missing Customer Postcode screen which would test for any customers who are missing a postcode.
Let’s assume I have two SQL tables, Customer and ShippingAddress. Following the above logic, I would have two screens with the following conditions: SELECT * FROM Stage.Customer WHERE Postcode IS NULL and SELECT * FROM Stage.ShippingAddress WHERE Postcode IS NULL. The output of these screens would be two different structures, as shown below:
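To make the "two different structures" concrete, here is a minimal sketch that runs both screens against an in-memory SQLite database from Python. The column names (CustomerId, Name, AddressId, Street, City), the sample rows and the screen names are assumptions for illustration only; the real staging tables will differ, and SQLite stands in for the actual database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Attach a second in-memory database so the Stage.<table> names resolve.
conn.execute("ATTACH DATABASE ':memory:' AS Stage")
conn.executescript("""
CREATE TABLE Stage.Customer (CustomerId INTEGER, Name TEXT, Postcode TEXT);
CREATE TABLE Stage.ShippingAddress (AddressId INTEGER, Street TEXT, City TEXT, Postcode TEXT);
INSERT INTO Stage.Customer VALUES (1, 'Ann', 'AB1 2CD'), (2, 'Bob', NULL);
INSERT INTO Stage.ShippingAddress VALUES (10, '1 High St', 'Leeds', NULL),
                                         (11, '2 Low Rd', 'York', 'YO1 1AA');
""")

# One screen per table, each holding its data quality condition (hypothetical names).
screens = {
    "Missing Customer Postcode": "SELECT * FROM Stage.Customer WHERE Postcode IS NULL",
    "Missing Shipping Postcode": "SELECT * FROM Stage.ShippingAddress WHERE Postcode IS NULL",
}

def run_screen(sql):
    """Return (column names, rejected rows) for one screen."""
    cursor = conn.execute(sql)
    columns = [description[0] for description in cursor.description]
    return columns, cursor.fetchall()

results = {name: run_screen(sql) for name, sql in screens.items()}

for name, (columns, rows) in results.items():
    print(name, columns, rows)
```

Each screen returns a different column list, which is why the outputs cannot simply be appended into one conventional table.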
To allow the data steward to drill down to the detail, the output of every screen had to be stored in a single table. To achieve this, the output of each screen was converted to XML and stored in a column of the XML data type. One important point to note is that the generated XML omits fields whose value is NULL, which means that, unless the NULL Postcode is first converted to a string (for example, the literal text NULL), the column will not be captured in the XML structure.
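The NULL behaviour is easy to reproduce outside the database. The sketch below is a Python analogue, not the SQL used to generate the XML: it serialises a rejected row as a `Table` element with one attribute per column, which is an assumption about the XML shape. The `row_to_xml` helper and its `null_token` parameter are hypothetical names.

```python
import xml.etree.ElementTree as ET

def row_to_xml(row, null_token=None):
    """Serialise one rejected row as <Table .../> with one attribute per column.

    Columns whose value is None are skipped, mirroring XML that omits NULLs,
    unless null_token supplies a replacement string.
    """
    table = ET.Element("Table")
    for column, value in row.items():
        if value is None:
            if null_token is None:
                continue  # the column disappears from the XML
            value = null_token
        table.set(column, str(value))
    return ET.tostring(table, encoding="unicode")

row = {"CustomerId": 2, "Name": "Bob", "Postcode": None}

dropped = row_to_xml(row)                   # Postcode is missing from the XML
kept = row_to_xml(row, null_token="NULL")   # Postcode survives as the text NULL

print(dropped)
print(kept)
```

Without the replacement, the very column the screen is testing would vanish from the stored result, so the steward could not see it in the report.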
The next step can be completed either in the database or in Power BI. The idea is to extract the XML structure and then unpivot the column names into a column named Attribute and the values into a column named Value. To do that in Power BI, start by importing the table holding the XML results and then transform the data as follows:
Parse the column XmlResults to XML
Expand the Table columns
Expand the Table columns one more time
Rename the columns to remove the prefix “XmlResults.Table.Attribute”. Once completed, select all the columns and unpivot them.
You can see that for Customer, only the relevant columns were selected
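For readers who prefer to do this step outside Power BI, the same parse-then-unpivot logic can be sketched in Python. The stored XML shape, the sample rows and the `Screen` and `RowId` columns are assumptions for illustration; this approximates the Power Query steps above rather than reproducing them.

```python
import xml.etree.ElementTree as ET

# Hypothetical content of the results table: (screen name, XML payload).
stored = [
    ("Missing Customer Postcode",
     '<Table CustomerId="2" Name="Bob" Postcode="NULL" />'),
    ("Missing Shipping Postcode",
     '<Table AddressId="10" Street="1 High St" City="Leeds" Postcode="NULL" />'),
]

def unpivot(stored_rows):
    """Turn each XML attribute into one (Attribute, Value) row."""
    long_rows = []
    for row_id, (screen, xml_text) in enumerate(stored_rows, start=1):
        for attribute, value in ET.fromstring(xml_text).attrib.items():
            long_rows.append({
                "Screen": screen,
                "RowId": row_id,
                "Attribute": attribute,
                "Value": value,
            })
    return long_rows

long_rows = unpivot(stored)

for record in long_rows:
    print(record)
```

After unpivoting, every screen shares the same four-column shape, whatever the source table looked like.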
Close and apply the transformations. Add a matrix to the canvas, then add “Attribute” to “Columns” and “Value” to “Rows”. When a screen is selected, only the relevant columns are shown. This happens because the matrix hides items with no data, so attributes that have no values for the selected screen are not displayed.
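The effect of the matrix can be approximated in a few lines of Python: filter the unpivoted rows by the selected screen, then build columns only from the attributes that still have data. The sample rows and the `matrix_for` helper are hypothetical, and this is a rough model of the visual's behaviour, not of how Power BI works internally.

```python
# Hypothetical unpivoted rows, as produced by the Attribute/Value transformation.
long_rows = [
    {"Screen": "Missing Customer Postcode", "RowId": 1, "Attribute": "CustomerId", "Value": "2"},
    {"Screen": "Missing Customer Postcode", "RowId": 1, "Attribute": "Name", "Value": "Bob"},
    {"Screen": "Missing Customer Postcode", "RowId": 1, "Attribute": "Postcode", "Value": "NULL"},
    {"Screen": "Missing Shipping Postcode", "RowId": 2, "Attribute": "AddressId", "Value": "10"},
    {"Screen": "Missing Shipping Postcode", "RowId": 2, "Attribute": "City", "Value": "Leeds"},
    {"Screen": "Missing Shipping Postcode", "RowId": 2, "Attribute": "Postcode", "Value": "NULL"},
]

def matrix_for(rows, screen):
    """Return (column headers, data rows) for the selected screen only."""
    selected = [r for r in rows if r["Screen"] == screen]
    # Only attributes that have data for this screen become columns.
    columns = list(dict.fromkeys(r["Attribute"] for r in selected))
    cells_by_row = {}
    for r in selected:
        cells_by_row.setdefault(r["RowId"], {})[r["Attribute"]] = r["Value"]
    data = [[cells.get(c, "") for c in columns] for cells in cells_by_row.values()]
    return columns, data

customer_columns, customer_data = matrix_for(long_rows, "Missing Customer Postcode")
shipping_columns, shipping_data = matrix_for(long_rows, "Missing Shipping Postcode")

print(customer_columns, customer_data)
print(shipping_columns, shipping_data)
```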
As always, if you have any questions, feel free to leave a comment.