I recently needed to build a screening pattern based on the Kimball “Screening” concept. One of the desired outputs was a Power BI report that let a data steward easily identify a failed screen, drill down to the detail and see the row(s) rejected by the data quality rule. To achieve this, I had to mimic in Power BI an SSRS feature called dynamic dataset, so that a single matrix could show different source columns depending on the selected screen. Feeling curious already? Then let’s dive into the details.
A screen is designed to operate on a single input file or database table and contains the data quality condition to check. For example, there could be a Missing Customer Postcode screen which would test for any customers who are missing a postcode.
Let’s assume I have two SQL tables, Customer and ShippingAddress. Following the above logic, I would have two screens with the following conditions: SELECT * FROM Stage.Customer WHERE Postcode IS NULL and SELECT * FROM Stage.ShippingAddress WHERE Postcode IS NULL. The output of these screens would be two different structures, because each table has its own set of columns.
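To make the two screens concrete, here is a minimal sketch that runs them against an in-memory SQLite database from Python. The column names (CustomerId, Name, AddressId, City) and the sample rows are assumptions made up for illustration, and the Stage schema prefix is dropped because SQLite has no schemas.

```python
import sqlite3

# In-memory stand-in for the staging database (hypothetical columns and rows).
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript("""
CREATE TABLE Customer (CustomerId INTEGER, Name TEXT, Postcode TEXT);
CREATE TABLE ShippingAddress (AddressId INTEGER, City TEXT, Postcode TEXT);
INSERT INTO Customer VALUES (1, 'Alice', NULL), (2, 'Bob', 'AB1 2CD');
INSERT INTO ShippingAddress VALUES (7, 'Leeds', NULL), (8, 'York', 'YO1 7HH');
""")

# One screen per table; each screen is just the data quality condition to check.
screens = {
    "Missing Customer Postcode": "SELECT * FROM Customer WHERE Postcode IS NULL",
    "Missing Shipping Address Postcode": "SELECT * FROM ShippingAddress WHERE Postcode IS NULL",
}

# Rejected rows per screen; note that the two results have different columns.
rejected = {
    name: [dict(row) for row in conn.execute(sql)]
    for name, sql in screens.items()
}
```

Each screen returns only the rows that fail its rule, and the two result sets do not share a structure, which is why they cannot simply be stacked in one table.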
To allow the data steward to drill down to the detail, the output of the screens had to be stored in a single table. To achieve this, the output of each screen was converted to XML and stored in a column with the XML data type. One important point to note is that NULL values are not written to the XML, which means that, if the Postcode is not converted to a string (for example, the literal text NULL), the column will not be captured in the XML structure.
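The effect of NULL values on the XML can be sketched in a few lines of Python. This is an illustration only, not the database-side conversion used in the post: the helper `row_to_xml`, the `<Table>` element name and the sample row are assumptions.

```python
import xml.etree.ElementTree as ET

def row_to_xml(row, null_placeholder=None):
    """Serialise one rejected row as a <Table> element, one child per column.

    Columns whose value is None are skipped, unless a placeholder string
    is supplied to stand in for the missing value.
    """
    table = ET.Element("Table")
    for column, value in row.items():
        if value is None:
            if null_placeholder is None:
                continue  # the column disappears from the XML
            value = null_placeholder
        ET.SubElement(table, column).text = str(value)
    return ET.tostring(table, encoding="unicode")

customer = {"CustomerId": 1, "Name": "Alice", "Postcode": None}

without_placeholder = row_to_xml(customer)
with_placeholder = row_to_xml(customer, null_placeholder="NULL")
```

Without a placeholder the Postcode element is missing entirely, so the report would never show the very column the screen is about. With the placeholder, the column survives as `<Postcode>NULL</Postcode>`.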
The next step can be completed either in the database or in Power BI. The idea is to extract the XML structure and then unpivot the column names into a column named Attribute and the values into a column named Value. To do that in Power BI, start by importing the table holding the XML results and then transform the data as follows:
Parse the column XmlResults to XML
Expand the Table columns
Expand the Table columns one more time
Rename the columns to remove the prefix “XmlResults.Table.Attribute”. Once completed, select all the columns and unpivot them.
You can see that, for Customer, only the relevant columns were selected.
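For readers who would rather do this step outside Power BI, the same parse-and-unpivot logic can be sketched in Python. The XML layout (a `<Results>` root wrapping `<Table>` rows), the screen names and the sample values are assumptions for illustration; the real structure depends on how the XML was generated.

```python
import xml.etree.ElementTree as ET

# Hypothetical stored screen results: (screen name, XML text).
screen_results = [
    ("Missing Customer Postcode",
     "<Results><Table><CustomerId>1</CustomerId><Name>Alice</Name>"
     "<Postcode>NULL</Postcode></Table></Results>"),
    ("Missing Shipping Address Postcode",
     "<Results><Table><AddressId>7</AddressId><City>Leeds</City>"
     "<Postcode>NULL</Postcode></Table></Results>"),
]

def unpivot(screen_name, xml_text):
    """Turn every column of every rejected row into one Attribute/Value record."""
    records = []
    root = ET.fromstring(xml_text)
    for row_number, table in enumerate(root.findall("Table"), start=1):
        for column in table:
            records.append({
                "Screen": screen_name,
                "Row": row_number,
                "Attribute": column.tag,
                "Value": column.text,
            })
    return records

unpivoted = [
    record
    for name, xml_text in screen_results
    for record in unpivot(name, xml_text)
]
```

After unpivoting, both screens share one shape (Screen, Row, Attribute, Value), even though their source tables have different columns.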
Close and apply the transformations. Add a matrix to the canvas and add “Attribute” to “Columns” and “Value” to “Rows”. When selecting a screen, only the relevant columns are shown. This happens because the Columns in the matrix are set to “hide items with no data”.
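The matrix behaviour can be imitated in a short Python sketch: filter the unpivoted rows by screen, then build columns only from the attributes that actually have data. The function `matrix_for` and the sample records are hypothetical, and this is only an analogy for what the visual does, not how Power BI implements it.

```python
# Hypothetical unpivoted rows, in the Screen / Row / Attribute / Value shape.
unpivoted = [
    {"Screen": "Missing Customer Postcode", "Row": 1, "Attribute": "CustomerId", "Value": "1"},
    {"Screen": "Missing Customer Postcode", "Row": 1, "Attribute": "Name", "Value": "Alice"},
    {"Screen": "Missing Customer Postcode", "Row": 1, "Attribute": "Postcode", "Value": "NULL"},
    {"Screen": "Missing Shipping Address Postcode", "Row": 1, "Attribute": "AddressId", "Value": "7"},
    {"Screen": "Missing Shipping Address Postcode", "Row": 1, "Attribute": "City", "Value": "Leeds"},
    {"Screen": "Missing Shipping Address Postcode", "Row": 1, "Attribute": "Postcode", "Value": "NULL"},
]

def matrix_for(screen, records):
    """Pivot the records of one screen back into column headers plus body rows."""
    selected = [r for r in records if r["Screen"] == screen]

    # Only attributes with data for this screen become columns,
    # mirroring "hide items with no data".
    columns = []
    for r in selected:
        if r["Attribute"] not in columns:
            columns.append(r["Attribute"])

    # Group the values by source row, then lay them out in column order.
    by_row = {}
    for r in selected:
        by_row.setdefault(r["Row"], {})[r["Attribute"]] = r["Value"]
    body = [[by_row[key].get(c, "") for c in columns] for key in sorted(by_row)]
    return columns, body

customer_columns, customer_body = matrix_for("Missing Customer Postcode", unpivoted)
shipping_columns, shipping_body = matrix_for("Missing Shipping Address Postcode", unpivoted)
```

Selecting the customer screen yields CustomerId, Name and Postcode as columns, while the shipping screen yields AddressId, City and Postcode, all from the same underlying table.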
As always, if you have any questions, feel free to leave a comment.