I recently needed to build a screening pattern based on the Kimball “Screening” concept. One of the desired outputs was a Power BI report that allowed a data steward to easily identify a failed screen, drill down to the detail and see the row(s) rejected by the data quality rule. To achieve this goal, I had to mimic in Power BI an SSRS functionality called dynamic dataset, which let me use a single matrix to show different source columns depending on the selected screen. Feeling curious already? Then let’s dive into the details.
A screen is designed to operate on a single input file or database table and contains the data quality condition to check. For example, there could be a Missing Customer Postcode screen which would test for any customers who are missing a postcode.
Let’s assume I have two SQL tables, Customer and ShippingAddress. Following the above logic, I would have two screens with the following conditions: SELECT * FROM Stage.Customer WHERE Postcode IS NULL and SELECT * FROM Stage.ShippingAddress WHERE Postcode IS NULL. The output of these screens would be two different structures, as shown below:
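To make the two result structures concrete, here is a minimal sketch in Python using an in-memory SQLite database as a stand-in for the staging tables. The column names, sample rows and screen names are assumptions made for illustration, and the Stage schema prefix is dropped because SQLite has no equivalent here.

```python
import sqlite3

# In-memory stand-in for the staging tables; columns and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustomerId INTEGER, Name TEXT, Postcode TEXT);
CREATE TABLE ShippingAddress (AddressId INTEGER, Street TEXT, City TEXT, Postcode TEXT);
INSERT INTO Customer VALUES (1, 'Alice', 'AB1 2CD'), (2, 'Bob', NULL);
INSERT INTO ShippingAddress VALUES
    (10, '1 High Street', 'London', NULL),
    (11, '2 Low Road', 'Leeds', 'LS1 1AA');
""")

# One screen per table, each holding its data quality condition.
SCREENS = {
    "Missing Customer Postcode":
        "SELECT * FROM Customer WHERE Postcode IS NULL",
    "Missing Shipping Address Postcode":
        "SELECT * FROM ShippingAddress WHERE Postcode IS NULL",
}

def run_screen(name):
    """Run a screen and return (column names, rejected rows)."""
    cursor = conn.execute(SCREENS[name])
    columns = [description[0] for description in cursor.description]
    return columns, cursor.fetchall()

for name in SCREENS:
    columns, rows = run_screen(name)
    print(name, columns, rows)
```

Each screen returns a different set of columns, which is why the results cannot simply be appended into one ordinary table.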
To allow the data steward to drill down to the detail, the output of the screens had to be stored in a single table. To achieve this, the output of each screen was converted to XML and stored in a column with the XML data type. One important point to note is that XML does not support NULL fields, which means that, if the Postcode is not converted to a NULL string, the column will not be captured in the XML structure.
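The effect of the NULL handling can be sketched in Python. This is not the SQL used to produce the XML; it only mimics a serialiser that leaves out NULL columns. The Table element name, the sample row and the choice of the literal text NULL as the placeholder are assumptions.

```python
import xml.etree.ElementTree as ET

def row_to_xml(row, null_placeholder=None):
    """Serialise one rejected row as a <Table> element, one child per column.

    When null_placeholder is None, columns holding None are skipped,
    mimicking a serialiser that drops NULL columns.
    """
    table = ET.Element("Table")
    for column, value in row.items():
        if value is None:
            if null_placeholder is None:
                continue  # the column disappears from the XML
            value = null_placeholder
        ET.SubElement(table, column).text = str(value)
    return ET.tostring(table, encoding="unicode")

rejected = {"CustomerId": 2, "Name": "Bob", "Postcode": None}

without_fix = row_to_xml(rejected)
with_fix = row_to_xml(rejected, null_placeholder="NULL")

print(without_fix)  # no Postcode element
print(with_fix)     # Postcode element present with the placeholder text
```

Without the placeholder, the very column the screen is testing would be missing from the stored result.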
The next step can be completed either in the database or in Power BI. The idea is to extract the XML structure and then unpivot the column names into a column named Attribute and the values into a column named Value. To do that in Power BI, start by importing the table holding the XML results and then transform the data as follows:
Parse the column XmlResults to XML
Expand the Table columns
Expand the Table columns one more time
Rename the columns to remove the prefix “XmlResults.Table.Attribute”. Once completed, select all the columns and Unpivot them.
You can see that, for Customer, only the relevant columns were selected.
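The parse-and-unpivot logic behind these steps can be sketched outside Power BI as well. The Python below is an illustration only, not the Power Query transformation itself: the Screen and RowId fields, the Table element name and the sample XML are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the table that stores the screen results.
screen_results = [
    {"Screen": "Missing Customer Postcode",
     "XmlResults": "<Table><CustomerId>2</CustomerId><Name>Bob</Name>"
                   "<Postcode>NULL</Postcode></Table>"},
    {"Screen": "Missing Shipping Address Postcode",
     "XmlResults": "<Table><AddressId>10</AddressId><City>London</City>"
                   "<Postcode>NULL</Postcode></Table>"},
]

def unpivot(results):
    """Turn each XML result into one (Attribute, Value) record per column."""
    long_rows = []
    for row_id, result in enumerate(results):
        table = ET.fromstring(result["XmlResults"])  # parse the XML column
        for element in table:                         # one child per source column
            long_rows.append({
                "Screen": result["Screen"],
                "RowId": row_id,
                "Attribute": element.tag,
                "Value": element.text,
            })
    return long_rows

for record in unpivot(screen_results):
    print(record)
```

After the unpivot, every screen shares the same four-column shape, regardless of how many source columns it had.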
Close and apply the transformations. Add a matrix to the canvas and add “Attribute” to “Columns” and “Value” to “Rows”. When selecting a screen, only the relevant columns are shown. This happens because the Columns in the matrix are set to “hide items with no data”.
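Why only the relevant columns appear can be illustrated with a small Python sketch that imitates the matrix behaviour: filter the unpivoted rows by screen, then build columns only from the attributes that still have data. The tuple layout (screen, row id, attribute, value) and the sample data are assumptions, and this is a model of the idea rather than how Power BI renders the visual.

```python
# Unpivoted rows as (Screen, RowId, Attribute, Value); sample data is hypothetical.
long_rows = [
    ("Missing Customer Postcode", 0, "CustomerId", "2"),
    ("Missing Customer Postcode", 0, "Name", "Bob"),
    ("Missing Customer Postcode", 0, "Postcode", "NULL"),
    ("Missing Shipping Address Postcode", 1, "AddressId", "10"),
    ("Missing Shipping Address Postcode", 1, "City", "London"),
    ("Missing Shipping Address Postcode", 1, "Postcode", "NULL"),
]

def matrix_for(screen, rows):
    """Return (column headers, body) for the selected screen only."""
    selected = [row for row in rows if row[0] == screen]
    # Only attributes with data for this screen become columns (first-seen order).
    columns = list(dict.fromkeys(attribute for _, _, attribute, _ in selected))
    cells_by_row = {}
    for _, row_id, attribute, value in selected:
        cells_by_row.setdefault(row_id, {})[attribute] = value
    body = [[cells.get(column, "") for column in columns]
            for _, cells in sorted(cells_by_row.items())]
    return columns, body

print(matrix_for("Missing Customer Postcode", long_rows))
print(matrix_for("Missing Shipping Address Postcode", long_rows))
```

Attributes that belong to the other screen never make it into the column list, which is the same effect the matrix produces when a screen is selected.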
As always, if you have any questions, feel free to leave a comment.