Recently, while working on a project, I noticed that many of the packages were using the pattern below to deal with failed lookups. I have seen this in many organisations, and whilst it isn't something new, I wanted to write about it because many people seem unaware that a better-performing alternative exists.
The pattern redirects rows with no match to the Lookup No Match Output, assigns the Unknown member key in a Derived Column, and then merges the two streams back together with a Union All before the destination. It looks like the below:
On initial inspection this may seem like a sensible way to deal with unknown members: it performs the required lookup and calculates an Unknown key where the lookup fails. However, we must remember that the Union All transformation is a semi-blocking, asynchronous transformation. Rather than working on rows in place, it copies them into new buffers, which slows the flow of data as it passes through (see the end of this post for further information). Using it therefore reduces the performance of SSIS packages, and we should avoid it wherever a better alternative exists.
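To make the shape of this pattern concrete, here is a minimal Python model of it. This is not SSIS code. The DIM_CUSTOMER mapping, its contents and the Unknown key of -1 are all hypothetical. The point is that the rows are split into two streams, which then have to be merged back together.

```python
from typing import Dict, List, Tuple

# Hypothetical dimension lookup: business key -> surrogate key.
DIM_CUSTOMER: Dict[str, int] = {"C001": 1, "C002": 2}
UNKNOWN_KEY = -1  # assumed surrogate key of the Unknown member


def redirect_and_union(rows: List[str]) -> List[Tuple[str, int]]:
    """Model of Lookup -> No Match Output -> Derived Column -> Union All."""
    match_output: List[Tuple[str, int]] = []
    no_match_output: List[str] = []
    for business_key in rows:
        if business_key in DIM_CUSTOMER:
            match_output.append((business_key, DIM_CUSTOMER[business_key]))
        else:
            no_match_output.append(business_key)
    # Derived Column on the no-match branch assigns the Unknown key.
    unknown_rows = [(key, UNKNOWN_KEY) for key in no_match_output]
    # Union All: both streams are copied into a new combined output;
    # the original row order is not preserved.
    return match_output + unknown_rows


print(redirect_and_union(["C001", "X999", "C002"]))
```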
As it happens, there is a very simple alternative to this pattern. We set the Lookup to "Ignore failure" when handling rows with no matching entries, so rows where the lookup fails continue through with a NULL value in the looked-up columns.
All rows now flow down the Lookup Match Output, with a NULL value wherever the lookup failed. The final step is to add a Derived Column transformation before the insert into the target table, replacing the NULL values with an expression as per the below:
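Below is a minimal Python model of the revised flow, again using a hypothetical DIM_CUSTOMER mapping and an assumed Unknown key of -1. In the package itself, the Derived Column would typically use an SSIS expression along the lines of ISNULL(CustomerKey) ? -1 : CustomerKey, where CustomerKey is a placeholder for whichever column the lookup returns.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical dimension lookup: business key -> surrogate key.
DIM_CUSTOMER: Dict[str, int] = {"C001": 1, "C002": 2}
UNKNOWN_KEY = -1  # assumed surrogate key of the Unknown member


def lookup_ignore_failure(rows: List[str]) -> List[Tuple[str, Optional[int]]]:
    # Lookup set to "Ignore failure": every row stays on the match output,
    # and unmatched rows carry None (NULL) in the looked-up column.
    return [(key, DIM_CUSTOMER.get(key)) for key in rows]


def derived_column(rows: List[Tuple[str, Optional[int]]]) -> List[Tuple[str, int]]:
    # Equivalent of ISNULL(CustomerKey) ? -1 : CustomerKey, applied row by row.
    return [(key, UNKNOWN_KEY if sk is None else sk) for key, sk in rows]


print(derived_column(lookup_ignore_failure(["C001", "X999", "C002"])))
```

Unlike the previous model, everything happens in a single pass over one stream and row order is preserved. This loosely mirrors why a synchronous Derived Column is cheaper than an asynchronous Union All.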
The package now looks like this:
When I run both versions of the package against my test data (10,000 rows, of which 50% fail the lookup), the version without the Union All transformation is 30% faster. It also produces a simpler flow that is easier to follow. For a full list of SSIS components categorised as non-blocking, semi-blocking and fully blocking, there is a great blog post here: http://sqlblog.com/blogs/jorg_klein/