Introduced in SQL 2008 the merge function is a useful way of inserting, updating and deleting data inside one SQL statement.
In the example below I have 2 tables one containing historical data using type 2 SCD (Slowly changing dimensions) called DimBrand and another containing just the latest dimension data called LatestDimBrand. Using the merge function I will insert new records from LatestDimBrand into DimBrand, I will archive (apply an end date) to any DimBrand records which do not appear in the latest data, and finally enter a new record and archive the old record for any Brands which may have changed.
DimBrand (the target of our Inserts, Updates, Deletes) and DimLatestBrand (the source for Inserts,Updates,Deletes):
The merge code in it’s entirety:
INSERT #DimBrand ([BrandCode],[BrandName],[StartDate]) SELECT [BrandCode],[BrandName],getdate() FROM ( MERGE #DimBrand AS Target USING ( SELECT [BrandCode],[BrandName],[StartDate],[EndDate] FROM #LatestDimBrand ) AS Source ON (Target.[BrandCode] = Source.[BrandCode]) ------------------------------- WHEN MATCHED AND Target.[BrandName] <> Source.[BrandName] THEN UPDATE SET Target.[EndDate] = getdate() ------------------------------- WHEN NOT MATCHED BY TARGET THEN INSERT ( [BrandCode] ,[BrandName] ,[StartDate] ,[EndDate] ) VALUES ( Source.[BrandCode], Source.[BrandName], Source.[StartDate], Source.[EndDate] ) ------------------------------- WHEN NOT MATCHED BY SOURCE THEN UPDATE SET Target.[EndDate] = getdate() ------------------------------- OUTPUT $Action, Source.* ) As i([Action],[BrandCode],[BrandName],[StartDate],[EndDate]) ------------------------------- WHERE [Action] = 'UPDATE' AND BrandCode IS NOT NULL
The insert statement, although appearing at the top of the SQL statement is the last thing to be executed and uses the results set of the merge function specified in the OUTPUT clause further down.
Next is the start of our merge statement here we specify a Target and a Source table. We also specify which columns the 2 tables should match on, in this case only the BrandCode.
The next step is to specify and handle any matches, here we are looking for Brand Code’s which appear in both tables, but with differing Brand Names. If a match is found the row in the target table is given an end date. As with the following “NOT MATCHED” section inserts,updates or deletes can be applied to either table here. This handles the first part of our Slowly Changing Dimension requirement.
Where there is no match between the Target table and the Source table, the relevant records from the Source table are inserted into the Target table.
The penultimate part of the merge statement takes any Target records which do not appear in our “Latest” Source table and sets an EndDate for them as they are no longer a valid record.
The OUTPUT clause populates the outer INSERT statement, here the Source rows which are involved in the UPDATE statements above are pulled out to the outer FROM clause. We have also selected $Action so that we can filter out the INSERTED rows under the NOT MATCHED by Target statement.
The result of all the above is the brand “Googles” is archived along with the “MikeRowSoft” which as you can see from the BrandCode has been rebranded as MickRowSoft.
Introduction to Data Wrangler in Microsoft Fabric
What is Data Wrangler? A key selling point of Microsoft Fabric is the Data Science
Jul
Autogen Power BI Model in Tabular Editor
In the realm of business intelligence, Power BI has emerged as a powerful tool for
Jul
Microsoft Healthcare Accelerator for Fabric
Microsoft released the Healthcare Data Solutions in Microsoft Fabric in Q1 2024. It was introduced
Jul
Unlock the Power of Colour: Make Your Power BI Reports Pop
Colour is a powerful visual tool that can enhance the appeal and readability of your
Jul
Python vs. PySpark: Navigating Data Analytics in Databricks – Part 2
Part 2: Exploring Advanced Functionalities in Databricks Welcome back to our Databricks journey! In this
May
GPT-4 with Vision vs Custom Vision in Anomaly Detection
Businesses today are generating data at an unprecedented rate. Automated processing of data is essential
May
Exploring DALL·E Capabilities
What is DALL·E? DALL·E is text-to-image generation system developed by OpenAI using deep learning methodologies.
May
Using Copilot Studio to Develop a HR Policy Bot
The next addition to Microsoft’s generative AI and large language model tools is Microsoft Copilot
Apr