Every professional (who has worked with SQL Server) will have used a Join operator. However, the same cannot be said about the APPLY operator. Although APPLY was introduced back in SQL Server 2005, there are still a number of developers who have barely seen the syntax, let alone utilised it’s capabilities.
This blog focuses solely on INNER JOINS vs CROSS APPLY and when it is more beneficial to use the latter. I will provide additional links at the end of the post, which will go into other features of APPLY, such as performance benefits against PIVOT’s.
This blog has been written with the assumption that the reader already knows about Joins. If you would like to refresh your memory, click here.
What is APPLY?
The original purpose of the APPLY operator was to use them with table-valued functions (TVF’s).
The MSDN definition is:
“The APPLY operator is similar to the JOIN operator, but the difference is that the right-hand side operator of APPLY can reference columns from the left-hand side”.
In simple terms, a join relies on self-sufficient sets of data, i.e. sets should not depend on each other. On the other hand, CROSS APPLY is only based on one predefined set and can be used with another separately created set. A worked example should help with understanding this difference.
Example Problem
Scenario: Find the last 3 orders from all customers that live in the UK.
Database: AdverntureWorksDW2014 – Download the free Microsoft database here.
Tables: dbo.FactInternetSales
dbo.DimCustomer
1. Create a function that accepts CustomerKey and the Top(N) orders (e.g. last 3 orders) to be returned.
USE [AdventureWorksDW2014]
GO
CREATE FUNCTION dbo.GetTopOrders (@CustomerKey INT, @N INT) RETURNS TABLE
AS
RETURN
— find last 3 orders, placed on different days
SELECT DISTINCT TOP(@N)
SalesOrderNumber,
OrderDate,
CustomerKey
FROM [dbo].[FactInternetSales]
WHERE CustomerKey = @CustomerKey
ORDER BY OrderDate DESC, SalesOrderNumber DESC
2. Using traditional CROSS JOIN approach. The C.CustomerKey reference (in the CROSS JOIN & function) is intended to dynamically pass through every customer, with the ‘3’ being the number of orders to show per customer.
SELECT C.CustomerKey,
C.FirstName,
C.LastName,
C.EmailAddress,
O.SalesOrderNumber,
O.OrderDate
FROM dbo.DimCustomer C
CROSS JOIN
dbo.GetTopOrders(C.CustomerKey,3) AS O
The reason why this fails is because C.CustomerKey is referencing the left set (DimCustomer), but does not exist itself in the right set (the Function). Joins require pre-defined sets from both sides, which is not how this query works.
3. Using CROSS APPLY. This operator reference the left set first, before then checking the second set against the first one. The second set runs a row by row basis, passing the CustomerKey in a type of recursive variable.
SELECT C.CustomerKey,
C.FirstName,
C.LastName,
C.EmailAddress,
O.SalesOrderNumber,
O.OrderDate
FROM dbo.DimCustomer C
CROSS APPLY
dbo.GetTopOrders(C.CustomerKey,3) AS O
— 26760 rows affected
The only change to the code is changing CROSS JOIN to CROSS APPLY. Although the sets are analysed separately, you can still return data from both – just like a Join.
4. The above example would also apply to an INNER JOIN. The left set only returns data if it matches with the right side, so if there is not a match, neither set is applied. A way to ensure all records are returned, is to use OUTER APPLY.
For the purpose of this article, I have added a new customer (not shown below) to the dbo.Customers table, who has yet to place an order. We can now run the original CROSS APPLY query, but as an OUTER APPLY instead.
SELECT C.CustomerKey,
C.FirstName,
C.LastName,
C.EmailAddress,
O.SalesOrderNumber,
O.OrderDate
FROM dbo.DimCustomer C
OUTER APPLY
dbo.GetTopOrders(C.CustomerKey,3) AS O
WHERE O.SalesOrderNumber IS NULL
The additional ‘Where’ clause specifically returns the non-matching record, but in total 26761 rows where affected.
Conclusion
As demonstrated, CROSS APPLY is a very useful operator when referencing a Table Function or filtering on a subset of data. The key benefit is the ability to use one defined set with another separately created set. Unlike Joins, you can define a dynamic subset of data to match with the outer query.
APPLY is a very underrated and underused within the SQL Server Community. Look out for future blogs, where I demonstrate the uses of the other APPLY operators and how they can produce more efficient queries.
References
For further reading on the keywords in this article, try the recommended links below:
1. Boost your T-SQL with CROSS APPLY – http://www.microsoftvirtualacademy.com/training-courses/boost-your-t-sql-with-the-apply-operator
2. CROSS APPLY in SQL Server – http://www.microsoftvirtualacademy.com/training-courses/boost-your-t-sql-with-the-apply-operator
3. Using APPLY – https://technet.microsoft.com/en-us/library/ms175156(v=sql.105).aspx
4. INNER JOIN VS CROSS APPLY – http://explainextended.com/2009/07/16/inner-join-vs-cross-apply/
AI Assistance in Microsoft Fabric
The exponential growth of Large Language Models (LLMs) couples with Microsoft’s close partnership with OpenAI
Apr
10 reasons why it’s worth the effort to understand the value of your data
“If leaders really want to create a data driven culture, the journey starts with them!
Apr
Content Safety in Azure AI Studio
Azure AI Content Safety is a solution designed to identify harmful content, whether generated by
Apr
Model Benchmarks in Azure AI Studio
In the constantly changing field of artificial intelligence (AI) and machine learning (ML), choosing the
Apr
Celebrating International Women’s Day: from Classroom to Code
As we celebrate International Women’s Day, I want to share my journey of breaking stereotypes
Mar
Pretty Power BI – Adding Pagination to Bar Charts
Good User Experience (UX) design is crucial in enabling stakeholders to maximise the insights that
Feb
Pretty Power BI – Creating Dynamic Histograms
Good User Experience (UX) design is crucial in enabling stakeholders to maximise the insights that
Feb
Top Tips to Pass the Databricks Certified Data Engineer Professional Exam
Having recently passed the Databricks Certified Data Engineer Professional exam, this blog post covers some
Jan