Joins vs CROSS APPLY

Every professional (who has worked with SQL Server) will have used a Join operator.  However, the same cannot be said about the APPLY operator.  Although APPLY was introduced back in SQL Server 2005, there are still a number of developers who have barely seen the syntax, let alone utilised it’s capabilities.

This blog focuses solely on INNER JOINS vs CROSS APPLY and when it is more beneficial to use the latter.  I will provide additional links at the end of the post, which will go into other features of APPLY, such as performance benefits against PIVOT’s.

This blog has been written with the assumption that the reader already knows about Joins.  If you would like to refresh your memory, click here.

What is APPLY?

The original purpose of the APPLY operator was to use them with table-valued functions (TVF’s).

The MSDN definition is:

“The APPLY operator is similar to the JOIN operator, but the difference is that the right-hand side operator of APPLY can reference columns from the left-hand side”.

 

In simple terms, a join relies on self-sufficient sets of data, i.e. sets should not depend on each other.  On the other hand, CROSS APPLY is only based on one predefined set and can be used with another separately created set.  A worked example should help with understanding this difference. 

Example Problem

Scenario:             Find the last 3 orders from all customers that live in the UK.
Database:            AdverntureWorksDW2014 – Download the free Microsoft database
here.
Tables:                 dbo.FactInternetSales

                              dbo.DimCustomer

1.      Create a function that accepts CustomerKey and the Top(N) orders (e.g. last 3 orders) to be returned.

 

USE [AdventureWorksDW2014]

GO

CREATE FUNCTION dbo.GetTopOrders (@CustomerKey INT, @N INT) RETURNS TABLE

AS

RETURN

— find last 3 orders, placed on different days

       SELECT DISTINCT TOP(@N)   

                    SalesOrderNumber,

                    OrderDate,

                    CustomerKey

       FROM   [dbo].[FactInternetSales]

       WHERE  CustomerKey = @CustomerKey

       ORDER BY OrderDate DESC, SalesOrderNumber DESC

2.    Using traditional CROSS JOIN approach.  The C.CustomerKey reference (in the CROSS JOIN & function) is intended to dynamically pass through every customer, with the ‘3’ being the number of orders to show per customer.

SELECT       C.CustomerKey,

             C.FirstName,

             C.LastName,

             C.EmailAddress,

             O.SalesOrderNumber,

             O.OrderDate

FROM         dbo.DimCustomer C

CROSS JOIN

             dbo.GetTopOrders(C.CustomerKey,3) AS O

 

clip_image001

The reason why this fails is because C.CustomerKey is referencing the left set (DimCustomer), but does not exist itself in the right set (the Function).  Joins require pre-defined sets from both sides, which is not how this query works.

3.     Using CROSS APPLY.  This operator reference the left set first, before then checking the second set against the first one.  The second set runs a row by row basis, passing the CustomerKey in a type of recursive variable.

SELECT C.CustomerKey,

             C.FirstName,

             C.LastName,

             C.EmailAddress,

             O.SalesOrderNumber,

             O.OrderDate

FROM   dbo.DimCustomer C

CROSS APPLY

             dbo.GetTopOrders(C.CustomerKey,3) AS O

 

CustomerKey in a type of recursive variable

— 26760 rows affected

The only change to the code is changing CROSS JOIN to CROSS APPLY.  Although the sets are analysed separately, you can still return data from both – just like a Join. 

4.     The above example would also apply to an INNER JOIN.  The left set only returns data if it matches with the right side, so if there is not a match, neither set is applied.  A way to ensure all records are returned, is to use OUTER APPLY.

For the purpose of this article, I have added a new customer (not shown below) to the dbo.Customers table, who has yet to place an order.  We can now run the original CROSS APPLY query, but as an OUTER APPLY instead.

SELECT C.CustomerKey,

             C.FirstName,

             C.LastName,

             C.EmailAddress,

             O.SalesOrderNumber,

             O.OrderDate

FROM   dbo.DimCustomer C

OUTER APPLY

             dbo.GetTopOrders(C.CustomerKey,3) AS O

WHERE  O.SalesOrderNumber IS NULL

 

run the original CROSS APPLY query, but as an OUTER APPLY instead.

The additional ‘Where’ clause specifically returns the non-matching record, but in total 26761 rows where affected.

Conclusion

As demonstrated, CROSS APPLY is a very useful operator when referencing a Table Function or filtering on a subset of data.  The key benefit is the ability to use one defined set with another separately created set.  Unlike Joins, you can define a dynamic subset of data to match with the outer query.

APPLY is a very underrated and underused within the SQL Server Community.  Look out for future blogs, where I demonstrate the uses of the other APPLY operators and how they can produce more efficient queries.

References

For further reading on the keywords in this article, try the recommended links below:

1.      Boost your T-SQL with CROSS APPLY – http://www.microsoftvirtualacademy.com/training-courses/boost-your-t-sql-with-the-apply-operator

2.      CROSS APPLY in SQL Server – http://www.microsoftvirtualacademy.com/training-courses/boost-your-t-sql-with-the-apply-operator

3.      Using APPLY – https://technet.microsoft.com/en-us/library/ms175156(v=sql.105).aspx

4.      INNER JOIN VS CROSS APPLY – http://explainextended.com/2009/07/16/inner-join-vs-cross-apply/