Query Folding in Power Query

Query Folding in Power Query isn’t an entirely new concept but when I first heard of it, I thought “What is Query Folding”?  There are probably a number of you out there who think the same, so let me explain it.

When carrying out transformations (e.g. Sort) in Power Query, it is possible that some will be sent back to the source.  In other words, Power Query doesn’t attempt to sort millions of records within the application itself, but will use the source to undertake the work.  This can drastically improve performance within Power Query.

Examples

I will now demonstrate some basic Query Folding techniques in Power Query.  Until recently, it wasn’t easy to know if a transformation was performing Query Folding.  This blog primarily talks you through monitoring outside of Power Query, but I will also touch upon how this is now possible to do within the application.

External Monitoring Tool

The monitoring tool used in this example if SQL Server Profiler.  For a simple video walkthrough on how to use Profiler, click here.

1.       Start a profiler session, ensuring you connect to the server where the AdventureWorksDW2014 Database resides – available to download here.

2.       Now we can connect to the database in Power Query and more specifically to the dbo.FactProductInventory table.

3.       I will perform three Power Query transformations:

a.       Filter Rows

                                                               i.      ProductTotal > £50000

b.      Group By

                                                               i.      DimProduct.EnglishProductName

                                                             ii.      SUM(Unit Cost) As ProductTotal

c.       Transform > Trim

                                                               i.      DimProduct.Status

NOTE:   This blog does not show you how to carry out the Power Query transformations but the workbook can be provided on request.

4.       The transformed query looks like the below:

 

clip_image002

 

5.       Now let’s take a look.

 

clip_image004
clip_image005

clip_image006

Points of Note:

1.       The Filter Rows Power Query transformation on TotalUnitCost is implemented as a WHERE clause in SQL Server.

2.       The Group By also becomes a T-SQL GROUP BY.  With the SUM function also being applied to find the total value for the EnglishProductName Group By.

3.       The T-SQL functions LTRIM/RTRIM are combined together on the Status column, replicating the functionality of the Trim feature in Power Query.

Within Power Query

Rather than show you this feature myself, I will point you to a great article by Chris Webb, which describes how you can easily check for Query Folding within your Power Query workbook.

In summary, there is an out of the box M function called GetMetaData that can be nested within your transformations.  This will tell you the source of your query and if it has been query folded.

Supported Transformations

Here are some other transformation types in Power Query that support Query Folding:

          Filtering (on rows or columns)

          Joins

          Aggregates and GROUP BY

          Pivot and unpivot

          Numeric calculations

          Simple transformations, such as UPPER

Current Limitations

Although there is still no official list (from Microsoft) of the sources that support Query Folding, here are some of the ones currently known:

          Relational sources (SQL Server, Oracle).

          OData sources (e.g. Azure Marketplace)

          Active Directory

          Exchange

          HDFS, Folder.Files and Folder.Contents

This means all a lot of other data sources do not support Query Folding.  For example, a flat file does not support Keep Top 5 Rows in Power Query, whereas SQL Server would use the TOP function or apply a filter in a WHERE clause. Whilst the lack of supported sources can be seen as a limitation, I would argue that any form of Query Folding, even if it were just SQL Server, is a big plus point.

You can deliberately prevent query folding, although the only time I see a benefit is if you are connecting to a server running at full capacity or during a large ETL.  There are far more instances when Power Query will not apply query folding, because of limitations to the sources functionality.  Check out Ken Verbeek’s blog if you would like know more on this.

Conclusion

This article is just scratching the surface of how beneficial Query Folding in Power Query can be.  The key takeaway is remembering to put the transformation steps (that can be folded back to source) at the beginning.  Steps that cannot be folded should be applied as late as possible.  If this is adhered to, you will see huge performance benefits – especially on large datasets.

If anyone out there would like to share other data sources that use Query Folding or just have a general interest, feel free to comment below.