Query Folding in Power Query isn’t an entirely new concept but when I first heard of it, I thought “What is Query Folding”? There are probably a number of you out there who think the same, so let me explain it.
When carrying out transformations (e.g. Sort) in Power Query, it is possible that some will be sent back to the source. In other words, Power Query doesn’t attempt to sort millions of records within the application itself, but will use the source to undertake the work. This can drastically improve performance within Power Query.
Examples
I will now demonstrate some basic Query Folding techniques in Power Query. Until recently, it wasn’t easy to know if a transformation was performing Query Folding. This blog primarily talks you through monitoring outside of Power Query, but I will also touch upon how this is now possible to do within the application.
External Monitoring Tool
The monitoring tool used in this example if SQL Server Profiler. For a simple video walkthrough on how to use Profiler, click here.
1. Start a profiler session, ensuring you connect to the server where the AdventureWorksDW2014 Database resides – available to download here.
2. Now we can connect to the database in Power Query and more specifically to the dbo.FactProductInventory table.
3. I will perform three Power Query transformations:
a. Filter Rows
i. ProductTotal > £50000
b. Group By
i. DimProduct.EnglishProductName
ii. SUM(Unit Cost) As ProductTotal
c. Transform > Trim
i. DimProduct.Status
NOTE: This blog does not show you how to carry out the Power Query transformations but the workbook can be provided on request.
4. The transformed query looks like the below:
5. Now let’s take a look.
Points of Note:
1. The Filter Rows Power Query transformation on TotalUnitCost is implemented as a WHERE clause in SQL Server.
2. The Group By also becomes a T-SQL GROUP BY. With the SUM function also being applied to find the total value for the EnglishProductName Group By.
3. The T-SQL functions LTRIM/RTRIM are combined together on the Status column, replicating the functionality of the Trim feature in Power Query.
Within Power Query
Rather than show you this feature myself, I will point you to a great article by Chris Webb, which describes how you can easily check for Query Folding within your Power Query workbook.
In summary, there is an out of the box M function called GetMetaData that can be nested within your transformations. This will tell you the source of your query and if it has been query folded.
Supported Transformations
Here are some other transformation types in Power Query that support Query Folding:
– Filtering (on rows or columns)
– Joins
– Aggregates and GROUP BY
– Pivot and unpivot
– Numeric calculations
– Simple transformations, such as UPPER
Current Limitations
Although there is still no official list (from Microsoft) of the sources that support Query Folding, here are some of the ones currently known:
– Relational sources (SQL Server, Oracle).
– OData sources (e.g. Azure Marketplace)
– Active Directory
– Exchange
– HDFS, Folder.Files and Folder.Contents
This means all a lot of other data sources do not support Query Folding. For example, a flat file does not support Keep Top 5 Rows in Power Query, whereas SQL Server would use the TOP function or apply a filter in a WHERE clause. Whilst the lack of supported sources can be seen as a limitation, I would argue that any form of Query Folding, even if it were just SQL Server, is a big plus point.
You can deliberately prevent query folding, although the only time I see a benefit is if you are connecting to a server running at full capacity or during a large ETL. There are far more instances when Power Query will not apply query folding, because of limitations to the sources functionality. Check out Ken Verbeek’s blog if you would like know more on this.
Conclusion
This article is just scratching the surface of how beneficial Query Folding in Power Query can be. The key takeaway is remembering to put the transformation steps (that can be folded back to source) at the beginning. Steps that cannot be folded should be applied as late as possible. If this is adhered to, you will see huge performance benefits – especially on large datasets.
If anyone out there would like to share other data sources that use Query Folding or just have a general interest, feel free to comment below.
Celebrating International Women’s Day: from Classroom to Code
As we celebrate International Women’s Day, I want to share my journey of breaking stereotypes
Mar
Pretty Power BI – Adding Pagination to Bar Charts
Good User Experience (UX) design is crucial in enabling stakeholders to maximise the insights that
Feb
Pretty Power BI – Creating Dynamic Histograms
Good User Experience (UX) design is crucial in enabling stakeholders to maximise the insights that
Feb
Top Tips to Pass the Databricks Certified Data Engineer Professional Exam
Having recently passed the Databricks Certified Data Engineer Professional exam, this blog post covers some
Jan
Python vs. PySpark Navigating Data Analytics in Databricks – Part 1
Introduction When it comes to conquering the data analytics landscape in Databricks, two heavyweights, Python
Jan
Impact of AI on Business Analysis
Artificial intelligence (AI) is rapidly transforming our world, and this blog post concentrates on the
Jan
Creating Clickbait Using Python
In 2023, about 5 billion people used the internet. With so many people contributing and
Dec
A Brief Overview of Security in Microsoft Fabric
Where Fabric Sits in the Hierarchy As you are probably aware, Microsoft Fabric is Microsoft’s
Dec