Query Folding in Power Query isn’t an entirely new concept but when I first heard of it, I thought “What is Query Folding”? There are probably a number of you out there who think the same, so let me explain it.
When carrying out transformations (e.g. Sort) in Power Query, it is possible that some will be sent back to the source. In other words, Power Query doesn’t attempt to sort millions of records within the application itself, but will use the source to undertake the work. This can drastically improve performance within Power Query.
Examples
I will now demonstrate some basic Query Folding techniques in Power Query. Until recently, it wasn’t easy to know if a transformation was performing Query Folding. This blog primarily talks you through monitoring outside of Power Query, but I will also touch upon how this is now possible to do within the application.
External Monitoring Tool
The monitoring tool used in this example if SQL Server Profiler. For a simple video walkthrough on how to use Profiler, click here.
1. Start a profiler session, ensuring you connect to the server where the AdventureWorksDW2014 Database resides – available to download here.
2. Now we can connect to the database in Power Query and more specifically to the dbo.FactProductInventory table.
3. I will perform three Power Query transformations:
a. Filter Rows
i. ProductTotal > £50000
b. Group By
i. DimProduct.EnglishProductName
ii. SUM(Unit Cost) As ProductTotal
c. Transform > Trim
i. DimProduct.Status
NOTE: This blog does not show you how to carry out the Power Query transformations but the workbook can be provided on request.
4. The transformed query looks like the below:
5. Now let’s take a look.
Points of Note:
1. The Filter Rows Power Query transformation on TotalUnitCost is implemented as a WHERE clause in SQL Server.
2. The Group By also becomes a T-SQL GROUP BY. With the SUM function also being applied to find the total value for the EnglishProductName Group By.
3. The T-SQL functions LTRIM/RTRIM are combined together on the Status column, replicating the functionality of the Trim feature in Power Query.
Within Power Query
Rather than show you this feature myself, I will point you to a great article by Chris Webb, which describes how you can easily check for Query Folding within your Power Query workbook.
In summary, there is an out of the box M function called GetMetaData that can be nested within your transformations. This will tell you the source of your query and if it has been query folded.
Supported Transformations
Here are some other transformation types in Power Query that support Query Folding:
– Filtering (on rows or columns)
– Joins
– Aggregates and GROUP BY
– Pivot and unpivot
– Numeric calculations
– Simple transformations, such as UPPER
Current Limitations
Although there is still no official list (from Microsoft) of the sources that support Query Folding, here are some of the ones currently known:
– Relational sources (SQL Server, Oracle).
– OData sources (e.g. Azure Marketplace)
– Active Directory
– Exchange
– HDFS, Folder.Files and Folder.Contents
This means all a lot of other data sources do not support Query Folding. For example, a flat file does not support Keep Top 5 Rows in Power Query, whereas SQL Server would use the TOP function or apply a filter in a WHERE clause. Whilst the lack of supported sources can be seen as a limitation, I would argue that any form of Query Folding, even if it were just SQL Server, is a big plus point.
You can deliberately prevent query folding, although the only time I see a benefit is if you are connecting to a server running at full capacity or during a large ETL. There are far more instances when Power Query will not apply query folding, because of limitations to the sources functionality. Check out Ken Verbeek’s blog if you would like know more on this.
Conclusion
This article is just scratching the surface of how beneficial Query Folding in Power Query can be. The key takeaway is remembering to put the transformation steps (that can be folded back to source) at the beginning. Steps that cannot be folded should be applied as late as possible. If this is adhered to, you will see huge performance benefits – especially on large datasets.
If anyone out there would like to share other data sources that use Query Folding or just have a general interest, feel free to comment below.
Introduction to Data Wrangler in Microsoft Fabric
What is Data Wrangler? A key selling point of Microsoft Fabric is the Data Science
Jul
Autogen Power BI Model in Tabular Editor
In the realm of business intelligence, Power BI has emerged as a powerful tool for
Jul
Microsoft Healthcare Accelerator for Fabric
Microsoft released the Healthcare Data Solutions in Microsoft Fabric in Q1 2024. It was introduced
Jul
Unlock the Power of Colour: Make Your Power BI Reports Pop
Colour is a powerful visual tool that can enhance the appeal and readability of your
Jul
Python vs. PySpark: Navigating Data Analytics in Databricks – Part 2
Part 2: Exploring Advanced Functionalities in Databricks Welcome back to our Databricks journey! In this
May
GPT-4 with Vision vs Custom Vision in Anomaly Detection
Businesses today are generating data at an unprecedented rate. Automated processing of data is essential
May
Exploring DALL·E Capabilities
What is DALL·E? DALL·E is text-to-image generation system developed by OpenAI using deep learning methodologies.
May
Using Copilot Studio to Develop a HR Policy Bot
The next addition to Microsoft’s generative AI and large language model tools is Microsoft Copilot
Apr