Data Analytics

Buffer() M Function in Query Editor (Power BI)

Posted on 9th January 20172nd April 2020 by Callum Green

09
Jan

Whilst I have been aware of the Buffer() M function in the Query Editor of Power BI for a while, I had never really utilised its capabilities until now. There are two main types of buffer functionality – Table.Buffer and List.Buffer. To define them simply, Table.Buffer puts an entire table into memory and prevents change during evaluation, whereas List.Buffer provides a stable list, meaning it has some form of count or order.

This blog will focus on the theory behind the general Buffer() M functionality, picking a specific scenario of when it can outperform the standard Query Editor behavior. I will also demonstrate that this is not always the most effective technique within the same scenario. The article will not give you a hard and fast rule of when to use the Buffer() function, because it can depend on a number of factors. These are described further below.

Note: It is assumed you have existing knowledge of Query Folding and if not, one of my previous blogs should help greatly.

Scenario

I found inspiration from Chris Webb’s example, using the Adventure Works DW 2014 database – available here.

The requirements are:

1. Obtain the first 10,000 rows from FactInternetSales

2. Remove the majority of columns, retaining ONLY:

a. SalesOrderLineNumber

b. CustomerKey

c. SalesAmount

3. Rank the current row based on Sales Amount.

List.Buffer()

Assuming your database exists on a local server and is named AdventureWorksDW2014, copy the following code into the Advanced Editor in the Query Editor screen.

let

//Connect to SQL Server

Source = Sql.Database(“localhost”, “AdventureWorksDW2014”),

//Get first 2000 rows from FactInternetSales

dbo_FactInternetSales = Table.FirstN(

Source{[Schema=”dbo”,Item=”FactInternetSales”]}[Data],

10000),

//Remove unwanted columns

RemoveColumns = Table.SelectColumns(

dbo_FactInternetSales,

{“SalesOrderLineNumber”, “CustomerKey”,”SalesAmount”}),

//Get sorted list of values from SalesAmount column

RankValues = List.Sort(RemoveColumns[SalesAmount], Order.Descending),

//Calculate ranks

AddRankColumn = Table.AddColumn(RemoveColumns , “Rank”,

each List.PositionOf(RankValues,[SalesAmount])+1)

AddRankColumn

You can visibly see the rows loading – one by one. In total, it takes nearly 1 minute to load all off the results.

Now let’s use the List.Buffer() function in the RankValues step.

Replace:

= List.Sort(RemoveColumns[SalesAmount], Order.Descending)

With:

= List.Buffer(List.Sort(RemoveColumns[SalesAmount], Order.Descending))

The entire transformation (from start to finish) completes in just under 2 seconds! This is because the List.Buffer function stores the sorted values in memory and therefore, the rank calculation is only evaluated once. The last query (and previous steps) were being evaluated multiple times. The M language is both functional and at times, lazy. In order to prevent the constant re-evaluation, buffer the list into memory.

The final query output is shown below:

Query Folding

We will implement the same requirements, but this time using Query Folding.

The third step in our current transformation is called ‘Removed Columns’. This is what prevents Query Folding, as this function cannot be interpreted/translated to the native SQL Server T-SQL language. All steps below are inadvertently not supported either.

The way around this is to write SQL Server View (in SSMS) to import just the fields required from the underlying FactInternetSales Table. The below query will give you the same result up to the ‘Remove Columns’ step.

CREATE VIEW dbo.VwFactInternetSalesAmount

SELECT SalesOrderNumber

,[CustomerKey]

,[SalesOrderLineNumber]

,[SalesAmount]

,RANK() over( order by [SalesAmount] desc) AS [Rank]

FROM [AdventureWorksDW2014].[dbo].[FactInternetSales]

The final steps are to filter on the top 10,000 rows and Group the Rows together – inserting the following M syntax into the last Applied Step:

let

//Connect to SQL Server

Source = Sql.Database(“.”, “AdventureWorksDW2014”),

// Connect to SQL Server

dbo_FactInternetSales = Source{[Schema=”dbo”,Item=”VwFactInternetSalesAmount”]}[Data],

#”Sorted Rows” = Table.Sort(dbo_FactInternetSales,{{“SalesOrderNumber”, Order.Ascending}}),

#”Kept First Rows” = Table.FirstN(#”Sorted Rows”,10000),

#”Grouped Rows” = Table.Group(#”Kept First Rows”, {“CustomerKey”, “SalesOrderLineNumber”, “Rank”}, {{“TotalSalesAmount”, each List.Sum([SalesAmount]), type number}})

#”Grouped Rows”

The query now returns instantly (under 1 second). Right click on the last applied step and select the View Native Query option, to show the underlying SQL.

select top 10000

[rows].[CustomerKey] as [CustomerKey],

[rows].

Callum Green

Introduction to Data Wrangler in Microsoft Fabric

What is Data Wrangler? A key selling point of Microsoft Fabric is the Data Science

25
Jul

Autogen Power BI Model in Tabular Editor

In the realm of business intelligence, Power BI has emerged as a powerful tool for

12
Jul

Microsoft Healthcare Accelerator for Fabric

Microsoft released the Healthcare Data Solutions in Microsoft Fabric in Q1 2024. It was introduced

09
Jul

Unlock the Power of Colour: Make Your Power BI Reports Pop

Colour is a powerful visual tool that can enhance the appeal and readability of your

09
Jul

Python vs. PySpark: Navigating Data Analytics in Databricks – Part 2

Part 2: Exploring Advanced Functionalities in Databricks Welcome back to our Databricks journey! In this

20
May

GPT-4 with Vision vs Custom Vision in Anomaly Detection

Businesses today are generating data at an unprecedented rate. Automated processing of data is essential

08
May

Exploring DALL·E Capabilities

What is DALL·E? DALL·E is text-to-image generation system developed by OpenAI using deep learning methodologies.

03
May

Using Copilot Studio to Develop a HR Policy Bot

The next addition to Microsoft’s generative AI and large language model tools is Microsoft Copilot

29
Apr

Data Analytics