In this post we will look into applying an order by clause to a query containing an aggregate function. The purpose of this is to allow us to order a result set by a column which does not appear in the select or group by clause. Without using an aggregate function, or including the column to order by in your select/group by, you will be unable to execute the query successfully.
This is quite a straight forward problem, but one that can easily trip people up. I started by creating a simple table containing Colours which appear multiple times over multiple dates and against multiple IDs.
Using the count() function to count how many times a colour appears on a certain date doesn’t cause us any problems.
The result set shows us 7 rows of data, we note that the colour Aqua Marine Green appears first in the list and while black appears second in the list as we will use this information later.
However if we wanted to see the result set in the order of their respective ID’s this is where we run into issues,
The error message is quite clear if not a little confusing on what needs to happen here, either we need to add the ID column to our group by clause, or wrap it in an aggregate function.
Adding the ID to the group by clause will change your result set, if we take the query above and simply add the ID column to our select and group by we no longer get an error message, however the data has lost all meaning as now we are back to our 10 original rows of data each being counted once.
Using an aggregate function in the order by clause “”eg: Min(ID) fixes the problem and provides us with the result set we are after. If we look above we can see that “Black” appears 3 times against Date “2015-09-03” and the ID’s for those 3 records are 1,3,8. It is only after specifying which ID to order by is SQL able to correctly execute the query.
Without adding an aggregate function be that MIN() SUM() AVG() SQL is unable to determine which ID it should use for each group to order the data by.
Be careful when picking the aggregate function to use as your choice will affect the result set. The screenshots below show the result set ordered by MIN(ID), AVG(ID), SUM(ID). Each returning the same set of data but not in the same order. (the aggregated ID column has been added to the screenshots for reference only.)
The first example orders by the MIN() function and results in Black appearing first in the result set.
The next example is ordered by the AVG() function and results in Black appearing fourth in the result set.
Finally ordering by the SUM() function results in Black appearing last in the result set.
Introduction to Data Wrangler in Microsoft Fabric
What is Data Wrangler? A key selling point of Microsoft Fabric is the Data Science
Jul
Autogen Power BI Model in Tabular Editor
In the realm of business intelligence, Power BI has emerged as a powerful tool for
Jul
Microsoft Healthcare Accelerator for Fabric
Microsoft released the Healthcare Data Solutions in Microsoft Fabric in Q1 2024. It was introduced
Jul
Unlock the Power of Colour: Make Your Power BI Reports Pop
Colour is a powerful visual tool that can enhance the appeal and readability of your
Jul
Python vs. PySpark: Navigating Data Analytics in Databricks – Part 2
Part 2: Exploring Advanced Functionalities in Databricks Welcome back to our Databricks journey! In this
May
GPT-4 with Vision vs Custom Vision in Anomaly Detection
Businesses today are generating data at an unprecedented rate. Automated processing of data is essential
May
Exploring DALL·E Capabilities
What is DALL·E? DALL·E is text-to-image generation system developed by OpenAI using deep learning methodologies.
May
Using Copilot Studio to Develop a HR Policy Bot
The next addition to Microsoft’s generative AI and large language model tools is Microsoft Copilot
Apr