I recently delivered an Analysis Services training course to a client who were a bit surprised to learn that the standard list of Aggregation Functions for an Analysis Services measure doesn’t include Mean, Median and Mode. My answer was of course that all three can delivered using MDX, so I thought it might be a good blog post to show how it’s done.
Before I begin, just a quick refresher for anyone who needs it, the following article seems to be a good reference on the difference between Mean, Median and Mode. Essentially Mean is the average, Median is the middle value, whereas Mode is the value that occurs the most.
Mean and Median
The Mean and the Median are both relatively easy to do in MDX as there are built-in functions for both AVG() and Median(). As they’re well documented here and here, I won’t spend too long covering them. In the following MDX query I’m producing the Mean and the Median of the product sub categories in Adventure Works:
WITH MEMBER [Measures].[Mean] AS AVG([Product].[Subcategory].[Subcategory].Members, [Measures].[Internet Sales Amount]) MEMBER [Measures].[Median] AS Median([Product].[Subcategory].[Subcategory].Members, [Measures].[Internet Sales Amount]) SELECT {[Measures].[Internet Sales Amount], [Measures].[Mean], [Measures].[Median]} ON 0, NonEmpty([Product].[Subcategory].[Subcategory].Members, [Measures].[Internet Sales Amount]) ON 1 FROM [Adventure Works] WHERE ([Date].[Date].&[20070727])
If you run the query and copy the data out to Excel you’ll see that it matches the result of Excel’s Median and Mode functions.
Mode
Mode is harder to achieve as there’s no built in MDX function. Fortunately, Excel is on hand to help out, as Analysis Services allows you to use some of the Excel functions in MDX. There are warnings that come with this approach, as you need to have Excel installed on the server and there can also be performance problems. But, if you’re comfortable with these caveats, then the Excel mode function may be an option for you.
If you do use the Excel Mode function then there are a few things to watch out for. Firstly, you’ll have to use the MDX SetToArray() function to pass the set in the format that Excel expects. Secondly, you may get an error returned with the description: “#Error The following system error occurred: Invalid flags.” This is due to the data type of the Measure – it seems that Currency measures are not supported by this function. Therefore, I’m using Cdbl() on Internet Sales Amount to get this to work:
WITH MEMBER [Measures].[Demo] AS Cdbl([Measures].[Internet Sales Amount]) MEMBER [Measures].[ExcelMode] AS Excel!MODE(SetToArray(NONEMPTY([Product].[Subcategory].[Subcategory].Members, [Measures].[Demo]) * [Measures].[Demo]))
Applying this to the example query I’ve got above, the mode returned is 8.99, as its the only number that appears twice.
Bi Modal Result Set
There is a bit more complexity to Mode in some situations, essentially as you can get a bi-modal result set – where there is more than one value occurring the most. E.g. the mode of 1,2,2,3 is just 2, but the mode of 1,2,2,3,3,4 is both 2 and 3 – so it’s known as bi-modal.
The Excel Mode function won’t help in this scenario, as it will just return one of the mode values. Interestingly Excel 2010 does support this, with the Mode.Mult() function, but I’ve been unable to get this to work, presumably as the Excel function returns an array.
All is not lost though, it’s possible to produce mode using MDX. First of all, the numbers that I’m operating on are shown below. As you can see, there are two sets of two numbers that are the same:
As the following forum thread shows, mode can be achieved using MDX. I’m adapting that approach to give the following MDX:
WITH --Produce a result set that will guarantee bi modal results MEMBER [Measures].[Demo] AS CASE WHEN [Product].[Subcategory].CurrentMember IS [Product].[Subcategory].&[31] THEN 65.91 ELSE Cdbl([Measures].[Internet Sales Amount]) END --Count how often each value appears MEMBER [Measures].[ValueCount] AS SUM( Union([Product].[Subcategory].CurrentMember.Level.Members, {[Product].[Subcategory].CurrentMember} AS Currentsub) , IIF(([Product].[Subcategory].CurrentMember, [Measures].[Demo]) = (Currentsub.Item(0).Item(0), [Measures].[Demo]), 1, null) )
--Only get the items that appear the most SET [MaxModes] AS ORDER(FILTER(NONEMPTY([Product].[Subcategory].[Subcategory].Members, {[Measures].[Demo]}), [Measures].[ValueCount] = MAX(NONEMPTY([Product].[Subcategory].[Subcategory].Members, [Measures].[Demo]), [Measures].[ValueCount])), [Measures].[Demo], ASC) SELECT {[Measures].[Demo], [Measures].[ValueCount]} on 0, [MaxModes] --Filter out the duplicates HAVING [MaxModes].CurrentOrdinal = 0 OR [Measures].[Demo] <> ([Measures].[Demo], [MaxModes].Item([MaxModes].CurrentOrdinal - 2)) ON 1 FROM [Adventure Works] WHERE ([Date].[Date].&[20070727])
This gives the following correct result set, assuming we just want the two bi modal values:
An alternative is to use an MDX Filter() or HAVING clause to just display all the sub categories that have the mode values, which would just require a small modification to the above code.
I’ve not used these approaches with big data volumes etc, but they should at least give you a few options if you’ve got to do these sort of calculations in your own environment.
Celebrating International Women’s Day: from Classroom to Code
As we celebrate International Women’s Day, I want to share my journey of breaking stereotypes
Mar
Pretty Power BI – Adding Pagination to Bar Charts
Good User Experience (UX) design is crucial in enabling stakeholders to maximise the insights that
Feb
Pretty Power BI – Creating Dynamic Histograms
Good User Experience (UX) design is crucial in enabling stakeholders to maximise the insights that
Feb
Top Tips to Pass the Databricks Certified Data Engineer Professional Exam
Having recently passed the Databricks Certified Data Engineer Professional exam, this blog post covers some
Jan
Python vs. PySpark Navigating Data Analytics in Databricks – Part 1
Introduction When it comes to conquering the data analytics landscape in Databricks, two heavyweights, Python
Jan
Impact of AI on Business Analysis
Artificial intelligence (AI) is rapidly transforming our world, and this blog post concentrates on the
Jan
Creating Clickbait Using Python
In 2023, about 5 billion people used the internet. With so many people contributing and
Dec
A Brief Overview of Security in Microsoft Fabric
Where Fabric Sits in the Hierarchy As you are probably aware, Microsoft Fabric is Microsoft’s
Dec