Mean, Median and Mode in MDX

I recently delivered an Analysis Services training course to a client who were a bit surprised to learn that the standard list of Aggregation Functions for an Analysis Services measure doesn’t include Mean, Median and Mode. My answer was of course that all three can delivered using MDX, so I thought it might be a good blog post to show how it’s done.

Before I begin, just a quick refresher for anyone who needs it, the following article seems to be a good reference on the difference between Mean, Median and Mode. Essentially Mean is the average, Median is the middle value, whereas Mode is the value that occurs the most.

Mean and Median

The Mean and the Median are both relatively easy to do in MDX as there are built-in functions for both AVG() and Median(). As they’re well documented here and here, I won’t spend too long covering them. In the following MDX query I’m producing the Mean and the Median of the product sub categories in Adventure Works:

WITH 
MEMBER  [Measures].[Mean] AS AVG([Product].[Subcategory].[Subcategory].Members,
        [Measures].[Internet Sales Amount])

MEMBER  [Measures].[Median] AS Median([Product].[Subcategory].[Subcategory].Members,
        [Measures].[Internet Sales Amount])

SELECT  {[Measures].[Internet Sales Amount], [Measures].[Mean], [Measures].[Median]} ON 0,
        NonEmpty([Product].[Subcategory].[Subcategory].Members, [Measures].[Internet Sales Amount]) ON 1
FROM    [Adventure Works]
WHERE   ([Date].[Date].&[20070727])

image

If you run the query and copy the data out to Excel you’ll see that it matches the result of Excel’s Median and Mode functions.

Mode

Mode is harder to achieve as there’s no built in MDX function. Fortunately, Excel is on hand to help out, as Analysis Services allows you to use some of the Excel functions in MDX. There are warnings that come with this approach, as you need to have Excel installed on the server and there can also be performance problems. But, if you’re comfortable with these caveats, then the Excel mode function may be an option for you.

If you do use the Excel Mode function then there are a few things to watch out for. Firstly, you’ll have to use the MDX SetToArray() function to pass the set in the format that Excel expects. Secondly, you may get an error returned with the description: “#Error The following system error occurred:  Invalid flags.” This is due to the data type of the Measure – it seems that Currency measures are not supported by this function. Therefore, I’m using Cdbl() on Internet Sales Amount to get this to work:

WITH 
MEMBER  [Measures].[Demo] AS Cdbl([Measures].[Internet Sales Amount])
    
MEMBER  [Measures].[ExcelMode] AS
        Excel!MODE(SetToArray(NONEMPTY([Product].[Subcategory].[Subcategory].Members, [Measures].[Demo])
        * [Measures].[Demo]))

Applying this to the example query I’ve got above, the mode returned is 8.99, as its the only number that appears twice.

Bi Modal Result Set

There is a bit more complexity to Mode in some situations, essentially as you can get a bi-modal result set – where there is more than one value occurring the most. E.g. the mode of 1,2,2,3 is just 2, but the mode of 1,2,2,3,3,4 is both 2 and 3 – so it’s known as bi-modal.

The Excel Mode function won’t help in this scenario, as it will just return one of the mode values. Interestingly Excel 2010 does support this, with the Mode.Mult() function, but I’ve been unable to get this to work, presumably as the Excel function returns an array.

All is not lost though, it’s possible to produce mode using MDX. First of all, the numbers that I’m operating on are shown below. As you can see, there are two sets of two numbers that are the same:

image

As the following forum thread shows, mode can be achieved using MDX. I’m adapting that approach to give the following MDX:

WITH 
--Produce a result set that will guarantee bi modal results
MEMBER    [Measures].[Demo] AS
        CASE WHEN [Product].[Subcategory].CurrentMember IS [Product].[Subcategory].&[31]  THEN 65.91
        ELSE Cdbl([Measures].[Internet Sales Amount]) END

--Count how often each value appears
MEMBER [Measures].[ValueCount] AS 
SUM( 
    Union([Product].[Subcategory].CurrentMember.Level.Members,
        {[Product].[Subcategory].CurrentMember} AS Currentsub)
    , IIF(([Product].[Subcategory].CurrentMember, [Measures].[Demo]) = 
        (Currentsub.Item(0).Item(0), [Measures].[Demo]), 1, null)
)
--Only get the items that appear the most
SET [MaxModes] AS 
    ORDER(FILTER(NONEMPTY([Product].[Subcategory].[Subcategory].Members, {[Measures].[Demo]}),
    [Measures].[ValueCount] = MAX(NONEMPTY([Product].[Subcategory].[Subcategory].Members, [Measures].[Demo]), 
        [Measures].[ValueCount])), [Measures].[Demo], ASC)

SELECT   {[Measures].[Demo], [Measures].[ValueCount]} on 0,
        [MaxModes]
        --Filter out the duplicates 
        HAVING [MaxModes].CurrentOrdinal = 0 OR [Measures].[Demo] <> 
            ([Measures].[Demo], [MaxModes].Item([MaxModes].CurrentOrdinal - 2)) ON 1
FROM    [Adventure Works]
WHERE    ([Date].[Date].&[20070727])

This gives the following correct result set, assuming we just want the two bi modal values:

image

An alternative is to use an MDX Filter() or HAVING clause to just display all the sub categories that have the mode values, which would just require a small modification to the above code.

I’ve not used these approaches with big data volumes etc, but they should at least give you a few options if you’ve got to do these sort of calculations in your own environment.