RANKX was a new function made available in DAX at the launch of SQL Server 2012; however Ranking is a widely used function in most BI projects, and was very much anticipated. Having worked extensively with Tabular on a recent project we had the need to rank over large sets of data, with models based on billions of rows. Whilst we were extremely impressed with the performance of Tabular, we found something not quite right with RANKX. Suppose we had the following very simple format of data, and wanted to perform a simple Rank over it.
CountryKey | Date | Units | SalesValue |
53 | 24/01/2012 | 62 | 81.290619 |
156 | 03/01/2012 | 3575 | 5186.565208 |
48 | 07/05/2012 | 321 | 362.836524 |
157 | 16/04/2012 | 7 | 8.005637 |
134 | 19/08/2012 | 20 | 44.12 |
232 | 29/07/2012 | 40 | 49.313156 |
178 | 05/03/2012 | 47 | 61.25 |
63 | 26/03/2012 | 51 | 73.369509 |
153 | 17/06/2012 | 3061 | 6026.4 |
31 | 08/07/2012 | 3622 | 5005.038257 |
140 | 13/02/2012 | 9870 | 7713.968564 |
14 | 27/05/2012 | 29220 | 46728.26302 |
98 | 20/08/2012 | 297 | 377.653459 |
131 | 30/07/2012 | 172 | 182.591931 |
48 | 27/03/2012 | 348 | 411.801792 |
157 | 06/03/2012 | 3 | 8.01422 |
In order to a Rank the above data by country over all time we would probably write a simple DAX query as follows:
-
- DEFINE
- MEASURE FactSales[UnitCount]= SUM(FactSales[Units])
- MEASURE FactSales[Rank]= RANKX(ALL(FactSales[CountryKey]), FactSales[UnitCount])
- EVALUATE
- SUMMARIZE(
- FactSales
- ,FactSales[CountryKey]
- ,”ProductSalesUnits”,FactSales[UnitCount]
- ,”Rank”, FactSales[Rank]
- )
- ORDER BY
- FactSales[UnitCount] DESC
Now you can see from the query above, in order to avoid any doubt we only rank over a single table and we have no joins to other tables at all. This works fine resulting in the data below.
FactSales[CountryKey] | [ProductSalesUnits] | [Rank] |
227 | 91818113 | 1 |
226 | 21487836 | 2 |
39 | 9033389 | 3 |
14 | 7597054 | 4 |
82 | 4656585 | 5 |
109 | 3678302 | 6 |
75 | 3371681 | 7 |
140 | 2511239 | 8 |
107 | 1850520 | 9 |
208 | 1443241 | 10 |
31 | 1092047 | 11 |
153 | 997860 | 12 |
201 | 914827 | 13 |
59 | 870366 | 14 |
163 | 840927 | 15 |
Now assume we rank over a lot more data and wish to apply a simple CALCULATETABLE in order to filter some data out. We may write a DAX Query as follows:
- DEFINE
- MEASURE FactSales[UnitCount]= SUM(FactSales[Units])
- MEASURE FactSales[Rank]= RANKX(ALL(FactSales[CountryKey]), FactSales[UnitCount])
- EVALUATE
- CALCULATETABLE (
- SUMMARIZE(
- FactSales
- ,FactSales[CountryKey]
- ,”ProductSalesUnits”,FactSales[UnitCount]
- ,”Rank”, FactSales[Rank]
- )
- ,DATESBETWEEN(‘Date'[CalDate], DATE(2012,1,1), DATE(2012,8,31))
- )
- ORDER BY
- FactSales[UnitCount] DESC
In this instance we are joining to a Date dimension, but that is it. The above query yields the below result.
FactSales[CountryKey] | [ProductSalesUnits] | [Rank] |
227 | 83147875 | 1 |
226 | 19446567 | 2 |
39 | 8137335 | 3 |
14 | 6769390 | 4 |
82 | 4100168 | 5 |
109 | 3321496 | 6 |
75 | 3010890 | 7 |
140 | 2165554 | 8 |
107 | 1655472 | 9 |
208 | 1274740 | 10 |
31 | 987122 | 11 |
153 | 908938 | 12 |
201 | 853322 | 13 |
59 | 771677 | 14 |
That works, we get results as we would expect within seconds. Now, if instead of ranking over an integer field, let’s apply the RANKX function to a real number. In this example we get a worrying result set using the DAX as follows:
- DEFINE
- MEASURE FactSales[SalesValue]= FactSales[Sales Value]
- MEASURE FactSales[Rank]= RANKX(ALL(FactSales[CountryKey]), FactSales[UnitCount])
- EVALUATE
- CALCULATETABLE (
- SUMMARIZE(
- FactSales
- ,FactSales[CountryKey]
- ,”ProductSalesValue”,FactSales[Sales Value]
- ,”Rank”, FactSales[Rank]
- )
- ,DATESBETWEEN(‘Date'[CalDate], DATE(2012,1,1), DATE(2012,8,31))
- )
- ORDER BY
- FactSales[UnitCount] DESC
Which outputs:
FactSales[CountryKey] | [ProductSalesValue] | [Rank] |
227 | 84074007.25 | 1 |
226 | 29928143.25 | 3 |
14 | 10859628.74 | 4 |
39 | 8451588.111 | 4 |
109 | 7964922.769 | 6 |
82 | 6254219.85 | 6 |
75 | 4730390.37 | 7 |
107 | 2466064.97 | 9 |
208 | 1904009.18 | 10 |
140 | 1862708.961 | 11 |
153 | 1311217.35 | 11 |
22 | 1207366.72 | 13 |
59 | 1182179.95 | 15 |
Now let’s be clear, all we have done is simply change the measure from an Integer to a Float, the rest of the data is the same. You will notice that there are tie’s in the data that there should not be. Having scratched our heads for hours, rebuilt the model, re wrote the DAX, and had a number of colleagues check it over we found that when no calculate table is applied, then we get the correct answer again as follows:
- DEFINE
- MEASURE FactSales[SalesValue]= FactSales[Sales Value]
- MEASURE FactSales[Rank]= RANKX(ALL(FactSales[CountryKey]), FactSales[UnitCount])
- EVALUATE
- SUMMARIZE(
- FactSales
- ,FactSales[CountryKey]
- ,”ProductSalesValue”,FactSales[Sales Value]
- ,”Rank”, FactSales[Rank]
- )
- ORDER BY
- FactSales[UnitCount] DESC
FactSales[CountryKey] | [ProductSalesValue] | [Rank] |
227 | 92885561.31 | 1 |
226 | 33237033.3 | 2 |
14 | 12253005.68 | 3 |
39 | 9414266.358 | 4 |
109 | 8928147.606 | 5 |
82 | 7120811.54 | 6 |
75 | 5296490.13 | 7 |
107 | 2756500.54 | 8 |
208 | 2146627.18 | 9 |
140 | 2115750.609 | 10 |
153 | 1479858.53 | 11 |
22 | 1339255.82 | 12 |
59 | 1324799.98 | 13 |
105 | 1320651.83 | 14 |
31 | 1277065.779 | 15 |
So what we have learnt here is that RANKX seems to give the wrong answer, but only when ranking over real numbers, and only when we filter the data set in some way using CALCULATETABLE. Not being able to find a clear reason for this behaviour we eventually gave in and raised it with Microsoft. Having spent a week or so working it through with the great support team at Microsoft, it seems that this is a current “feature” of RANKX. It is believed to be a floating point arithmetic issue that is driven from how floating point numbers are stored. It is documented here:
http://support.microsoft.com/kb/78113/EN-US
There is also some further information documented here:
http://technet.microsoft.com/en-us/library/gg492146.aspx
http://www.microsoft.com/en-us/download/details.aspx?id=4106
I have been told by Microsoft that this is the same as it is in Excel, and Analysis Services, however still does not explain why the CALCULATETABLE makes a difference. I am still working with Microsoft to see if we can get to the bottom of it. However, if precision beyond 15 significant figures within the rank is not important to you (up to 99 Billion with 2DP) this issue is very easily worked around. Take the following DAX query:
- DEFINE
- MEASURE FactSales[UnitCount]= FactSales[Royalty Value]
- MEASURE FactSales[Rank]= RANKX(ALL(FactSales[CountryKey]), ROUND(FactSales[UnitCount],2))
- EVALUATE
- CALCULATETABLE (
- SUMMARIZE(
- FactSales
- ,FactSales[CountryKey]
- ,”ProductSalesValue”,FactSales[UnitCount]
- ,”Rank”, FactSales[Rank]
- )
- ,DATESBETWEEN(‘Date'[CalDate], DATE(2012,1,1), DATE(2012,8,31))
- )
- ORDER BY
- FactSales[UnitCount] DESC
Which now correctly outputs:
FactSales[CountryKey] | [ProductSalesValue] | [Rank] |
227 | 84074007.25 | 1 |
226 | 29928143.25 | 2 |
14 | 10859628.74 | 3 |
39 | 8451588.111 | 4 |
109 | 7964922.769 | 5 |
82 | 6254219.85 | 6 |
75 | 4730390.37 | 7 |
107 | 2466064.97 | 8 |
208 | 1904009.18 | 9 |
140 | 1862708.961 | 10 |
153 | 1311217.35 | 11 |
22 | 1207366.72 | 12 |
59 | 1182179.95 | 14 |
105 | 1196551.61 | 13 |
31 | 1132926.109 | 15 |
You will notice that all I have done is ROUND the measure in the RANKX function to 2 decimal places, this will stop the calculation engine from ranking over an approximated value, and force it RANK on a decimal with less than 15 significant figures, which will in turn fix the ranking. Further to this, rounding the values prior to it being loaded into the model will also work, just be sure to keep below the 15 Significant figures if possible.
So in summary, it seems that the calculation engine, when using the RANKX function does not handle the float data type as we may expect. This can be easily rectified by losing some precision on the RANK measure you define. We have been informed that using the Currency function will also fix the problem. I hope this stops others going through the same pain. As and When i get any further information on this issue I will post it up.
Pareto Charts in Power BI and the DAX behind them
The Pareto principle, commonly referred to as the 80/20 rule, is a concept of prioritisation.
Apr
Databricks: Cluster Configuration
Databricks, a cloud-based platform for data engineering, offers several tools that can be used to
Apr
AI Assistance in Microsoft Fabric
The exponential growth of Large Language Models (LLMs) couples with Microsoft’s close partnership with OpenAI
Apr
10 reasons why it’s worth the effort to understand the value of your data
“If leaders really want to create a data driven culture, the journey starts with them!
Apr
Content Safety in Azure AI Studio
Azure AI Content Safety is a solution designed to identify harmful content, whether generated by
Apr
Model Benchmarks in Azure AI Studio
In the constantly changing field of artificial intelligence (AI) and machine learning (ML), choosing the
Apr
Celebrating International Women’s Day: from Classroom to Code
As we celebrate International Women’s Day, I want to share my journey of breaking stereotypes
Mar
Pretty Power BI – Adding Pagination to Bar Charts
Good User Experience (UX) design is crucial in enabling stakeholders to maximise the insights that
Feb