RANKX–Analysis Services 2012 Tabular

RANKX was a new function made available in DAX at the launch of SQL Server 2012; however Ranking is a widely used function in most BI projects, and was very much anticipated. Having worked extensively with Tabular on a recent project we had the need to rank over large sets of data, with models based on billions of rows. Whilst we were extremely impressed with the performance of Tabular, we found something not quite right with RANKX. Suppose we had the following very simple format of data, and wanted to perform a simple Rank over it.

CountryKey	Date	Units	SalesValue
53	24/01/2012	62	81.290619
156	03/01/2012	3575	5186.565208
48	07/05/2012	321	362.836524
157	16/04/2012	7	8.005637
134	19/08/2012	20	44.12
232	29/07/2012	40	49.313156
178	05/03/2012	47	61.25
63	26/03/2012	51	73.369509
153	17/06/2012	3061	6026.4
31	08/07/2012	3622	5005.038257
140	13/02/2012	9870	7713.968564
14	27/05/2012	29220	46728.26302
98	20/08/2012	297	377.653459
131	30/07/2012	172	182.591931
48	27/03/2012	348	411.801792
157	06/03/2012	3	8.01422

In order to a Rank the above data by country over all time we would probably write a simple DAX query as follows:

1. DEFINE
2. MEASURE FactSales[UnitCount]= SUM(FactSales[Units])
3. MEASURE FactSales[Rank]= RANKX(ALL(FactSales[CountryKey]), FactSales[UnitCount])
5. EVALUATE
6. SUMMARIZE(
7. FactSales
8. ,FactSales[CountryKey]
9. ,”ProductSalesUnits”,FactSales[UnitCount]
10. ,”Rank”, FactSales[Rank]
11. )
12. ORDER BY
13. FactSales[UnitCount] DESC

Now you can see from the query above, in order to avoid any doubt we only rank over a single table and we have no joins to other tables at all. This works fine resulting in the data below.

FactSales[CountryKey]	[ProductSalesUnits]	[Rank]
227	91818113	1
226	21487836	2
39	9033389	3
14	7597054	4
82	4656585	5
109	3678302	6
75	3371681	7
140	2511239	8
107	1850520	9
208	1443241	10
31	1092047	11
153	997860	12
201	914827	13
59	870366	14
163	840927	15

Now assume we rank over a lot more data and wish to apply a simple CALCULATETABLE in order to filter some data out. We may write a DAX Query as follows:

DEFINE
MEASURE FactSales[UnitCount]= SUM(FactSales[Units])
MEASURE FactSales[Rank]= RANKX(ALL(FactSales[CountryKey]), FactSales[UnitCount])
EVALUATE
CALCULATETABLE (
SUMMARIZE(
FactSales
,FactSales[CountryKey]
,”ProductSalesUnits”,FactSales[UnitCount]
,”Rank”, FactSales[Rank]
)
,DATESBETWEEN(‘Date'[CalDate], DATE(2012,1,1), DATE(2012,8,31))
)
ORDER BY
FactSales[UnitCount] DESC

In this instance we are joining to a Date dimension, but that is it. The above query yields the below result.

FactSales[CountryKey]	[ProductSalesUnits]	[Rank]
227	83147875	1
226	19446567	2
39	8137335	3
14	6769390	4
82	4100168	5
109	3321496	6
75	3010890	7
140	2165554	8
107	1655472	9
208	1274740	10
31	987122	11
153	908938	12
201	853322	13
59	771677	14

That works, we get results as we would expect within seconds. Now, if instead of ranking over an integer field, let’s apply the RANKX function to a real number. In this example we get a worrying result set using the DAX as follows:

DEFINE
MEASURE FactSales[SalesValue]= FactSales[Sales Value]
MEASURE FactSales[Rank]= RANKX(ALL(FactSales[CountryKey]), FactSales[UnitCount])
EVALUATE
CALCULATETABLE (
SUMMARIZE(
FactSales
,FactSales[CountryKey]
,”ProductSalesValue”,FactSales[Sales Value]
,”Rank”, FactSales[Rank]
)
,DATESBETWEEN(‘Date'[CalDate], DATE(2012,1,1), DATE(2012,8,31))
)
ORDER BY
FactSales[UnitCount] DESC

Which outputs:

FactSales[CountryKey]	[ProductSalesValue]	[Rank]
227	84074007.25	1
226	29928143.25	3
14	10859628.74	4
39	8451588.111	4
109	7964922.769	6
82	6254219.85	6
75	4730390.37	7
107	2466064.97	9
208	1904009.18	10
140	1862708.961	11
153	1311217.35	11
22	1207366.72	13
59	1182179.95	15

Now let’s be clear, all we have done is simply change the measure from an Integer to a Float, the rest of the data is the same. You will notice that there are tie’s in the data that there should not be. Having scratched our heads for hours, rebuilt the model, re wrote the DAX, and had a number of colleagues check it over we found that when no calculate table is applied, then we get the correct answer again as follows:

DEFINE
MEASURE FactSales[SalesValue]= FactSales[Sales Value]
MEASURE FactSales[Rank]= RANKX(ALL(FactSales[CountryKey]), FactSales[UnitCount])
EVALUATE
SUMMARIZE(
FactSales
,FactSales[CountryKey]
,”ProductSalesValue”,FactSales[Sales Value]
,”Rank”, FactSales[Rank]
)
ORDER BY
FactSales[UnitCount] DESC

FactSales[CountryKey]	[ProductSalesValue]	[Rank]
227	92885561.31	1
226	33237033.3	2
14	12253005.68	3
39	9414266.358	4
109	8928147.606	5
82	7120811.54	6
75	5296490.13	7
107	2756500.54	8
208	2146627.18	9
140	2115750.609	10
153	1479858.53	11
22	1339255.82	12
59	1324799.98	13
105	1320651.83	14
31	1277065.779	15

So what we have learnt here is that RANKX seems to give the wrong answer, but only when ranking over real numbers, and only when we filter the data set in some way using CALCULATETABLE. Not being able to find a clear reason for this behaviour we eventually gave in and raised it with Microsoft. Having spent a week or so working it through with the great support team at Microsoft, it seems that this is a current “feature” of RANKX. It is believed to be a floating point arithmetic issue that is driven from how floating point numbers are stored. It is documented here:

http://support.microsoft.com/kb/78113/EN-US

There is also some further information documented here:

http://technet.microsoft.com/en-us/library/gg492146.aspx

http://www.microsoft.com/en-us/download/details.aspx?id=4106

I have been told by Microsoft that this is the same as it is in Excel, and Analysis Services, however still does not explain why the CALCULATETABLE makes a difference. I am still working with Microsoft to see if we can get to the bottom of it. However, if precision beyond 15 significant figures within the rank is not important to you (up to 99 Billion with 2DP) this issue is very easily worked around. Take the following DAX query:

DEFINE
MEASURE FactSales[UnitCount]= FactSales[Royalty Value]
MEASURE FactSales[Rank]= RANKX(ALL(FactSales[CountryKey]), ROUND(FactSales[UnitCount],2))
EVALUATE
CALCULATETABLE (
SUMMARIZE(
FactSales
,FactSales[CountryKey]
,”ProductSalesValue”,FactSales[UnitCount]
,”Rank”, FactSales[Rank]
)
,DATESBETWEEN(‘Date'[CalDate], DATE(2012,1,1), DATE(2012,8,31))
)
ORDER BY
FactSales[UnitCount] DESC

Which now correctly outputs:

FactSales[CountryKey]	[ProductSalesValue]	[Rank]
227	84074007.25	1
226	29928143.25	2
14	10859628.74	3
39	8451588.111	4
109	7964922.769	5
82	6254219.85	6
75	4730390.37	7
107	2466064.97	8
208	1904009.18	9
140	1862708.961	10
153	1311217.35	11
22	1207366.72	12
59	1182179.95	14
105	1196551.61	13
31	1132926.109	15

You will notice that all I have done is ROUND the measure in the RANKX function to 2 decimal places, this will stop the calculation engine from ranking over an approximated value, and force it RANK on a decimal with less than 15 significant figures, which will in turn fix the ranking. Further to this, rounding the values prior to it being loaded into the model will also work, just be sure to keep below the 15 Significant figures if possible.

So in summary, it seems that the calculation engine, when using the RANKX function does not handle the float data type as we may expect. This can be easily rectified by losing some precision on the RANK measure you define. We have been informed that using the Currency function will also fix the problem. I hope this stops others going through the same pain. As and When i get any further information on this issue I will post it up.

Business Intelligence, Data Analytics

RANKX–Analysis Services 2012 Tabular

Martyn Bullerwell

Introduction to Data Wrangler in Microsoft Fabric

Autogen Power BI Model in Tabular Editor

Microsoft Healthcare Accelerator for Fabric

Unlock the Power of Colour: Make Your Power BI Reports Pop

Python vs. PySpark: Navigating Data Analytics in Databricks – Part 2

GPT-4 with Vision vs Custom Vision in Anomaly Detection

Exploring DALL·E Capabilities

Using Copilot Studio to Develop a HR Policy Bot

London

Aberdeen

Surrey

Bulgaria

India

Glasgow

Manchester

Get in touch

Our Social Channels

Related Posts

Martyn Bullerwell

Introduction to Data Wrangler in Microsoft Fabric

Autogen Power BI Model in Tabular Editor

Microsoft Healthcare Accelerator for Fabric

Unlock the Power of Colour: Make Your Power BI Reports Pop

Python vs. PySpark: Navigating Data Analytics in Databricks – Part 2

GPT-4 with Vision vs Custom Vision in Anomaly Detection

Exploring DALL·E Capabilities

Using Copilot Studio to Develop a HR Policy Bot