Microsoft introduced native predictive model scoring with the release of SQL Server 2017.
The PREDICT function (Documentation) is now a native T-SQL function that eliminates having to score using R or Python through the sp_execute_external_script procedure. It’s an alternative to sp_rxPredict. In both cases you do not need to install R but with PREDICT you do not need to enable SQLCLR either – it’s truly native.
PREDICT should make predictions much faster as the process avoids having to marshal the data between SQL Server and Machine Learning Services (Previously R Services).
Migrating from the original sp_execute_external_script approach to the new native approach tripped me up so I thought I’d share a quick summary of what I have learned.
Stumble One:
Error occurred during execution of the builtin function 'PREDICT' with HRESULT 0x80004001.
Model type is unsupported.
Reason:
Not all models are supported. At the time of writing, only the following models are supported:
- rxLinMod
- rxLogit
- rxBTrees
- rxDtree
- rxdForest
sp_rxPredict supports additional models including those available in the MicrosoftML package for R (I was using attempting to use rxFastTrees). I presume this limitation will reduce over time. The list of supported models is referenced in the PREDICT function (Documentation).
Stumble Two:
Error occurred during execution of the builtin function 'PREDICT' with HRESULT 0x80070057.
Model is corrupt or invalid.
Reason:
The serialisation of the model needs to be modified for use by PREDICT. Typically you might serialise your model in R like this:
model <- data.frame(model=as.raw(serialize(model, NULL)))
Instead you need to use the rxSerializeModel method:
model <- data.frame(rxSerializeModel(model, realtimeScoringOnly = TRUE))
There’s a corresponding rxUnserializeModel method, so it’s worth updating the serialisation across the board so models can be used interchangeably in the event all models are eventually supported. I have been a bit legacy.
That’s it. Oh, apart from the fact PREDICT is supported in Azure SQL DB, despite the documentation saying the contrary.
Introduction to Data Wrangler in Microsoft Fabric
What is Data Wrangler? A key selling point of Microsoft Fabric is the Data Science
Jul
Autogen Power BI Model in Tabular Editor
In the realm of business intelligence, Power BI has emerged as a powerful tool for
Jul
Microsoft Healthcare Accelerator for Fabric
Microsoft released the Healthcare Data Solutions in Microsoft Fabric in Q1 2024. It was introduced
Jul
Unlock the Power of Colour: Make Your Power BI Reports Pop
Colour is a powerful visual tool that can enhance the appeal and readability of your
Jul
Python vs. PySpark: Navigating Data Analytics in Databricks – Part 2
Part 2: Exploring Advanced Functionalities in Databricks Welcome back to our Databricks journey! In this
May
GPT-4 with Vision vs Custom Vision in Anomaly Detection
Businesses today are generating data at an unprecedented rate. Automated processing of data is essential
May
Exploring DALL·E Capabilities
What is DALL·E? DALL·E is text-to-image generation system developed by OpenAI using deep learning methodologies.
May
Using Copilot Studio to Develop a HR Policy Bot
The next addition to Microsoft’s generative AI and large language model tools is Microsoft Copilot
Apr