This is where the real fun begins! In this blog we will get to the heart of machine learning and produce a regression model.
Training the model
We now need to split the data into training and testing sets, so that we can train the algorithm on the training set and then test the accuracy of its predictions on the testing set. To do so, search for ‘split’ in the Search experiment items search bar and drag the Split Data task onto the canvas. Its properties include one called Fraction of rows in the first output dataset, which lets you choose what percentage of rows is used for training and what percentage is held back to test the prediction accuracy. Let’s set it to 0.9, which means 90% of the rows will be used for training and 10% for testing. Leave the other properties as they are. The properties window should look like the below image:
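For readers who like to see the idea in code, here is a minimal Python sketch of what a 0.9 split does. It assumes the rows are shuffled before being cut; `split_rows`, the seed and the stand-in `rows` list are all hypothetical names made up for illustration, not part of Azure Machine Learning.

```python
import random

def split_rows(rows, first_fraction=0.9, seed=42):
    """Shuffle the rows, then cut them into a first and a second output."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = round(len(shuffled) * first_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical stand-in for the car dataset: 200 row identifiers.
rows = list(range(200))
train_rows, test_rows = split_rows(rows, first_fraction=0.9)
print(len(train_rows), "rows for training,", len(test_rows), "rows for testing")
```

Every row ends up in exactly one of the two outputs, which is what lets the second output act as unseen data later on.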
Now let’s get to the very core of machine learning: the algorithm itself. For this we will use one of my personal favourites, a Boosted Decision Tree. Decision trees frequently produce highly accurate predictions, and the leaves of the tree are great for discovering more about your data. Go to the item toolbox, clear the search box, navigate to Machine Learning > Initialize Model > Regression and drag the Boosted Decision Tree Regression item onto the left side of the canvas. Change the properties to match the values below. These were selected by using a Sweep Parameters item to work out the optimal parameter settings.
Parameter Name | Parameter Value |
Create trainer mode | Single Parameter |
Maximum number of leaves per tree | 36 |
Minimum number of samples per leaf node | 7 |
Learning rate | 0.33128 |
Total number of trees constructed | 182 |
Random number seed | (blank) |
Allow unknown categorical levels | Checked |
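To give a feel for what the learning rate, tree count and minimum leaf size control, here is a rough pure-Python sketch of gradient boosting for regression. It is deliberately simplified and is not how Azure Machine Learning implements the item: each "tree" is a single split with two leaves (so the 36-leaf setting has no equivalent here), there is only one input feature, and the data is made up. All function names are hypothetical.

```python
def average(values):
    return sum(values) / len(values)

def fit_stump(xs, residuals, min_samples_leaf):
    """Return (threshold, left_value, right_value) for the best single split."""
    best = None
    values = sorted(set(xs))
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if min(len(left), len(right)) < min_samples_leaf:
            continue  # respect the minimum number of samples per leaf
        left_value, right_value = average(left), average(right)
        error = (sum((r - left_value) ** 2 for r in left)
                 + sum((r - right_value) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    if best is None:  # no split satisfies the leaf-size limit
        overall = average(residuals)
        return float("inf"), overall, overall
    return best[1:]

def fit_boosted_stumps(xs, ys, n_trees, learning_rate, min_samples_leaf):
    """Each new stump is fitted to the errors left by the stumps before it."""
    base = average(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals, min_samples_leaf)
        stumps.append(stump)
        threshold, left_value, right_value = stump
        # The learning rate shrinks each stump's correction before adding it.
        predictions = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, predictions)
        ]
    return base, learning_rate, stumps

def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(
        left if x <= threshold else right for threshold, left, right in stumps
    )

# Made-up data for illustration only: one feature and a linear target.
xs = list(range(40))
ys = [2 * x + 5 for x in xs]
model = fit_boosted_stumps(xs, ys, n_trees=182, learning_rate=0.33128,
                           min_samples_leaf=7)
print("prediction for x=20:", predict(model, 20))
```

A smaller learning rate makes each tree's correction more cautious, which usually means more trees are needed; that trade-off is part of what a parameter sweep is searching over.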
Drag a Train Model item onto the canvas; it is located under Train in the item toolbox. Join up the appropriate output and input ports so your canvas looks like the image below.
Click on the Train Model item and select Launch column selector in the properties window. Here you are selecting the column you want to predict, so just select price.
Now we need to predict the results of the testing data. To do so, drag on a Score Model item (located under Score) and connect the Train Model and Split Data items to the two input nodes of the Score Model, using the second output of Split Data, which holds the testing rows. Once complete, hit Run to run the experiment. Your canvas should light up with green ticks like the image below.
Now let’s have a look and see if this algorithm has actually produced any decent results. Right click on the output node of the Score Model and left click on Visualise. You should see something similar to the below image.
This table displays the values for every row of the test data. If you scroll all the way to the right you should see two columns: price and Scored Labels. Price is the actual price of the car. Scored Labels is the price the regression algorithm has predicted for the car. The numbers are quite close, which is exactly the result we’re after. If you click on the Scored Labels column header you can conduct some further analysis: scroll down, make sure compare to is set to price, and you can view a scatter plot of the two values. I have done so on the image above, and looking at the scatter plot you can see that there is a strong positive correlation with only a few outliers.
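If you would rather put a number on "quite close" than eyeball the scatter plot, the same comparison can be sketched in a few lines of Python. The figures below are made up for illustration and are not taken from the experiment; `pearson` and `mean_absolute_error` are hypothetical helper names.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation: +1 means a perfect positive linear relationship."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)

def mean_absolute_error(actual, predicted):
    """Average size of the gap between the actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Made-up values standing in for the price and Scored Labels columns.
price = [5000, 7500, 9000, 12000, 16500, 21000]
scored_labels = [5400, 7100, 9600, 11500, 17200, 20100]

print(f"correlation: {pearson(price, scored_labels):.3f}")
print(f"mean absolute error: {mean_absolute_error(price, scored_labels):.0f}")
```

A correlation close to 1 matches the tight diagonal band you see on the scatter plot, while the mean absolute error tells you roughly how far off a typical prediction is in the same units as price.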
Your Azure Machine Learning regression algorithm is now complete! In the next blog we will be deploying the model so we can use it outside of Azure Machine Learning and really put what we have created into practice.