Building a Simple Linear Regression Model on Azure to Predict Car Prices
Last Updated on September 19, 2024 by Editorial Team
Author(s): Julius Nyerere Nyambok
Originally published on Towards AI.
Hello there, gentle reader!
I am Nyerere Julius, an aspiring machine learning engineer and cloud solution specialist, deeply intrigued by the vast possibilities that AI and ML offer.
In this project, I'll guide you through building a simple machine learning project on Azure that can be a great addition to your portfolio.
Introduction
In this article, we'll create a simple Azure project that predicts car sale prices based on a set of features. We'll use the Azure platform, which is available through a free account. The steps that I took are as follows:
- Setting up required resources.
- Designing a complete machine learning test pipeline that ingests data, prepares it, trains a linear regression model, and scores and evaluates it.
- Creating a real-time inference pipeline that we can test on unseen data.
- Deploying a real-time endpoint that will allow us to use our model.
Here's the GitHub repository with the resources referenced throughout this article.
Setting up the resources
I'll strive to simplify the technical jargon that we use in the cloud space at every stage of this article. Azure projects reside in a resource group. A resource group is a logical container that allows you to organize, house, and manage resources. A workspace is a collaborative environment within a resource group that allows users to work on specific resources related to their professions/tasks, like data science and analysis.
After creating your free Azure subscription account, navigate to Azure's homepage and, in the search bar at the top, type "Azure Machine Learning", which is the resource we will use for this project. Since we haven't "housed" our resource in any workspace, navigate to the top left and click + Create, then New workspace.
Choose your subscription and create a new resource group to house our workspace (and resources), then click Review + Create. Azure products take a minute or two to deploy, so go and stretch. Once creation is complete, you'll be redirected to the home page, where you can select your newly created resource group, which should look like this.
You'll notice additional workspaces/resources deployed alongside your Azure Machine Learning workspace. These are background services that support your project and don't require interaction. Click on the Azure Machine Learning workspace, scroll to the bottom, and click the Launch Studio button to access the resource.
Resources are applications provided by cloud providers. Like any application, resources require infrastructure to run, just as your web browser requires a laptop to run on. This is where compute clusters come in. Clusters provide the processing power and virtual architecture required to run our resources. (Ever heard of virtual machines?) From the dashboard, navigate down and select Add Compute.
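If you prefer code to portal clicks, the same kind of cluster can be provisioned with the Azure ML Python SDK v2 (the azure-ai-ml package). This is a minimal sketch rather than part of the Designer workflow above; the subscription, resource group, workspace, and cluster names are placeholders:

```python
# Minimal sketch: provision a compute cluster with the Azure ML SDK v2.
# Requires: pip install azure-ai-ml azure-identity
from azure.ai.ml import MLClient
from azure.ai.ml.entities import AmlCompute
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<your-subscription-id>",     # placeholder
    resource_group_name="<your-resource-group>",  # placeholder
    workspace_name="<your-workspace>",            # placeholder
)

# A small CPU cluster that scales down to zero nodes when idle,
# so you only pay while something is actually running.
cluster = AmlCompute(
    name="cpu-cluster",
    size="Standard_DS3_v2",
    min_instances=0,
    max_instances=2,
    idle_time_before_scale_down=120,  # seconds
)
ml_client.begin_create_or_update(cluster).result()
```

Setting min_instances to 0 is a sensible default for a portfolio project, since idle nodes are billed like any other virtual machine.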
Designing the test pipeline
Now that we have the necessary infrastructure, let's design a pipeline that will ingest data, clean and prepare it, train a model, and make predictions. On your dashboard, the sidebar on the left offers useful shortcuts for our project. Click on Designer in the left sidebar. You will be redirected to your pipeline designer dashboard. In the search bar, search for Automobile price data (Raw). This test dataset comprises the prices of various cars. Take a look at the features/columns and note the ones that require cleaning. Any hints as to which features require preparation when dealing with a linear regression model?
What you've done is add a module to your pipeline. A module is a reusable building block used in a pipeline to perform various related tasks. We need to select the relevant columns, clean the data by handling missing values, normalize the data, and encode categorical columns before our data is ready for any modeling.
After our mini data exploration, we realized several things need to be taken care of before submitting the first part of the pipeline:
- Add a "Select Columns in Dataset" module: the "normalized-losses" column is missing a significant amount of data, so we will eliminate it by selecting every column and then deselecting "normalized-losses".
- Add a "Clean Missing Data" module: the "bore", "stroke", and "horsepower" columns have a few missing values, so we will remove the rows where those values are empty.
- Add a "Normalize Data" module: we will scale the numerical columns to a specific range using MinMax scaling. The columns to be normalized can be found here. (A rough pandas equivalent of these three steps is sketched below.)
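For readers who want to see the logic outside Designer, here is a rough pandas/scikit-learn equivalent of the three modules above. The file name is hypothetical, and na_values="?" reflects how the underlying UCI automobile data encodes missing values:

```python
# Rough pandas equivalent of the three Designer modules above.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical local copy of the dataset; "?" marks missing values in the raw file.
df = pd.read_csv("automobile_prices_raw.csv", na_values="?")

# "Select Columns in Dataset": drop the sparsely populated column.
df = df.drop(columns=["normalized-losses"])

# "Clean Missing Data": remove rows missing bore, stroke, or horsepower.
df = df.dropna(subset=["bore", "stroke", "horsepower"])

# "Normalize Data": MinMax-scale the numeric columns into [0, 1].
numeric_cols = df.select_dtypes(include="number").columns
df[numeric_cols] = MinMaxScaler().fit_transform(df[numeric_cols])
```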
Click Configure & Submit. This acts as the first part of our pipeline. The pipeline will be submitted as a job. A job typically refers to a task or process executed within a pipeline or workflow. Navigate to Jobs on your sidebar, and this is what you'll see.
We need to encode our categorical columns next. Navigate back to our designer and follow these steps:
- Select an "Edit Metadata" module: this allows us to convert our categorical columns from strings to categories. Select the columns shown in figure 10.
- Select a "Convert to Indicator Values" module: this converts each categorical column into indicator (one-hot) columns. Select the columns shown in figure 11. (A pandas equivalent of both steps is sketched below.)
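Continuing the pandas sketch from above, these two modules map onto a categorical cast followed by one-hot expansion. For simplicity this sketch encodes every remaining string column, whereas the article selects the specific columns shown in figures 10 and 11:

```python
# "Edit Metadata": treat the string columns as categoricals.
categorical_cols = df.select_dtypes(include="object").columns.tolist()
df[categorical_cols] = df[categorical_cols].astype("category")

# "Convert to Indicator Values": expand each category into indicator columns.
df = pd.get_dummies(df, columns=categorical_cols)
```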
Finally:
- Select a "Split Data" module: this module splits our data into train and test sets.
- Select a "Linear Regression" module: this module instantiates a linear regression model.
- Select a "Train Model" module: this module trains the linear regression model on the training data.
- Select a "Score Model" module: this module shows us scored predictions and metrics that indicate how well our model has performed. (A scikit-learn analogue of these four modules is sketched below.)
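These four modules correspond to a familiar scikit-learn workflow. Continuing the sketch from above (the 70/30 split ratio and the metric choices here are assumptions, not taken from the article):

```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

X = df.drop(columns=["price"])
y = df["price"]

# "Split Data": hold out a test set (70/30 is an assumed ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# "Linear Regression" + "Train Model": fit on the training split.
model = LinearRegression().fit(X_train, y_train)

# "Score Model": predict on the held-out data and report common metrics.
preds = model.predict(X_test)
print("MAE:", mean_absolute_error(y_test, preds))
print("R^2:", r2_score(y_test, preds))
```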
Click Configure & Submit again. Once successful, check the results of your model in the Score Model module: click on Jobs, select your successful pipeline job, and right-click on the Score Model module.
Based on these scoring metrics, our model has performed reasonably well. We can move on to building the real-time inference pipeline.
Designing a real-time inference pipeline
A real-time inference pipeline in Azure refers to the series of steps involved in deploying a machine learning model and using it to make predictions on new data promptly. It is typically kept separate from the training pipeline to simulate how the model will perform in a production environment.
Navigate to your jobs and create a real-time inference pipeline as shown.
You will be redirected to the Designer section once creation succeeds. Remember that the whole point of an inference pipeline is to simulate how the model would perform if exposed to real-world/unseen data. This means we'll delete the dataset from the pipeline to simulate a real-world scenario where the pipeline receives unseen data. A valuable point to note is that our pipeline is not an "app" yet; we need some medium that will consolidate the results and complete our pipeline. These are the steps to take:
- Remove "Automobile price data (Raw)" and the "Convert to Dataset" module from the pipeline and add an "Enter Data Manually" module in their place: this acts as the data entry point to our real-time pipeline. You can find the data to paste here.
- Edit the "Select Columns in Dataset" module: remove the price column from the selected columns. Remember that this is a real-world scenario, so price is what we are trying to predict and thus doesn't exist in our input data.
- Add an "Execute Python Script" module and place it before the "Web Service Output": this script renames the score results to price, which is what we want displayed in our hypothetical app. (A sketch of the script is shown below.)
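The module expects a function named azureml_main that receives the upstream output as a DataFrame and returns a DataFrame. A sketch of the renaming script might look like this; "Scored Labels" is the column name Score Model typically produces, so check your own job output if it differs:

```python
import pandas as pd  # kept from the module's default template

def azureml_main(dataframe1=None, dataframe2=None):
    # Rename Designer's prediction column to "price" for the web service output.
    dataframe1 = dataframe1.rename(columns={"Scored Labels": "price"})
    return dataframe1
```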
Creating a real-time endpoint
We have completed our real-world simulation, and our model is ready for use. Click the three dots at the top right and click Deploy.
Deploy it as a new real-time endpoint and select Azure Container Instance as the compute type.
This will take about 20β30 minutes so you might have to be patient.
Once the deployment state becomes Healthy, you can consume your model from anywhere. Azure provides Python and R code snippets that you can integrate into your application to consume the model, as shown in Figure 19.
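As an illustration, calling the endpoint from Python might look like the sketch below. The scoring URI, key, and input record are placeholders; the endpoint's Consume tab shows the exact request schema and a ready-made snippet for your deployment:

```python
import requests

scoring_uri = "<your-endpoint-scoring-uri>"  # from the Consume tab
api_key = "<your-endpoint-key>"              # from the Consume tab

payload = {
    "Inputs": {
        # One record with the same feature columns the pipeline expects;
        # remaining columns are omitted here for brevity.
        "WebServiceInput0": [
            {"make": "toyota", "fuel-type": "gas", "horsepower": 0.32}
        ]
    },
    "GlobalParameters": {},
}

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}",
}

response = requests.post(scoring_uri, json=payload, headers=headers)
print(response.json())  # should include the renamed "price" column
```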
Donβt forget to decommission your Azure resources after use. Cloud resources are not cheap.
Thank you for taking the time to read this article.
Check out other projects I've done on my portfolio and GitHub.
Portfolio: Data Science Portfolio Version 1 (nyerere-data-scientist.carrd.co)
GitHub: Jnyambok (github.com)