

Building a Simple Linear Regression Model on Azure to Predict Car Prices

Last Updated on September 19, 2024 by Editorial Team

Author(s): Julius Nyerere Nyambok

Originally published on Towards AI.

Hello there, gentle reader!
I am Nyerere Julius, an aspiring machine learning engineer and cloud solution specialist, deeply intrigued by the vast possibilities that AI and ML offer.

In this project, I'll guide you through building a simple machine-learning project on Azure that can be a great addition to your portfolio.

Figure 1: Classic Azure Machine Learning Architecture

Introduction

In this article, we'll create a simple Azure project that predicts car sale prices based on a set of features. We'll use the Azure platform, which you can access with a free account. The steps I took are as follows:

  1. Setting up required resources.
  2. Designing a complete machine learning test pipeline that ingests data, prepares it, trains a linear regression model, and scores and evaluates it.
  3. Creating a real-time inference pipeline that we can test on unseen data.
  4. Deploying a real-time endpoint that will allow us to use our model.

Here’s the GitHub repository with the resources referenced throughout this article.

Setting up the resources

At every stage of this article, I'll strive to simplify the technical jargon we use in the cloud space. Azure projects reside in a resource group. A resource group is a logical container that allows you to organize, house, and manage resources. A workspace is a collaborative environment within a resource group that allows users to work on specific resources related to their professions/tasks, like data science and analysis.

Figure 2: Difference between resource group and workspace

After creating your free Azure subscription account, navigate to Azure's homepage and, in the search bar at the top, type "Azure Machine Learning", which is the resource we will use for this project. Since we haven't "housed" our resource in any workspace, navigate to the top left and click on + Create, then New workspace.

Figure 3: Creating an Azure ML workspace

Choose your subscription, create a new resource group to house our workspace (and resources), then click Review + Create. Azure products take a minute or two to deploy, so go and stretch. Once creation is complete, you'll be redirected to the home page, where you can select your newly created resource group, which should look like this.

Figure 4: Newly created resource group.

You’ll notice additional workspaces/resources deployed alongside your Azure Machine Learning workspace. These are background services that support your project and don’t require interactions. Click on the Azure Machine Learning workspace, scroll to the bottom, and click on the Launch Studio button to access the resource.

Resources are applications provided by cloud providers. Like any application, resources require infrastructure to run; for example, your web browser needs a laptop to run on. This is where compute clusters come in. Clusters provide the processing power and virtual infrastructure required to run our resources (ever heard of virtual machines?). From the dashboard, navigate down and select Add Compute.

Figure 5: Navigating to the compute
Figure 6: Provisioning the required compute cluster

Designing the test pipeline

Now that we have the necessary infrastructure, let’s design a pipeline that will ingest data, clean it, prepare it, score the model, and make predictions. On your dashboard, you are provided with a sidebar on the left that provides useful shortcuts for our project. Click on Designer on the left sidebar. You will be redirected to your pipeline designer dashboard. On the search bar, search for Automobile price data (Raw). This test dataset comprises the prices of various cars. Take a look at the features/columns and take note of the ones that require cleaning. Any hints as to which features require preparation when dealing with a linear regression model?

Figure 7: Importing our dataset
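Outside Designer, you can run the same sanity check locally with pandas. The frame below is a hypothetical miniature of the dataset (the real Automobile price data ships inside Designer); columns with many missing values, like normalized-losses, stand out immediately.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the "Automobile price data (Raw)" dataset --
# the real one ships inside Azure ML Designer.
df = pd.DataFrame({
    "normalized-losses": [np.nan, 164.0, np.nan, np.nan],
    "horsepower": [111.0, 154.0, np.nan, 102.0],
    "price": [13495.0, 16500.0, 16845.0, 13950.0],
})

# Count missing values per column: the cleaning plan falls out of this.
missing = df.isna().sum()
print(missing)
```

A column that is mostly empty is a candidate for dropping; a column with only a few gaps is a candidate for row removal.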

What you’ve done is add a module to your pipeline. A module is a reusable building block that is used in a pipeline to perform various related tasks. We need to select the relevant columns, clean the data by handling missing values, normalize the data, and encode categorical columns before our data is ready for any modeling.

After our mini data exploration, we realized several things need to be taken care of before submitting the first part of the pipeline:

  1. Select a "Select Columns in Dataset" module — The 'normalized-losses' column is missing a significant amount of data. We will eliminate it by selecting every column and then deselecting 'normalized-losses'.
  2. Select a "Clean Missing Data" module — The 'bore', 'stroke', and 'horsepower' columns have a few missing rows. We will remove those rows from the dataset.
  3. Select a "Normalize Data" module — We will scale the numerical columns to a fixed range using MinMax scaling. The columns to be normalized can be found here.
Figure 8: Select, Clean, and Normalize
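The three modules above map onto familiar pandas/scikit-learn operations. A rough local equivalent, with made-up column values, might look like this:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Tiny made-up frame standing in for the raw automobile data.
df = pd.DataFrame({
    "normalized-losses": [None, 164.0, None],
    "bore": [3.47, None, 2.68],
    "horsepower": [111.0, 154.0, 102.0],
    "price": [13495.0, 16500.0, 13950.0],
})

# 1. "Select Columns in Dataset": keep everything except normalized-losses.
df = df.drop(columns=["normalized-losses"])

# 2. "Clean Missing Data": remove rows missing bore/horsepower values.
df = df.dropna(subset=["bore", "horsepower"])

# 3. "Normalize Data": MinMax-scale the numeric columns into [0, 1].
num_cols = ["bore", "horsepower", "price"]
df[num_cols] = MinMaxScaler().fit_transform(df[num_cols])
print(df)
```

Each Designer module is doing one of these small, composable transformations under the hood.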

Click Configure & Submit. This acts as the first part of our pipeline. The pipeline will be submitted as a Job. A job typically refers to a task or process that is executed within a pipeline or workflow. Navigate to jobs on your sidebar and this is what you’ll see.

Figure 9: Our pipeline submitted as a job

We need to encode our categorical columns next. Navigate back to our designer and follow these steps:

  1. Select an "Edit Metadata" module — This allows us to convert our categorical columns from strings to categories. Select the columns shown in Figure 10.
  2. Select a "Convert to Indicator Values" module — This converts our categorical columns to indicator (one-hot) encodings. Select the columns shown in Figure 11.
Figure 10: Editing metadata
Figure 11: Converting to the indicator variable
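In pandas terms, "Edit Metadata" roughly corresponds to casting a string column to a categorical dtype, and "Convert to Indicator Values" to one-hot encoding. A sketch with a made-up fuel-type column:

```python
import pandas as pd

df = pd.DataFrame({
    "fuel-type": ["gas", "diesel", "gas"],   # made-up sample values
    "price": [13495, 16500, 13950],
})

# "Edit Metadata": mark the string column as categorical.
df["fuel-type"] = df["fuel-type"].astype("category")

# "Convert to Indicator Values": expand it into 0/1 indicator columns.
df = pd.get_dummies(df, columns=["fuel-type"])
print(df.columns.tolist())
```

The single fuel-type column becomes one indicator column per category, which is what a linear regression model needs in place of raw strings.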

Finally:

  1. Select a "Split Data" module — This module splits our data into training and test sets.
  2. Select a "Train Model" module — This module trains the linear regression model on the training data.
  3. Select a "Linear Regression" module — This module instantiates a linear regression model.
  4. Select a "Score Model" module — This module shows us various scoring metrics that indicate how well our model has performed.
Figure 12: The "Split Data" preferences
Figure 13: The "Train Model" preferences
Figure 13: Finishing the tail of the pipeline
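These last modules mirror a standard scikit-learn workflow. Assuming a prepared feature matrix X and target y (synthetic here, not the real car data), a minimal local equivalent of split → train → score would be:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(100, 3))       # stand-in for the prepared features
y = X @ np.array([3.0, -2.0, 1.5]) + 0.5   # synthetic linear target

# "Split Data" module: 70/30 train-test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# "Linear Regression" + "Train Model" modules.
model = LinearRegression().fit(X_train, y_train)

# "Score Model": metrics similar to those Designer reports.
preds = model.predict(X_test)
print("MAE:", mean_absolute_error(y_test, preds))
print("R2: ", r2_score(y_test, preds))
```

On this noiseless synthetic data the fit is near-perfect; on the real car data you should expect a lower, but still healthy, R-squared.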

Click Configure & Submit again. Once the job succeeds, check your model's results in the Score Model module: click on Jobs, select your successful pipeline job, and right-click on the Score Model module.

Figure 14: The scoring metrics.

Based on these scoring metrics, our model has performed reasonably well. We can move on to building the real-time inference pipeline.

Designing a real-time inference pipeline

A real-time inference pipeline in Azure refers to the series of steps involved in deploying a machine learning model and using it to make predictions on new data promptly. It is typically kept separate from the training pipeline to simulate how the model will perform in a production environment.

Navigate to your jobs and create a real-time inference pipeline as shown.

Figure 15: Creating a real-time inference pipeline

You will be redirected to the Designer section once it completes. Remember that the whole point of an inference pipeline is to simulate how the model would perform when exposed to real-world, unseen data. This means we'll delete the dataset from the pipeline to simulate a real-world scenario where the pipeline receives unseen data. A valuable point to note is that our pipeline is not an "app" yet; we need some medium to consolidate the results and complete our pipeline. These are the steps to take:

  1. Remove "Automobile price data (Raw)" and the "Convert to Dataset" module from the pipeline and add an "Enter Data Manually" module in their place — This acts as the data entry point to our real-time pipeline. You can find the data to paste here.
  2. Edit the "Select Columns in Dataset" module — Remove the price column from the selected columns. Remember that this is a real-world scenario: price is what we are trying to predict, so it doesn't exist in our input data.
  3. Add an "Execute Python Script" module — Place this module before the "Web Service Output". This script renames the score results column to price, which is what we want displayed in our hypothetical app.
Figure 16: Just before you complete the pipeline, add your py script
Figure 17: Completing the inference pipeline
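Designer's "Execute Python Script" module expects an azureml_main entry point that receives up to two DataFrames and returns a tuple. Assuming the scored column is named "Scored Labels" (the name Score Model typically emits), the renaming script might look like this:

```python
import pandas as pd

def azureml_main(dataframe1=None, dataframe2=None):
    # Rename the Score Model output column so the web service returns "price".
    dataframe1 = dataframe1.rename(columns={"Scored Labels": "price"})
    return dataframe1,

# Local smoke test with a fake scored frame.
scored = pd.DataFrame({"Scored Labels": [13495.0, 16500.0]})
result, = azureml_main(scored)
print(result.columns.tolist())
```

Inside Designer you only paste the function; the module wires the upstream Score Model output into dataframe1 for you.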

Creating a real-time endpoint

We have completed our real-world simulation, and our model is ready for use. Click the three dots on the top right and click Deploy. Deploy it as a new real-time endpoint and select Azure Container Instance as the compute type.

Figure 18: Setting up a real-time endpoint

This will take about 20–30 minutes so you might have to be patient.

Figure 19: Deployment state is healthy

Once the deployment state becomes Healthy, you can consume your model from anywhere. Azure provides Python and R code snippets that you can integrate into your application to consume the model, as shown in Figure 19.
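Consuming the endpoint from Python boils down to POSTing JSON to the scoring URI with the endpoint key as a Bearer token. The URI, key, and input columns below are placeholders; copy the real values and payload schema from the endpoint's Consume tab:

```python
import json
import urllib.request

def build_request(scoring_uri, api_key, rows):
    # Request shape commonly used by Designer-deployed endpoints:
    # a list of records keyed by input column name.
    body = json.dumps({"Inputs": {"WebServiceInput0": rows},
                       "GlobalParameters": {}}).encode("utf-8")
    headers = {"Content-Type": "application/json",
               "Authorization": f"Bearer {api_key}"}
    return urllib.request.Request(scoring_uri, data=body, headers=headers)

# Placeholder URI and key -- replace with your endpoint's real values.
req = build_request("https://example.invalid/score", "<API_KEY>",
                    [{"make": "toyota", "horsepower": 102}])
# resp = urllib.request.urlopen(req)   # the actual call; needs a live endpoint
print(req.get_full_url())
```

The actual call is commented out since it requires a live endpoint; everything up to it can be verified offline.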

Don’t forget to decommission your Azure resources after use. Cloud resources are not cheap.

Thank you for taking the time to read this article.

Check out other projects I’ve done on my portfolio and GitHub.

Data Science Portfolio: nyerere-data-scientist.carrd.co

GitHub: Jnyambok
