In-depth Azure Machine Learning Model Train, Test, and Deploy Pipelines in the Cloud with Endpoints for Web APIs
Author(s): Amit Chauhan
The Azure Machine Learning workspace consists of various artifacts:
- Managed resources: these include compute instances and compute clusters.
- Linked Services:
- Data stores: a service for storing various kinds of data, for example, Blob storage, Hive storage, and SQL databases.
- Compute targets: the machines on which we run, train, and test our models.
- Assets:
- Environments
- Experiments
- Pipelines
- Datasets
- Models
- Endpoints
The workspace also has several dependencies: it produces logs, notebooks, asset entries, and so on, and it needs supporting services to store and manage them.
- Dependencies
- Azure Storage account: used for the administration and day-to-day working of the workspace.
- Azure Container Registry: used when we deploy our model to production as Docker images.
- Azure Key Vault: stores keys, secrets, and other private information.
- Azure Application Insights: monitors our machine learning applications and collects information such as response times, requests, failures, and performance.
Basic concepts
- Datasets
A dataset is a collection of data, typically organized as rows and columns. Azure offers many ways to upload or fetch a dataset for machine learning experiments.
- Datastores
When we want to bring in a dataset from outside the workspace (for example, from the local system), we need some storage to hold it; that is where the datastore comes into the picture. A datastore is simply a registered connection to a storage type such as account storage, a database, or an analytics store like a data lake.
- Various storage types
Azure supports several storage types, including Blob storage, file storage, Data Lake, Azure SQL, Azure PostgreSQL, MySQL, and Azure Databricks.
Creating the machine learning workspace
Below are the steps to create the workspace:
1. Open the Azure dashboard, search for the Machine Learning resource, click on it, and then click Create. If you don't have an Azure account, follow the link below.
How to Open an Azure Cloud Account with a Debit Card: a simple and easy process for all data scientists (amitprius.medium.com)
2. Fill in all the information.
If there is no existing resource group, create a new one. When we enter the workspace name, the other fields, such as the key vault, storage account, and application insights, are filled in automatically. We will keep the container registry set to "None" for now because it is only needed at deployment time.
We can choose any region, but if we have a large amount of data, we can choose the nearest region for faster data transfer.
- In the Networking option, choose public access since we are just practicing.
- In the Advanced option, there are many settings; keep them as they are. If we enable the data impact option, we are telling Microsoft that the data we will upload is sensitive.
3. After validation passes, click Create to make the workspace. It will create the four resources shown below.
4. Now, click on Go to resource, and the workspace dashboard will open with the Launch studio option, as shown below.
In the above image, Access control (IAM) is used to give more users access to this workspace.
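The workspace can also be created programmatically rather than through the portal. Below is a minimal sketch using the Azure ML Python SDK v1 (azureml-core); the workspace name, resource group, subscription ID, and region are placeholders, so substitute your own values.

```python
from azureml.core import Workspace

# Create a workspace; the names and IDs below are placeholders.
ws = Workspace.create(
    name="demo-ml-workspace",             # workspace name
    subscription_id="<subscription-id>",  # your Azure subscription ID
    resource_group="demo-ml-rg",          # resource group, created if missing
    create_resource_group=True,
    location="eastus",                    # pick the region nearest to your data
)

# Save a config.json so later scripts can reconnect with Workspace.from_config().
ws.write_config(path=".azureml")
```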
Launching the machine learning studio
1. After creating the workspace, it's time to launch the ML studio, which will look like the image below.
The Author section in the above image is where we build machine learning experiments and pipelines.
2. Make a new storage account so that our experiment files stay separate from other storage.
3. Now, create a container inside this storage account.
4. Now, create a data store in the ML studio that will connect to this newly made storage account.
5. Fill in the information.
To get the access key, go to the new storage account created in step 2 and copy the key from the Access keys option, as shown below.
Now, click the Create button for the datastore. The datastore is created and registered with the workspace along with the storage account.
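For reference, the same blob container can be registered as a datastore from code instead of the studio UI. This is a minimal sketch with SDK v1; the datastore name, container name, account name, and key are placeholders for the resources created above.

```python
from azureml.core import Workspace, Datastore

ws = Workspace.from_config()  # reads the config.json saved earlier

# Register the blob container from the new storage account as a datastore.
datastore = Datastore.register_azure_blob_container(
    workspace=ws,
    datastore_name="loan_datastore",             # name that will appear in ML studio
    container_name="loan-data",                  # container created in step 3
    account_name="<storage-account-name>",
    account_key="<storage-account-access-key>",  # copied from the Access keys blade
)
print(datastore.name, datastore.datastore_type)
```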
6. Now, upload the dataset to the container we created in the storage account in step 3.
We can also check the file through the Storage browser option in the storage account.
7. Now, create the dataset and choose the file from the datastore.
Click the Next button; when we select "From Azure storage", further options appear on the left side. We choose this option because our storage is a blob type.
Now, we can deselect the Loan_ID and Gender columns in the Schema option.
Our data is now registered as a dataset.
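Equivalently, the uploaded CSV can be registered as a tabular dataset from code. A rough sketch with SDK v1; the file name loan.csv and the datastore name are assumptions based on this walkthrough.

```python
from azureml.core import Workspace, Datastore, Dataset

ws = Workspace.from_config()
datastore = Datastore.get(ws, "loan_datastore")

# Build a tabular dataset from the CSV sitting in the registered datastore.
loan_ds = Dataset.Tabular.from_delimited_files(path=[(datastore, "loan.csv")])

# Drop the columns we deselected in the Schema step.
loan_ds = loan_ds.drop_columns(["Loan_ID", "Gender"])

# Register the dataset so it shows up under Assets > Datasets in the studio.
loan_ds = loan_ds.register(workspace=ws, name="loan-dataset", create_new_version=True)
```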
Compute Resources
In this section, we will discuss the managed resource artifacts, i.e., the compute instances and compute clusters that come with the machine learning workspace.
These are essentially managed computers and virtual machines. Compute targets are connected to the workspace through linked services.
Why do we need computing resources?
For any machine learning modeling, we need a computation resource that will train our model.
- Compute instance: a type of virtual machine/server used for cloud computation. It is not just a plain machine; it is connected to the workspace and comes with Python, R, Docker, and the Azure ML SDK configured. The default storage account created with the workspace is attached to the instance, which means we can access all notebooks and other stored data from it. It is mostly used during development for training, testing, and inferencing, where inferencing means creating endpoints for web services.
- Compute clusters: also a managed resource, consisting of a group of virtual machines. We can use clusters from all three authoring options, i.e., a compute instance (notebooks), the designer, or AutoML, for training and for limited deployment.
- Compute targets: these consist of remote/attached compute and inference clusters.
- Remote compute: primarily aimed at training and testing before deployment. We can use local machines, compute instances, or virtual machines as a compute target, and we can run batch inferencing on compute clusters.
- Inference clusters: used for real-time predictions in production with our model; they are typically backed by Azure Kubernetes Service (AKS).
Creating the compute clusters
1. Go to the Compute tab in ML studio and find the Compute clusters option, as shown below:
2. Now, choose a low-cost compute size, as we are just practicing.
3. Give the cluster a name and click on the Create button.
4. Also, create a compute instance, as shown below, with the same information as above. An equivalent SDK sketch for the cluster follows this list.
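The same cluster can be provisioned from code. A minimal sketch with SDK v1; the VM size and node counts here are illustrative, so pick a low-cost size if you are only practicing.

```python
from azureml.core import Workspace
from azureml.core.compute import ComputeTarget, AmlCompute

ws = Workspace.from_config()

# Provision a small, autoscaling compute cluster (it scales to zero when idle).
cluster_config = AmlCompute.provisioning_configuration(
    vm_size="STANDARD_DS3_V2",  # low-cost, general-purpose size
    min_nodes=0,
    max_nodes=2,
)
cluster = ComputeTarget.create(ws, "cpu-cluster", cluster_config)
cluster.wait_for_completion(show_output=True)
```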
What is a pipeline?
It is simply a series of steps, i.e., a workflow from data processing to deployment.
We may need compute resources during the cleaning and training steps.
Creating a new training pipeline using the designer from the ML studio dashboard
1. Click on the Start now button in the Designer option, as shown below:
2. Now click on the plus button to create the pipeline.
3. The pipeline interface will open after clicking on the plus button.
4. In the Data option, we have our dataset, as shown in the image below:
5. Just drag and drop the dataset onto the canvas on the right side.
6. Now, look for the select columns component in the Component option and drop it onto the canvas. Connect the output of the dataset to the input of the select columns component.
7. Now, double-click on the select columns component and then click on Edit columns.
8. Now add the clean missing data component and choose the columns to clean via the Edit column button.
9. Now add the split data component, set the fraction to 0.7, and select a stratified split on the "loan_status" column through Edit column.
10. Now, we are ready to train our two-class logistic regression model.
11. After placing all the components on the canvas, our pipeline is complete. Now go to Settings to choose the compute cluster.
12. Suppose our training step is heavy and we need a different compute cluster. We can then select the Train Model component and choose another compute in Run settings, as shown below:
13. Our pipeline is complete; we can click the Submit button to run it.
Our pipeline run completes successfully.
14. After the run completes, right-click on the Evaluate Model component to see the ROC curve.
The evaluation also includes a confusion matrix.
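For readers who prefer code, the designer modules above correspond to a few familiar steps: select columns, clean missing values, a 70/30 stratified split on the loan status column, a two-class logistic regression, and an evaluation with a ROC curve and a confusion matrix. Below is a rough scikit-learn sketch of that same flow, intended only to show what each designer box is doing; the file name and column names are assumptions based on this example.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Load the data and drop the columns deselected in the Schema step (names assumed).
df = pd.read_csv("loan.csv").drop(columns=["Loan_ID", "Gender"])

# Binary-encode the target; which class counts as positive is arbitrary for this sketch.
y = pd.factorize(df.pop("Loan_Status"))[0]
X = df

# 70/30 stratified split, mirroring the split module in the designer.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, stratify=y, random_state=42
)

# Clean missing values and encode categoricals, then fit a two-class logistic regression.
categorical = X.select_dtypes(include="object").columns
numeric = X.select_dtypes(exclude="object").columns
preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), numeric),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical),
])
model = Pipeline([("prep", preprocess), ("clf", LogisticRegression(max_iter=1000))])
model.fit(X_train, y_train)

# Evaluate: ROC AUC and a confusion matrix, like the evaluate module.
probs = model.predict_proba(X_test)[:, 1]
preds = model.predict(X_test)
print("ROC AUC:", roc_auc_score(y_test, probs))
print(confusion_matrix(y_test, preds))
```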
Creating an inference pipeline and deploying it as a web service
1. After the pipeline completes, the Create inference pipeline option becomes available, as shown below:
2. When we choose real-time inference, Azure makes some changes to the pipeline, as shown below:
3. We make some changes to the pipeline ourselves, as shown below:
4. Now click the Submit button to run the real-time inference pipeline. After it completes, we can check the scores.
We get the scored labels and the probabilities as well.
5. But we need only the scored labels in the web service output, so we add a select columns component to the canvas to limit the output.
6. To make predictions, we need to deploy the model to the cloud. If the Deploy button is not shown, refresh the page after the submitted run completes.
But before deployment, we need an Azure Kubernetes Service cluster; for that, we make a new inference cluster from the ML studio.
Choose the virtual machine configuration.
Write the name of the cluster for the endpoint.
Now click on the Create button to make the cluster; after a few minutes, it completes successfully.
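Creating the inference cluster can also be scripted. A minimal sketch with SDK v1; the cluster name and node size are placeholders, and three DS3_v2 nodes are used simply to meet the minimum core count AKS expects for production endpoints.

```python
from azureml.core import Workspace
from azureml.core.compute import ComputeTarget, AksCompute

ws = Workspace.from_config()

# Provision a small AKS cluster to host real-time endpoints.
aks_config = AksCompute.provisioning_configuration(
    vm_size="Standard_DS3_v2",  # the virtual machine configuration chosen above
    agent_count=3,              # 3 x 4 cores = 12 vCPUs, the usual minimum for production
)
aks_target = ComputeTarget.create(ws, "loan-aks-cluster", aks_config)
aks_target.wait_for_completion(show_output=True)
```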
After creating the inference cluster, it's time to deploy the model and create the endpoint.
Give the endpoint a name, select the newly created inference cluster, and click on the Deploy button, as shown below:
After a few minutes, the deployment completes, and in the Endpoints section we will see the endpoint in a healthy state.
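The studio handles this deployment for the designer pipeline, but for completeness, this is roughly how a registered model would be deployed to the same AKS cluster with SDK v1. The model name, the entry script score.py, and the environment name are hypothetical placeholders.

```python
from azureml.core import Environment, Workspace
from azureml.core.compute import AksCompute
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import AksWebservice

ws = Workspace.from_config()

# A registered model, a scoring script, and an environment define the web service.
model = Model(ws, name="loan-model")  # hypothetical registered model
env = Environment.get(ws, name="<curated-or-custom-environment>")
inference_config = InferenceConfig(entry_script="score.py", environment=env)

aks_target = AksCompute(ws, "loan-aks-cluster")
deploy_config = AksWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)

service = Model.deploy(ws, "loan-endpoint", [model], inference_config,
                       deploy_config, deployment_target=aks_target)
service.wait_for_deployment(show_output=True)
print(service.scoring_uri)
```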
In the Test section, we can make predictions based on the input parameters.
In the Consume section, we can find the REST endpoint URL and authentication keys to request scores programmatically.
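A typical way to consume the endpoint is a plain HTTP POST using the URL and key from the Consume tab. The sketch below assumes the designer's usual WebServiceInput0 request envelope and a few loan-style input fields; the exact field names depend on your pipeline's schema.

```python
import json

import requests

scoring_uri = "<REST endpoint URL from the Consume tab>"
api_key = "<primary key from the Consume tab>"

# Designer real-time endpoints usually expect this envelope; adjust the fields to your schema.
payload = {
    "Inputs": {
        "WebServiceInput0": [
            {
                "Married": "Yes",
                "ApplicantIncome": 4500,
                "LoanAmount": 128,
                "Credit_History": 1,
            }
        ]
    },
    "GlobalParameters": {},
}

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}",
}

response = requests.post(scoring_uri, data=json.dumps(payload), headers=headers)
print(response.status_code)
print(response.json())  # scored labels (and probabilities, if exposed)
```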
Conclusion
This article walks through the machine learning pipeline workflow in the Azure cloud. The important steps are taking care of the connections and the services linked together.
I hope you like the article. Reach me on my LinkedIn and Twitter.