Name: Towards AI Legal Name: Towards AI, Inc. Description: Towards AI is the world's leading artificial intelligence (AI) and technology publication. Read by thought-leaders and decision-makers around the world. Phone Number: +1-650-246-9381 Email: [email protected]
228 Park Avenue South New York, NY 10003 United States
Website: Publisher: https://towardsai.net/#publisher Diversity Policy: https://towardsai.net/about Ethics Policy: https://towardsai.net/about Masthead: https://towardsai.net/about
Name: Towards AI Legal Name: Towards AI, Inc. Description: Towards AI is the world's leading artificial intelligence (AI) and technology publication. Founders: Roberto Iriondo, , Job Title: Co-founder and Advisor Works for: Towards AI, Inc. Follow Roberto: X, LinkedIn, GitHub, Google Scholar, Towards AI Profile, Medium, ML@CMU, FreeCodeCamp, Crunchbase, Bloomberg, Roberto Iriondo, Generative AI Lab, Generative AI Lab Denis Piffaretti, Job Title: Co-founder Works for: Towards AI, Inc. Louie Peters, Job Title: Co-founder Works for: Towards AI, Inc. Louis-François Bouchard, Job Title: Co-founder Works for: Towards AI, Inc. Cover:
Towards AI Cover
Logo:
Towards AI Logo
Areas Served: Worldwide Alternate Name: Towards AI, Inc. Alternate Name: Towards AI Co. Alternate Name: towards ai Alternate Name: towardsai Alternate Name: towards.ai Alternate Name: tai Alternate Name: toward ai Alternate Name: toward.ai Alternate Name: Towards AI, Inc. Alternate Name: towardsai.net Alternate Name: pub.towardsai.net
5 stars – based on 497 reviews

Frequently Used, Contextual References

TODO: Remember to copy unique IDs whenever it needs used. i.e., URL: 304b2e42315e

Resources

Take our 85+ lesson From Beginner to Advanced LLM Developer Certification: From choosing a project to deploying a working product this is the most comprehensive and practical LLM course out there!

Publication

Beginner Tips for Using Azure Machine Learning
Latest

Beginner Tips for Using Azure Machine Learning

Last Updated on April 2, 2022 by Editorial Team

Author(s): Andrew Blance

Originally published on Towards AI the World’s Leading AI and Technology News and Media Company. If you are building an AI-related product or service, we invite you to consider becoming an AI sponsor. At Towards AI, we help scale AI and technology startups. Let us help you unleash your technology to the masses.

Beginner Tips for Getting Started with Azure MachineΒ Learning

Getting ready for the DP-100 Azure Data Science Associate exam.

Image by Alan Warburton / Β© BBC / Better Images of AI / Nature / CC-BYΒ 4.0

The end-to-end pipeline for a data science model is diverse and winding. Between exploratory data analysis, model training, deployment, and managing those models, there are a lot of moving parts. Azure Machine Learning is Microsoft’s cloud service to help developers along this journey. It offers a wide set of tools to track your model’s development, version your data, securely deploy your model, andΒ more.

I think many good APIs and software share some things in common. One of these is that they are good at predicting your desires, or at least have an understanding of how the user will want to interact with the system. When I write code and think β€œoh boy, I wish there was a way to do this really easily”, and then stumble across a feature of the language that does that exact thing in one line, I feel like the writer of the library has done something really special. This happens a lot in Python I think, with Pandas or Numpy being designed in a way that seems to understand how I will want to interact with it (except for times and dates, which kinda suck in everything).

Currently, I am preparing for the DP-100, a Microsoft exam about using Azure ML to do data science. I’ve spent a lot of time learning about the ecosystem and getting to know how it all works. I find myself thinking quite a lot about how well designed a lot of it is. How a lot of the features make my life a lot easier, removing my need to write a lot of code, as they have implemented a smart function to do itΒ already.

I’ve not really written about data science before, but I thought I would try it here. There are already lots of great stuff out there about the DP-100, so instead, I thought I would try something slightly different. This is a list of things that compliment the DP-100 and go well with the syllabus, or in some cases, things I thought were pretty neat from it. It’s not full tutorials on how to use the features, but lil suggestions of things to look at.Β Enjoy!

Visual StudioΒ Code

Ok, so this is sometimes mentioned in the syllabus. However, I’ll mention it again since it’s integrated into it soΒ well.

The Extension download page for Azure Machine Learning within VIM
By installing the Azure Machine Learning extension for VSCode, you are able to access most of its features! Isn’t it nice when all the Microsoft products play well with each other? I’d like to see VIM do this. Source: Image by theΒ author.

Within Visual Studio Code, the Azure Machine Learning plugin allows you to have access to your Workspaces, Datasets and Computes, etc. Basically, it allows you to use VSCode as your IDE while retaining the functionality of the Azure ML. Once you connect to a running compute you are able to access the files stored on it, and run your code the way you would on your local machine. On the Azure ML browser, you are a little limited to using notebooks, whereas here you can write scripts as youΒ please!

Standardisation

From my experience learning programming, I think there may be a distinction between β€œhard” and β€œsoft” programming skills. I’m gonna call the β€œhard” skills the pure coding: the language itself. The β€œsoft” stuff is literally everything else around it. I’m not even sure that this is a good distinction to make, in fact, splitting these in two probably results in worse code. However, I mention it as I think sometimes when you learn to code you are subtly trained to make the distinction. In my experience, coding courses and textbooks focus almost solely on the β€œhard” skills, and leave the β€œsoft” stuff as an exercise to theΒ reader.

I’m not much of a programmer, I still have a huge lot of room for improvement. I think much of where I have got better has come from embracing the softer side and a thoughtful and informed rejection of the β€œhard”. A lot of my problems were typical onesβ€Šβ€”β€Šβ€œoops I wrote this code 3 months ago and forgot what it does”, β€œoops I wish I could go back to an earlier version of the code” or β€œoops I’ve been given someone else's code and have no idea how to use it”. I feel a lot of these are solved not with being able to write speedier functions, but with DevOps and standards.

Anyway, that is a big introduction to simply say: try standardising stuff. Microsoft has recommended naming conventions for Azure recourses, and there are templates out there for laying out your coding projects (I’ve played around with this, badly, on Github). By standardising things, it helps new people come into a project, helps you when you go into a project you haven't been on, and you help yourself when you return to code you haven't seen in ages. Things will always be named in a consistent manner, and projects will be laid out in familiarΒ ways.

For example, you could have a recourse group for each every project, namedΒ like:

rg-example-dev-001

This tells you a lot of information already: you know its a recourse group (rg), you have an idea of its purpose (it's for a project called example), and you know it's for the dev build (rather than uat or prod). Now, inside here you could make an Azure Machine Learning Workspace, and callΒ it:

mlw-example-dev-001

This style can be followed for everything else, and should hopefully mean everything is all neat andΒ tidy.

CI/CD

Using some kind of version control is absolutely vital. This is when code is sent to a β€œrepo” for safekeeping. Being able to track changes and revisions to your code is an absolute lifesaver. It’s something that is left out of a lot of coding tutorials and textbooks I think, and I’ll admit I did spend way too long never using Git at all (just a monumentally terrible mistake).

Continuous integration then is when code is automatically checked whenever it is sent into a repo. This might involve automatically running all the tests you’ve written, checking if the code can build, or running aΒ linter.

At work, we use Azure DevOps, which has lots of fun tracking things for project management. I use Github for all my personal stuff, and it has a wonderful feature where you can launch notebooks in-browser by hitting theΒ . key on your keyboard. It's amazing. Both have robust CI/CD offerings: Github has Github Actions and Azure DevOps has Pipelines, but both work similarly.

In Github, you can create a fileΒ .github/workflows/main.ymlwhich looksΒ like:

name: Linting and Testing
on: [push]
jobs:
build:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: [3.7, 3.8, 3.9]
steps:
- uses: actions/checkout@v2
- name: Set up Python $
uses: actions/setup-python@v2
with:
python-version: $
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -r requirements.txt
- name: Test with pytest
run: |
pytest --ignore=docs

This code, every time you push your code will create a Python instance (either 3.7, 3.8 or 3.9) with packages based on the contents of requirements.txtΒ , then run pytestΒ . Now, when you check the project’s Github page you can take the results of the CI pipeline into account before you accept a pull request. Azure Pipelines use a very similar syntax. It's a great way of making sure code passes certain tests before accepting it.

ps, Microsoft, please release theΒ . thing for AzureΒ DevOps!

Sharing Environments

For each project, I create a Python environment for it. This is driven by a requirements.txt file. This file, as hinted at above, is also used to create the Python environment that is used for the CI/CD pipeline.

When you submit jobs in Azure Machine Learning using the run methods, you need to specify the environment to use. I (wrongly) imagined at the time I would have to create a whole new environment. I thought I would have to do this by looping through my requirements file, passing the contents of itΒ .add_pip_package() one at a time. Eventually, that would create the same environment as everywhere else. However, it's much easier thanΒ that.

Firstly, you can useΒ .from_pip_requirements() and pass the whole requirements file into it in one go. Or, if you’ve already created a conda environment you can just specify that in.from_existing_conda_environment()Β .

Then, if you register this environment, you can see it in Azure ML’s β€œEnvironment” tab! Now, you should have consistent environments across all parts of yourΒ project.

Setting aΒ Budget

I am sorry to break this to you: one day you will leave a VM or compute or something running when you thought you turned it off. This might be for an hour, or for weeks, but it’s gonna happen. I try not to think about how much I’ve accidentally spent, it’s notΒ good….

In your recourse group or subscription, you can set a budget, which can help you stop this problem before it’s a problem. You can put in the amount you think you should be spending, and set points at which you want to be warned if you are starting to approachΒ it.

The page showing how much I’ve spent this month
How much I’ve spent so far this month, and how much I am predicted to. Yeah, looks like I'm gonna spend just a teeeeeensy bit more than I wanted. Source: Image by theΒ author.

Final thoughts

So that are a few things that I’ve found useful when using Azure Machine Learning. Each point could really be its own articleβ€Šβ€”β€ŠI’ve done all the points a disservice, really! There are also so many other things I’ve found that I think are neat (the β€œModel” tab in Azure ML, Labeller, and Synapse) that I would also love to talk about. If people are interested, I might come back and write some more about everything!

However, these 5 things: VSCode, standardization, CI/CD, environment management, and Budgets are all good tools to build upon some of the DP-100 content! I might return to these things later for more in-depth exploration, but hopefully, you enjoyed what wasΒ here!

Andrew is a data scientist at Waterstons, an IT consultancy. He hosts a silly podcast called Brains on the Outside and is in a genuinely terrible band called Dioramarama. Regardless, Durham University deemed him sensible enough to make him a Doctor of particle physics (and kinda machine learning and quantum computing). Once, in a moment of unthinkable insanity, he decided to learn the 6502 assembly programming language. He can run 5km pretty fast (22:05) and thinks modern Star Trek isn’t as bad as people make it out toΒ be.


Beginner Tips for Using Azure Machine Learning was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

Join thousands of data leaders on the AI newsletter. It’s free, we don’t spam, and we never share your email address. Keep up to date with the latest work in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming aΒ sponsor.

Published via Towards AI

Feedback ↓