4 Types of Machine Learning Interview Questions for Data Scientists and Machine Learning Engineers

Author(s): Emma Ding, Ziheng Lin

Lessons learned from interviewing with FAANG: the most efficient ways to crack machine learning problems

Photo by Hitesh Choudhary on Unsplash

The internet is flooded with top 10, top 20, and even top 200 machine learning interview questions covering a multitude of concepts, from bias vs. variance to deep neural networks. While those concepts are important to master in order to ace machine learning interviews, you may still feel underprepared and be caught off guard if those questions are all you have prepared for. The truth is that machine learning interviews are more comprehensive than just a Q&A of basic machine learning concepts. Machine learning interviews evaluate a candidate’s capacity to work with a team to solve complex real-world problems using machine learning methodologies.

How Is This Article Different?

When you google “machine learning interview”, it’s hard to find articles that give you a full picture of what questions to expect in machine learning interviews. In this article, we will provide you with a comprehensive summary of the 4 types of machine learning questions you will encounter in interviews. We summarize these 4 types using our own experience interviewing with both small startups and several top-tier companies including Google, Facebook, LinkedIn, Airbnb, Twitter, Lyft, etc. Besides our own experience, we also gathered knowledge from the data scientists and machine learning engineers who have interviewed hundreds of candidates at those companies.

The 4 types of machine learning questions in this article cover almost all situations, regardless of whether you are interviewing for a Data Scientist (algorithm-driven) position or a Machine Learning Engineer position, in a small company or at FAANG. We provide some common examples with a similar level of difficulty to the actual interview questions, which we are unable to disclose. To help you prepare and avoid pitfalls, we will also provide tips on the best way to answer as well as the most efficient ways to prepare.

Please note that mastering these 4 types of interview questions may not be enough because generic coding questions (algorithms and data structures) and system design questions (designing a non-machine learning system) also appear in interviews. These aspects are not covered in this particular post.

Here are the 4 types of questions:

  1. Machine Learning Basics
  2. Machine Learning Coding
  3. Applied Machine Learning Questions
  4. Project-Based Machine Learning Questions

The first 3 types are technically driven, and the last type tests both hard and soft skills by involving discussions of business impact, leadership skills, etc.

Table of Contents

1. Machine Learning Basics

2. Machine Learning Coding

3. Applied Machine Learning Questions

4. Project-Based Machine Learning Questions

Machine Learning Basics

Machine learning basics are commonly asked in both technical phone screens and onsite interviews to get a quick assessment of a candidate’s basic machine learning knowledge.

These machine learning conceptual questions can cover any step in developing machine learning models such as processing data, choosing models, handling details of training models, and evaluation.

During interviews, these types of questions usually do not take the entire 45 min or 1 hour. You can expect these questions to be asked either at the beginning or the end of an interview round along with other types of machine learning questions or generic coding questions.

How to Answer Machine Learning Basics Questions

The key to answering this kind of question is to be concise and organized. Here is our suggested answer outline.

  1. Give a concise definition in 2 to 3 sentences.
  2. Give one or two examples to convince the interviewer that you have both the theoretical knowledge and experience.
  3. If necessary, provide some common solutions to the problem.

Here is an example Q&A:

Q: “What is overfitting and how do you deal with overfitting?”

A: (Straight-to-the-point definition) “Overfitting happens when the learning capacity of a model is too high or the dataset is too small. The model ends up fitting the noise rather than the useful information in the data, so it performs badly on unseen data.”

A: (Give an example) “For example, we can encounter an overfitting problem when we have a regression model and the number of data points is less than the number of features.”

A: (Solution) “There are a few approaches to deal with overfitting. One way is to use regularization to shrink the learned parameters. L2 regularization keeps the parameter values from becoming too extreme, while L1 regularization can help remove unimportant features. Another way is to use a simpler model to fit the data. Also, we can increase the amount of training data.”
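
To make the regularization part of this answer concrete, here is a minimal sketch for a linear model with a single feature and no intercept. The function names, the objective scaling, and the toy data are illustrative assumptions rather than anything prescribed by the answer above. In this one-feature case, the L2 (ridge) weight has a closed form that shrinks toward zero, while the L1 (lasso) weight is soft-thresholded and can become exactly zero.

```python
def _sums(xs, ys):
    """Return (sum of x*y, sum of x*x) for the toy one-feature model."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy, sxx


def ols_weight(xs, ys):
    """Unregularized weight: minimizes 0.5 * sum((y - w*x)^2)."""
    sxy, sxx = _sums(xs, ys)
    return sxy / sxx


def ridge_weight(xs, ys, lam):
    """L2 penalty: minimizes 0.5 * sum((y - w*x)^2) + 0.5 * lam * w^2."""
    sxy, sxx = _sums(xs, ys)
    return sxy / (sxx + lam)  # a larger lam shrinks the weight toward zero


def lasso_weight(xs, ys, lam):
    """L1 penalty: minimizes 0.5 * sum((y - w*x)^2) + lam * |w|."""
    sxy, sxx = _sums(xs, ys)
    # Soft-thresholding: a weakly correlated feature is dropped entirely.
    if sxy > lam:
        return (sxy - lam) / sxx
    if sxy < -lam:
        return (sxy + lam) / sxx
    return 0.0


# Hypothetical toy data in which the feature is only weakly related to y.
xs = [1.0, 2.0, 3.0]
ys = [0.2, 0.1, 0.3]
print(ols_weight(xs, ys), ridge_weight(xs, ys, 14.0), lasso_weight(xs, ys, 2.0))
```

With more than one feature the lasso no longer has this simple closed form, so treat the sketch as an intuition aid only.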

How to Prepare for Machine Learning Basics Questions

There are three main steps to preparing to answer machine learning basic questions: brushing up on your basics, collecting questions, and organizing those questions.

Brush Up On The Basics

The best way to learn is through watching lectures, reading books, and, most importantly, thinking and summarizing by yourself. You know you have truly mastered the concepts when you feel comfortable explaining them to a non-technical person. Below are some of the best resources for learning and reviewing machine learning basics.

  1. Andrew Ng’s machine learning course is the best at covering the fundamentals clearly. It’s worth watching even for experienced professionals.
  2. If you are a book person, the classic Pattern Recognition and Machine Learning by Bishop is one of the best, and it also covers the fundamentals of statistics.
  3. For deep neural networks, the best courses are the Stanford University CS231n course offered by Andrej Karpathy and Neural Networks for Machine Learning offered by Geoffrey Hinton.

Collect Questions

Apart from googling “machine learning interview questions”, there are a couple of places to find interview questions:

Organize Questions

After getting a list of questions, the next step is to organize them. When preparing for dozens of interviews, we discovered that organizing questions by machine learning workflow helps you discover the common problems in each step. This makes it easier for you to connect questions and give more comprehensive answers during the interview. Below are some of the most commonly asked questions organized in this manner.

Data processing

  • How to deal with outliers?
  • How to deal with missing values?
  • How to deal with an imbalanced dataset?

Feature engineering

  • How to reduce the data dimensions?
  • How to engineer new features?

Models

  • Briefly describe Random Forest, SVM, and neural networks.
  • What are the pros and cons of linear regression vs. tree-based models?
  • What are the assumptions of linear regression?

Modeling details

  • What is overfitting and how do you deal with it?
  • When will you use L1 regularization compared to L2 regularization?
  • What are hyperparameters and how do you tune model hyperparameters?

Model Evaluation

  • List out 3 evaluation metrics for classification and regression.
  • What are precision and recall?
  • What is the difference between the ROC curve and the precision-recall curve?
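
For the precision and recall question, it can help to have the definitions at your fingertips in code. The sketch below is a minimal illustration for binary labels; the function name, the `positive` parameter, and the convention of returning 0.0 when a denominator is zero are our own assumptions.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Compute precision and recall for one positive class.

    precision = TP / (TP + FP): of everything predicted positive, how much was right.
    recall    = TP / (TP + FN): of everything actually positive, how much was found.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Assumed convention: report 0.0 when a metric is undefined.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels: 4 actual positives, 3 predicted positives, 2 of them correct.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]
precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

In this toy example there are 2 true positives, 1 false positive, and 2 false negatives, so precision is 2/3 and recall is 1/2.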

Machine Learning Coding

Photo by ThisisEngineering RAEng on Unsplash

The second type of question is the machine learning coding question. Typically, these questions ask you to implement a machine learning algorithm from scratch in any language you prefer. These questions are often asked during onsite interviews to evaluate not only your familiarity with algorithms but also your ability to code up a bug-free implementation in a short amount of time. Just like any other coding interview, you will write the implementation either on a whiteboard in a face-to-face interview or in a text editor in a virtual interview.

This may seem a little daunting because there are so many machine learning algorithms and each has a unique implementation. However, you do not need to worry! There are only a limited number of algorithms that appear in interviews. Some algorithms are too complicated for a 1-hour interview and thus usually not seen.

As the great blog post The Ultimate Guide to Acing Coding Interviews for Data Scientists points out, the most commonly asked algorithms are:

Supervised Learning:

  • Linear regression
  • Logistic Regression
  • K-nearest Neighbors
  • Decision Tree

Unsupervised Learning:

  • K-means Clustering

How to Answer Machine Learning Coding Questions

Answering machine learning coding questions is similar to generic coding questions. We recommend following a few steps.

  1. Briefly explain how the algorithm works to the interviewer.
  2. When implementing your solution, move from the main function to the helper functions. The main function handles the input data and returns the results. The helper functions should handle small tasks such as initializing parameters or computing gradients.
  3. Explain your code step by step to the interviewer. It’s your choice either to explain while writing code or to finish most of the coding before summarizing your solution.
  4. The most important thing is to keep your implementation bug free and readable.
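
The main-function-then-helpers structure described in the steps above can be sketched as follows, using linear regression trained with batch gradient descent on mean squared error. This is a minimal illustration rather than a reference solution: the function names, the learning rate, and the iteration count are assumptions chosen for a small toy dataset.

```python
def initialize_parameters(n_features):
    """Helper: start with all weights and the bias at zero."""
    return [0.0] * n_features, 0.0


def predict_one(weights, bias, row):
    """Helper: linear prediction for a single example."""
    return sum(w * x for w, x in zip(weights, row)) + bias


def compute_gradients(weights, bias, X, y):
    """Helper: gradients of the mean squared error w.r.t. weights and bias."""
    n = len(X)
    grad_w = [0.0] * len(weights)
    grad_b = 0.0
    for row, target in zip(X, y):
        error = predict_one(weights, bias, row) - target
        for j, x in enumerate(row):
            grad_w[j] += 2.0 * error * x / n
        grad_b += 2.0 * error / n
    return grad_w, grad_b


def fit_linear_regression(X, y, learning_rate=0.05, n_iterations=2000):
    """Main function: takes the data, runs gradient descent, returns parameters."""
    weights, bias = initialize_parameters(len(X[0]))
    for _ in range(n_iterations):
        grad_w, grad_b = compute_gradients(weights, bias, X, y)
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias


# Toy data generated from y = 2x + 1, so the fit should recover those values.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [1.0, 3.0, 5.0, 7.0]
weights, bias = fit_linear_regression(X, y)
print(weights, bias)
```

Writing the main function first lets you show the interviewer the overall flow early and fill in the helpers as time allows.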

How to Prepare for Machine Learning Coding Questions

Although the list contains only 5 algorithms, memorizing the code line by line is rather unrealistic (on top of everything else you need to study). Instead, focus on understanding and internalizing the algorithms. Then, you will feel much more confident and comfortable with the implementation. Here is how to study and practice by yourself.

Familiarize Yourself with the Algorithms

Before implementation, it’s essential to understand the algorithm steps clearly. Again, we recommend Andrew Ng’s machine learning class for reviewing the algorithms.

Practice

Writing code in Python in a Jupyter notebook is highly recommended for debugging and testing purposes.

  1. When implementing the first time, you can write everything as one function without worrying about the best coding practice.
  2. Focus on having a working solution without using any third-party libraries such as NumPy, SciPy, and scikit-learn.
  3. Then, work on breaking your code down into functions based on the algorithm steps.
  4. Ask yourself what the space and time complexity of your implementation is in big O notation. This is very important because questions on complexity are often asked as follow-up questions in interviews.
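
As one example of what such a practice session might produce, here is a minimal sketch of a k-nearest neighbors classifier that uses only the Python standard library, with its complexity noted in the comments. The function names, the choice of squared Euclidean distance, and the toy data are our own assumptions.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance; the ranking matches the true distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbors.

    With n training points of dimension d:
      time:  O(n * d) for the distances plus O(n log n) for the sort
      space: O(n) for the list of (distance, label) pairs
    """
    scored = [(squared_distance(row, query), label)
              for row, label in zip(train_X, train_y)]
    scored.sort(key=lambda pair: pair[0])  # sort by distance only
    nearest_labels = [label for _, label in scored[:k]]
    # On a tied vote, Counter returns the label that was encountered first.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters labeled "a" and "b".
train_X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train_X, train_y, [0.5, 0.5]))
print(knn_predict(train_X, train_y, [5.5, 5.5]))
```

A natural follow-up is how to avoid the full sort; keeping only the k smallest distances with a heap reduces the selection step to O(n log k).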

Applied Machine Learning Questions

Photo by ThisisEngineering RAEng on Unsplash

The third type of question is the applied machine learning question, which is the most difficult and, at the same time, the most heavily weighted. Typically, the interviewer gives you an open-ended problem and asks you to come up with an applicable machine learning solution within 30 to 40 minutes. To assess your proficiency and level of experience, the interviewer will continually question your decisions, such as the choice of models, and dig into the details, such as handling data issues and running experiments. Here are some example questions:

Generic Problems

  • How to design a text classification model?
  • How to design an image classification model?
  • How to detect spam emails?
  • How to detect spam accounts?

Domain-Specific Problems

  • How to design a recommendation system?
  • How to design an estimated time of arrival (ETA) model?
  • How to design a query and ranking system?

Depending on your level of experience, your interview questions will differ. Candidates with little or no industry experience will likely get generic problems. Experienced candidates may face more domain-specific problems.

How to Answer Applied Machine Learning Questions

To get started, you first need to clarify what goal needs to be achieved, available data, and constraints. After clarification, you can walk through the overall ideas and discuss them with the interviewer. To keep you and the interviewer on the same page, it is helpful to follow a structure like the following:

Data

  • Clean the data and deal with outliers

Feature Engineering

  • Brainstorm the features needed for the task
  • Engineer new features if necessary

Model Selection and Engineering

  • Select 1 to 2 models that are suitable for the problem
  • Discuss the pros and cons of the models

Training, Model Tuning, and Evaluation

  • Develop metrics for evaluation
  • Design training, validation, and evaluation strategies
  • Discuss methodologies that improve the performance
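
When you reach the training, validation, and evaluation step, it helps to be able to state precisely how the data would be split. Below is a minimal sketch of a random three-way split using only the standard library. The function name, the default fractions, and the fixed seed are illustrative assumptions, and a random split is itself an assumption: for time-dependent problems such as ETA prediction, a chronological split is usually more appropriate.

```python
import random


def train_val_test_split(rows, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle the rows once and cut them into train, validation, and test sets.

    train:      used to fit the model
    validation: used to tune hyperparameters and compare models
    test:       held out until the end for a final, unbiased evaluation
    """
    shuffled = list(rows)                  # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# Hypothetical dataset of 10 example ids.
train, val, test = train_val_test_split(list(range(10)))
print(len(train), len(val), len(test))
```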

Because of the open-ended nature of these questions, the interview depends on your solutions and the follow-up questions asked by the interviewer. Sometimes, you may feel overwhelmed after being asked a couple of follow-up questions. Make sure you come back to the structure above and complete your design. This shows that you are able to lead the discussion.

How to Prepare for Applied Machine Learning Questions

When preparing for applied machine learning questions, you will need to prepare differently for generic versus domain-specific questions.

Generic Problems

Kaggle is an excellent resource. There are lots of well-defined machine learning problems and in-depth solutions posted in the community.

Try to work on a project by yourself, then compare your solution to others to find areas for improvement. When comparing, take a close look at the Exploratory Data Analysis (EDA), data processing, feature selection, and model selection. Pay attention to the documented explanations of the reasons behind these decisions. After training yourself on a few projects, you should develop a good sense of how to solve this type of problem.

Domain-Specific Problems

This kind of problem requires real work experience to be able to provide solid answers. However, if you don’t have first-hand experience, you can still ace the interview through preparation. The best (fastest and most efficient) way to prepare is to read research papers. Reading papers may seem like a lot of work, but it’s the best method to gain detailed insights. When reading papers, focus on the data format, feature engineering, model architectures, and results/findings since these are often the focus of the interview. Sometimes you can find recorded conference talks by the authors, which will help speed up the reading.

How do you find papers to read? It’s pretty simple. Search for keywords on Google Scholar to find relevant papers, then choose the top three with the most citations. The methodologies in those papers are widely adopted in the industry. Therefore, they could be relevant to what the interviewer wants. Below are some resources related to designing a recommendation system. You can find similar papers for the other domains for which you are interviewing.

Traditional matrix factorization solution:

Deep learning approaches:

Project-Based Machine Learning Questions

Photo by Van Tay Media on Unsplash

As with applied machine learning questions, the purpose of project-based questions is to assess the level of expertise of a candidate. However, the difference is that this type of question can be either technically or non-technically oriented depending on who you are interviewing with, i.e., an individual contributor or a manager.

During the 45-minute to 1-hour interview, the interviewer may start by having you introduce a machine learning project that you have worked on or ask about a project listed on your resume. At the beginning, the interviewer will have you describe the context of the project. Then, depending on the interviewer, the conversation will move toward technical details, business impact, or leadership. Those questions can include:

  • What is the size of the data? How did you select features?
  • Why did you choose this model? Have you tried different models?
  • How did you evaluate the model performance (online and offline)?
  • What is the impact on the product or the service?
  • Did you work with other teams? Did you lead any of the process?

How to Answer Project-Based Machine Learning Questions

The key to this kind of question is to ALWAYS interact with the interviewer! Present your project in a conversational way and not as a report. We recommend using the following steps to describe your project.

  1. Summarize your project in 1 to 2 sentences (the goal of the project, what role you played, with which company), followed by the IMPACT (improved model performance, increased revenue, etc.). It’s better to quantify the impact with numbers than to use subjective words.
  2. Highlight 2 to 3 challenges of the project such as the size of the data, the quality of the data, model training, and deployment.
  3. Share one interesting finding with the interviewer.
  4. If the interviewer is more interested in your leadership and influence, you can also talk about 1 to 2 non-technical contributions you have made, such as bringing ideas, initiating meetings, and collaborating with other individuals on the team.

To engage the interviewer, once you finish talking about each part, confirm with the interviewer which direction he/she wants you to take. Should you provide more context or move to the next point?

How to Prepare for Project-Based Machine Learning Questions

There are three steps you can take to prepare for these types of questions: summarize your projects, think through technical details, and practice out loud.

Summarize Your Project:

The most important thing is to summarize the overall goal and impact of the project. Try to summarize them in concise and simple words so that the interviewer can understand the context easily. For describing the project and your contribution, you can leave out most of the details about groundwork and focus instead on what challenges you faced and what quantitative results you achieved. Below are some questions to get you started.

  • What is the business impact (e.g., accuracy, revenue) of the project?
  • How did other people or teams benefit from the project?
  • Can the model be expanded to solve other business problems?

Think Through Technical Details:

Typically, you could use the aforementioned steps to answer the questions without any need to give too many details to the interviewer. However, if the interviewer is an individual contributor, he/she may be more interested in the technical details. In this case, it would be necessary to understand the theory and implementation of the models of the project and make sure you have clear answers to questions like the following.

Data Processing:

  • How many features did you use?
  • How did you select features?
  • Did you engineer new features? How?

Models:

  • What are some alternative models that you experimented with?
  • How do the performances differ from each other?
  • Have you tried a simpler model (e.g., linear regression)? Why is it necessary to use a more complicated model?

Modeling details:

  • What hyperparameters did you tune?
  • How did you tune the hyperparameters?

Model Evaluation:

  • What offline and online evaluation metrics did you use?

Practice Out Loud

The best way to make sure you are describing your project in an engaging way is practice. Practice presenting projects to others to ensure both grasp of the material and ease of communication.

Bios: Emma Ding is a software engineer at Airbnb who started as a data scientist. Previously, she was the first data scientist at Elementum, a SaaS company that provides a supply chain automation platform to centralize information and communication to ensure products are available at the right time, place, quantity, and cost. She got a master’s degree in Transportation Engineering from the University of California, Berkeley and a master’s degree in Computer Science from the University of Illinois Urbana-Champaign.

Ziheng Lin is a machine learning engineer at Google. Prior to Google, he was an ML research engineer at Rakuten focusing on applying ML models to improve predictions for delivery and pick up services. He got his Ph.D. in Transportation Engineering from the University of California, Berkeley.


4 Types of Machine Learning Interview Questions for Data Scientists and Machine Learning Engineers was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

Published via Towards AI
