How to Become a Data Scientist? Step-by-step Path

Last Updated on July 20, 2023 by Editorial Team

Author(s): Ali Ghandi

Originally published on Towards AI.

Data Science, Opinion

Becoming a data scientist is a relatively new career path that merges statistics, business logic, and programming. A data scientist, unlike someone who is only a machine learning engineer, needs a comprehensive understanding of linear algebra, statistics, machine learning, and deep learning algorithms. I want to suggest a path you can follow in 3 months to prepare for a data scientist interview. It starts with simple steps and ends with some of the most crucial topics in the field. This article was written in 2020.

Step 1: First, make sure you really want to work in this field. It's fun to code an ML algorithm using Python's popular libraries, but data science is not just that. If you want to become a senior data scientist, you need to know the theory of many algorithms in detail and be able to work through their math. So, as a first step and before any coding, you can start with Andrew Ng's machine learning videos on YouTube. Find it here. If you finish that course and still like the field, go to step 2.

Step 2: At this point you have basic knowledge of ML, and you can start coding. There are many resources for practicing the coding part. It does not matter much which course (video or text) you take, because they all use the same libraries. For every coding skill you learn, practice it in Kaggle competitions, and make sure you spend enough time on each competition. The harder you try here, the more experience you will bring to the next steps.
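As a rough illustration of that practice loop, here is a minimal sketch of the workflow most tabular Kaggle problems share: hold out a validation split, score a trivial baseline, then check that a real model beats it. The synthetic data from scikit-learn's `make_classification` stands in for a competition dataset, and the choice of logistic regression is only an assumption for illustration.

```python
# Minimal Kaggle-style practice loop (a sketch, not a competition solution).
# Assumption: synthetic data replaces a real competition dataset.
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def baseline_vs_model(random_state: int = 0) -> tuple[float, float]:
    """Return (baseline accuracy, model accuracy) on a held-out validation split."""
    X, y = make_classification(
        n_samples=1000, n_features=10, n_informative=5, random_state=random_state
    )
    # Hold out 20% of the rows as a local stand-in for the leaderboard.
    X_train, X_valid, y_train, y_valid = train_test_split(
        X, y, test_size=0.2, random_state=random_state, stratify=y
    )
    # Trivial baseline: always predict the most frequent class.
    baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return (
        accuracy_score(y_valid, baseline.predict(X_valid)),
        accuracy_score(y_valid, model.predict(X_valid)),
    )


if __name__ == "__main__":
    base_acc, model_acc = baseline_vs_model()
    print(f"baseline: {base_acc:.3f}  logistic regression: {model_acc:.3f}")
```

If a model cannot beat the "most frequent class" baseline on your validation split, that is a sign to revisit the features or the data cleaning before tuning anything.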

You should also practice data cleaning and data visualization in these competitions; as a data scientist, you will spend about 50% of your time on these two tasks alone. Make sure you master Seaborn, Matplotlib, Pandas, and NumPy, in addition to popular ML libraries like Scikit-learn. (These were the popular choices in 2020, and the trend may change in the future.)
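A minimal pandas sketch of typical cleaning steps follows. The `age` and `city` columns and their values are made up, and `clean` is a hypothetical helper; the order used here (normalize text, drop duplicates, then impute) is one reasonable choice, not a fixed recipe.

```python
# Data-cleaning sketch with pandas and NumPy.
# Assumption: the columns and values are invented for illustration.
import numpy as np
import pandas as pd


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Normalize text, drop duplicate rows, and fill missing values."""
    out = df.copy()
    # Normalize a messy text column so " Paris" and "PARIS" compare equal.
    out["city"] = out["city"].str.strip().str.lower()
    # Drop exact duplicate rows (missing values in the same place count as equal).
    out = out.drop_duplicates().reset_index(drop=True)
    # Impute missing ages with the median, which is robust to outliers.
    out["age"] = out["age"].fillna(out["age"].median())
    # Keep rows with a missing city, but make the gap explicit.
    out["city"] = out["city"].fillna("unknown")
    return out


raw = pd.DataFrame(
    {
        "age": [25, np.nan, 40, 40, 31],
        "city": [" Paris", "berlin ", None, None, "PARIS"],
    }
)
cleaned = clean(raw)
print(cleaned)
```

Deduplicating before imputing matters: the median is computed on the deduplicated rows, so repeated records do not skew it. From here, a quick look at the distributions (for example with Seaborn's `histplot` or pandas' `DataFrame.hist`, both built on Matplotlib) is the natural visualization step.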

Step 3: Now you have basic knowledge of both the theoretical and practical sides of data science. It is time to study some linear algebra. The Deep Learning book by Goodfellow et al. has a good chapter on the linear algebra you need in ML.

Here are some concepts you should learn about:

Eigenvalues and eigenvectors, with their geometric and algebraic intuitions
The geometric intuition of the determinant
Matrices as linear transformations, and the intuition behind the matrix inverse
Singular Value Decomposition (SVD)
Principal Component Analysis (PCA)
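To connect these concepts to code, the NumPy sketch below checks each one on a small symmetric matrix and then implements PCA through the SVD of centered data. The matrix and data points are arbitrary examples, and `pca` is a hypothetical helper written for illustration, not a library function.

```python
# Linear-algebra concepts checked numerically with NumPy (illustrative values).
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Eigenvectors: A @ v == lam * v, so A only stretches v by lam.
# eigh is for symmetric matrices and returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(A)
for lam, v in zip(eigvals, eigvecs.T):
    assert np.allclose(A @ v, lam * v)

# Determinant: the factor by which A scales areas (2*2 - 1*1 = 3),
# equal to the product of the eigenvalues.
det = np.linalg.det(A)
assert np.isclose(det, np.prod(eigvals))

# Inverse: the transformation that undoes A.
A_inv = np.linalg.inv(A)
assert np.allclose(A_inv @ A, np.eye(2))

# SVD: A = U @ diag(S) @ Vt, singular values S in descending order.
U, S, Vt = np.linalg.svd(A)
assert np.allclose(U @ np.diag(S) @ Vt, A)


def pca(X: np.ndarray, n_components: int):
    """Hypothetical helper: PCA via SVD of the centered data matrix.

    Returns (projected data, principal axes, variance explained per axis).
    """
    Xc = X - X.mean(axis=0)  # PCA works on centered data
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    components = vt[:n_components]  # rows are the principal directions
    explained_variance = s[:n_components] ** 2 / (len(X) - 1)
    return Xc @ components.T, components, explained_variance


# Points on the line y = x: the first principal axis should point along (1, 1).
X = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]])
projected, components, var = pca(X, n_components=1)
print(eigvals, det, S, components, var)
```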

This YouTube channel has an intuitive series on the essentials of linear algebra.

You may use the first five sessions of this MIT course for a better understanding: Course.

You should also have basic knowledge of statistics. Topics such as statistical tests, p-values, and maximum likelihood estimation… are so important that interviewers ask about them all the time.
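One intuitive way to see what a p-value means is a permutation test: if the group labels did not matter, shuffling them should often produce a difference as large as the observed one. The sketch below assumes a two-sided test on the difference in means; `permutation_test` is a hypothetical helper and the two samples are invented measurements.

```python
# Two-sided permutation test for a difference in means (pure Python sketch).
import random


def _mean(xs):
    return sum(xs) / len(xs)


def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Estimate the p-value: how often a random relabeling of the pooled data
    gives an absolute mean difference at least as large as the observed one."""
    rng = random.Random(seed)
    observed = abs(_mean(a) - _mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if abs(_mean(pooled[:n_a]) - _mean(pooled[n_a:])) >= observed:
            at_least_as_extreme += 1
    # The +1 terms count the observed labeling itself, so p is never exactly 0.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


control = [12.1, 11.8, 12.4, 12.0, 11.9, 12.2]    # hypothetical measurements
treatment = [12.9, 13.1, 12.7, 13.3, 12.8, 13.0]
p = permutation_test(control, treatment)
print(f"p-value: {p:.4f}")
```

A small p-value says a difference this large would rarely appear if the labels were irrelevant; it does not measure how large or important the effect is.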

Not only should you learn to do the math, you should also learn the intuition behind each of these ideas and find real-life examples of them: for example, applying statistical tests to real-life data, or using maximum likelihood estimation with natural distributions.
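As a worked example of maximum likelihood on a natural distribution, assume the waiting times between buses follow an exponential distribution with rate λ. The log-likelihood of n observations is n log λ - λ Σxᵢ; setting its derivative n/λ - Σxᵢ to zero gives λ̂ = n / Σxᵢ, one over the sample mean. The data below is invented, and the grid search is only a sanity check of the closed form.

```python
# Maximum likelihood for an exponential distribution (illustrative data).
import math


def exp_log_likelihood(rate, data):
    """Log-likelihood of i.i.d. data under an Exponential(rate) model."""
    return len(data) * math.log(rate) - rate * sum(data)


def exp_mle(data):
    """Closed form: n / rate - sum(x) = 0  =>  rate = n / sum(x)."""
    return len(data) / sum(data)


waits = [4.2, 1.3, 7.9, 2.5, 0.8, 5.1, 3.3, 2.9]  # hypothetical minutes between buses
rate_hat = exp_mle(waits)

# Sanity check: no rate on a fine grid beats the closed-form estimate.
grid = [0.01 * k for k in range(1, 200)]
best_on_grid = max(grid, key=lambda r: exp_log_likelihood(r, waits))
print(rate_hat, best_on_grid)
```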

Step 4: If you have completed the previous steps thoroughly, you are ready to go deeper into ML. Andrew Ng teaches a more advanced machine learning course at Stanford University, in which you can find the mathematical details of what you learned in the first steps. Find the notes here. Make sure you understand the math behind each subject.
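Linear regression is a good first test of whether you understand the math: the normal equation and batch gradient descent on the squared-error cost should land on the same parameters. This NumPy sketch assumes a synthetic line y = 3 + 2x plus noise; the learning rate and iteration count are illustrative choices.

```python
# Linear regression two ways: normal equation vs. batch gradient descent.
import numpy as np


def fit_normal_equation(X, y):
    """theta = (X^T X)^{-1} X^T y, solved without forming the inverse explicitly."""
    return np.linalg.solve(X.T @ X, X.T @ y)


def fit_gradient_descent(X, y, lr=0.1, n_iters=5000):
    """Batch gradient descent on J(theta) = (1 / 2m) * ||X theta - y||^2."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        grad = X.T @ (X @ theta - y) / m  # gradient of J
        theta -= lr * grad
    return theta


rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 3.0 + 2.0 * x + rng.normal(scale=0.1, size=100)  # hypothetical true line
X = np.column_stack([np.ones_like(x), x])            # intercept column first

theta_ne = fit_normal_equation(X, y)
theta_gd = fit_gradient_descent(X, y)
print(theta_ne, theta_gd)
```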

Step 5: One of the best practical ML books in 2020 is "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition". Read the first ten chapters and do as many of its exercises as you can. It covers the scikit-learn library in detail and walks through an end-to-end ML project. You may find various exercises here.
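In the spirit of an end-to-end project, here is a compact scikit-learn sketch that bundles preprocessing and a model into one `Pipeline` and evaluates it with cross-validation. The housing-style columns and the price formula are hypothetical, generated only so the example runs on its own; this is not code from the book.

```python
# End-to-end sketch: preprocessing + model in one Pipeline, scored with CV.
# Assumption: the columns and the price formula are invented for illustration.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "area": rng.uniform(30, 150, n),
    "rooms": rng.integers(1, 6, n).astype(float),
    "district": rng.choice(["north", "south", "east"], n),
})
# Hypothetical relationship: price depends on area and district, plus noise.
district_effect = df["district"].map({"north": 50.0, "south": 0.0, "east": 20.0})
df["price"] = 2.0 * df["area"] + district_effect + rng.normal(0, 10, n)
# Remove some values so the imputer has work to do.
df.loc[df.sample(frac=0.1, random_state=0).index, "rooms"] = np.nan

numeric = ["area", "rooms"]
categorical = ["district"]
preprocess = ColumnTransformer([
    ("num", make_pipeline(SimpleImputer(strategy="median"), StandardScaler()), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])
model = Pipeline([("prep", preprocess), ("reg", LinearRegression())])

X, y = df[numeric + categorical], df["price"]
scores = cross_val_score(model, X, y, cv=5, scoring="neg_root_mean_squared_error")
print(f"CV RMSE: {-scores.mean():.2f} +/- {scores.std():.2f}")
```

Keeping imputation and scaling inside the pipeline means they are refit on each training fold, so the cross-validation score is not inflated by information leaking from the validation folds.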

Now you are good enough to start an ML project on your own. Keep reading and challenging your skills; that is how you will keep improving. It's time for deep learning.

Step 6: Deep learning is easy, and the math behind it is simple. Just start with its history: read about Hebb's model, the Perceptron, Adaline, and Hopfield networks, and how they evolved.
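To make the history concrete, here is Rosenblatt's perceptron learning rule in NumPy, trained on the logical AND function. The step activation, {0, 1} labels, learning rate of 1, and the helper names are assumptions chosen to keep the sketch small.

```python
# Perceptron learning rule on AND (a minimal sketch).
import numpy as np


def train_perceptron(X, y, lr=1.0, n_epochs=20):
    """Rosenblatt's rule with a step activation and labels in {0, 1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_epochs):
        errors = 0
        for xi, target in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0
            update = lr * (target - pred)  # zero when the prediction is correct
            w += update * xi
            b += update
            errors += int(update != 0)
        if errors == 0:  # every point classified correctly: stop early
            break
    return w, b


def predict(X, w, b):
    return (X @ w + b > 0).astype(int)


X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_and = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y_and)
print(w, b, predict(X, w, b))
```

Running the same loop on XOR never reaches zero errors, because no single straight line separates XOR's classes; that limitation is a large part of why multilayer networks came next.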

After that, you can take Andrew Ng's Deep Learning course. Again, make sure you understand each subject; they are all important. The "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow" book is also helpful for deep learning: it shows how the Keras APIs work and describes some of their advanced uses. You do not need to learn the low-level TensorFlow APIs unless you actually need them.
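Keras hides the training loop behind `model.fit`, so it helps to write one by hand at least once. The sketch below is plain NumPy, not Keras: a two-layer network (tanh hidden layer, sigmoid output) trained on XOR with manually derived gradients. The layer size, learning rate, epoch count, and helper names are illustrative assumptions.

```python
# Hand-written training loop for a tiny two-layer network on XOR (NumPy sketch).
import numpy as np

X_XOR = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y_XOR = np.array([[0], [1], [1], [0]], dtype=float)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_xor(hidden=4, lr=0.5, n_epochs=5000, seed=0):
    """Full-batch gradient descent on mean binary cross-entropy."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(size=(2, hidden))
    b1 = np.zeros((1, hidden))
    W2 = rng.normal(size=(hidden, 1))
    b2 = np.zeros((1, 1))
    m = len(X_XOR)
    eps = 1e-12  # avoids log(0)
    losses = []
    for _ in range(n_epochs):
        # Forward pass: tanh hidden layer, sigmoid output probability.
        h = np.tanh(X_XOR @ W1 + b1)
        p = sigmoid(h @ W2 + b2)
        loss = -np.mean(Y_XOR * np.log(p + eps) + (1 - Y_XOR) * np.log(1 - p + eps))
        losses.append(float(loss))
        # Backward pass: chain rule, layer by layer.
        dz2 = (p - Y_XOR) / m              # d(mean BCE) / d(output logit)
        dW2 = h.T @ dz2
        db2 = dz2.sum(axis=0, keepdims=True)
        dz1 = (dz2 @ W2.T) * (1 - h ** 2)  # tanh'(z) = 1 - tanh(z)^2
        dW1 = X_XOR.T @ dz1
        db1 = dz1.sum(axis=0, keepdims=True)
        # Plain gradient-descent update, i.e. one optimizer step.
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2

    def predict(X):
        probs = sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2)
        return (probs > 0.5).astype(int).ravel()

    return losses, predict


losses, predict = train_xor()
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}", predict(X_XOR))
```

In Keras, roughly the same model would be a `keras.Sequential` of two `Dense` layers compiled with `loss="binary_crossentropy"` and trained with `model.fit`; the forward pass, backward pass, and update step above are what that call automates.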

Step 7: You are doing great. These steps take 3 to 4 months, and in this step you just need to practice, practice, and practice.

Maybe I will write a post about advanced interview questions that can help you evaluate yourself. So follow if you are interested.

Extra steps: there are many more opportunities in the field. Read "Mining of Massive Datasets". It covers stream mining, big-data algorithms, and many other subjects, such as frequent itemsets, recommendation systems, mining social-network graphs, advertising on the web…
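Frequent itemsets are a good first topic from that book. Below is a minimal pure-Python sketch of the two-pass A-Priori idea for pairs: count single items first, then count only pairs made of frequent items. The shopping baskets, the support threshold, and the `frequent_pairs` helper are made up for illustration.

```python
# Two-pass A-Priori for frequent pairs (small in-memory sketch).
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """A pair can be no more frequent than either of its items (monotonicity),
    so pass 2 only counts pairs whose items both passed pass 1."""
    # Pass 1: count each item once per basket.
    item_counts = Counter(item for basket in baskets for item in set(basket))
    frequent_items = {item for item, c in item_counts.items() if c >= min_support}

    # Pass 2: count candidate pairs built only from frequent items.
    pair_counts = Counter()
    for basket in baskets:
        kept = sorted(set(basket) & frequent_items)
        pair_counts.update(combinations(kept, 2))
    return {pair: c for pair, c in pair_counts.items() if c >= min_support}


baskets = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]
print(frequent_pairs(baskets, min_support=3))
```

On massive data, the point of pass 1 is memory: pair counts are only kept for items that can still matter.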

Learning Spark and having some knowledge of Hadoop may help you at work. Many companies need Spark and big data infrastructure because they have large-scale data, and you often cannot use custom ML algorithms there. Learning PyTorch, TensorFlow, and other frameworks is also valuable.
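Spark and Hadoop are built around the map, shuffle, and reduce pattern. The sketch below runs that pattern for a word count in plain Python on one machine, only to show the data flow; on a cluster, the shuffle step is where data moves between machines. The input lines and the phase function names are hypothetical.

```python
# Map -> shuffle -> reduce word count, simulated on one machine.
from collections import defaultdict
from functools import reduce
from itertools import chain


def map_phase(line):
    """Mapper: emit (word, 1) for every word in one input record."""
    return [(word, 1) for word in line.lower().split()]


def shuffle_phase(pairs):
    """Shuffle: group values by key (on a cluster, data moves between nodes here)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reducer: combine each key's values with an associative function."""
    return {key: reduce(lambda a, b: a + b, values) for key, values in groups.items()}


lines = ["big data needs big tools", "spark and hadoop process big data"]
counts = reduce_phase(shuffle_phase(chain.from_iterable(map(map_phase, lines))))
print(counts)
```

In PySpark, the same job is roughly `sc.textFile(path).flatMap(lambda line: line.lower().split()).map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)`, where `sc` is a `SparkContext` and `path` is a placeholder.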

Published via Towards AI
