# How to Become a Data Scientist? Step-by-step Path

Last Updated on July 20, 2023 by Editorial Team

#### Author(s): Ali Ghandi

Originally published on Towards AI.

## Data Science, Opinion

Becoming a data scientist is a relatively new career trajectory that merges statistics, business logic, and programming knowledge. A data scientist in particular, and not just a machine learning engineer, needs a comprehensive understanding of algebra, statistics, machine learning, and deep learning algorithms. I want to suggest a path that you can follow in about 3 months to prepare for a data scientist interview. The path starts with simple steps and ends with a crucial part of the field. This article was written in 2020.

Step 1: First, you should find out whether you really want to work in this field. It’s fun to code an ML algorithm using Python’s popular libraries, but data science is not just that. If you want to become a senior data scientist, you need to know the theory of many algorithms in detail and be able to work through their math. So for the first step, before any coding, start with Andrew Ng’s machine learning videos on YouTube. Find it here. If you finish that course and still like the field, go to step 2.

Step 2: At this point you have basic knowledge of ML and can start coding. There are many resources for practicing the coding part. It does not matter much which course (video or text) you take, because they all use the same libraries. For every coding technique you learn, practice it in Kaggle competitions. Be sure to spend enough time on each competition: the more effort you put in here, the better prepared you will be for the next steps.

You should also practice data cleaning and data visualization in these competitions. As a data scientist, you spend 50% of your time on just these two tasks. Be sure to master Seaborn, Matplotlib, Pandas, and NumPy, in addition to popular ML libraries such as scikit-learn. (These were popular in 2020, and the trend may change in the future.)
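As a rough illustration of what this cleaning and plotting practice looks like, here is a minimal sketch. The data frame, its column names, and the choice to fill gaps with the median are all made up for the example. A real Kaggle dataset will need its own decisions, and the same chart could just as well be drawn with Seaborn.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical raw data with typical problems: missing values,
# inconsistent labels, and one duplicated row.
raw = pd.DataFrame({
    "age": [22, 35, np.nan, 58, 35, 41],
    "city": ["Paris", "paris ", "Berlin", None, "paris ", "Berlin"],
    "income": [28000, 52000, 61000, 45000, 52000, np.nan],
})

clean = (
    raw.drop_duplicates()                                       # remove the repeated row
       .assign(city=lambda d: d["city"].str.strip().str.title())  # normalize labels
       .dropna(subset=["city"])                                 # drop rows with no city
       .copy()
)

# Fill the remaining numeric gaps with each column's median (an assumption;
# the right strategy depends on the dataset).
for col in ["age", "income"]:
    clean[col] = clean[col].fillna(clean[col].median())

# A simple summary and a bar chart of it.
mean_income = clean.groupby("city")["income"].mean()

fig, ax = plt.subplots()
mean_income.plot.bar(ax=ax)
ax.set_ylabel("mean income")
buffer = io.BytesIO()  # save in memory; use a file path to keep the image
fig.savefig(buffer, format="png")
plt.close(fig)
```

The point is the workflow rather than the specific calls: inspect, normalize, handle missing values, and only then summarize and plot.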

Step 3: Now you have basic knowledge of the theoretical and practical parts of data science. It is time to read some algebra. Goodfellow’s Deep Learning book has a good chapter on the algebra you need in ML.

Here are some concepts you should learn about:

1. Eigenvalue and eigenvector intuitions in geometry and algebra
2. The geometrical intuition of determinant
3. Matrix linear transformation and matrix inverse intuition
4. Singular Value Decomposition
5. PCA
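To connect these concepts, the following NumPy sketch checks a few of them numerically. The matrix and the random data are arbitrary choices for illustration only. It shows that an eigenvector is only scaled by its matrix, that the determinant equals the product of the eigenvalues, and that PCA computed through the SVD agrees with the eigenvalues of the covariance matrix.

```python
import numpy as np

# A symmetric 2x2 matrix, chosen arbitrarily for illustration.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh is meant for symmetric matrices and returns eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(A)

# Geometric meaning: A only stretches an eigenvector v by its eigenvalue.
v = eigenvectors[:, 1]
stretched = A @ v

# The determinant is the area scaling factor of the transformation,
# and it equals the product of the eigenvalues.
det = np.linalg.det(A)

# PCA through the SVD on hypothetical correlated 2-D data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])
Xc = X - X.mean(axis=0)                      # PCA requires centered data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Rows of Vt are the principal directions; S**2 / (n - 1) is the variance
# explained by each direction.
explained_variance = S**2 / (len(Xc) - 1)

# The same variances come from the eigenvalues of the covariance matrix.
cov_eigenvalues = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]

# Project the data onto the first principal direction (2-D to 1-D).
projected = Xc @ Vt[0]
```

Changing `A` or the data and rerunning the checks is a quick way to build the geometric intuition the list above asks for.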

You may use the first five sessions of this MIT course for a better understanding: Course.

You should also have basic knowledge of statistics. Topics such as statistical tests, p-values, and maximum likelihood estimation are so important that interviewers ask about them all the time.
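One way to build intuition for p-values is a permutation test, sketched below with NumPy. The two groups are synthetic and the group sizes, means, and number of permutations are arbitrary assumptions. The idea is to ask how often a difference at least as large as the observed one appears when the group labels are shuffled at random.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical measurements from two groups (synthetic data).
group_a = rng.normal(loc=10.0, scale=2.0, size=50)
group_b = rng.normal(loc=11.5, scale=2.0, size=50)

observed = group_b.mean() - group_a.mean()

# Under the null hypothesis the labels are interchangeable, so shuffle them
# and record how often the shuffled difference is at least as extreme.
pooled = np.concatenate([group_a, group_b])
n_a = len(group_a)
n_permutations = 10_000
count = 0
for _ in range(n_permutations):
    shuffled = rng.permutation(pooled)
    diff = shuffled[n_a:].mean() - shuffled[:n_a].mean()
    if abs(diff) >= abs(observed):
        count += 1

# Two-sided p-value; the +1 terms avoid reporting exactly zero.
p_value = (count + 1) / (n_permutations + 1)
print(f"observed difference: {observed:.3f}, p-value: {p_value:.4f}")
```

A small p-value says the observed difference would be rare if both groups came from the same distribution. It does not say how large or how important the difference is.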

Learn not only how to do the math but also the intuition behind these concepts, and look for real-life examples, such as applying a statistical test to a real problem or using maximum likelihood with distributions that occur in nature.
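As a small worked example of maximum likelihood, the sketch below fits a normal distribution to synthetic "height" data. The data, the parameter values, and the grid are assumptions made for illustration. For a Gaussian, the maximum likelihood estimates have a closed form (the sample mean and the standard deviation with `ddof=0`), and the grid search confirms that the log-likelihood peaks there.

```python
import numpy as np

rng = np.random.default_rng(7)
# Hypothetical heights in cm, drawn from a normal distribution.
heights = rng.normal(loc=170.0, scale=8.0, size=500)


def gaussian_log_likelihood(data, mu, sigma):
    """Log-likelihood of i.i.d. data under a normal distribution N(mu, sigma^2)."""
    n = len(data)
    return (-0.5 * n * np.log(2 * np.pi * sigma**2)
            - np.sum((data - mu) ** 2) / (2 * sigma**2))


# Closed-form maximum likelihood estimates for a Gaussian.
mu_hat = heights.mean()
sigma_hat = heights.std(ddof=0)  # the MLE divides by n, not n - 1

# Numerical check: scan candidate means and keep the one with the
# highest log-likelihood.
mu_grid = np.linspace(160.0, 180.0, 201)  # step of 0.1 cm
log_likelihoods = [gaussian_log_likelihood(heights, m, sigma_hat) for m in mu_grid]
best_mu_on_grid = mu_grid[int(np.argmax(log_likelihoods))]

print(f"closed form: {mu_hat:.2f}, grid search: {best_mu_on_grid:.2f}")
```

The same pattern, writing down the log-likelihood and maximizing it, carries over to distributions where no closed form exists.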

Step 4: If you have completed the previous steps thoroughly, you are ready to go deeper into ML. Andrew Ng has a course at Stanford University on advanced machine learning, in which you can find the mathematical details of what you learned in the first steps. Find notes here. Be sure you understand the math behind each subject.

Step 5: One of the best practical ML books in 2020 is “Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition”. Read the first ten chapters and do as many of the exercises as you can. It covers the use of the scikit-learn library in detail and demonstrates an end-to-end ML project. You may find various practices here.
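To give a feel for the scikit-learn workflow, here is a minimal sketch that is not taken from the book. The dataset (Iris), the model (logistic regression), and the split sizes are arbitrary choices for illustration. It chains preprocessing and a model into one pipeline, estimates performance with cross-validation, and then evaluates once on held-out data.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load a small built-in dataset and hold out a test set.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Putting the scaler inside the pipeline means it is fit only on the
# training folds, which avoids leaking information from validation data.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Estimate generalization with 5-fold cross-validation on the training set.
cv_scores = cross_val_score(model, X_train, y_train, cv=5)

# Fit on all training data and evaluate once on the untouched test set.
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)

print(f"CV accuracy: {cv_scores.mean():.3f}, test accuracy: {test_accuracy:.3f}")
```

A real project adds data exploration, feature engineering, and model comparison around this core loop.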

Step 6: Deep learning is quite approachable, and the math behind it is simple. Start with its history: read about Hebb’s model, the Perceptron, Adaline, and Hopfield neural nets, and how they evolved.
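The Perceptron is simple enough to implement in a few lines, which makes it a good first exercise when reading this history. The sketch below uses the classic perceptron learning rule on the logical AND function. The learning rate, the epoch limit, and the helper name `predict` are arbitrary choices for the example.

```python
import numpy as np

# Truth table of the logical AND function (linearly separable).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])


def predict(inputs, weights, bias):
    """Step activation: output 1 when the weighted sum is positive."""
    return (inputs @ weights + bias > 0).astype(int)


weights = np.zeros(2)
bias = 0.0
learning_rate = 0.1

for epoch in range(100):
    errors = 0
    for xi, target in zip(X, y):
        prediction = int(xi @ weights + bias > 0)
        # Perceptron rule: move the boundary only when the prediction is wrong.
        update = learning_rate * (target - prediction)
        weights += update * xi
        bias += update
        errors += int(update != 0)
    if errors == 0:  # a full pass with no mistakes means training has converged
        break

print(predict(X, weights, bias))
```

Replacing the targets with XOR (`[0, 1, 1, 0]`) shows the well-known limitation: no single line separates the classes, so the loop never reaches zero errors.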

After that, you can take Andrew Ng’s course on deep learning. Again, be sure you understand each subject; they are all important. You may also find the “Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow” book helpful for deep learning. It shows how the Keras APIs work and describes some of their advanced uses. You do not need to learn the low-level TensorFlow APIs until you actually need them.

Step 7: You are doing great. These steps take 3 to 4 months, and at this point you just need to practice, practice, and practice.

Extra steps: There are many opportunities in the field. Read “Mining of Massive Datasets.” It covers stream mining, big-data algorithms, and many other subjects, such as frequent itemsets, recommendation systems, mining social-network graphs, and advertising on the web.

Learning Spark and knowing something about Hadoop may help you at work. Many companies need Spark and big-data infrastructure because they have large-scale data, and you cannot use custom ML algorithms there. PyTorch, TensorFlow, and other frameworks are also valuable to learn.

