How to Become a Data Scientist? Step-by-step Path
Last Updated on July 20, 2023 by Editorial Team
Author(s): Ali Ghandi
Originally published on Towards AI.
Data Science, Opinion
Data science is a relatively new career path that merges statistics, business logic, and programming knowledge. A data scientist in particular, as opposed to a machine learning engineer, needs a comprehensive understanding of algebra, statistics, machine learning, and deep learning algorithms. I want to suggest a path that you can follow in about 3 months to prepare for a data scientist interview. It starts with simple steps and ends with some of the most crucial parts of the field. This article was written in 2020.
Step 1: First, you should find out whether you really want to work in this field. It's fun to code an ML algorithm using Python's popular libraries, but data science is not just that. You need to know the theory behind many algorithms in detail, and be able to work through their math, if you want to become a senior data scientist. So for the first step, before any coding, start with Andrew Ng's machine learning videos on YouTube. Find it here. If you finish that course and you still like the field, go to step 2.
Step 2: By this step, you have basic knowledge of ML, and you can start coding. There are many resources for practicing the coding part. It is not important which course (video or text) you take, because they all use the same libraries. For every coding technique you learn, practice it in Kaggle competitions. Be sure to spend enough time on each competition: the more effort you put in here, the better prepared you will be for the next steps.
You should also practice data cleaning and data visualization in these competitions. As a data scientist, you spend roughly 50% of your time on just these two tasks. Be sure to master Seaborn, Matplotlib, Pandas, and NumPy, alongside popular ML libraries like scikit-learn. (These are popular in 2020, and the trend may change in the future.)
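As a minimal sketch of the kind of cleaning practiced in these competitions, the snippet below uses Pandas on a small made-up table (the column names and values are hypothetical, not from any real Kaggle dataset). It drops duplicate rows and fills missing values with the median for numeric columns and the most frequent value for the others. This is one common choice among many, not a rule.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for a competition table.
raw = pd.DataFrame({
    "age": [22, 35, np.nan, 58, 35],
    "fare": [7.25, 71.3, 8.05, np.nan, 71.3],
    "port": ["S", "C", None, "S", "C"],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicate rows, then fill missing values column by column."""
    out = df.drop_duplicates().copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            # Numeric column: the median is robust to outliers.
            out[col] = out[col].fillna(out[col].median())
        else:
            # Non-numeric column: fall back to the most frequent value.
            out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out


cleaned = clean(raw)
print(cleaned)
print(cleaned.isna().sum())  # missing values remaining per column
```

From here, a quick look at the distributions with Seaborn or Matplotlib (for example `seaborn.histplot(cleaned["age"])`) is a natural next step.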
Step 3: Now you have basic knowledge of the theoretical and practical parts of data science. It is time to study some linear algebra. Goodfellow's Deep Learning book has a good chapter on the algebra you need in ML.
Here are some concepts you should learn about:
- Eigenvalue and eigenvector intuitions in geometry and algebra
- The geometrical intuition of determinant
- Matrix linear transformation and matrix inverse intuition
- Singular Value Decomposition
- PCA
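To make the connection between these concepts concrete, here is a small NumPy sketch on synthetic 2-D data (the data and the mixing matrix are arbitrary assumptions chosen for illustration). It shows that PCA can be computed either from the eigenvectors of the covariance matrix or from the SVD of the centered data, and that the determinant of the covariance matrix equals the product of its eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic correlated 2-D data: standard normal points pushed through
# an arbitrary linear transformation.
X = rng.normal(size=(200, 2)) @ np.array([[2.0, 0.0], [1.2, 0.5]])
Xc = X - X.mean(axis=0)  # PCA works on centered data

# Route 1: eigen-decomposition of the sample covariance matrix.
cov = Xc.T @ Xc / (len(Xc) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
order = np.argsort(eigvals)[::-1]        # sort by explained variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Route 2: SVD of the centered data matrix.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
svd_variances = S**2 / (len(Xc) - 1)

# The determinant is the product of the eigenvalues
# (the factor by which the transformation scales area).
det = np.linalg.det(cov)

# Project the data onto the first principal component.
pc1_scores = Xc @ eigvecs[:, 0]

print("eigenvalues:        ", eigvals)
print("variances from SVD: ", svd_variances)
print("det vs. product:    ", det, eigvals.prod())
```

The rows of `Vt` match the columns of `eigvecs` up to sign, which is why the two routes give the same principal directions.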
This YouTube channel has an intuitive series on the essentials of algebra.
You may use the first five sessions of this MIT course for better understanding: Course.
You should also have a basic knowledge of statistics. Topics such as statistical tests, p-values, and maximum likelihood estimation are so important that interviewers ask about them all the time.
Not only should you learn to do the math, but you should also build intuition for these topics and find real-life examples of them: for instance, where statistical tests are used in practice, or how maximum likelihood applies to naturally occurring distributions.
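As one way to build that intuition, the sketch below works through both ideas with NumPy on simulated data (the "heights" and the two groups are invented for illustration). The first part computes the maximum likelihood estimates for a normal distribution, which are the sample mean and the standard deviation with a 1/n factor. The second part estimates a p-value with a permutation test, which asks how often a random relabeling of the groups produces a difference in means at least as large as the observed one.

```python
import numpy as np

rng = np.random.default_rng(42)

# --- Maximum likelihood estimation for a normal distribution ---
# Hypothetical heights in cm, simulated with known parameters.
sample = rng.normal(loc=170.0, scale=8.0, size=500)
mu_hat = sample.mean()
# The MLE of sigma divides by n, not by n - 1.
sigma_hat = np.sqrt(np.mean((sample - mu_hat) ** 2))


# --- p-value from a two-sided permutation test ---
def permutation_p_value(a, b, n_perm=5000, rng=rng):
    """Estimate the p-value for a difference in means between a and b."""
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)
        diff = abs(shuffled[:len(a)].mean() - shuffled[len(a):].mean())
        count += diff >= observed
    # Add-one correction so the estimate is never exactly zero.
    return (count + 1) / (n_perm + 1)


group_a = rng.normal(0.0, 1.0, size=50)
group_b = rng.normal(1.0, 1.0, size=50)
p_value = permutation_p_value(group_a, group_b)

print(f"MLE: mu = {mu_hat:.2f}, sigma = {sigma_hat:.2f}")
print(f"permutation p-value = {p_value:.4f}")
```

A small p-value here means that a difference this large would rarely appear if the group labels carried no information, which is the intuition behind most statistical tests.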
Step 4: If you have completed the previous steps thoroughly, you are ready to go deeper into ML. Andrew Ng has a course at Stanford University on advanced machine learning, in which you can find the mathematical details of what you learned in the first steps. Find notes here. Be sure you understand the math behind each subject.
Step 5: One of the best practical ML books in 2020 is "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition". Read the first ten chapters and do as many of its exercises as you can. It covers the use of the scikit-learn library in detail and demonstrates an end-to-end ML project. You may find various practices here.
Now you are good enough to start an ML project on your own. Keep reading and challenging your skills, which will help you improve your knowledge. It's deep learning time.
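To give a feel for what a small end-to-end workflow in scikit-learn looks like, here is a minimal sketch. It is not taken from the book; the dataset (the bundled Iris data), the model, and the split sizes are my own assumptions for illustration. It splits the data, chains scaling and a classifier in a pipeline, cross-validates on the training set, and then scores once on the held-out test set.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load a small built-in dataset and hold out a test set.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Keeping preprocessing inside the pipeline means the scaler is fit
# only on the training folds during cross-validation.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

cv_scores = cross_val_score(model, X_train, y_train, cv=5)
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)

print("cross-validation accuracy per fold:", cv_scores)
print("held-out test accuracy:", test_accuracy)
```

The same skeleton carries over to Kaggle-style problems: only the data loading, the preprocessing steps, and the estimator change.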
Step 6: Deep learning is quite easy, and the math behind it is simple. Just start with its history. Read about Hebb's model, the Perceptron, Adaline, and Hopfield neural nets and their evolution.
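To see how simple the earliest of these models is, here is a minimal NumPy sketch of the classic perceptron learning rule, trained on the logical AND function. The function name, the learning rate, and the epoch limit are illustrative choices of mine, not from any particular course.

```python
import numpy as np


def train_perceptron(X, y, epochs=20, lr=1.0):
    """Train a single perceptron with the classic error-driven update.

    X: array of shape (n_samples, n_features); y: labels in {0, 1}.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for xi, target in zip(X, y):
            pred = int(xi @ w + b > 0)       # step activation
            update = lr * (target - pred)    # zero when the guess is right
            if update != 0:
                w += update * xi
                b += update
                errors += 1
        if errors == 0:                      # converged: a full clean pass
            break
    return w, b


# Logical AND is linearly separable, so the rule is guaranteed to converge.
X_and = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_and = np.array([0, 0, 0, 1])

w, b = train_perceptron(X_and, y_and)
predictions = (X_and @ w + b > 0).astype(int)
print("weights:", w, "bias:", b)
print("predictions:", predictions)
```

Swapping the labels for XOR is a useful exercise: a single perceptron cannot separate it, which is part of the history that led to multi-layer networks.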
After that, you can take Andrew Ng's course on deep learning. Again, be sure you understand each subject; they are all important. You may also find the "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow" book helpful for deep learning. It shows how the Keras APIs work and describes some of their advanced uses. You do not need to learn the low-level TensorFlow APIs until you actually need them.
Step 7: You are doing great. The steps so far take 3 to 4 months, and in this step you just need to practice, practice, and practice.
Maybe I will write a post about advanced interview questions that can help you evaluate yourself. So follow if you are interested.
Extra steps: There are many opportunities in the field. Read "Mining of Massive Datasets." It covers stream mining, big-data algorithms, and many other subjects, such as frequent itemsets, recommendation systems, mining social-network graphs, and advertising on the web.
Learning Spark and having some knowledge of Hadoop may help you at work. Many companies need Spark and big-data infrastructure because they have large-scale data. You cannot use custom ML algorithms there. PyTorch, TensorFlow, and other frameworks are also valuable to learn.