This analysis is based on 1000+ recent Data scientist job postings, extracted from job portals using web scraping.
Recently, I started actively looking to switch jobs into Data science, even though I don’t have any formal education such as a Master’s or Ph.D. in AI/Machine Learning. I started learning it purely out of my own interest (not just because of the hype). It is a challenging track to take up, especially if you are working on some other technology at the same time. I started my journey by enrolling in many MOOCs (Massive Open Online Courses) and reading multiple blogs. Initially, it didn’t make sense. Eventually, after reading other people’s code and getting my hands dirty with real-world datasets, it slowly started making sense.
When I started searching for jobs, a new and interesting story began. I opened a top job portal in India and found a few jobs that were relevant to what I was looking for. But when I opened one of them, to my surprise, the requirements they mentioned were new to me. Apart from traditional Data analysis, Machine learning, and Deep learning, some ETL tools and multiple Big Data technologies were listed as required skills. I thought that was okay, since every company has its own definition of a data scientist these days, and opened another job. This one required yet other technologies like AWS, Azure, and Power BI.
Remember, all these openings were tagged as Data scientist. They all share common requirements like Machine learning algorithms, Statistics, Data Analysis, Data cleaning, and Deep learning techniques. Along with these skills, a few companies expected candidates to know a cloud platform (AWS, Azure, or GCP), data visualization tools like Tableau and Power BI, and ETL tools like SSIS. Usually, these technologies have more to do with Data Analyst/Data Engineer roles, but the Data scientist role is still evolving and doesn’t really stick to a particular skillset yet.
I do understand that companies look for an applicant who fits their vacancy and already has the skills in the technologies they use. This saves the company the time and money it would otherwise spend on training.
So I got an interesting idea: to understand what exactly the IT industry expects from a data scientist in practice, not what is usually taught in MOOCs.
Objective: We will try to find out the skills and trends that are most sought after in the industry right now. For this, we will scrape data from the job portal.
Note: This whole analysis is done for the data scientist role in the Indian market.
In this article, we will try to find answers to a few important questions, which every data science job seeker will have in mind.
- What are the top skills companies are looking for?
- What is the most desired experience level in the industry?
- What are the companies that are actively offering jobs in this field?
- What are the locations that have more openings?
Note: You can find the link to the complete code in the Conclusion section.
1. Web Scraping:
I gathered all the relevant job information from the top job portal in India, Naukri.com, which almost every job aspirant and recruiter uses these days. I used Selenium with Python for web scraping since the traditional BeautifulSoup approach didn’t work well on this site.
Disclaimer: The web scraping is performed purely for educational purposes.
We will scrape these five elements for each job: Role, Company name, Experience, Location, and Key Skills.
Code for scraping:
2. Preprocessing:
Let’s do some basic preprocessing before we dive in.
2.1. Handling missing values:
I performed a basic cleaning step: finding the rows with missing values and dropping them.
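As a minimal sketch of this step in plain Python: the field names and sample rows below are made up for illustration and are not taken from the scraped dataset. If your data lives in a pandas DataFrame, `DataFrame.dropna()` does the same job for true missing values.

```python
# Hypothetical field names and sample rows; the real data comes from the scraper.
FIELDS = ["role", "company", "experience", "location", "skills"]

jobs = [
    {"role": "Data Scientist", "company": "Acme", "experience": "2-5 Yrs",
     "location": "Bengaluru", "skills": "Python, Machine Learning"},
    {"role": "Data Scientist", "company": None, "experience": "5-10 Yrs",
     "location": "Pune", "skills": "R, Statistics"},
    {"role": "Senior Data Scientist", "company": "Globex", "experience": "",
     "location": "Mumbai", "skills": "SQL"},
]


def is_complete(job):
    """Return True when every expected field is present and non-blank."""
    return all(str(job.get(field) or "").strip() for field in FIELDS)


# Keep only the rows where all five fields were scraped successfully.
clean_jobs = [job for job in jobs if is_complete(job)]
print(f"kept {len(clean_jobs)} of {len(jobs)} rows")
```

Treating blank strings as missing is a choice I made for this sketch; a scraper often returns an empty string rather than `None` when an element is not found.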
2.2. Handling duplicate data:
We need to be really careful while handling duplicate data, since a company might post the same requirement multiple times because the job is still open, or, on the other hand, the company might have a completely new opening with the same requirement. To keep it simple, I have not dropped any duplicates.
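Even if nothing is dropped, it can help to know how many rows are repeats, so you can judge how much they might skew the counts later. Here is a small sketch, assuming a posting is identified by its role, company, and location (that key and the sample rows are my assumptions for illustration):

```python
from collections import Counter

# Hypothetical (role, company, location) triples standing in for scraped rows.
postings = [
    ("data scientist", "acme", "bengaluru"),
    ("data scientist", "acme", "bengaluru"),
    ("senior data scientist", "globex", "mumbai"),
]

counts = Counter(postings)

# Postings that appear more than once, with how often they appear.
repeated = {posting: n for posting, n in counts.items() if n > 1}

# Number of rows that repeat an earlier, identical posting.
duplicate_rows = sum(n - 1 for n in repeated.values())

print(f"{duplicate_rows} of {len(postings)} rows repeat an earlier posting")
```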
2.3. Tokenizing locations and skills columns
I converted all the strings to lower case to avoid redundancy and tokenized the locations and skills columns, since each of them can hold more than one value.
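A minimal sketch of the tokenizing step, assuming the values in a cell are separated by commas (the separator and the sample strings are assumptions, so check them against your own scraped data):

```python
def tokenize(cell, sep=","):
    """Lower-case a delimited string and split it into clean tokens."""
    return [part.strip().lower() for part in cell.split(sep) if part.strip()]


print(tokenize("Bengaluru, Hyderabad ,Pune"))
print(tokenize("Python, Machine Learning, SQL,"))
```

Stripping whitespace and skipping empty parts matters here, because otherwise "Pune" and " Pune" would be counted as two different cities.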
This is how it looked after the preprocessing.
Now, we have everything to get started.
3.1. Which location offers more openings?:
Note: If you are not from India, feel free to skip this locations part.
- In the above plot, almost 38% of the jobs are located in Bengaluru.
- The top 4 cities, namely Bengaluru, Mumbai, Hyderabad, and Pune, constitute almost 72% of total data science jobs in the country.
- So if you are from any of these cities, your chances of getting a Data scientist job are probably higher than in other cities.
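Percentages like these can be computed from the tokenized locations column. Below is a rough sketch with made-up data. It counts a city once per posting that lists it, which is only one possible definition and not necessarily the one behind the numbers above.

```python
from collections import Counter

# Hypothetical tokenized locations, one list per job posting.
job_locations = [
    ["bengaluru"],
    ["bengaluru", "hyderabad"],
    ["mumbai"],
    ["pune", "bengaluru"],
]

# Count each city once per posting that lists it.
city_counts = Counter(city for cities in job_locations for city in set(cities))
total_jobs = len(job_locations)

for city, n in city_counts.most_common():
    print(f"{city}: {n} postings ({100 * n / total_jobs:.0f}% of jobs)")
```

Since one posting can list several cities, shares computed this way can add up to more than 100%.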
3.2. What Companies are actively recruiting?:
- Analytics Vidhya educon topped the list with almost 21% of total job listings.
- There are many consultancies on the list too. These consultancies usually conduct recruitment for their clients.
- In general, competition on job portals is very high. Most of the time, your profile might not even be viewed by the recruiter due to the huge number of applications received. There are instances where, even for a single vacancy, you will have to compete with hundreds of other applicants. It is better to know which companies are recruiting actively so that you can apply directly through their official website, which increases the probability of landing an interview.
3.3. What is the most desired Experience?:
- We can observe that companies are clearly looking for experienced candidates. There seem to be more vacancies for candidates with 5–10 years of experience. This makes sense since a data scientist’s job involves key decision-making skills that come with experience.
- Candidates with at least 2 years of experience have fairly good opportunities.
- This doesn’t mean that freshers can’t get in. It’s just that there are more openings for experienced candidates than for freshers. Companies usually don’t recruit freshers from these job portals; they recruit them directly through campus recruitment. Freshers can always opt to work for startups to gain the necessary experience.
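To group postings by experience, the raw experience text first has to be turned into numbers. Here is a sketch, assuming the portal shows experience as a range such as "5-10 Yrs". That format, the sample strings, and the band boundaries are my assumptions for illustration.

```python
import re
from collections import Counter


def parse_experience(text):
    """Extract (min_years, max_years) from a string such as '5-10 Yrs'.

    Returns None when the string does not contain a numeric range.
    """
    match = re.search(r"(\d+)\s*[-–]\s*(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def bucket(min_years):
    """Assign a posting to a coarse band by its minimum required years."""
    if min_years < 2:
        return "0-2"
    if min_years < 5:
        return "2-5"
    if min_years < 10:
        return "5-10"
    return "10+"


samples = ["0-2 Yrs", "2-5 Yrs", "5-10 Yrs", "6-9 Yrs", "12-15 Yrs", "Not disclosed"]
parsed = [parse_experience(s) for s in samples]

# Skip rows that could not be parsed, then count postings per band.
ranges = [p for p in parsed if p is not None]
bands = Counter(bucket(low) for low, _high in ranges)
print(bands)
```

Bucketing by the minimum required years is a simplification; a posting asking for 4-8 years lands in the "2-5" band even though it overlaps the next one.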
3.4. What are the Roles in demand?:
This is an important step to look into because, after a few results, job portals usually start showing some other jobs that are irrelevant to the job we were searching for. Just to be assured that we are looking at the right roles, let’s check the top 10 frequently mentioned roles.
- As we saw in the previous section, there were more vacancies for people with more experience, which raises the question of how the openings are spread across roles.
- Most of the vacancies are still titled Data scientist, followed by Senior Data scientist and Lead Data scientist, which, of course, need good previous experience.
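Checking the most frequent roles is a one-liner with `collections.Counter`. The role titles below are invented sample data, assumed to be lower-cased already during preprocessing.

```python
from collections import Counter

# Hypothetical role titles, already lower-cased during preprocessing.
roles = [
    "data scientist", "data scientist", "senior data scientist",
    "lead data scientist", "data scientist", "data analyst",
    "senior data scientist",
]

# most_common(10) returns at most ten (role, count) pairs, most frequent first.
top_roles = Counter(roles).most_common(10)
for role, n in top_roles:
    print(f"{role}: {n}")
```

If irrelevant titles show up near the top of this list, that is a sign the search results drifted and the data needs filtering before the skills analysis.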
3.5. Skills that companies are looking for:
Finally, here we are. The main reason why you are probably reading this.
- It looks very complex, right? Don’t worry, I will break it down in the later part. I have included many skills in the plot because Data Science spans so many areas.
- Though the above plot shows some of the top skills, it still doesn’t serve the purpose of this analysis.
Let’s dive in deep to understand the trends more clearly.
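One way to break the overall skill counts down by category is sketched below. The category lists and sample postings are illustrative assumptions, not the exact lists used for the plots in this article.

```python
from collections import Counter

# Hypothetical tokenized skills, one list per job posting.
job_skills = [
    ["python", "machine learning", "sql", "tableau"],
    ["r", "machine learning", "aws", "spark"],
    ["python", "deep learning", "keras", "aws"],
    ["python", "sql", "hadoop", "power bi"],
]

# Illustrative category lists; extend them to match the skills in your data.
CATEGORIES = {
    "programming language": ["python", "r", "sql", "sas", "c++"],
    "big data": ["spark", "hadoop", "hive"],
    "cloud": ["aws", "azure", "gcp"],
    "visualization": ["tableau", "power bi"],
}

# Count each skill once per posting that mentions it.
skill_counts = Counter(skill for skills in job_skills for skill in set(skills))


def counts_for(category):
    """Return (skill, count) pairs for one category, most frequent first."""
    pairs = [(skill, skill_counts[skill]) for skill in CATEGORIES[category]]
    return sorted((p for p in pairs if p[1] > 0), key=lambda p: p[1], reverse=True)


for category in CATEGORIES:
    print(category, counts_for(category))
```

This relies on exact token matches, so a skill written as "python programming" would not be counted under "python" unless you normalize such variants first.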
3.5.1. Must-Have Skills?:
- It is no surprise that Machine learning is the most important skill for a data scientist.
- Data mining and Data analysis are the key activities that every data scientist has to go through.
- Strong statistical modeling is required to be a better data scientist.
- Companies are expecting a good knowledge of deep learning since it provides state-of-the-art techniques for solving some interesting real-world problems in fields like NLP and Computer Vision.
- Employers are expecting candidates to have knowledge of big data technologies due to the huge rise in the amount of data recorded every day. In practice, we might be working on huge datasets where these skills will definitely come in handy.
3.5.2. Programming Language in demand?:
- If you are just starting out in Data Science, you’ll definitely find it hard to choose the right programming language. Though there are many languages, the competition has always been between Python and R. Let’s see what the data tells us.
- The industry still favors Python, thanks to its rich libraries, followed by R.
- SQL is a must for every data scientist. However, it doesn’t really fit the label of a programming language. I still included it here, taking my chances :).
- After Python and R, there seems to be good demand for SAS and C++.
3.5.3. Deep learning Framework to opt for?:
- Due to the sudden rise of deep learning, many deep learning frameworks have come onto the market from giants like Google and Facebook.
- Keras has a good share of the market. People love it because it is simple and easy to use.
- Though there are many other frameworks like Caffe and MXNet, there don’t seem to be many openings for them, if not in the world, then at least in India.
3.5.4. Which big data technology has the edge?
- Spark tops the list. One can go for the Python API for Spark, PySpark.
- Hadoop offers almost the same opportunities as Spark, with only a minor difference.
- There are considerable openings for Hive too.
3.5.5. Which Cloud provider is in demand for ML?
- Training the models involves huge computations, which can easily get very expensive. Companies are in search of cheaper ways to get the work done. That is where these cloud platforms came into the picture.
- AWS tops the list, followed by Azure.
- Companies are moving quickly towards cloud options. It is likely that these technologies will play a major role in Data science in the coming days.
3.5.6. Data Visualization Tool in demand?
- Employers are showing more interest in Tableau for data visualization.
- Microsoft’s Power BI is still lagging behind.
Do you really have to match all the skills mentioned in this post to get a job?
Well, not really. There are a few tools on the list that are easy to pick up on the job if you are strong in your fundamentals. Having said that, if you are looking for a job, having these skills on your resume will probably help you land an interview.
If you are good with all the must-have skills mentioned above, then the best approach is to start attending interviews and, meanwhile, fill the gaps in your understanding and learn the tools/technologies you feel will give you an edge over other candidates.
If you find this helpful or have any questions, do let me know in the comments.
See you later. Happy Coding!
Bio: Shareef Shaik is an Aspiring Data scientist passionate about solving real-world problems with the help of AI.
Know What Employers are expecting for a Data Scientist Role in-2020 was originally published in Towards Data Science on Medium.
Published via Towards AI per author’s request.