A Data Scientist Is More Than Just a Data Scientist

Last Updated on July 26, 2023 by Editorial Team

Author(s): Shanmukh Dara

Originally published on Towards AI.

My thoughts on the best way to enter and advance in the field of data science…

Hello there, may I ask you a question? What are the top skills a data scientist needs to succeed? I can guarantee that the answer differs from person to person and from firm to firm; I must admit that there is no single objective answer. But, as a data scientist, I’ve always wondered why. If we can build driverless cars and forecast the future, why can’t we answer this question objectively? So let me explain why this question is hard to answer, which skills I think a data scientist should have, and the best way to develop them.

This post is not intended to provide technical resources. Instead, I want to change your perspective and help steer your journey as you enter and grow in the data science domain.

So, to return to my original point, why is it so difficult to pin down the skills a data scientist needs? In my opinion, it comes down to three major factors:

The term “data scientist” has become diluted in recent years
The company’s culture
The division of labor

Image: Data scientist is, without a doubt, the sexiest job of the twenty-first century

Anyone who has followed this domain for a few years would undoubtedly agree with me on the first two points. First, the term “data scientist” has become diluted: today a data scientist can play almost any role, from formulating the business problem to deploying and monitoring models. Second, the role of a data scientist is shaped by the company’s culture and data maturity. After working with a number of established firms and a few startups, I’ve found that being a data scientist at an early-stage startup requires more business acumen than technical skill.

Finally, and most importantly, there is the division of labor. In The Wealth of Nations, Adam Smith uses the vivid example of a pin factory assembly line to explain how the division of labor is the primary source of productivity gains. Like pin-making, data science work involves many distinct processes, which is why organizations typically hire specialists such as data engineers, experimentation scientists, and machine learning experts. A product manager oversees the work and handles the hand-offs between functions.

Because of this division of labor, many data scientists end up doing a lot of data modeling, which creates the impression that data scientists need only data-related skills and nothing else. That impression is entirely incorrect, which is why I titled this post “A Data Scientist Is More Than Just a Data Scientist.”

Rather than specializing in a single domain, a data scientist should have π-shaped knowledge: broad horizontal knowledge across the entire end-to-end process, plus in-depth knowledge in one or two particular domains. In other words, a data scientist must be more of a generalist than a specialist.

When you ask data scientists for advice on how to become an expert in this domain, the most common response is “Kaggle competitions.” The harsh reality, however, is that Kaggle competitions will not prepare you for the real world. Kaggle is without a doubt a good place for newcomers and for those looking to build a profile, but after the initial learning phase it fails to convey what real-world challenges feel like.

Is it hard to believe? I learned this the hard way when I began my practicum project at Kiva Organization as part of my MSBA coursework. Here are a few reasons why experiential learning is superior to Kaggle competitions.

1. Problem-solving thought process

Real-world projects are not like Kaggle competitions. Kaggle gives you a clear view of the problem, the data at hand, the solution required, and sometimes even what needs to be done. That leaves you with almost nothing to brainstorm or think through.

In the real world, however, problem statements are usually loosely defined or framed as open-ended business problems. In most cases, the analysis begins by translating the business problem into an analytics problem and then experimenting with various analytical approaches to address it.

As part of my practicum project at Kiva, we were given a very vague business problem. It took us approximately 3–4 weeks to fully understand Kiva’s business, then the business problem, and finally to scope that problem. Next, to make sure we were on the right track, we explained our understanding and proposed several approaches to the problem. Only then did we agree on a problem statement and an approach. This process gave us a safe environment in which to brainstorm, generate innovative and creative ideas, run quick sanity checks, and, when necessary, kill ideas.

2. Data collection and cleaning

In Kaggle competitions, datasets are provided up front, and they are often clean and well-structured. This limits your thinking: you grasp the problem and try different methods to see which one works best. In the real world, however, it is our responsibility as data scientists to understand the problem and to identify, among the massive amounts of data in data warehouses, the key attributes that would be useful. In some cases, the data is not readily available and must be gathered from a variety of sources, for example through web scraping.

Before deciding on a set of features to use at Kiva, we had to understand all of the accessible data, its quality, and its quantity. This required a great deal of trial and error. Furthermore, real-world data is rarely clean, so it is our responsibility to clean and fix it before analyzing it.
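To make "understand the data's quality, then clean it" a little more concrete, here is a minimal sketch in plain Python. It is not the code used at Kiva: the CSV export, its field names (`loan_id`, `country`, `amount`, `sector`), and its values are all hypothetical, chosen only to show typical chores such as measuring missing values, trimming whitespace, normalizing case, coercing junk like `n/a` to a missing value, and dropping exact duplicates.

```python
import csv
import io

# Hypothetical raw export: field names and values are invented for illustration.
RAW = """loan_id,country,amount,sector
1, Kenya ,500,Agriculture
2,kenya,,agriculture
2,kenya,,agriculture
3,Peru,1200,Retail
4,,n/a,Retail
"""


def profile(rows, fields):
    """Return the share of empty (missing) values for each field."""
    total = len(rows)
    return {f: sum(1 for r in rows if not r[f].strip()) / total for f in fields}


def to_float(value):
    """Coerce a string to float; blanks and junk such as 'n/a' become None."""
    try:
        return float(value)
    except ValueError:
        return None


def clean(rows):
    """Trim whitespace, normalize case, coerce amounts, and drop exact duplicates."""
    seen = set()
    cleaned = []
    for r in rows:
        record = (
            r["loan_id"].strip(),
            r["country"].strip().title() or None,  # empty string -> None
            to_float(r["amount"].strip()),
            r["sector"].strip().title() or None,
        )
        if record in seen:  # skip rows that are identical after normalization
            continue
        seen.add(record)
        cleaned.append(record)
    return cleaned


if __name__ == "__main__":
    rows = list(csv.DictReader(io.StringIO(RAW)))
    print(profile(rows, ["loan_id", "country", "amount", "sector"]))
    for record in clean(rows):
        print(record)
```

Note that the profile counts `n/a` as present because it is not blank; only after cleaning does it surface as a missing amount. Catching mismatches like this is exactly the kind of trial and error real data demands.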

3. Performance vs Business impact

In the classroom and in competitions, our success metric is the model’s performance, that is, how well it predicts unseen data. In most real-world scenarios, however, we care more about business impact than about model performance. Business impact can take many forms, such as higher bottom-line profit, increased sales, or reduced expenditure.

Delivering that impact means thoroughly understanding the client’s problem (the client is often a marketing or product team) and making sure the client clearly understands the problem we are trying to solve and is eager to use our predictions going forward.
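A toy illustration of why the most accurate model is not necessarily the most valuable one. Everything here is hypothetical: the confusion-matrix counts for two made-up targeting models, the gain earned per true positive, and the cost wasted per false positive. In a real project, those monetary figures would come from the business client, not from the data scientist.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts for a binary classifier on a hypothetical test set."""
    tp: int
    fp: int
    fn: int
    tn: int

    def accuracy(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total


def business_value(c: Confusion, gain_per_tp: float, cost_per_fp: float) -> float:
    """Net value: each true positive earns a gain, each false positive wastes a cost.

    Missed opportunities (FN) and correct rejections (TN) are valued at zero
    in this simplified, assumed cost model.
    """
    return c.tp * gain_per_tp - c.fp * cost_per_fp


# Invented counts for 1,000 customers, 100 of whom would actually respond.
model_a = Confusion(tp=20, fp=5, fn=80, tn=895)    # cautious model
model_b = Confusion(tp=70, fp=100, fn=30, tn=800)  # aggressive model

if __name__ == "__main__":
    for name, c in [("A", model_a), ("B", model_b)]:
        print(name, c.accuracy(), business_value(c, gain_per_tp=50, cost_per_fp=5))
```

With these assumed numbers, model A is more accurate (0.915 versus 0.87), yet model B generates more value (3000 versus 975) because a missed responder costs far more than a wasted contact. Change the assumed gain and cost, and the ranking can flip, which is why the success criteria have to be agreed with the client.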

4. Communicating with a non-technical audience

As data scientists, we are required to collaborate closely with marketing, product, sales, and engineering teams. Most of these people are non-technical, which means you cannot communicate with them the way you would with a fellow data scientist. You must clearly understand what each team cares about and communicate only the necessary information, with minimal jargon.

5. Miscellaneous

Aside from everything mentioned above, here are a few other skills that the practicum project helped me improve.

Storytelling: Storytelling is another key skill every data scientist needs. Since most of the people we interact with daily are non-technical, it comes in handy when communicating your impact to others.
Prioritization: Every day we are inundated with ad hoc analysis requests, and there may be many projects in the pipeline. It is our responsibility to evaluate their impact and prioritize accordingly.

So what now?

A good data scientist is much more than a data scientist. A good data scientist is a detective who can spot problems, a very good storyteller, a magician who can solve problems with machine learning, a team member who collaborates well, and a project manager who can prioritize tasks.

To become a good data scientist, then, you should focus on holistic growth rather than on data tools alone, and that growth can only come from working on end-to-end projects like my practicum project at Kiva.

So what next?

Try working on end-to-end problems
Start with an open-ended business problem, brainstorm, and scope it
Scrape or collect the data yourself if possible
Define your success criteria

Thanks for Reading!

Do you agree/disagree with me? Let me know in the comments below.

Further reading

Why Data Science Teams Need Generalists, Not Specialists
Machine Learning is Kaggle Competitions, Machine Learning Mastery (machinelearningmastery.com), discussing Julia Evans's post "Machine learning isn't Kaggle competitions"


Published via Towards AI
