A Data Scientist Is More Than Just a Data Scientist
Last Updated on July 26, 2023 by Editorial Team
Author(s): Shanmukh Dara
Originally published on Towards AI.
My thoughts on the best way to enter and advance in the field of data science…
Hello there, may I ask you a question? What are the top skills a data scientist needs to be successful? I can guarantee that the answer differs from person to person and from firm to firm; there is no single objective answer. But, as a data scientist, I've always wondered why. If we can build driverless cars and forecast the future, why can't we answer this question objectively? So let me explain why this question is difficult to answer, and share my thoughts on which skills a data scientist should have and the best way to develop them.
This blog is not intended to provide technical resources. Rather, my focus will be on changing your perspective and steering your journey to enter and grow in the data science domain.
So, to return to my original point, why is it so difficult to determine the abilities necessary for a data scientist? This, in my opinion, is attributable to three major factors:
- In recent years, the phrase "data scientist" has become diluted
- The company's culture
- Division of Labor
Anyone who has followed this domain for a few years would likely agree with me on the first two points. Indeed, the term "data scientist" has become diluted in recent years; a data scientist can now play any role, from business problem formulation to model deployment and monitoring. Second, the role of a data scientist is shaped by the company's culture and data maturity. After working with a number of established firms and a few startups, I've found that working as a data scientist at an early-stage startup requires more business acumen than technical skill.
Finally, and most importantly, there is the "Division of Labor." In The Wealth of Nations, Adam Smith uses the vivid example of a pin factory assembly line to explain how the division of labor is the primary source of productivity gains. Data analysis, like pin-making, involves numerous steps, which is why organizations typically hire specialists such as data engineers, experimentation scientists, and machine learning experts. A product manager oversees the work and handles the hand-offs between functions.
Because of this division of labor, many data scientists end up doing mostly data modeling, which fosters the impression that data scientists need only data-related skills. That impression is entirely incorrect, which is why I titled this blog "A Data Scientist Is More Than Just a Data Scientist."
This is why, rather than being a specialist in a single domain, a data scientist's knowledge should take the form of a π shape: broad horizontal knowledge across the entire end-to-end process, plus in-depth knowledge in one or two particular domains. A data scientist must be more of a generalist than a specialist.
When you ask data scientists for advice on how to become an expert in this domain, the most common response is "Kaggle competitions." However, the harsh reality is that Kaggle competitions will not prepare you for the real world. Kaggle is certainly a good place for newcomers and for those looking to build a profile, but after the initial learning phase, it fails to convey the challenges of real-world work.
Is it hard to believe? I learned this the hard way when I began my practicum project at Kiva Organization as part of my MSBA coursework. Here are a few reasons why experiential learning is superior to Kaggle competitions.
1. Problem-solving thought process
Real-world projects are not like Kaggle competitions. Kaggle gives you a clear view of the problem, the data at hand, the required solution, and sometimes even what needs to be done, which leaves you with almost nothing to brainstorm or think through.
In the real world, however, problem statements are usually loosely defined or are open-ended business problems. In most cases, the analysis begins with translating the business problem into an analytics problem and then experimenting with various analytics methodologies to address it.
As part of my practicum project at Kiva, we were given a very vague business problem. It took us approximately 3 to 4 weeks to understand Kiva's business, then the business problem, and finally to scope it. Then, to make sure we were on the right track, we presented our understanding to Kiva and proposed several approaches to the problem. Only then did we agree on a problem and an approach. This process gave us a safe environment in which to brainstorm, generate creative ideas, run quick sanity checks, and, when necessary, kill ideas.
2. Data collection and cleaning
In Kaggle competitions, datasets are provided up front and are often clean and well structured. This limits your thinking: you grasp the problem and try different methods to discover which one works best. In the real world, however, it is our responsibility as data scientists to understand the problem and identify which data attributes would be useful among the massive amounts of data stored in data warehouses. In some cases, the data is not readily available and must be gathered from a variety of sources, for example through web scraping.
Before deciding on a set of features to use at Kiva, we had to understand all of the accessible data, its quality, and its quantity. This required a great deal of trial and error. Furthermore, real-world data is rarely clean, so it is our responsibility to clean and fix it before analyzing it.
3. Performance vs. business impact
In the classroom and in competitions, our success metric is the model's performance, that is, how well it predicts unseen data. In most real-world scenarios, however, we care more about business impact than about model performance. Business impact can range from higher bottom-line profit to increased sales to reduced expenditure.
Delivering that impact means making sure that you thoroughly grasp the client's problem (the client being, for example, the marketing or product team), that the client clearly understands the problem we are trying to solve, and that they are eager to use the predictions going forward.
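To make the gap between model performance and business impact concrete, here is a minimal Python sketch. The scenario is entirely hypothetical and not taken from the Kiva project: a model flags customers for a retention offer, each correctly flagged customer (true positive) is assumed to be worth $100, each wasted offer (false positive) is assumed to cost $10, and missed customers carry no extra cost. The names `ConfusionCounts` and `business_value`, the confusion-matrix counts, and the dollar figures are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts from a binary classifier evaluated on a hold-out set (hypothetical)."""
    tp: int  # flagged and actually positive
    fp: int  # flagged but actually negative
    fn: int  # positives the model missed
    tn: int  # negatives correctly left alone

    def accuracy(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total


def business_value(c: ConfusionCounts, gain_tp: float, cost_fp: float,
                   cost_fn: float = 0.0) -> float:
    """Net value of acting on the model's predictions.

    The gain and cost figures are assumptions that would normally come from
    the client (e.g. the marketing team), not from the data itself.
    """
    return c.tp * gain_tp - c.fp * cost_fp - c.fn * cost_fn


# Two hypothetical models scored on the same 1,000 customers.
model_a = ConfusionCounts(tp=40, fp=10, fn=60, tn=890)
model_b = ConfusionCounts(tp=80, fp=80, fn=20, tn=820)

for name, model in [("A", model_a), ("B", model_b)]:
    value = business_value(model, gain_tp=100.0, cost_fp=10.0)
    print(f"model {name}: accuracy={model.accuracy():.2f}, value=${value:,.0f}")
# model A: accuracy=0.93, value=$3,900
# model B: accuracy=0.90, value=$7,200
```

Under these made-up numbers, model A wins on accuracy while model B delivers nearly twice the value, which is why the success metric should come from the client's problem rather than from a leaderboard.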
4. Communicating with a non-technical audience
As data scientists, we are required to collaborate closely with marketing, product, sales, and engineering teams. Most of these people are non-technical, so you cannot communicate with them the way you would with a fellow data scientist. You must clearly understand what each team cares about and communicate only the necessary information, with minimal jargon.
5. Miscellaneous
Aside from everything mentioned above, here are a few other skills that the practicum project helped me improve.
- Storytelling: Another key skill every data scientist needs is storytelling. Since most of the people we interact with daily are non-technical, this skill comes in handy when communicating the impact of your work.
- Prioritization: Every day, we are inundated with ad hoc analysis requests, and there may be many projects in the pipeline. It is our responsibility to evaluate their impact and prioritize accordingly.
So what now?
A good data scientist is much more than a data scientist: a detective who can spot problems, a skilled storyteller, a magician who can solve problems using machine learning, a team player who collaborates well, and a project manager who can prioritize tasks.
To become a good data scientist, you should focus on holistic growth rather than on data tools alone. In my experience, the best way to achieve this is by working on end-to-end projects like my practicum project at Kiva.
So what next?
- Try working on end-to-end problems
- Start with an open-ended business problem, brainstorm, and scope it
- Scrape or collect your own data if possible
- Define success criteria in terms of business impact
Thanks for Reading!
Do you agree/disagree with me? Let me know in the comments below.