Data Scientists in the Age of AI Agents and AutoML
Last Updated on January 22, 2025 by Editorial Team
Author(s): Edoardo De Nigris
Originally published on Towards AI.
Uncomfortable reality: in the era of large language models (LLMs) and AutoML, traditional skills like Python scripting, SQL, and building predictive models are no longer enough for data scientists to remain competitive in the market.
Are we cooked? It depends. In this article I will give my two cents on what I think is useful to focus on to be a strong candidate from 2025 onward.
Coding skills remain important, but the real value of data scientists today is shifting. It's less about just building models and more about how those models fit into scalable, business-critical systems, usually in the cloud.
The role of a data scientist is changing so fast that schools often can't keep up. Universities still mostly focus on things like EDA, data cleaning, and building and fine-tuning models. These are important, but they're just a small part of what companies actually need now. The job isn't just about coding in notebooks anymore; it's about building end-to-end solutions that actually work in the real world.
Why?
- We've reached a point where we have tons of pre-trained models; often there's no need to reinvent everything from scratch, and we can work at a higher level of abstraction
- AI agents are becoming a thing
- AutoML and other low-code platforms are making coding skills less critical
In this scenario, I believe a data scientist has to differentiate themselves and is expected to master the entire data lifecycle: building data pipelines, building and optimizing model training, mastering containers and orchestrators, deployment, and beyond. Simply put, focusing solely on data analysis, coding, or modeling will no longer cut it for most corporate jobs.
What to do then? My personal opinion: it's more important than ever to be an "end-to-end data scientist".
Yes, I know: the bar is getting higher, and the era of scripting and modeling in Jupyter notebooks alone is over.
Data roles will be less focused on coding and more on having a general understanding of the whole data infrastructure and the business. As an analogy, think of it like running a restaurant. The data scientist is the chef: they're in charge of the big, high-impact decisions, like creating the menu, choosing the ingredients, and designing the vibe of the place. Meanwhile, AI agents (or AutoML) are like the kitchen assistants, waiters, and cashiers: they handle the repetitive, routine coding tasks to keep everything running smoothly. The chef's job is to focus on the creative and strategic work that makes the restaurant stand out, while the AI takes care of the rest.
In this regard, I believe the future of data science belongs to those who:
- can connect the dots and deliver results across the entire data lifecycle;
- have strong business acumen and deliver solutions that are either widely used or that drive revenue or cut costs.
Let's dig into it. I think a competitive data professional in 2025 must have a comprehensive understanding of the entire data lifecycle, without necessarily needing to be great at coding per se.
These are instead some of the skills I would work hard to master:
- Theoretical foundations: a strong grasp of concepts like exploratory data analysis (EDA), data preprocessing, and the practices for training, fine-tuning, and testing ML models remains essential. You have to understand your data, how to extract value from it, and how to monitor model performance.
- Programming expertise: medium-to-high proficiency in Python and SQL is enough; these two languages cover most data science workflows. Additionally, languages like DAX can be helpful for specific use cases involving data models and dashboards. The emphasis is not so much on producing code as on understanding and customizing it.
- Model deployment: the ability to build applications that operationalize models, such as Flask or Django apps, is increasingly vital, along with a basic understanding of HTML to create simple frontends and of hosting applications on cloud services like Google Cloud Run or Heroku. This is a massive advantage when you want to quickly create an MVP that stakeholders can work with immediately (see the minimal Flask sketch after this list).
- Containerization and orchestration: familiarity with containers (Docker) and orchestration tools like Kubernetes and Airflow/Kubeflow ensures you can provide consistency and scalability across different environments.
- Cloud platforms: expertise in at least one major cloud provider (e.g., AWS, Google Cloud, or Azure) is essential. In the Google Cloud ecosystem, for example, understanding how the different tools interact (BigQuery, Cloud Storage, Cloud Build, Cloud Run, Vertex AI, Container Registry, and orchestrators such as Composer, Google's managed Airflow, or Kubeflow) is increasingly indispensable; see the BigQuery-to-Cloud-Storage sketch after this list.
- CI/CD practices: yes, you also need to be decent at software development. At the very least, know the best practices of continuous integration and delivery (CI/CD): using GitHub for version control, YAML files for build automation, and so on.
- Post-deployment monitoring and maintenance: managing deployed models includes monitoring for data drift, model performance issues, and operational errors, as well as A/B testing different model versions (a simple drift check is sketched after this list). Tools like Google Cloud Monitoring, logging frameworks, and artifact management systems are essential for maintaining reliability and transparency.
- Understanding data models and feature stores: the biggest lie told to students and young practitioners is that datasets and features are already there, waiting to be analyzed. In reality, you spend most of your time building them from scratch, in a way that is reusable in the future and/or by other teams in your company.
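To make the deployment point concrete, here is a minimal sketch of a Flask app serving a pickled model. Everything specific here (the model.pkl file name, the /predict route, the JSON input format) is my own illustrative assumption, not a prescribed setup:

```python
# app.py -- minimal sketch of serving a pickled model with Flask.
# The model file name and input format are illustrative assumptions.
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

# Load the trained model once at startup, not on every request.
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    # Expects a JSON body like {"features": [[5.1, 3.5, 1.4, 0.2]]}.
    payload = request.get_json(force=True)
    predictions = model.predict(payload["features"])
    return jsonify({"predictions": predictions.tolist()})

if __name__ == "__main__":
    # Fine for local testing; use a WSGI server (and e.g. Cloud Run) in production.
    app.run(host="0.0.0.0", port=8080)
```

Wrap this in a container and you can host it on Cloud Run or Heroku; for an MVP that stakeholders can try immediately, this is often all you need.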
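On the cloud side, here is a sketch of two of those tools working together: pulling a training set out of BigQuery and persisting an artifact to Cloud Storage. The project, dataset, table, and bucket names are placeholders, and the snippet assumes default application credentials plus the pandas extras for the BigQuery client:

```python
# Minimal sketch of BigQuery and Cloud Storage interacting.
# All project/dataset/bucket names below are placeholders.
from google.cloud import bigquery, storage

bq = bigquery.Client(project="my-project")
query = """
    SELECT user_id, feature_1, feature_2, label
    FROM `my-project.my_dataset.training_data`
"""
df = bq.query(query).to_dataframe()  # needs the db-dtypes/pandas extras installed

# ...train or validate a model on df, then snapshot the dataset...
df.to_parquet("training_snapshot.parquet")  # needs pyarrow

gcs = storage.Client(project="my-project")
bucket = gcs.bucket("my-ml-artifacts")
bucket.blob("snapshots/training_snapshot.parquet").upload_from_filename(
    "training_snapshot.parquet"
)
```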
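And on monitoring: one simple, widely used drift check is the Population Stability Index (PSI) between a training baseline and live production data. Below is a minimal NumPy sketch; the 0.1/0.25 thresholds mentioned in the comment are a common rule of thumb, not a universal standard:

```python
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between a baseline and a live sample."""
    # Bin edges come from the baseline (training) distribution; the outer
    # edges are widened so live values outside the training range still count.
    edges = np.histogram_bin_edges(expected, bins=bins)
    edges[0], edges[-1] = -np.inf, np.inf
    expected_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    actual_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # A small epsilon avoids division by zero / log(0) in empty bins.
    eps = 1e-6
    expected_pct = np.clip(expected_pct, eps, None)
    actual_pct = np.clip(actual_pct, eps, None)
    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))

# Rule of thumb: < 0.1 stable, 0.1-0.25 worth watching, > 0.25 likely drift.
rng = np.random.default_rng(0)
train_feature = rng.normal(0.0, 1.0, 10_000)
live_feature = rng.normal(0.3, 1.0, 10_000)  # simulated shift in production
print(f"PSI = {psi(train_feature, live_feature):.3f}")
```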
And then there is the most underrated skill: business acumen.
- Knowing how to communicate with non-technical people is one of the most valuable skills. You must be able to explain complex things simply without dumbing them down.
- A business understanding of the data you are working with is what ultimately drives value, and it is hard for AI to replace.
- Project management skills, i.e., understanding how to iterate quickly on data projects, from an MVP to a final product.
- The ability to evaluate the costs of projects coming from third-party consulting companies.
This holistic approach aligns closely with the principles of MLOps (Machine Learning Operations), a practice that combines machine learning with software engineering and DevOps to ensure scalable, maintainable, and efficient workflows.
While some might argue that data scientists focus primarily on models in Jupyter notebooks, data engineers manage tables and data pipelines, cloud architects handle infrastructure, and machine learning engineers specialize in building and optimizing pipelines, these roles are increasingly overlapping. In my opinion, the boundaries between them will continue to blur as businesses prioritize end-to-end solutions and cross-functional expertise.
Thank you for your time! I am curious to hear your opinions in the comments.