Data Scientists in the Age of AI Agents and AutoML
Last Updated on January 22, 2025 by Editorial Team
Author(s): Edoardo De Nigris
Originally published on Towards AI.
Uncomfortable reality: in the era of large language models (LLMs) and AutoML, traditional skills like Python scripting, SQL, and building predictive models are no longer enough for data scientists to remain competitive in the market.
Are we cooked? It depends. In this article I will give my two cents on what I think is useful to focus on to be a strong candidate from 2025 onward.
Coding skills remain important, but the real value of data scientists today is shifting. It's less about just building models and more about how those models fit into scalable, business-critical systems, usually in the cloud.
The role of a data scientist is changing so fast that schools often can't keep up. Universities still mostly focus on things like EDA, data cleaning, and building and fine-tuning models. These are important, but they're just a small part of what companies actually need now. The job isn't just about coding in notebooks anymore; it's about building end-to-end solutions that actually work in the real world.
Why?
- We've reached a point where we have tons of pre-trained models; often there's no need to reinvent everything from scratch, and we can work at a higher level of abstraction
- AI agents are becoming a thing
- AutoML and other low-code platforms are making coding skills less critical
In this scenario, I believe a data scientist has to differentiate themselves and is expected to master the entire data lifecycle: building data pipelines, building and optimizing model training, mastering containers and orchestrators, deployment, and beyond. Simply put, focusing solely on data analysis, coding, or modeling will no longer cut it for most corporate jobs.
What to do then? My personal opinion: it's more important than ever to be an "end-to-end data scientist".
Yes, I know: the bar is getting higher, and the era of scripting and modeling in Jupyter notebooks alone is over.
Data roles will be less focused on coding and more on having a general understanding of the whole data infrastructure and the business. As an analogy, think of it like running a restaurant. The data scientist is the chef: they're in charge of the big, high-impact decisions, like creating the menu, choosing the ingredients, and designing the vibe of the place. Meanwhile, AI agents (or AutoML) are like the kitchen assistants, waiters, and cashiers: they handle the repetitive, routine coding tasks to keep everything running smoothly. The chef's job is to focus on the creative and strategic work that makes the restaurant stand out, while the AI takes care of the rest.
In this regard, I believe the future of data science belongs to those who:
- can connect the dots and deliver results across the entire data lifecycle;
- have strong business acumen and deliver solutions that are either widely used or that drive revenue or cut costs.
Let's dig into it. I think a competitive data professional in 2025 must have a comprehensive understanding of the entire data lifecycle, without necessarily needing to be great at coding per se.
These are instead some of the skills I would work hard to master:
- Theoretical foundations: a strong grasp of concepts like exploratory data analysis (EDA), data preprocessing, and the practices for training, fine-tuning, and testing ML models remains essential. You have to understand your data, how to extract value from it, and how to monitor model performance.
- Programming expertise: medium-to-high proficiency in Python and SQL is enough; these two languages cover most data science workflows. Additionally, languages like DAX can be helpful for specific use cases involving data models and dashboards. The emphasis is not so much on producing code as on understanding and customizing it.
- Model deployment: the ability to build applications that operationalize models, such as Flask or Django apps, is increasingly vital, along with a basic understanding of HTML to create simple frontends and of hosting applications on cloud services like Google Cloud Run or Heroku. This is a massive advantage when you want to quickly create an MVP that stakeholders can work with immediately (see the minimal Flask sketch after this list).
- Containerization and orchestration: familiarity with containers (Docker) and orchestration tools like Kubernetes and Airflow/Kubeflow ensures you can provide consistency and scalability across different environments.
- Cloud platforms: expertise in at least one major cloud provider (e.g., AWS, Google Cloud, or Azure) is essential. In the Google Cloud ecosystem, for example, understanding how the different tools interact (BigQuery, Cloud Storage, Cloud Build, Cloud Run, Vertex AI, Container Registry, and orchestrators such as Composer, Google's managed Airflow, or Kubeflow) is increasingly indispensable; see the BigQuery-to-Cloud-Storage sketch after this list.
- CI/CD practices: yes, you also need to be decent at software development. At the very least, know the best practices of continuous integration and delivery (CI/CD): using GitHub for version control, YAML files for build automation, and so on.
- Post-deployment monitoring and maintenance: managing deployed models includes monitoring for data drift, model performance issues, and operational errors, as well as A/B testing different model versions (a simple drift check is sketched after this list). Tools like Google Cloud Monitoring, logging frameworks, and artifact management systems are essential for maintaining reliability and transparency.
- Understanding data models and feature stores: the biggest lie told to students and young practitioners is that datasets and features are already there, waiting to be analyzed. In reality, you spend most of your time building them from scratch, in a way that is reusable in the future and/or by other teams in your company.
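To make the deployment point concrete, here is a minimal sketch of a Flask app serving a pickled model. Everything specific here (the model.pkl file name, the /predict route, the JSON input format) is my own illustrative assumption, not a prescribed setup:

```python
# app.py -- minimal sketch of serving a pickled model with Flask.
# The model file name and input format are illustrative assumptions.
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

# Load the trained model once at startup, not on every request.
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    # Expects a JSON body like {"features": [[5.1, 3.5, 1.4, 0.2]]}.
    payload = request.get_json(force=True)
    predictions = model.predict(payload["features"])
    return jsonify({"predictions": predictions.tolist()})

if __name__ == "__main__":
    # Fine for local testing; use a WSGI server (and e.g. Cloud Run) in production.
    app.run(host="0.0.0.0", port=8080)
```

Wrap this in a container and you can host it on Cloud Run or Heroku; for an MVP that stakeholders can try immediately, this is often all you need.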
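On the cloud side, here is a sketch of two of those tools working together: pulling a training set out of BigQuery and persisting an artifact to Cloud Storage. The project, dataset, table, and bucket names are placeholders, and the snippet assumes default application credentials plus the pandas extras for the BigQuery client:

```python
# Minimal sketch of BigQuery and Cloud Storage interacting.
# All project/dataset/bucket names below are placeholders.
from google.cloud import bigquery, storage

bq = bigquery.Client(project="my-project")
query = """
    SELECT user_id, feature_1, feature_2, label
    FROM `my-project.my_dataset.training_data`
"""
df = bq.query(query).to_dataframe()  # needs the db-dtypes/pandas extras installed

# ...train or validate a model on df, then snapshot the dataset...
df.to_parquet("training_snapshot.parquet")  # needs pyarrow

gcs = storage.Client(project="my-project")
bucket = gcs.bucket("my-ml-artifacts")
bucket.blob("snapshots/training_snapshot.parquet").upload_from_filename(
    "training_snapshot.parquet"
)
```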
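And on monitoring: one simple, widely used drift check is the Population Stability Index (PSI) between a training baseline and live production data. Below is a minimal NumPy sketch; the 0.1/0.25 thresholds mentioned in the comment are a common rule of thumb, not a universal standard:

```python
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between a baseline and a live sample."""
    # Bin edges come from the baseline (training) distribution; the outer
    # edges are widened so live values outside the training range still count.
    edges = np.histogram_bin_edges(expected, bins=bins)
    edges[0], edges[-1] = -np.inf, np.inf
    expected_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    actual_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # A small epsilon avoids division by zero / log(0) in empty bins.
    eps = 1e-6
    expected_pct = np.clip(expected_pct, eps, None)
    actual_pct = np.clip(actual_pct, eps, None)
    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))

# Rule of thumb: < 0.1 stable, 0.1-0.25 worth watching, > 0.25 likely drift.
rng = np.random.default_rng(0)
train_feature = rng.normal(0.0, 1.0, 10_000)
live_feature = rng.normal(0.3, 1.0, 10_000)  # simulated shift in production
print(f"PSI = {psi(train_feature, live_feature):.3f}")
```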
And then there is the most underrated skill: business acumen.
- Knowing how to communicate with non-technical people is one of the most valuable skills. You must be able to explain complex things simply without dumbing them down.
- A business understanding of the data you are working with is what ultimately drives value, and it is hard for AI to replace.
- Project management skills, i.e., understanding how to iterate quickly on data projects, from an MVP to a final product.
- The ability to evaluate the costs of projects coming from third-party consulting companies.
This holistic approach aligns closely with the principles of MLOps (Machine Learning Operations), a practice that combines machine learning with software engineering and DevOps to ensure scalable, maintainable, and efficient workflows.
While some might argue that data scientists focus primarily on models in Jupyter notebooks, data engineers manage tables and data pipelines, cloud architects handle infrastructure, and machine learning engineers specialize in building and optimizing pipelines, these roles are increasingly overlapping. In my opinion, the boundaries between them will continue to blur as businesses prioritize end-to-end solutions and cross-functional expertise.
Thank you for your time! I am curious to hear your opinions in the comments.