
Data Scientists in the Age of AI Agents and AutoML

Last Updated on January 22, 2025 by Editorial Team

Author(s): Edoardo De Nigris

Originally published on Towards AI.

Uncomfortable reality: In the era of large language models (LLMs) and AutoML, traditional skills like Python scripting, SQL, and building predictive models are no longer enough for a data scientist to remain competitive in the market.

Generated with DALL-E 3

Are we cooked? It depends. In this article I will give my 2 cents on what I think is useful to focus on to be a strong candidate from 2025 onward.

Coding skills remain important, but the real value of data scientists today is shifting. It's less about just building models and more about how those models fit into scalable, business-critical systems, usually in the cloud.

The role of a data scientist is changing so fast that schools often can't keep up. Universities still mostly focus on things like EDA, data cleaning, and building/fine-tuning models. These are important, but they're just a small part of what companies actually need now. Why? Because the job isn't just about coding in notebooks anymore; it's about building end-to-end solutions that actually work in the real world.

Why?

  1. We have reached a point where plenty of pre-trained models are available. Often there's no need to reinvent everything from scratch, and we can work at a higher level of abstraction
  2. AI agents are becoming a thing
  3. AutoML and other low-code platforms are making coding skills less critical

In this scenario, I believe a data scientist has to differentiate him/herself and needs to master the entire data lifecycle: building data pipelines, building and optimizing model training, mastering containers and orchestrators, deployment, and beyond. Simply put, focusing solely on data analysis, coding, or modeling will no longer cut it for most corporate jobs.

What to do then? My personal opinion: it's more important than ever to be an "end-to-end data scientist".

Yes, I know, the bar is getting higher. The era of scripting and modeling in Jupyter notebooks alone is over.

Data roles will be less focused on coding and more on having a general understanding of the whole data infrastructure and the business. As an analogy, think of it like running a restaurant. The data scientist is the chef: they're in charge of the big, high-impact decisions, like creating the menu, choosing the ingredients, and designing the vibe of the place. Meanwhile, AI agents (or AutoML) are like the kitchen assistants, waiters, and cashiers: they handle the repetitive, routine coding tasks to keep everything running smoothly. The chef's job is to focus on the creative and strategic work that makes the restaurant stand out, while the AI takes care of the rest.

In this regard, I believe the future of data science belongs to those who:

  • can connect the dots and deliver results across the entire data lifecycle.
  • have strong business acumen and deliver solutions that are either widely used or that drive revenue or cut costs.

Let's dig into it. I think a competitive data professional in 2025 must possess a comprehensive understanding of the entire data lifecycle without necessarily needing to be super good at coding per se.

Instead, these are some of the skills I would strongly recommend mastering:

  • Theoretical foundation: A strong grasp of concepts like exploratory data analysis (EDA), data preprocessing, and training/fine-tuning/testing practices for ML models remains essential. You have to understand data, how to extract value from it, and how to monitor model performance.
  • Programming expertise: Medium to high proficiency in Python and SQL is enough. These two languages cover most data science workflows. Additionally, languages like DAX can be helpful for specific use cases involving data models and dashboards. The emphasis is less on producing code and more on understanding and customizing it.
  • Model deployment: The ability to build applications that operationalize models, such as Flask or Django apps, is increasingly vital. This calls for a basic understanding of HTML to create simple frontends, as well as of hosting applications on cloud services like Google Cloud Run or Heroku. This creates a massive advantage when you want to quickly create an MVP that stakeholders can work with immediately.
  • Containerization and orchestration: Familiarity with Docker containers, Kubernetes, and orchestrators such as Airflow or Kubeflow helps you provide consistency and scalability across different environments.
  • Cloud platforms: Expertise in at least one major cloud provider (e.g., AWS, Google Cloud, or Azure) is essential. For example, in the Google Cloud ecosystem, it is increasingly indispensable to understand how different tools interact with each other: BigQuery, Cloud Storage, Cloud Build, Cloud Run, Vertex AI, Container Registry, and orchestrators such as Cloud Composer (managed Airflow) or Kubeflow.
  • CI/CD practices: Yes, you also need to be decent at software development. At the very least, know the best practices of continuous integration and delivery (CI/CD): using GitHub for version control, YAML files for build automation, and so on.
  • Post-deployment monitoring and maintenance: Managing deployed models includes monitoring for data drift, model performance issues, and operational errors, as well as performing A/B testing on your different models. Tools like Google Cloud Monitoring, logging frameworks, and artifact management systems are essential for maintaining reliability and transparency.
  • Understanding data models and feature stores: The biggest lie told to students and young practitioners is that datasets and features are already there, waiting to be analyzed. In reality, you spend most of your time building them from scratch, in a way that is reusable in the future and/or by other teams in your company.
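
To make the data drift point from the monitoring bullet a bit more concrete, here is a minimal sketch of one common drift check, the Population Stability Index (PSI), written in plain Python. Everything in it is an illustrative assumption rather than part of any specific tool: the function name `psi`, the synthetic samples, the 10 equal-width bins, and the 0.1 / 0.25 cut-offs, which are only a widely used rule of thumb. In a real setup you would run something like this on a schedule, comparing training data against recent production data, and tune the thresholds to your own use case.

```python
import math
import random


def psi(expected, actual, bins=10):
    """Population Stability Index between a reference sample and a live sample."""
    lo, hi = min(expected), max(expected)
    # Equal-width bins over the reference range; fall back to 1.0 if it is constant.
    width = (hi - lo) / bins or 1.0

    def shares(values):
        counts = [0] * bins
        for v in values:
            # Clamp values outside the reference range into the first or last bin.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Synthetic data (an assumption for illustration): one feature at training time,
# a live sample from the same distribution, and a live sample whose mean has moved.
random.seed(0)
training = [random.gauss(50, 10) for _ in range(5000)]
same = [random.gauss(50, 10) for _ in range(5000)]
shifted = [random.gauss(60, 10) for _ in range(5000)]

for name, sample in [("same", same), ("shifted", shifted)]:
    score = psi(training, sample)
    # Rule-of-thumb thresholds, not a standard: adjust to your own tolerance.
    status = "stable" if score < 0.1 else "watch" if score < 0.25 else "drift"
    print(f"{name}: PSI={score:.3f} -> {status}")
```

The point is not the specific metric. What matters is that a deployed model gets an automated, repeatable check that tells you when the incoming data no longer looks like the data it was trained on.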

And also, the most underrated skill: business acumen

  1. Knowing how to communicate with non-technical people is one of the most valuable skills. You must be able to explain complex things simply without dumbing them down.
  2. Business understanding of the data you are working with is what drives ultimate value, and it is hard for AI to replace.
  3. Project management skills: understanding how quickly to iterate on data projects, from an MVP to a final product.
  4. The ability to evaluate the costs of projects proposed by third-party consulting companies.

This holistic approach aligns closely with the principles of MLOps (Machine Learning Operations), a practice that combines machine learning with software engineering and DevOps to ensure scalable, maintainable, and efficient workflows.

While some might argue that data scientists focus primarily on models in Jupyter notebooks, data engineers manage tables and data pipelines, cloud architects handle infrastructure, and machine learning engineers specialize in building and optimizing pipelines, these roles are increasingly overlapping. In my opinion, the boundaries between them will continue to blur as businesses prioritize end-to-end solutions and cross-functional expertise.

Thank you for your time. I am curious to know your opinions in the comments!


Published via Towards AI
