Name: Towards AI Legal Name: Towards AI, Inc. Description: Towards AI is the world's leading artificial intelligence (AI) and technology publication. Read by thought-leaders and decision-makers around the world. Phone Number: +1-650-246-9381 Email: [email protected]
228 Park Avenue South New York, NY 10003 United States
Website: Publisher: https://towardsai.net/#publisher Diversity Policy: https://towardsai.net/about Ethics Policy: https://towardsai.net/about Masthead: https://towardsai.net/about
Name: Towards AI Legal Name: Towards AI, Inc. Description: Towards AI is the world's leading artificial intelligence (AI) and technology publication. Founders: Roberto Iriondo, , Job Title: Co-founder and Advisor Works for: Towards AI, Inc. Follow Roberto: X, LinkedIn, GitHub, Google Scholar, Towards AI Profile, Medium, ML@CMU, FreeCodeCamp, Crunchbase, Bloomberg, Roberto Iriondo, Generative AI Lab, Generative AI Lab Denis Piffaretti, Job Title: Co-founder Works for: Towards AI, Inc. Louie Peters, Job Title: Co-founder Works for: Towards AI, Inc. Louis-François Bouchard, Job Title: Co-founder Works for: Towards AI, Inc. Cover:
Towards AI Cover
Logo:
Towards AI Logo
Areas Served: Worldwide Alternate Name: Towards AI, Inc. Alternate Name: Towards AI Co. Alternate Name: towards ai Alternate Name: towardsai Alternate Name: towards.ai Alternate Name: tai Alternate Name: toward ai Alternate Name: toward.ai Alternate Name: Towards AI, Inc. Alternate Name: towardsai.net Alternate Name: pub.towardsai.net
5 stars – based on 497 reviews

Frequently Used, Contextual References

TODO: Remember to copy unique IDs whenever it needs used. i.e., URL: 304b2e42315e

Resources

Take our 85+ lesson From Beginner to Advanced LLM Developer Certification: From choosing a project to deploying a working product this is the most comprehensive and practical LLM course out there!

Publication

From Notebook to Production
Latest   Machine Learning

From Notebook to Production

Last Updated on July 26, 2023 by Editorial Team

Author(s): Eija-Leena Koponen

Originally published on Towards AI.

DevOps

How to Bridge the Gap between Data Science and Engineering?

Image by author

Data Scientist was hyped to be the 21st century sexiest job. Now, in 2021, most of the companies have adopted some Data Science in their processes and core tasks, analyzing their customer base or optimizing and automating some of the manual processes. On the other hand, data scientists are still a very heterogeneous bunch of people, some having a background in Statistics, some in mechanical engineering, and some in Physics to name a few. Of course, there is increased output from actual Data Science Master’s programs to the workforce, but the majority of the folks are still self-taught (including me).

The bottleneck no longer seems to be the data but how to bring the exploratory projects into production. In other words, make Software Engineers and Data Scientists talk the same language to smooth the process.

Data Scientists are usually very good with fetching the data, wrangling it a bit, making visualizations, some initial models, and notebooks. But when it comes to the production phase, actually providing the model to end-users and integrating it to the (existing) tools, Data Scientist needs to pass the baton to Software engineers. Unfortunately, for a software engineer, these notebooks and messy scripts are often like shoddy cardboard models, and translating that to a real thing takes (way too much) time. So what would be the first steps to encourage everyone to take to bridge the gap?

Here are a few steps we recommend!

Image by author

Using Git

Version control is a system that records changes to a file or set of files over time so that you can see specific versions later. Services like Github, Bitbucket, and others make it easy to see what was done with multiple people working on a single code base. This is to avoid unintentional overwrites and other mistakes. You have version control for your documents on Google Docs, so why not have that for your code too?

In addition to version control (which is worth it even when working alone), your code will be stored on the cloud, accessible for all of your project members and not on someone’s laptop, ready to be forgotten. As a result, your teammates can be aware of what you are building and pick up where you left off.

There’s also a side effect β€” code tends to be less spaghetti-like when you know someone will read it.

Reproducible notebooks

Notebooks are the holy grail of Data Scientists; they are super handy for exploring your local machine! Nowadays, there are many online providers too, e.g. Kaggle kernels, Sagemaker, Google Colab to name a few.

Notebooks, however, give quite many liberties for Data Scientists. For example, you can run the code in any order you want β€” and trust me; people do β€” they include loads of checks and other β€œunnecessary” things for anyone else. Then there’s the setup hassle; which version of python, pandas, and other libraries did you have installed, and what are the other dependent libraries when trying to reproduce whatever the other DS just did. Thus they are far from production-ready.

Valohai has done a unique solution to this problem: Jupyhai. You can run a Valohai cloud execution from your local Jupyter notebook. It requires the whole notebook to be run simultaneously and from top to bottom, forcing data scientists to keep the code cleaner and in order. You don’t need Valohai to adopt this principle; every notebook will be easier to understand if written this way.

VoilΓ‘! You are one step close to the production code.

Modular work

When you need to scroll and scroll the notebook, it’s time to start thinking modularly. Splitting your work into steps forming a pipeline requires a little extra work now, but it will pay in the longer run.

You can run only one piece at a time, see where the possible errors and bugs are and the bottleneck of your project. Maybe even write some tests to ensure future changes to a single part don’t break the whole.

It also helps in staying organized and eases code maintainability. This way, it might also be easier to explain to a new person the flow of your work on a higher level and focus on one thing at a time.

Definition of done

Hopefully, many of your projects turn to products; a never-ending loop where there is always something to iterate on, make better, optimize, and thus the definition of done is very vague.

Here we can learn again from frameworks such as agile development. Of course, not all parts of it work for Data Science, but trying to plan sprints, prioritize each of them, clearly define tasks with deliverables and timelines, and use retrospectives and demos might make you feel that you finished something.

In addition, communication with developers (and business stakeholders) is more straightforward.

Last but not least: empathy

We are all different, all from various backgrounds, and diversity is a richness. But having diverse thoughts and ways of doing requires work. It is so much easier to communicate with a person, e.g. from a similar educational background on a project, but, well, then it is a project done with only one set of skills.

So, for all the Data Scientists reading this: developers aren’t trying to make your life miserable by introducing tools and frameworks working for them. For all the developers: Data Scientists aren’t producing incomprehensible spaghetti code on purpose; it’s usually a different way of thinking and lack of knowing better β€” this whole industry is where you were 10–15 years ago.

Trying to understand and learn from others, trying to see the other’s perspective, i.e. being empathetic, allows you to work on way more incredible things than you’d do on your own!

Recap

Giving the wild bunch of Data Scientists in your team frameworks and best practices that help them do their work, you promote making the work understandable and transferable to others. Tackle one workflow obstacle at a time and only after start tackling MLOps and tooling.

DS U+2764️ Dev

Originally published at https://valohai.com.

Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming aΒ sponsor.

Published via Towards AI

Feedback ↓