

Docker — Containerization for Data Scientists

Last Updated on July 24, 2023 by Editorial Team

Author(s): Dhilip Subramanian

Originally published on Towards AI.

Image by Markus Distelrath from Pixabay

Data Science

A simple explanation of containerization with Docker

Data scientists come from different backgrounds. In today’s agile environment, it is essential to respond quickly to customer needs and deliver value. Faster delivery means more wins for the customer and, in turn, more wins for the organization.

Information Technology is always under immense pressure to increase agility and speed up the delivery of new functionality to the business. A particular point of pressure is deploying new or enhanced application code at the frequency and immediacy demanded by digital transformation. Under the covers, this problem is not simple, and it is compounded by infrastructure challenges, such as how long it takes to provision a platform for the development team or how difficult it is to build a test system that adequately emulates the production environment (ref: IBM). Docker and containers exploded onto the scene in 2013, and they have shaped software development and are driving a structural change in the cloud computing world.

It is essential for data scientists to be self-sufficient and participate in continuous deployment activities. Building an effective model requires multiple iterations of deployment, so the ability to make small changes, deploy, and test frequently is very important. Based on the queries I have received recently, I wanted to write this blog to help people understand what Docker and containers are, and how they promote continuous deployment and help the business.

In this blog, I am writing about Docker and covering the following.

  1. Why do we need Docker?
  2. Where does Docker operate in Data Science?
  3. What is Docker?
  4. How does Docker work?
  5. Advantages of using Docker

Why do we need Docker?

This happens often in our work: whenever you develop a model, write code, or build an application, it always works on your laptop. However, issues appear when we try to run the same model or application in the testing or production environment. This happens because the computing environments of the developer platform and the production platform differ. For example, you could have used Windows or a newer software version, while production runs Linux or a different software version.

Ideally, the developer’s system and the production environment should be consistent. However, this is very difficult to achieve, as each person has their own preferences and cannot be forced into a uniform setup. This is where Docker comes into the picture and solves the problem.

Where does Docker operate in Data Science?

In the data science or software development life cycle, Docker comes in at the deployment stage.

Docker makes the deployment process easy and efficient and resolves many of the issues related to deploying applications.

What is Docker?

Ref: ibexa.co

Docker is the world’s leading software container platform. Let’s take a real example: as we know, data science is a team effort and needs to coordinate with other areas such as the client side (front-end development), the back end (server), the database, and the environment/library dependencies required to run the model. The model will not be deployed alone; it will be deployed along with other software components to form the final product.

From the above picture, we can see a technology stack with different components and platforms, each with its own environment. We need to make sure that each component in the technology stack is compatible with every possible platform (hardware). In reality, working across all these platforms becomes complex because each component has its own computing environment. This is a major problem in the industry, and Docker can solve it. But how?

Let’s take one more practical use case from the Shipping industry.

Everybody knows that ships carry all types of goods to different countries. Have you ever noticed that the shipped products differ in size? Each ship carries all types of products; there is no separate ship for each product. From the above picture, we can see a car, food items, a truck, steel plates, compressors, and furniture. These products differ in nature, size, packaging, and so on. Some items are fragile, and some, such as food or furniture, need special packaging and handling. It is a complex problem, and the shipping industry solved it using containers: whatever the items may be, all we need to do is pack them inside a container. Containers help the shipping industry export goods easily, safely, and efficiently.

Now let’s return to our problem, which is similar in kind. Instead of items, we have different components (a technology stack), and the solution is to use containers with the help of Docker.

Docker is a tool that helps create, deploy, and run applications using containers in a simpler way.

A container lets the data scientist or developer package up an application with all of the parts it needs, such as libraries and other dependencies, and deploy it as one package.

In simpler terms, the developer or data scientist packages all the software, models, and components into a box called a container, and Docker takes care of shipping this container to different platforms. The developer and data scientist focus on the code, model, software, and their dependencies and put them into the container; they don’t need to worry about deployment to the platform, which Docker takes care of. Machine learning projects have many dependencies, and Docker helps download and build them automatically.

How does Docker work?

The developer or data scientist defines all the requirements (software, model, dependencies, etc.) in a file called a Dockerfile. In other words, it is a list of steps used to create a Docker image.
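As a rough sketch, a Dockerfile for a hypothetical Python scoring script might look like the following (the file names `predict.py` and `requirements.txt` are assumptions for illustration, not from the original project):

```dockerfile
# Base image with a pinned Python version for reproducibility
FROM python:3.10-slim

WORKDIR /app

# Install dependencies first so Docker can cache this layer
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the model code and any serialized model artifacts
COPY . .

# Command executed when a container starts from this image
CMD ["python", "predict.py"]
```

Running `docker build -t my-model .` in the directory containing this file would then produce the Docker image described in the next section.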

Docker image — it’s just like a food recipe with all the ingredients and steps to make a dish. In simple terms, it is a blueprint containing all the software and dependencies required to run an application on Docker.

Docker Hub — the official online repository where we can save and find Docker images. On the free plan, the number of private images you can keep is limited, and you need a subscription to store more. Please refer here

When we run a Docker image, we get a Docker container. Docker containers are the runtime instances of a Docker image. Images can be stored in an online repository such as Docker Hub, or in your own registry. These images can then be pulled to create a Docker container in any environment (test, production, or anything else), and all our applications run inside the container in both the test and production environments. Both environments are now the same, because they run containers created from the same Docker image.
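The image-to-container workflow above can be sketched with the standard Docker CLI; the image name `my-model` and the account name `myaccount` are placeholders, and the commands assume Docker is installed and you are logged in to Docker Hub:

```shell
# Build an image from the Dockerfile in the current directory
docker build -t my-model:1.0 .

# Run a container (the runtime instance) from the image
docker run --rm my-model:1.0

# Tag and push the image to Docker Hub so any environment can pull it
docker tag my-model:1.0 myaccount/my-model:1.0
docker push myaccount/my-model:1.0

# On the test or production machine, pull and run the very same image
docker pull myaccount/my-model:1.0
docker run --rm myaccount/my-model:1.0
```

Because test and production both pull the same image, they run identical environments regardless of the host operating system.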

Advantages of using Docker

1. Build an application only once

With Docker, we build the application only once; there is no need to build separate versions for each environment. This saves time.

2. Portability

Once we have tested our containerized application, we can deploy it to any other system where Docker is running, and it will run exactly as it did when we tested it.

3. Version Control

We can do version control in Docker. Docker has built-in version control: we can commit changes to our Docker images and keep versions of them.
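For example (the container and image names below are placeholders), the state of a running container can be committed as a new image version, and tags keep the versions side by side:

```shell
# Commit the current state of a container as a new image version
docker commit my-container myaccount/my-model:1.1

# List the local versions (tags) of the image
docker images myaccount/my-model
```

Older tags remain available, so rolling back is just a matter of running the previous tag.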

4. Independent

Every application runs inside its own container and does not disturb other applications. This is one of the great advantages, as it prevents conflicts between applications and gives peace of mind.

With Docker, we can package the software and all its dependencies in a container, and Docker makes sure it is deployed on every possible platform and works the same on every system. Hence, Docker makes deployment easier and faster.

I will write about Docker commands and how to dockerize an ML model in my next blog.

Thanks for reading. Keep learning and stay tuned for more!

You can also read this article on KDnuggets.


Published via Towards AI
