Docker in MLOps For Starters
Last Updated on July 17, 2023 by Editorial Team
Author(s): Sawon
Originally published on Towards AI.
Motivation
I am writing this article to provide valuable information and guidance to individuals who are new to the field of MLOps and are looking to understand the concepts and practices related to Docker containerization in machine learning projects. In an era dominated by GPTs (Generative Pre-trained Transformers), the article aims to simplify and consolidate the necessary knowledge on this topic, making it accessible to beginners like me.
What's Docker
Docker is an open-source platform that helps developers automate the deployment and management of applications inside isolated containers. Containers are lightweight compared to traditional virtual machines because they share the host's kernel instead of running a full guest operating system. Docker packages an application end to end into a self-sufficient environment that includes all the required libraries, packages, and tools. This removes the dependency on the machine where the code was originally developed and facilitates seamless deployment of ML models across different platforms or cloud environments.
Why it's important in MLOps
A machine learning life cycle is a long, iterative process: the same steps are run many times to generate results, and small changes can make a significant difference in model performance. Using Docker in this life cycle can improve the following:
Collaboration: Docker helps machine learning engineers (MLEs) work as a team. It reduces dependency on any one person's machine by packaging the code with all the necessary tools and libraries. For example, Developer A has built an environment and a model to detect fraudulent payments. Once the first version is built, Developer B wants to go through the process and add some inputs. If A has dockerized the code, B can run the model without A's intervention and build a better model by fine-tuning the hyperparameters or changing the algorithm.
Reproducibility: It reduces the confusion between the development and production teams by providing a consistent and easily reproducible environment. This also reduces discrepancies in software versions and system configuration, so a difference in model results can be traced to the code or data rather than to the environment.
Scalability: Not only does it help automate deployment, it also simplifies scaling for ML applications by providing a consistent deployment model. Containers can be easily replicated and distributed across cloud instances, allowing efficient utilization of resources.
Docker Installation in WSL without Docker Desktop
To install Docker in Windows Subsystem for Linux (WSL) without Docker Desktop, you can follow these steps:
1. Enable WSL: Make sure you have WSL enabled on your Windows machine. You can enable it by following the instructions provided by Microsoft for your Windows version.
2. Install a Linux distribution: Choose and install a Linux distribution from the Microsoft Store. Ubuntu is a popular choice, but you can select any distribution that suits your requirements.
3. Set up WSL: Once the Linux distribution is installed, launch it from the Start menu or use the wsl command in the command prompt. Follow the initial setup instructions to create a username and password for your Linux environment.
4. Update the Linux distribution: Run the following commands in the Linux terminal to update the package manager and installed packages:
sudo apt update
sudo apt upgrade
5. Install Docker: Execute the following commands in the Linux terminal to install Docker:
sudo apt install docker.io -y
Here, the -y option automatically answers yes to the confirmation prompts, so the required packages are installed without further input.
6. Verify Docker installation: To verify that Docker is installed correctly, run the following command:
docker --version
You should see the Docker version information if the installation was successful.
7. (Optional, recommended) Allow non-root access: By default, Docker commands require root privileges. If you want to run Docker commands without sudo, add your user to the docker group with the following command:
sudo usermod -aG docker $USER
Remember to log out and log back in or restart your system for the group changes to take effect.
With Docker successfully installed in your WSL environment, you can now start using Docker to build, run, and manage containers from within your Linux distribution. Note that since you are not using Docker Desktop, you'll interact with Docker exclusively through the command line in your WSL environment.
How to Start Docker Engine
Open a WSL terminal and start the Docker engine (the dockerd daemon) with this command:
sudo dockerd
Docker may fail to start because it found a PID (process ID) file. This suggests that Docker is already running or that a previous Docker instance did not shut down cleanly.
Debugging
To resolve this issue, you can follow these steps:
Check the Docker status: Run the following command (it assumes systemd is enabled in your WSL distribution):
sudo systemctl status docker
If Docker is running, you should see output indicating its status.
Stop the Docker service: If Docker is running, you need to stop it before starting it again. Execute the following command:
sudo systemctl stop docker
Remove the PID file: If a stale PID file is still present after the service has stopped (by default it is /var/run/docker.pid), delete it, for example with sudo rm /var/run/docker.pid.
Start Docker: After removing the PID file, you can start Docker again by executing the following command:
sudo dockerd
Docker should now start successfully.
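To make the idea of a stale PID file concrete, here is a minimal Python sketch that checks whether a PID file still points at a live process. It is only an illustration and not part of Docker: the function name pid_file_state is hypothetical, and /var/run/docker.pid is assumed to be the PID file location on your system.

```python
import os
from pathlib import Path


def pid_file_state(pid_file: str = "/var/run/docker.pid") -> str:
    """Return 'missing', 'running', or 'stale' for the given PID file.

    The default path is an assumption; adjust it if your setup differs.
    """
    path = Path(pid_file)
    if not path.exists():
        return "missing"
    try:
        pid = int(path.read_text().strip())
    except ValueError:
        # Content that is not a number cannot refer to a live process.
        return "stale"
    if pid <= 0:
        return "stale"
    try:
        # Signal 0 sends nothing; it only checks that the process exists.
        os.kill(pid, 0)
    except ProcessLookupError:
        return "stale"
    except PermissionError:
        # The process exists but belongs to another user (for example root).
        return "running"
    return "running"


if __name__ == "__main__":
    print(pid_file_state())
```

A result of stale means the file was left behind by a process that no longer exists, which is the case where deleting the file before restarting makes sense.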
Basic Commands in Docker
Open another terminal to run and learn the basic commands of Docker.
1. docker ps: List running containers (add -a to include stopped ones).
2. docker images: List all local images.
3. docker run <image_name>:<tag>: Create and start a container from an image, for example docker run ubuntu:latest.
4. docker pull <image_name>:<tag>: Pull an image from the registry.
5. docker stop <container_id>: Stop a running container.
6. docker rm <container_id>: Remove a stopped container.
7. docker rmi <image_name>:<tag>: Remove an image.
8. docker build -t <image_name> .: Build an image from the Dockerfile in the current directory.
9. docker-compose up: Start the containers defined in a Compose file.
10. docker exec -it <container_id> bash: Open an interactive bash shell inside a running container.
NB: Press Ctrl+P followed by Ctrl+Q to detach from a container while keeping it running.
Simple Package code
I have built a simple codebase to do basic arithmetic operations. It has one folder (a Python package) called arithmetic, and inside it I have written a script that implements all the operations. In the root directory, I have main.py, which uses these functions to do all the calculations.
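The article does not show the code itself, so the following is only a sketch of what such a codebase could look like. The function names (add, subtract, multiply, divide) and the sample values are assumptions, and both files are shown in one block so the example runs on its own.

```python
# Hypothetical contents of arithmetic/operations.py.
# The function names are assumptions, not taken from the original project.

def add(a: float, b: float) -> float:
    """Return the sum of a and b."""
    return a + b


def subtract(a: float, b: float) -> float:
    """Return a minus b."""
    return a - b


def multiply(a: float, b: float) -> float:
    """Return the product of a and b."""
    return a * b


def divide(a: float, b: float) -> float:
    """Return a divided by b, rejecting division by zero explicitly."""
    if b == 0:
        raise ZeroDivisionError("cannot divide by zero")
    return a / b


# Hypothetical contents of main.py. In the real layout this part would live
# in its own file and start with:
#     from arithmetic.operations import add, subtract, multiply, divide

def main() -> None:
    a, b = 10, 5  # sample inputs chosen for illustration
    print("add:", add(a, b))
    print("subtract:", subtract(a, b))
    print("multiply:", multiply(a, b))
    print("divide:", divide(a, b))


if __name__ == "__main__":
    main()
```

Because this sketch uses only the standard library, requirements.txt could stay empty; it is still worth keeping the file so the Dockerfile's install step works unchanged once dependencies are added.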
Docker File Setup
After finishing the code, we need to set up a Dockerfile, which will be used to build the image. Here is a demo Dockerfile for better understanding:
# Use an official Python runtime as the base image
FROM python:3.9
# Set the working directory inside the container
WORKDIR /app
# Copy the requirements file to the working directory
COPY requirements.txt .
# Install the Python dependencies
RUN pip install --no-cache-dir -r requirements.txt
# Copy the entire codebase to the working directory
COPY . .
# Set the default command to run the main.py file when the container starts
CMD ["python", "main.py"]
Now the final directory structure will look like this:
├── arithmetic
│   ├── __init__.py
│   └── operations.py
├── main.py
├── requirements.txt
└── Dockerfile
Docker Build
Build the Docker image using the following command:
docker build -t arithmetic-app .
Start a container from the newly built image by running this command:
docker run arithmetic-app
Docker Account Setup
Go to the Docker Hub website and create your own account to publish your local Docker images.
Then create a repository on Docker Hub to hold the image.
Once the Docker Hub repository is created, come back to the terminal to push the local image.
Log in to Docker Hub using the docker login command:
docker login
You will be prompted to enter your Docker Hub username and password. After entering the credentials, Docker will authenticate and log you into Docker Hub.
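Tag the image with your Docker Hub repository details. For example, if your local image is myimage:latest and you want to push it to your Docker Hub repository named myusername/myrepository, you retag it as myusername/myrepository:<tag>. The source image must match a name and tag that exist locally: the build command above, which gives no tag, produces arithmetic-app:latest, whereas the command below retags a local image that was tagged v2. In my case, the command to retag the image is: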
docker tag arithmetic-app:v2 sawon17/sandocker_2023:v.0
Docker push
Push the tagged image to Docker Hub using the docker push command:
docker push sawon17/sandocker_2023:v.0
This will upload the image to your Docker Hub repository.
This is the URL that can be shared among the team:
https://hub.docker.com/r/sawon17/sandocker_2023/tags
The format of the link is: https://hub.docker.com/r/<myusername>/<myrepository>
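The naming pattern used in these commands can be sketched in Python. This is only an illustration: the helper names image_reference and hub_url are hypothetical, and the example values are the ones used in this article.

```python
def image_reference(username: str, repository: str, tag: str) -> str:
    """Build the full image name used by docker tag, push, pull, and run."""
    return f"{username}/{repository}:{tag}"


def hub_url(username: str, repository: str) -> str:
    """Build the Docker Hub page URL for a repository."""
    return f"https://hub.docker.com/r/{username}/{repository}"


if __name__ == "__main__":
    ref = image_reference("sawon17", "sandocker_2023", "v.0")
    print("docker push", ref)
    print("docker pull", ref)
    print(hub_url("sawon17", "sandocker_2023"))
```

Building the reference once and reusing it helps avoid a common mistake: tags are matched exactly, so v.0 and v0 refer to different images.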
Testing
Open a terminal or command prompt. Use the docker pull command to pull the Docker image from Docker Hub:
docker pull sawon17/sandocker_2023:v.0
To run a container using the pulled image, we will use the docker run command:
docker run sawon17/sandocker_2023:v.0
Summary
If you have made it this far, congratulations. We have built our first Docker image and run it as a container. Let's summarize the journey to understand the key points:
Docker proves to be an invaluable tool for streamlining the development and deployment process in machine learning projects. In this article, we explored the significance of Docker in MLOps, walked through a step-by-step guide to installing Docker in WSL without Docker Desktop, demonstrated basic Docker commands, and showcased a code structure for arithmetic operations that can be containerized using Docker.
Embracing Docker containerization in machine learning projects offers numerous advantages, including enhanced productivity, simplified deployment, and accelerated innovation. By harnessing the power of Docker, ML practitioners can focus more on developing cutting-edge models and less on the intricacies of environment setup and deployment.