

BERT HuggingFace Model Deployment using Kubernetes [GitHub Repo] — 03/07/2024

Last Updated on July 4, 2024 by Editorial Team

Author(s): Vaibhawkhemka

Originally published on Towards AI.


GitHub Repo: https://github.com/vaibhawkhemka/ML-Umbrella/tree/main/MLops/Model_Deployment/Bert_Kubernetes_deployment


Motivation:

Model development is of little use if the model never reaches production, and getting it there raises its own issues of scalability and portability.

I have deployed a basic BERT model from the Hugging Face Transformers library on Kubernetes with the help of Docker, which gives a feel for how pods are deployed and managed in production.

Model Serving and Deployment:

ML Pipeline:

Workflow:

Model server (using FastAPI and Uvicorn) for the BERT uncased model →

Containerize the model and inference scripts into a Docker image →

Kubernetes deployment of these model servers (for scalability) → Testing

Components:

Model server

Used the BERT uncased model from Hugging Face to predict the masked word ([MASK]). Inference is served with transformers-cli, which uses FastAPI and Uvicorn to serve the model endpoints.
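For reference, launching the server looks roughly like the command below. This is a sketch: the exact flags of transformers-cli serve can vary across transformers versions, so check transformers-cli serve --help for yours (the default port is 8888, which matches the docs URL used later).

transformers-cli serve --task fill-mask --model bert-base-uncased --host 0.0.0.0 --port 8888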


Server streaming: once launched, the server streams its startup logs to the console.

Testing (FastAPI docs):

http://localhost:8888/docs/
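You can also hit the prediction route directly from the terminal. The call below assumes the /forward route and the {"inputs": ...} payload that transformers-cli serve exposes; verify the exact route names against the /docs page for your transformers version.

curl -X POST http://localhost:8888/forward \
  -H "Content-Type: application/json" \
  -d '{"inputs": "today is a [MASK] day"}'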


{
  "output": [
    { "score": 0.21721847355365753, "token": 2204, "token_str": "good", "sequence": "today is a good day" },
    { "score": 0.16623663902282715, "token": 2047, "token_str": "new", "sequence": "today is a new day" },
    { "score": 0.07342924177646637, "token": 2307, "token_str": "great", "sequence": "today is a great day" },
    { "score": 0.0656224861741066, "token": 2502, "token_str": "big", "sequence": "today is a big day" },
    { "score": 0.03518620505928993, "token": 3376, "token_str": "beautiful", "sequence": "today is a beautiful day" }
  ]
}

Containerization

Created a Docker image from the Hugging Face GPU base image and, after testing, pushed it to Docker Hub.
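A minimal sketch of what such a Dockerfile might look like is below; the base image, serve command, and exposed port here are assumptions (the actual file lives in the GitHub repo). Note that the Kubernetes manifest later targets port 8080, so whatever port the server binds to inside the container must match it.

# Hypothetical Dockerfile sketch -- see the repo for the actual file
FROM huggingface/transformers-pytorch-gpu:latest
# Serve the fill-mask pipeline on the port the Kubernetes Service targets
EXPOSE 8080
CMD ["transformers-cli", "serve", "--task", "fill-mask", "--model", "bert-base-uncased", "--host", "0.0.0.0", "--port", "8080"]

Build and push with:

docker build -t vaibhaw06/bert-kubernetes:latest .
docker push vaibhaw06/bert-kubernetes:latest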


Testing on the Docker container:


You can directly pull the image vaibhaw06/bert-kubernetes:latest
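For example (the port mapping below assumes the server listens on 8080 inside the container, per the Kubernetes manifest that follows):

docker pull vaibhaw06/bert-kubernetes:latest
docker run --rm -p 8080:8080 vaibhaw06/bert-kubernetes:latest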


K8s deployment

Used Minikube and kubectl to create a single serving pod by writing a Deployment and a Service config:

deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: bert-deployment
  labels:
    app: bertapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: bertapp
  template:
    metadata:
      labels:
        app: bertapp
    spec:
      containers:
        - name: bertapp
          image: vaibhaw06/bert-kubernetes
          ports:
            - containerPort: 8080

---
apiVersion: v1
kind: Service
metadata:
  name: bert-service
spec:
  type: NodePort
  selector:
    app: bertapp
  ports:
    - protocol: TCP
      port: 8080
      targetPort: 8080
      nodePort: 30100

Set up Minikube and launch the pods using kubectl and deployment.yaml:

minikube start
kubectl apply -f deployment.yaml

Final Testing:

kubectl get all

It took around 15 minutes to pull the image and create the container pods.

minikube image list
kubectl get svc
minikube service bert-service

After running the last command, minikube service bert-service exposes the service URL (and opens it in a browser), where you can verify the resulting deployment on the web endpoint.
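To hit the deployed endpoint from the terminal instead, you can target the NodePort directly (again assuming the /forward route and an in-container port of 8080):

curl -X POST http://$(minikube ip):30100/forward \
  -H "Content-Type: application/json" \
  -d '{"inputs": "today is a [MASK] day"}'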

Find the GitHub Link: https://github.com/vaibhawkhemka/ML-Umbrella/tree/main/MLops/Model_Deployment/Bert_Kubernetes_deployment

If you have any questions, ping me on my LinkedIn: https://www.linkedin.com/in/vaibhaw-khemka-a92156176/

Follow ML Umbrella for more such detailed, actionable projects.

Future Extensions:

Scaling with pod replicas and a load balancer (see the sketch below)

Self-healing
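Scaling out is a one-liner once the Deployment exists. For example, a scale-up to three replicas would look like this; Kubernetes then restarts failed pods automatically to maintain that count, which is the self-healing behavior mentioned above:

kubectl scale deployment bert-deployment --replicas=3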


Published via Towards AI
