

BERT HuggingFace Model Deployment using Kubernetes [GitHub Repo] — 03/07/2024

Last Updated on July 4, 2024 by Editorial Team

Author(s): Vaibhawkhemka

Originally published on Towards AI.


GitHub repo: https://github.com/vaibhawkhemka/ML-Umbrella/tree/main/MLops/Model_Deployment/Bert_Kubernetes_deployment


Motivation:

Model development is of little use if you never deploy the model to production, and deployment brings its own challenges of scalability and portability.

In this post, I deploy a basic BERT model from the Hugging Face transformers library on Kubernetes with the help of Docker, which gives a feel for how pods are deployed and managed in production.

Model Serving and Deployment:

ML Pipeline:

Workflow:

Model server (using FastAPI and Uvicorn) for the BERT uncased model →

Containerize the model and inference scripts into a Docker image →

Kubernetes deployment of these model servers (for scalability) → Testing

Components:

Model server

We use the BERT uncased model from Hugging Face to predict the word hidden behind the [MASK] token. Inference is served with transformers-cli, which uses FastAPI and Uvicorn under the hood to expose the model endpoints.

Source: Image by Author

Server streaming:

Source: Image by Author

Testing (FastAPI docs):

http://localhost:8888/docs/
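Beyond the interactive docs page, the endpoint can also be queried programmatically. Below is a minimal sketch in Python, assuming the server accepts a JSON POST; the route `/forward` and the payload key `inputs` are placeholders, since the exact contract depends on the transformers-cli version:

```python
import json
import urllib.request

def query_fill_mask(text, url="http://localhost:8888/forward",
                    opener=urllib.request.urlopen):
    """POST a sentence containing [MASK] and return the parsed JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps({"inputs": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # opener is injectable so the function can be exercised without a live server
    with opener(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `query_fill_mask("today is a [MASK] day")` returns the same JSON shown below.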


{
  "output": [
    { "score": 0.21721847355365753, "token": 2204, "token_str": "good", "sequence": "today is a good day" },
    { "score": 0.16623663902282715, "token": 2047, "token_str": "new", "sequence": "today is a new day" },
    { "score": 0.07342924177646637, "token": 2307, "token_str": "great", "sequence": "today is a great day" },
    { "score": 0.0656224861741066, "token": 2502, "token_str": "big", "sequence": "today is a big day" },
    { "score": 0.03518620505928993, "token": 3376, "token_str": "beautiful", "sequence": "today is a beautiful day" }
  ]
}
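A response like the one above is easy to post-process, for example to pick the highest-scoring completion. A small helper, using only the fields present in the sample output (the scores below are truncated for brevity):

```python
def top_prediction(response):
    """Return (token_str, score) for the highest-scoring fill-mask candidate."""
    candidates = response["output"]
    best = max(candidates, key=lambda c: c["score"])
    return best["token_str"], best["score"]

# Truncated version of the server response shown above
sample = {
    "output": [
        {"score": 0.2172, "token": 2204, "token_str": "good", "sequence": "today is a good day"},
        {"score": 0.1662, "token": 2047, "token_str": "new", "sequence": "today is a new day"},
        {"score": 0.0734, "token": 2307, "token_str": "great", "sequence": "today is a great day"},
    ]
}
```

Here `top_prediction(sample)` yields `("good", 0.2172)`.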

Containerization

We create a Docker image from the Hugging Face GPU base image and push it to Docker Hub after testing.
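The actual Dockerfile lives in the linked repo; as a rough sketch of what such an image looks like (the base image tag, copied files, port, and serve command below are illustrative, not necessarily the exact ones used):

```dockerfile
# Illustrative sketch -- see the linked repo for the real Dockerfile
FROM huggingface/transformers-pytorch-gpu:latest

WORKDIR /app
COPY . /app

# serve the fill-mask model endpoints on the container port
EXPOSE 8888
CMD ["transformers-cli", "serve", "--task", "fill-mask", "--model", "bert-base-uncased", "--port", "8888"]
```

The image is then built and published with `docker build -t vaibhaw06/bert-kubernetes .` followed by `docker push vaibhaw06/bert-kubernetes`.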


Testing on the Docker container:


You can pull the image directly: docker pull vaibhaw06/bert-kubernetes:latest


K8s deployment

We use minikube and kubectl to create a single-pod deployment that serves the model, configured through a Deployment and a Service.

deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: bert-deployment
  labels:
    app: bertapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: bertapp
  template:
    metadata:
      labels:
        app: bertapp
    spec:
      containers:
        - name: bertapp
          image: vaibhaw06/bert-kubernetes
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: bert-service
spec:
  type: NodePort
  selector:
    app: bertapp
  ports:
    - protocol: TCP
      port: 8080
      targetPort: 8080
      nodePort: 30100

Set up minikube and run the pods using kubectl and deployment.yaml:

minikube start
kubectl apply -f deployment.yaml

Final Testing:

kubectl get all

It took around 15 minutes to pull the image and create the container pods.

minikube image list
kubectl get svc
minikube service bert-service

After running the last command minikube service bert-service, you can verify the resulting deployment on the web endpoint.

Find the GitHub Link: https://github.com/vaibhawkhemka/ML-Umbrella/tree/main/MLops/Model_Deployment/Bert_Kubernetes_deployment

If you have any questions, ping me on my LinkedIn: https://www.linkedin.com/in/vaibhaw-khemka-a92156176/

Follow ML Umbrella for more such detailed, actionable projects.

Future Extension:

Scaling with pod replicas and a load balancer

Self-healing
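The scaling idea above can be sketched directly in Kubernetes terms. The replica count can be raised imperatively with `kubectl scale deployment bert-deployment --replicas=3`, or automated with a HorizontalPodAutoscaler targeting the Deployment from deployment.yaml; the replica bounds and CPU threshold below are illustrative:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: bert-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: bert-deployment
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

Self-healing largely comes for free: the Deployment controller restarts or reschedules pods whose containers crash, and adding a liveness probe on the model endpoint would let Kubernetes also recover from a hung server.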


Published via Towards AI
