Deployment & Serving: Exploring 6 Key MLOps Questions using AWS SageMaker
Last Updated on November 5, 2023 by Editorial Team
Author(s): Anirudh Mehta
Originally published on Towards AI.
This article is part of the AWS SageMaker series exploring “31 Questions that Shape Fortune 500 ML Strategy”.
What?
The previous blog posts, “Data Acquisition & Exploration”, “Data Transformation and Feature Engineering”, and “Experiments, Model Training & Evaluation”, explored how AWS SageMaker’s capabilities can help data scientists collaborate and accelerate the journey from data exploration to model creation.
This blog post will focus on key questions related to deploying and serving the model and explore how AWS SageMaker can help address them.
• [Automation] How do you ensure that the deployed models can scale with increasing workloads?
• [Automation] How are new versions rolled out, and how do you compare them against the running version? (A/B testing, canary, shadow, etc.)
• [Automation] Are there mechanisms to roll back or revert deployments if issues arise?
• [Collaboration] How can multiple data scientists understand the impact of their version before releasing it? (A/B testing, canary, shadow, etc.)
• [Reproducibility] How do you package your ML models for serving in the cloud or at the edge?
• [Governance & Compliance] How do you track the predicted decisions for auditability and accountability?
Use Case & Dataset
We will reuse the Fraud Detection use-case, dataset generator, and generated customer & transaction dataset.
Host with SageMaker Hosting
✓ [Automation] How are the new model versions rolled out / deployed?
In the previous article, we trained a fraud detection model, deployed it, and validated it using our test dataset. We deployed the models in an always-online, real-time inference mode, accessible via REST APIs. However, addressing enterprise challenges often requires a more tailored approach to meet specific use case requirements:
- Certain problem types may require low-latency, high-throughput output.
- Workloads for some problem types may be unpredictable or intermittent, or drop to no traffic at all.
- Cost sensitivity may be a factor in decision making, especially when managing multiple models with small resource requirements.
- Some use cases may involve offline bulk data processing.
- In certain situations, there may be a need to handle large payloads asynchronously.
- The problem type may have fluctuating workloads and require auto-scaling capabilities based on demand.
SageMaker offers various options for deploying models for inference. These options are designed to meet the diverse needs that an enterprise may encounter when deploying and managing machine learning models.
Creating an endpoint:
To deploy and serve your model using SageMaker, you need to create an endpoint. An endpoint is an operational runtime environment that serves predictions generated based on the Endpoint Configuration.
The Endpoint Configuration serves as a blueprint, defining the necessary settings such as model selection, instance types, quantity, and various runtime configurations for SageMaker endpoints.
If you need to update an endpoint, you can create or clone the endpoint configuration, make changes, and update the endpoint with the new configuration.
Steps:
1. Configure the endpoint type — provisioned, serverless, or asynchronous.
2. Attach a model to the endpoint configuration.
3. Customize the model settings — instance type, count, weight, and variant name.
4. Lastly, save the endpoint. (A minimal CLI sketch of these steps follows.)
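The same steps can be scripted with the AWS CLI. A minimal sketch; the endpoint, configuration, and model names (automl-deploy, automl-deploy-config, automl-fraud) are illustrative assumptions, and the instance type should match your workload:

# The model "automl-fraud" is assumed to already exist in SageMaker
aws sagemaker create-endpoint-config \
    --endpoint-config-name automl-deploy-config \
    --production-variants '[{
        "VariantName": "variant-1",
        "ModelName": "automl-fraud",
        "InstanceType": "ml.m5.large",
        "InitialInstanceCount": 1,
        "InitialVariantWeight": 1.0
    }]'

# Create (or later update) the endpoint from the configuration
aws sagemaker create-endpoint \
    --endpoint-name automl-deploy \
    --endpoint-config-name automl-deploy-config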
Testing an endpoint:
We covered in detail how to test the endpoint using both the CLI and the UI in the previous article, “Experiments, Model Training & Evaluation”. Please visit that article for more details.
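As a quick refresher, a real-time endpoint can be invoked through the SageMaker runtime. A sketch, assuming a CSV-serialized transaction record; the feature values are placeholders:

# Send one CSV record and write the model's prediction to out.json
aws sagemaker-runtime invoke-endpoint \
    --endpoint-name automl-deploy \
    --content-type text/csv \
    --cli-binary-format raw-in-base64-out \
    --body '<comma-separated feature values>' \
    out.json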
Picking an Endpoint Type:
As mentioned previously, depending on your specific use case, you may need to choose from the different endpoint types offered by SageMaker (a serverless configuration sketch follows the list):
Real-time inference:
- Online, real-time inference through REST APIs.
- Ideal for use cases that demand low latency or high throughput.
- Supports payload sizes of up to 6 MB and processing times of up to 60 seconds.
Serverless Inference:
- Online, near-real-time inference through REST APIs.
- Suitable for use cases with intermittent or unpredictable traffic patterns.
- SageMaker handles instances and scaling policies.
- Supports payload sizes of up to 4 MB and processing times of up to 60 seconds.
Batch Transform:
- Offline batch processing.
- Suitable for use cases with large datasets that are available upfront.
- No persistent endpoint.
Asynchronous Inference:
- Queue-based processing through REST APIs.
- Suitable for use cases with long processing times or large payloads that are not available upfront.
- Supports payload sizes of up to 1 GB and processing times of up to 1 hour.
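The endpoint type is expressed in the endpoint configuration itself. For example, a serverless endpoint replaces the instance settings with a ServerlessConfig block; the memory size and concurrency below are illustrative values:

# Serverless variant: SageMaker provisions and scales capacity on demand
aws sagemaker create-endpoint-config \
    --endpoint-config-name automl-serverless-config \
    --production-variants '[{
        "VariantName": "variant-1",
        "ModelName": "automl-fraud",
        "ServerlessConfig": {
            "MemorySizeInMB": 2048,
            "MaxConcurrency": 5
        }
    }]'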
Package with SageMaker Neo
[U+2713] [Reproducibility] How do you package your ML models for serving in the cloud or at the edge?
During AutoML training, the model was provided as a tar package, which can be downloaded and executed locally using your framework’s container image. However, when deploying models on the cloud or edge, it is essential to optimize them for the specific platform and environment, such as Amazon Inferentia or Trainium instances.
To do this, you can utilize SageMaker Neo’s compilation jobs. In these jobs, you specify the model version from the model registry created earlier and indicate the target platform for the optimization process to run. A CLI sketch is shown below.
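A compilation job can be created as follows. The job name, role ARN, S3 paths, and input shape are hypothetical; the framework and target device must match your model and deployment hardware:

# Compile the model for an Inferentia (inf1) target; paths and ARNs are placeholders
aws sagemaker create-compilation-job \
    --compilation-job-name automl-fraud-neo \
    --role-arn arn:aws:iam::123456789012:role/SageMakerExecutionRole \
    --input-config '{
        "S3Uri": "s3://my-bucket/model/model.tar.gz",
        "DataInputConfig": "{\"data\": [1, 30]}",
        "Framework": "XGBOOST"
    }' \
    --output-config '{
        "S3OutputLocation": "s3://my-bucket/compiled/",
        "TargetDevice": "ml_inf1"
    }' \
    --stopping-condition '{"MaxRuntimeInSeconds": 900}'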
Promote with Endpoint Configurations
✓ [Automation] [cont.] How do you compare a newly rolled-out model against the running version?
As mentioned earlier, to update an endpoint in SageMaker, you can clone the current endpoint configuration, make modifications, and then update the endpoint using the new configuration.
Publish a new version
For instance, if you want to introduce a new version, you can clone the existing configuration, add the new variant, and update the existing endpoint to use the new configuration.
You can configure the weights and capacity of the new variant to control how traffic is distributed between the new and old models, allowing you to roll out changes in SageMaker without any application downtime. A sketch of such a two-variant configuration follows.
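A minimal sketch of a configuration carrying both variants; the model names are hypothetical, and the 3:1 weights match the traffic split used below:

# Two production variants; 3:1 weights route ~25% of requests to the new model
aws sagemaker create-endpoint-config \
    --endpoint-config-name a-b-shadow \
    --production-variants '[{
        "VariantName": "variant-1",
        "ModelName": "automl-fraud-v1",
        "InstanceType": "ml.m5.large",
        "InitialInstanceCount": 1,
        "InitialVariantWeight": 3.0
    }, {
        "VariantName": "variant-2",
        "ModelName": "automl-fraud-v2",
        "InstanceType": "ml.m5.large",
        "InitialInstanceCount": 1,
        "InitialVariantWeight": 1.0
    }]'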
Migrate traffic (A/B Testing / Canary / Blue Green / Rolling Deployment)
Once you have published an endpoint configuration with multiple variants, you can update the weights assigned to each variant, selectively migrating traffic to your new model version. From the UI, you can migrate all traffic at once or manage it manually in a fixed-step or canary fashion.
In the example below, I have adjusted the variant weights to a 3:1 ratio, which means that 25% of new requests will be routed to the new version of the model.
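Weights can also be adjusted on a live endpoint without publishing a new configuration, using the update-endpoint-weights-and-capacities call; the variant names below reuse those from the sketch above:

# Shift to a 3:1 split in place, without redeploying the endpoint
aws sagemaker update-endpoint-weights-and-capacities \
    --endpoint-name automl-deploy \
    --desired-weights-and-capacities '[
        {"VariantName": "variant-1", "DesiredWeight": 3},
        {"VariantName": "variant-2", "DesiredWeight": 1}
    ]'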
To automate a gradual rollout, you can use the update-endpoint CLI and API with a deployment configuration:
# Migrate 20% of traffic every 300 seconds
aws sagemaker update-endpoint \
    --endpoint-name automl-deploy \
    --endpoint-config-name a-b-shadow \
    --deployment-config '{
        "BlueGreenUpdatePolicy": {
            "TrafficRoutingConfiguration": {
                "Type": "LINEAR",
                "LinearStepSize": {
                    "Type": "CAPACITY_PERCENT",
                    "Value": 20
                },
                "WaitIntervalInSeconds": 300
            },
            "TerminationWaitInSeconds": 600,
            "MaximumExecutionTimeoutInSeconds": 1800
        }
    }'
✓ [Collaboration] How can multiple data scientists understand the impact of their version before releasing it?
As seen above, you can effectively manage the traffic directed to a variant by controlling the weights. However, there are situations where data scientists want to evaluate the impact of a model without directly exposing it to users. In such cases, Amazon SageMaker offers “shadow testing.”
During shadow testing, SageMaker deploys a variant alongside the existing production model. Instead of directing actual user traffic to it, SageMaker sends a replica of real-world traffic to this new model. This allows data scientists to observe the behavior and performance of the version without affecting users.
Additionally, similar to production variants, you can have multiple shadow variants and fine-tune traffic directed to them using weight settings.
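Shadow variants are declared in the endpoint configuration next to the production variants. A minimal sketch with hypothetical model names; the ratio of the shadow weight to the production weight determines the share of traffic that is mirrored (here roughly half):

# Shadow responses are logged for comparison, never returned to callers
aws sagemaker create-endpoint-config \
    --endpoint-config-name shadow-test-config \
    --production-variants '[{
        "VariantName": "production",
        "ModelName": "automl-fraud-v1",
        "InstanceType": "ml.m5.large",
        "InitialInstanceCount": 1,
        "InitialVariantWeight": 1.0
    }]' \
    --shadow-production-variants '[{
        "VariantName": "shadow",
        "ModelName": "automl-fraud-v2",
        "InstanceType": "ml.m5.large",
        "InitialInstanceCount": 1,
        "InitialVariantWeight": 0.5
    }]'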
✓ [Automation] Are there mechanisms to roll back or revert deployments if issues arise?
The update-endpoint CLI / API methods provide the ability to configure rollback settings. You can set up alarms based on metrics for your endpoint. If the specified alarm condition is triggered, SageMaker will automatically initiate a rollback to a previous configuration.
# Migrate traffic as before, rolling back automatically if the alarm fires
aws sagemaker update-endpoint \
    --endpoint-name automl-deploy \
    --endpoint-config-name a-b-shadow \
    --deployment-config '{
        "BlueGreenUpdatePolicy": {
            # Removed for brevity
        },
        "AutoRollbackConfiguration": {
            "Alarms": [
                {"AlarmName": "invocation_errors"}
            ]
        }
    }'
Size with Inference Recommender
✓ [Automation] How do you ensure that the deployed models can scale with increasing workloads?
Autoscaling
Out of the box, SageMaker lets you set up alarm-based autoscaling for instance types other than the burstable t2 family. In this example, we configure autoscaling to scale out when the average number of requests per minute to an instance exceeds 100; an equivalent CLI sketch is shown below.
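Endpoint autoscaling is backed by Application Auto Scaling. A sketch registering the variant as a scalable target and attaching a target-tracking policy at 100 invocations per instance; the endpoint and variant names reuse earlier assumptions:

# Register the variant as a scalable target (1 to 4 instances)
aws application-autoscaling register-scalable-target \
    --service-namespace sagemaker \
    --resource-id endpoint/automl-deploy/variant/variant-1 \
    --scalable-dimension sagemaker:variant:DesiredInstanceCount \
    --min-capacity 1 \
    --max-capacity 4

# Scale out when average invocations per instance exceed 100 per minute
aws application-autoscaling put-scaling-policy \
    --policy-name automl-invocations-policy \
    --service-namespace sagemaker \
    --resource-id endpoint/automl-deploy/variant/variant-1 \
    --scalable-dimension sagemaker:variant:DesiredInstanceCount \
    --policy-type TargetTrackingScaling \
    --target-tracking-scaling-policy-configuration '{
        "TargetValue": 100.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        }
    }'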
Inference Recommender
Autoscaling lets you add more instances when there’s increased workload. However, it’s important to determine the ideal size for each instance. This is where the SageMaker Inference Recommender comes into play.
It assists in instance selection by offering recommendation jobs that help you choose the most suitable instance type and configuration (such as instance count, container parameters, and model optimizations) or serverless configuration (such as max concurrency and memory size).
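A recommendation job can also be launched from the CLI. A minimal sketch; the job name, role ARN, and versioned model package ARN from the model registry are placeholders:

# A Default job benchmarks candidate instance types against the model package
aws sagemaker create-inference-recommendations-job \
    --job-name automl-fraud-recommender \
    --job-type Default \
    --role-arn arn:aws:iam::123456789012:role/SageMakerExecutionRole \
    --input-config '{
        "ModelPackageVersionArn": "arn:aws:sagemaker:us-east-1:123456789012:model-package/automl-fraud/1"
    }'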
Record with SageMaker Inference
✓ [Governance & Compliance] How do you track the predicted decisions for auditability and accountability?
The SageMaker Endpoint also enables the recording of both the prediction request and response. This facilitates debugging, monitoring, and auditing, and, combined with SageMaker Clarify, it can be utilized for model explainability and drift detection. We will explore this topic further in the upcoming article on “Monitoring & Continuous Improvement”.
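Recording is switched on through the data capture settings of the endpoint configuration. A sketch capturing 100% of requests and responses; the S3 destination is a placeholder:

# Capture every request and response to S3 for auditing and later analysis
aws sagemaker create-endpoint-config \
    --endpoint-config-name automl-capture-config \
    --production-variants '[{
        "VariantName": "variant-1",
        "ModelName": "automl-fraud",
        "InstanceType": "ml.m5.large",
        "InitialInstanceCount": 1
    }]' \
    --data-capture-config '{
        "EnableCapture": true,
        "InitialSamplingPercentage": 100,
        "DestinationS3Uri": "s3://my-bucket/data-capture/",
        "CaptureOptions": [
            {"CaptureMode": "Input"},
            {"CaptureMode": "Output"}
        ]
    }'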
⚠️ Clean-up
If you have been following along with the hands-on exercises, make sure to clean up to avoid charges.
Delete Inference Endpoint
To avoid incurring costs, make sure to delete the endpoint and any other processing jobs.
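From the CLI, the resources created in the earlier sketches can be removed as follows (names assumed from above):

# Delete the endpoint first, then its configuration, then the model
aws sagemaker delete-endpoint --endpoint-name automl-deploy
aws sagemaker delete-endpoint-config --endpoint-config-name automl-deploy-config
aws sagemaker delete-model --model-name automl-fraud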
Summary
In summary, SageMaker enables data scientists to easily deploy and serve models to generate predictions. It allows them to monitor the impact of the new versions and respond using both manual and automated deployment techniques.
In the upcoming article, I will explore how AWS SageMaker can assist with model monitoring and automated retraining.