Experiments, Model Training & Evaluation: Exploring 6 Key MLOps Questions using AWS SageMaker

Last Updated on November 5, 2023 by Editorial Team

Author(s): Anirudh Mehta

Originally published on Towards AI.

This article is part of the AWS SageMaker series exploring ’31 Questions that Shape Fortune 500 ML Strategy’.

What?

The previous blog posts, “Data Acquisition & Exploration” and “Data Transformation and Feature Engineering”, explored how AWS SageMaker’s capabilities help data scientists collaborate and accelerate data exploration, understanding, transformation, and feature engineering.

This blog post will focus on key questions related to Experiments, Model Training, and Evaluation, and explore how AWS SageMaker can help address them.

[Automation] How can data scientists automatically partition the data for training, validation, and testing purposes?
[Automation] Does the existing platform help accelerate the evaluation of multiple standard algorithms and the tuning of hyperparameter values?
[Collaboration] How can a data scientist share the experiment, configurations & trained models?
[Reproducibility] How can you ensure the reproducibility of the experiment outputs?
[Reproducibility] How do you track and manage different versions of trained models?
[Governance & Compliance] How do you track model boundaries, allowing you to explain model decisions and detect bias?

Use Case & Dataset

We will reuse the Fraud Detection use-case, dataset generator, and generated customer & transaction dataset.

Partition dataset with SageMaker Wrangler

✓ [Automation] How can data scientists automatically partition the data for training, validation, and testing purposes?

In previous articles, we explored how SageMaker can accelerate the processes of data understanding, transformation, and feature creation in model development.

Data quality and coverage play a key role in model outcomes. While we want the model to learn from as much data as possible, we need to ensure that it isn’t trained on the complete dataset.

Why?

  • Avoid overfitting: The model should learn patterns, not just memorize them.
  • Unbiased evaluation: In production, the model only sees unseen data, so it should be evaluated the same way.
  • Tuning: Select optimal hyperparameter values on a held-out set.

What?

Thus, to ensure that your model generalizes well to unseen data, it’s essential to have distinct datasets:

  • Training data: Train the model’s parameters
  • Validation data: Tune hyperparameter values
  • Test data: Evaluate the final model’s performance.

When?

To prevent data leakage from the test or validation datasets into the training dataset, any transformation that looks at the entire dataset, such as oversampling, must happen after the split and be applied to each partition separately.

How?

SageMaker Wrangler offers a built-in transformation for creating data splits: random, ordered, key, or stratified. Refer to Data Transformation and Feature Engineering for details on adding a transformation.

⚠️ The Data Wrangler’s free tier provides only 25 hours of ml.m5.4xlarge instances per month for 2 months. Additionally, there are associated costs for reading and writing to S3.

Source: Image by the author.
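
If you prefer code over the Wrangler UI, the same idea can be sketched in a few lines of Python with pandas and scikit-learn. This is only a minimal sketch: the file path and the "is_fraud" label column are hypothetical placeholders, not the actual schema of the generated dataset. It also illustrates the point above, with oversampling applied to the training partition only, after the split.

import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical path and columns; adjust to your exported dataset
df = pd.read_csv("transactions_features.csv")

# 70/15/15 stratified split on the (assumed) fraud label to keep class balance
train_df, holdout_df = train_test_split(
    df, test_size=0.30, stratify=df["is_fraud"], random_state=42)
val_df, test_df = train_test_split(
    holdout_df, test_size=0.50, stratify=holdout_df["is_fraud"], random_state=42)

# Oversample the minority class on the TRAINING partition only, after the split,
# so duplicated rows never leak into validation or test data
minority = train_df[train_df["is_fraud"] == 1]
train_df = pd.concat(
    [train_df, minority.sample(frac=2.0, replace=True, random_state=42)])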

Experiment with SageMaker AutoPilot

✓ [Automation] Does the existing platform help accelerate the evaluation of multiple standard algorithms and the tuning of hyperparameter values?

Great! So far, we have seen how SageMaker can easily address our concerns regarding data cleanup, transformation, and feature extraction. It also simplified the process of splitting data into training, validation, and test sets. Now, the crucial next step for a data scientist is to utilize this prepared data to start building a model. Let’s explore how SageMaker can accelerate this process.

In this section, we will focus on SageMaker AutoPilot, an AWS service that automates model training & evaluation.

💡 AutoPilot is a feature-rich tool that can automatically handle various data transformations like handling missing data, which we previously had to deal with manually in other articles. Additionally, AutoPilot can natively split the data into training and validation groups.

Export test & validation data:

Before we start, let’s export our test dataset (and, optionally, the validation dataset) for future model assessment. To do this, add the “Export to S3” Wrangler action to both the test and validation datasets and then run the Wrangler job. (Why not the training data?)

⚠️ You may need to raise a quota increase request for the “ml.m5.4xlarge” instance used for processing.

Source: Image by the author.
Source: Image by the author.

Export training data:

An avid reader may wonder why we didn’t export the training dataset earlier. Well, you can! However, we skipped that step because the training output is automatically exported when we choose the “Train” option in SageMaker Studio.

Source: Image by the author.

After completing the previous two steps, you should now have three folders in S3: one for testing, one for validation, and one for training. Each folder will include the dataset partitioned into multiple parts, indexed by the manifest file.

Source: Image by the author.

Configure AutoPilot Experiment

Now that the data has been exported, we can start training the model. Clicking “Export & Train” will redirect you to the AutoPilot wizard. The wizard offers various sections and options to refine the process.

1. Experiment & Data Details: Here, you can specify the training and validation datasets, as well as the output location where the experiment outputs (such as models and reports) will be stored. For this example, I have pointed the training and validation inputs to the exported manifest files and their parts.

Source: Image by the author.

2. Target & Features: Select the target and the features that influence the target.

Source: Image by the author.

3. Training Method & Algorithm: Select Auto for SageMaker AutoPilot to automatically evaluate multiple algorithms and select the best model.

4. Deployment & Advanced Settings: Deploy the best model for serving. Optionally, tune the experiment settings, including restricting the per-trial or overall job time, to control costs.

Source: Image by the author.
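
The same experiment can also be launched programmatically. Below is a rough boto3 sketch that mirrors the wizard settings; the job name, S3 URIs, label column, objective metric, and role ARN are placeholders, not the exact values used in this walkthrough.

import boto3

sm = boto3.client("sagemaker")

sm.create_auto_ml_job(
    AutoMLJobName="automl-fraud-detection",
    InputDataConfig=[{
        "DataSource": {"S3DataSource": {
            "S3DataType": "ManifestFile",
            "S3Uri": "s3://my-bucket/wrangler-out/train/train.manifest"}},
        "TargetAttributeName": "is_fraud",   # assumed label column
    }],
    OutputDataConfig={"S3OutputPath": "s3://my-bucket/automl-output/"},
    ProblemType="BinaryClassification",
    AutoMLJobObjective={"MetricName": "F1"},
    AutoMLJobConfig={"CompletionCriteria": {
        "MaxCandidates": 10,
        "MaxAutoMLJobRuntimeInSeconds": 3600}},     # cap runtime to control cost, as in step 4
    ModelDeployConfig={"AutoGenerateEndpointName": True},  # auto-deploy the best model
    RoleArn="arn:aws:iam::123456789012:role/MySageMakerExecutionRole",  # placeholder
)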

Train using AutoPilot

After reviewing the configuration, you can create an experiment. This will trigger multiple trials, which can be found under AutoML > Your Experiment. Once the experiment finishes, you will see the best model.

The job output also includes notebooks for data exploration and model candidate generation for further manual fine-tuning.

Source: Image by the author.
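
If you prefer to monitor the experiment from code rather than the Studio UI, a sketch along these lines (assuming the job name used above) polls the job status and lists the ranked trial candidates:

import boto3

sm = boto3.client("sagemaker")

# Overall job status and the best candidate found so far
job = sm.describe_auto_ml_job(AutoMLJobName="automl-fraud-detection")
print(job["AutoMLJobStatus"], job.get("BestCandidate", {}).get("CandidateName"))

# All trials, ranked by the objective metric
candidates = sm.list_candidates_for_auto_ml_job(
    AutoMLJobName="automl-fraud-detection",
    SortBy="FinalObjectiveMetricValue",
    SortOrder="Descending")
for c in candidates["Candidates"]:
    print(c["CandidateName"], c["FinalAutoMLJobObjectiveMetric"]["Value"])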

Test the best model

Now that we have the best model, let’s put it to the test using the test data we exported earlier.

If you didn’t select ‘Auto Deploy’ during the AutoML wizard flow, you can manually deploy the model from the experiment UI. We will discuss deployment in more detail in the next article on “Deployment & Serving”.

Source: Image by the author.

⚠️ For batch mode, you may need to raise a quota increase request for the instance used by the transform job.

For now, let’s deploy it in real-time mode and send a couple of requests to observe the responses. You can use either the UI or the CLI to query the endpoint.

Source: Image by the author.

To use the CLI, open the cloud shell and execute the following command:

# Let's send invalid data to understand the expected request payload
aws sagemaker-runtime invoke-endpoint --body `echo 'example' | base64` --endpoint-name automl-fraud-detection --content-type text/csv /dev/stdout
> Exception: Invalid data format. Input data has 1 while the model expects 6
> ['customer_id_0', 'state', 'amount', 'transaction_time', 'transaction_state', 'time_of_day'] <class 'list'>

# Send a higher amount (947.08)
aws sagemaker-runtime invoke-endpoint --body `echo 2,TX,947.08,1695574580,TX,16 | base64` --endpoint-name automl-fraud-detection --content-type text/csv /dev/stdout
> 1.0 // Fraud

# Send a lower amount (94.08)
aws sagemaker-runtime invoke-endpoint --body `echo 2,TX,94.08,1695574580,TX,16 | base64` --endpoint-name automl-fraud-detection --content-type text/csv /dev/stdout
> 0.0 // Not Fraud
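
The same request can also be made from Python with the boto3 sagemaker-runtime client, which avoids the base64 plumbing of the CLI. A minimal sketch, assuming the endpoint name and CSV column order shown above:

import boto3

runtime = boto3.client("sagemaker-runtime")

# customer_id_0, state, amount, transaction_time, transaction_state, time_of_day
payload = "2,TX,947.08,1695574580,TX,16"

response = runtime.invoke_endpoint(
    EndpointName="automl-fraud-detection",
    ContentType="text/csv",
    Body=payload)
print(response["Body"].read().decode())   # e.g. "1.0" means flagged as fraud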

Troubleshoot with SageMaker Debugger

☆ [Bonus] How can you understand and monitor the progress of your long-running training jobs?

In this example, I used a small dataset, so the training typically finishes within an hour or a few hours. However, in enterprise scenarios with large datasets, training can take days or weeks. It is important to monitor and understand what is happening during this time to avoid unnecessary costs.

SageMaker Debugger is a useful tool for this purpose. It wiretaps into your training process, continuously capturing metrics and tensors, storing them in a target bucket, and applying various rules to assess and visualize training progress. While I won’t go into the specifics of these rules or the architecture here, those interested can read about them here and here, respectively.

Source: https://aws.amazon.com/sagemaker/debugger/
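
For custom training jobs launched via the SageMaker Python SDK, Debugger rules can be attached when the estimator is created. The sketch below is illustrative only: the training image, role, S3 paths, and choice of built-in rules are assumptions, not the AutoPilot setup used above.

from sagemaker.debugger import Rule, rule_configs, DebuggerHookConfig
from sagemaker.estimator import Estimator

estimator = Estimator(
    image_uri="<your-training-image>",           # placeholder
    role="<your-sagemaker-execution-role>",      # placeholder
    instance_count=1,
    instance_type="ml.m5.xlarge",
    # Capture tensors and metrics to S3 for later analysis
    debugger_hook_config=DebuggerHookConfig(
        s3_output_path="s3://my-bucket/debugger-output/"),
    # Built-in rules that watch the captured data while the job runs
    rules=[
        Rule.sagemaker(rule_configs.loss_not_decreasing()),
        Rule.sagemaker(rule_configs.overfit()),
    ],
)
# estimator.fit({"train": "s3://my-bucket/train/"})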

Explain with SageMaker Clarify

✓ [Governance & Compliance] How do you track model boundaries, allowing you to explain model decisions and detect bias?

SageMaker natively integrates with SageMaker Clarify, which allows you to explain how each feature contributed to the training output. This, in turn, enables you to understand the model’s decisions better.

We will take a closer look at SageMaker Clarify in the “Monitoring & Continuous Improvement” article [LINK].

Source: Image by the author.
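
In the meantime, here is a minimal sketch of how a Clarify explainability job is typically launched from the SageMaker Python SDK. The role, data paths, label, model name, and SHAP settings are illustrative assumptions only.

from sagemaker import clarify

processor = clarify.SageMakerClarifyProcessor(
    role="<your-sagemaker-execution-role>",   # placeholder
    instance_count=1,
    instance_type="ml.m5.xlarge")

data_config = clarify.DataConfig(
    s3_data_input_path="s3://my-bucket/train/",        # training data
    s3_output_path="s3://my-bucket/clarify-output/",   # reports land here
    label="is_fraud",                                  # assumed label column
    dataset_type="text/csv")

model_config = clarify.ModelConfig(
    model_name="automl-fraud-detection-best-model",    # placeholder
    instance_type="ml.m5.xlarge",
    instance_count=1,
    content_type="text/csv",
    accept_type="text/csv")

# SHAP-based feature attribution report (baseline omitted; recent SDK
# versions can compute one automatically -- an assumption to verify)
shap_config = clarify.SHAPConfig(num_samples=50, agg_method="mean_abs")
processor.run_explainability(
    data_config=data_config,
    model_config=model_config,
    explainability_config=shap_config)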

Organize with SageMaker Experiments

✓ [Collaboration] How can a data scientist share the experiment, configurations & trained models?

As discussed in the Data Acquisition & Exploration article, SageMaker Studio allows all users in the domain to access the same information, including experiments.

Additionally, you can easily share these experiments with Canvas users or export visualizations and trained model artifacts for local, cloud, or edge execution.

Source: Image by the author.

✓ [Reproducibility] How can you ensure the reproducibility of the experiment outputs?

When conducting trials with SageMaker (here, AutoML), each run is logged and monitored. SageMaker Experiments automatically captures all steps, parameters, settings, and related input and output artifacts. This makes it easy to reproduce models with consistent settings, which in turn simplifies troubleshooting production issues and auditing models for compliance.

Source: Image by the author.
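
AutoML trials are tracked automatically; for your own training code, tracking can also be done explicitly with the SageMaker Experiments SDK. A hypothetical sketch (experiment, run, parameter, and metric names are illustrative):

from sagemaker.experiments.run import Run

# Everything logged inside this context is attached to the named run
with Run(experiment_name="fraud-detection-experiments",
         run_name="manual-xgboost-trial-1") as run:
    run.log_parameter("max_depth", 6)
    run.log_parameter("eta", 0.2)
    # ... train the model here ...
    run.log_metric(name="validation:f1", value=0.87)
    run.log_artifact(name="training-data",
                     value="s3://my-bucket/train/train.manifest",
                     is_output=False)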

Catalog with SageMaker Model Registry

✓ [Reproducibility] How do you track and manage different versions of trained models?

In a typical modeling process, data scientists continuously experiment and fine-tune models to solve specific problems, resulting in the creation of multiple model versions. However, as the repository of trained models grows, managing them becomes a challenge. It is important that the platform we use allows data scientists to:

  • Logically organize related models into groups,
  • Seamlessly manage different versions,
  • Roll back or roll forward versions if issues arise in production.

This is where the SageMaker Model Registry comes into play. It allows a data scientist to register trained models into model groups as versions. Additionally, it enables the establishment of an approval process with statuses such as “Pending”, “Approved”, and “Rejected”. When connected to a CI/CD pipeline (I will discuss this later in the “Monitoring & Continuous Improvement” article [LINK]), it can automate the deployment of a SageMaker endpoint whenever a model is approved.

Create a Model Group
Create a group to logically organize the models that solve a specific problem.

Source: Image by the author.
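
Programmatically, the same group can be created with a single boto3 call; the group name and description below are placeholders.

import boto3

sm = boto3.client("sagemaker")

sm.create_model_package_group(
    ModelPackageGroupName="fraud-detection-models",
    ModelPackageGroupDescription="Models that flag fraudulent transactions")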

Register a Model Version
First, let’s gather details for model registration. During the cataloging process, ensure that you provide as much information as possible. This includes sample input, explainability, bias, quality reports, package details, and inference details.

In SageMaker Pipelines, you can use the RegisterModel step or the newer Model Step to automatically register the model with the Model Registry during the training process. We will discuss this in the “Monitoring & Continuous Improvement” article [LINK].

Source: Image by the author.

Once you have the necessary information, you can register the version to the model group that was created previously.

Source: Image by the author.
Source: Image by the author.
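
A rough boto3 equivalent of this registration step is sketched below; the container image, model artifact location, and content types are placeholders that would come from your AutoPilot best candidate.

import boto3

sm = boto3.client("sagemaker")

sm.create_model_package(
    ModelPackageGroupName="fraud-detection-models",
    ModelPackageDescription="AutoPilot best candidate for fraud detection",
    InferenceSpecification={
        "Containers": [{
            "Image": "<inference-image-uri>",                          # placeholder
            "ModelDataUrl": "s3://my-bucket/automl-output/model.tar.gz"}],
        "SupportedContentTypes": ["text/csv"],
        "SupportedResponseMIMETypes": ["text/csv"],
    },
    # New versions enter the approval workflow as pending
    ModelApprovalStatus="PendingManualApproval")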

Approve Model Versions
I have created two additional versions within the same group. For each version, you should be able to view and review the details you provided. Additionally, if you upload quality metrics, you can compare multiple versions. And, as mentioned before, you can approve or reject each model version.

Source: Image by the author.
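
Approval (or rejection) can likewise be scripted, which is what a CI/CD pipeline would typically hook into. The model package ARN below is a placeholder for the ARN returned by the registration call.

import boto3

sm = boto3.client("sagemaker")

sm.update_model_package(
    ModelPackageArn="arn:aws:sagemaker:us-east-1:123456789012:"
                    "model-package/fraud-detection-models/1",   # placeholder ARN
    ModelApprovalStatus="Approved",
    ApprovalDescription="Passed offline evaluation on the held-out test set")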

⚠️ Clean-up

If you have been following along with the hands-on exercises, make sure to clean up to avoid charges.

Source: Image by the author.

Delete Inference Endpoint
To avoid incurring costs, make sure to delete the endpoint and any other processing jobs.

Source: Image by the author.
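
A minimal cleanup sketch with boto3; the resource names are placeholders matching the ones used in this walkthrough, so adjust them to what was actually created in your account.

import boto3

sm = boto3.client("sagemaker")

# Real-time endpoints bill while they exist, so remove all three pieces
sm.delete_endpoint(EndpointName="automl-fraud-detection")
sm.delete_endpoint_config(EndpointConfigName="automl-fraud-detection")
sm.delete_model(ModelName="automl-fraud-detection-best-model")   # placeholder name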

Summary

In summary, AWS SageMaker can accelerate the journey of a data scientist in model experimentation, evaluation, and management. Additionally, tools like SageMaker Debugger and SageMaker Clarify can aid in understanding the training process and model.

In the next article, I will explore how AWS SageMaker can help take the model live and serve it, including shadow testing on production data.


Published via Towards AI
