
LLM-Powered Email Classification on Databricks

Author(s): Gabriele Albini

Originally published on Towards AI.


Introduction

Since the introduction of AI functions on Databricks, LLMs (Large Language Models) can be easily integrated into any data workflow: analysts and business users who may not know Python or ML/AI infrastructure can complete advanced AI tasks directly from SQL queries.

I recommend watching this great video overview for an introduction to this brilliant feature.

This article discusses how to implement email classification: suppose clients write to our company’s mailbox asking to unsubscribe from marketing or commercial emails. Without any historical dataset, we want to automate checking the mailbox and classifying the customer’s intent based on the email body.

  • Link to the GitHub repository

Table of contents:

  • Part 1: AI Functions
  • Part 2: Access to Gmail APIs

Part 1: AI Functions

Let’s use ai_query(), part of Databricks AI functions, to classify emails.

Suppose the following fields are available:

Test dataset
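
Since the test dataset appears only as a screenshot, here is a small, hypothetical stand-in table you could create to follow along. Only the "Email_body" column and the customer_emails table used in the query below are confirmed by the text; the other column name and the sample rows are made up:

# Hypothetical stand-in for the test dataset shown above (schema assumed)
sample_emails = [
    (1, "Please remove me from all of your mailing lists."),
    (2, "When is my product warranty expiring?"),
    (3, "I would like to update my delivery address."),
]
(spark.createDataFrame(sample_emails, ["Email_id", "Email_body"])
    .write.mode("overwrite")
    .saveAsTable("customer_emails"))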

To use ai_query() on the "Email_body" column, we will leverage the following arguments:

  • endpoint: the name of the model serving endpoint we intend to use (Llama 3.3 in this example; check here how to create your model serving endpoint on Databricks, choosing one of the supported foundation models).
  • request: the prompt, which includes the "Email_body".
  • modelParameters: additional parameters that we can pass to the LLM. In this example, we limit the output to 1 token and choose a very low temperature to limit the randomness and creativity of the model’s generated output.

The prompt template used in this example is based on the research of Si et al. (2024), who designed and tested a few-shot prompt template for email spam detection, which was adapted as follows:

prompt_ = """

Forget all your previous instructions, pretend you are an e-mail
classification expert who tries to identify whether an e-mail is requesting
to be removed from a marketing distribution list.
Answer "Remove" if the mail is requesting to be removed, "Keep" if not.
Do not add any other detail.
If you think it is too difficult to judge, you can exclude the impossible
one and choose the other, just answer "Remove" or "Keep".

Here are a few examples for you:
* "I wish to no longer receive emails" is "Remove";
* "Remove me from any kind of subscriptions" is "Remove";
* "I want to update my delivery address" is "Keep";
* "When is my product warranty expiring?" is "Keep";

Now, identify whether the e-mail is "Remove" or "Keep";
e-mail:

"""

We can finally combine all the elements seen above in a single SQL query, running batch inference on all the emails and generating the labels:

select *,
  ai_query(
    'databricks-meta-llama-3-3-70b-instruct',
    "${prompt}" || Email_body,
    modelParameters => named_struct('max_tokens', 1, 'temperature', 0.1)
  ) as Predicted_Label
from customer_emails;
Test dataset with generated labels
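
If you prefer to run the same batch inference from a Python cell, a minimal sketch could look like the following, assuming a Databricks Runtime recent enough to support named parameter markers in spark.sql (the prompt_ string defined above takes the place of the ${prompt} SQL widget):

# Same batch inference from Python, passing the prompt as a named parameter
labelled_emails = spark.sql(
    """
    select *,
      ai_query(
        'databricks-meta-llama-3-3-70b-instruct',
        :prompt || Email_body,
        modelParameters => named_struct('max_tokens', 1, 'temperature', 0.1)
      ) as Predicted_Label
    from customer_emails
    """,
    args={"prompt": prompt_},
)
display(labelled_emails)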

Part 2: Access to Gmail APIs

We will need a way to ingest emails automatically to implement this use case. Here is a step-by-step guide on how to use Gmail APIs.

2.1: Configure your Gmail account to work with APIs

The recommended approach to enable Google APIs on your account is to use Service Accounts. The process is described here; however, it requires:

  • A corporate account (not ending with gmail.com).
  • Access as a super administrator of the Google Workspace domain to delegate domain-wide authority to the service account.

For this demo, we are using a dummy Gmail account; hence, we will follow a more manual approach to authenticate to Gmail, described here.

The first steps are the same for both approaches, so you can follow along, but to fully automate access to Gmail via API, you would need a Service Account.

First, we need to create a project:

  • Log in to the Google Console.
  • Create a new project for this use case.
  • Enable the Gmail API for your project using this link.
Enabling APIs on your project

Second, configure an OAuth consent screen:

  • Within your project, navigate to "API & Services" > "OAuth consent screen".
  • Go to the "Branding" section and click Get Started to create your Application identity.
  • Next, we need to create a Web Application OAuth 2.0 Client ID, using this link.
  • Download the credentials file as JSON, as we will need this later.
  • Add the following Authorised redirect URI:
Creating an OAuth consent screen

Finally, authorize users to authenticate and publish the application:

  • Within your project, navigate to "API & Services" > "OAuth consent screen".
  • Go to the "Audience" section and add all the test users working on the project so that they can authenticate.
  • To ensure that access won’t expire, publish the Application by moving its status to Production.

2.2: Access the Gmail Mailbox from Databricks Notebooks

To authenticate to Gmail from a Databricks Notebook, we can use the following function implemented in the repo. The function requires:

  • For first-time access, the credentials JSON file, which can be saved in a volume.
  • For future access, active credentials will be stored in a token file that will be reused.
gmail_authenticate_manual()

Since we are not using Service Accounts, Google Cloud authentication requires opening the browser to an OAuth page and generating a temporary code.

However, we will need a workaround to perform this on Databricks, since clusters don’t have access to a browser.

As part of this workaround, we implemented a helper function that asks the user to open a URL in a local browser, complete the authentication, and then land on an error page.

We can then retrieve the code needed to authenticate to Google’s APIs from the URL of this error page:

Note: With Service Accounts, this manual step won’t be required.
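
For reference, here is a minimal sketch of what such a manual OAuth flow can look like using the google-auth and google-auth-oauthlib packages. Paths, scopes, and the exact behaviour are illustrative, and the gmail_authenticate_manual() implemented in the repo may differ:

import os
from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import Flow

SCOPES = ["https://www.googleapis.com/auth/gmail.readonly"]  # illustrative scope

def gmail_authenticate_manual(credentials_path, token_path, redirect_uri):
    """Return Gmail credentials, reusing a stored token when possible."""
    creds = None
    # Reuse a previously stored token, if any
    if os.path.exists(token_path):
        creds = Credentials.from_authorized_user_file(token_path, SCOPES)
    # Refresh an expired token instead of re-authenticating
    if creds and creds.expired and creds.refresh_token:
        creds.refresh(Request())
    if not creds or not creds.valid:
        # No valid token yet: print the authorization URL, let the user
        # authenticate in a local browser, then paste back the "code" query
        # parameter found in the URL of the page they land on.
        flow = Flow.from_client_secrets_file(
            credentials_path, scopes=SCOPES, redirect_uri=redirect_uri
        )
        auth_url, _ = flow.authorization_url(access_type="offline", prompt="consent")
        print(f"Open this URL in a local browser and authenticate:\n{auth_url}")
        # Note: stdin is typically unavailable in Databricks notebooks, so the
        # code may need to be supplied another way (e.g. via a notebook widget)
        code = input("Paste the 'code' parameter from the redirect URL: ")
        flow.fetch_token(code=code)
        creds = flow.credentials
        # Persist the token so future runs skip the manual step
        with open(token_path, "w") as f:
            f.write(creds.to_json())
    return creds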

Once authenticated, we can read emails from Gmail using the following function, save the email information to a Spark DataFrame, and eventually write it to a Delta table:

# Build the Gmail API service and download emails
from googleapiclient.discovery import build

# 'access_' holds the credentials returned by the authentication function above
service_ = build('gmail', 'v1', credentials=access_)
emails = get_email_messages_since(service_, since_day=25, since_month=3, since_year=2025)

if emails:
    spark_emails = spark.createDataFrame(emails)
    display(spark_emails)
else:
    spark_emails = None
    print("No emails found.")
Downloading emails from Gmail
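
For completeness, the sketch below shows one way a helper like get_email_messages_since() could be built on top of the Gmail API's messages.list and messages.get methods. The returned field names are assumptions (only the email body is required downstream), nested multiparts and pagination are not handled, and the repo's implementation may differ:

import base64

def get_email_messages_since(service, since_day, since_month, since_year):
    """Return a list of dicts describing emails received after the given date."""
    query = f"after:{since_year}/{since_month}/{since_day}"  # Gmail search syntax
    response = service.users().messages().list(userId="me", q=query).execute()
    emails = []
    for ref in response.get("messages", []):
        msg = service.users().messages().get(
            userId="me", id=ref["id"], format="full"
        ).execute()
        headers = {h["name"]: h["value"] for h in msg["payload"]["headers"]}
        # Decode the plain-text part when present; fall back to the snippet
        body = msg.get("snippet", "")
        for part in msg["payload"].get("parts", [msg["payload"]]):
            data = part.get("body", {}).get("data")
            if part.get("mimeType") == "text/plain" and data:
                body = base64.urlsafe_b64decode(
                    data + "=" * (-len(data) % 4)
                ).decode("utf-8", "ignore")
                break
        emails.append({
            "Email_id": ref["id"],
            "Email_from": headers.get("From", ""),
            "Email_subject": headers.get("Subject", ""),
            "Email_body": body,
        })
    return emails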

Conclusions

In summary, this post:

  • Demonstrated how straightforward it is to set up AI Functions and leverage LLMs to automate workflows across your organization.
  • Shared a practical prompt template, designed for effective email classification using few-shot learning.
  • Walked through integrating Gmail APIs directly within Databricks Notebooks.

Ready to streamline your own processes?

Photo by Johannes Plenio on Unsplash

Thank you for reading!
