
How To Create an End-2-End Text Paraphrase App

Last Updated on January 1, 2023 by Editorial Team

Author(s): Stavros Theocharis


Create a paraphrase app with FastAPI or Streamlit and Docker

Image generated by DALL-E

The internet is home to a myriad of innovative AI tools available for use today. A quick Google search for the key phrase “paraphrasing tools” returns a significant number of results.

But how exactly does one go about developing their very own simple paraphrasing app? In this article, I will explain how to construct such an app in two distinct ways, each of which has its own advantages. The first approach uses FastAPI, while the second uses Streamlit. The two approaches are completely independent and serve as two different examples; they share only the same paraphrasing pipeline. At the end, I also include the components required to build the relevant Docker images.

I have already set up a new repository on GitHub, and all of the components discussed here can be found there in their fully operational state. Because I like to give my projects names that sound like products, I decided to call this one “Quotera.”

Paraphrasing models

In order to proceed with paraphrasing, a dedicated AI model is mandatory. Hugging Face offers a wide variety of models to accommodate a diverse range of scopes. Using a model obtained through Hugging Face only requires creating an account and a token. In addition, to put the paraphrase feature into action for this project, I made use of the Parrot Paraphraser package.

Even though the pre-trained model this package uses is called “prithivida/parrot_paraphraser_on_T5”, other pre-trained or custom models can be used instead.
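For instance, assuming the Parrot constructor accepts a model_tag argument (as documented in the package’s README — an assumption worth verifying against your installed version), a different model could be plugged in like this:

# Hypothetical sketch: load Parrot with an explicitly chosen model.
# model_tag follows the Parrot package's documented constructor argument;
# any compatible paraphrasing model hosted on Hugging Face could be named here.
from parrot import Parrot

parrot_model = Parrot(model_tag="prithivida/parrot_paraphraser_on_T5")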

You may use the following code to log in to Hugging Face, which will also give you access to the model:

from huggingface_hub import HfApi
from huggingface_hub.commands.user import _login  # private helper; the public huggingface_hub.login() is an alternative
from parrot import Parrot

token = "your_token"  # replace with your Hugging Face access token
_login(HfApi(), token=token)

parrot_model = Parrot()

Paraphrasing pipeline

We need a function that generates paraphrased text from a source text. The following parameters (also sourced from the Parrot library) are utilized as a baseline by the vast majority of paraphrase products on the market today:

adequacy threshold: how much of the original meaning must be adequately preserved.

fluency threshold: how fluent the paraphrased text must be in the target language.

diversity ranker: which algorithm ranks how much the paraphrase has changed the original sentence (lexical / phrasal / syntactical).

The diversity ranker may take on values like “levenshtein”, “euclidean”, and “diff”, depending on the algorithm used to determine the similarity distance between texts, while the adequacy and fluency thresholds can take on values between 0 and 1.

The Parrot class has a method labeled “augment” that may be used to generate potential paraphrases of a text. It requires a distinct invocation for each phrase, since it operates at the sentence level. It produces a list of tuples as output: the first value of each tuple is a candidate paraphrased text, and the second is the diversity score comparing the paraphrased text to the original.

For example, we have the following original text:

text = "An internal combustion engine (ICE or IC engine) is a heat engine."

And the output is:

[('an ice - or ic - engine is a heat engine', 41), ('an ice or ic engine is a heat engine', 40), ('an ice or ic engine is an internal combustion engine that acts as a heat engine', 36), ('an internal combustion engine ice or ic engine is a heat engine', 13)]

Since I don’t always like the provided diversity score when picking a fresh, original phrase, I’ll be making use of a new function built on the fuzz partial ratio from the fuzzywuzzy library:

from fuzzywuzzy import fuzz


def check_text_similarity(first_text: str, second_text: str) -> float:
    """
    Checks similarity between 2 given texts based on the fuzz partial ratio
    """
    # fuzz.partial_ratio returns an integer in [0, 100]; scale it to [0, 1]
    return (
        fuzz.partial_ratio(
            first_text,
            second_text,
        )
        / 100
    )
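A quick sanity check of the helper:

# Identical strings score 1.0; the score drops as the texts diverge.
print(check_text_similarity("a heat engine", "a heat engine"))  # 1.0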

And the function that should be used for all the new sentences being compared to the original is:

from typing import List


def get_most_diverse_text(original_text: str, text_variations_list: List) -> str:
    """
    Get an original text and compare it with text variations coming as a list.

    kept_sentence: Str text coming as the least similar to the original text
    """
    similarity_list = []

    # Score each candidate against the original; lower similarity = more diverse
    for temp_dif_sentence in text_variations_list:
        temp_text_similarity = check_text_similarity(
            original_text, temp_dif_sentence[0].capitalize()
        )
        similarity_list.append(temp_text_similarity)

    # Keep the least similar candidate and restore sentence punctuation
    min_index = similarity_list.index(min(similarity_list))
    kept_sentence = text_variations_list[min_index][0].capitalize()
    kept_sentence = kept_sentence + ". "

    return kept_sentence
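Applied to the example output above, a call would look like this (illustrative sketch; the variations list is the kind of tuple list that augment returns):

# variations: (paraphrase, diversity score) tuples, as returned by augment
variations = [
    ("an ice - or ic - engine is a heat engine", 41),
    ("an ice or ic engine is a heat engine", 40),
]
print(get_most_diverse_text(text, variations))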

Keep in mind that the library operates on a per-sentence basis, which means that we need to break the text up into sentences and run it individually for each one.

After that, we are free to pick which one to keep based on our approach. The most diverse sentences are kept here (other tools keep them all and let the user hover over the sentences in a corresponding UI to choose their preferred one).

The main paraphrase function is:

import logging
import re
from typing import Dict


def create_paraphrase(
    parrot_model: Parrot,
    text: str,
    adequacy_threshold: float = 0.75,
    fluency_threshold: float = 0.90,
    diversity_ranker: str = "levenshtein",
) -> Dict[str, str]:
    """
    Creates the new paraphrased text
    """
    logging.info("Tokenising text.........")
    # Split the text into sentences, since Parrot works on one sentence at a time
    sentences = re.split('[,.!?;]', text)
    new_text = ""
    logging.info("Creating paraphrased text for each sentence.........")

    for sentence in sentences:
        dif_sentences = parrot_model.augment(
            input_phrase=sentence,
            diversity_ranker=diversity_ranker,
            adequacy_threshold=adequacy_threshold,
            fluency_threshold=fluency_threshold,
        )

        # augment returns None when no paraphrase clears the thresholds
        if dif_sentences is not None:
            kept_sentence = get_most_diverse_text(sentence, dif_sentences)
            new_text = new_text + kept_sentence

    return {
        "text": new_text
    }
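Putting the pipeline together, an end-to-end call on the earlier example text would look like:

# Illustrative usage of the full pipeline
result = create_paraphrase(parrot_model, text)
print(result["text"])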

Creation of the FastAPI app

Image by https://fastapi.tiangolo.com/

First of all, why have an API for paraphrasing?

The answers vary, but one that is consistent is that once you have the paraphrase pipeline, the next question is how someone or something can make use of it. Perhaps there is another service, or more than one, that needs to send text requests and get back the paraphrased text.

If you are thinking of scalability, you are right! It plays an important role in this situation: given the amount of time required to run the entire pipeline, the current implementation cannot support a large number of requests. But this code is only the basis.

What is FastAPI?

FastAPI is a contemporary web framework, built on top of standard Python type hints, designed to facilitate the creation of application programming interfaces (APIs) with Python versions 3.7 and above.

Here, only the fundamental components will be outlined; further information may be discovered inside the repository.

Implementation

So, firstly, let’s define the two basic schemas, which are the input and output classes, using the widely used “pydantic” library:

from pydantic import BaseModel


class ItemBase(BaseModel):
    """Base entity"""

    pass


class TextIn(ItemBase):
    text: str
    adequacy_threshold: float
    fluency_threshold: float
    diversity_ranker: str


class TextOut(ItemBase):
    text: str
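As an illustration, a request body validated by TextIn would look like this (a hypothetical payload; the field names come from the schema above):

# Hypothetical request payload matching the TextIn schema
body = TextIn(
    text="An internal combustion engine (ICE or IC engine) is a heat engine.",
    adequacy_threshold=0.75,
    fluency_threshold=0.90,
    diversity_ranker="levenshtein",
)
print(body.dict())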

In order to receive the input data in a suitable format, we also need an encoder and a transformation function:

from typing import Tuple

from fastapi.encoders import jsonable_encoder


def transform_input_data(data):
    encoded_data = jsonable_encoder(data)

    return encoded_data


def get_transformations(
    body: TextIn,
) -> Tuple[str, float, float, str]:
    transformed_text = transform_input_data(body.text)
    transformed_adequacy_threshold = transform_input_data(body.adequacy_threshold)
    transformed_fluency_threshold = transform_input_data(body.fluency_threshold)
    transformed_diversity_ranker = transform_input_data(body.diversity_ranker)

    return (
        transformed_text,
        transformed_adequacy_threshold,
        transformed_fluency_threshold,
        transformed_diversity_ranker,
    )

Create the API’s endpoint:

import logging

from fastapi import APIRouter

paraphrase_analytics_router = APIRouter()


@paraphrase_analytics_router.post(
    "/text/paraphrase",
    tags=["Text Paraphrase"],
    summary="Paraphrase text based on each phrase",
    response_description="Paraphrased text",
    response_model=TextOut,
)
async def paraphrase_text(body: TextIn):
    """
    Use this route to paraphrase text.
    """
    logging.info("Triggering paraphrase pipeline")
    (
        transformed_text,
        transformed_adequacy_threshold,
        transformed_fluency_threshold,
        transformed_diversity_ranker,
    ) = get_transformations(body)
    logging.info("Processing & paraphrasing text")

    paraphrased_text = create_paraphrase(
        parrot_model,
        transformed_text,
        transformed_adequacy_threshold,
        transformed_fluency_threshold,
        transformed_diversity_ranker,
    )

    return paraphrased_text
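The router then has to be registered on the application instance. In the repository the app object is named quotera (hence the src.main:quotera target passed to uvicorn below), so a minimal sketch of the wiring looks like:

from fastapi import FastAPI

# "quotera" matches the src.main:quotera target used by the serve command below
quotera = FastAPI(title="Quotera")
quotera.include_router(paraphrase_analytics_router)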

CLI part:

import os

import typer
from pyfiglet import figlet_format
from termcolor import cprint

app = typer.Typer()


@app.command()
def serve(env: str = "dev", host: str = "0.0.0.0"):
    typer.echo(f"\nRunning Quotera API | Environment: {env} 🚀 \n")
    cprint(figlet_format("Quotera", font="doom"), "green", attrs=["bold"])
    os.system(
        f"ENV_FILE='.env.{env}' GUNICORN_CMD_ARGS='--keep-alive 0' uvicorn src.main:quotera --reload --host {host}"
    )

By running the serve command, we can have our app running at http://0.0.0.0:8000.

If you are directly using the code inside the repository, please refer to the README.md file in order to follow the instructions.

Image by the author

The endpoint can be found in the interactive docs at http://0.0.0.0:8000/docs

By pressing “Try it out”, you can supply your own text and parameters and check the result.
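Alternatively, the endpoint can be called programmatically. A minimal sketch using the requests library (assuming the API is running locally on port 8000):

import requests

# Payload fields mirror the TextIn schema defined earlier
payload = {
    "text": "An internal combustion engine (ICE or IC engine) is a heat engine.",
    "adequacy_threshold": 0.75,
    "fluency_threshold": 0.90,
    "diversity_ranker": "levenshtein",
}
response = requests.post("http://0.0.0.0:8000/text/paraphrase", json=payload)
print(response.json()["text"])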

Creation of the Streamlit app

Image by https://streamlit.io/

What is Streamlit?

Streamlit is a free and open-source framework that enables developers to easily construct and distribute online applications for machine learning and data science. It is a library written in Python that was developed with machine learning developers’ needs in mind.

In my opinion, Streamlit is a very good tool for rapid prototyping, and it may also be helpful when someone wants their analytics and similar outputs to be turned into a UI form.

Implementation

In order to visualize the results in a nice way, let’s create a basic card:

import streamlit as st


def card(paraphrased_text):
    # Render the paraphrased text inside a Bootstrap-styled card
    st.markdown(
        f"""
        <div class="card text-white bg-dark mb-3" style="max-width: 280rem;">
            <div class="card-body">
                <p class="card-text">{paraphrased_text}</p>
            </div>
        </div>
        """,
        unsafe_allow_html=True,
    )

Let’s create the basic part of the dashboard (the config readme file and the util function can be found inside the repository):

st.set_page_config(page_title="Quotera", layout="wide")
readme = load_config("config_readme.toml")

with st.spinner(
    "Loading and getting ready the AI models (this action may take some time)..."
):
    parrot_model = get_model()

# Info
st.title("Paraphrase Text")

# Parameters
st.sidebar.title("Parameters")
adequacy_threshold = st.sidebar.slider(
    "Select adequacy threshold",
    0.0,
    1.0,
    step=0.01,
    value=0.90,
    help=readme["tooltips"]["adequacy_threshold"],
)
fluency_threshold = st.sidebar.slider(
    "Select fluency threshold",
    0.0,
    1.0,
    step=0.01,
    value=0.90,
    help=readme["tooltips"]["fluency_threshold"],
)
diversity_ranker = st.sidebar.selectbox(
    "Select diversity ranker",
    ("levenshtein", "euclidean", "diff"),
    help=readme["tooltips"]["diversity_ranker"],
)

# Text loading
text, button = input_text(readme)


# Paraphrase generation
with st.spinner("Generating paraphrased text..."):
    if button and text:
        paraphrased_text = create_paraphrase(
            parrot_model,
            text,
            adequacy_threshold=adequacy_threshold,
            fluency_threshold=fluency_threshold,
            diversity_ranker=diversity_ranker,  # use the sidebar selection
        )["text"]
        st.markdown(
            """
            <link href="https://cdn.jsdelivr.net/npm/[email protected]/dist/css/bootstrap.min.css" rel="stylesheet" integrity="sha384-1BmE4kWBq78iYhFldvKuhfTAU6auU8tT94WrHftjDbrCEXSU1oBoqyl2QvZ6jIW3" crossorigin="anonymous">
            """,
            unsafe_allow_html=True,
        )

        # card() renders via st.markdown, so call it directly rather than wrapping it in st.write
        card(paraphrased_text)
        st.download_button("Download .txt", paraphrased_text)

CLI part:

import os
import sys

from streamlit.web import cli


def deploy_streamlit():
    # Launch the dashboard through Streamlit's own CLI entry point
    sys.argv = [
        "streamlit",
        "run",
        f"{os.path.dirname(os.path.realpath(__file__))}/dashboard.py",
        "--server.port=8060",
        "--server.address=0.0.0.0",
    ]
    sys.exit(cli.main())
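The Dockerfile further below runs quotera_streamlit_cli.py directly, so a minimal sketch of that script’s entry point (hypothetical, inferred from the repository layout) would be:

# Hypothetical entry point for quotera_streamlit_cli.py
if __name__ == "__main__":
    deploy_streamlit()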

By running the “deploy_streamlit” command, we can have our app running at http://0.0.0.0:8060.

If you are directly using the code inside the repository, please refer to the README.md file in order to follow the instructions.

And our app in the browser is as follows:

Dockerize our apps

Image by https://www.docker.com/

What is Docker?

Docker is a freely available platform that can be used to build, deploy, and manage containerized applications. Applications are said to be containerized when they run in what are known as containers: isolated runtime environments. An application and all of its dependent components, such as system libraries, binaries, and configuration files, are encased inside a container together.

Implementation

The two apps above, FastAPI and Streamlit, are completely independent services, so we can have two different Dockerfiles, one for each service. Python 3.9 will be used as the base image. For more information, you can read the Compose specification.

The chosen structure is the following:

├── docker-compose.yml
├── Dockerfile.api
└── Dockerfile.streamlit

For FastAPI we have the Dockerfile.api:

FROM python:3.9

# install project requirements
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt
RUN pip install git+https://github.com/PrithivirajDamodaran/Parrot_Paraphraser.git

COPY . .

EXPOSE 8000

ENTRYPOINT python ./quotera_cli.py serve

For Streamlit we have the Dockerfile.streamlit:

FROM python:3.9

# install project requirements
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt
RUN pip install git+https://github.com/PrithivirajDamodaran/Parrot_Paraphraser.git

COPY . .

# the Streamlit CLI above serves on port 8060
EXPOSE 8060

ENTRYPOINT python ./quotera_streamlit_cli.py

In order to build the images, a “docker-compose.yml” file is needed:

services:
  api:
    build:
      context: .
      dockerfile: Dockerfile.api
    ports:
      - 8000:8000
    volumes:
      - ~/.cache/huggingface:/.cache/huggingface
  streamlit:
    build:
      context: .
      dockerfile: Dockerfile.streamlit
    ports:
      - 8060:8060
    volumes:
      - ~/.cache/huggingface:/.cache/huggingface

In order to build and start the two Docker images above, you just need to run “docker-compose up”.

Conclusion

This article demonstrated how to develop a tool for paraphrasing text using either FastAPI or Streamlit, as well as how to generate the relevant Docker images. The pipeline for generating paraphrases is based on a pretrained model obtained from Hugging Face; however, other variants may be utilized in its place.

You can get the code you need to follow along with the instructions at this link: https://github.com/stavrostheocharis/quotera

If you have any questions, you are more than welcome to post a comment or create an issue on GitHub.

