ChatGP…Me? Building a Telegram bot “clone” with OpenAI’s GPT

Last Updated on July 17, 2023 by Editorial Team

Author(s): Bingwen Tan

Originally published on Towards AI.

Learn how to use Python, NodeJS, and Google Firebase, together with the OpenAI GPT API, to build a Telegram bot that learns the way you speak (but with a very loose grasp of the facts)!

[Image: Tempted to take up its advice…]

Of late, ChatGPT has been all the rage. I myself have been using it regularly as a super-powered personal assistant of sorts.

But despite its prowess, to someone steeped in WhatsApp and Telegram banter with friends, ChatGPT sounds pretty flat — just like the (im)personal assistant it has become to me. Two main reasons:

  1. In an interaction with ChatGPT, both parties take turns sending one message to the other. This is more akin to email than messaging a friend over Telegram, where messages bounce around freely, with no particular structure.
  2. ChatGPT’s writing style defaults to that of a tryhard teacher’s pet!
[Image: Just another day with ChatGPT at work]

Wouldn’t it be great if we had a version that was more colloquial, natural, and fun to talk to, that we could chat with on Telegram? With that in mind, I set out to try and build such a contraption/monster/digital twin that chats like me.

Overall Strategy

The plan was to fine-tune one of the models from the GPT-3 family with my own Telegram chats. For this exercise, I selected just one chat with a friend whose conversations with me lay on the tamer end of the spectrum yet hopefully contained enough messages (48,000) to sufficiently teach the model my speaking style (and life story).

For the fine-tuning process, I broadly followed the strategy laid out in OpenAI’s customer support chatbot case study, with some modifications. The fine-tuning was done in Python.

For serving, I spun up a Firebase application, written in NodeJS, to 1) store conversation history and 2) serve the model via Telegram by responding to Telegram HTTP webhooks. More detailed steps follow, and you can get both fine-tuning and serving code at this repository.

The next few parts get a little technical. If you’re not keen on these implementation details, feel free to skip to the observations!

Fine-Tuning

The concepts behind fine-tuning GPT models are covered in detail here. In a nutshell, we need to show the GPT models a number of examples, where each example contains a prompt (what is fed into the model), and its corresponding completion (what the model returns). The challenge is how to turn our Telegram conversation history into a series of such prompt-completion pairs, where the prompts and completions are engineered to meet our goal.

Getting the conversation history from Telegram

Pretty trivial: just follow Telegram’s instructions and download the history in JSON format. Read the data to get a list of messages.

import json
from pprint import pprint

with open('result.json', encoding="utf8") as json_file:
    data = json.load(json_file)

messages = data["messages"]  # messages is a list
print(len(messages))  # 47902 messages
pprint(messages[0])  # example data format as below

# {'date': '2021-01-07T19:00:52',
#  'date_unixtime': '1610017252',
#  'from': 'Redacted',
#  'from_id': 'user135884720',
#  'id': 13405,
#  'text': 'Hello!',
#  'text_entities': [{'text': 'Hello!', 'type': 'plain'}],
#  'type': 'message'}

Turning the messages into meaningful prompt-completion pairs

OpenAI’s customer chatbot fine-tuning example suggests designing prompts as follows:

{"prompt":"Customer: <message1>\nAgent: <response1>\nCustomer: <message2>\nAgent:", "completion":" <response2>\n"}
{"prompt":"Customer: <message1>\nAgent: <response1>\nCustomer: <message2>\nAgent: <response2>\nCustomer: <message3>\nAgent:", "completion":" <response3>\n"}

In English, this means that this (simulated) conversation with an agent…

[Image: Yup Microsoft bankrolls you but Bing, really?]

…would yield 3 prompt-completion pairs. The first two are as follows:

Prompt 1:

Customer: Hi, I would like to cancel my credit card
Agent:

Completion 1:

Of course! I’d be happy to help you cancel your credit card. May I have your name and the last four digits of the card you would like to cancel, please?

Prompt 2:

Customer: Hi, I would like to cancel my credit card
Agent: Of course! I’d be happy to help you cancel your credit card. May I have your name and the last four digits of the card you would like to cancel, please?
Customer: Bing Wen, 1111
Agent:

Completion 2:

Thank you, Bing. May I ask why you would like to cancel your card?

This prompt-completion schema is neat but still results in the single-response-per-message behavior that we want to avoid. To address this, we modify the schema slightly. Let’s say we have this Telegram conversation:

This will yield 2 examples. (The first set of messages, initiated by me, doesn’t count as a completion, since the final bot will not initiate conversations randomly, unlike its irl counterpart.)

Prompt 1:

Me: I’m writing my medium post on chatbings now
Me: Taking longer than expected
They: Oo hehe
They: Looking forward to it!
They: I gonna measure carpentry today
Me:

Completion 1:

The screenshot of this chat will go into the post
Me: So choose your next words carefully
Me: HAHAHA
<END>

Note that the first message in the completion does not come prepended with “Me:”. This is because “Me:” is already included at the end of the prompt; by doing so, we force GPT to respond as “me”. The customer service chatbot examples do this as well. Also, I added an <END> token to signal the end of a series of replies. This teaches GPT to also emit this signal at the end of its replies when being used live.

Prompt 2:

Me: I’m writing my medium post on chatbings now
Me: Taking longer than expected
They: Oo hehe
They: Looking forward to it!
They: I gonna measure carpentry today
Me: The screenshot of this chat will go into the post
Me: So choose your next words carefully
Me: HAHAHA
They: Oh dear
They: HAHA
Me:

Completion 2:

Ok that’s enough HAHAHA
Me: Just need to explain how the fine-tuning works
<END>

With this mechanism, GPT eventually learns to respond as “me”, and is also able to understand and reply in sets of messages, as opposed to just one message per turn.
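To make this concrete, here is a sketch of how Prompt 1 and Completion 1 above would be serialized as a single JSONL training record (using the “Me:”/“They:” line format that the preprocessing code below produces):

{"prompt": "Me:I'm writing my medium post on chatbings now\nMe:Taking longer than expected\nThey:Oo hehe\nThey:Looking forward to it!\nThey:I gonna measure carpentry today\nMe:", "completion": " The screenshot of this chat will go into the post\nMe:So choose your next words carefully\nMe:HAHAHA\n<END>"}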

The last idea to introduce is how to segment the message history into conversations. As seen in the prompts above, we pass messages from earlier in the conversation into the prompt so that GPT can maintain conversational context. However, in the context of Telegram chats, it wouldn’t make sense to do this indefinitely up the message history, since a message history spans multiple, distinct conversations that should be mostly independent of each other. We need some way of programmatically breaking the message history up into such conversations.

I decided to flag the start of a new conversation each time at least 1 hour had passed without further messages from me.

With all that theory out of the way, here is the code for creating the conversations.

from datetime import datetime

new_convo_threshold_seconds = 3600  # a new conversation starts if 1 hr elapses without further messages after the last message from me
telegram_name = "Bing Wen"  # my name

# parse the date of each message into a datetime object
for message in messages:
    message["datetime"] = datetime.strptime(message['date'], '%Y-%m-%dT%H:%M:%S')

def check_new_convo(previous_message, current_message):
    return (previous_message["from"] == telegram_name and
            (current_message["datetime"] - previous_message["datetime"]).total_seconds() >
            new_convo_threshold_seconds)  # total_seconds(), not .seconds, so gaps longer than a day count too

# this loop creates a list of conversations
conversations = []
for idx, message in enumerate(messages):
    if (idx == 0) or check_new_convo(messages[idx-1], messages[idx]):
        if idx > 0:
            conversations.append(new_conversation)
        new_conversation = []
    new_conversation.append(message)
conversations.append(new_conversation)  # don't drop the final conversation

conversations = list(filter(lambda x: len(x) > 1, conversations))  # one-message conversations are not conversations
print(len(conversations))

With the conversations properly segmented, we then create the prompt-completion pairs.

for conversation in conversations:
    for idx, message in enumerate(conversation):
        if idx == (len(conversation) - 1):
            continue
        if idx == 0:  # if start of convo
            message["prompt_start"] = True  # this is the start of a prompt
            message["completion_start"] = False
        if message["from"] != telegram_name and conversation[idx+1]["from"] == telegram_name:
            # this is the end of the other party's messages
            message["prompt_end"] = True  # it's the end of the prompt
            conversation[idx+1]["completion_start"] = True  # and the start of a completion
        else:
            message["prompt_end"] = False
            conversation[idx+1]["completion_start"] = False
        if message["from"] == telegram_name and conversation[idx+1]["from"] != telegram_name:
            # this is the end of a string of my messages
            message["completion_end"] = True  # it's the end of a completion
            conversation[idx+1]["prompt_start"] = True  # and the next line is the start of a new prompt
        else:
            message["completion_end"] = False
            conversation[idx+1]["prompt_start"] = False

training_pairs = []

def get_line(message):  # prepends "Me:" or "They:" to a message's text; returns False for empty messages
    if message["from"] == telegram_name:
        name = "Me"
    else:
        name = "They"
    if 'photo' in message:  # handle image messages
        text = '<IMAGE>'
    else:
        text = message["text"]
    if text:
        try:  # handle some weird situations where there are urls/entities in the text
            if isinstance(text, list):
                textStr = ""
                for stuff in text:
                    if isinstance(stuff, dict):
                        textStr += stuff["text"]
                    else:
                        textStr += stuff
                text = textStr
        except Exception:
            print(text)
        return f"{name}:{text}\n"
    else:
        return False

# this loop creates the multiple training examples from each conversation
for conversation in conversations:
    seed_pair = {"prompt": "", "completion": ""}
    for message in conversation:
        if message["prompt_start"]:
            key = "prompt"
        elif message["completion_start"]:
            key = "completion"
        new_line = get_line(message)
        if new_line:
            seed_pair[key] += new_line
        if message.get("completion_end", True):  # the final message of a conversation has no flag, so default to True
            training_pairs.append(seed_pair.copy())
            seed_pair["prompt"] += seed_pair["completion"]
            seed_pair["completion"] = ""

# strip those pairs with no completions
training_pairs = [pair for pair in training_pairs if len(pair["completion"].rstrip()) > 0]

# postprocessing
stop_sequence = "<END>"
me_token = "Me:"
acceptable_char_length = 1400
min_prompt_length = 1400

def truncate_prompt(prompt, completion):
    if (len(prompt) + len(completion)) > acceptable_char_length:
        length_for_prompt = max(acceptable_char_length - len(completion), min_prompt_length)
        new_prompt = prompt[-length_for_prompt:]
        # cut at the first full "\nMe:"/"\nThey:" boundary so the prompt doesn't start mid-message
        lower = min(new_prompt.find("\nMe:"), new_prompt.find("\nThey:"))
        new_prompt = new_prompt[lower+1:]
        return new_prompt
    else:
        return prompt

char_counter = 0

for pair in training_pairs:
    # the next two lines move the first "Me:" out of the completion and append it to the prompt instead
    pair['prompt'] += me_token
    pair['completion'] = " " + me_token.join(pair['completion'].split(me_token)[1:]) + stop_sequence
    if len(pair['prompt']) + len(pair['completion']) > acceptable_char_length:
        # truncate the prompt if the conversation is too long, retaining the more recent messages
        pair['prompt'] = truncate_prompt(pair['prompt'], pair['completion'])
    char_counter += (len(pair['prompt']) + len(pair['completion']))

print(f"{len(training_pairs)} training pairs")  # 9865 training pairs
pprint(training_pairs[29])
# {'completion': ' HAHA omg VBA\n'
#                'Me:if you like that kinda stuff, we can do alot with VBA here too\n'
#                '<END>',
#  'prompt': 'They:Some profs really quite funny haha\n'
#            'Me:ya haha BL himself is quite funny\n'
#            'They:Reminds me of my financial modeling prof\n'
#            'Me:what did u study again ah\n'
#            'They:He made some How To Train Your Dragon worksheet\n'
#            'They:On VBA\n'
#            'They:<IMAGE>\n'
#            'They:I was econs and business degree!\n'
#            'Me:'}

Initiate fine-tuning

With the training examples ready, we now install the OpenAI command-line interface (CLI).

pip install --upgrade openai

We are ready to prepare the fine-tuning JSONL file that OpenAI expects.


import pandas as pd

df = pd.DataFrame(training_pairs)
df.to_json("fine_tuning.jsonl", orient='records', lines=True)
# note: this was run in a Jupyter Notebook, hence the "!" prefix to execute the shell command
!openai tools fine_tunes.prepare_data -f fine_tuning.jsonl -q

Finally, we call the fine-tuning endpoint via the CLI. Here, we opt to fine-tune the default model, Curie. Curie is the second-best of the four OpenAI text models currently available. Performance would likely improve with Davinci, the best one, but Davinci also costs 10x more. As it was, fine-tuning on Curie cost me $47, so Davinci would have been a bit too costly to stomach, even in the name of science.

import os
os.environ["OPENAI_API_KEY"] = "<YOUR KEY>" #set env variables
!openai api fine_tunes.create -t "fine_tuning.jsonl" #call shell command in jupyter notebook

Testing the fine-tuned model in Python

Before serving the model via Telegram, we test it out in Python. Initial testing showed the bot repeating itself very often. I adjusted the frequency_penalty and presence_penalty hyperparameters to reduce repetition, while playing around with temperature to adjust the bot’s imaginativeness. Eventually, this ensued:

import openai

openai.organization = "<YOUR ORGANISATION KEY>"
openai.api_key = "<YOUR API KEY>"

def get_chatbings_response(text):
    prompt = "They:" + text + "\nMe:"
    stop_sequence = "<END>"
    response = openai.Completion.create(
        model="curie:ft-<your organisation>-2023-01-23-07-14-12",
        prompt=prompt,
        temperature=0.2,  # higher values inject more randomness, making its imagination wilder; 0.2 keeps it tame
        max_tokens=100,  # caps the response length
        frequency_penalty=0.6,  # penalises repetition
        presence_penalty=0.6,  # penalises repetition
        stop=stop_sequence
    )
    return response.choices[0].text

print(get_chatbings_response("What is the meaning of life?"))

# HAHAHAHA
# Me:I'm not sure if I have a definitive answer to that
# Me:But I think it's important to live life with purpose and meaning

Deep.

It was now time to serve the model up in Telegram!

Serving

We now have a model that takes in n ≥ 1 prior Telegram messages as a prompt and spits out m ≥ 0 messages in response (yes, the bot theoretically could just ignore you). The broad strategy for serving this model as a Telegram bot is to:

  1. Store all messages from users and from the bot in a database
  2. Devise some mechanism or rule by which the model is triggered (remember, we want to avoid triggering it upon every message — that would make the conversation really unnatural and weird!).
  3. Write some logic to respond to the aforementioned trigger by:
    a. compiling all prior messages in the current conversation,
    b. generating the prompt and sending it to the model’s API endpoint,
    c. sending the returned completion back to the user via Telegram’s API.
  4. Deploy the necessary logic behind an HTTP endpoint and set the Telegram bot’s webhook.

The main focus of this article is the fine-tuning and prompt engineering process, so I’ll cover the serving aspects in less detail. Do refer to the serving repo for more detailed instructions!

Architecture

To achieve all this, I used Google Cloud’s Firebase platform. Messages would be stored in Cloud Firestore, while an HTTP-triggered Cloud Function would listen to Telegram’s webhooks and execute the necessary logic.

Triggering Mechanism

After racking my brain over how to design a stateless, serverless system that would trigger the model only if at least x seconds had elapsed since the last message, I decided to skip all this complexity and instead have the user trigger the bot’s response by sending in an “X”. Simplicity ftw!

Cloud Function

With the above simplification, just one function, written in NodeJS, was enough to handle everything! This function runs every time a Telegram message is sent to the bot: it stores the incoming message, and when it sees the “X” trigger, it compiles the conversation so far into a prompt, calls the fine-tuned model, and sends the completion back to the user.

The full function can be found here. Do refer to the readme on how to deploy the whole architecture onto Google Cloud!
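As a rough Python sketch of that flow (the real implementation is the NodeJS Cloud Function in the repo; the in-memory history dict and the handle_update helper here are hypothetical stand-ins for Cloud Firestore and the HTTP-triggered function):

import openai
import requests

openai.api_key = "<YOUR API KEY>"
bot_token = "<YOUR BOT TOKEN>"  # issued by BotFather
trigger = "X"  # the user sends this to make the bot respond
history = {}  # chat_id -> list of "They:..."/"Me:..." lines; the real setup stores these in Firestore

def handle_update(update):  # runs once per Telegram webhook delivery
    chat_id = update["message"]["chat"]["id"]
    text = update["message"].get("text", "")
    conversation = history.setdefault(chat_id, [])
    if text != trigger:
        conversation.append(f"They:{text}\n")  # step 1: store the message and wait for the trigger
        return
    prompt = "".join(conversation) + "Me:"  # steps 3a/3b: compile the prompt and call the model
    response = openai.Completion.create(
        model="curie:ft-<your organisation>-2023-01-23-07-14-12",
        prompt=prompt, temperature=0.2, max_tokens=100,
        frequency_penalty=0.6, presence_penalty=0.6, stop="<END>",
    )
    for line in response.choices[0].text.split("Me:"):  # step 3c: send each reply back separately
        if line.strip():
            requests.post(
                f"https://api.telegram.org/bot{bot_token}/sendMessage",
                json={"chat_id": chat_id, "text": line.strip()},
            )
            conversation.append(f"Me:{line.strip()}\n")

The step numbers in the comments refer back to the serving strategy list above.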

Telegram Stuff

To get a bot up and running, we first have to create a Telegram bot via the one and only BotFather (remember to save your bot token). Once the function above has been deployed to Firebase, you need to set the bot’s webhook to point to the HTTP endpoint that Google just created for you. This can be done via Postman or simply curl.
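For example, a single curl call does the job (setWebhook is the standard Telegram Bot API endpoint; substitute your own bot token and the HTTPS URL of your deployed function):

curl "https://api.telegram.org/bot<YOUR BOT TOKEN>/setWebhook?url=<YOUR CLOUD FUNCTION URL>"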

And there you have it! A Telegram bot that hopefully talks like you.

Observations

After letting a few closer friends and colleagues play around with the bot (christened ChatBings), here are some things I’ve learned about it (and them).

  • It picked up my linguistic style and quirks pretty well…
[Image: Having a work-related discussion. It’s learnt my jaunty chat mannerisms.]
  • It’s terrible at small talk (probably learned from me), and often blows people off (hopefully not learned from me…)
[Image: Oof…]
  • While it pretty much mimics (and amps up) my style, I can’t say the same for content. It’s basically a fantasist with a lurid imagination that has clearly gone way beyond the scope of the conversation history it was fine-tuned on. In addition, it’s inconsistent across conversations, making up things as it goes along. Sometimes I’m single, sometimes attached, sometimes I like girls, sometimes guys — you get the drift.
[Image: I ended up implementing access control to the Telegram bot — reputational risks are too great]
  • It’s especially imaginative and chatty when talking about relationships and love. I hypothesize it’s because the internet is full of such writing; it certainly didn’t pick this up from the fine-tuning data…
[Image: THIS IS TOO CONSUMING FOR ME SO BYE]
  • It comes up with the occasional meme-able quote.
  • My friends have a repressed urge to enquire about my love life.
  • I use “HAHAHA” and 😂 wayyy too much in chats. HAHAHA.
[Image: I don’t even know what a riesling is. HAHAHA]

Jokes aside, the outcome really surpassed my expectations. It’s passable as me, at least until one starts to delve a little deeper.

Final thoughts and conclusions

All in all, the whole process of building this bot was really fun. It taught me a ton about prompt engineering and GPT’s hyperparameters, and the end product was surprisingly legit, even without using the costlier Davinci model. It clearly was fun for my friends who interacted with it, too (at the expense of my OpenAI and GCP credits). And now I’ve got something to show off at social gatherings!

That said, while it’s an excellent gimmick, the bot is still (at least according to those who tried it) inferior as a chatting companion to the real deal. I guess I (and we?) should take heart from that…? Also, I’m not convinced of the utility of doing something like this in a business context. The model’s ability to hallucinate beyond what it was fine-tuned on is just too dangerous, especially for critical domains. Perhaps with future enhancements and/or a better fine-tuning / prompt engineering strategy, this might yet change.

Still, even today, something like this could potentially be used for certain niche situations. One possibility I’ve been mulling is that of bringing loved ones back to life on Telegram — hollow as the chat might be, it might still bring a sense of comforting familiarity. On the flip side, I’m also quite concerned about something like this being used for love scams at an industrial scale, given the bot’s clear penchant for romance.

And with that, that’s the end of this missive! Have ideas on how else this could be used or improved? Or want to have a chat with the real me? Do hit me up on LinkedIn!


Published via Towards AI
