
Tokens and Models: Understanding LangChain 🦜️🔗 Part 3

Last Updated on July 17, 2023 by Editorial Team

Author(s): Chinmay Bhalerao

Originally published on Towards AI.


Understanding tokens and how to select OpenAI models for your use case, how API key pricing works

Image by Author

WE CAN CONNECT ON: | LINKEDIN | TWITTER | MEDIUM | SUBSTACK |

Before exploring tokens and models, I wrote two earlier parts of this series. In the first part, I gave a theoretical explanation of the different modules and the workings of LangChain. In the second part, I implemented a practical use case of document-based question answering using Gradio as a frontend tool. If you haven’t had the chance to read them yet, they are available below.

Understanding LangChain 🦜️🔗: Part 1

Theoretical understanding of chains, prompts, and other important modules in Langchain

pub.towardsai.net

Understanding LangChain 🦜️🔗: Part 2

Implementing LangChain practically for building custom data bots involves incorporating memory, prompt templates, and…

pub.towardsai.net

⚠ You will understand the following concepts from this blog ⚠

✔ What are tokens?

✔ How are tokens calculated for paragraphs?

✔ How does the number of tokens impact the selection of the model?

✔ What are the different OpenAI models?

✔ How to choose a model for our LangChain problem?

✔ How to calculate the usage of your OpenAI API key?

Tokens

Tokens can be thought of as parts of words. The API breaks down the input into tokens before processing the prompts. Tokens are not cut exactly where words start or end, and can include trailing spaces and even sub-words.

In NLP, we have a term known as tokenization, where we cut paragraphs into sentences or words. Here we are doing exactly the same thing: cutting sentences and paragraphs into small chunks of words.

The figure above shows how text is divided into tokens; the different colors represent different tokens. A general rule of thumb is that one token is roughly equivalent to 4 characters of common English text. This means that 100 tokens are approximately equal to 75 words.
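As a quick sketch of this rule of thumb (the 4-characters-per-token and 75-words-per-100-tokens ratios are approximations, and these helper functions are my own, not part of any library):

```python
def estimate_tokens_from_chars(text: str) -> int:
    # Rough rule of thumb: ~4 characters of common English per token
    return max(1, len(text) // 4)

def estimate_tokens_from_words(n_words: int) -> int:
    # ~100 tokens per 75 words, i.e. tokens ≈ words / 0.75
    return round(n_words / 0.75)
```

These are only estimates for budgeting; for exact counts, use the tokenizer itself, as shown next.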

If you want to check the number of tokens in any particular text, you can check it directly on OpenAI’s Tokenizer.

Another method to count tokens is to use the tiktoken library.

import tiktoken

# Take a string and a model name, and return the number of tokens
def num_tokens_from_string(string: str, model_name: str) -> int:
    """Returns the number of tokens in a text string."""
    encoding = tiktoken.encoding_for_model(model_name)
    num_tokens = len(encoding.encode(string))
    return num_tokens

And at last, apply the function above:

# Token counts for each prompt and each completion in the dataset
prompt = [num_tokens_from_string(i['prompt'], "davinci") for i in data]
completion = [num_tokens_from_string(j['completion'], "davinci") for j in data]

# Combine prompt and completion counts per example, then total them
res_list = [p + c for p, c in zip(prompt, completion)]

no_of_final_token = sum(res_list)
print("Number of final token", no_of_final_token)

Output

Number of final token 2094

Likewise, you can count the number of tokens.

So your input data will be converted into tokens, and then it will be fed to the models.

How does the number of tokens impact the selection of the model?

First, let's see which models are available from OpenAI. In this blog, I am focusing explicitly on OpenAI models; we can also use Hugging Face and Cohere AI models, but I will write about those in the next blog.

Let's understand basic models first.

Models

GPT is so powerful because it is trained on a massive dataset. However, more power comes at a cost, so OpenAI provides multiple models to choose from. The available models, also known as engines, are named:

Davinci

Babbage

Curie

Ada

Davinci is the largest and most capable engine; it can perform any task that the other engines can perform. Curie is the next most capable engine and can do anything that Babbage or Ada can do. Ada is the least capable engine, but it is the fastest and lowest-cost engine.

There are various generations of these models as GPT advances; there are approximately 50+ models in the GPT family.

Screenshot by the author from the official model page of openAI

So there are different models for different purposes, like generating and editing images, working on audio, and coding. We want models that process text accurately and work on natural language. In the above image, there are three model families we can work with.

GPT 3

GPT 3.5

GPT 4

Of these, we can't use GPT-4 directly because it is currently in a limited beta and only accessible to those who have been granted access; we would have to join a waitlist and wait our turn. So we have two options left: GPT-3 and GPT-3.5.

Screenshot by the author from the official model page of openAI

The above images show the models available for GPT-3 and GPT-3.5. You can observe that all of them are generations of the basic Davinci, Babbage, Curie, and Ada models.

If you look at the charts above, you will see a column named Max tokens. MAX TOKENS is a limit in OpenAI models on the number of tokens that can be processed in a single request, and it covers both the prompt and the completion. This means that with a 4,096-token limit, if your prompt is 1,000 tokens long, you can only generate up to roughly 3,000 tokens of completion text. The MAX TOKENS limit is enforced by OpenAI’s servers: if you try to generate more text than the limit allows, your request will be rejected.

GPT-3-based models have a lower Max tokens limit (2,049), whereas GPT-3.5-based models have a higher limit (4,096). So you can process more data with GPT-3.5 models.
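A minimal sketch of this budget check (the limits come from the model tables above; the helper function is hypothetical, not part of the OpenAI library):

```python
# Context limits quoted above: prompt and completion share one budget
MODEL_MAX_TOKENS = {
    "davinci": 2049,        # GPT-3 base models
    "gpt-3.5-turbo": 4096,  # GPT-3.5 models
}

def fits_in_context(model: str, prompt_tokens: int, completion_tokens: int) -> bool:
    # A request is rejected if prompt + completion exceed the model's limit
    return prompt_tokens + completion_tokens <= MODEL_MAX_TOKENS[model]
```

For example, a 1,000-token prompt with a 3,000-token completion fits in gpt-3.5-turbo but not in a GPT-3 base model.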

Let's see the pricing of different models

We can take the GPT-3.5-based “gpt-3.5-turbo” model.

So if I have 5,000 words, and I am using the “gpt-3.5-turbo” model, then:

5,000 words ≈ 6,667 tokens

1,000 tokens cost $0.002

so 6,667 tokens will cost ≈ $0.0133

Roughly, then, we can calculate how much usage our processing will require. Also note that for fine-tuning, the number of epochs multiplies this token count, so you have to factor that into the calculation.
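The arithmetic above can be wrapped in a small helper (a sketch of my own; the $0.002-per-1,000-token price is the gpt-3.5-turbo rate quoted above and may change):

```python
def estimate_cost_usd(n_words: int, price_per_1k_tokens: float = 0.002,
                      n_epochs: int = 1) -> float:
    # words -> tokens using the ~75 words per 100 tokens rule of thumb
    tokens = round(n_words / 0.75)
    # fine-tuning makes n_epochs passes over the data
    return tokens * n_epochs / 1000 * price_per_1k_tokens
```

Calling `estimate_cost_usd(5000)` reproduces the ≈ $0.0133 figure above.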

Now you can see how important tokens are. That is why we have to do very neat and proper pre-processing: it reduces the noise in the document and also reduces the cost of processing tokens. So it is important to clean text properly; even removing extra blank spaces will save money on your API key.
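As a small sketch of that idea (a hypothetical helper, using only the standard library), collapsing redundant whitespace before sending text to the API trims tokens you would otherwise pay for:

```python
import re

def clean_text(text: str) -> str:
    # Collapse runs of spaces, tabs, and newlines into single spaces,
    # and strip leading/trailing whitespace: fewer tokens, same content
    return re.sub(r"\s+", " ", text).strip()
```

Real pre-processing would go further (removing boilerplate, deduplicating passages), but even this simple step shrinks the token count of messy documents.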

Let's see all models in one memory chart.

Image by author

Conclusive words

Tokens contribute a great deal to question answering and any other LLM-related task. Pre-processing your data in a way that lets you use much cheaper models is the real game-changer. The selection of a model depends on the trade-off you want: Davinci generations will give you strong accuracy but cost much more, whereas GPT-3.5-turbo-based generations will save you money.

If you have found this article insightful

It is a proven fact that “Generosity makes you a happier person”; therefore, give claps to the article if you liked it. If you found this article insightful, follow me on LinkedIn and Medium. You can also subscribe to get notified when I publish articles. Let’s create a community! Thanks for your support!

Also, Medium doesn’t pay me anything for writing, so if you want to support me, you can click here to buy me a coffee.

You can read my blogs related to,

Understanding LangChain 🦜️🔗: Part 1

Theoretical understanding of chains, prompts, and other important modules in Langchain

pub.towardsai.net

Mastering Large Language Models: PART 1

A basic introduction to large language models and their emergence

medium.com

Traditional object detection will take over by zero-shot learning?

Comparison of zero-shot learning and traditional object detection models

medium.com

Understanding Hyper-parameter-tuning of YOLO’s

Different hyper-parameters and their importance in model building

pub.towardsai.net

Signing off,

Chinmay


Published via Towards AI
