
LangChain: Develop LLM-powered applications using LangChain, Hugging Face, and Facebook AI Similarity Search

Last Updated on January 10, 2024 by Editorial Team

Author(s): Charu Makhijani

Originally published on Towards AI.

Image by Author

Generative AI is the latest wave in the tech industry. Generative AI applications like chatbots, text generation, image generation, and text-to-video are booming like never before. In the last year, many startups have emerged, building tools and applications with Generative AI. The field of Generative AI that has attracted the most attention is Large Language Models, which are transforming the way we interact with text-based data.

Large Language Models

Although LLMs have been around since early 2018, they became popular only after the release of ChatGPT in November 2022. This interest was refueled by the release of GPT-4 in March 2023. Since then, many companies, like Google, Meta, Microsoft, Cohere, and Anthropic, have released their own LLMs.

A Large Language Model is a deep learning model with a very large number of parameters, trained on massive amounts of text data to process and generate text.

The most common use cases of Large Language Models are text generation, text summarization, Q&A bots, coding assistants, text-to-image generation, data extraction, and conversational AI. Training foundation models from scratch is hard and requires time, money, and resources. Thankfully, several large companies have released pre-trained models we can use, such as GPT-3 and GPT-4 from OpenAI, LLaMA from Meta, and PaLM 2 from Google.

Orchestration Framework

Building LLM applications on top of these foundation models is a complex task, and that's where orchestration frameworks play a role.

Orchestration frameworks provide an easy and effective way to build and deploy LLM-based applications.

Dust, LangChain, LlamaIndex, Retune, Orkes, and Steamship are popular orchestration frameworks.

Orchestration frameworks simplify the development of LLM applications by providing a high-level interface over complex LLMs. They provide an abstraction layer that hides the complexities of prompt generation and resource management, so developers can focus on building core functionality, increasing the overall productivity and performance of the application.

LangChain

LangChain is a Python library with a rich set of features that simplify the development of, and experimentation with, applications powered by large language models.

LangChain is an open-source project started by Harrison Chase. It offers a variety of tools and APIs to integrate the power of LLMs into your applications. LangChain provides a framework for connecting LLMs to external data sources like PDF files, the Internet, and private data sources. By using other sources of data, LLMs can access new information beyond the data they were trained on. This lets you create LLM-powered applications that are both data-aware and agentic.

Data-aware, because the LLM is connected to external data sources and can fetch personalized information to generate more accurate responses instead of generic answers.

Agentic, because the LLM, trained on vast amounts of human-generated data and patterns, can not only produce meaningful text but also decide on actions and interact effectively with its environment.

LangChain Setup

Install LangChain

The first step to start the LangChain journey is installing the library. Install LangChain using pip or conda. Simply run the command below in the terminal:
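
pip install langchain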

or
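
conda install langchain -c conda-forge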

Before installing LangChain, make sure you have Python version 3.8 or later installed.

Install LLM Provider’s Library

The next step is to install the library of the LLM provider you choose to use. You can use open-source models from Hugging Face or Stability AI, or closed-source models from OpenAI or Anthropic. Closed-source models are proprietary to the companies building them, so you have to pay to use them. In this post, we are using an open-source model from Hugging Face.

Use this command to install the Hugging Face library-
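
pip install huggingface_hub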

API Key

After installing the LLM library, you need an API key to access the model API. You can get the API key from the LLM provider's website. For the Hugging Face API key, go to https://huggingface.co/ and create an account. Then go to Settings -> Access Tokens and create a new token.

Image by Author

Import library & API Key

After installing the LLM library and getting the API key, the next step is to import them into your project. This will be the first step to start creating your project using LLMs. Use the commands below to set the API key for your project-

Image by Author
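
A minimal sketch of that step looks like the following (the token value is a placeholder for the Access Token you created):

import os

# Set the Hugging Face API token so LangChain's Hugging Face integrations can authenticate
os.environ["HUGGINGFACEHUB_API_TOKEN"] = "hf_xxx"  # replace with your own token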

There are a few other libraries you might need to install, like transformers, text/PDF/image/video tool libraries (depending on your project's needs), and vector databases. I'll share those as and when we need them throughout the project, so that you understand the specific purpose of each library.

If you have reached this point, you have done the basic setup for your LangChain project journey. Good job!

LangChain Modules

LangChain has a set of modules to build pipelines that integrate foundational LLM models, vector stores, external data sources, data loaders, prompt templates, and other tool libraries through agents. Each of these modules has a specific purpose and can be used as a standalone module or integrated with other modules.

As of writing this post, LangChain has six different modules:

  1. Model I/O: To interact with large language models.
  2. Retrieval: To retrieve data from multiple data sources.
  3. Agents: To choose the sequence of actions to take.
  4. Chains: To compose sequences of calls to LLMs and other components.
  5. Memory: To store information about past interactions.
  6. Callbacks: To log and stream the intermediate steps of a chain.
Image by Author

Now, let’s dive into each module with an example.

Model I/O

The model is the core component of any LLM-powered application. The Model I/O module of LangChain provides the interface to communicate with language models. It consists of three key components:

  1. LLM or Chat Models:

a. LLMs- LLMs are text-based models that take text as input (or prompt) and provide text as output (response).

b. Chat Models- Chat Models are very similar to LLMs, but instead of a text message, they take a list of chat messages as input and provide a chat message as output.

LangChain does not provide its own LLMs or Chat Models; instead, it provides an abstraction layer for using language models from providers like OpenAI, Anthropic, Cohere, and Hugging Face.
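
For Chat Models specifically, a minimal sketch might look like the snippet below. It assumes an OpenAI API key is configured and is only illustrative; the rest of this post sticks with the Hugging Face model.

from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage, SystemMessage

# Chat models take a list of messages instead of a single text prompt
chat = ChatOpenAI(temperature=0)
messages = [
    SystemMessage(content="You are a helpful travel assistant."),
    HumanMessage(content="Which is the most expensive city in the world?"),
]
response = chat(messages)  # returns an AIMessage
print(response.content)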

For example, I am using an LLM from Hugging Face in the code below.

from langchain.llms import HuggingFaceHub

# open-source LLM from Hugging Face
llm = HuggingFaceHub(repo_id="google/flan-t5-large")
llm_out = llm("Which is most expensive city in the world?")
print(llm_out)

2. Prompts: A prompt is a set of instructions provided as input to the model. Creating a prompt that elicits the desired output is an important step in building your LLM-powered application. There are multiple ways to ask the same question, and they can lead to different responses.

The process of tweaking the prompt to get a relevant answer is called Prompt Engineering.

LangChain provides multiple classes and functions to create prompts. The most popular and widely used class is PromptTemplate, which provides a template to create string prompts, using multiple components and attributes.

For example, I am using a PromptTemplate to ask specific questions using the custom template. Templates are extremely helpful when you don’t want to write the whole Prompt again and want to reuse the specific patterns for prompting in your application.

from langchain import PromptTemplate

# Write a query template
template = "Which is most {input} city in the world?"

# Create a prompt template
prompt = PromptTemplate(template=template, input_variables=["input"])

# Format the prompt
_input = prompt.format(input="expensive")

# Generate the output
output = llm(_input)

# The response
print(output)

3. Output Parsers: Output Parsers are classes that extract and format information from the model's output. Basically, they structure the model's response into a required format rather than leaving it as a plain text response.

LangChain provides different kinds of parsers to format the model output, like- DatetimeOutputParser, EnumOutputParser, PydanticOutputParser, PandasDataFrameOutputParser, CommaSeparatedListOutputParser, StructuredOutputParser, and XMLOutputParser.

In the example below, I am using a CommaSeparatedListOutputParser to output a list of strings separated by a comma.

from langchain.output_parsers import CommaSeparatedListOutputParser
from langchain.prompts import PromptTemplate

# Initialize the CommaSeparatedListOutputParser
output_parser = CommaSeparatedListOutputParser()

# Create format instructions
format_instructions = output_parser.get_format_instructions()

# Create a prompt
prompt = PromptTemplate(
    template="List three {subject}.\n{format_instructions}",
    input_variables=["subject"],
    partial_variables={"format_instructions": format_instructions},
)

# Create a query for prompt template
query = "Planets in the Universe"

# Generate the output
output = llm(prompt.format(subject=query))

# Parse the output using the parser
parsed_result = output_parser.parse(output)

# The result is a list of items
print(parsed_result)

Retrieval

LLMs are trained on vast amounts of data, but they don't contain the user-specific or application-specific data that is often needed to build a custom LLM application. The Retrieval module helps with loading, transforming, combining, storing, and retrieving external data so that it can be used alongside the LLM's own knowledge.

The process of extracting the external data and combining it with LLM in the generation step is called Retrieval Augmented Generation (RAG).

Image by Author

LangChain provides the building blocks to load, transform, store, and fetch the data through six key components.

  1. Document Loaders

A Document Loader loads data from external sources. LangChain has hundreds of document loaders to support integration with different sources like PDF, text, CSV, JSON, HTML, and code bases.

For retrieval, I will use a PDF file (the eBook Hands-On Machine Learning with Scikit-Learn & TensorFlow) and walk through the six steps: load, transform, embed, store, retrieve, and index. We'll combine the PDF's content with the Hugging Face LLM and then query it.

Install these libraries-

pip install pypdf

from langchain.document_loaders import PyPDFLoader

loader = PyPDFLoader("MachineLearning.pdf")
transcript = loader.load_and_split()

2. Document Transformers

Document Transformers are useful for extracting only the relevant information from documents. For this, the documents are split into smaller chunks to prepare them for the embedding models.

LangChain has multiple document transformers to help split, combine, and filter the documents, like HTMLHeaderTextSplitter, MarkdownHeaderTextSplitter, TokenTextSplitter, CharacterTextSplitter, and RecursiveCharacterTextSplitter.

In this example, I am using RecursiveCharacterTextSplitter to split the document (PDF file).

Install these libraries-

pip install transformers

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=150)
docs = text_splitter.split_documents(transcript)

3. Text Embedding Models

Text Embedding Models transform the unstructured text into vector embeddings. These embeddings allow semantic search (finding similar text) for any LLM application.

LangChain provides an interface to integrate with embedding models from more than 25 different providers like OpenAI, Hugging Face, and Cohere.

In this example, I am using the embedding model from Hugging Face.

Install these libraries-

pip install chainlit
pip install sentence-transformers

from langchain.embeddings import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings(model_name = "sentence-transformers/all-MiniLM-L6-v2")

text_embeddings = embeddings.embed_query("Which is most expensive city in the world?")
print(text_embeddings)

4. Vector Stores

As the number of embeddings grows, you need somewhere to store them. That's where vector stores come into the picture: they store vector embeddings and let you search over them.

LangChain provides integrations with more than 50 vector store providers like Chroma, Elasticsearch, Pinecone, LanceDB, Hippo, and FAISS.

In this example, I am using FAISS (Facebook AI Similarity Search) to store the vector embeddings from the previous step and then query them using similarity search.

Install these libraries-

pip install faiss-cpu

from langchain.vectorstores import FAISS

db = FAISS.from_documents(docs, embeddings)

query = "Why Use Machine Learning?"
docs = db.similarity_search(query)
print(docs[0].page_content)

5. Retrievers

Once you have loaded, transformed, and stored the documents, you need to query them. Retrievers help you query the vector store and retrieve useful information.

LangChain provides an interface to query using different retrievers, like MultiQueryRetriever, MultiVectorRetriever, WebResearchRetriever, EnsembleRetriever, ParentDocumentRetriever, and SelfQueryRetriever, along with retrieval chains such as RetrievalQA and RetrievalQAWithSourcesChain.

In this example, I am using RetrievalQA to query from the vector store.

Install these libraries-

pip install SQLAlchemy
pip install flask-sqlalchemy
pip install loguru
pip install unstructured
pip install pdf2image
pip install pdfminer.six
pip install opencv-python
pip install pytesseract
pip install unstructured_pytesseract
pip install unstructured_inference
pip install "google-api-python-client>=2.100.0"

from langchain.chains import RetrievalQA

retriever = db.as_retriever()
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True,
    # verbose=True,
)

query = "What is Machine Learning?"
result = qa({"query": query})
print(result['result'])

In another example, I am using WebResearchRetriever, so instead of the document (the PDF file in this case), I query Google to get the response to my query.

import os

from langchain.retrievers.web_research import WebResearchRetriever
from langchain.utilities import GoogleSearchAPIWrapper
from langchain.chains import RetrievalQAWithSourcesChain

os.environ["GOOGLE_CSE_ID"] = "xxx"
os.environ["GOOGLE_API_KEY"] = "xxx"
search = GoogleSearchAPIWrapper()

web_research_retriever = WebResearchRetriever.from_llm(vectorstore=db, llm=llm, search=search)
qa_chain = RetrievalQAWithSourcesChain.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=web_research_retriever,
    # return_source_documents=True,
    # verbose=True,
)
result = qa_chain({"question": query})
print(result)

6. Indexing

So, up to this point, we have covered load, transform, embed, store, and retrieve, and you have a basic pipeline where you can load any document and query it. Great!

But now it's time to work on performance, and that's where Indexing comes in. Indexing helps keep your documents in sync with the vector store. It saves you from rewriting the vector store, maintaining multiple copies of a document, and recomputing embeddings. Indexing is most helpful when your documents change frequently: instead of loading and computing everything again, only new data is copied, and your documents always stay in sync with the vector store. This saves a lot of time, resources, and money.

In this example, I am using the VectorstoreIndexCreator API from LangChain. I load a PDF file into the vector store using HuggingFaceEmbeddings and then query the indexed vector store.

from langchain.document_loaders import UnstructuredPDFLoader
from langchain.indexes import VectorstoreIndexCreator
from langchain.text_splitter import CharacterTextSplitter

# A list of loaders, one per document to index
loader1 = [UnstructuredPDFLoader("MLInterview.pdf")]
llm = HuggingFaceHub(repo_id="google/flan-t5-large")

index = VectorstoreIndexCreator(
    embedding=HuggingFaceEmbeddings(),
    text_splitter=CharacterTextSplitter(chunk_size=1000, chunk_overlap=100),
).from_loaders(loader1)

results = index.query("What is Machine Learning?", llm=llm)
print(results)

Agents

LLMs are most valuable when you are working with text, since they are trained on a lot of textual information. But when it comes to specific facts, maths, or reasoning, LLMs don't perform that well. For example, look at this conversation with ChatGPT.

Image by Author

Despite the large amount of data used for training, LLMs often hallucinate and provide incorrect information. That's where Agents come into the picture. The idea behind agents is to dynamically choose a chain of actions to perform: instead of hard-coding the sequence of steps, you give the agent a set of tools, and it decides which ones to use and in what order.

LangChain has multiple built-in tools, like Python REPL (for doing calculations), Google Search, Bing Search, Google Drive, Wikipedia, Yahoo Finance, and YouTube.

For example, I am using the Wikipedia tool below.

Install these libraries-

pip install wikipedia
pip install youtube_search
pip install duckduckgo-search

from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.agents import AgentType

tools = load_tools(["wikipedia"], llm=llm)
agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
    max_iterations=1,
    handle_parsing_errors=True,
)

agent.run("Who is Barack Obama?")

In another example, I am using YouTube Search and DuckDuckGo Search.

from langchain.tools import YouTubeSearchTool

tool = YouTubeSearchTool()
tool.run("How to choose a Niche")

from langchain.tools import DuckDuckGoSearchRun

search = DuckDuckGoSearchRun()
search.run("Tell me about Abraham Lincoln")

You can chain agents, use multiple tools, and let the agent decide which tool to use. Now, let's learn about chaining in LangChain to streamline LLM workflows and accomplish more complex tasks.

Chains

When you are building a simple application using an LLM, for example to query a few documents, the above three modules are enough. But if you want to create more complex applications, where you use multiple LLMs along with other components, or where the response from one LLM becomes part of the prompt for another, then you need Chains.

LangChain provides two frameworks for chaining:

  1. LangChain Expression Language
from langchain.prompts import ChatPromptTemplate
from langchain.schema import StrOutputParser

prompt = ChatPromptTemplate.from_messages([
    ("system", "You're a finance person who likes to roam around the world, and very well versed about the world economy."),
    ("human", "{question}"),
])
runnable = prompt | llm

for chunk in runnable.stream({"question": "Which is most expensive city in the world?"}):
    print(chunk, end="", flush=True)

2. Chain interface

from langchain import LLMChain

chain = LLMChain(llm=llm, prompt=prompt)

print(chain.run(question="Which is most expensive city in the world?"))

Both will produce the same response, “New York City”. The difference is in how the chain is expressed: LangChain Expression Language is more readable and comes with built-in support for streaming, batching, parallel execution, and tracing, while the Chain interface offers a number of pre-built chain classes.

In simple words, chains are pipelines that take a prompt (or prompt template) as input and run it through one or more LLMs. You can feed the output of one LLM call into another, as sketched below, and you can also include an output parser to format the response, or a memory object to persist state across multiple chains. We'll dive deeper into persistence in the next module, Memory.
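
As a rough illustration of feeding one LLM call's output into another, here is a minimal sketch using SimpleSequentialChain; the prompt texts are made up for this example:

from langchain.chains import LLMChain, SimpleSequentialChain
from langchain.prompts import PromptTemplate

# First chain: ask for a city
city_prompt = PromptTemplate.from_template("Which is the most {adjective} city in the world? Answer with just the city name.")
city_chain = LLMChain(llm=llm, prompt=city_prompt)

# Second chain: use the first chain's output as its input
fact_prompt = PromptTemplate.from_template("Share one interesting fact about {city}.")
fact_chain = LLMChain(llm=llm, prompt=fact_prompt)

# SimpleSequentialChain pipes the output of one chain into the next
overall_chain = SimpleSequentialChain(chains=[city_chain, fact_chain], verbose=True)
print(overall_chain.run("expensive"))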

Memory

When you use LLMs to create conversational apps like chatbots, past interactions need to be captured. But LLMs do not have long-term memory on their own. So what do you do? Create memory: store previous interactions and refer to them when responding to new queries.

LangChain provides different utilities for creating memory. You can create a memory that refers to the most recent conversation, to the past k conversations, or to all conversations whenever a query references them. LangChain provides multiple types of memory, like ConversationBufferMemory, ConversationBufferWindowMemory, ConversationKGMemory (Conversation Knowledge Graph Memory), ConversationEntityMemory, ConversationSummaryMemory, ConversationSummaryBufferMemory, ConversationTokenBufferMemory, and VectorStoreRetrieverMemory.

For example, below I send multiple prompts through a ConversationChain, which retains the previous turns of the conversation.

from langchain.chains import ConversationChain

conversation = ConversationChain(llm=llm, verbose=True)

conversation.predict(input="I have a cat.")

conversation.predict(input="My mom gave me a cat and a dog.")

conversation.predict(input="Now how many pets I have?")

Look at how the final response is generated by referring to the previous prompts and their responses.

Image by Author

Another example uses ConversationBufferMemory to store messages in a buffer and refer to them later in the conversation.

Image by Author
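
A minimal sketch of that pattern, reusing the same llm as above, might look like this:

from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

# Store every message in a buffer and pass it back to the model on each turn
memory = ConversationBufferMemory()
conversation = ConversationChain(llm=llm, memory=memory, verbose=True)

conversation.predict(input="I have a cat.")
conversation.predict(input="My mom gave me a cat and a dog.")

# The buffer now holds the full conversation history
print(memory.buffer)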

Callbacks

The above five modules help you create robust applications using LLMs. But creating an app is not sufficient; you also have to maintain it. That's where Callbacks come into the picture. Callbacks are useful for logging and monitoring your application.

LangChain provides a CallbackHandler interface to subscribe to different events. When the event is triggered, it will call the interface methods. LangChain has multiple Callback Handlers like AsyncCallbackHandler, BaseCallbackHandler, ConsoleCallbackHandler, FileCallbackHandler, and StdOutCallbackHandler.

For example, here I am using the FileCallbackHandler to log the output from the LLM chain to an output file.

from langchain.callbacks import FileCallbackHandler
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from loguru import logger

logfile = "output.log"

logger.add(logfile, colorize=True, enqueue=True)
handler = FileCallbackHandler(logfile)

prompt = PromptTemplate.from_template("Conversation:: + {message} = ")
chain = LLMChain(llm=llm, prompt=prompt, callbacks=[handler], verbose=True)

msg = chain.run(message="What is the capital of United States?")
logger.info("Answer:: " + msg)

So that's the end of the six modules of LangChain. But LangChain is much more powerful than this; it is a package that provides an interface to many large language models. Currently, LangChain has more than 72K stars on GitHub, with about 3,700 closed and 1,600 open issues, so development is moving very fast. Within just a year, many new components and several new modules were introduced.

In my next post, I will discuss how to deploy an LLM application using LangServe and how to build one with the OpenAI API. Till then, enjoy building your free LLM-powered apps using Hugging Face models.

To access the complete source code of LLM modules from this article, please refer to the GitHub link.

Thank you for reading until the end. Before you go:

Please like, share, and follow for more such content. Subscribe to receive an email whenever I publish a new article.

As always, please reach out for any questions/comments/feedback.

Github: https://github.com/charumakhijani
LinkedIn:
https://www.linkedin.com/in/charumakhijani/

References-

https://blog.langchain.dev/
https://python.langchain.com/docs/modules/
https://github.com/langchain-ai/langchain


Published via Towards AI
