Building AI for Production

Resources & Links

This page is a comprehensive compilation of all the links and resources in the book “Building AI for Production: Enhancing LLM Abilities and Reliability with Fine-Tuning and RAG”. Here, you’ll find a collection of code notebooks, checkpoints, GitHub repositories, learning resources, and all other materials shared throughout the book. It is organized chapter-wise and presented in chronological order for easy access.

If you see discrepancies between the code in the book and the code in colab, or want to improve the colabs with new updates, please feel free to create a pull request in the GitHub.

Key Updates from Edition 1 to Edition 2

Chapter	Update Description
Chapter 1	Expanded to include the latest benchmarks, such as LMSYS, and a run-through of recent models like GPT-4 and Gemini 1.5 and technniques like Infinite Attention.
Chapter 2	Extended industry applications of LLMs in sectors like media, education, finance, and medicine, with a deeper dive into pecific use cases in each industry, such as autocompletion, code prediction, and debugging in technology and software.
Chapter 3	Minor restructuring to improve logical flow and progressive understanding of LLM challenges and solutions.
Chapter 4	A new section on prompt injection introduces this emerging security challenge, detailing its types, impact on reliability, with solutions such as guardrails and safeguards to protect LLM integrity.
Chapter 5: RAG (Previously, Introduction to LangChain and LlamaIndex)	Includes a step-by-step guide to building a basic Retrieval-Augmented Generation (RAG) pipeline from scratch, covering essentials like embeddings, cosine similarity, and vector stores. This foundation equips you to apply modern frameworks like LlamaIndex and LangChain more efficiently or go on your own with custom implementations and prepares you for their evolution better.
Chapter 6: Introduction to LangChain & LlamaIndex (Previously Prompting with LangChain)	Introduces foundational elements of LangChain as part of a complete system, providing a comprehensive understanding of how each component functions within a broader context. This structured overview acts as a roadmap, enabling a clearer grasp of RAG pipelines in the upcoming chapters.
Chapter 7: Prompting with LangChain (Previously RAG)	Includes LangChain Chains, previously part of the RAG chapter, for clarity.
Chapter 8: Indexes, Retrievers, and Data preparation (New Chapter)	Indexes, Retrievers, and Data Preparation are essential components of a RAG pipeline. While these concepts were introduced in the first edition, this updated edition includes a dedicated chapter that focuses on their foundational principles. This approach ensures that readers can effectively scale LLM applications, optimize performance, and enhance response quality. Additionally, by emphasizing the fundamentals, this edition allows readers to understand and implement RAG concepts independently, without relying exclusively on frameworks like LangChain.
Chapter 9: Advanced RAG	Only structural updates
Chapter 10: Agents	Only structural updates
Chapter 11: Fine-tuning	Only structural updates
Chapter 12: Deployment and Optimization	The updated version takes a deeper dive into essential techniques for LLM deployment and optimization, making it more practical and relevant for current AI development needs. For example, the book explores model distillation, a powerful technique to reduce inference costs and improve latency, with a detailed case study on Google’s Gemma 2, demonstrating its real-world impact. With open-source LLMs growing in popularity, this edition also covers the deployment of LLMs on various cloud platforms, including Together AI, Groq, Fireworks AI, and Replicate. This broader approach helps readers find cost-effective and scalable solutions for real-world applications.

A Note on Library and Model Versioning

LLMs are advancing rapidly, but the core skills and tools covered in this book—like fine-tuning, prompt engineering, and retrieval-augmented generation— will remain essential for adapting next-generation models to specific data, workflows, and industries. These principles will stay relevant across models, even as some specific libraries evolve.

For seamless code execution, we’ve included a requirements file for library versions. If you’re running notebooks on Google Colab, be aware that libraries like “pytorch” and “Transformers” are pre-installed. Should compatibility issues arise, try uninstalling these libraries in Colab and reinstalling the specified versions from the requirements file.

Switching to newer LLMs is straightforward. For instance, with OpenAI models, you can update the model simply by changing its name in the code. We recommend using GPT-4o-mini over “GPT-3.5 Turbo” in the book examples. Regularly checking documentation for Langchain, LlamaIndex, and OpenAI is also encouraged to stay aligned with updates and best practices.

This approach ensures your skills remain applicable in the dynamic LLM field.

Introduction

No Notebooks.

Book Library Requirements

Requirements

Resources

Python & Other Technical Notes: Our guide to starting in AI (Python, Math, and more resources. All free)
Towards AI Open Source AI chatbot: AI Tutor
Discord Community: Learn AI Together
Coding Environment and Packages: Visual Studio Code

Chapter I: Introduction to LLMs

No Notebooks.

Research Papers

Attention Is All You Need (Section: Transformers)
Training Compute-Optimal Large Language Models. (Section: Scaling Laws)
Emergent Abilities of Large Language Models (Section: What are Emergent Abilities)
Evaluation Benchmarks for Emergent Abilities: Massive Multi-task Language Understanding MMLU | Word in Context
Optimization Techniques to Expand the Context Window: ALiBi Positional Encoding | Sparse Attention | FlashAttention | Multi-Query Attention (MQA)
FlashAttention-2
LONGNET: Scaling Transformers to 1,000,000,000 Tokens.
A Survey of Large Language Models (Section: A Timeline of the Most Popular LLMs)
Evaluation Benchmarks for Emergent Abilities GitHub Repo: BIG-Bench suite | TruthfulQA |

Chapter II: LLM Architectures & Landscape

Notebook

Understanding Transformer (Section: The Architecture in Action)
Transformer Architecture (Section: Transformer Model’s Design Choices)

Resources & Additional Links

Tutorial: Let’s build GPT: from scratch, in code, spelled out
Demo Environment: IDEFICS Playground
GitHub Repo: minGPT – A PyTorch re-implementation of GPT, both training and inference
Open AI Blog Post: InstructGPT
LLM Leaderboard: Chatbot Arena
LLaVA – An Instruction-tuned LMM: Vicuna
Open Flamingo
Beyond Vision and Language (Models): PandaGPT | ImageBind | SpeechGPT | NExT-GPT
Proprietary and Open-Source LLMs: Cohere LLMs | Open AI GPT 3.5 | Anthropic’s Claude Models | Google Deepmind’s Gemini | Meta’s Llama Models | Falcon | Dolly | Open Assistant | Mistral LLMs
Research Paper: “Vision Transformers: An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale”
Research Paper: “Multimodal Foundation Models: From Specialists to General-Purpose Assistants”
Research Paper: Flamingo: a Visual Language Model for Few-Shot Learning

Chapter III: LLM Landscape

No Notebooks.

Research Papers: Evaluating LLM Performance (Benchmarks)

Chapter IV: Introduction to Prompting

Notebook

No Notebooks.

Resources

Chapter V: Retrieval-Augmented Generation

Notebook

Building a Basic RAG Pipeline from Scratch (Section: Building a Basic RAG Pipeline from Scratch)

Resources

Research Paper: Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
Blog: Retrieval Augmented Generation (RAG)
Deep Lake Vector Store

Chapter VI: Introduction to Langchain & LlamaIndex

Notebook

Building Application Powered by LLMs with LangChain (Section: Tutorial 1: Building LLM-Powered Applications with LangChain)
Build a News Articles Summarizer (Section: Tutorial 2: Building a News Articles Summarizer)
LlamaIndex Introduction (Section: LlamaIndex Introduction)

Resources

LangChain Documentation
Useful LangChain Components: | Prompts | Output Parsers | Retrievers | Document Loaders | Text Splitters | Indexes | Embeddings models | Vector Stores | Agents | Chain | Tool | Memory | Callbacks
Useful LangChain Agents: Zero-shot ReAct | Structured Input ReAct | OpenAI Functions Agent | Self-Ask with Search Agent | ReAct Document Store Agent | Plan-and-execute agents
The output for the Building LLM-Powered Applications with LangChain section is based on The One Page Linux Manual: A summary of useful Linux commands
Building applications with LLMs through composability
A Complete Guide to LangChain: Building Powerful Applications with LLMs
LlamaIndex Index Guide
LlamaIndex: How to use Index correctly
Defining a Custom Query Engine
Working Example of Implementing Routers
LlamaIndex documentation
Financial Document Analysis with LlamaIndex
LlamaIndex: Adding Personal Data to LLMs
Llamahub

Chapter VII: Prompting with LangChain

Notebook

Using Prompt Templates (Section: What are LangChain Prompt Templates)
Getting the Best of Few Shot Prompts & Example Selectors (Section: Few Shot Prompts and Example Selectors)
Chains and Why They Are Used Notebook (Section: What are LangChain Chains)
Managing Outputs with Output Parsers (Section: Tutorial 1: Managing Outputs with Output Parsers)
Improving Our News Articles Summarizer (Section: Tutorial 2: Improving Our News Articles Summarizer)
Creating Knowledge Graphs from Textual Data Unveiling Hidden Connections (Section: Tutorial 3: Creating Knowledge Graphs from Textual Data: Finding Hidden Connections)

Resources

Building a Knowledge Base from Texts: a Full Practical Example
GitHub: langchain
Knowledge Graph Visualization: NetworkX library
Knowledge Graph Visualization: Pyvis library

Chapter VIII: Indexes, Retrievers, and Data Preparation

Notebook

What are Text Splitters and Why They are Useful (Section: Text Splitters)
Create a YouTube Video Summarizer Using Whisper and LangChain (Section: Tutorial 2: A YouTube Video Summarizer Using Whisper and LangChain)
Guarding Against Undesirable Outputs With the Self-Critique Chain (Section: Tutorial 4: Preventing Undesirable Outputs with the Self-Critique Chain)
Guarding Against Undesirable Outputs With the Self-Critique Chain Example (Section: Tutorial 5: Preventing Undesirable Outputs from a Customer Service Chatbot)

Book File

Book File Requirements
Sample PDF Used for Customizing Text Splitters Example: The One Page Linux Manual

Tokens and APIs & Packages

Resources

Tutorial: How to install ffmpeg
Documentation: Text wrapping and filling
Code: idontcalculate/langchain
GitHub Repo: JarvisBase
LangChain Docs: Split by character | Split code | Recursively split by character | Summarization
Open AI Whisper
Acitveloop Docs: Deep Lake Vector Store in LangChain
Documentation: Self-critique chain with constitutional AI

Chapter IX: Advanced RAG

Notebook

Masterting Advanced RAG (Section: Using a Query Engine to Answer Queries)
RAG Metrics & Evaluations (Section: RAG Metrics & Evaluation)
LangSmith Introduction (Section: LangChain LangSmith and LangChain Hub)

Resources

Source Document for the Query Engine Example: Text Data for the example (Section: Query Engine Example)
Tutorial: Building an Advanced Fusion Retriever from Scratch
LangChain Docs: Query construction
Cohere Reranking
Tutorial: Hands-on Tutorial for Implementing Small-to-Big Retrieval
Colab Notebook: Cohere Rerank Endpoint
Blog: Complex Query Resolution through LlamaIndex Utilizing Recursive Retrieval, Document Agents, and Sub Question Query Decomposition
Blog: Improving Retrieval Performance by Fine-tuning Cohere Reranker with LlamaIndex
LlamaIndex Notebook
LllamaIndex Docs (Section: The Role of the Retrieval Step): Retriever Query Engine with Custom Retrievers – Simple Hybrid Search | Metadata Filtering | Cohere Rerank | Document Summary Index
LllamaIndex Docs (Section: RAG Metrics): Correctness | Faithfulness | Context Relevancy | Guideline Adherence | Embedding Semantic Similari | Finetuning Embeddings | Evaluating – LlamaIndex | Retrieval Evaluation | Golden Dataset | Response Evaluation | Recursive Retriever + Query Engine Demo | Query Engine | Chat Engine
Creating the Dataset
openai-cookbook
RAGAS GitHub repository
RagEvaluatorPack Downloading a LlamaDataset from LlamaHub
LangSmith
Hub-examples: LangSmith cookbook
The Art of LangSmith
LangServe Github Repository

Chapter X: Agents

Notebook

Using AutoGPT with LangChain (Section: Using AutoGPT with LangChain)
Using AutoGPT with LangChain Output (Section: Using AutoGPT with LangChain)
Building Autonomous Agents to Create Analysis Reports (Section: Tutorial 1: Building Agents for Analysis Report Creation)
Query and Summarize a DB with LlamaIndex (Section: Tutorial 2: Query and Summarize a DB with LlamaIndex)
Building Agents with OpenAI Assistants (Section: Tutorial 3: Building Agents with OpenAI Assistants)
Multimodal Finance + Deep Memory (Section: Tutorial 5: Multimodal Financial Document Analysis from PDFs)

Dataset

Dataset for the Multimodal Financial Document Analysis Example: Tesla Q3 Financial Report
Preprocessed Text/Label for the Multimodal Financial Document Analysis
Preprocessed Graphs for the Multimodal Financial Document Analysis

Resources

GitHub: Babyagi Inspired Projects
GitHub: Agent Simulations
GitHub: CAMEL Role-Playing Autonomous Cooperative Agents
GitHub; LlamaHub
LlamaIndex Docs: Data Agents | OpenAI Agent with Query Engine Tools | Multi-Document Agents
Blog: OpenAI Assistants API: Walk-through and Coding a Research Assistant
GitHub: HuggingFace Inference Community
Colab Notebook: Assistants API
OpenAI Docs: OpenAI Knowledge Retrieval
Blog: Function Calling OpenAI
GitHub: LangChain OpenGPTs
Blog: Maximizing LangChain Efficiency: Agents and ReAct Method Review
LangChain Docs: Defining Custom Tools
Tutorial: Installing Poppler on Windows
Tutorial: Installing Tesseract on Windows
Website: AutoGPT
Website: BabyAGI
Research: On AutoGPT – LessWrong
Website: CAMEL
Research Paper: The CAMEL: Communicative Agents for “Mind” Exploration of Large Language Model Society paper
Research Paper: Generative Agents: Interactive Simulacra of Human Behavior
Website: OpenGPTs

Chapter XI: Fine-Tuning

Notebook

FineTuning a LLM Lima CPU (Section: Tutorial 1: SFT with LoRA)
FineTuning a LLM Financial Sentiment CPU (Section: Tutorial 2: Using SFT and LoRA for Financial Sentiment)
Create a Dataset For Cohere Fine-Tuning (Section: Tutorial 3: Fine-Tuning a Cohere LLM with Medical Data)
Fine-Tuning Using Cohere for Medical Data (Section: Tutorial 3: Fine-Tuning a Cohere LLM with Medical Data)
Finetuning a LLM QloRA (Section: Tutorial 4/Supervised Fine-Tuning Notebook)
Finetuning a Reward Model (Section: Tutorial 4/Training a Reward Model Notebook)
Finetune RLHF (Section: Tutorial 4/RLHF)

Book Model Checkpoints, Requirements, Datasets, W&B Reports

OPT fine-tuned LIMA checkpoint on CPU (Section: Practical Example: SFT with LoRA)
OPT Fine-tuned finGPT with CPU (Section: Using SFT for Financial Sentiment)
The Merged Model Checkpoint (2GB) (Section: Supervised Fine-Tuning Notebook)
Requirements (Section: Supervised Fine-Tuning Notebook)
The Reward Model Checkpoint (Step 1000 – 2GB) (Section: Training a Reward Model Notebook)
Requirements (Section: Training a Reward Model Notebook)
The Merged RL Model Checkpoint (2GB) (Section: RLHF)
Requirements (Section: RLHF)
BC5CDR Dataset in JSON format (Section: Fine-Tuning a Cohere LLM with Medical Data)
Preprocessed Dataset (Section: Fine-Tuning a Cohere LLM with Medical Data)
Complete Dataset (Section: Supervised Fine-Tuning Notebook)
OpenOrca Dataset (Section: Supervised Fine-Tuning Notebook & Section: RLHF)
“helpfulness/harmless”: (hh) by Anthropic (Section: Training a Reward Model Notebook)
OPT Fine-tuned LIMA CPU (Section: Practical Example: SFT with LoRA)
Weights & Bias Report (Section: Supervised Fine-Tuning Notebook)
Weights & Bias Report (Section: Training a Reward Model Notebook)
Weights and Biases report (Section: RLHF)

Resources

Research Paper: Low-Rank Adaptation (LoRA)
Research Paper: QLoRA: An Efficient Variant of LoRA
Open-source Resources for LoRA: PEFT Library | Lit-GPT
Cohere Docs: Fine-tuning an Embedding Model for Classification
Research Paper: Reinforcement Learning from Human Feedback
Research Paper: LIMA: Less Is More for Alignment
Research Paper: Direct Preference Optimization (DPO)
Research Paper: Google DeepMind’s Reinforced Self-Training (ReST)
Research Paper: Reinforcement Learning from AI Feedback (RLAIF)

Chapter XII: Deployment

Notebook

Benchmark Inference (Section: Tutorial: Deploying a Quantized LLM on a CPU on Google Cloud Platform (GCP))

Resources

Research Paper: Model Compression
Research Paper: Distilling the Knowledge in a Neural Network
Research Paper: A Survey of Quantization Methods for Efficient Neural Network Inference
Research Paper: Sparsity in Deep Learning
GitHub: Hugging Face Optimum
GitHub: Intel Neural Compressor
Research Paper: LLM.int8(): 8bit Matrix Multiplication for Transformers at Scale
Research Paper: GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
Research Paper: AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
Research Paper: Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures
Reasearch Paper: A Simple and Effective Pruning Approach for Large Language Models
Reasearch Paper: Structured Pruning of Deep Convolutional Neural Networks
Research Paper: The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
A complete list of tasks supported with Simple Quantization (Using CLI)
The Docker Image under Latitude: Llama 2 API Inference

Conclusion

No Notebooks.

Previous Courses

Free Resources

Note: This webpage has been updated to follow the order and structure of the second edition. All resources from the first edition are still available but may have been rearranged to match the new sequence. Additionally, you’ll find new links and resources exclusive to the second edition—an added benefit for readers with the first edition.

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Frequently Used, Contextual References

Resources

book

Building AI for Production

Resources & Links

Key Updates from Edition 1 to Edition 2

A Note on Library and Model Versioning

Introduction

Book Library Requirements

Resources

Chapter I: Introduction to LLMs

Research Papers

Chapter II: LLM Architectures & Landscape

Notebook

Chapter III: LLM Landscape

Chapter IV: Introduction to Prompting

Notebook

Chapter V: Retrieval-Augmented Generation

Notebook

Chapter VI: Introduction to Langchain & LlamaIndex

Notebook

Chapter VII: Prompting with LangChain

Notebook

Chapter VIII: Indexes, Retrievers, and Data Preparation

Notebook

Book File

Chapter IX: Advanced RAG

Notebook

Chapter X: Agents

Notebook

Dataset

Chapter XI: Fine-Tuning

Notebook

Book Model Checkpoints, Requirements, Datasets, W&B Reports

Chapter XII: Deployment

Notebook

Conclusion

Further Reading and Courses

Previous Courses

Free Resources

The World’s Leading AI and Technology Publication.

Company

CONTACT US

GDPR CCPA Statement