Building AI for Production
Resources & Links
This page is a comprehensive compilation of all the links and resources in the book “Building AI for Production: Enhancing LLM Abilities and Reliability with Fine-Tuning and RAG”. Here, you’ll find a collection of code notebooks, checkpoints, GitHub repositories, learning resources, and all other materials shared throughout the book. It is organized by chapter, in the order the resources appear in the book, for easy access.
If you see discrepancies between the code in the book and the code in Colab, or want to improve the Colab notebooks with new updates, please feel free to create a pull request on GitHub.
Key Updates from Edition 1 to Edition 2
Chapter | Update Description |
--- | --- |
Chapter 1 | Expanded to include the latest benchmarks, such as LMSYS, and a run-through of recent models like GPT-4 and Gemini 1.5 and techniques like Infinite Attention. |
Chapter 2 | Extended industry applications of LLMs in sectors like media, education, finance, and medicine, with a deeper dive into specific use cases in each industry, such as autocompletion, code prediction, and debugging in technology and software. |
Chapter 3 | Minor restructuring to improve logical flow and progressive understanding of LLM challenges and solutions. |
Chapter 4 | A new section on prompt injection introduces this emerging security challenge, detailing its types and impact on reliability, along with solutions such as guardrails and safeguards to protect LLM integrity. |
Chapter 5: RAG (Previously, Introduction to LangChain and LlamaIndex) | Includes a step-by-step guide to building a basic Retrieval-Augmented Generation (RAG) pipeline from scratch, covering essentials like embeddings, cosine similarity, and vector stores (a minimal illustrative sketch follows this table). This foundation equips you to apply modern frameworks like LlamaIndex and LangChain more efficiently, build your own custom implementations, and keep up as these frameworks evolve. |
Chapter 6: Introduction to LangChain & LlamaIndex (Previously Prompting with LangChain) | Introduces foundational elements of LangChain as part of a complete system, providing a comprehensive understanding of how each component functions within a broader context. This structured overview acts as a roadmap, enabling a clearer grasp of RAG pipelines in the upcoming chapters. |
Chapter 7: Prompting with LangChain (Previously RAG) | Includes LangChain Chains, previously part of the RAG chapter, for clarity. |
Chapter 8: Indexes, Retrievers, and Data Preparation (New Chapter) | Indexes, Retrievers, and Data Preparation are essential components of a RAG pipeline. While these concepts were introduced in the first edition, this updated edition includes a dedicated chapter that focuses on their foundational principles. This approach ensures that readers can effectively scale LLM applications, optimize performance, and enhance response quality. Additionally, by emphasizing the fundamentals, this edition allows readers to understand and implement RAG concepts independently, without relying exclusively on frameworks like LangChain. |
Chapter 9: Advanced RAG | Only structural updates |
Chapter 10: Agents | Only structural updates |
Chapter 11: Fine-tuning | Only structural updates |
Chapter 12: Deployment and Optimization | The updated version takes a deeper dive into essential techniques for LLM deployment and optimization, making it more practical and relevant for current AI development needs. For example, the book explores model distillation, a powerful technique to reduce inference costs and improve latency, with a detailed case study on Google's Gemma 2, demonstrating its real-world impact. With open-source LLMs growing in popularity, this edition also covers the deployment of LLMs on various cloud platforms, including Together AI, Groq, Fireworks AI, and Replicate. This broader approach helps readers find cost-effective and scalable solutions for real-world applications. |
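To make the Chapter 5 description above concrete, here is a minimal, hypothetical RAG sketch (not the book's notebook): it embeds a few documents, scores them against a query with cosine similarity, and passes the best match to a chat model as context. It assumes the openai Python package (v1+), numpy, and an OPENAI_API_KEY in the environment; the model names and document strings are placeholders.

```python
# Minimal, illustrative RAG pipeline: embed -> cosine similarity -> retrieve -> generate.
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(texts):
    """Return one embedding vector (numpy row) per input text."""
    response = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in response.data])

# A tiny in-memory "vector store": the documents plus their embeddings.
documents = [
    "LoRA adds small trainable matrices to a frozen base model.",
    "RAG retrieves relevant documents and adds them to the prompt.",
]
doc_vectors = embed(documents)

query = "How does retrieval-augmented generation work?"
query_vector = embed([query])[0]

# Cosine similarity between the query and every stored document.
scores = doc_vectors @ query_vector / (
    np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(query_vector)
)
best_doc = documents[int(np.argmax(scores))]

# Augment the prompt with the retrieved context and generate an answer.
answer = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": f"Context: {best_doc}\n\nQuestion: {query}"}],
)
print(answer.choices[0].message.content)
```

Frameworks like LlamaIndex and LangChain wrap exactly these steps (loading, embedding, similarity search, prompt assembly) behind higher-level abstractions, which is why the book introduces them right after this from-scratch walkthrough.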
A Note on Library and Model Versioning
LLMs are advancing rapidly, but the core skills and tools covered in this book, like fine-tuning, prompt engineering, and retrieval-augmented generation, will remain essential for adapting next-generation models to specific data, workflows, and industries. These principles will stay relevant across models, even as some specific libraries evolve.
For seamless code execution, we've included a requirements file for library versions. If you're running notebooks on Google Colab, be aware that libraries like PyTorch and Transformers are pre-installed. Should compatibility issues arise, try uninstalling these libraries in Colab and reinstalling the specified versions from the requirements file.
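For instance, a Colab cell along these lines can force the pinned versions (an illustrative sketch only; the exact package names and pins come from the book's requirements file, and requirements.txt here is a placeholder filename):

```python
# Run inside a Google Colab notebook cell; the leading "!" executes a shell command.
# Package names and versions are placeholders -- take the real pins from the book's requirements file.
!pip uninstall -y torch transformers
!pip install -r requirements.txt
```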
Switching to newer LLMs is straightforward. For instance, with OpenAI models, you can update the model simply by changing its name in the code. We recommend using GPT-4o mini over GPT-3.5 Turbo in the book examples. Regularly checking documentation for LangChain, LlamaIndex, and OpenAI is also encouraged to stay aligned with updates and best practices.
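As a quick illustration (a minimal sketch using the OpenAI Python SDK v1+; the prompt is only a placeholder), swapping models is a one-line change:

```python
# Switching to a newer model only requires changing the model name.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",  # previously "gpt-3.5-turbo" in the book's examples
    messages=[{"role": "user", "content": "Explain retrieval-augmented generation in one sentence."}],
)
print(response.choices[0].message.content)
```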
This approach ensures your skills remain applicable in the dynamic LLM field.
Table of Contents
Introduction
No Notebooks.
Book Library Requirements
Resources
- Python & Other Technical Notes: Our guide to getting started in AI (Python, math, and more resources, all free)
- Towards AI Open Source AI chatbot: AI Tutor
- Discord Community: Learn AI Together
- Coding Environment and Packages: Visual Studio Code
Chapter I: Introduction to LLMs
No Notebooks.
Research Papers
- Attention Is All You Need (Section: Transformers)
- Training Compute-Optimal Large Language Models (Section: Scaling Laws)
- Emergent Abilities of Large Language Models (Section: What are Emergent Abilities)
- Evaluation Benchmarks for Emergent Abilities: Massive Multitask Language Understanding (MMLU) | Word in Context
- Optimization Techniques to Expand the Context Window: ALiBi Positional Encoding | Sparse Attention | FlashAttention | Multi-Query Attention (MQA)
- FlashAttention-2
- LONGNET: Scaling Transformers to 1,000,000,000 Tokens
- A Survey of Large Language Models (Section: A Timeline of the Most Popular LLMs)
- Evaluation Benchmarks for Emergent Abilities GitHub Repo: BIG-Bench suite | TruthfulQA
Chapter II: LLM Architectures & Landscape
Notebook
- Understanding Transformer (Section: The Architecture in Action)
- Transformer Architecture (Section: Transformer Model’s Design Choices)
Resources & Additional Links
- Tutorial: Let’s build GPT: from scratch, in code, spelled out
- Demo Environment: IDEFICS Playground
- GitHub Repo: minGPT – A PyTorch re-implementation of GPT, both training and inference
- OpenAI Blog Post: InstructGPT
- LLM Leaderboard: Chatbot Arena
- LLaVA – An Instruction-tuned LMM: Vicuna
- Open Flamingo
- Beyond Vision and Language (Models): PandaGPT | ImageBind | SpeechGPT | NExT-GPT
- Proprietary and Open-Source LLMs: Cohere LLMs | OpenAI GPT-3.5 | Anthropic's Claude Models | Google DeepMind's Gemini | Meta's Llama Models | Falcon | Dolly | Open Assistant | Mistral LLMs
- Research Paper: Vision Transformers: "An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale"
- Research Paper: "Multimodal Foundation Models: From Specialists to General-Purpose Assistants"
- Research Paper: Flamingo: a Visual Language Model for Few-Shot Learning
Chapter III: LLM Landscape
No Notebooks.
Research Papers: Evaluating LLM Performance (Benchmarks)
Chapter IV: Introduction to Prompting
No Notebooks.
Resources
Chapter V: Retrieval-Augmented Generation
Notebook
- Building a Basic RAG Pipeline from Scratch (Section: Building a Basic RAG Pipeline from Scratch)
Resources
Chapter VI: Introduction to LangChain & LlamaIndex
Notebook
- Building Applications Powered by LLMs with LangChain (Section: Tutorial 1: Building LLM-Powered Applications with LangChain)
- Build a News Articles Summarizer (Section: Tutorial 2: Building a News Articles Summarizer)
- LlamaIndex Introduction (Section: LlamaIndex Introduction)
Resources
- LangChain Documentation
- Useful LangChain Components: | Prompts | Output Parsers | Retrievers | Document Loaders | Text Splitters | Indexes | Embeddings models | Vector Stores | Agents | Chain | Tool | Memory | Callbacks
- Useful LangChain Agents: Zero-shot ReAct | Structured Input ReAct | OpenAI Functions Agent | Self-Ask with Search Agent | ReAct Document Store Agent | Plan-and-execute agents
- The output for the Building LLM-Powered Applications with LangChain section is based on The One Page Linux Manual: A summary of useful Linux commands
- Building applications with LLMs through composability
- A Complete Guide to LangChain: Building Powerful Applications with LLMs
- LlamaIndex Index Guide
- LlamaIndex: How to use Index correctly
- Defining a Custom Query Engine
- Working Example of Implementing Routers
- LlamaIndex Documentation
- Financial Document Analysis with LlamaIndex
- LlamaIndex: Adding Personal Data to LLMs
- LlamaHub
Chapter VII: Prompting with LangChain
Notebook
- Using Prompt Templates (Section: What are LangChain Prompt Templates)
- Getting the Best of Few Shot Prompts & Example Selectors (Section: Few Shot Prompts and Example Selectors)
- Chains and Why They Are Used Notebook (Section: What are LangChain Chains)
- Managing Outputs with Output Parsers (Section: Tutorial 1: Managing Outputs with Output Parsers)
- Improving Our News Articles Summarizer (Section: Tutorial 2: Improving Our News Articles Summarizer)
- Creating Knowledge Graphs from Textual Data Unveiling Hidden Connections (Section: Tutorial 3: Creating Knowledge Graphs from Textual Data: Finding Hidden Connections)
Resources
- Building a Knowledge Base from Texts: a Full Practical Example
- GitHub: langchain
- Knowledge Graph Visualization: NetworkX library
- Knowledge Graph Visualization: Pyvis library
Chapter VIII: Indexes, Retrievers, and Data Preparation
Notebook
- What are Text Splitters and Why They are Useful (Section: Text Splitters)
- Create a YouTube Video Summarizer Using Whisper and LangChain (Section: Tutorial 2: A YouTube Video Summarizer Using Whisper and LangChain)
- Guarding Against Undesirable Outputs With the Self-Critique Chain (Section: Tutorial 4: Preventing Undesirable Outputs with the Self-Critique Chain)
- Guarding Against Undesirable Outputs With the Self-Critique Chain Example (Section: Tutorial 5: Preventing Undesirable Outputs from a Customer Service Chatbot)
Book File
- Book File Requirements
- Sample PDF Used for Customizing Text Splitters Example: The One Page Linux Manual
Tokens and APIs & Packages
Resources
- Tutorial: How to install ffmpeg
- Documentation: Text wrapping and filling
- Code: idontcalculate/langchain
- GitHub Repo: JarvisBase
- LangChain Docs: Split by character | Split code | Recursively split by character | Summarization
- OpenAI Whisper
- Activeloop Docs: Deep Lake Vector Store in LangChain
- Documentation: Self-critique chain with constitutional AI
Chapter IX: Advanced RAG
Notebook
- Mastering Advanced RAG (Section: Using a Query Engine to Answer Queries)
- RAG Metrics & Evaluations (Section: RAG Metrics & Evaluation)
- LangSmith Introduction (Section: LangChain LangSmith and LangChain Hub)
Resources
- Source Document for the Query Engine Example: Text Data for the example (Section: Query Engine Example)
- Tutorial: Building an Advanced Fusion Retriever from Scratch
- LangChain Docs: Query construction
- Cohere Reranking
- Tutorial: Hands-on Tutorial for Implementing Small-to-Big Retrieval
- Colab Notebook: Cohere Rerank Endpoint
- Blog: Complex Query Resolution through LlamaIndex Utilizing Recursive Retrieval, Document Agents, and Sub Question Query Decomposition
- Blog: Improving Retrieval Performance by Fine-tuning Cohere Reranker with LlamaIndex
- LlamaIndex Notebook
- LlamaIndex Docs (Section: The Role of the Retrieval Step): Retriever Query Engine with Custom Retrievers – Simple Hybrid Search | Metadata Filtering | Cohere Rerank | Document Summary Index
- LlamaIndex Docs (Section: RAG Metrics): Correctness | Faithfulness | Context Relevancy | Guideline Adherence | Embedding Semantic Similarity | Finetuning Embeddings | Evaluating – LlamaIndex | Retrieval Evaluation | Golden Dataset | Response Evaluation | Recursive Retriever + Query Engine Demo | Query Engine | Chat Engine
- Creating the Dataset
- openai-cookbook
- RAGAS GitHub repository
- RagEvaluatorPack: Downloading a LlamaDataset from LlamaHub
- LangSmith
- Hub-examples: LangSmith cookbook
- The Art of LangSmith
- LangServe GitHub Repository
Chapter X: Agents
Notebook
- Using AutoGPT with LangChain (Section: Using AutoGPT with LangChain)
- Using AutoGPT with LangChain Output (Section: Using AutoGPT with LangChain)
- Building Autonomous Agents to Create Analysis Reports (Section: Tutorial 1: Building Agents for Analysis Report Creation)
- Query and Summarize a DB with LlamaIndex (Section: Tutorial 2: Query and Summarize a DB with LlamaIndex)
- Building Agents with OpenAI Assistants (Section: Tutorial 3: Building Agents with OpenAI Assistants)
- Multimodal Finance + Deep Memory (Section: Tutorial 5: Multimodal Financial Document Analysis from PDFs)
Dataset
- Dataset for the Multimodal Financial Document Analysis Example: Tesla Q3 Financial Report
- Preprocessed Text/Label for the Multimodal Financial Document Analysis
- Preprocessed Graphs for the Multimodal Financial Document Analysis
Resources
- GitHub: BabyAGI Inspired Projects
- GitHub: Agent Simulations
- GitHub: CAMEL Role-Playing Autonomous Cooperative Agents
- GitHub: LlamaHub
- LlamaIndex Docs: Data Agents | OpenAI Agent with Query Engine Tools | Multi-Document Agents
- Blog: OpenAI Assistants API: Walk-through and Coding a Research Assistant
- GitHub: HuggingFace Inference Community
- Colab Notebook: Assistants API
- OpenAI Docs: OpenAI Knowledge Retrieval
- Blog: Function Calling OpenAI
- GitHub: LangChain OpenGPTs
- Blog: Maximizing LangChain Efficiency: Agents and ReAct Method Review
- LangChain Docs: Defining Custom Tools
- Tutorial: Installing Poppler on Windows
- Tutorial: Installing Tesseract on Windows
- Website: AutoGPT
- Website: BabyAGI
- Research: On AutoGPT – LessWrong
- Website: CAMEL
- Research Paper: CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society
- Research Paper: Generative Agents: Interactive Simulacra of Human Behavior
- Website: OpenGPTs
Chapter XI: Fine-Tuning
Notebook
- FineTuning an LLM LIMA CPU (Section: Tutorial 1: SFT with LoRA)
- FineTuning an LLM Financial Sentiment CPU (Section: Tutorial 2: Using SFT and LoRA for Financial Sentiment)
- Create a Dataset For Cohere Fine-Tuning (Section: Tutorial 3: Fine-Tuning a Cohere LLM with Medical Data)
- Fine-Tuning Using Cohere for Medical Data (Section: Tutorial 3: Fine-Tuning a Cohere LLM with Medical Data)
- Finetuning an LLM QLoRA (Section: Tutorial 4/Supervised Fine-Tuning Notebook)
- Finetuning a Reward Model (Section: Tutorial 4/Training a Reward Model Notebook)
- Finetune RLHF (Section: Tutorial 4/RLHF)
Book Model Checkpoints, Requirements, Datasets, W&B Reports
- OPT fine-tuned LIMA checkpoint on CPU (Section: Practical Example: SFT with LoRA)
- OPT Fine-tuned finGPT with CPU (Section: Using SFT for Financial Sentiment)
- The Merged Model Checkpoint (2GB) (Section: Supervised Fine-Tuning Notebook)
- Requirements (Section: Supervised Fine-Tuning Notebook)
- The Reward Model Checkpoint (Step 1000 – 2GB) (Section: Training a Reward Model Notebook)
- Requirements (Section: Training a Reward Model Notebook)
- The Merged RL Model Checkpoint (2GB) (Section: RLHF)
- Requirements (Section: RLHF)
- BC5CDR Dataset in JSON format (Section: Fine-Tuning a Cohere LLM with Medical Data)
- Preprocessed Dataset (Section: Fine-Tuning a Cohere LLM with Medical Data)
- Complete Dataset (Section: Supervised Fine-Tuning Notebook)
- OpenOrca Dataset (Section: Supervised Fine-Tuning Notebook & Section: RLHF)
- Anthropic's "helpfulness/harmless" (hh) Dataset (Section: Training a Reward Model Notebook)
- OPT Fine-tuned LIMA CPU (Section: Practical Example: SFT with LoRA)
- Weights & Biases Report (Section: Supervised Fine-Tuning Notebook)
- Weights & Biases Report (Section: Training a Reward Model Notebook)
- Weights & Biases Report (Section: RLHF)
Resources
- Research Paper: Low-Rank Adaptation (LoRA)
- Research Paper: QLoRA: An Efficient Variant of LoRA
- Open-source Resources for LoRA: PEFT Library | Lit-GPT
- Cohere Docs: Fine-tuning an Embedding Model for Classification
- Research Paper: Reinforcement Learning from Human Feedback
- Research Paper: LIMA: Less Is More for Alignment
- Research Paper: Direct Preference Optimization (DPO)
- Research Paper: Google DeepMind’s Reinforced Self-Training (ReST)
- Research Paper: Reinforcement Learning from AI Feedback (RLAIF)
Chapter XII: Deployment
Notebook
- Benchmark Inference (Section: Tutorial: Deploying a Quantized LLM on a CPU on Google Cloud Platform (GCP))
Resources
- Research Paper: Model Compression
- Research Paper: Distilling the Knowledge in a Neural Network
- Research Paper: A Survey of Quantization Methods for Efficient Neural Network Inference
- Research Paper: Sparsity in Deep Learning
- GitHub: Hugging Face Optimum
- GitHub: Intel Neural Compressor
- Research Paper: LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
- Research Paper: GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
- Research Paper: AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
- Research Paper: Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures
- Research Paper: A Simple and Effective Pruning Approach for Large Language Models
- Research Paper: Structured Pruning of Deep Convolutional Neural Networks
- Research Paper: The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
- A complete list of tasks supported with Simple Quantization (Using CLI)
- The Docker Image under Latitude: Llama 2 API Inference
Conclusion
No Notebooks.
Further Reading and Courses
Previous Courses
Free Resources
Note: This webpage has been updated to follow the order and structure of the second edition. All resources from the first edition are still available but may have been rearranged to match the new sequence. Additionally, you'll find new links and resources exclusive to the second edition, an added benefit for readers with the first edition.