Building AI for Production
Resources & Links
This page is a comprehensive compilation of all the links and resources in the book “Building AI for Production: Enhancing LLM Abilities and Reliability with Fine-Tuning and RAG”. Here, you’ll find a collection of code notebooks, checkpoints, GitHub repositories, learning resources, and all other materials shared throughout the book. It is organized chapter-wise and presented in chronological order for easy access.
If you see discrepancies between the code in the book and the code in the Colab notebooks, or want to improve the notebooks with new updates, please feel free to create a pull request in the GitHub repository.
Table of Contents
Introduction
No Notebooks.
Book Library Requirements
Resources
- Python & Other Technical Notes: Our guide to starting in AI (Python, Math, and more resources. All free)
- Towards AI Open Source AI chatbot: AI Tutor
- Discord Community: Learn AI Together
- Coding Environment and Packages: Visual Studio Code
Chapter I: Introduction to LLMs
No Notebooks.
Research Papers
- Attention Is All You Need (Section: Transformers)
- Training Compute-Optimal Large Language Models (Section: Scaling Laws)
- Emergent Abilities of Large Language Models (Section: What are Emergent Abilities)
- Evaluation Benchmarks for Emergent Abilities: Massive Multi-task Language Understanding (MMLU) | Word in Context
- Optimization Techniques to Expand the Context Window: ALiBi Positional Encoding | Sparse Attention | FlashAttention | Multi-Query Attention (MQA)
- FlashAttention-2
- LONGNET: Scaling Transformers to 1,000,000,000 Tokens
- A Survey of Large Language Models (Section: A Timeline of the Most Popular LLMs)
- Evaluation Benchmarks for Emergent Abilities GitHub Repo: BIG-Bench suite | TruthfulQA
Chapter II: LLM Architectures & Landscape
Notebook
- Understanding Transformer (Section: Understanding Transformers)
- Transformer Architecture (Section: Transformer Architecture Design Choice)
Resources & Additional Links
- Tutorial: Let’s build GPT: from scratch, in code, spelled out
- Demo Environment: IDEFICS Playground
- GitHub Repo: minGPT – A PyTorch re-implementation of GPT, both training and inference
- OpenAI Blog Post: InstructGPT
- LLM Leaderboard: Chatbot Arena
- LLaVA – An Instruction-tuned LMM: Vicuna
- Open Flamingo
- Beyond Vision and Language (Models): PandaGPT | ImageBind | SpeechGPT | NExT-GPT
- Proprietary and Open-Source LLMs: Cohere LLMs | OpenAI GPT-3.5 | Anthropic’s Claude Models | Google DeepMind’s Gemini | Meta’s Llama Models | Falcon | Dolly | Open Assistant | Mistral LLMs
- Research Paper: “Vision Transformers: An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale”
- Research Paper: “Multimodal Foundation Models: From Specialists to General-Purpose Assistants”
- Research Paper: Flamingo: a Visual Language Model for Few-Shot Learning
Chapter III: LLM Landscape
No Notebooks.
Research Papers: Evaluating LLM Performance (Benchmarks)
Chapter IV: Introduction to Prompting
Notebook
- Intro to Prompt Engineering Tips and Tricks (Section: Integrating Prompting into Code Examples)
Resources
Chapter V: Introduction to Langchain & LlamaIndex
Notebook
- Building Application Powered by LLMs with LangChain (Section: Building LLM-Powered Applications with LangChain)
- Build a News Articles Summarizer (Section: Building a News Articles Summarizer)
- LlamaIndex Introduction (Section: LlamaIndex Introduction)
Resources
- LangChain Documentation
- Building applications with LLMs through composability
- A Complete Guide to LangChain: Building Powerful Applications with LLMs
- LlamaIndex Index Guide
- LlamaIndex: How to use Index correctly
- Defining a Custom Query Engine
- Working Example of Implementing Routers
- LlamaIndex documentation
- Financial Document Analysis with LlamaIndex
- LlamaIndex: Adding Personal Data to LLMs
- The output for the Building LLM-Powered Applications with LangChain section is based on The One Page Linux Manual: A summary of useful Linux commands
- Useful LangChain Components: Prompts | Output Parsers | Retrievers | Document Loaders | Text Splitters | Indexes | Embeddings models | Vector Stores | Agents | Chain | Tool | Memory | Callbacks
- Useful LangChain Agents: Zero-shot ReAct | Structured Input ReAct | OpenAI Functions Agent | Self-Ask with Search Agent | ReAct Document Store Agent | Plan-and-execute agents
- Useful LangChain Tools: The Python tool | The JSON tool | The CSV tool | Custom tools
- Llamahub
- Deep Lake Vector Store
Chapter VI: Prompting with LangChain
Notebook
- Using Prompt Templates (Section: What are LangChain Prompt Templates)
- Getting the Best of Few Shot Prompts & Example Selectors (Section: Few Shot Prompts and Example Selectors)
- Managing Outputs with Output Parsers (Section: Managing Outputs with Output Parsers)
- Improving Our News Articles Summarizer (Section: Improving Our News Articles Summarizer)
- Creating Knowledge Graphs from Textual Data Unveiling Hidden Connections (Section: Creating Knowledge Graphs from Textual Data)
Resources
- Building a Knowledge Base from Texts: a Full Practical Example
- GitHub: langchain
- Knowledge Graph Visualization: NetworkX library
- Knowledge Graph Visualization: Pyvis library
Chapter VII: Retrieval Augmented Generation
Notebook
- What are Text Splitters and Why They are Useful (Section: What are Text Splitters and Why They are Useful)
- Chains and Why They Are Used Notebook (Section: Chains and Why They Are Used Notebook)
- Create a YouTube Video Summarizer Using Whisper and LangChain (Section: Create a YouTube Video Summarizer Using Whisper and LangChain)
- Guarding Against Undesirable Outputs With the Self-Critique Chain (Section: Preventing Undesirable Outputs With the Self-Critique Chain)
- Guarding Against Undesirable Outputs With the Self-Critique Chain Example (Section: Real World Example)
Book File
- Book File Requirements
- Sample PDF Used for Customizing Text Splitters Example: The One Page Linux Manual
Tokens and APIs & Packages
Resources
- Research Paper: Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
- Blog: Retrieval Augmented Generation (RAG)
- Tutorial: How to install ffmpeg
- Documentation: Text wrapping and filling
- Code: idontcalculate/langchain
- GitHub Repo: JarvisBase
- LangChain Docs: Split by character | Split code | Recursively split by character | Summarization
- OpenAI Whisper
- Activeloop Docs: Deep Lake Vector Store in LangChain
- Documentation: Self-critique chain with constitutional AI
Chapter VIII: Advanced RAG
Notebook
- Mastering Advanced RAG (Section: Advanced RAG Techniques with LlamaIndex)
- RAG Metrics & Evaluations (Section: RAG Metrics & Evaluation)
- LangSmith Introduction (Section: LangChain’s LangSmith – Introduction)
Resources
- Source Document for the Query Engine Example: Text Data for the example (Section: Query Engine Example)
- Tutorial: Building an Advanced Fusion Retriever from Scratch
- LangChain Docs: Query construction
- Cohere Reranking
- Tutorial: Hands-on Tutorial for Implementing Small-to-Big Retrieval
- Colab Notebook: Cohere Rerank Endpoint
- Blog: Complex Query Resolution through LlamaIndex Utilizing Recursive Retrieval, Document Agents, and Sub Question Query Decomposition
- Blog: Improving Retrieval Performance by Fine-tuning Cohere Reranker with LlamaIndex
- LlamaIndex Notebook
- LlamaIndex Docs (Section: The Role of the Retrieval Step): Retriever Query Engine with Custom Retrievers – Simple Hybrid Search | Metadata Filtering | Cohere Rerank | Document Summary Index
- LlamaIndex Docs (Section: RAG Metrics): Correctness | Faithfulness | Context Relevancy | Guideline Adherence | Embedding Semantic Similarity | Finetuning Embeddings | Evaluating – LlamaIndex | Retrieval Evaluation | Golden Dataset | Response Evaluation | Recursive Retriever + Query Engine Demo | Query Engine | Chat Engine
- Creating the Dataset
- openai-cookbook
- RAGAS GitHub repository
- RagEvaluatorPack Downloading a LlamaDataset from LlamaHub
- LangSmith
- Hub-examples: LangSmith cookbook
- The Art of LangSmith
- LangServe GitHub Repository
Chapter IX: Agents
Notebook
Dataset
- Using AutoGPT with LangChain (Section: Using AutoGPT with LangChain)
- Using AutoGPT with LangChain Output (Section: Using AutoGPT with LangChain)
- Building Autonomous Agents to Create Analysis Reports (Section: Tutorial 1: Building Agents for Analysis Report Creation)
- Query and Summarize a DB with LlamaIndex (Section: Tutorial 2: Query and Summarize a DB with LlamaIndex)
- Building Agents with OpenAI Assistants (Section: Tutorial 3: Building Agents with OpenAI Assistants)
- Multimodal Finance + Deep Memory (Section: Tutorial 5: Multimodal Financial Document Analysis from PDFs)
- Dataset for the Multimodal Financial Document Analysis Example: Tesla Q3 Financial Report
- Preprocessed Text/Label for the Multimodal Financial Document Analysis
- Preprocessed Graphs for the Multimodal Financial Document Analysis
Resources
- GitHub: Babyagi Inspired Projects
- GitHub: Agent Simulations
- GitHub: CAMEL Role-Playing Autonomous Cooperative Agents
- GitHub: LlamaHub
- LlamaIndex Docs: Data Agents | OpenAI Agent with Query Engine Tools | Multi-Document Agents
- Blog: OpenAI Assistants API: Walk-through and Coding a Research Assistant
- GitHub: HuggingFace Inference Community
- Colab Notebook: Assistants API
- OpenAI Docs: OpenAI Knowledge Retrieval
- Blog: Function Calling OpenAI
- GitHub: LangChain OpenGPTs
- Blog: Maximizing LangChain Efficiency: Agents and ReAct Method Review
- LangChain Docs: Defining Custom Tools
- Tutorial: Installing Poppler on Windows
- Tutorial: Installing Tesseract on Windows
- Website: AutoGPT
- Website: BabyAGI
- Research: On AutoGPT – LessWrong
- Website: CAMEL
- Research Paper: CAMEL: Communicative Agents for “Mind” Exploration of Large Language Model Society
- Research Paper: Generative Agents: Interactive Simulacra of Human Behavior
- Website: OpenGPTs
Chapter X: Fine-Tuning
Notebook
- FineTuning a LLM Lima CPU (Section: Tutorial 1: SFT with LoRA)
- FineTuning a LLM Financial Sentiment CPU (Section: Tutorial 2: Using SFT and LoRA for Financial Sentiment)
- Create a Dataset For Cohere Fine-Tuning (Section: Tutorial 3: Fine-Tuning a Cohere LLM with Medical Data)
- Fine-Tuning Using Cohere for Medical Data (Section: Tutorial 3: Fine-Tuning a Cohere LLM with Medical Data)
- Finetuning a LLM QLoRA (Section: Tutorial 4/Supervised Fine-Tuning Notebook)
- Finetuning a Reward Model (Section: Tutorial 4/Training a Reward Model Notebook)
- Finetune RLHF (Section: Tutorial 4/RLHF)
Book Model Checkpoints, Requirements, Datasets, W&B Reports
- OPT fine-tuned LIMA checkpoint on CPU (Section: Practical Example: SFT with LoRA)
- OPT Fine-tuned finGPT with CPU (Section: Using SFT for Financial Sentiment)
- The Merged Model Checkpoint (2GB) (Section: Supervised Fine-Tuning Notebook)
- Requirements (Section: Supervised Fine-Tuning Notebook)
- The Reward Model Checkpoint (Step 1000 – 2GB) (Section: Training a Reward Model Notebook)
- Requirements (Section: Training a Reward Model Notebook)
- The Merged RL Model Checkpoint (2GB) (Section: RLHF)
- Requirements (Section: RLHF)
- BC5CDR Dataset in JSON format (Section: Fine-Tuning a Cohere LLM with Medical Data)
- Preprocessed Dataset (Section: Fine-Tuning a Cohere LLM with Medical Data)
- Complete Dataset (Section: Supervised Fine-Tuning Notebook)
- OpenOrca Dataset (Section: Supervised Fine-Tuning Notebook & Section: RLHF)
- “helpfulness/harmless”: (hh) by Anthropic (Section: Training a Reward Model Notebook)
- OPT Fine-tuned LIMA CPU (Section: Practical Example: SFT with LoRA)
- Weights & Biases Report (Section: Supervised Fine-Tuning Notebook)
- Weights & Biases Report (Section: Training a Reward Model Notebook)
- Weights & Biases Report (Section: RLHF)
Resources
- Research Paper: Low-Rank Adaptation (LoRA)
- Research Paper: QLoRA: An Efficient Variant of LoRA
- Open-source Resources for LoRA: PEFT Library | Lit-GPT
- Cohere Docs: Fine-tuning an Embedding Model for Classification
- Research Paper: Reinforcement Learning from Human Feedback
- Research Paper: LIMA: Less Is More for Alignment
- Research Paper: Direct Preference Optimization (DPO)
- Research Paper: Google DeepMind’s Reinforced Self-Training (ReST)
- Research Paper: Reinforcement Learning from AI Feedback (RLAIF)
Chapter XI: Deployment
Notebook
- Benchmark Inference (Section: Deploying an LLM on a Cloud CPU)
Resources
- Research Paper: Model Compression
- Research Paper: Distilling the Knowledge in a Neural Network
- Research Paper: A Survey of Quantization Methods for Efficient Neural Network Inference
- Research Paper: Sparsity in Deep Learning
- GitHub: Hugging Face Optimum
- GitHub: Intel Neural Compressor
- Research Paper: LLM.int8(): 8bit Matrix Multiplication for Transformers at Scale
- Research Paper: GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
- Research Paper: AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
- Research Paper: Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures
- Research Paper: A Simple and Effective Pruning Approach for Large Language Models
- Research Paper: Structured Pruning of Deep Convolutional Neural Networks
- Research Paper: The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
- A complete list of tasks supported with Simple Quantization (Using CLI)
- The Docker Image under Latitude: Llama 2 API Inference
Conclusion
No Notebooks.