Introducing “Building LLMs for Production”

Last Updated on June 13, 2024 by Editorial Team

Author(s): Towards AI Editorial Team

Originally published on Towards AI.

We are thrilled to introduce Towards AI’s new book “Building LLMs for Production: Enhancing LLM Abilities and Reliability with Prompting, Fine-Tuning, and RAG.”

This book, now available on Amazon, is the only AI engineering toolkit for building LLM applications. It comes with essential AI and LLM concepts, many Colab notebooks, hands-on projects, community access, and our own AI Tutor. Written by over 10 people on our team at Towards AI and curated by experts from Activeloop, LlamaIndex, Mila, and more, it is a roadmap to the tech stack of the future. The book focuses on adapting large language models (LLMs) to specific use cases by leveraging Prompt Engineering, Fine-Tuning, and Retrieval Augmented Generation (RAG). The code and projects are tailored for readers with an intermediate knowledge of Python, although no programming knowledge is necessary to follow the book’s explanations of AI and LLM concepts.

UPDATE: After many requests from the community, we are really excited to announce that our book, Building LLMs for Production, is now available in India. Thanks to a partnership with Shroff Publishers, you can order it here!

It is an end-to-end resource for anyone looking to enhance their skills, dive into the world of AI, or develop their understanding of Generative AI and large language models (LLMs).

Why This Book?

Generative AI and LLMs are transforming industries with their ability to understand and generate human-like text and images. However, building reliable and scalable LLM applications requires a lot of extra work and a deep understanding of various techniques and frameworks.

We cover LLM concepts from the ground up through advanced techniques. Most importantly, this book is a product of identifying the challenges we face in a production environment, so it focuses on practical solutions for tackling each roadblock.

The book is packed with theory, concepts, projects, applications, and experience that you can confidently put on your CV. We also hope this is good motivation to finish the book. And if you do, add this straight to your resume with confidence:

Amazing Early Feedback from AI Industry Leaders & Professionals

“This is the most comprehensive textbook to date on building LLM applications — all essential topics in an AI Engineer’s toolkit.”
— Jerry Liu, Co-founder and CEO of LlamaIndex

“An indispensable guide for anyone venturing into the world of large language models. It’s a must-have in the library of every aspiring and seasoned AI professional.”
— Shashank Kalanithi, Data Engineer at Meta

“A truly wonderful resource that develops understanding of LLMs from the ground up, from theory to code and modern frameworks.”
— Pete Huang, Co-founder of The Neuron

“This book covers everything you need to know to start applying LLMs in a pragmatic way — it balances the right amount of theory and applied knowledge, providing intuitions, use-cases, and code snippets.”
— Jeremy Pinto, Senior Applied Research Scientist at Mila

“The book is accessible, with multiple tutorials that you can readily copy, paste, and run on your local machine to showcase the magic of modern AI.”
— Rafid Al-Humaimidi, Senior Software Engineer at Amazon Web Services (AWS)

Ready to take your AI skills to the next level? Get your copy of “Building LLMs for Production” and start building robust, reliable, and scalable AI applications. Join us in revolutionizing the future of AI.

Get Your Copy Today!

Why Towards AI?

Since 2019, Towards AI has educated hundreds of thousands of AI developers, many of whom have grown into senior roles in the industry. Our mission is to make AI more accessible — both to individuals and to corporate teams.

We have a huge audience of AI developers, with 400,000 followers, 120,000 subscribers to our weekly AI newsletter, and 60,000 members in our “Learn AI Together” Discord Community. In 2023, we wrote the hugely successful GenAI360: Foundational Model Certification three-course series (~30,000 students) on behalf of Intel and Activeloop. We have many more B2C and B2B AI courses in the pipeline for both technical and non-technical audiences.

Over 2,000 AI practitioners have written for our AI publication, where we publish ~40 AI articles and tutorials each week. This contributes to an incredible talent pipeline for Towards AI. We hire the best talent from our writer and community networks and the best B2C course students to help write and instruct our AI books and courses and to code our practical projects. The best of these then work on our LLM consulting projects. We can also help with AI recruitment from within our network, our top students, and via our AI jobs board.

Our team combines AI and software experience with business, product, and strategy understanding. Our Co-founder and CEO, Louie Peters, was previously Vice President in Investment Research at J.P. Morgan and subsequently worked on growing and advising startups. Our Co-founder and CTO, Louis-François Bouchard, has prior experience as Head of AI and recently left his AI PhD at MILA to focus on Towards AI. We currently have 15 AI practitioners on the team, operating globally but with a particular cluster in Montreal, given our co-founder’s MILA connection.

Why Prompt Engineering, Fine-tuning, and RAG?

LLMs such as GPT-4 often lack domain-specific knowledge, which makes it challenging to generate accurate or relevant responses in specialized fields. They can also struggle with large data volumes, limiting their utility in data-intensive scenarios. Another critical limitation is their difficulty processing new or technical terms, leading to misunderstandings or incorrect information. Hallucinations, where LLMs produce false or misleading information, further complicate their use. Hallucinations are a direct result of the model’s training objective, next-token prediction; to some extent, they are a feature that allows “creative” model answers. However, it is difficult for an LLM to know whether it is answering from memorized facts or from imagination. This introduces many errors into LLM-assisted workflows, and those errors are difficult to identify. Alongside hallucinations, LLMs sometimes simply fail to use available data effectively, leading to irrelevant or incorrect responses.

LLMs are currently most often used in production as chatbots or for performance- and productivity-enhancing “copilot” use cases. Because of these limitations, a human usually stays fully in the loop, and fully automated tasks remain rare. But there is a long journey from a basic LLM prompt to sufficient accuracy, reliability, and observability for a target copilot use case. This journey is called the “march of 9s,” a term popularized in self-driving car development. It describes the gradual improvement in reliability, often measured in the number of nines (e.g., from 90% to 99% reliability), needed to eventually reach human-level performance.
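As a rough illustration of the arithmetic behind counting nines, the sketch below computes the number of nines as -log10(1 - reliability) and the failures that remain at each level. The function names and the 10,000-request figure are our own assumptions, chosen only to make the numbers concrete.

```python
import math


def count_nines(reliability: float) -> float:
    """Number of nines in a reliability figure, e.g. 0.99 is about 2 nines."""
    if not 0.0 <= reliability < 1.0:
        raise ValueError("reliability must be in [0, 1)")
    return -math.log10(1.0 - reliability)


def expected_failures(reliability: float, requests: int = 10_000) -> float:
    """Expected number of failed requests out of `requests`."""
    return (1.0 - reliability) * requests


# Each extra nine cuts the remaining failures by a factor of ten.
for r in (0.9, 0.99, 0.999, 0.9999):
    print(
        f"{r:.2%} reliable: {count_nines(r):.1f} nines, "
        f"~{expected_failures(r):.0f} failures per 10,000 requests"
    )
```

Each additional nine removes 90% of the failures that are left, which is why the later steps of the march tend to take as much effort as the early ones.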

We think the key developer tool kit for the “march of 9s” for LLM-based products is 1) Prompt Engineering, 2) Retrieval Augmented Generation (RAG), 3) Fine-Tuning, and 4) Custom UI/UX. In the near term, AI can assist many human tasks across various industries by combining LLMs, prompting, RAG, and fine-tuning workflows. We think the most successful “AI” companies will focus on highly tailored solutions for specific industries or niches and contribute a lot of industry-specific data and intelligence/experience to how the product is developed.

RAG consists of augmenting LLMs with specific data and requiring the model to use and source this data in its answer rather than relying on what it may or may not have memorized in its model weights. We include entering information into a model’s context window here, together with more complex techniques. We love RAG because it helps with:

Reducing hallucinations by restricting the LLM to answering from existing, chosen data.
Helping with explainability, error checking, and copyright issues by clearly referencing its sources for each statement.
Giving private/specific or more up-to-date data to the LLM.
Not relying too much on black-box LLM training/fine-tuning for what the models know and have memorized.
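To make the idea concrete, here is a minimal sketch of the retrieve-then-prompt pattern described above. It is not the book's implementation: the documents, the word-overlap scoring, and the function names are all hypothetical, and a real system would typically use embeddings and a vector store instead of keyword overlap.

```python
import re

# Hypothetical in-memory knowledge base; a real system would use a vector store.
DOCUMENTS = {
    "doc-1": "Retrieval Augmented Generation adds retrieved passages to the prompt.",
    "doc-2": "Fine-tuning updates model weights on task-specific examples.",
    "doc-3": "Few-shot prompting shows the model worked examples before the question.",
}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, k: int = 2) -> list[tuple[str, str]]:
    """Return the k documents sharing the most words with the question."""
    q = tokenize(question)
    scored = sorted(
        DOCUMENTS.items(),
        key=lambda item: len(q & tokenize(item[1])),
        reverse=True,
    )
    return scored[:k]


def build_prompt(question: str) -> str:
    """Assemble a prompt that restricts the model to the retrieved sources."""
    sources = "\n".join(f"[{doc_id}] {text}" for doc_id, text in retrieve(question))
    return (
        "Answer using only the sources below and cite the source id for each claim. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )


print(build_prompt("How does fine-tuning change model weights?"))
```

The resulting string would be sent to whichever LLM you use. Because each source carries an id, the answer can be checked against the passages it cites.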

Another way to increase LLM performance is through good prompting. Multiple techniques have been found to improve model performance. These methods can be simple, such as giving detailed instructions to the models or breaking down big tasks into smaller ones to make them easier for the model to handle. Some prompting techniques are:

“Chain of Thought” prompting involves asking the model to think through a problem step by step before giving a final answer. The key idea is that each token a language model generates has a limited “processing bandwidth” or “thinking capacity,” so the model needs to generate tokens to figure things out. By asking it to reason through a problem step by step, we give the model more capacity to think and help it arrive at the correct answer.
“Few-Shot Prompting” is when we show the model examples of the answers we seek based on some given questions similar to those we expect the model to receive. It’s like showing the model a pattern of how we want it to respond.
“Self-Consistency” involves asking the model the same question several times, sampling a different reasoning path each time, and then choosing the answer that comes up most often. This method helps get more reliable answers.
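The three techniques above can be combined, as in the following minimal sketch. Everything in it is an assumption made for illustration: the worked examples, the “Answer:” convention, and in particular `fake_llm`, which stands in for a real model call and simply cycles through canned completions so the script runs offline.

```python
import itertools
from collections import Counter
from typing import Callable

# Hypothetical worked examples shown to the model (few-shot prompting).
FEW_SHOT_EXAMPLES = [
    ("What is 12 + 30?", "12 + 30 = 42. Answer: 42"),
    ("What is 9 * 3?", "9 * 3 = 27. Answer: 27"),
]


def build_prompt(question: str) -> str:
    """Combine few-shot examples with a chain-of-thought instruction."""
    shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in FEW_SHOT_EXAMPLES)
    return (
        "Think through the problem step by step, "
        "then finish with 'Answer: <value>'.\n\n"
        f"{shots}\n\nQ: {question}\nA:"
    )


def extract_answer(completion: str) -> str:
    """Take whatever follows the last 'Answer:' marker as the final answer."""
    return completion.rsplit("Answer:", 1)[-1].strip()


def self_consistent_answer(
    prompt: str, sample: Callable[[str], str], n: int = 5
) -> str:
    """Sample n completions and return the most common final answer."""
    answers = [extract_answer(sample(prompt)) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]


# Stand-in for a real model sampled with temperature > 0: one of the
# three canned reasoning paths contains an arithmetic slip.
_CANNED = itertools.cycle([
    "15 + 27 = 42. Answer: 42",
    "15 + 27 = 41. Answer: 41",
    "27 + 15 = 42. Answer: 42",
])


def fake_llm(prompt: str) -> str:
    """Hypothetical model call; ignores the prompt and returns canned text."""
    return next(_CANNED)


prompt = build_prompt("What is 15 + 27?")
print(prompt)
print("Majority answer:", self_consistent_answer(prompt, fake_llm))
```

With a real model, `sample` would call the API with a non-zero temperature so that repeated calls explore different reasoning paths. The majority vote then filters out the occasional faulty chain.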

In short, good prompting is about guiding the model with clear instructions, breaking down tasks into simpler ones, and using specific methods to improve performance. These are basically the same steps we follow when starting a new assignment: the professor assumes you know the concepts and asks you to apply them intelligently.

On the other hand, fine-tuning is like giving the language model extra lessons to improve output for specific tasks. For example, if you want the model to turn regular sentences into SQL database queries, you can train it specifically on that task. Or, if you need the model to respond with answers in JSON format — a type of structured data used in programming — you can fine-tune it. This process can also help the model learn specific information about a certain field or subject. However, if you want to add specialized knowledge quickly and more efficiently, Retrieval Augmented Generation (RAG) is usually a better first step. With RAG, you have more control over the information the model uses to generate responses, making the experimentation phase quicker, more transparent, and easier to manage.
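As a sketch of what preparing data for the text-to-SQL case might look like, the snippet below serializes a few request/query pairs as JSON Lines. The table names, the example pairs, and the `prompt`/`completion` field names are assumptions for illustration; each fine-tuning service or library defines its own schema, so check its documentation before reusing this layout.

```python
import json

# Hypothetical training pairs: natural-language request -> SQL query.
EXAMPLES = [
    (
        "List all customers in Canada.",
        "SELECT * FROM customers WHERE country = 'Canada';",
    ),
    (
        "How many orders were placed in 2023?",
        "SELECT COUNT(*) FROM orders WHERE order_year = 2023;",
    ),
]


def to_jsonl(pairs: list[tuple[str, str]]) -> str:
    """Serialize (prompt, completion) pairs as one JSON object per line."""
    lines = [json.dumps({"prompt": p, "completion": c}) for p, c in pairs]
    return "\n".join(lines)


jsonl = to_jsonl(EXAMPLES)
print(jsonl)

# Sanity check: every line parses back into a record with both fields.
records = [json.loads(line) for line in jsonl.splitlines()]
assert all({"prompt", "completion"} <= set(r) for r in records)
```

In practice, a useful fine-tuning set needs far more than two examples, and the examples should cover the range of requests the model will see in production.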

Parts of this toolkit will be partially integrated into the next generation of foundation models, while parts will be solved through added frameworks like LlamaIndex and LangChain, especially for RAG workflows. However, the best solutions will need to tailor these tools to specific industries and applications. We also believe prompting and RAG are here to stay. Over time, prompting will come to resemble the skills needed for effective communication and delegation to human colleagues.

The potential of this generation of AI models goes beyond typical natural language processing (NLP) tasks. There are countless use cases, such as explaining complex algorithms, building bots, helping with app development, and explaining academic concepts. Text-to-image programs like DALL-E, Stable Diffusion, and Midjourney are revolutionizing fields like animation, gaming, art, movies, and architecture. Additionally, generative AI models have shown transformative capabilities in complex software development with tools like GitHub Copilot.

With this book, we want to take the pressure of ‘keeping up’ away from you and deliver something that can withstand the test of this rapidly evolving field. We cover LLM concepts from the ground up through advanced techniques. Most importantly, this book is a product of all the challenges we faced in a production environment, so it focuses on practical solutions for tackling each roadblock.

Order your copy and start your learning journey now!


Published via Towards AI
