How to Fine-Tune Any Large Language Model (LLM)

Last Updated on January 29, 2025 by Editorial Team

Author(s): Pranjal Khadka

Originally published on Towards AI.

Fine-tuning large language models (LLMs) has become much easier thanks to low-code/no-code tools that let you upload your data, select a base model, and obtain a fine-tuned model. However, it is important to understand the fundamentals before diving into these tools. In this article, we’ll explore the entire process of fine-tuning LLMs in detail.

LLMs operate in two main stages: pre-training and fine-tuning.

Image taken from [1]

1. Pre-training

During the pre-training phase, LLMs are exposed to massive datasets of text. This stage involves defining the model architecture, selecting the tokenizer, and processing the data using the tokenizer’s vocabulary. In autoregressive models like GPT and LLaMA, the model learns to predict the next word in a sentence. For encoder-only architectures like BERT, the model learns to predict missing words in a sentence. These two approaches are known as causal language modeling (CLM) and masked language modeling (MLM), respectively.

Most commonly, we use causal language modeling, where the model predicts the next word in the sequence from the preceding context. However, pre-trained models are general-purpose: they lack domain-specific knowledge and can’t perform specialized tasks out of the box. This is where fine-tuning comes in.
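
To make the causal LM objective concrete, here is a minimal sketch using the Hugging Face transformers library (gpt2 is chosen purely as a small illustrative checkpoint; any causal LM works the same way):

```python
# Minimal causal language modeling sketch with Hugging Face transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Fine-tuning adapts a pre-trained model to", return_tensors="pt")

# Passing labels=input_ids makes the model compute the causal LM loss:
# at each position, predict the token that comes next.
outputs = model(**inputs, labels=inputs["input_ids"])
print(outputs.loss)  # cross-entropy over next-token predictions

# Generation is the same objective applied repeatedly at inference time.
generated = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(generated[0]))
```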

2. Fine-tuning

Fine-tuning allows us to specialize a model’s capabilities for a particular task by adjusting its parameters to minimize a task-specific loss. However, full fine-tuning, which retrains all of the model’s parameters, can be computationally expensive.

Recently, advanced techniques like LoRA (Low-Rank Adaptation) and QLoRA (Quantized Low-Rank Adaptation) have emerged that make fine-tuning more efficient and accessible. The official LoRA paper, using GPT-3 175B as its example, showed that LoRA can reduce the number of trainable parameters by 10,000 times and the GPU memory requirement by 3 times, while performing on par with or better than full fine-tuning, with higher training throughput and no additional inference latency.

Let’s take a closer look at the fine-tuning process:

1. Training Compute Requirements

Full fine-tuning of large models like LLaMA 7B and Mistral 7B typically requires around 150–195 GB of GPU memory. To train these models, you’ll usually need to rent GPUs from cloud platforms like AWS SageMaker or Google Colab.
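
As a rough back-of-the-envelope check on that figure, assuming standard mixed-precision training with the Adam optimizer (the byte counts below are common rules of thumb, not exact measurements):

```python
# Rough memory estimate for FULL fine-tuning of a 7B-parameter model.
params = 7e9

weights   = params * 2  # bf16/fp16 weights: 2 bytes per parameter
gradients = params * 2  # bf16/fp16 gradients: 2 bytes per parameter
adam_m    = params * 4  # fp32 Adam first moment: 4 bytes per parameter
adam_v    = params * 4  # fp32 Adam second moment: 4 bytes per parameter
master    = params * 4  # fp32 master weights (mixed-precision training)

total_gb = (weights + gradients + adam_m + adam_v + master) / 1e9
print(f"~{total_gb:.0f} GB before activations")  # ~112 GB
# Activation memory and framework overhead push this toward the
# 150-195 GB range quoted above, depending on batch size and sequence length.
```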

2. Preparing the Dataset

You can either create your own dataset or use publicly available datasets from sources like the Hugging Face Hub. Data quality is one of the most important factors in fine-tuning.
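
For example, a minimal sketch with the Hugging Face datasets library (yahma/alpaca-cleaned is just one public instruction-tuning dataset, used here for illustration):

```python
# Load a public instruction-tuning dataset from the Hugging Face Hub.
from datasets import load_dataset

dataset = load_dataset("yahma/alpaca-cleaned", split="train")
print(dataset[0])  # {'instruction': ..., 'input': ..., 'output': ...}

# A basic quality pass: drop rows with empty or very short responses.
dataset = dataset.filter(lambda row: len(row["output"].strip()) > 20)
```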

3. Using Low-Code Tools

Tools like Axolotl simplify the fine-tuning process by offering pre-defined configurations for LoRA, QLoRA, and open-source LLMs. Axolotl requires minimal coding: clone the GitHub repository, follow the setup instructions, and you’ll be able to fine-tune any supported LLM with a simple trigger (a configuration sketch follows).
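
As a sketch of what such a configuration looks like, the Python below writes out an Axolotl-style YAML file. The field names follow Axolotl’s published example configs but should be checked against the repository’s documentation for your version:

```python
# Write an Axolotl-style QLoRA config as YAML (field names approximate;
# consult the Axolotl repository's example configs for your version).
import yaml

config = {
    "base_model": "NousResearch/Llama-2-7b-hf",  # illustrative base model
    "adapter": "qlora",                          # or "lora"
    "load_in_4bit": True,
    "lora_r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "lora_target_modules": ["q_proj", "v_proj"],
    "datasets": [{"path": "yahma/alpaca-cleaned", "type": "alpaca"}],
    "num_epochs": 3,
    "micro_batch_size": 2,
    "learning_rate": 2e-4,
    "lr_scheduler": "cosine",
}

with open("qlora.yml", "w") as f:
    yaml.safe_dump(config, f)

# Training is then triggered from the command line, e.g.:
#   accelerate launch -m axolotl.cli.train qlora.yml
```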

Understanding LoRA and QLoRA

LoRA is a technique that enables efficient fine-tuning of LLMs by freezing the pre-trained model weights and introducing trainable low-rank matrices. Only these low-rank matrices are updated during fine-tuning, while the original weights remain frozen. This reduces the number of trainable parameters and makes fine-tuning memory-efficient while maintaining high performance.

Image taken from [2]
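
The core idea fits in a few lines of numpy: the frozen weight matrix W (d × k) is adapted by adding the product of two small trainable matrices B (d × r) and A (r × k), scaled by alpha/r. The dimensions below are illustrative:

```python
# Minimal numpy sketch of a LoRA update: W + (alpha / r) * (B @ A).
import numpy as np

d, k, r = 4096, 4096, 8           # layer dimensions and LoRA rank (illustrative)
W = np.random.randn(d, k)         # frozen pre-trained weight

A = np.random.randn(r, k) * 0.01  # trainable, initialized small
B = np.zeros((d, r))              # trainable, zero-initialized so the
                                  # adapted model starts identical to the base
alpha = 16                        # lora_alpha scaling factor

W_adapted = W + (alpha / r) * (B @ A)  # effective weight used at inference

full_params = d * k
lora_params = d * r + r * k
print(f"trainable: {lora_params:,} vs full: {full_params:,} "
      f"({full_params // lora_params}x fewer)")
```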

Key parameters when fine-tuning with LoRA (a configuration sketch follows the list):

  1. lora_r: the rank of the low-rank decomposition matrices. A higher value captures more information (better performance) but increases memory usage. If you have a very complex dataset, consider setting this value higher.
  2. lora_alpha: a scaling factor that controls the impact of the LoRA weight updates on the original model weights. A lower value gives more weight to the original pre-trained behavior and preserves the model’s existing knowledge to a greater extent.
  3. lora_target_modules: determines which specific layers/matrices are trained. Typically, the query and value projection matrices in the self-attention mechanism are chosen, because they have the most impact on model adaptation.
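
These parameters map directly onto Hugging Face PEFT’s LoraConfig. A minimal sketch (gpt2 is again just a small illustrative base; the values are placeholders, not recommendations):

```python
# Attach LoRA adapters to a base model with Hugging Face PEFT.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # illustrative base

lora_config = LoraConfig(
    r=16,                       # lora_r: rank of the decomposition matrices
    lora_alpha=32,              # scaling factor for the LoRA updates
    target_modules=["c_attn"],  # GPT-2's fused attention projection;
                                # LLaMA-style models typically use
                                # ["q_proj", "v_proj"]
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # a tiny fraction of the total
```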

QLoRA enhances LoRA by applying the low-rank updates to a model that has been quantized to lower precision (typically 4-bit). This allows fine-tuning of large models with significantly reduced memory requirements, making it easier to work with limited resources.

Image taken from [3]

Key parameters in QLoRA (a loading sketch follows the list):

  1. load_in_4bit: loads the model in 4-bit precision for memory efficiency; can be set to either True or False. Similarly, there is load_in_8bit for 8-bit loading.
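
In the Hugging Face ecosystem, this parameter appears in transformers’ BitsAndBytesConfig when loading the base model for QLoRA (the model name below is illustrative):

```python
# Load a base model in 4-bit precision for QLoRA fine-tuning.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # the parameter described above
    bnb_4bit_quant_type="nf4",              # NormalFloat4, as in the QLoRA paper
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for stability
)

model = AutoModelForCausalLM.from_pretrained(
    "NousResearch/Llama-2-7b-hf",           # illustrative base model
    quantization_config=bnb_config,
    device_map="auto",                      # requires the accelerate package
)
# LoRA adapters (previous section) are then attached to this quantized model.
```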

In addition to the LoRA- and QLoRA-specific parameters, you’ll encounter common machine learning hyperparameters when fine-tuning, such as num_epochs, batch_size, optimizer, learning_rate, lr_scheduler, and wandb parameters for experiment tracking.
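
With the Hugging Face Trainer, these typically appear in TrainingArguments; a sketch with placeholder values (not tuned recommendations):

```python
# Common fine-tuning hyperparameters expressed as Hugging Face
# TrainingArguments (values are placeholders, not recommendations).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./ft-output",
    num_train_epochs=3,             # num_epochs
    per_device_train_batch_size=4,  # batch_size
    learning_rate=2e-4,             # learning_rate
    lr_scheduler_type="cosine",     # lr_scheduler
    optim="adamw_torch",            # optimizer
    report_to="wandb",              # Weights & Biases experiment tracking
    logging_steps=10,
)
```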

Fine-tuning LLMs is no longer a daunting task, thanks to no-code and low-code tools and techniques like LoRA and QLoRA. By understanding these core principles and training parameters, you can efficiently fine-tune any LLM to meet the specific needs of your application.

References:
[1] https://www.upstage.ai/blog/en/understanding-fine-tuning-of-large-language-models
[2] https://kim95175.tistory.com/28
[3] https://docs.nvidia.com/nemo-framework/user-guide/latest/sft_peft/qlora.html
