How to Fine-Tune Any Large Language Model (LLM)
Last Updated on January 29, 2025 by Editorial Team
Author(s): Pranjal Khadka
Originally published on Towards AI.
Fine-tuning large language models (LLMs) has become an easier task today thanks to the availability of low-code/no-code tools that let you simply upload your data, select a base model and obtain a fine-tuned model. However, it is important to understand the fundamentals before diving into these tools. In this article, we'll explore the entire process of fine-tuning LLMs in detail.
LLMs operate in two main stages: pre-training and fine-tuning.
1. Pre-training
During the pre-training phase, LLMs are exposed to massive datasets of text. This stage involves defining the model architecture, selecting the tokenizer and processing the data using the tokenizer's vocabulary. In autoregressive models like GPT and LLaMA, the model learns to predict the next word in a sentence. For encoder-only architectures like BERT, the model learns to predict missing words in a sentence. These two approaches are known as causal language modeling (CLM) and masked language modeling (MLM) respectively.
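The difference between the two objectives can be sketched in a few lines. This is an illustrative toy (word-level "tokens" instead of a real tokenizer), showing how CLM and MLM derive different training targets from the same sequence:

```python
# Illustrative sketch (not a real tokenizer): how CLM and MLM
# training targets differ for the same token sequence.
tokens = ["the", "cat", "sat", "on", "the", "mat"]

# Causal LM (GPT, LLaMA): predict the next token from the prefix.
clm_pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

# Masked LM (BERT): hide some tokens, predict them from full context.
masked_positions = [2, 4]  # positions chosen for illustration
mlm_input = [t if i not in masked_positions else "[MASK]"
             for i, t in enumerate(tokens)]
mlm_targets = {i: tokens[i] for i in masked_positions}

print(clm_pairs[0])  # (['the'], 'cat')
print(mlm_input)     # ['the', 'cat', '[MASK]', 'on', '[MASK]', 'mat']
```

In CLM every position becomes a prediction target given its prefix; in MLM only the masked positions are predicted, but with context from both sides.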
Most commonly, we use causal language modeling, where the model predicts the next word in the sequence based on the previous context. However, pre-trained models are general-purpose: they lack domain-specific knowledge and can't perform specialized tasks. This is where fine-tuning comes in.
2. Fine-tuning
Fine-tuning allows us to specialize a model's capabilities for a particular task by adjusting the model's parameters in a way that minimizes the task-specific loss. However, conducting a full fine-tuning process, which involves retraining all parameters of the model, can be computationally expensive.
Recently, advanced techniques like LoRA (Low-Rank Adaptation) and QLoRA (Quantized Low-Rank Adaptation) have emerged, making fine-tuning more efficient and accessible. The original LoRA paper demonstrated on GPT-3 175B that LoRA can reduce the number of trainable parameters by a factor of 10,000 and the GPU memory requirement by a factor of 3, while performing on par with or better than full fine-tuning, with higher training throughput and no additional inference latency.
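The parameter savings follow from simple arithmetic. A sketch with illustrative numbers (a hypothetical 4096-dimensional projection matrix, not figures from the paper):

```python
# Back-of-envelope arithmetic (illustrative sizes, not from the paper):
# LoRA replaces a full d_out x d_in weight update with two low-rank
# factors B (d_out x r) and A (r x d_in), so the trainable-parameter
# count per adapted matrix drops from d_out*d_in to r*(d_out + d_in).
d_in = d_out = 4096   # hypothetical hidden size of one projection matrix
r = 8                 # LoRA rank

full_params = d_out * d_in
lora_params = r * (d_out + d_in)

print(full_params)                 # 16777216
print(lora_params)                 # 65536
print(full_params // lora_params)  # 256x fewer trainable parameters
```

The ratio grows with the matrix size and shrinks with the rank, which is why LoRA pays off most on very large models with small ranks.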
Let's take a closer look at the fine-tuning process:
1. Training Compute Requirements
Full fine-tuning of 7B-parameter models like LLaMA 7B and Mistral 7B can require roughly 150–195 GB of GPU memory. Unless you have such hardware locally, you'll need to rent GPUs from cloud providers like AWS SageMaker or use hosted notebooks like Google Colab.
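A rough sense of where that memory goes can be sketched with a common rule of thumb for full fine-tuning with Adam in mixed precision (the 16-bytes-per-parameter figure is an assumed approximation, not a measurement, and excludes activations):

```python
# Rough rule-of-thumb estimate for full fine-tuning with Adam in mixed
# precision: ~2 bytes for fp16 weights, ~2 for fp16 gradients, and
# ~12 for fp32 optimizer state (moments + master weights), per
# parameter. Activations and framework overhead come on top.
params_billion = 7
bytes_per_param = 2 + 2 + 12   # weights + gradients + optimizer state
gb = params_billion * 1e9 * bytes_per_param / 1024**3
print(round(gb))  # ~104 GB before activations and overhead
```

Activations, gradient checkpointing choices and batch size account for the rest, which is how a 7B model ends up in the 150–195 GB range and why LoRA/QLoRA are so attractive.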
2. Preparing the Dataset
You can either create your own dataset or use publicly available datasets from sources like Hugging Face. High-quality data is an essential ingredient of successful fine-tuning.
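If you build your own dataset, a common approach is to store instruction–response pairs as JSONL, a format most fine-tuning tools can ingest. The field names below (`instruction`, `input`, `output`) are a widespread convention rather than a fixed standard, and the examples are placeholders:

```python
import json

# A minimal sketch of preparing an instruction-tuning dataset as JSONL:
# one JSON object per line, with instruction/input/output fields.
examples = [
    {"instruction": "Summarize the text.", "input": "LLMs are...", "output": "A short summary."},
    {"instruction": "Translate to French.", "input": "Hello", "output": "Bonjour"},
]

lines = [json.dumps(ex) for ex in examples]
with open("train.jsonl", "w") as f:
    f.write("\n".join(lines))

print(len(lines))  # 2
```

Whatever schema you pick, keep it consistent across the whole dataset so the fine-tuning tool's prompt template can be applied uniformly.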
3. Using Low-Code Tools
Tools like Axolotl simplify the fine-tuning process by offering pre-defined configurations for LoRA, QLoRA and open-source LLMs. Axolotl requires minimal coding: you just need to clone the GitHub repository and follow the setup instructions, and you'll be able to fine-tune any supported LLM with a single command.
Understanding LoRA and QLoRA
LoRA is a technique that allows efficient fine-tuning of LLMs by freezing the pre-trained model weights and introducing low-rank matrices. These low-rank matrices are fine-tuned while the original model weights remain frozen. This approach reduces the number of trainable parameters and makes the fine-tuning process memory efficient while maintaining high performance.
Key parameters when fine-tuning with LoRA:
- lora_r: the rank of the low-rank decomposition matrices. A higher value captures more information (potentially better performance) but increases memory usage. If you have a very complex dataset, consider setting this value higher.
- lora_alpha: a scaling factor that controls the impact of the LoRA weight updates on the original model weights. A lower value gives more weight to the pre-trained weights, preserving the model's existing knowledge to a greater extent.
- lora_target_modules: determines which specific layers/matrices are trained. Typically, the query and value projection matrices in the self-attention mechanism are chosen because they have the most impact on model adaptation.
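In practice these knobs end up together in a tool's config. A hedged sketch as a plain Python dict (the exact key names vary between tools such as Axolotl and Hugging Face PEFT; the values here are common illustrative defaults, not recommendations):

```python
# Illustrative LoRA configuration mirroring the parameters above.
lora_config = {
    "lora_r": 16,                  # rank of the decomposition matrices
    "lora_alpha": 32,              # scaling factor for the LoRA updates
    "lora_target_modules": ["q_proj", "v_proj"],  # attention Q/V projections
}

# Many implementations scale the low-rank update by alpha / r, so these
# two values interact: doubling alpha at fixed r doubles the update's weight.
effective_scale = lora_config["lora_alpha"] / lora_config["lora_r"]
print(effective_scale)  # 2.0
```

A common starting heuristic is to set `lora_alpha` to around twice `lora_r` and tune from there.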
QLoRA enhances LoRA by applying low-rank updates to a model that has been quantized to lower precision. This allows fine-tuning large models with significantly reduced memory requirements, making it easier to work with limited resources.
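The quantization idea behind QLoRA can be illustrated with a simple round trip. Real QLoRA uses the NF4 data type with per-block scaling; the symmetric scheme below is a simplified sketch for intuition only:

```python
# Illustrative 4-bit quantize/dequantize round trip: frozen weights are
# stored as small integers plus one scale, and only the LoRA factors
# stay in full precision. (Simplified vs. real QLoRA's NF4 format.)
def quantize_4bit(weights):
    scale = max(abs(w) for w in weights) / 7   # signed 4-bit range: -8..7
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.2, -0.4, 0.1, 0.7]
q, scale = quantize_4bit(weights)
approx = dequantize(q, scale)
print(q)       # small integers in [-8, 7]
print(approx)  # close to the originals, with quantization error
```

The memory win is that each frozen weight now needs 4 bits instead of 16 or 32, at the cost of a small, bounded rounding error.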
Key parameters in QLoRA:
- load_in_4bit: loads the model in 4-bit precision for memory efficiency; can be set to either True or False. Similarly, load_in_8bit loads the model in 8-bit precision.
In addition to the LoRA- and QLoRA-specific parameters, you'll encounter the usual machine learning hyperparameters when fine-tuning, such as num_epochs, batch_size, optimizer, learning_rate, lr_scheduler, and wandb parameters for experiment tracking.
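One of these hyperparameters, the learning-rate schedule, is worth seeing concretely. A sketch of a common choice, linear warmup followed by cosine decay (the values are illustrative; fine-tuning tools typically expose this via an `lr_scheduler` option):

```python
import math

# Linear warmup to base_lr over warmup_steps, then cosine decay to 0.
def lr_at(step, total_steps, base_lr=2e-4, warmup_steps=100):
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1 + math.cos(math.pi * progress))

total = 1000
print(lr_at(50, total))    # mid-warmup: half the base rate
print(lr_at(100, total))   # warmup done: full base rate
print(lr_at(1000, total))  # end of training: decayed to ~0
```

Warmup avoids destabilizing the pre-trained weights with large early updates, and the decay lets the model settle into a minimum.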
Fine-tuning LLMs is no longer a daunting task thanks to no-code and low-code tools and techniques like LoRA and QLoRA. By understanding these core principles and training parameters, you can efficiently fine-tune any LLM to meet the specific needs of your application.