Fine-Tune LLMs with Unsloth
Last Updated on October 31, 2024 by Editorial Team
Author(s): Barhoumi Mosbeh
Originally published on Towards AI.
Why Fine-Tune When We Have RAG?
It's a question I see a lot: with RAG (Retrieval-Augmented Generation) becoming increasingly popular, why bother with fine-tuning at all? While RAG is fantastic for many use cases, fine-tuning still has its place in the ML toolkit.
Here's why: fine-tuning allows you to fundamentally alter how your model "thinks" about specific domains. While RAG provides context at inference time, fine-tuning builds domain expertise directly into the model's weights. This is particularly powerful when you need:
- Consistent domain-specific behavior
- Faster inference (no need to search through external documents)
- Specialized knowledge that's difficult to capture in reference documents
Plus, there's a compelling cost argument: you can fine-tune smaller models for specific tasks and achieve performance comparable to much larger models at a fraction of the hosting cost.
I found this Reddit discussion on the topic helpful; have a look (the link is in the Resources section below).
Enter Unsloth: Making Fine-Tuning Accessible
Training times have always been one of the biggest barriers to fine-tuning. That's where Unsloth comes in: a new optimization framework that claims to make LLM training up to 30x faster.
The secret to Unsloth's efficiency lies in deep optimization. While PyTorch and Transformers are built for flexibility across different architectures, Unsloth takes a more focused approach: it combines techniques like QLoRA and custom Triton kernels with architecture-specific optimizations to squeeze maximum performance out of the training process.
Hands-on: Fine-Tuning for SQL Generation
Let's put this into practice by fine-tuning a model to generate SQL queries. We'll use Llama-3.2-3B, a 3-billion-parameter model that strikes a good balance between capability and resource requirements.
First, it's important to find a good dataset. Choosing the right data is crucial because a small language model trained on data relevant to the task at hand can actually outperform much larger general-purpose models. Our goal is a small, fast LLM that generates SQL queries from table context.
One of the most significant datasets for this purpose is Synthetic Text to SQL, which contains over 105,000 records with columns for the SQL prompt, the table context, the generated SQL, an explanation, a complexity rating, and more.
Here is the link to the dataset: Synthetic Text to SQL (https://huggingface.co/datasets/gretelai/synthetic_text_to_sql)
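If you'd like a quick look at the schema before writing any formatting code, here is a minimal sketch (the printed column names are the ones we rely on later; the dataset has several more):
from datasets import load_dataset

# Quick peek at the dataset's splits and columns before any formatting.
ds_preview = load_dataset("gretelai/synthetic_text_to_sql")
print(ds_preview)                        # splits and row counts
print(ds_preview["train"].column_names)  # includes sql_prompt, sql_context, sql, sql_explanation, ...
print(ds_preview["train"][0]["sql_prompt"])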
Setting Up the Environment
First, let's install the necessary packages. We'll need to manage our PyTorch installation carefully:
%%capture
!pip install pip3-autoremove
!pip-autoremove torch torchvision torchaudio -y
!pip install "torch==2.4.0" "xformers==0.0.27.post2" triton torchvision torchaudio
!pip install "unsloth[kaggle-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install datasets
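Before loading anything, it's worth confirming that the pinned PyTorch build is in place and that a CUDA GPU is visible, since everything below assumes one. A quick check:
import torch

print(torch.__version__)  # should report 2.4.0 from the pinned install above
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. a T4 on free Colab/Kaggle tiers
else:
    print("No CUDA GPU detected - the 4-bit fine-tuning below requires one")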
Loading the Model
Now we'll load our base model using Unsloth's optimized loader:
from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Llama-3.2-3B-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
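Optionally, you can sanity-check the base model before any training. Here is a minimal sketch using Unsloth's inference mode (the prompt is just an arbitrary example):
# Optional sanity check: see what the raw base model produces before any training.
FastLanguageModel.for_inference(model)  # enable Unsloth's fast inference mode

inputs = tokenizer(
    "Write a SQL query that counts the rows in a table named users.",
    return_tensors="pt",
).to("cuda")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

FastLanguageModel.for_training(model)  # switch back before fine-tuning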
Setting Up PEFT
We'll load the PEFT (Parameter-Efficient Fine-Tuning) model, which uses LoRA (Low-Rank Adaptation) adapters. If you're not familiar with these terms, don't worry: LoRA adapters let us update only 1-10% of the model's parameters during fine-tuning. Without them, we'd have to retrain the entire model, which would be far more time-consuming, computationally intensive, and expensive. Unsloth provides the recommended settings below for optimal performance; we'll use the default configuration for this tutorial, but feel free to explore and adjust these parameters based on your needs.
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,  # Choose any number > 0! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0,  # Supports any, but = 0 is optimized
    bias = "none",     # Supports any, but = "none" is optimized
    use_gradient_checkpointing = "unsloth",  # 4x longer contexts auto supported!
    random_state = 3407,
    use_rslora = False,   # We support rank stabilized LoRA
    loftq_config = None,  # And LoftQ
)
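To see how little of the model LoRA actually touches, you can print the trainable parameter count; get_peft_model returns a PEFT-style model, so the standard PEFT helper should work:
# Optional: confirm how small a fraction of the 3B weights LoRA trains.
model.print_trainable_parameters()
# Prints something like: trainable params: ... || all params: ... || trainable%: ...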
Data
Now, this is where things can get a little tricky depending on which dataset you're using. Every dataset has its own schema, but they all need to be converted into a single format the language model can learn from. For this tutorial we'll use the Alpaca prompt template, a common choice for instruction fine-tuning. It looks like this:
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
[The task or question you want the model to perform/answer]
### Input:
[Additional context or information needed to complete the task. This can be empty if the instruction is self-contained]
### Response:
[The expected output or answer you want the model to learn]
For our SQL project, we're specifically interested in four columns:
- The SQL query prompt (sql_prompt), which becomes the instruction
- The table context (sql_context), which becomes the input
- The generated SQL code (sql)
- The explanation of the code (sql_explanation)
The last two are concatenated to form the response.
from datasets import Dataset, load_dataset
# Define the prompt template with variables matching the loop content
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Input:
{input}
### Response:
{response}"""
# Set the EOS token (assuming the tokenizer is already defined)
EOS_TOKEN = tokenizer.eos_token
# Formatting function to apply the prompt template to the dataset
def formatting_prompts_func(examples):
    company_databases = examples["sql_context"]
    prompts = examples["sql_prompt"]
    sqls = examples["sql"]
    explanations = examples["sql_explanation"]
    texts = []
    for company_database, prompt, sql, explanation in zip(company_databases, prompts, sqls, explanations):
        # Substitute the correct placeholders
        text = alpaca_prompt.format(
            instruction=prompt,
            input=company_database,
            response=sql + " " + explanation
        ) + EOS_TOKEN
        texts.append(text)
    return {"text": texts}  # Ensure the formatted text is returned as a "text" field
# Load dataset and map formatting function to add prompts
ds = load_dataset("gretelai/synthetic_text_to_sql")
formatted_ds = ds.map(formatting_prompts_func, batched=True) # Apply formatting
# Select the 'train' split from the formatted dataset
train_dataset = formatted_ds['train']
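It's worth printing one formatted record to confirm the template and EOS token were applied as expected:
# Sanity check: inspect the first formatted training example (truncated for readability).
print(train_dataset[0]["text"][:500])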
Training Configuration
There are a lot of parameters here, and each one could be described at length. For example, max_steps sets how many training steps to perform; seed fixes the random number generator so results are reproducible; and warmup_steps gradually increases the learning rate at the start of training.
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported
from trl import SFTTrainer
# Trainer setup
trainer = SFTTrainer(
    model=model,                    # Ensure model is defined
    tokenizer=tokenizer,            # Ensure tokenizer is defined
    train_dataset=train_dataset,    # The 'train' split from formatted_ds
    dataset_text_field="text",      # The field we created with formatted prompts
    max_seq_length=max_seq_length,  # Ensure max_seq_length is defined
    dataset_num_proc=2,
    packing=False,                  # packing=True can make training ~5x faster for short sequences
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        max_steps=60,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=1,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
        report_to="none",  # Disable WANDB logging
    ),
)
Now that we have everything set up, let's run the training. And that's it!
trainer_stats = trainer.train()
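Once training finishes, you'll typically want to save the LoRA adapters and try the model on a fresh prompt. Here is a minimal sketch that reuses the alpaca_prompt template from earlier (the question and schema are made-up examples):
# Save only the lightweight LoRA adapter weights plus the tokenizer.
model.save_pretrained("lora_model")
tokenizer.save_pretrained("lora_model")

# Switch to inference mode and reuse the Alpaca template, leaving the response empty.
FastLanguageModel.for_inference(model)
prompt = alpaca_prompt.format(
    instruction="How many orders were placed in 2023?",      # hypothetical question
    input="CREATE TABLE orders (id INT, order_date DATE);",  # hypothetical schema
    response="",  # the model generates this part
)
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))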
Resources
- Code on Colab: https://colab.research.google.com/drive/1BHuj-8mA8lvxJQyQsCyp-7vlGkXuRxKY?usp=sharing
- Unsloth on Hugging Face: https://huggingface.co/unsloth
- Reddit discussion on Unsloth: https://www.reddit.com/r/LocalLLaMA/comments/1ar7e4m/unsloth_whats_the_catch_seems_too_good_to_be_true/