
Fine-Tuning 101: Unlocking the Power of AI Customization

Last Updated on March 10, 2025 by Editorial Team

Author(s): Dhruv Tiwari

Originally published on Towards AI.

Fine-tuning is the process of further training a pre-trained model on specific examples to shape its responses in a desired way. It is best suited to changing style and behavior; it should not be relied on to teach the model new factual knowledge.

1. What are the methods for fine-tuning?

There are three main methods for fine-tuning at the moment:

  1. Full parameter training
  2. Low Rank Adaptation (LoRA)
  3. Quantized LoRA (QLoRA)

1. Full Parameter Fine-Tuning

Full-parameter fine-tuning is a method of adapting a pre-trained model to a specific task by updating all of its parameters. Unlike parameter-efficient tuning techniques (such as LoRA), which modify only a subset of parameters, full fine-tuning allows the model to learn task-specific knowledge comprehensively.
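
To get a feel for why this is expensive, here is a rough back-of-the-envelope sketch of the weight-related memory needed to fully fine-tune an 8-billion-parameter model with mixed-precision Adam (illustrative numbers only; the exact breakdown varies by setup):

# Rough memory estimate for full-parameter fine-tuning with Adam.
# Illustrative only: real usage also depends on activations, sequence length, and batch size.
def full_finetune_memory_gb(num_params: float) -> float:
    bytes_per_param = (
        2    # FP16/BF16 weights
        + 2  # FP16/BF16 gradients
        + 4  # FP32 master copy of the weights
        + 8  # Adam optimizer states (two FP32 moments)
    )
    return num_params * bytes_per_param / 1e9

print(f"~{full_finetune_memory_gb(8e9):.0f} GB for an 8B-parameter model")
# ~128 GB of GPU memory before activations are even counted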

2. Low-Rank Adaptation (LoRA)

LoRA is a technique designed to efficiently fine-tune large models such as Llama 3 8B, which has 8 billion parameters. Fully training such a model is extremely GPU- and memory-intensive because all of the parameters, their gradients, and the optimizer states must be held in memory.

How does LoRA work?

Let’s say we have a weight matrix W of shape (200,200), which means it has:

200 x 200 = 40,000 trainable parameters

Instead of updating this entire matrix, LoRA freezes W and introduces two much smaller matrices:

  • A of shape (200,1) → 200 parameters
  • B of shape (1,200) → 200 parameters

This gives a total of 200+200 = 400 trainable parameters

Low-Rank Approximation

  • Instead of updating 40,000 parameters, we only train 400 parameters.
  • However, the matrix multiplication A x B reconstructs a full-sized (200,200) matrix.

A x B = (200,1) x (1,200) = (200,200)

Now what?

We train and store only 400 new parameters, yet during the forward pass the model still works with a full (200,200) update: the product A x B is added to the frozen weight W, so the effective weight becomes W + A x B.

Because gradients and optimizer states are needed only for those 400 parameters rather than all 40,000, memory usage drops significantly and fine-tuning becomes much faster.
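
To make this concrete, here is a minimal PyTorch sketch of the idea using the (200,200) example above (illustrative only; the real implementation lives in the PEFT library used later):

import torch

torch.manual_seed(0)

W = torch.randn(200, 200)                      # frozen pre-trained weights (40,000 params)
W.requires_grad_(False)

A = torch.randn(200, 1, requires_grad=True)    # 200 trainable params
B = torch.zeros(1, 200, requires_grad=True)    # 200 trainable params (starts at zero, so the initial update is zero)

x = torch.randn(32, 200)                       # a batch of inputs

# Forward pass: the frozen weights plus the low-rank update A @ B.
# Only A and B receive gradients during training.
y = x @ (W + A @ B).T

trainable = sum(p.numel() for p in (A, B))
print(trainable)                               # 400 instead of 40,000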

3. Quantized LoRA

It’s straightforward: we quantize the frozen base model to 4 bits and then apply LoRA on top of it.

Final model = 4-bit quantization + LoRA

What does this help with?

Although LoRA reduces the number of trainable parameters, we still have to load all of the base model's parameters into the GPU for training. By quantizing the weights to 4 bits, we shrink their memory footprint by roughly 4x compared to 16-bit weights (and about 8x compared to FP32).

What is Quantization?

Normally, a model's parameters are stored in FP32 (32-bit floating point), which preserves precision during training. With quantization, we reduce each weight to 4 bits using the NF4 (4-bit NormalFloat) data type. This costs some precision, but the loss is small since the parameters are already trained; in QLoRA the quantized weights stay frozen and only the LoRA adapters are updated.
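
A quick sanity check on the numbers (weights only; it ignores the small per-block quantization constants NF4 stores):

num_params = 8e9  # Llama 3 8B

for name, bits in [("FP32", 32), ("FP16/BF16", 16), ("NF4", 4)]:
    gb = num_params * bits / 8 / 1e9
    print(f"{name}: ~{gb:.0f} GB")

# FP32: ~32 GB, FP16/BF16: ~16 GB, NF4: ~4 GB -- the 4-bit model fits on a single consumer GPU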

Now let’s get started with the practical implementation.

2. The Process of Fine-Tuning an LLM

Libraries that are used:

  • Datasets – loading the dataset
  • Transformers – loading the LLM and the tokenizer
  • PEFT – LoRA configuration
  • TRL – training the model with supervised fine-tuning (SFT)
  • Optional: Unsloth – faster fine-tuning

We will be working with the SocraticChat dataset.

from datasets import load_dataset
dataset = load_dataset('FreedomIntelligence/SocraticChat',split='train[0:500]')

This loads only the first 500 rows of the training split.
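
If you want to see what a row looks like (optional), a quick peek shows the structure the formatting function below relies on: a 'conversations' list whose turns have 'from' and 'value' fields.

# Inspect one conversation turn from the first row
print(dataset[0]['conversations'][0])
# e.g. {'from': 'human', 'value': '...'}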

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import setup_chat_format

# 4-bit (NF4) quantization settings for loading the base model
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

# Load Llama 3 8B in 4-bit precision, spreading layers across available devices
model = AutoModelForCausalLM.from_pretrained(
    'meta-llama/Meta-Llama-3-8B',
    quantization_config=bnb_config,
    device_map="auto",
    attn_implementation='eager',
)
tokenizer = AutoTokenizer.from_pretrained('meta-llama/Meta-Llama-3-8B')

# Configure the model and tokenizer for the ChatML conversation format
model, tokenizer = setup_chat_format(model, tokenizer)

The model and tokenizer are loaded, and bitsandbytes is used to quantize the Llama 3 8B model to 4 bits.

setup_chat_format is a function imported from TRL that prepares the model and tokenizer to use the ChatML conversation format.
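
As an optional check, you can confirm how much GPU memory the quantized weights take using a standard Transformers utility:

# Weight memory of the 4-bit model (in GB)
print(f"{model.get_memory_footprint() / 1e9:.1f} GB")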

from peft import LoraConfig, get_peft_model

# LoRA configuration: rank-8 adapters on the attention and MLP projection layers
peft_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=['up_proj', 'down_proj', 'gate_proj', 'k_proj', 'q_proj', 'v_proj', 'o_proj'],
)

# Wrap the base model so only the LoRA adapter weights are trainable
model = get_peft_model(model, peft_config)

This code converts the standard model into a LoRA model by attaching the two small adapter matrices to each of the target layers.

  • r=8 – This defines the rank of the adapter matrices. In the earlier example, the (200,1) and (1,200) matrices have rank 1.
  • target_modules – Specifies where the LoRA layers will be added. These layers are the ones that will be fine-tuned.
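
Once the model is wrapped, a quick optional check with PEFT's built-in utility shows how few parameters are actually trainable (the exact count depends on the model and the rank):

# Print trainable vs. total parameter counts
model.print_trainable_parameters()
# e.g. roughly 21M trainable out of ~8B total (about 0.3%) for r=8 on Llama 3 8B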

Next, we convert each conversation into chat messages and apply the chat template:

def formatting_prompts_func(example):
    # Convert each turn into a {'role': ..., 'content': ...} message
    messages = []
    for converse in example['conversations']:
        role = 'assistant' if converse['from'] == 'gpt' else 'user'
        messages.append({'role': role, 'content': converse['value']})
    # Render the messages into a single ChatML-formatted string
    example['text'] = tokenizer.apply_chat_template(messages, tokenize=False)
    return example

dataset = dataset.map(formatting_prompts_func, num_proc=4)

This maps over all 500 conversations, converting each one into a list of {'role': ..., 'content': ...} messages and storing the chat-templated result in a new text column.
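
For reference, after setup_chat_format each formatted conversation in the text column looks roughly like this ChatML layout (placeholder content shown, not an actual row from the dataset):

<|im_start|>user
...question text...<|im_end|>
<|im_start|>assistant
...Socratic reply...<|im_end|>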

from trl import SFTTrainer, SFTConfig

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    peft_config=peft_config,
    args=SFTConfig(
        dataset_text_field="text",       # column produced by formatting_prompts_func
        per_device_train_batch_size=1,
        gradient_accumulation_steps=2,   # effective batch size of 2 per device
        warmup_steps=5,
        num_train_epochs=1,
        learning_rate=2e-4,
        fp16=False,
        bf16=True,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="model_training_outputs",
        report_to="wandb",               # log training metrics to Weights & Biases
        max_seq_length=512,
        dataset_num_proc=4,
        packing=False,
    ),
)

Several hyperparameters determine how well the model trains. You don't need to worry about them too much to start with, as the values here follow the official documentation and common defaults.

Key Hyperparameters (Abstract Overview):

  • peft_config – The LoRA configuration we created.
  • per_device_train_batch_size – The number of training examples per batch on each GPU. Increase this if you have a more powerful GPU for faster training.
  • num_train_epochs – Defines how many times the model trains over the dataset. You can use fractional values (e.g., 0.1 for a quick run) or increase it (e.g., 2 for better accuracy), but be mindful of overfitting.
  • max_seq_length – The maximum number of tokens per training example; longer sequences are truncated.

And finally, to start training, we run

import wandb

trainer.train()                  # run supervised fine-tuning
wandb.finish()                   # close the Weights & Biases run
model.config.use_cache = True    # re-enable the KV cache for inference

To evaluate our fine-tuned model, we use Weights & Biases for a comprehensive training report. After saving the adapter with trainer.model.save_pretrained(new_model), where new_model is whatever directory name you choose, we can test it using the following code:

new_model = "llama-3-8b-socratic"    # example directory name for the LoRA adapter; use any name you like
trainer.model.save_pretrained(new_model)

messages = [{"role": "user", "content": "What is the sum of 2+2"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False,
                                       add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors='pt', padding=True,
                   truncation=True).to("cuda")
outputs = model.generate(**inputs, max_length=150,
                         num_return_sequences=1)
text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(text.split("assistant")[1])

With this, our fine-tuned model is ready! 🎉 We’ve successfully fine-tuned a model and gained hands-on experience with the fundamentals of AI.
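
If you want to reload the fine-tuned adapter later for inference, one minimal sketch (assuming the adapter was saved under new_model as above) is to rebuild the base model, re-apply the chat format, and attach the adapter:

from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import setup_chat_format
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained('meta-llama/Meta-Llama-3-8B', device_map="auto")
tokenizer = AutoTokenizer.from_pretrained('meta-llama/Meta-Llama-3-8B')
base, tokenizer = setup_chat_format(base, tokenizer)     # add the ChatML tokens again
model = PeftModel.from_pretrained(base, new_model)       # attach the saved LoRA weights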

And here's the final repo: GitHub – dhruv1710/FinetuneSocraticLLM, a project for fine-tuning Llama 3 8B on the SocraticChat dataset.

Conclusion

While new AI technologies continue to emerge, mastering fundamentals such as fine-tuning is crucial for applying AI effectively. In this article, we successfully fine-tuned Llama 3 to perform Socratic questioning. The same technique can be leveraged for high-impact applications, such as fine-tuning models on radiological data to improve disease detection accuracy.

Keep Learning!

About Me


I’m Dhruv, a 17-year-old builder passionate about applied AI. I’ve been coding since I was 7, dropped out of high school to pursue startups, and have built several AI-driven projects. I love working with technology and sharing insights on cutting-edge AI topics. Follow me for in-depth, knowledge-packed articles on the key topics of AI!


Published via Towards AI
