Fine-Tuning Embedding Models: Achieving More with Less
Author(s): Nilesh Raghuvanshi
Originally published on Towards AI.
Improving Retrieval Augmented Generation (RAG) Systematically
Introduction
In my last article, we saw that, while evaluating multiple embedding models on our domain-specific data, the huggingface/BAAI/bge-large-en-v1.5 model (1024 dimensions) showed competitive performance. It was comparable to azure/text-embedding-3-large (3072 dimensions) and azure/text-embedding-3-small (1536 dimensions). What made it more interesting was its flexibility for fine-tuning on domain-specific data using the sentence-transformers library.
Choosing the Right Model for Fine-Tuning
As you may know, the BGE family of models comes in multiple sizes (large, base, and small), each with a different parameter count and memory footprint. The large model suits high-resource environments because of its parameter count and memory requirements, while the base and small models are more practical in resource-constrained scenarios. After some initial exploration, I ran my fine-tuning experiments with the base model, BAAI/bge-base-en-v1.5, because it offered a good balance between resource efficiency and performance for the compute I had available. The base model has 109 million parameters, uses 0.41 GB of memory, and outputs 768 dimensions. In comparison, the large model, BAAI/bge-large-en-v1.5, has 335 million parameters, uses 1.25 GB of memory, and outputs 1024 dimensions. The base model's smaller size made it more practical on my GPU (NVIDIA A40 with 48 GB VRAM) and allowed faster iterations within the available memory.
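As a quick sanity check on those memory figures, here is a minimal sketch of the arithmetic. It assumes each parameter is stored as a 32-bit float (4 bytes) and that "GB" means GiB; both are my assumptions, and the helper name is made up for illustration.

```python
def weights_memory_gib(num_parameters: int, bytes_per_parameter: int = 4) -> float:
    """Approximate memory needed just to hold the model weights, in GiB."""
    return num_parameters * bytes_per_parameter / 2**30


# Parameter counts quoted in the paragraph above.
base = weights_memory_gib(109_000_000)
large = weights_memory_gib(335_000_000)

print(f"bge-base-en-v1.5:  {base:.2f} GiB")   # 0.41 GiB
print(f"bge-large-en-v1.5: {large:.2f} GiB")  # 1.25 GiB
```

This covers the weights only. Training needs considerably more memory for gradients, optimizer state, and activations, which is why the smaller model leaves more headroom for batch size.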
Matryoshka Representation Learning (MRL)
One notable feature of the sentence-transformers library is its support for Matryoshka Representation Learning (MRL). MRL trains an embedding model so that its embeddings can be shortened to smaller dimensions without a significant loss in performance. Smaller embeddings improve computational efficiency and lower memory requirements, which is particularly useful when deploying models in resource-constrained environments. For this evaluation, I experimented with embedding dimensions of [768, 512, 256, 128, 64]. The latest OpenAI embedding models, azure/text-embedding-3-large and azure/text-embedding-3-small, also support MRL, which makes this an exciting area for comparison.
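To make the idea concrete, here is a minimal sketch of how a Matryoshka-style embedding is typically shortened at query time: keep the first few values and rescale to unit length. The 8-dimensional toy vectors stand in for real 768-dimensional outputs, and the function names are hypothetical.

```python
import math


def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` values and rescale the result to unit length."""
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero vector")
    return [x / norm for x in head]


def cosine(a, b):
    """Dot product of two unit-length vectors, which equals their cosine."""
    return sum(x * y for x, y in zip(a, b))


# Toy vectors (made up), standing in for model outputs.
query = [0.9, 0.1, 0.3, -0.2, 0.05, 0.0, -0.1, 0.02]
doc = [0.8, 0.2, 0.25, -0.1, -0.05, 0.1, 0.0, 0.03]

for dim in (8, 4, 2):
    q = truncate_and_normalize(query, dim)
    d = truncate_and_normalize(doc, dim)
    print(f"dim={dim}: cosine={cosine(q, d):.3f}")
```

With a model trained using MRL, the leading dimensions carry most of the signal, so similarity scores stay useful after truncation. The sentence-transformers library exposes a truncate_dim option on SentenceTransformer for the same purpose.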
Training
If you remember, we used only 20% of the generated synthetic dataset for evaluation, which gave us a representative test sample while keeping computational requirements manageable. The remaining 80% was reserved for fine-tuning the embedding model, providing ample training data and helping the model generalize. For training, we used a combination of MatryoshkaLoss and MultipleNegativesRankingLoss as the loss function. MatryoshkaLoss helps the model learn embeddings at multiple granularities. MultipleNegativesRankingLoss optimizes the model to produce similar embeddings for positive sentence pairs and dissimilar embeddings for negative pairs, where the other examples in the same batch serve as the negatives. By wrapping MultipleNegativesRankingLoss in MatryoshkaLoss, one can train a model whose embeddings are both dimensionally flexible and semantically robust, so multiple embedding sizes can be used while maintaining high performance in tasks that require precise semantic understanding. Finally, we used the AdamW optimizer with a learning rate of 2e-5 and trained for 10 epochs. This number of epochs was chosen as a balance between training time and model performance, giving the model enough time to learn while limiting the risk of overfitting.
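The following is a simplified, pure-Python sketch of what this combined loss computes on a batch of fixed vectors. It is not the sentence-transformers implementation: the scale factor of 20, the equal weighting of every dimension, and the toy vectors are assumptions made for illustration, and no gradients are computed.

```python
import math


def _unit(vec):
    """Rescale a vector to unit length so dot products are cosines."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def ranking_loss(queries, positives, scale=20.0):
    """In-batch-negatives loss: positives[i] matches queries[i], and every
    other positive in the batch acts as a negative for queries[i]."""
    q = [_unit(v) for v in queries]
    p = [_unit(v) for v in positives]
    total = 0.0
    for i, qi in enumerate(q):
        scores = [scale * sum(a * b for a, b in zip(qi, pj)) for pj in p]
        # Cross-entropy with the matching positive as the target class
        # (log-sum-exp computed in a numerically stable way).
        max_s = max(scores)
        log_z = max_s + math.log(sum(math.exp(s - max_s) for s in scores))
        total += log_z - scores[i]
    return total / len(q)


def matryoshka_loss(queries, positives, dims):
    """Sum the ranking loss over truncated prefixes of the embeddings."""
    return sum(
        ranking_loss([v[:d] for v in queries], [v[:d] for v in positives])
        for d in dims
    )


# Toy 4-dimensional batch (made up): positives[i] belongs to queries[i].
queries = [[1.0, 0.0, 0.2, 0.1], [0.0, 1.0, 0.1, 0.3]]
positives = [[0.9, 0.1, 0.2, 0.0], [0.1, 0.8, 0.0, 0.3]]
swapped = positives[::-1]  # deliberately mismatched pairs

print("aligned pairs:", matryoshka_loss(queries, positives, dims=(4, 2)))
print("swapped pairs:", matryoshka_loss(queries, swapped, dims=(4, 2)))
```

The loss is low when each query is closest to its own positive at every truncated size and high when the pairs are mismatched, which is the signal that pushes the leading dimensions to carry meaning on their own.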
Evaluation and Results
The model was fine-tuned on the training set. We then evaluated it using only the test queries, searched against the entire corpus (both training and test data), with InformationRetrievalEvaluator from the sentence-transformers library. By contrast, the evaluation of multiple embedding models in my last article used the test set only, that is, both the queries and the corpus came from the test dataset. Next, we compare the performance of the base model BAAI/bge-base-en-v1.5 with the newly fine-tuned version.
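The evaluation setup described above (test queries only, full corpus) can be sketched as follows. The row fields id, question, and context are hypothetical, since the article does not show the dataset layout; the three dictionaries follow the shapes that, as far as I know, InformationRetrievalEvaluator accepts: query id to text, document id to text, and query id to the set of relevant document ids.

```python
def build_ir_inputs(train_rows, test_rows):
    """Build retrieval-evaluation inputs: queries come from the test split
    only, while the corpus contains documents from both splits."""
    corpus = {row["id"]: row["context"] for row in train_rows + test_rows}
    queries = {row["id"]: row["question"] for row in test_rows}
    # Each synthetic question is assumed to have exactly one relevant
    # document: the context it was generated from.
    relevant_docs = {row["id"]: {row["id"]} for row in test_rows}
    return queries, corpus, relevant_docs


# Made-up rows for illustration.
train_rows = [
    {"id": "t1", "question": "What is A?", "context": "A is ..."},
    {"id": "t2", "question": "What is B?", "context": "B is ..."},
]
test_rows = [
    {"id": "e1", "question": "What is C?", "context": "C is ..."},
]

queries, corpus, relevant_docs = build_ir_inputs(train_rows, test_rows)
print(len(queries), "queries searched against", len(corpus), "documents")
```

Searching against the larger combined corpus makes the task harder than a test-only corpus, because every training document becomes an additional distractor.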
Comparison of Base and Fine-Tuned Models
To visualize the results, we compared the base and fine-tuned models. Note that the base model does not support MRL. In the visualization, the first gray bar represents the base model at 768 dimensions, while the green bars represent the fine-tuned model at 768, 512, 256, 128, and 64 dimensions. Interestingly, the fine-tuned model at 64 dimensions (last green bar) outperformed the base model at 768 dimensions across all metrics. The fine-tuned model performed best at 512 and 256 dimensions, showing the strength of MRL. Fine-tuning for just 10 epochs on a domain-specific dataset led to an 8% improvement in NDCG@10.
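For readers unfamiliar with the headline metric, here is a minimal sketch of NDCG@k. It assumes binary relevance (a document is either relevant or not); the evaluator used in the article may handle details differently, and the document ids are made up.

```python
import math


def ndcg_at_k(ranked_ids, relevant_ids, k=10):
    """NDCG@k with binary relevance: a hit at 0-based position `rank`
    contributes 1 / log2(rank + 2), normalized by the best possible score."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, doc_id in enumerate(ranked_ids[:k])
        if doc_id in relevant_ids
    )
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


relevant = {"d3"}
print(ndcg_at_k(["d3", "d1", "d2"], relevant))  # relevant doc ranked first
print(ndcg_at_k(["d1", "d3", "d2"], relevant))  # relevant doc ranked second
print(ndcg_at_k(["d1", "d2", "d4"], relevant))  # relevant doc not retrieved
```

Because the discount is logarithmic, moving the relevant document from second to first place changes the score far more than moving it from tenth to ninth, so the metric rewards getting the right document near the top.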
Comparison with Top-Performing Models
Next, we compared the fine-tuned model against the top-performing models from our last evaluation. Not only did the fine-tuned model huggingface/BAAI/ft-bge-base-en-v1.5 at 512 dimensions outperform the rest of the competition, but it also challenged the top model, azure/text-embedding-3-large at 3072 dimensions. In fact, at higher cutoffs (3, 5, 10), the fine-tuned model edged past azure/text-embedding-3-large 3072 when the metrics are compared to three decimal places (not shown here).
Fair Comparison Across Dimensions
To make the comparisons fair, we also evaluated the fine-tuned model at 512, 256, and 64 dimensions against azure/text-embedding-3-large at corresponding dimensions. Here, the fine-tuned model emerged as the clear winner, though azure/text-embedding-3-large remained competitive at 512 dimensions. However, at 256 and especially at 64 dimensions, the performance of azure/text-embedding-3-large dropped significantly.
Conclusion
Overall, our fine-tuning efforts have paid off well. We now have a model that offers a 6x to 48x storage reduction compared to the top-performing model from earlier evaluations, with better performance across all metrics. This combination of smaller embeddings and improved performance translates into lower storage costs, faster search times, reduced memory usage, and ultimately lower overall costs. In the final article of this short series, we will see how to evaluate the retrieval and generation pipeline to determine the best RAG pipeline for your application.
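The 6x to 48x range follows directly from the embedding dimensions, as this small sketch shows. The index size of one million vectors and the 4 bytes per value are hypothetical figures chosen only to make the savings tangible.

```python
def storage_reduction(baseline_dims: int, dims: int) -> float:
    """How many times smaller a vector of `dims` values is than the baseline."""
    return baseline_dims / dims


def index_size_mib(num_vectors: int, dims: int, bytes_per_value: int = 4) -> float:
    """Raw size of the stored vectors in MiB, ignoring any index overhead."""
    return num_vectors * dims * bytes_per_value / 2**20


BASELINE_DIMS = 3072     # azure/text-embedding-3-large
NUM_VECTORS = 1_000_000  # hypothetical corpus size

for dims in (512, 256, 128, 64):
    factor = storage_reduction(BASELINE_DIMS, dims)
    size = index_size_mib(NUM_VECTORS, dims)
    print(f"{dims:>4} dims: {factor:.0f}x smaller, about {size:,.0f} MiB")
```

The same ratio applies to the memory needed to hold the index and, roughly, to the cost of each brute-force similarity computation.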
References
[1] Fine-tune Embedding models for Retrieval Augmented Generation (RAG)
[2] Introduction to Matryoshka Embedding Models