Practical Guide to Distilling Large Models into Small Models: A Novel Approach with Extended Distillation
Last Updated on March 3, 2025 by Editorial Team
Author(s): Shenggang Li
Originally published on Towards AI.
Comparing Traditional and Enhanced Step-by-Step Distillation: Adaptive Learning, Cosine Similarity, and Curriculum-Based Rationale Supervision
In this paper, I will uncover the secrets behind transferring "big model" intelligence to smaller, more agile models using two distinct distillation techniques: Traditional Distillation and Step-by-Step Distillation. Imagine having a wise, resource-heavy teacher model that not only gives the right answer but also explains its thought process, like a master chef sharing both the recipe and the secret tricks behind it. My goal is to teach a lean, efficient student model to emulate that expertise using just the distilled essence of knowledge.
To make these ideas crystal clear, I illustrate each technique using simple Logistic Regression demos. Although Logistic Regression is simpler than deep neural networks, it serves as an excellent canvas to experiment with concepts like temperature scaling, weighted losses, and even simulating a "chain-of-thought" through intermediate linear scores. For Traditional Distillation, our student learns from the teacher's soft probability outputs, balancing hard label accuracy with the subtle cues of soft labels. Meanwhile, Step-by-Step Distillation goes one step further by also incorporating the teacher's internal reasoning process.
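The two setups described above can be sketched with plain NumPy logistic regression. This is a minimal illustration under stated assumptions, not the author's code: the synthetic data, the choice to give the student fewer features than the teacher, the names `T`, `alpha`, and `beta`, and the use of a squared error on the teacher's linear scores as a stand-in for "reasoning" supervision are all hypothetical choices made for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logreg(X, y, lr=0.1, steps=500):
    """Plain logistic regression trained by batch gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * X.T @ (sigmoid(X @ w) - y) / len(y)
    return w

def distill(X_s, y, z_teacher, T=2.0, alpha=0.5, beta=0.0, lr=0.1, steps=500):
    """Train a student on a weighted sum of three per-sample losses:
         alpha       * BCE(y, sigmoid(z))              hard labels
         (1 - alpha) * T^2 * BCE(q, sigmoid(z / T))    teacher soft labels
         beta        * 0.5 * (z - z_teacher)^2         teacher linear scores
       beta = 0 gives the traditional setup; beta > 0 adds the (assumed)
       intermediate-score term used here to mimic step-by-step supervision."""
    q = sigmoid(z_teacher / T)               # temperature-softened teacher probs
    w = np.zeros(X_s.shape[1])
    for _ in range(steps):
        z = X_s @ w
        g_hard = sigmoid(z) - y              # d/dz of the hard-label BCE
        g_soft = T * (sigmoid(z / T) - q)    # d/dz of T^2 * soft-label BCE
        g_score = z - z_teacher              # d/dz of the squared score error
        g = alpha * g_hard + (1 - alpha) * g_soft + beta * g_score
        w -= lr * X_s.T @ g / len(y)
    return w

def accuracy(w, X, y):
    return float(np.mean((X @ w > 0) == (y == 1)))

# Synthetic binary data (assumption: five features, the first two matter most).
rng = np.random.default_rng(0)
n = 400
X_raw = rng.normal(size=(n, 5))
true_w = np.array([2.0, -1.0, 0.5, 0.3, -0.2])
y = (rng.uniform(size=n) < sigmoid(X_raw @ true_w)).astype(float)

ones = np.ones((n, 1))
X_teacher = np.hstack([X_raw, ones])         # teacher sees all features + bias
X_student = np.hstack([X_raw[:, :2], ones])  # student sees two features + bias

w_teacher = fit_logreg(X_teacher, y)
z_teacher = X_teacher @ w_teacher            # teacher's intermediate linear scores

w_trad = distill(X_student, y, z_teacher, T=2.0, alpha=0.5, beta=0.0)
w_step = distill(X_student, y, z_teacher, T=2.0, alpha=0.5, beta=0.5)

print("teacher accuracy:     ", accuracy(w_teacher, X_teacher, y))
print("traditional student:  ", accuracy(w_trad, X_student, y))
print("step-by-step student: ", accuracy(w_step, X_student, y))
```

Raising `T` flattens the teacher's probabilities so the student sees more of the teacher's relative confidence, while `alpha` trades off ground-truth labels against those soft targets. The accuracies here are measured on the training data only, so they show that the code runs rather than how well either method generalizes.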
Finally, I propose an improved step-by-step distillation method that makes learning more stable and efficient. By adding…