| dc.description.abstract |
Large Language Models (LLMs) excel at a wide range of Natural Language Processing (NLP)
tasks, but their deployment is often hampered by high computational cost and inefficiency
in low-resource settings. While smaller models offer faster inference, they typically
exhibit weaker contextual understanding, higher error rates, and greater susceptibility to hallucinations.
This paper investigates task-specific knowledge distillation as a practical method for
transferring capabilities from high-capacity teacher models to compact student models across three key
NLP tasks: text summarization, sentiment analysis, and text classification. Knowledge
was distilled from four strong teacher models, LLaMA 3.1 (70B), Falcon (40B), Gemma2
(10B), and Qwen2.5 (72B), into much smaller student models (8B, 7B, 2B, and 7B parameters, respectively), and
their performance was assessed using standard task-specific metrics. The results show
that the distilled models retain a substantial share of teacher performance while reducing
hallucinations and improving efficiency. In summarization, the distilled Qwen2.5 (7B) achieved a ROUGE-L
of 0.6743, a BLEU of 47.6483, and a METEOR of 0.6726. In sentiment analysis, distillation increased
the accuracy of the LLaMA 8B student from 0.4025 to 0.5900, and in text classification, distilled models
better captured overlapping category semantics. These findings demonstrate that task-specific
distillation is a viable strategy for developing lightweight, high-performance NLP models for
resource-constrained applications. |
en_US |