Despite the remarkable performance of large language models (LLMs) in recent NLP tasks, their deployment poses substantial challenges due to high computational and memory demands. Recent research has concentrated on improving open-source smaller models through knowledge distillation from LLMs to reduce computational resource costs with promising outcomes. Nevertheless, they frequently fall short of attaining LLM-level performance, particularly in tasks demanding advanced reasoning. In this work, we introduce the \textbf{Mixed Distillation} framework, which capitalizes on the strengths of Program-of-Thought (PoT) and Chain-of-Thought (CoT) capabilities within LLMs and distills these capabilities to smaller models. Regarding these two capabilities, the PoT is dedicated to enhancing the performance of reasoning results generated by smaller models, while CoT simultaneously optimizes the results. Our Mixed Distillation framework offers a promising approach to enhance the capabilities of smaller models, bridging the gap with LLMs, and demonstrating better performance across various tasks. Specifically, on the SVAMP dataset, employing a 7 billion parameter Llama2 and CodeLlama in a mixed distillation framework not only boosts distillation capabilities beyond single-path distillation methods but also outperforms the LLM (GPT-3.5-turbo) in terms of reasoning accuracy. Through sampling in multiple-path reasoning, the models achieve impressive accuracy performances of 85% and 85.5%, respectively, signifying advancements over previous distillation methods.
翻译:暂无翻译