Pre-trained language models (LMs) obtain state-of-the-art performance when adapted to text classification tasks. However, when using such models in real-world applications, efficiency considerations are paramount. In this paper, we study how different training procedures that adapt LMs to text classification perform, as we vary model and train set size. More specifically, we compare standard fine-tuning, prompting, and knowledge distillation (KD) when the teacher was trained with either fine-tuning or prompting. Our findings suggest that even though fine-tuning and prompting work well to train large LMs on large train sets, there are more efficient alternatives that can reduce compute or data cost. Interestingly, we find that prompting combined with KD can reduce compute and data cost at the same time.
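As context for the knowledge-distillation (KD) setup the abstract compares against, the following is a minimal sketch of the standard soft-label distillation objective (temperature-softened KL divergence between teacher and student class distributions). The function names and the temperature value are illustrative assumptions, not the paper's implementation.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of class logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(p_teacher || p_student) over softened distributions.

    Scaled by T^2 so gradients stay comparable across temperatures,
    following the usual KD formulation. Illustrative sketch only.
    """
    p = softmax(teacher_logits, temperature)  # teacher's soft targets
    q = softmax(student_logits, temperature)  # student's predictions
    return temperature ** 2 * sum(
        pi * math.log(pi / qi) for pi, qi in zip(p, q)
    )
```

In the paper's setting, the teacher producing `teacher_logits` would itself have been trained with either fine-tuning or prompting; the loss above is minimized by the (smaller) student, which is what allows KD to cut compute cost at inference time.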