With the development of deep learning, advanced dialogue generation methods usually demand substantial computational resources. One promising approach to obtaining a high-performance yet lightweight model is knowledge distillation, which relies heavily on a powerful pre-trained teacher model. Collaborative learning, also known as online knowledge distillation, is an effective way to conduct one-stage group distillation when a well-trained large teacher model is unavailable. However, previous work suffers from a severe branch homogeneity problem, since all branches share the same training objective and are trained on independent and identically distributed training sets. To alleviate this problem, we incorporate dialogue attributes into the training of the network branches, so that each branch learns attribute-related features from its selected data subset. Furthermore, we propose a dual group-based knowledge distillation method, consisting of positive distillation and negative distillation, that further diversifies the features of different branches in a stable and interpretable way. The proposed approach significantly improves branch heterogeneity and outperforms state-of-the-art collaborative learning methods on two widely used open-domain dialogue datasets.
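The abstract describes the dual group-based distillation only at a high level. As a rough illustration of how positive and negative group distillation could be combined for a single branch, the following PyTorch-style snippet is a minimal sketch, assuming KL-divergence-based distillation over token distributions; the function and parameter names (`dual_group_distillation_loss`, `positive_group`, `negative_group`, `tau`, `neg_weight`) are hypothetical and not taken from the paper.

```python
# Illustrative sketch only: the exact loss formulation is not given in the abstract.
# Assumed setup: each branch is a dialogue generation model producing token logits;
# "positive distillation" pulls the current branch toward branches trained on
# related attribute subsets, while "negative distillation" pushes it away from
# unrelated ones.
import torch
import torch.nn.functional as F

def dual_group_distillation_loss(student_logits, positive_group, negative_group,
                                 tau=2.0, neg_weight=0.1):
    """KL-based group distillation loss for one branch.

    student_logits: [batch, seq, vocab] logits of the current branch.
    positive_group / negative_group: lists of logits from the other branches,
    detached so gradients only flow into the current branch.
    """
    log_p = F.log_softmax(student_logits / tau, dim=-1)
    loss = student_logits.new_zeros(())

    # Positive distillation: match the averaged distribution of related branches.
    if positive_group:
        pos_probs = torch.stack(
            [F.softmax(t.detach() / tau, dim=-1) for t in positive_group]).mean(0)
        loss = loss + F.kl_div(log_p, pos_probs, reduction="batchmean") * tau ** 2

    # Negative distillation: penalize similarity to unrelated branches.
    if negative_group:
        neg_probs = torch.stack(
            [F.softmax(t.detach() / tau, dim=-1) for t in negative_group]).mean(0)
        loss = loss - neg_weight * F.kl_div(log_p, neg_probs, reduction="batchmean") * tau ** 2

    return loss
```

In this sketch the negative term simply subtracts a down-weighted KL divergence, which is one plausible way to encourage branch heterogeneity; the actual method in the paper may differ.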