Existing Chinese grammatical error correction (CGEC) models suffer from poor robustness on attack test sets and carry large numbers of parameters. This paper applies knowledge distillation to compress the model while improving its resistance to attacks. On the data side, an attack test set is constructed by injecting perturbations into a standard evaluation set, and model robustness is measured against it. Experimental results show that the distilled small model preserves performance and speeds up training while reducing the parameter count, achieves the best results on the attack test set, and significantly improves robustness. Code is available at https://github.com/Richard88888/KD-CGEC.
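The abstract does not spell out the distillation objective; the sketch below shows the standard soft-label formulation (a temperature-scaled KL term between teacher and student logits plus cross-entropy on gold labels) that such a setup typically uses. The `temperature` and `alpha` hyperparameters are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Standard soft-label knowledge distillation loss.

    student_logits, teacher_logits: (N, vocab) flattened token logits;
    labels: (N,) gold token ids. temperature and alpha are assumed
    hyperparameters for illustration.
    """
    # Soft targets: KL divergence between temperature-softened
    # distributions, scaled by T^2 to keep gradients comparable.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # Hard targets: ordinary cross-entropy against the gold labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```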
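Likewise, the perturbation scheme behind the attack test set is not detailed in the abstract; below is a minimal sketch assuming simple character-level noise (neighbor swaps and confusion-set substitutions, both illustrative choices) injected into standard evaluation sentences.

```python
import random

def perturb(sentence, rate=0.1, confusion=None):
    """Inject character-level noise into an evaluation sentence.

    Each character is perturbed with probability `rate`: replaced
    from a confusion set (e.g. homophone/near-form characters) if
    one is supplied, otherwise swapped with its neighbor. Both the
    rate and the confusion set are illustrative assumptions.
    """
    chars = list(sentence)
    i = 0
    while i < len(chars):
        if random.random() < rate:
            if confusion and chars[i] in confusion:
                # Substitute with a confusable character.
                chars[i] = random.choice(confusion[chars[i]])
            elif i + 1 < len(chars):
                # Swap with the next character.
                chars[i], chars[i + 1] = chars[i + 1], chars[i]
                i += 1
        i += 1
    return "".join(chars)

# Example: build an attack test set from a standard evaluation set
# (eval_sentences is a hypothetical list of gold input sentences).
# attack_set = [perturb(s) for s in eval_sentences]
```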