Chinese Grammatical Error Correction (CGEC) aims to automatically detect and correct grammatical errors in Chinese text. For a long time, researchers have regarded CGEC as a task with a certain degree of uncertainty, that is, an ungrammatical sentence may often have multiple valid references. However, we argue that even though this is a very reasonable hypothesis, it is too demanding for the capability of the mainstream models of this era. In this paper, we first discover that multiple references do not actually bring positive gains to model training. On the contrary, it is beneficial to the CGEC model if it can pay close attention to the small but essential data during the training process. Furthermore, we propose a simple yet effective training strategy called OneTarget to improve the focus ability of CGEC models and thus improve CGEC performance. Extensive experiments and detailed analyses demonstrate the correctness of our discovery and the effectiveness of our proposed method.