Chinese Grammatical Error Correction (CGEC) aims to generate a correct sentence from an erroneous one in which different kinds of errors are mixed. This paper divides the CGEC task into two steps: spelling error correction and grammatical error correction. For spelling error correction, we propose a novel zero-shot approach that is simple but effective; its high precision limits the error accumulation inherent in a pipeline structure. For grammatical error correction, we design part-of-speech (POS) features and semantic class features to enhance the neural network model, and we propose an auxiliary task that predicts the POS sequence of the target sentence. Without using any synthetic data or data augmentation methods, our framework achieves an F0.5 score of 42.11 on the CGEC dataset, outperforming the previous state of the art by 1.30 points. Moreover, our model produces meaningful POS representations that capture words of different POS and convey reasonable POS transition rules.
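
For readers unfamiliar with the metric, F0.5 is the F-beta score with beta = 0.5, a weighted harmonic mean of precision and recall that weights precision more heavily than recall. The following is a minimal sketch of that standard formula; the function name `f_beta` and the example inputs are illustrative assumptions and are not taken from the paper or its evaluation scripts.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Return the F-beta score; beta < 1 favors precision over recall."""
    # Guard against division by zero when both inputs are zero.
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative inputs only (not results reported in the paper).
example_score = f_beta(precision=0.5, recall=0.25)
print(f"F0.5 = {example_score:.4f}")
```

Because beta is below 1, a system with high precision and modest recall scores higher than one with the two values swapped, which is consistent with the abstract's emphasis on high-precision spelling correction.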