We use neural machine translation of sentence fragments to create large amounts of training data for English grammatical error correction. Our method aims to simulate mistakes made by second-language learners, and produces a wider range of non-native-style language than state-of-the-art synthetic data creation methods. In addition to purely grammatical errors, our approach generates other types of errors, such as lexical errors. We perform grammatical error correction experiments using neural sequence-to-sequence models, and carry out quantitative and qualitative evaluations. A model trained on data created with our proposed method outperforms a baseline model on test data with a high proportion of errors.