Comparison of Grammatical Error Correction Using Back-Translation Models

Grammatical error correction (GEC) suffers from a lack of sufficient parallel data. GEC studies have therefore developed various methods to generate pseudo data, which consist of pairs of grammatical sentences and artificially produced ungrammatical sentences. Currently, a mainstream approach to generating pseudo data is back-translation (BT). Most previous GEC studies using BT have employed the same architecture for both the GEC and BT models. However, GEC models have different correction tendencies depending on their architectures. Thus, in this study, we compare the correction tendencies of GEC models trained on pseudo data generated by three different BT models: Transformer, CNN, and LSTM. The results confirm that the correction tendencies for each error type differ across BT models. Additionally, we examine the correction tendencies when using a combination of pseudo data generated by different BT models. We find that combining different BT models improves or interpolates the F_0.5 score of each error type compared with single BT models with different seeds.
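The abstract reports results as F_0.5 scores per error type. As a minimal sketch of how such a score is usually computed from edit-level counts, the helper below applies the standard F-beta formula with beta = 0.5, which weights precision twice as heavily as recall. The function name `f_beta`, the count arguments, and the zero-division handling are illustrative assumptions, not the authors' evaluation code.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """Return the F-beta score for edit-level counts.

    tp: proposed edits that match a reference edit
    fp: proposed edits that match no reference edit
    fn: reference edits the system missed
    Assumption: precision and recall default to 0.0 when undefined.
    """
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts for one error type: 6 correct edits,
# 2 spurious edits, 6 missed edits (precision 0.75, recall 0.5).
score = f_beta(tp=6, fp=2, fn=6)
print(f"F_0.5 = {score:.3f}")  # prints "F_0.5 = 0.682"
```

Because beta is below 1, a system that proposes fewer but more accurate edits scores higher than one with the same F_1 but lower precision, which is why F_0.5 is the conventional choice for GEC.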

