Building data for automatic post-editing (APE) requires extensive, expert-level human effort, because it involves an elaborate process of identifying errors in sentences and providing suitable revisions. We therefore develop a self-supervised data generation tool, deployable as a web application, that minimizes human supervision and constructs personalized APE data from a parallel corpus for several language pairs with English as the target language. With this tool, data-centric APE research can be extended to many language pairs that have not yet been studied owing to the lack of suitable data.