We introduce translation error correction (TEC), the task of automatically correcting human-generated translations. Imperfections in machine translation (MT) have long motivated systems that improve translations post hoc through automatic post-editing. In contrast, little attention has been devoted to automatically correcting human translations. This is despite the intuition that humans make distinct errors that machines are well suited to help with, from typos to inconsistencies in translation conventions. To investigate this, we build and release the Aced corpus, which contains three TEC datasets. We show that the human errors in TEC span a more diverse range of error types than the MT errors in automatic post-editing datasets, and include far fewer translation fluency errors. This suggests the need for dedicated TEC models specialized to correct human errors. We further show that pre-training on synthetic errors modeled on human errors, rather than on MT errors, improves TEC F-score by as much as 5.1 points. Finally, we conducted a human-in-the-loop user study with nine professional translation editors and found that assistance from our TEC system led them to produce revised translations of significantly higher quality.