We introduce DivEMT, the first publicly available post-editing study of Neural Machine Translation (NMT) over a typologically diverse set of target languages. Using a strictly controlled setup, 18 professional translators were instructed to translate or post-edit the same set of English documents into Arabic, Dutch, Italian, Turkish, Ukrainian, and Vietnamese. During the process, their edits, keystrokes, editing times, and pauses were recorded, enabling an in-depth, cross-lingual evaluation of NMT quality and post-editing effectiveness. Using this new dataset, we assess the impact of two state-of-the-art NMT systems, Google Translate and the multilingual mBART-50 model, on translation productivity. We find that post-editing is consistently faster than translation from scratch. However, the magnitude of productivity gains varies widely across systems and languages, highlighting major disparities in post-editing effectiveness for languages at different degrees of typological relatedness to English, even when controlling for system architecture and training data size. We publicly release the complete dataset, including all collected behavioral data, to foster new research on the translation capabilities of NMT systems for typologically diverse languages.