We introduce DivEMT, the first publicly available post-editing study of Neural Machine Translation (NMT) over a typologically diverse set of target languages. In a strictly controlled setup, 18 professional translators were instructed to translate or post-edit the same set of English documents into Arabic, Dutch, Italian, Turkish, Ukrainian, and Vietnamese. During the process, their edits, keystrokes, editing times, pauses, and perceived effort were recorded, enabling an in-depth, cross-lingual evaluation of NMT quality and its post-editing process. Using this new dataset, we assess the impact of two state-of-the-art NMT systems, Google Translate and the open-source multilingual model mBART50, on translation productivity. We find that, while post-editing is consistently faster than translation from scratch, the magnitude of its contribution varies widely across systems and languages, ranging from doubled productivity in Dutch and Italian to marginal gains in Arabic, Turkish, and Ukrainian, for some of the evaluated modalities. Moreover, the observed cross-language variability appears to partly reflect source-target relatedness and the type of target morphology, while remaining hard to predict even with state-of-the-art automatic MT quality metrics. We publicly release the complete dataset, including all collected behavioural data, to foster new research on the ability of state-of-the-art NMT systems to generate text in typologically diverse languages.