IndicMT Eval: A Dataset to Meta-Evaluate Machine Translation metrics for Indian Languages

The rapid growth of machine translation (MT) systems has necessitated comprehensive studies to meta-evaluate evaluation metrics being used, which enables a better selection of metrics that best reflect MT quality. Unfortunately, most of the research focuses on high-resource languages, mainly English, the observations for which may not always apply to other languages. Indian languages, having over a billion speakers, are linguistically different from English, and to date, there has not been a systematic study of evaluating MT systems from English into Indian languages. In this paper, we fill this gap by creating an MQM dataset consisting of 7000 fine-grained annotations, spanning 5 Indian languages and 7 MT systems, and use it to establish correlations between annotator scores and scores obtained using existing automatic metrics. Our results show that pre-trained metrics, such as COMET, have the highest correlations with annotator scores. Additionally, we find that the metrics do not adequately capture fluency-based errors in Indian languages, and there is a need to develop metrics focused on Indian languages. We hope that our dataset and analysis will help promote further research in this area.
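The meta-evaluation step described above, correlating annotator scores with automatic metric scores, can be illustrated with a minimal sketch. The abstract does not state which correlation statistics are used, so Pearson's r and Kendall's tau (tau-a, no tie correction) are assumptions here. The score lists and the names `human_scores`, `metric_a`, and `metric_b` are hypothetical and are not taken from the dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


def kendall_tau(xs, ys):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    concordant = discordant = 0
    n = len(xs)
    for i in range(n):
        for j in range(i + 1, n):
            sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
            # Tied pairs count toward neither total (tau-a).
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical data: one human quality score per translated segment
# (higher = better) and the scores two automatic metrics gave the same segments.
human_scores = [25.0, 20.0, 10.0, 5.0]
metric_a = [0.9, 0.7, 0.4, 0.1]
metric_b = [0.5, 0.6, 0.2, 0.4]

for name, scores in [("metric_a", metric_a), ("metric_b", metric_b)]:
    r = pearson(human_scores, scores)
    tau = kendall_tau(human_scores, scores)
    print(f"{name}: pearson={r:.3f} kendall_tau={tau:.3f}")
```

Under these assumptions, the metric that ranks segments more like the annotators receives the higher correlation, which is the basis for preferring one metric over another in a meta-evaluation.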


