Automatic Classification of Human Translation and Machine Translation: A Study from the Perspective of Lexical Diversity

By using a trigram model and fine-tuning a pretrained BERT model for sequence classification, we show that machine translation and human translation can be classified with an accuracy above chance level, which suggests that machine translation and human translation differ in a systematic way. The classification accuracy for machine translation is much higher than that for human translation. We show that this may be explained by the difference in lexical diversity between machine translation and human translation. If machine translation has patterns that are independent of human translation, automatic metrics which measure the deviation of machine translation from human translation may conflate difference with quality. Our experiment with two different types of automatic metrics shows a correlation with the results of the classification task. Therefore, we suggest that the difference in lexical diversity between machine translation and human translation be given more attention in machine translation evaluation.
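The abstract does not say how lexical diversity is measured. As a rough illustration only, the sketch below uses the type-token ratio (distinct tokens divided by total tokens), which is one common measure and is an assumption here rather than the authors' method. The tokenizer, the function names, and the two example sentences are all hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes (a crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(text):
    """Return distinct tokens divided by total tokens, or 0.0 for empty input."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Hypothetical sentences, not taken from the paper's data.
repetitive = "the cat saw the cat and the cat saw the dog"
varied = "a grey cat spotted another feline while one hound watched"

print(type_token_ratio(repetitive))  # lower ratio: many repeated words
print(type_token_ratio(varied))      # higher ratio: every word is distinct
```

The type-token ratio falls as texts get longer, so a comparison between machine and human translations would need texts of similar length or a length-normalized variant.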