Comparing Formulaic Language in Human and Machine Translation: Insight from a Parliamentary Corpus

A recent study has shown that, compared to human translations, neural machine translations contain more strongly associated formulaic sequences made of relatively high-frequency words, but far fewer strongly associated formulaic sequences made of relatively rare words. These results were obtained from translations of quality newspaper articles, a genre in which human translations can be expected to be not very literal. The present study attempts to replicate this research using a parliamentary corpus. The texts were translated from French to English by three well-known neural machine translation systems: DeepL, Google Translate and Microsoft Translator. The results confirm the observations on the news corpus, but the differences are less strong. They suggest that text genres that usually result in more literal translations, such as parliamentary corpora, might be preferable when comparing human and machine translations. Regarding the differences between the three neural machine translation systems, it appears that Google translations contain fewer highly collocational bigrams, as identified by the CollGram technique, than DeepL and Microsoft translations.
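The abstract does not spell out how bigrams are scored for collocational strength. As an illustration only, the sketch below assumes two widely used association measures, mutual information (MI) and the t-score, estimated from a reference corpus. MI tends to reward bigrams made of rare words, while the t-score tends to reward bigrams made of frequent words, which mirrors the two kinds of formulaic sequences the abstract contrasts. The function name `bigram_association` and the toy corpus are hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def bigram_association(tokens):
    """Return {bigram: (mi, t_score)} estimated from a list of tokens.

    Assumption: MI and t-score stand in for the association measures;
    the abstract itself does not name the measures used.
    """
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for (w1, w2), observed in bigrams.items():
        # Expected bigram count if the two words were independent.
        expected = unigrams[w1] * unigrams[w2] / n
        mi = math.log2(observed / expected)
        t_score = (observed - expected) / math.sqrt(observed)
        scores[(w1, w2)] = (mi, t_score)
    return scores


# Toy reference corpus (hypothetical); a real study would use a large one.
reference = "the house of commons met in the house of lords".split()
scores = bigram_association(reference)
for bigram in [("the", "house"), ("commons", "met")]:
    mi, t = scores[bigram]
    print(bigram, round(mi, 2), round(t, 2))
```

In this toy corpus the bigram of rarer words gets the higher MI and the bigram of more frequent words gets the higher t-score. With so few tokens the values are not meaningful in themselves; in practice the counts would come from a large reference corpus and scores would be compared against a threshold.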

