A recent study has shown that, compared to human translations, neural machine translations contain more strongly associated formulaic sequences made of relatively high-frequency words, but far fewer strongly associated formulaic sequences made of relatively rare words. These results were obtained from translations of quality newspaper articles, a genre in which human translations tend not to be very literal. The present study attempts to replicate this research using a parliamentary corpus. The texts were translated from French to English by three well-known neural machine translation systems: DeepL, Google Translate and Microsoft Translator. The results confirm the observations made on the news corpus, but the differences are smaller. They suggest that text genres that usually result in more literal translations, such as parliamentary corpora, might be preferable when comparing human and machine translations. Regarding the differences between the three neural machine translation systems, it appears that Google translations contain fewer highly collocational bigrams, as identified by the CollGram technique, than DeepL and Microsoft translations.
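The contrast between formulaic sequences made of high-frequency words and those made of rare words is usually captured with two bigram association measures: the t-score, which favours frequent pairs, and mutual information (MI), which favours pairs of rarer words. The sketch below is a minimal illustration of that idea, not the CollGram implementation used in the study. It assumes that both measures are computed from counts in a tokenized reference corpus. The function names `association_scores` and `share_highly_collocational` and the threshold parameters `min_mi` and `min_t` are hypothetical, and no threshold values are taken from the study.

```python
import math
from collections import Counter


def association_scores(reference_tokens):
    """Map each bigram of the reference corpus to (MI, t-score).

    Assumption: expected frequency is f(w1) * f(w2) / N, with N the
    number of tokens in the reference corpus.
    """
    unigrams = Counter(reference_tokens)
    bigrams = Counter(zip(reference_tokens, reference_tokens[1:]))
    n = len(reference_tokens)
    scores = {}
    for (w1, w2), observed in bigrams.items():
        expected = unigrams[w1] * unigrams[w2] / n
        mi = math.log2(observed / expected)                    # favours rare words
        t_score = (observed - expected) / math.sqrt(observed)  # favours frequent words
        scores[(w1, w2)] = (mi, t_score)
    return scores


def share_highly_collocational(text_tokens, scores, min_mi, min_t):
    """Return the shares of bigram tokens in a text that reach each threshold.

    Bigrams absent from the reference corpus count as not collocational.
    The thresholds are illustrative parameters, not values from the study.
    """
    text_bigrams = list(zip(text_tokens, text_tokens[1:]))
    if not text_bigrams:
        return 0.0, 0.0
    high_mi = sum(1 for b in text_bigrams if b in scores and scores[b][0] >= min_mi)
    high_t = sum(1 for b in text_bigrams if b in scores and scores[b][1] >= min_t)
    return high_mi / len(text_bigrams), high_t / len(text_bigrams)
```

Under these assumptions, a human translation and each machine translation of the same source text could be passed to `share_highly_collocational` with the same reference scores, and the two resulting shares compared across translations.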