Using a trigram model and a pretrained BERT model fine-tuned for sequence classification, we show that machine translation and human translation can be distinguished with above-chance accuracy, which suggests that the two differ in a systematic way. Classification accuracy is much higher for machine translation than for human translation. We show that this may be explained by the difference in lexical diversity between machine translation and human translation. If machine translation exhibits patterns independent of those in human translation, automatic metrics that measure how far machine translation deviates from human translation may conflate difference with quality. In an experiment with two different types of automatic metrics, the metric scores correlate with the results of the classification task. We therefore suggest that the difference in lexical diversity between machine translation and human translation be given more attention in machine translation evaluation.
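As a rough illustration of what a lexical diversity comparison can look like, the sketch below computes a type-token ratio (distinct words divided by total words). This is an assumption made for illustration: the abstract does not say which diversity measure was used, the function name `type_token_ratio` is hypothetical, and the two sentences are made-up toy strings, not data from the study.

```python
def type_token_ratio(text: str) -> float:
    """Return distinct tokens / total tokens for a whitespace-tokenized text.

    Assumption: lowercase whitespace tokenization stands in for whatever
    tokenizer a real study would use.
    """
    tokens = text.lower().split()
    if not tokens:
        return 0.0  # avoid division by zero on empty input
    return len(set(tokens)) / len(tokens)


# Toy strings invented for this example (not real MT or HT output).
repetitive = "the cat sat on the mat and the cat sat on the rug"
varied = "a cat settled on the mat while another dozed upon a rug"

print(f"repetitive: {type_token_ratio(repetitive):.2f}")
print(f"varied:     {type_token_ratio(varied):.2f}")
```

A type-token ratio is sensitive to text length, so a comparison of this kind would normally be made on samples of equal size.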