Human-translated text displays features distinct from those of text written natively in the same language. This phenomenon, known as translationese, has been argued to confound machine translation (MT) evaluation. Yet we find that existing work on translationese neglects some important factors, and that its conclusions are mostly correlational rather than causal. In this work, we collect CausalMT, a dataset in which the MT training data are also labeled with their human translation directions. We inspect two critical factors: the train-test direction match (whether the human translation directions in the training and test sets are aligned) and the data-model direction match (whether the model learns in the same direction as the human translation direction in the dataset). We show that these two factors have a large causal effect on MT performance, in addition to the test-model direction mismatch highlighted by existing work on the impact of translationese. In light of our findings, we provide a set of suggestions for MT training and evaluation. Our code and data are at https://github.com/EdisonNi-hku/CausalMT
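
To make the three direction relations concrete, the following is a minimal sketch of how they could be derived from direction labels. It is an illustration under stated assumptions, not code from the CausalMT repository: the names `Setup` and `direction_matches`, and the `"src->tgt"` string encoding of directions, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Setup:
    """Hypothetical description of one MT experiment (not from CausalMT)."""
    train_human_dir: str  # direction in which humans translated the training data
    test_human_dir: str   # direction in which humans translated the test data
    model_dir: str        # direction in which the MT model translates


def direction_matches(setup: Setup) -> dict:
    """Return the three direction-match indicators for one setup."""
    return {
        # Are the human translation directions of train and test sets aligned?
        "train_test_match": setup.train_human_dir == setup.test_human_dir,
        # Does the model learn in the direction the training data was translated?
        "data_model_match": setup.train_human_dir == setup.model_dir,
        # Does the model translate in the direction the test set was translated?
        "test_model_match": setup.test_human_dir == setup.model_dir,
    }


# Example: an en->de model trained on data that humans translated de->en,
# evaluated on a test set that humans translated en->de.
example = Setup(train_human_dir="de->en", test_human_dir="en->de", model_dir="en->de")
print(direction_matches(example))
```

In this example only the test-model relation holds; the train-test and data-model relations are both mismatched, which is the kind of configuration the two factors above are meant to distinguish.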