Cross-lingual natural language processing relies on translation, either by humans or machines, at different levels, from translating training data to translating test sets. However, compared to original texts in the same language, translations possess distinct qualities referred to as translationese. Previous research has shown that these translation artifacts influence the performance of a variety of cross-lingual tasks. In this work, we propose a novel approach to reducing translationese by extending an established bias-removal technique. Specifically, we use the Iterative Null-space Projection (INLP) algorithm and show, by measuring classification accuracy before and after debiasing, that translationese is reduced at both the sentence and word levels. We evaluate the utility of debiasing translationese on a natural language inference (NLI) task and show that reducing this bias improves NLI accuracy. To the best of our knowledge, this is the first study to debias translationese as represented in latent embedding space.
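To make the debiasing step concrete, the following is a minimal sketch of the general INLP idea: repeatedly train a linear probe to predict the unwanted attribute (here, translated vs. original), then project the representations onto the null space of that probe's weight vector. It assumes NumPy, synthetic 10-dimensional "embeddings", and a hand-rolled logistic-regression probe. The function names, data, and hyperparameters are illustrative assumptions, not the setup used in this work.

```python
import numpy as np

def train_linear_probe(X, y, steps=500, lr=0.5):
    """Fit a logistic-regression probe by plain gradient descent (illustrative)."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(steps):
        z = np.clip(X @ w + b, -30.0, 30.0)   # clip to avoid overflow in exp
        p = 1.0 / (1.0 + np.exp(-z))
        grad = p - y
        w -= lr * (X.T @ grad) / n
        b -= lr * grad.mean()
    return w, b

def probe_accuracy(X, y):
    """Train a fresh probe on X and report its accuracy on the same data."""
    w, b = train_linear_probe(X, y)
    pred = ((X @ w + b) > 0).astype(int)
    return float((pred == y).mean())

def inlp(X, y, n_iters=5):
    """Return a projection matrix P that removes n_iters probe directions."""
    d = X.shape[1]
    P = np.eye(d)
    for _ in range(n_iters):
        w, _ = train_linear_probe(X @ P, y)
        w = P @ w                      # keep only the part not yet removed
        norm = np.linalg.norm(w)
        if norm < 1e-10:               # nothing left for the probe to use
            break
        w /= norm
        P = P - np.outer(w, w)         # project onto the null space of w
    return P

# Synthetic stand-in for embeddings: label 1 = "translated", 0 = "original".
rng = np.random.default_rng(0)
y = np.array([0] * 200 + [1] * 200)
X = rng.normal(size=(400, 10))
X[:, 0] += 2.0 * (2 * y - 1)           # one dimension leaks the label

acc_before = probe_accuracy(X, y)
P = inlp(X, y)
acc_after = probe_accuracy(X @ P, y)
print(f"probe accuracy before: {acc_before:.2f}, after: {acc_after:.2f}")
```

A drop in probe accuracy toward chance after projection is the kind of evidence the abstract refers to when it measures classification accuracy before and after debiasing. In practice the projected representations `X @ P` would then be passed to the downstream task model.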