Given a document in a source language, cross-lingual summarization (CLS) aims to generate a concise summary in a different target language. Unlike monolingual summarization (MS), naturally occurring source-language documents paired with target-language summaries are rare. To collect large-scale CLS samples, existing datasets typically involve translation in their creation. However, translated text is systematically different from text originally written in that language, a phenomenon known as translationese. Though many efforts have been devoted to CLS, none of them has taken translationese into account. In this paper, we first confirm that different approaches to constructing CLS datasets lead to different degrees of translationese. We then design systematic experiments to investigate how translationese affects CLS model evaluation and performance when it appears in source documents or target summaries. Specifically, we find that (1) translationese in the documents or summaries of test sets can cause a discrepancy between human judgment and automatic evaluation; (2) translationese in training sets harms model performance in real-world scenarios; and (3) although machine-translated documents contain translationese, they are very useful for building CLS systems for low-resource languages under specific training strategies. Furthermore, we offer suggestions for future CLS research, covering both dataset and model development. We hope our work will draw researchers' attention to the phenomenon of translationese in CLS and encourage them to take it into account in the future.