Cross-Lingual Summarization (CLS) is the task of extracting the important information from a source document and summarizing it in another language. It is challenging because a system must understand, summarize, and translate at the same time, which makes it closely related to Monolingual Summarization (MS) and Machine Translation (MT). In practice, far more training resources are available for MT than for cross-lingual or monolingual summarization, so incorporating MT corpora into CLS should improve its performance. However, existing work only brings MT in through a simple multi-task framework and lacks deeper exploration. In this paper, we propose a novel task, Cross-lingual Summarization with Compression rate (CSC), to benefit CLS with large-scale MT corpora. By introducing the compression rate, defined as the information ratio between the source and the target text, we regard MT as a special case of CLS with a compression rate of 100%. The two can therefore be trained as a unified task and share knowledge more effectively. However, a large gap remains between the MT task and the CLS task, because samples with compression rates between 30% and 90% are extremely rare. To bridge the two tasks smoothly, we propose an effective data augmentation method that produces document-summary pairs with different compression rates. The proposed method not only improves performance on the CLS task but also provides controllability, allowing summaries of a desired length to be generated. Experiments demonstrate that our method outperforms various strong baselines on three cross-lingual summarization datasets. We release our code and data at https://github.com/ybai-nlp/CLS_CR.
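To make the unification concrete, the following is a minimal sketch of how MT and CLS samples could be placed on a single compression-rate scale. It is not the released implementation: the `Pair` class, the `rate_of` and `control_token` helpers, the `<cr_N>` token format, and the use of a whitespace token-count ratio as a stand-in for the "information ratio" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Pair:
    """Hypothetical training sample: a source text, a target text, and its task."""
    source: str
    target: str
    task: str  # "mt" or "cls"


def rate_of(pair: Pair) -> float:
    """Return the compression rate of a sample as a fraction in (0, 1].

    Assumption: the rate is approximated by a whitespace token-count ratio.
    MT pairs are assigned 1.0 (100%) by definition, as in the abstract.
    """
    if pair.task == "mt":
        return 1.0
    source_len = len(pair.source.split())
    target_len = len(pair.target.split())
    if source_len == 0:
        raise ValueError("source text is empty")
    # A summary should not exceed its source; clamp the crude proxy to 1.0.
    return min(target_len / source_len, 1.0)


def control_token(rate: float) -> str:
    """Map a rate to a hypothetical control token such as '<cr_30>'."""
    return f"<cr_{round(rate * 100)}>"


if __name__ == "__main__":
    samples = [
        Pair("a long source document with ten tokens in total here", "short summary text", "cls"),
        Pair("one sentence to translate", "eine Satz zu uebersetzen", "mt"),
    ]
    for s in samples:
        print(s.task, control_token(rate_of(s)))
```

Under these assumptions, prepending the control token to the source would let one model see MT pairs and summarization pairs as the same task at different rates, and the same token could be set at inference time to request a summary of a given length.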