Cross-lingual summarization is the task of generating a summary in one language (e.g., English) for one or more documents written in a different language (e.g., Chinese). Against the backdrop of globalization, this task has attracted increasing attention from the computational linguistics community. Nevertheless, a comprehensive review of the task is still lacking. We therefore present the first systematic critical review of the datasets, approaches, and challenges in this field. Specifically, we organize existing datasets according to their construction methods and existing approaches according to their solution paradigms. For each type of dataset or approach, we introduce and summarize previous efforts and compare them with one another to provide deeper analysis. Finally, we discuss promising directions and offer our thoughts to facilitate future research. This survey is intended for both beginners and experts in cross-lingual summarization, and we hope it will serve as a starting point as well as a source of new ideas for researchers and engineers interested in this area.