An Empirical Survey on Long Document Summarization: Datasets, Models and Metrics

Long documents such as academic articles and business reports have long been the standard format for detailing important issues and complicated subjects that require extra attention. An automatic summarization system that can effectively condense long documents into short, concise texts capturing the most important information would thus significantly aid reader comprehension. Recently, with the advent of neural architectures, significant research effort has gone into advancing automatic text summarization systems, and numerous studies on the challenges of extending these systems to the long document domain have emerged. In this survey, we provide a comprehensive overview of the research on long document summarization and a systematic evaluation across the three principal components of its research setting: benchmark datasets, summarization models, and evaluation metrics. For each component, we organize the literature within the context of long document summarization and conduct an empirical analysis to broaden the perspective on current research progress. The empirical analysis includes a study of the intrinsic characteristics of benchmark datasets, a multi-dimensional analysis of summarization models, and a review of summarization evaluation metrics. Based on the overall findings, we conclude by proposing possible directions for future exploration in this rapidly growing field.
