Unsupervised summarization methods have achieved remarkable results by incorporating representations from pre-trained language models. However, when the input document is extremely long, existing methods fail to balance efficiency and effectiveness. To tackle this problem, we propose an efficient Coarse-to-Fine Facet-Aware Ranking (C2F-FAR) framework for unsupervised long document summarization. The framework is built on semantic blocks, where a semantic block is a run of consecutive sentences in the document that describe the same facet. Specifically, we replace the one-step ranking method with a hierarchical, multi-granularity, two-stage ranking. In the coarse-level stage, we propose a new segmentation algorithm that splits the document into facet-aware semantic blocks, and we then filter out insignificant blocks. In the fine-level stage, we select salient sentences in each remaining block and then extract the final summary from the selected sentences. We evaluate our framework on four long document summarization datasets: Gov-Report, BillSum, arXiv, and PubMed. C2F-FAR achieves new state-of-the-art unsupervised summarization results on Gov-Report and BillSum. In addition, our method is 4-28 times faster than previous methods.\footnote{\url{https://github.com/xnliang98/c2f-far}}
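To make the coarse-to-fine control flow concrete, here is a minimal sketch of a two-stage, block-based extractive ranker. It is not the authors' C2F-FAR implementation. Several pieces are illustrative assumptions: the sentence embeddings are taken as precomputed (for example, by some pre-trained encoder), segmentation simply starts a new block whenever adjacent sentences fall below a cosine-similarity threshold (this is not the paper's segmentation algorithm), and blocks and sentences are scored by cosine similarity to a centroid. The function names and the parameters `threshold`, `keep_blocks`, `per_block`, and `summary_len` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(vectors: List[Vector]) -> List[float]:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def segment_blocks(embs: List[Vector], threshold: float) -> List[List[int]]:
    """Hypothetical segmentation: open a new block when adjacent sentences are dissimilar."""
    if not embs:
        return []
    blocks = [[0]]
    for i in range(1, len(embs)):
        if cosine(embs[i - 1], embs[i]) < threshold:
            blocks.append([i])  # low similarity: assume a facet boundary
        else:
            blocks[-1].append(i)  # same facet: extend the current block
    return blocks


def coarse_to_fine_summary(
    embs: List[Vector],
    threshold: float = 0.5,
    keep_blocks: int = 2,
    per_block: int = 2,
    summary_len: int = 3,
) -> List[int]:
    """Return indices of the selected summary sentences, in document order."""
    if not embs:
        return []
    doc_center = centroid(embs)
    blocks = segment_blocks(embs, threshold)

    # Coarse stage: rank blocks by how close their centroid is to the document
    # centroid, and drop all but the top `keep_blocks` blocks.
    kept = sorted(
        blocks,
        key=lambda b: cosine(centroid([embs[i] for i in b]), doc_center),
        reverse=True,
    )[:keep_blocks]

    # Fine stage, step 1: inside each kept block, keep the sentences closest
    # to that block's centroid.
    candidates: List[int] = []
    for block in kept:
        center = centroid([embs[i] for i in block])
        ranked = sorted(block, key=lambda i: cosine(embs[i], center), reverse=True)
        candidates.extend(ranked[:per_block])

    # Fine stage, step 2: rank the surviving candidates against the document
    # centroid and keep `summary_len` of them, restored to document order.
    final = sorted(candidates, key=lambda i: cosine(embs[i], doc_center), reverse=True)
    return sorted(final[:summary_len])


if __name__ == "__main__":
    # Toy 2-D "embeddings": two facets plus one outlier sentence.
    toy = [[1.0, 0.0], [0.95, 0.05], [0.0, 1.0], [0.05, 0.95], [-1.0, 0.0]]
    print(segment_blocks(toy, 0.5))     # [[0, 1], [2, 3], [4]]
    print(coarse_to_fine_summary(toy))  # [1, 2, 3]
```

The efficiency argument lives in the coarse stage. Once whole blocks are discarded early, the fine-grained sentence scoring only runs over the sentences that survive. A one-step ranker would instead have to score every sentence against the full document.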