We present Semantic WordRank (SWR), an unsupervised method for generating an extractive summary of a single document. Built on a weighted word graph with semantic and co-occurrence edges, SWR scores sentences using an article-structure-biased PageRank algorithm with a Softplus function adjustment, and promotes topic diversity using spectral subtopic clustering under the Word Mover's Distance metric. We evaluate SWR on the DUC-02 and SummBank datasets and show that, on DUC-02, SWR produces better summaries than state-of-the-art algorithms under common ROUGE measures. We then show that, under the same measures on SummBank, SWR outperforms each of the three human annotators (a.k.a. judges) and compares favorably with the combined performance of all judges.
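As an illustration of the scoring step described above, the following is a minimal sketch, not the SWR implementation. It assumes a word graph with co-occurrence edges only (semantic edges and the spectral subtopic clustering under Word Mover's Distance are omitted), a hypothetical position-based teleport bias that favors words from earlier sentences as a stand-in for the article-structure bias, and a Softplus applied to rescaled word ranks before summing them per sentence. The function names, the window size, and the damping factor are all assumptions made for this example.

```python
import math
from collections import defaultdict


def softplus(x):
    """Numerically stable softplus, log(1 + e^x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def cooccurrence_graph(sentences, window=2):
    """Undirected weighted word graph; weight = co-occurrence count within `window`."""
    graph = defaultdict(lambda: defaultdict(float))
    for words in sentences:
        for i, w in enumerate(words):
            for v in words[i + 1 : i + 1 + window]:
                if v != w:
                    graph[w][v] += 1.0
                    graph[v][w] += 1.0
    return graph


def position_bias(sentences):
    """Hypothetical structure bias: words in earlier sentences get more weight."""
    bias = defaultdict(float)
    for idx, words in enumerate(sentences):
        for w in words:
            bias[w] = max(bias[w], 1.0 / (1 + idx))
    return bias


def biased_pagerank(graph, bias, damping=0.85, iters=50):
    """Power iteration for PageRank with a non-uniform teleport distribution."""
    nodes = list(graph)
    total = sum(bias[n] for n in nodes)
    teleport = {n: bias[n] / total for n in nodes}
    out_weight = {n: sum(graph[n].values()) for n in nodes}
    rank = dict(teleport)
    for _ in range(iters):
        new = {n: (1 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            for m, w in graph[n].items():
                new[m] += damping * rank[n] * w / out_weight[n]
        rank = new
    return rank


def score_sentences(sentences):
    """Sentence score = sum of softplus-adjusted ranks of its distinct graph words."""
    graph = cooccurrence_graph(sentences)
    rank = biased_pagerank(graph, position_bias(sentences))
    n = len(rank)  # rescale so an average word has rank about 1 before softplus
    return [
        sum(softplus(n * rank[w]) for w in set(words) if w in rank)
        for words in sentences
    ]


# Toy input: sentences are already tokenized into lowercase words.
docs = [
    ["graph", "ranks", "words"],
    ["words", "form", "sentences"],
    ["sentences", "form", "summary"],
]
scores = score_sentences(docs)
best = max(range(len(docs)), key=scores.__getitem__)
```

In this sketch the teleport distribution is the only place where document structure enters. Replacing `position_bias` with a different weighting changes the bias without touching the ranking loop.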