Keyphrases are crucial for searching and systematizing scholarly documents. Most current keyphrase extraction methods aim to extract the most significant words from the text. In practice, however, the list of keyphrases often includes words that do not appear explicitly in the text. In this case, the list of keyphrases represents an abstractive summary of the source text. In this paper, we experiment with popular transformer-based models for abstractive text summarization on four benchmark datasets for keyphrase extraction. We compare the results with those of common unsupervised and supervised keyphrase extraction methods. Our evaluation shows that summarization models are quite effective at generating keyphrases in terms of the full-match F1-score and BERTScore. However, they produce many words that are absent from the author's list of keyphrases, which makes them ineffective in terms of ROUGE-1. We also investigate several ordering strategies for concatenating target keyphrases. The results show that the choice of strategy affects keyphrase generation performance.
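To make the evaluation setup more concrete, the following is a minimal sketch of how a phrase-level full-match F1, a word-level ROUGE-1-style F1, and the concatenation of target keyphrases might be computed. It is an illustration under stated assumptions, not the implementation used in the paper: the function names (`full_match_f1`, `unigram_f1`, `build_target`), the lowercase/whitespace normalization, the `"; "` separator, and the three ordering strategies (`original`, `alphabetical`, `present_first`) are all hypothetical choices made here for the example.

```python
from collections import Counter


def normalize(phrase: str) -> str:
    """Lowercase and collapse whitespace (assumed normalization; stemming is often added)."""
    return " ".join(phrase.lower().split())


def full_match_f1(predicted: list[str], gold: list[str]) -> float:
    """F1 over whole keyphrases: a prediction counts only if it equals a gold phrase."""
    pred = {normalize(p) for p in predicted}
    ref = {normalize(g) for g in gold}
    matched = len(pred & ref)
    if matched == 0:
        return 0.0
    precision = matched / len(pred)
    recall = matched / len(ref)
    return 2 * precision * recall / (precision + recall)


def unigram_f1(predicted: list[str], gold: list[str]) -> float:
    """ROUGE-1-style F1: clipped word overlap between predicted and gold keyphrases."""
    pred_words = Counter(w for p in predicted for w in normalize(p).split())
    ref_words = Counter(w for g in gold for w in normalize(g).split())
    overlap = sum((pred_words & ref_words).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_words.values())
    recall = overlap / sum(ref_words.values())
    return 2 * precision * recall / (precision + recall)


def build_target(keyphrases: list[str], source_text: str,
                 strategy: str = "original", sep: str = "; ") -> str:
    """Concatenate gold keyphrases into one target string for a seq2seq model.

    The strategies below are hypothetical examples of ordering choices.
    """
    text = normalize(source_text)
    if strategy == "original":
        ordered = list(keyphrases)
    elif strategy == "alphabetical":
        ordered = sorted(keyphrases, key=str.lower)
    elif strategy == "present_first":
        # Stable sort: phrases found in the source text come before absent ones.
        ordered = sorted(keyphrases, key=lambda k: normalize(k) not in text)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return sep.join(ordered)
```

With such helpers, a generated list can be scored at both the phrase level and the word level, and the same gold keyphrases can be serialized in different orders to study how the ordering of the training target affects generation.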