Keyphrase generation aims to produce the important phrases (keyphrases) that best describe a given document. In scholarly domains, current approaches have largely used only the title and abstract of an article to generate keyphrases. In this paper, we comprehensively explore whether integrating additional information, either from the full text of the article or from semantically similar articles, can help a neural keyphrase generation model. We find that adding sentences from the full text, particularly in the form of an extractive summary of the article, can significantly improve the generation of both present keyphrases (those that appear in the text) and absent keyphrases (those that do not). Experimental results with three widely used keyphrase generation models, along with Longformer Encoder-Decoder (LED), one of the latest transformer models suited to longer documents, validate this observation. We also present FullTextKP, a new large-scale scholarly dataset for keyphrase generation. Unlike prior large-scale datasets, FullTextKP includes the full text of the articles along with the title and abstract. We release the source code at https://github.com/kgarg8/FullTextKP.
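To make the two ideas in the abstract concrete, the following is a minimal Python sketch, not the released implementation. It shows (a) how keyphrases might be split into present and absent by contiguous token match, and (b) how a model input might be built by appending extractive-summary sentences to the title and abstract. The function names, the `<sep>` separator, and the frequency-based sentence scorer are all assumptions for illustration; the abstract does not say which summarizer or input format the authors use, and real pipelines typically also apply stemming before matching.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (no stemming; an assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_present_absent(keyphrases, source_text):
    """Label a keyphrase 'present' if its tokens occur contiguously in the source."""
    src = tokenize(source_text)
    present, absent = [], []
    for kp in keyphrases:
        kp_toks = tokenize(kp)
        n = len(kp_toks)
        found = n > 0 and any(
            src[i:i + n] == kp_toks for i in range(len(src) - n + 1)
        )
        (present if found else absent).append(kp)
    return present, absent


def top_k_sentences(full_text, k=2):
    """Naive frequency-based extractive summary.

    Hypothetical stand-in for whatever summarizer is actually used:
    scores each sentence by the mean corpus frequency of its tokens.
    """
    sentences = [
        s.strip() for s in re.split(r"(?<=[.!?])\s+", full_text) if s.strip()
    ]
    freq = Counter(tokenize(full_text))

    def score(sentence):
        toks = tokenize(sentence)
        return sum(freq[t] for t in toks) / max(len(toks), 1)

    ranked = sorted(
        range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True
    )[:k]
    # Restore document order so the summary reads naturally.
    return [sentences[i] for i in sorted(ranked)]


def build_input(title, abstract, full_text, k=2, sep=" <sep> "):
    """Concatenate title, abstract, and k summary sentences (format is assumed)."""
    return sep.join([title, abstract] + top_k_sentences(full_text, k))
```

Under these assumptions, the augmented string from `build_input` would replace the usual title-plus-abstract input to the generation model, and `split_present_absent` would be run against that same string so that the present/absent labels reflect what the model actually sees.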