Neural keyphrase generation models have recently attracted much interest due to their ability to output absent keyphrases, that is, keyphrases that do not appear in the source text. In this paper, we discuss the usefulness of absent keyphrases from an Information Retrieval (IR) perspective, and show that the commonly drawn distinction between present and absent keyphrases is not made explicit enough. We introduce a finer-grained categorization scheme that sheds more light on the impact of absent keyphrases on scientific document retrieval. Under this scheme, we find that only a fraction (around 20%) of the words that make up keyphrases actually serves as document expansion, but that this small fraction of words is behind much of the gains observed in retrieval effectiveness. We also discuss how the proposed scheme can offer a new angle to evaluate the output of neural keyphrase generation models.
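To make the present/absent distinction and the notion of document expansion concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the scheme proposed in the paper: the category labels, the regex tokenizer (no stemming), and the helper names `tokenize`, `contains_sequence`, `categorize`, and `expansion_ratio` are all hypothetical. A keyphrase is treated as present when its words occur contiguously and in order in the source text. An absent keyphrase is then split by how many of its words the document already contains, and a word counts as document expansion only when it is missing from the document vocabulary.

```python
import re


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (assumption: no stemming)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_sequence(doc_tokens, phrase_tokens):
    """Return True if phrase_tokens occurs contiguously and in order in doc_tokens."""
    n = len(phrase_tokens)
    if n == 0:
        return False
    return any(doc_tokens[i:i + n] == phrase_tokens
               for i in range(len(doc_tokens) - n + 1))


def categorize(doc_tokens, phrase_tokens):
    """Assign a keyphrase to one of four hypothetical categories."""
    if contains_sequence(doc_tokens, phrase_tokens):
        return "present"
    vocab = set(doc_tokens)
    seen = [token in vocab for token in phrase_tokens]
    if all(seen):
        return "absent_all_words_seen"   # words occur in the text, but not as a phrase
    if any(seen):
        return "absent_some_words_seen"  # only part of the phrase adds new words
    return "absent_no_words_seen"        # every word is new to the document


def expansion_ratio(doc_tokens, keyphrases):
    """Fraction of keyphrase words that do not occur anywhere in the document."""
    vocab = set(doc_tokens)
    words = [token for phrase in keyphrases for token in tokenize(phrase)]
    if not words:
        return 0.0
    new_words = sum(1 for token in words if token not in vocab)
    return new_words / len(words)


# Toy document and keyphrases, invented for illustration only.
document = "Neural models generate keyphrases for scientific document retrieval."
keyphrases = [
    "document retrieval",
    "retrieval models",
    "neural keyphrase generation",
    "query expansion",
]

doc_tokens = tokenize(document)
labels = {phrase: categorize(doc_tokens, tokenize(phrase)) for phrase in keyphrases}
ratio = expansion_ratio(doc_tokens, keyphrases)
```

In this toy example only "document retrieval" is present. The other three keyphrases are all absent under the usual binary distinction, yet they differ in how many new words they bring to the document, which is the kind of difference a finer-grained scheme is meant to expose. Because the sketch skips stemming, "keyphrase" and "keyphrases" are treated as different words; a real setup would most likely normalize them.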