Keyphrase extraction is the task of identifying a small set of phrases in a text document that together summarize its main topics. Most existing graph-based models use co-occurrence links as cohesion indicators to model the relationships between syntactic elements. However, a word may appear in different forms within a document and may also have several synonyms, and co-occurrence information alone cannot capture these relations. In this paper, we enhance the graph-based ranking model by leveraging word embeddings as background knowledge to add semantic information to the inter-word graph. We evaluate our approach on established benchmark datasets, and the empirical results show that word embedding neighborhood information improves model performance.
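As a rough illustration of the idea, the following Python sketch builds an inter-word graph from co-occurrence links, adds extra edges between words whose embeddings are close, and scores the words with a PageRank-style iteration. It is not the paper's implementation. The function names (`build_graph`, `rank`), the window size, the cosine-similarity threshold, the edge weighting, and the toy two-dimensional embeddings are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_graph(tokens, embeddings, window=2, sim_threshold=0.8):
    """Undirected weighted word graph (hypothetical construction).

    Edges come from two sources:
      1. co-occurrence within `window` positions (weight +1 per occurrence);
      2. embedding similarity at or above `sim_threshold` (weight + similarity).
    """
    graph = defaultdict(lambda: defaultdict(float))

    # Co-occurrence edges.
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            v = tokens[j]
            if v != w:
                graph[w][v] += 1.0
                graph[v][w] += 1.0

    # Semantic edges between embedding neighbors (assumed weighting scheme).
    vocab = sorted(set(tokens))
    for w, v in combinations(vocab, 2):
        if w in embeddings and v in embeddings:
            sim = cosine(embeddings[w], embeddings[v])
            if sim >= sim_threshold:
                graph[w][v] += sim
                graph[v][w] += sim
    return graph


def rank(graph, damping=0.85, iterations=50):
    """Weighted PageRank-style scores over the nodes of `graph`."""
    nodes = list(graph)
    score = {n: 1.0 / len(nodes) for n in nodes}
    out_weight = {n: sum(graph[n].values()) for n in nodes}
    for _ in range(iterations):
        new_score = {}
        for n in nodes:
            # Each neighbor m passes on a share of its score proportional
            # to the weight of the edge m -> n.
            incoming = sum(score[m] * graph[m][n] / out_weight[m] for m in graph[n])
            new_score[n] = (1 - damping) / len(nodes) + damping * incoming
        score = new_score
    return score


# Toy input: "graph" and "network" never co-occur within the window,
# but their (made-up) embeddings are close, so a semantic edge links them.
tokens = ["graph", "model", "ranks", "words", "network", "model"]
toy_embeddings = {"graph": [1.0, 0.0], "network": [0.9, 0.1], "words": [0.0, 1.0]}

word_graph = build_graph(tokens, toy_embeddings, window=1)
scores = rank(word_graph)
top_words = sorted(scores, key=scores.get, reverse=True)
```

In this sketch the semantic edges simply add to the co-occurrence weights. Other ways of combining the two signals are possible, and the abstract does not specify which one the authors use.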