Keyphrase extraction is a fundamental task in Natural Language Processing that usually consists of two main parts: candidate keyphrase extraction and keyphrase importance estimation. When humans read a document, we typically judge the importance of a phrase by its syntactic accuracy, information saliency, and concept consistency simultaneously. However, most existing keyphrase extraction approaches focus on only some of these perspectives, which leads to biased results. In this paper, we propose a new approach that estimates the importance of keyphrases from multiple perspectives (called \textit{KIEMP}) and thereby improves the performance of keyphrase extraction. Specifically, \textit{KIEMP} estimates the importance of a phrase with three modules: a chunking module to measure its syntactic accuracy, a ranking module to check its information saliency, and a matching module to judge the concept (i.e., topic) consistency between the phrase and the whole document. These three modules are joined seamlessly in an end-to-end multi-task learning model, which helps the three parts reinforce one another and balances the effects of the three perspectives. Experimental results on six benchmark datasets show that \textit{KIEMP} outperforms existing state-of-the-art keyphrase extraction approaches in most cases.
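To make the multi-task idea concrete, the following is a minimal sketch of how three per-perspective losses might be combined into one training objective. The abstract does not state which loss functions or weights \textit{KIEMP} uses, so everything here is an assumption for illustration: binary cross-entropy for chunking, a pairwise hinge loss for ranking, a cosine-based loss for matching, and the hypothetical helper names `chunking_loss`, `ranking_loss`, `matching_loss`, and `joint_loss`.

```python
import math


def chunking_loss(probs, labels):
    """Assumed chunking objective: binary cross-entropy over candidate spans.

    probs[i] is the predicted probability that span i is a well-formed phrase.
    """
    eps = 1e-12  # avoids log(0)
    total = 0.0
    for p, y in zip(probs, labels):
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(probs)


def ranking_loss(pos_scores, neg_scores, margin=1.0):
    """Assumed ranking objective: pairwise hinge loss.

    Each keyphrase score should exceed each non-keyphrase score by `margin`.
    """
    pairs = [(p, n) for p in pos_scores for n in neg_scores]
    return sum(max(0.0, margin - p + n) for p, n in pairs) / len(pairs)


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def matching_loss(phrase_vecs, doc_vec, labels):
    """Assumed matching objective: pull keyphrases toward the document vector.

    Keyphrases (label 1) are penalised by 1 - cosine; other phrases (label 0)
    are penalised only when their cosine similarity is positive.
    """
    losses = []
    for vec, y in zip(phrase_vecs, labels):
        sim = cosine(vec, doc_vec)
        losses.append(1.0 - sim if y == 1 else max(0.0, sim))
    return sum(losses) / len(losses)


def joint_loss(l_chunk, l_rank, l_match, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three losses; the weights are hypothetical."""
    w_chunk, w_rank, w_match = weights
    return w_chunk * l_chunk + w_rank * l_rank + w_match * l_match


# Toy values, invented purely to exercise the functions.
l_c = chunking_loss([0.9, 0.2], [1, 0])
l_r = ranking_loss([2.0, 1.5], [0.3])
l_m = matching_loss([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0], [1, 0])
print(joint_loss(l_c, l_r, l_m))
```

In an end-to-end model the three losses would be computed from a shared encoder, so that gradients from each objective shape the same representation; the weighted sum is one simple way to balance the three perspectives, not necessarily the one used in the paper.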