Keyword extraction is the task of identifying the words or phrases that best express the main concepts of a text. A huge volume of text is created continuously through electronic infrastructure, so it is practically impossible for humans to read and manage all of these documents. At the same time, efficient and effective access to them is needed for many purposes. Weblogs, news articles, and technical notes are mostly long texts, and readers rely on topics or keywords to grasp their concepts and decide whether to read the full text. To this end, we use a combined approach built on two models: graph centrality features and textual features. Graph centralities, such as degree, betweenness, eigenvector, and closeness centrality, are combined optimally to select the best keywords among the candidate keywords extracted by the proposed method. We also introduce an approach for identifying keywords within candidate phrases and treating them as separate keywords. The proposed method is evaluated on seven datasets, SemEval2010, SemEval2017, Inspec, fao30, Thesis100, pak2018, and WikiNews, and results are reported in terms of precision, recall, and F-measure.
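As a rough illustration of how the four centralities could be computed on a word graph and merged into a single keyword score, here is a minimal, dependency-free Python sketch. It is not the authors' method: the graph construction (an undirected co-occurrence graph linking adjacent words, `window=2`), the max-normalization of each centrality, the fixed equal `weights` (standing in for the optimized combination), and the hypothetical helpers `combined_scores` and `rank_candidates` (which scores a phrase by the mean score of its words) are all assumptions made for illustration.

```python
from collections import deque

def build_graph(tokens, window=2):
    """Hypothetical co-occurrence graph: link words appearing within `window` tokens."""
    graph = {w: set() for w in tokens}
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            if tokens[j] != word:
                graph[word].add(tokens[j])
                graph[tokens[j]].add(word)
    return graph

def bfs_distances(graph, source):
    """Unweighted shortest-path distances from `source` to every reachable node."""
    dist, queue = {source: 0}, deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def betweenness(graph):
    """Brandes' algorithm for unweighted, undirected graphs."""
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        stack, preds = [], {w: [] for w in graph}
        sigma = dict.fromkeys(graph, 0)
        sigma[s] = 1
        dist, queue = {s: 0}, deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each pair was counted twice

def eigenvector(graph, iters=100):
    """Power iteration on (A + I), which avoids oscillation on bipartite graphs."""
    x = dict.fromkeys(graph, 1.0)
    for _ in range(iters):
        nxt = {v: x[v] + sum(x[u] for u in graph[v]) for v in graph}
        norm = max(nxt.values(), default=0.0) or 1.0
        x = {v: val / norm for v, val in nxt.items()}
    return x

def normalize(scores):
    top = max(scores.values(), default=0.0) or 1.0
    return {k: v / top for k, v in scores.items()}

def combined_scores(tokens, weights=(0.25, 0.25, 0.25, 0.25), window=2):
    """Weighted sum of degree, betweenness, eigenvector, and closeness centrality."""
    graph = build_graph(tokens, window)
    n = len(graph)
    degree = {v: len(graph[v]) / max(n - 1, 1) for v in graph}
    closeness = {}
    for v in graph:
        d = bfs_distances(graph, v)
        total = sum(d.values())
        closeness[v] = (len(d) - 1) / total if total else 0.0
    features = [normalize(f) for f in (degree, betweenness(graph), eigenvector(graph), closeness)]
    return {v: sum(w * f[v] for w, f in zip(weights, features)) for v in graph}

def rank_candidates(candidates, scores):
    """Hypothetical phrase score: mean combined score of the phrase's words."""
    def phrase_score(phrase):
        words = phrase.split()
        return sum(scores.get(w, 0.0) for w in words) / len(words)
    return sorted(candidates, key=phrase_score, reverse=True)

tokens = ["graph", "keyword", "graph", "centrality", "graph", "ranking"]
scores = combined_scores(tokens)
print(max(scores, key=scores.get))
print(rank_candidates(["ranking", "graph centrality"], scores))
```

In this toy token sequence "graph" is the hub of a star-shaped graph, so it maximizes all four centralities and receives the top combined score; any phrase containing it outranks a phrase made only of leaf words. In a real system the weights would be tuned (for example on a development set) rather than fixed.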