Keyword extraction is the process of identifying the words or phrases that best express the main concepts of a text. Electronic infrastructure produces a considerable amount of text every day, and this volume of documents makes it practically impossible for people to study and manage them manually. Nevertheless, these documents need to be accessed efficiently and effectively for many purposes. A blog post, news article, or technical note counts as a relatively long text, and its reader typically tries to grasp the subject from keywords or topics. Our approach combines two kinds of features: graph centrality features and textual features. The proposed method extracts the best keywords from a set of candidate keywords using an optimal combination of graph centralities (such as degree, betweenness, eigenvector, and closeness centrality) and textual features (such as casing, term position, term frequency normalization, term different sentence, and part-of-speech tagging). Attempts have also been made to distinguish keywords from candidate phrases and to treat them as separate keywords. To evaluate the proposed method, seven datasets were used: SemEval2010, SemEval2017, Inspec, fao30, Thesis100, pak2018, and WikiNews, with results reported as Precision, Recall, and F-measure. The proposed method outperformed the available methods in the literature on the evaluation metrics across all reviewed datasets. The F-score increased by approximately 16.9%, with the largest gains on Inspec among the English datasets and on WikiNews among the non-English datasets.
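As a rough illustration of how a graph centrality feature and textual features can be combined into one candidate score, the following minimal sketch may help. It is not the method proposed here: the stopword list, the co-occurrence window, the use of degree centrality alone, the simplified casing and position features, and the equal-weight average are all assumptions made for the example, and `score_candidates` is a hypothetical name.

```python
import re
from collections import Counter, defaultdict

# Assumption: a tiny illustrative stopword list, not the one used in the paper.
STOPWORDS = {"the", "a", "an", "of", "and", "or", "to", "in", "is", "are",
             "for", "on", "with", "that", "this"}


def tokenize(text):
    # Keep the original casing so the casing feature can inspect it.
    return re.findall(r"[A-Za-z][A-Za-z\-]*", text)


def score_candidates(text, window=2):
    """Score single-word candidates by averaging one graph feature
    (degree centrality) with three textual features.
    Hypothetical helper: the equal weighting is an assumption."""
    kept = [tok for tok in tokenize(text) if tok.lower() not in STOPWORDS]
    words = [tok.lower() for tok in kept]
    if not words:
        return {}

    # Undirected co-occurrence graph: link words at most `window` positions apart.
    neighbours = defaultdict(set)
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + 1 + window, len(words))):
            if words[j] != w:
                neighbours[w].add(words[j])
                neighbours[words[j]].add(w)

    vocab = set(words)
    n = len(vocab)
    tf = Counter(words)
    max_tf = max(tf.values())

    # Index of the first occurrence of each word (earlier is better).
    first_pos = {}
    for idx, w in enumerate(words):
        first_pos.setdefault(w, idx)

    # Simplification: sentence-initial capitals are counted too.
    capitalised = Counter(tok.lower() for tok in kept if tok[0].isupper())

    scores = {}
    for w in vocab:
        degree = len(neighbours[w]) / (n - 1) if n > 1 else 0.0
        tf_norm = tf[w] / max_tf                     # normalized term frequency
        position = 1.0 - first_pos[w] / len(words)   # term position
        casing = capitalised[w] / tf[w]              # share of capitalised uses
        scores[w] = (degree + tf_norm + position + casing) / 4.0
    return scores


if __name__ == "__main__":
    sample = ("Keyword extraction finds keywords. "
              "Graph centrality ranks keywords in a graph.")
    ranked = sorted(score_candidates(sample).items(), key=lambda kv: -kv[1])
    for word, score in ranked:
        print(f"{word}\t{score:.3f}")
```

Every feature is scaled to the range 0 to 1 before averaging so that no single feature dominates by magnitude alone. A fuller version would add the other centralities named above (betweenness, eigenvector, closeness) and tune the combination weights rather than fixing them.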