Background: Keyword extraction is a popular research topic in natural language processing. Keywords are terms that describe the most relevant information in a document. The main problem researchers face is how to extract the core keywords from a document efficiently and accurately. Although previous keyword extraction approaches have used text and graph features, models that can properly learn and combine these features are still lacking. Methods: In this paper, we develop a multimodal key-phrase extraction approach, namely Phraseformer, using transformer and graph embedding techniques. In Phraseformer, each keyword candidate is represented by a vector that is the concatenation of its text and structure learning representations. Phraseformer builds on recent work such as BERT and ExEm to preserve both representations. Phraseformer also treats key-phrase extraction as a sequence labeling problem that is solved as a classification task. Results: We analyze the performance of Phraseformer on three datasets, Inspec, SemEval 2010 and SemEval 2017, using the F1-score. We also investigate the performance of different classifiers within Phraseformer on the Inspec dataset. Experimental results demonstrate the effectiveness of Phraseformer on all three datasets. Additionally, the Random Forest classifier achieves the highest F1-score among all classifiers. Conclusions: Because the combination of BERT and ExEm is more meaningful and better represents the semantics of words, Phraseformer significantly outperforms single-modality methods.
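
To make the Methods description concrete, the following is a minimal sketch of the two ideas it names: concatenating a text representation with a structure representation for each token, and reading key-phrases off a per-token label sequence. It is not the authors' implementation. The toy vectors stand in for BERT and ExEm embeddings, the B/I/O tag scheme is an assumption (the abstract only says "sequence labeling"), and the function names `concat_features`, `build_token_features` and `decode_bio` are hypothetical. The classifier that would predict the labels is left out; the labels below are written by hand.

```python
from typing import Dict, List, Sequence


def concat_features(text_vec: Sequence[float], graph_vec: Sequence[float]) -> List[float]:
    """Join a text embedding and a graph embedding into one feature vector."""
    return list(text_vec) + list(graph_vec)


def build_token_features(
    tokens: Sequence[str],
    text_emb: Dict[str, Sequence[float]],
    graph_emb: Dict[str, Sequence[float]],
) -> List[List[float]]:
    """Build one concatenated feature vector per token."""
    return [concat_features(text_emb[t], graph_emb[t]) for t in tokens]


def decode_bio(tokens: Sequence[str], labels: Sequence[str]) -> List[str]:
    """Turn per-token B/I/O labels (assumed tag scheme) into key-phrases."""
    phrases: List[str] = []
    current: List[str] = []
    for token, label in zip(tokens, labels):
        if label == "B":
            # A new phrase starts; close any phrase that was still open.
            if current:
                phrases.append(" ".join(current))
            current = [token]
        elif label == "I" and current:
            # Continue the phrase that is currently open.
            current.append(token)
        else:
            # "O", or an "I" with no open phrase: close and reset.
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


# Toy data: 3-dimensional "text" vectors and 2-dimensional "graph" vectors.
tokens = ["graph", "embedding", "improves", "keyword", "extraction"]
text_emb = {t: [0.1 * i, 0.2 * i, 0.3 * i] for i, t in enumerate(tokens)}
graph_emb = {t: [1.0 * i, 2.0 * i] for i, t in enumerate(tokens)}

features = build_token_features(tokens, text_emb, graph_emb)

# Hand-written labels standing in for a classifier's per-token predictions.
labels = ["B", "I", "O", "B", "I"]
phrases = decode_bio(tokens, labels)
print(len(features[0]), phrases)
```

In a fuller pipeline, the concatenated vectors in `features` would be the input to a token-level classifier (the abstract reports Random Forest as the best performer), and its predicted labels would replace the hand-written `labels` list.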