Keyphrases provide highly condensed information that can be used effectively for understanding, organizing, and retrieving text content. Although previous studies have provided many workable solutions for automated keyphrase extraction, they commonly divide the to-be-summarized content into multiple text chunks, then rank the chunks and select the most meaningful ones. These approaches can neither identify keyphrases that do not appear in the text nor capture the real semantic meaning behind the text. We propose a generative model for keyphrase prediction with an encoder-decoder framework, which can effectively overcome these drawbacks. We call it deep keyphrase generation because it attempts to capture the deep semantic meaning of the content with a deep learning method. Empirical analysis on six datasets demonstrates that our proposed model not only achieves a significant performance boost in extracting keyphrases that appear in the source text, but can also generate absent keyphrases based on the semantic meaning of the text. Code and dataset are available at https://github.com/memray/OpenNMT-kpg-release.
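
The distinction between keyphrases that appear in the source text and absent keyphrases can be made concrete with a short sketch. The Python below is an illustration only, not the released code: the helper names (`tokenize`, `is_present`, `split_keyphrases`) are hypothetical, and it assumes simple lowercase alphanumeric tokenization with exact token-sequence matching and no stemming.

```python
import re


def tokenize(text):
    # Assumption: lowercase alphanumeric tokens; hyphens and punctuation split words.
    return re.findall(r"[a-z0-9]+", text.lower())


def is_present(phrase, source_tokens):
    # A phrase counts as "present" if its tokens occur contiguously in the source.
    phrase_tokens = tokenize(phrase)
    n = len(phrase_tokens)
    if n == 0:
        return False
    return any(
        source_tokens[i:i + n] == phrase_tokens
        for i in range(len(source_tokens) - n + 1)
    )


def split_keyphrases(source, keyphrases):
    # Partition keyphrases into those found in the source and those absent from it.
    source_tokens = tokenize(source)
    present = [k for k in keyphrases if is_present(k, source_tokens)]
    absent = [k for k in keyphrases if not is_present(k, source_tokens)]
    return present, absent


source = (
    "We propose a generative model for keyphrase prediction "
    "with an encoder-decoder framework."
)
keyphrases = [
    "keyphrase prediction",
    "encoder-decoder framework",
    "sequence-to-sequence learning",
]
present, absent = split_keyphrases(source, keyphrases)
print("present:", present)
print("absent:", absent)
```

Under these assumptions, a purely extractive method can at best return phrases from the `present` list, whereas a generative model is not restricted to spans of the source and may also produce phrases from the `absent` list.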