Natural language processing techniques have demonstrated promising results in keyphrase generation. However, one of the major challenges in \emph{neural} keyphrase generation is processing long documents with deep neural networks. Typically, documents are truncated before being fed to the network; consequently, the model may miss essential points conveyed in the target document. To overcome this limitation, we propose \emph{SEG-Net}, a neural keyphrase generation model composed of two major components: (1) a selector that identifies the salient sentences in a document, and (2) an extractor-generator that jointly extracts and generates keyphrases from the selected sentences. SEG-Net uses Transformer, a self-attentive architecture, as its basic building block, with a novel \emph{layer-wise} coverage attention that summarizes most of the points discussed in the document. Experimental results on seven keyphrase generation benchmarks drawn from scientific and web documents demonstrate that SEG-Net outperforms state-of-the-art neural generative methods by a large margin.
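To make the coverage idea concrete, the following is a minimal sketch of how a coverage vector can modulate attention at each decoder layer. It assumes the standard additive coverage attention of See et al. (2017) applied independently per layer; the symbols $h_i$, $s_t$, the weight matrices, and the layer index $l$ are illustrative assumptions, not the paper's exact parameterization:
\[
c_i^{(l,t)} = \sum_{t'=0}^{t-1} a_i^{(l,t')}, \qquad
e_i^{(l,t)} = \mathbf{v}^\top \tanh\!\left( W_h h_i^{(l)} + W_s s_t^{(l)} + w_c\, c_i^{(l,t)} + \mathbf{b} \right), \qquad
a^{(l,t)} = \mathrm{softmax}\!\left( e^{(l,t)} \right),
\]
where $h_i^{(l)}$ is the encoding of source position $i$ at layer $l$, $s_t^{(l)}$ is the decoder state at step $t$, and $a^{(l,t)}$ is the resulting attention distribution. Maintaining a separate coverage vector $c^{(l,t)}$ at every layer, rather than only at the top, discourages repeated attention to the same source positions throughout the network, which is consistent with the goal of covering most of the points discussed in the document.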