Select, Extract and Generate: Neural Keyphrase Generation with Syntactic Guidance

In recent years, the deep neural sequence-to-sequence framework has demonstrated promising results in keyphrase generation. However, processing long documents with such deep neural networks requires substantial computational resources. To reduce the computational cost, documents are typically truncated before being given as input. As a result, the models may miss essential points conveyed in a document. Moreover, most existing methods are either extractive (they identify important phrases in the document) or generative (they generate phrases word by word), and hence they do not benefit from the advantages of both modeling techniques. To address these challenges, we propose \emph{SEG-Net}, a neural keyphrase generation model composed of two major components: (1) a selector that selects the salient sentences in a document, and (2) an extractor-generator that jointly extracts and generates keyphrases from the selected sentences. SEG-Net uses a self-attentive architecture, the \emph{Transformer}, as its building block, with two distinctive features. First, SEG-Net incorporates a novel \emph{layer-wise} coverage attention to summarize most of the points discussed in the target document. Second, it uses an \emph{informed} copy attention mechanism to encourage focusing on different segments of the document during keyphrase extraction and generation. In addition, SEG-Net jointly learns keyphrase generation and part-of-speech tag prediction for the keyphrases, where the latter provides syntactic supervision to the former. Experimental results on seven keyphrase generation benchmarks from scientific and web documents demonstrate that SEG-Net outperforms the state-of-the-art neural generative methods by a large margin in both domains.
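To make the idea of jointly extracting and generating more concrete, the following is a minimal sketch of a generic copy mechanism in the pointer-generator style. It is not SEG-Net's informed copy attention, whose details the abstract does not give. The function name `mix_copy_and_generate`, the gate `p_gen`, and the toy probabilities are all assumptions made for illustration.

```python
def mix_copy_and_generate(p_vocab, attention, source_tokens, p_gen):
    """Blend a generation distribution with a copy distribution.

    p_vocab:       dict mapping vocabulary token -> generation probability
    attention:     attention weights over source_tokens (assumed to sum to 1)
    source_tokens: tokens of the input document, aligned with attention
    p_gen:         hypothetical gate in [0, 1]; weight given to generation
    """
    if len(attention) != len(source_tokens):
        raise ValueError("attention and source_tokens must have equal length")

    # Generation part: scale every vocabulary probability by p_gen.
    final = {tok: p_gen * prob for tok, prob in p_vocab.items()}

    # Copy part: add (1 - p_gen) * attention mass to each source token,
    # which lets out-of-vocabulary source words receive probability.
    for tok, weight in zip(source_tokens, attention):
        final[tok] = final.get(tok, 0.0) + (1.0 - p_gen) * weight
    return final


# Toy values, chosen only for illustration.
p_vocab = {"neural": 0.5, "model": 0.3, "<unk>": 0.2}
source = ["keyphrase", "generation", "model"]
attn = [0.6, 0.3, 0.1]

dist = mix_copy_and_generate(p_vocab, attn, source, p_gen=0.7)
best = max(dist, key=dist.get)
```

In this sketch, "keyphrase" is absent from the toy vocabulary but still receives probability through the copy term, which is the extractive behavior. Tokens such as "neural" that do not occur in the source can only come from the generative term.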
