The encoder-decoder framework achieves state-of-the-art results in keyphrase generation (KG) by predicting both present keyphrases, which appear in the source document, and absent keyphrases, which do not. However, relying solely on the source document can yield uncontrollable and inaccurate absent keyphrases. To address these problems, we propose a novel graph-based method that captures explicit knowledge from related references. Our model first retrieves document-keyphrase pairs similar to the source document from a predefined index as references. It then constructs a heterogeneous graph to capture relationships of different granularities between the source document and its references. To guide decoding, we introduce a hierarchical attention and copy mechanism that directly copies appropriate words from both the source document and its references based on their relevance and significance. Experimental results on multiple KG benchmarks show that the proposed model achieves significant improvements over baseline models, especially in absent keyphrase prediction.
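The retrieval step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the similarity measure or the index format, so the bag-of-words cosine similarity, the `INDEX` contents, and the helper names `bow`, `cosine`, and `retrieve_references` below are all assumptions made for illustration, not the method used in the paper.

```python
import math
from collections import Counter


def bow(text):
    """Lowercased whitespace tokens as a bag-of-words vector (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_references(source, index, k=2):
    """Return the k (document, keyphrases) pairs most similar to the source."""
    src = bow(source)
    ranked = sorted(index, key=lambda pair: cosine(src, bow(pair[0])), reverse=True)
    return ranked[:k]


# Hypothetical toy index of (document, keyphrases) pairs.
INDEX = [
    ("graph neural networks for text classification",
     ["graph neural network", "text classification"]),
    ("neural keyphrase generation with copy attention",
     ["keyphrase generation", "copy mechanism"]),
    ("protein folding using deep learning",
     ["protein structure", "deep learning"]),
]

refs = retrieve_references("keyphrase generation with graph attention", INDEX, k=2)

# Words from retrieved keyphrases that are missing from the source are
# candidates a copy mechanism could draw on for absent keyphrases.
source_tokens = set("keyphrase generation with graph attention".split())
candidate_words = sorted(
    {w for _, kps in refs for kp in kps for w in kp.split()} - source_tokens
)
```

In this sketch, `refs` holds the retrieved references and `candidate_words` collects words that appear in their keyphrases but not in the source document, which is the kind of external evidence the abstract argues helps with absent keyphrase prediction.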