Keyphrase generation is the task of automatically predicting keyphrases for a long piece of text. Despite its recent flourishing, keyphrase generation for non-English languages has not been extensively investigated. In this paper, we call attention to a new setting, multilingual keyphrase generation, and contribute two new datasets, EcommerceMKP and AcademicMKP, covering six languages. Technically, we propose a retrieval-augmented method for multilingual keyphrase generation to mitigate the data shortage problem in non-English languages. The retrieval-augmented model leverages keyphrase annotations in English datasets to facilitate generating keyphrases in low-resource languages. Given a non-English passage, a cross-lingual dense passage retrieval module finds relevant English passages. The associated English keyphrases then serve as external knowledge for keyphrase generation in the current language. Moreover, we develop a retriever-generator iterative training algorithm that mines pseudo-parallel passage pairs to strengthen the cross-lingual passage retriever. Comprehensive experiments and ablations show that the proposed approach outperforms all baselines.
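The retrieve-then-generate flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the hand-written vectors stand in for the output of a cross-lingual passage encoder, and the names `TOY_INDEX`, `retrieve_keyphrases`, `build_generator_input`, and the `[SEP]` separator are all hypothetical choices made for illustration. The sketch ranks English passages by cosine similarity to a query embedding, collects the keyphrases of the top matches, and appends them to the non-English passage as the generator's input.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]

# Hypothetical English index: (passage embedding, annotated keyphrases).
# In a real system the embeddings would come from a cross-lingual encoder.
TOY_INDEX: List[Tuple[Vector, List[str]]] = [
    ([1.0, 0.0, 0.0], ["wireless headphones", "bluetooth"]),
    ([0.8, 0.2, 0.0], ["noise cancelling", "bluetooth"]),
    ([0.0, 0.0, 1.0], ["graph neural network"]),
]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_keyphrases(
    query_vec: Vector,
    index: List[Tuple[Vector, List[str]]],
    top_k: int = 2,
) -> List[str]:
    """Return de-duplicated keyphrases of the top_k most similar passages."""
    ranked = sorted(index, key=lambda entry: cosine(query_vec, entry[0]), reverse=True)
    keyphrases: List[str] = []
    for _, passage_keyphrases in ranked[:top_k]:
        for kp in passage_keyphrases:
            if kp not in keyphrases:
                keyphrases.append(kp)
    return keyphrases


def build_generator_input(passage: str, keyphrases: List[str], sep: str = " [SEP] ") -> str:
    """Append retrieved English keyphrases to the passage as external knowledge."""
    return passage + sep + " ; ".join(keyphrases)


# Assumed embedding of a non-English product description.
query_embedding = [0.9, 0.1, 0.0]
retrieved = retrieve_keyphrases(query_embedding, TOY_INDEX, top_k=2)
generator_input = build_generator_input("Auriculares inalámbricos con bluetooth", retrieved)
```

The augmented string would then be passed to a sequence-to-sequence keyphrase generator. How the original method encodes passages, selects the number of retrieved neighbours, or formats the generator input is not specified in the abstract, so those details here are assumptions.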