Keyphrase extraction is a fundamental task in natural language processing and information retrieval that aims to extract a set of phrases carrying the important information of a source document. Identifying important keyphrases is the central component of the task, and its main challenge is how to represent information comprehensively and discriminate importance accurately. To address these issues, we design a new hyperbolic matching model (HyperMatch) that represents phrases and documents in the same hyperbolic space and explicitly estimates phrase-document relevance via the Poincaré distance, which serves as the importance score of each phrase. Specifically, to capture hierarchical syntactic and semantic structure, HyperMatch takes the hidden representations from multiple layers of RoBERTa and integrates them into word embeddings via an adaptive mixing layer. Meanwhile, considering the hierarchical structure hidden in the document, HyperMatch embeds both phrases and documents in the same hyperbolic space via a hyperbolic phrase encoder and a hyperbolic document encoder. This strategy further improves the estimation of phrase-document relevance because hyperbolic space is well suited to representing hierarchical structure. In this setting, keyphrase extraction can be treated as a matching problem and implemented effectively by minimizing a hyperbolic margin-based triplet loss. Extensive experiments on six benchmarks demonstrate that HyperMatch outperforms the state-of-the-art baselines.
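As a rough illustration of the matching objective described above, the following minimal sketch computes the standard Poincaré-ball distance between two points and a generic margin-based triplet loss built on it. The function names, the `margin` default, the `eps` guard, and the toy two-dimensional vectors are all assumptions made for illustration; the abstract does not give the exact form of the loss or of the encoders, so this should not be read as the HyperMatch implementation.

```python
import math


def poincare_distance(u, v, eps=1e-9):
    """Distance between two points inside the unit Poincaré ball.

    Standard formula:
    d(u, v) = arcosh(1 + 2 * ||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)))
    """
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # Guard against division by zero for points at or beyond the boundary
    # (an illustrative safeguard, not taken from the paper).
    denom = max((1.0 - sq_norm_u) * (1.0 - sq_norm_v), eps)
    return math.acosh(1.0 + 2.0 * sq_diff / denom)


def hyperbolic_triplet_loss(doc, pos_phrase, neg_phrase, margin=1.0):
    """Generic margin-based triplet loss with the Poincaré distance.

    The loss is zero once the keyphrase (positive) is closer to the
    document than the non-keyphrase (negative) by at least `margin`.
    """
    d_pos = poincare_distance(doc, pos_phrase)
    d_neg = poincare_distance(doc, neg_phrase)
    return max(0.0, margin + d_pos - d_neg)


# Hypothetical toy embeddings inside the unit ball.
doc = [0.0, 0.0]
keyphrase = [0.1, 0.0]
non_keyphrase = [0.9, 0.0]

print(poincare_distance(doc, keyphrase))
print(hyperbolic_triplet_loss(doc, keyphrase, non_keyphrase))
```

Under this reading, a smaller distance to the document embedding corresponds to a higher importance score, so at inference time candidate phrases could be ranked by ascending `poincare_distance` to the document.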