From Retrieval to Generation: Efficient and Effective Entity Set Expansion

Entity Set Expansion (ESE) is a critical task that aims to expand the entities of a target semantic class described by a small seed entity set. Most existing ESE methods are retrieval-based frameworks that must extract the contextual features of entities and compute the similarity between seed entities and candidate entities. To do both, they have to iteratively traverse the corpus and the entity vocabulary provided with the datasets, which results in poor efficiency and scalability. Experimental results indicate that the time consumed by retrieval-based ESE methods increases linearly with entity vocabulary and corpus size. In this paper, we are the first to propose a generative ESE framework, Generative Entity Set Expansion (GenExpan), which uses a generative pre-trained language model to accomplish the ESE task. Specifically, a prefix tree is employed to guarantee the validity of entity generation, and automatically generated class names are adopted to guide the model toward generating target entities. Moreover, we propose Knowledge Calibration and Generative Ranking to further bridge the gap between the generic knowledge of the language model and the goal of the ESE task. Experiments on publicly available datasets show that GenExpan is both efficient and effective. In terms of efficiency, the expansion time consumed by GenExpan is independent of entity vocabulary and corpus size, and GenExpan achieves an average 600% speedup over strong baselines. In terms of expansion performance, our framework outperforms previous state-of-the-art ESE methods.
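
The abstract states that a prefix tree guarantees the validity of entity generation but gives no implementation details. The following is a minimal sketch of that general idea, not GenExpan's actual code. It assumes whitespace-level tokens, greedy decoding, and a toy scoring function standing in for the language model; the names `PrefixTree`, `constrained_greedy_decode`, `toy_score`, and the `<eos>` marker are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Sequence

END = "<eos>"  # hypothetical end-of-entity marker (assumption, not from the paper)


class PrefixTree:
    """Trie over tokenized entity names from a candidate vocabulary."""

    def __init__(self) -> None:
        self.root: Dict[str, dict] = {}

    def add(self, tokens: Sequence[str]) -> None:
        """Insert one entity, terminated by the END marker."""
        node = self.root
        for tok in list(tokens) + [END]:
            node = node.setdefault(tok, {})

    def allowed_next(self, prefix: Sequence[str]) -> List[str]:
        """Return the tokens that keep `prefix` on a path to a known entity."""
        node = self.root
        for tok in prefix:
            if tok not in node:
                return []
            node = node[tok]
        return sorted(node.keys())


def constrained_greedy_decode(
    tree: PrefixTree,
    score: Callable[[Sequence[str], str], float],
    max_len: int = 8,
) -> Optional[List[str]]:
    """Greedily pick the best-scoring token among those the trie allows.

    Returns a complete entity, or None if none is finished within max_len steps.
    """
    prefix: List[str] = []
    for _ in range(max_len):
        options = tree.allowed_next(prefix)
        if not options:
            return None
        best = max(options, key=lambda tok: score(prefix, tok))
        if best == END:
            return prefix
        prefix.append(best)
    return None


# Toy stand-in for language-model scores (assumption: context is ignored).
WEIGHTS = {"paris": 1.0, "new": 0.9, "jersey": 0.8, "york": 0.5, "texas": 0.1}


def toy_score(prefix: Sequence[str], token: str) -> float:
    return WEIGHTS.get(token, 0.0)


tree = PrefixTree()
for entity in (["new", "york"], ["new", "jersey"], ["texas"]):
    tree.add(entity)

result = constrained_greedy_decode(tree, toy_score)
print(result)  # ['new', 'jersey']
```

Although the toy scorer prefers "paris", that token is never emitted because it lies on no path in the trie, so every decoded sequence is an entity from the vocabulary. Each decoding step only inspects the children of the current trie node, so it does not scan the whole vocabulary.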


