Entity Set Expansion (ESE) is a promising task that aims to expand a small seed entity set with further entities belonging to the same target semantic class. Because it discovers new knowledge, ESE benefits a variety of NLP and IR applications. Although previous ESE methods have made great progress, most of them still cannot handle hard negative entities (i.e., entities that are difficult to distinguish from the target entities), since whether two entities belong to the same semantic class depends on the granularity level at which they are analyzed. To address this challenge, we devise an entity-level masked language model with contrastive learning to refine the representation of entities. In addition, we propose ProbExpan, a novel probabilistic ESE framework that uses the entity representations obtained by this language model to expand entities. Extensive experiments and detailed analyses on three datasets show that our method outperforms previous state-of-the-art methods. The source code for this paper is available at https://github.com/geekjuruo/ProbExpan.
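To make the task and the hard-negative problem concrete, the following is a minimal sketch of a generic similarity-based expansion baseline. It is not ProbExpan itself. The function names `cosine` and `expand`, the entity names, and the 3-dimensional vectors are all hypothetical and invented for illustration; in practice the vectors would come from a learned entity representation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def expand(seeds, candidates, top_k):
    """Rank candidates by their mean cosine similarity to the seed entities."""
    scored = []
    for name, vec in candidates.items():
        score = sum(cosine(vec, s) for s in seeds.values()) / len(seeds)
        scored.append((score, name))
    scored.sort(reverse=True)  # highest mean similarity first
    return [name for _, name in scored[:top_k]]


# Toy vectors (assumed, not taken from any real model).
seeds = {"China": [0.9, 0.1, 0.0], "Japan": [0.8, 0.2, 0.1]}
candidates = {
    "France": [0.85, 0.15, 0.05],  # target entity: a country
    "Texas": [0.7, 0.4, 0.1],      # hard negative: a place, but not a country
    "banana": [0.0, 0.1, 0.9],     # easy negative: unrelated
}

ranked = expand(seeds, candidates, top_k=3)
print(ranked)
```

In this toy setup the hard negative ranks directly behind the true target and far ahead of the unrelated entity, which is the kind of confusion that a refined entity representation is meant to reduce.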