Entity Set Expansion (ESE) is a promising task that aims to expand a small seed entity set with further entities of the target semantic class that the seeds describe. Various NLP and IR applications can benefit from ESE because of its ability to discover knowledge. Although previous ESE methods have made great progress, most of them still struggle to handle hard negative entities (i.e., entities that are difficult to distinguish from the target entities), because whether two entities belong to the same semantic class depends on the granularity level at which they are analyzed. To address this challenge, we devise an entity-level masked language model with contrastive learning to refine entity representations. In addition, we propose ProbExpan, a novel probabilistic ESE framework that uses the entity representations obtained from this language model to expand entities. Extensive experiments and detailed analyses on three datasets show that our method outperforms previous state-of-the-art methods.
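The abstract does not give the exact training objective or ProbExpan's scoring rule. As a generic illustration only, the sketch below shows how a standard InfoNCE-style contrastive loss penalizes an entity embedding that sits close to a hard negative, and how candidates might be ranked against a seed set by mean cosine similarity and normalized into a distribution with a softmax. The function names `info_nce_loss` and `rank_candidates`, the temperature value, and the toy 2-D vectors are hypothetical assumptions, not the paper's method.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def info_nce_loss(anchor: Sequence[float], positive: Sequence[float],
                  negatives: List[Sequence[float]], temperature: float = 0.1) -> float:
    """Generic InfoNCE loss for one anchor entity (hypothetical illustration).

    positive: an entity of the same semantic class as the anchor.
    negatives: entities of other classes, possibly hard negatives.
    loss = -log( exp(s+/t) / (exp(s+/t) + sum_j exp(s-_j/t)) )
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denom - logits[0]


def rank_candidates(seeds: List[Sequence[float]],
                    candidates: Dict[str, Sequence[float]],
                    temperature: float = 0.1) -> List[Tuple[str, float]]:
    """Score each candidate by its mean cosine similarity to the seeds, then
    softmax-normalize the scores and return (name, probability) pairs, best first."""
    scores = {name: sum(cosine(vec, s) for s in seeds) / len(seeds)
              for name, vec in candidates.items()}
    m = max(scores.values())
    exps = {name: math.exp((s - m) / temperature) for name, s in scores.items()}
    z = sum(exps.values())
    return sorted(((name, e / z) for name, e in exps.items()),
                  key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    anchor, positive = [1.0, 0.0], [0.9, 0.1]
    # A hard negative lies close to the anchor; an easy one is orthogonal to it.
    loss_hard = info_nce_loss(anchor, positive, [[0.7, 0.7]])
    loss_easy = info_nce_loss(anchor, positive, [[0.0, 1.0]])
    print(loss_hard, loss_easy)

    seeds = [[1.0, 0.1], [0.9, 0.2]]
    print(rank_candidates(seeds, {"A": [1.0, 0.15], "B": [0.1, 1.0]}))
```

In this toy setting the loss is larger when the negative is close to the anchor, which is the signal a contrastive objective uses to push hard negatives apart in the embedding space. The softmax over seed similarities is only one simple way to turn scores into probabilities.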