Entity Set Expansion (ESE) is a valuable task that aims to find entities of the target semantic class described by a given set of seed entities. Many downstream NLP and IR applications have benefited from ESE because of its ability to discover knowledge. Although existing bootstrapping methods have made great progress, most of them still rely on manually pre-defined context patterns. A non-negligible shortcoming of pre-defined context patterns is that they do not generalize flexibly to all kinds of semantic classes; we call this phenomenon "semantic sensitivity". To address this problem, we devise a context pattern generation module that uses autoregressive language models (e.g., GPT-2) to automatically generate high-quality context patterns for entities. In addition, we propose GAPA, a novel ESE framework that leverages these GenerAted PAtterns to expand target entities. Extensive experiments and detailed analyses on three widely used datasets demonstrate the effectiveness of our method. All code for our experiments will be made available for reproducibility.
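To make the idea of expanding a seed set with generated patterns concrete, the following is a minimal sketch and not the GAPA implementation. It assumes a pattern generator that maps an entity to context patterns with the entity slot masked as `[E]`; here that generator is a hypothetical lookup table (`TOY_PATTERNS`, `toy_generator`) standing in for an autoregressive language model such as GPT-2. The scoring rule (summing how often a candidate's patterns occur among the seed patterns) is likewise an illustrative assumption.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List

# Hypothetical stand-in for a language model: each entity maps to context
# patterns in which the entity mention is replaced by the slot "[E]".
TOY_PATTERNS: Dict[str, List[str]] = {
    "France": ["[E] is a country in", "the capital of [E] is"],
    "Japan": ["[E] is a country in", "the government of [E]"],
    "Germany": ["[E] is a country in", "the capital of [E] is"],
    "Brazil": ["the government of [E]", "the capital of [E] is"],
    "Python": ["[E] is a programming language", "written in [E]"],
}


def toy_generator(entity: str) -> List[str]:
    """Return the (assumed) generated patterns for an entity."""
    return TOY_PATTERNS.get(entity, [])


def expand(
    seeds: Iterable[str],
    candidates: Iterable[str],
    generate: Callable[[str], List[str]],
    top_k: int = 2,
) -> List[str]:
    """Rank candidates by how strongly their patterns overlap the seeds'."""
    seed_set = set(seeds)
    # Pool the patterns generated for all seeds, counting repeats.
    seed_patterns = Counter(p for s in seed_set for p in generate(s))

    scores: Dict[str, int] = {}
    for cand in candidates:
        if cand in seed_set:
            continue
        # A candidate earns the seed-side count of each pattern it shares.
        scores[cand] = sum(seed_patterns[p] for p in set(generate(cand)))

    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda c: (-scores[c], c))
    return [c for c in ranked[:top_k] if scores[c] > 0]


if __name__ == "__main__":
    print(expand(["France", "Japan"], TOY_PATTERNS, toy_generator))
```

Because the generator is passed in as a callable, the lookup table could be swapped for a real language model without changing the ranking logic; how patterns are actually generated and scored in GAPA is not specified in the abstract.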