Automatic Context Pattern Generation for Entity Set Expansion

From arXiv. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

Entity Set Expansion (ESE) is a valuable task that aims to find entities of the target semantic class described by a set of given seed entities. Various downstream applications in Natural Language Processing (NLP) and Information Retrieval (IR) have benefited from ESE because of its ability to discover knowledge. Although existing corpus-based ESE methods have made great progress, they still rely on corpora annotated with high-quality entity information, because most of them obtain context patterns from the position of an entity in a sentence. The quality of the given corpora and of their entity annotation has therefore become the bottleneck that limits the performance of such methods. To overcome this dilemma and free ESE models from their dependence on entity annotation, our work explores a new ESE paradigm, namely corpus-independent ESE. Specifically, we devise a context pattern generation module that uses autoregressive language models (e.g., GPT-2) to automatically generate high-quality context patterns for entities. In addition, we propose GAPA, a novel ESE framework that leverages the aforementioned GenerAted PAtterns to expand target entities. Extensive experiments and detailed analyses on three widely used datasets demonstrate the effectiveness of our method. All code for our experiments is available at https://github.com/geekjuruo/GAPA.
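As a rough illustration of the general idea only (context patterns obtained from a text generator rather than from an entity-annotated corpus), the toy sketch below masks the entity in generated sentences and ranks candidates by how many patterns they share with the seeds. It is not the GAPA implementation: `toy_generator`, `to_patterns`, `expand`, the `[MASK]` token, the canned sentences, and the exact-match scoring are all assumptions made for this example, and the generator is a stub standing in for an autoregressive language model so that the code runs offline.

```python
from collections import Counter
from typing import Callable, List

MASK = "[MASK]"  # hypothetical placeholder token, chosen for this sketch


def toy_generator(entity: str) -> List[str]:
    """Stand-in for a language-model call (assumption, not the paper's module).

    A real system would prompt an autoregressive model with the entity;
    here canned sentences keep the example deterministic.
    """
    canned = {
        "France": ["France is a country in Europe.",
                   "The capital of France is well known."],
        "Japan": ["Japan is a country in Asia.",
                  "The capital of Japan is well known."],
        "Brazil": ["Brazil is a country in South America.",
                   "The capital of Brazil is well known."],
        "Python": ["Python is a programming language.",
                   "Many people write code in Python."],
    }
    return canned.get(entity, [])


def to_patterns(entity: str, sentences: List[str]) -> List[str]:
    """Turn sentences into context patterns by masking the entity mention."""
    return [s.replace(entity, MASK) for s in sentences if entity in s]


def expand(seeds: List[str], candidates: List[str],
           generate: Callable[[str], List[str]] = toy_generator) -> List[str]:
    """Rank candidates by the number of seed patterns they also produce."""
    seed_patterns: Counter = Counter()
    for seed in seeds:
        seed_patterns.update(to_patterns(seed, generate(seed)))
    scores = {
        cand: sum(seed_patterns[p] for p in to_patterns(cand, generate(cand)))
        for cand in candidates
    }
    return sorted(candidates, key=lambda c: scores[c], reverse=True)


if __name__ == "__main__":
    print(expand(["France", "Japan"], ["Python", "Brazil"]))
```

Exact string matching between patterns is the crudest possible similarity; it is used here only to keep the sketch short, and a practical system would likely compare patterns or entities in a learned representation space instead.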

