Relation Extraction (RE) is a vital step in completing Knowledge Graphs (KGs) by extracting entity relations from texts. However, it usually suffers from the long-tail issue: the training data concentrates on a few relation types, leaving the remaining types without sufficient annotations. In this paper, we propose a general approach that learns relation prototypes from unlabeled texts to facilitate long-tail relation extraction by transferring knowledge from relation types with sufficient training data. We learn relation prototypes as an implicit factor between entities, which reflects the meanings of relations as well as their proximities for transfer learning. Specifically, we construct a co-occurrence graph from texts and capture both first-order and second-order entity proximities for embedding learning. Based on this, we further optimize the distance from entity pairs to their corresponding prototypes, which can be easily adapted to almost arbitrary RE frameworks. Thus, the learning of infrequent or even unseen relation types benefits from semantically proximate relations through pairs of entities and large-scale textual information. We have conducted extensive experiments on two publicly available datasets: New York Times and Google Distant Supervision. Compared with eight state-of-the-art baselines, our proposed model achieves significant improvements (4.1% F1 on average). Further results on long-tail relations demonstrate the effectiveness of the learned relation prototypes. We also conduct an ablation study to investigate the impact of each component, and apply our approach to four basic relation extraction models to verify its generalization ability. Finally, we analyze several example cases to give intuitive impressions as a qualitative analysis. Our code will be released later.
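To make the prototype-distance idea above concrete, the following is a minimal sketch in PyTorch. It is illustrative only, not the paper's implementation: the class name PrototypeDistance, the choice of representing a candidate relation by the offset vector tail - head, and the cross-entropy-over-negative-distances loss are all assumptions; the paper's actual encoder and objective may differ.

```python
# Minimal sketch of a prototype-distance term (hypothetical names throughout;
# the abstract does not specify the exact objective).
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrototypeDistance(nn.Module):
    """Scores entity pairs against a set of learnable relation prototypes.

    A relation is treated as an implicit factor between entities, so a
    candidate relation for the pair (head, tail) is represented here by
    the offset vector (tail - head), which is pulled toward the prototype
    of its gold relation type.
    """

    def __init__(self, num_relations: int, dim: int):
        super().__init__()
        # One learnable prototype vector per relation type (assumption).
        self.prototypes = nn.Parameter(torch.randn(num_relations, dim))

    def forward(self, head: torch.Tensor, tail: torch.Tensor,
                relation: torch.Tensor) -> torch.Tensor:
        # Offset vector for each entity pair: shape (batch, dim).
        pair = tail - head
        # Euclidean distance to every prototype: shape (batch, num_relations).
        dist = torch.cdist(pair, self.prototypes)
        # Cross-entropy over negative distances pulls each pair toward its
        # gold prototype and away from the others (one common choice; a
        # margin-based variant would also fit the description).
        return F.cross_entropy(-dist, relation)
```

In a sketch like this, pulling each pair's offset toward its gold prototype is what lets infrequent relation types borrow statistical strength from semantically proximate prototypes learned on frequent ones.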