KAER: A Knowledge Augmented Pre-Trained Language Model for Entity Resolution

Entity resolution has been an essential and well-studied task in data cleaning research for decades. Existing work has shown that pre-trained language models can perform entity resolution with promising results. However, few works have discussed injecting domain knowledge to improve the performance of pre-trained language models on entity resolution tasks. In this study, we propose Knowledge Augmented Entity Resolution (KAER), a novel framework for augmenting pre-trained language models with external knowledge for entity resolution. We discuss the results of utilizing different knowledge augmentation and prompting methods to improve entity resolution performance. Our model improves on Ditto, the existing state-of-the-art entity resolution method. In particular, 1) KAER performs more robustly and achieves better results on "dirty data"; 2) with more general knowledge injection, KAER outperforms the existing baseline models on the textual dataset and the dataset from the online product domain; and 3) KAER achieves competitive results on highly domain-specific datasets, such as citation datasets, which will require the injection of expert knowledge in future work.
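To make the idea of knowledge injection concrete, the following is a minimal sketch of how a pair of records might be flattened into one text sequence for a pre-trained language model, with an external hint placed in front of selected values. The `[COL]`/`[VAL]` markers follow the serialization style associated with Ditto; the `(hint)` prefix, the function names, and the sample records are assumptions made for illustration and are not taken from the abstract.

```python
from typing import Dict, Optional


def serialize_record(record: Dict[str, str],
                     knowledge: Optional[Dict[str, str]] = None) -> str:
    """Flatten a record into a token sequence, optionally tagging values with hints."""
    hints = knowledge or {}
    parts = []
    for column, value in record.items():
        hint = hints.get(column)
        # Hypothetical injection format: put the external hint before the value.
        tagged = f"({hint}) {value}" if hint else value
        parts.append(f"[COL] {column} [VAL] {tagged}")
    return " ".join(parts)


def serialize_pair(left: Dict[str, str], right: Dict[str, str],
                   knowledge: Optional[Dict[str, str]] = None) -> str:
    """Join two serialized records so a sequence-pair classifier can compare them."""
    return f"{serialize_record(left, knowledge)} [SEP] {serialize_record(right, knowledge)}"


if __name__ == "__main__":
    a = {"title": "iPhone 13 128GB", "price": "799"}
    b = {"title": "Apple iPhone 13 (128 GB)", "price": "799.00"}
    print(serialize_pair(a, b, knowledge={"title": "product"}))
```

The resulting string would then be passed to a binary match/no-match classifier; which hints help, and where they are placed, is exactly the kind of design choice the abstract says the study compares.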


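Entity resolution

Different data providers may describe the same real-world thing, or entity, in different ways (differences in data format, representation, and so on). Each such description is called a reference to that entity. Entity resolution is the process of resolving a set of references and mapping each one to the real-world entity it denotes. Entity resolution is also known as record linkage, object identification, individual identification, and duplicate detection.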
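The definition above, mapping a set of references to entities, can be sketched as a small clustering routine. This is an illustrative baseline only: it uses character-level similarity from Python's standard `difflib` module and a union-find structure, not a pre-trained language model, and the `0.85` threshold, the function names, and the sample references are assumptions.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, List


def normalize(text: str) -> str:
    """Lowercase, drop periods and commas, and collapse whitespace."""
    return " ".join(text.lower().replace(".", " ").replace(",", " ").split())


def similar(a: str, b: str, threshold: float = 0.85) -> bool:
    """Treat two references as the same entity if their normalized forms are close."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold


def resolve(references: List[str], threshold: float = 0.85) -> List[List[str]]:
    """Group references into clusters, one cluster per inferred entity."""
    parent = list(range(len(references)))

    def find(i: int) -> int:
        # Follow parent links to the cluster root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every pair; merge the clusters of any pair judged to match.
    for i, j in combinations(range(len(references)), 2):
        if similar(references[i], references[j], threshold):
            parent[find(i)] = find(j)

    clusters: Dict[int, List[str]] = {}
    for i, ref in enumerate(references):
        clusters.setdefault(find(i), []).append(ref)
    return list(clusters.values())


if __name__ == "__main__":
    refs = ["Apple Inc.", "apple inc", "Microsoft Corporation"]
    print(resolve(refs))
```

Comparing every pair is quadratic in the number of references, so a practical system would typically add a blocking step first; a learned matcher would replace `similar` while the clustering step stays the same.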
