FlexER: Flexible Entity Resolution for Multiple Intents

Entity resolution, a longstanding problem of data cleaning and integration, aims at identifying data records that represent the same real-world entity. Existing approaches treat entity resolution as a universal task: they assume a single interpretation of a real-world entity and focus only on separating corresponding from non-corresponding records with respect to that interpretation. However, in real-world scenarios, where entity resolution is part of a more general data project, downstream applications may interpret real-world entities differently, for example to serve different user needs. In what follows, we introduce the problem of multiple intents entity resolution (MIER), an extension of the universal (single intent) entity resolution task. As a solution, we propose FlexER, which builds on contemporary solutions to universal entity resolution in order to solve multiple intents entity resolution. FlexER treats the task as a multi-label classification problem. It combines intent-based representations of tuple pairs in a multiplex graph representation that serves as input to a graph neural network (GNN). FlexER learns intent representations and improves the outcome of multiple resolution problems. A large-scale empirical evaluation introduces a new benchmark and, also using two well-known benchmarks, shows that FlexER effectively solves the MIER problem and outperforms the state of the art for universal entity resolution.
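To make the multi-label framing concrete, the following is a minimal sketch of how a single record pair can carry one match label per intent. It is not FlexER's model (no learned representations, multiplex graph, or GNN are involved); the records, field names, the two intents `same_product` and `same_brand`, and the helper `label_pairs` are all hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical toy records; fields and values are made up for illustration.
records = [
    {"id": 1, "brand": "Acme", "model": "Phone X"},
    {"id": 2, "brand": "acme", "model": "phone x"},
    {"id": 3, "brand": "Acme", "model": "Phone X case"},
]


def norm(text):
    """Lowercase and drop whitespace so trivially different spellings compare equal."""
    return "".join(text.lower().split())


# Each intent is one interpretation of "same entity" (assumed, rule-based stand-ins
# for what would be learned matchers in a real system).
INTENTS = {
    "same_product": lambda a, b: norm(a["model"]) == norm(b["model"]),
    "same_brand": lambda a, b: norm(a["brand"]) == norm(b["brand"]),
}


def label_pairs(records, intents):
    """Return {(id_a, id_b): [label per intent]} for every unordered record pair."""
    labels = {}
    for a, b in combinations(records, 2):
        labels[(a["id"], b["id"])] = [int(pred(a, b)) for pred in intents.values()]
    return labels


labels = label_pairs(records, INTENTS)
for pair, vector in labels.items():
    print(pair, dict(zip(INTENTS, vector)))
```

In this toy setup, records 1 and 3 match under the brand intent but not under the product intent, which is the kind of disagreement between intents that a single universal match label cannot express.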


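Related topic: entity resolution

Different data providers may describe the same thing, that is, the same entity, in different ways (differing, for example, in data format or representation). Each such description is called a reference to the entity. Entity resolution is the process of resolving a set of references and mapping them to entities in the real world. Entity resolution is also known as record linkage, object identification, individual identification, and duplicate detection.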
