Entity resolution, a longstanding problem in data cleaning and integration, aims to identify data records that represent the same real-world entity. Existing approaches treat entity resolution as a universal task: they assume a single interpretation of a real-world entity and focus only on separating corresponding from non-corresponding records with respect to that interpretation. However, in real-world scenarios, where entity resolution is part of a broader data project, downstream applications may interpret real-world entities differently, for example to serve different user needs. In what follows, we introduce the problem of multiple intents entity resolution (MIER), an extension of the universal (single-intent) entity resolution task. As a solution, we propose FlexER, which builds on contemporary solutions to universal entity resolution in order to solve MIER. FlexER casts the problem as multi-label classification. It combines intent-based representations of tuple pairs in a multiplex graph that serves as input to a graph neural network (GNN). In this way, FlexER learns intent representations and improves the outcomes of multiple resolution problems. A large-scale empirical evaluation, which introduces a new benchmark and also uses two well-known benchmarks, shows that FlexER effectively solves the MIER problem and outperforms the state of the art for universal entity resolution.
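To make the multi-label framing concrete, the following is a minimal sketch and not the FlexER implementation. It assumes three hypothetical intents over toy product records, uses simple attribute-agreement rules as a stand-in for annotated labels, and assumes a multiplex layout with one layer per intent in which the node for a given tuple pair is linked to its counterparts in the other layers. The record values, intent names, and the helpers `label_pair` and `multiplex_edges` are all illustrative assumptions.

```python
from itertools import combinations

# Hypothetical product records (illustrative only, not from any benchmark).
records = {
    "r1": {"brand": "Acme", "model": "X100", "color": "black"},
    "r2": {"brand": "Acme", "model": "X100", "color": "red"},
    "r3": {"brand": "Acme", "model": "Z9", "color": "black"},
}

# Each hypothetical intent lists the attributes that must agree for a match.
# In practice the per-intent labels would come from annotation, not rules.
intents = {
    "same_brand": ("brand",),
    "same_model": ("brand", "model"),
    "same_item": ("brand", "model", "color"),
}


def label_pair(a, b):
    """Return one binary label per intent for a record pair (the multi-label target)."""
    return {
        name: int(all(a[key] == b[key] for key in keys))
        for name, keys in intents.items()
    }


# Candidate tuple pairs and their multi-label targets.
pairs = list(combinations(sorted(records), 2))
labels = {pair: label_pair(records[pair[0]], records[pair[1]]) for pair in pairs}


def multiplex_edges(pairs, intent_names):
    """Assumed inter-layer edges: connect the same pair's nodes across intent layers."""
    return [
        ((pair, i), (pair, j))
        for pair in pairs
        for i, j in combinations(intent_names, 2)
    ]


edges = multiplex_edges(pairs, list(intents))

for pair, target in labels.items():
    print(pair, target)
print(len(edges), "inter-layer edges")
```

The same pair of records can match under one intent and not under another, which is what distinguishes this setting from single-intent resolution. A GNN operating on such a graph could pass information between the intent-specific representations of a pair before predicting each label.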