Schema matching is a core task of any data integration process. Although it has been investigated for many years in the fields of databases, AI, the Semantic Web, and data mining, the main challenge remains the ability to generate high-quality matches among data concepts (e.g., database attributes). In this work, we examine a novel angle on the behavior of humans as matchers, studying match creation as a process. We analyze the dynamics of common evaluation measures (precision, recall, and F-measure) with respect to this angle and highlight the need for unbiased matching to support this analysis. Unbiased matching, a newly defined concept that captures the common assumption that human decisions represent reliable assessments of schemata correspondences, is, however, not an inherent property of human matchers. We therefore design PoWareMatch, which uses a deep learning mechanism to calibrate and filter human matching decisions adhering to the quality of a match; these decisions are then combined with algorithmic matching to generate better match results. We provide empirical evidence, based on an experiment with more than 200 human matchers over common benchmarks, that PoWareMatch predicts well the benefit of extending the match with an additional correspondence and generates high-quality matches. In addition, PoWareMatch outperforms state-of-the-art matching algorithms.
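To make the idea of measure dynamics concrete, the following minimal sketch tracks precision, recall, and F-measure as a match grows one correspondence at a time. It uses the standard set-based definitions of the three measures. The function names (`prf`, `measure_trajectory`), the attribute names, and the toy reference match are hypothetical and chosen only for illustration; they are not taken from PoWareMatch or from the experiment described above.

```python
from typing import Iterable, List, Set, Tuple

# A correspondence pairs a source attribute with a target attribute.
Correspondence = Tuple[str, str]
Scores = Tuple[float, float, float]  # (precision, recall, F-measure)


def prf(match: Set[Correspondence], reference: Set[Correspondence]) -> Scores:
    """Standard precision, recall, and F1 of a match against a reference match."""
    true_pos = len(match & reference)
    precision = true_pos / len(match) if match else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def measure_trajectory(
    decisions: Iterable[Correspondence], reference: Set[Correspondence]
) -> List[Scores]:
    """Scores after each decision, treating match creation as a sequence."""
    match: Set[Correspondence] = set()
    history: List[Scores] = []
    for correspondence in decisions:
        match.add(correspondence)
        history.append(prf(match, reference))
    return history


# Toy data (hypothetical): a reference match and an ordered list of decisions.
reference = {("poDay", "orderDate"), ("poTime", "orderDate"), ("city", "shipCity")}
decisions = [("poDay", "orderDate"), ("poCode", "shipCity"), ("city", "shipCity")]

history = measure_trajectory(decisions, reference)
for step, (p, r, f) in enumerate(history, start=1):
    print(f"step {step}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

In this toy run the second decision is incorrect, so precision and F-measure drop while recall stays flat; recall can never decrease as correspondences are added. This is the kind of behavior that motivates asking whether extending a match with one more correspondence is beneficial.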