Matching is a task at the heart of any data integration process, aimed at identifying correspondences among data elements. Matching problems have traditionally been solved semi-automatically: matching algorithms generate correspondences, and human experts subsequently validate the outcomes. Human-in-the-loop data integration has recently been challenged by the introduction of big data, and recent studies have analyzed obstacles to effective human matching and validation. In this work we characterize human matching experts, those humans whose proposed correspondences can mostly be trusted to be valid. We provide a novel framework for characterizing matching experts that, accompanied by a novel set of features, can be used to identify reliable and valuable human experts. We demonstrate the usefulness of our approach with an extensive empirical evaluation. In particular, we show that our approach can improve matching results by filtering out inexpert matchers.