This work offers a novel view on the use of human input as labels, acknowledging that humans may err. We build a behavioral profile for each human annotator, which is used as a feature representation of the provided input. We show that, by utilizing black-box machine learning, we can take human behavior into account and calibrate annotators' input to improve labeling quality. To support our claims and provide a proof of concept, we experiment with three different matching tasks, namely schema matching, entity matching, and text matching. Our empirical evaluation suggests that the method can improve the quality of gathered labels in multiple settings, including a cross-domain setting (across different matching tasks).