Real-world data for classification is often labeled by multiple annotators. To analyze such data, we introduce CROWDLAB, a straightforward approach that utilizes any trained classifier to estimate: (1) a consensus label for each example that aggregates the available annotations; (2) a confidence score for how likely each consensus label is correct; (3) a rating for each annotator quantifying the overall correctness of their labels. Existing algorithms that estimate related quantities in crowdsourcing often rely on sophisticated generative models with iterative inference; CROWDLAB instead uses a simple weighted ensemble. These algorithms also tend to rely solely on annotator statistics, ignoring the features of the examples from which the annotations derive. CROWDLAB utilizes any classifier model trained on these features, and can thus generalize better between examples with similar features. On real-world multi-annotator image data, our proposed method provides better estimates of (1)-(3) than existing algorithms such as Dawid-Skene and GLAD.
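To make the weighted-ensemble idea concrete, below is a minimal NumPy sketch of how a classifier's predicted class probabilities could be combined with annotator labels to produce the three estimates described above. This is an illustrative toy, not the paper's exact CROWDLAB estimator: the function name, the fixed `classifier_weight` parameter, and the simple agreement-based annotator rating are all assumptions for exposition (the actual method derives per-annotator and per-example weights from estimated annotator quality).

```python
import numpy as np

def weighted_ensemble_consensus(annotations, pred_probs, classifier_weight=0.5):
    """Toy weighted-ensemble consensus (illustrative sketch, not the exact CROWDLAB algorithm).

    annotations : (n_examples, n_annotators) int array; -1 where an annotator
                  did not label the example.
    pred_probs  : (n_examples, n_classes) array of the trained classifier's
                  predicted class probabilities for each example.
    classifier_weight : hypothetical scalar trading off the classifier against
                  the annotators; the real method estimates such weights from data.
    """
    n_examples, n_classes = pred_probs.shape
    consensus = np.empty(n_examples, dtype=int)
    confidence = np.empty(n_examples)

    for i in range(n_examples):
        # Empirical label distribution from the annotations available for this example.
        given = annotations[i][annotations[i] >= 0]
        counts = np.bincount(given, minlength=n_classes).astype(float)
        annotator_probs = (counts / counts.sum()) if counts.sum() > 0 else np.full(n_classes, 1.0 / n_classes)

        # Weighted ensemble of classifier and annotator label distributions.
        ensemble = classifier_weight * pred_probs[i] + (1 - classifier_weight) * annotator_probs
        consensus[i] = ensemble.argmax()   # (1) consensus label
        confidence[i] = ensemble.max()     # (2) confidence that it is correct

    # (3) Crude annotator rating: fraction of an annotator's labels that agree with the consensus.
    n_annotators = annotations.shape[1]
    ratings = np.array([
        (annotations[:, j] == consensus)[annotations[:, j] >= 0].mean()
        if (annotations[:, j] >= 0).any() else np.nan
        for j in range(n_annotators)
    ])
    return consensus, confidence, ratings
```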