An interesting development in automatic visual recognition has been the emergence of tasks where it is not possible to assign objective labels to images, yet it remains feasible to collect annotations that reflect human judgements about them. Machine learning-based predictors for these tasks rely on supervised training that models the behavior of the annotators, i.e., what would the average person's judgement of an image be? A key open question for this type of work, especially for applications where inconsistency with human behavior can lead to ethical lapses, is how to evaluate the epistemic uncertainty of trained predictors, i.e., the uncertainty that comes from the predictor's model. We propose a Bayesian framework for evaluating black-box predictors in this regime that is agnostic to the predictor's internal structure. The framework specifies how to estimate the epistemic uncertainty of the predictor with respect to human labels by approximating a conditional distribution and producing a credible interval for the predictions and their measures of performance. The framework is successfully applied to four image classification tasks that use subjective human judgements: facial beauty assessment, social attribute assignment, apparent age estimation, and ambiguous scene labeling.
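The general idea of producing a credible interval for a performance measure against human labels can be illustrated with a minimal sketch. The sketch below is not the paper's actual framework: it assumes a simple Beta-Binomial model over each image's human agreement probability, simulates annotator votes, and scores a fixed black-box predictor's outputs against labelings sampled from the posterior. All data and names (`agree_counts`, `preds`, the Beta(1, 1) prior) are hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical data: for each image, counts of annotators (out of
# n_annotators) who assigned the positive label, plus the outputs of
# some black-box predictor. Both are simulated here for illustration.
rng = np.random.default_rng(0)
n_images, n_annotators = 200, 10
agree_counts = rng.binomial(n_annotators, 0.7, size=n_images)  # votes per image
preds = rng.integers(0, 2, size=n_images)                      # predictor outputs

# Assumed Beta(1, 1) prior on each image's probability that a random
# human assigns the positive label; posterior is Beta(1 + k, 1 + n - k).
alpha = 1.0 + agree_counts
beta = 1.0 + n_annotators - agree_counts

# Monte Carlo approximation of the conditional distribution: sample
# plausible per-image label probabilities, sample labelings from them,
# and score the fixed predictor against each sampled labeling.
n_draws = 5000
p = rng.beta(alpha, beta, size=(n_draws, n_images))
sampled_labels = rng.binomial(1, p)
accuracies = (sampled_labels == preds).mean(axis=1)

# 95% credible interval for the predictor's agreement with human labels.
lo, hi = np.quantile(accuracies, [0.025, 0.975])
print(f"95% credible interval for accuracy vs. humans: [{lo:.3f}, {hi:.3f}]")
```

A wide interval here signals that, given the observed annotator disagreement, the predictor's measured performance is itself uncertain, which is the kind of conclusion the framework is designed to support.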