Machine learning (ML) is increasingly being used to support high-stakes decisions, a trend owed in part to its promise of superior predictive power relative to human assessment. However, there is frequently a gap between decision objectives and what is captured in the observed outcomes used as labels to train ML models. As a result, machine learning models may fail to capture important dimensions of decision criteria, hampering their utility for decision support. In this work, we explore the use of historical expert decisions as a rich -- yet imperfect -- source of information that is commonly available in organizational information systems, and show that it can be leveraged to bridge the gap between decision objectives and algorithm objectives. We consider the problem of estimating expert consistency indirectly when each case in the data is assessed by a single expert, and propose influence function-based methodology as a solution to this problem. We then incorporate the estimated expert consistency into a predictive model through a training-time label amalgamation approach. This approach allows ML models to learn from experts when there is inferred expert consistency, and from observed labels otherwise. We also propose alternative ways of leveraging inferred consistency via hybrid and deferral models. In our empirical evaluation, focused on the context of child maltreatment hotline screenings, we show that (1) there are high-risk cases whose risk is considered by the experts but not wholly captured in the target labels used to train a deployed model, and (2) the proposed approach significantly improves precision for these cases.
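As an illustration of the training-time label amalgamation idea described above, the following is a minimal sketch and not the authors' implementation. It assumes that a per-case expert consistency score in [0, 1] has already been estimated (the influence function-based estimation is not shown), and that a fixed cutoff decides when to trust the expert decision. The function name `amalgamate_labels`, the `threshold` parameter, and its default value are hypothetical choices made for this example.

```python
import numpy as np


def amalgamate_labels(observed, expert, consistency, threshold=0.9):
    """Build training labels from observed outcomes and expert decisions.

    Hypothetical helper: where the (assumed, precomputed) consistency score
    meets the threshold, the expert decision is used as the label; elsewhere
    the observed outcome is kept.
    """
    observed = np.asarray(observed)
    expert = np.asarray(expert)
    consistency = np.asarray(consistency, dtype=float)
    if not (observed.shape == expert.shape == consistency.shape):
        raise ValueError("observed, expert and consistency must share a shape")

    # Boolean mask of cases where experts are inferred to be consistent.
    use_expert = consistency >= threshold
    # Expert label where consistent, observed label otherwise.
    labels = np.where(use_expert, expert, observed)
    return labels, use_expert


# Toy data, invented for illustration only.
observed = [0, 0, 1, 1]
expert = [1, 0, 0, 1]
consistency = [0.95, 0.20, 0.97, 0.50]
labels, use_expert = amalgamate_labels(observed, expert, consistency)
```

The amalgamated `labels` could then be passed to any standard classifier in place of the observed outcomes. A hard threshold is only one possible design; a soft weighting of the two label sources would be an alternative under the same assumptions.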