A growing literature on human-AI decision-making investigates strategies for combining human judgment with statistical models to improve decisions. Research in this area often evaluates proposed improvements to models, interfaces, or workflows by demonstrating improved predictive performance on "ground truth" labels. However, this practice overlooks a key difference between human judgments and model predictions. Whereas humans reason about the broader phenomena of interest in a decision - including latent constructs that are not directly observable, such as disease status, the "toxicity" of online comments, or future "job performance" - predictive models target proxy labels that are readily available in existing datasets. This reliance on simplistic proxies makes predictive models vulnerable to various sources of statistical bias. In this paper, we identify five sources of target variable bias that can impact the validity of proxy labels in human-AI decision-making tasks. We develop a causal framework to disentangle the relationships among these biases and to clarify which of them are of concern in specific human-AI decision-making tasks. We demonstrate how our framework can be used to articulate implicit assumptions made in prior modeling work, and we recommend evaluation strategies for verifying whether these assumptions hold in practice. We then leverage our framework to re-examine the designs of prior human subjects experiments that investigate human-AI decision-making, finding that only a small fraction of studies examine factors related to target variable bias. We conclude by discussing opportunities to better address target variable bias in future research.