Recent work has shown the potential benefit of selective prediction systems that learn to defer to a human when the predictions of the AI are unreliable, particularly for improving the reliability of AI systems in high-stakes applications such as healthcare or conservation. However, most prior work assumes that humans behave the same way when they solve a prediction task as part of a human-AI team as when they solve it alone. We show that this is not the case by performing experiments that quantify human-AI interaction in the context of selective prediction. In particular, we study the impact of communicating different types of information to humans about the AI system's decision to defer. Using real-world conservation data and a selective prediction system that improves expected accuracy over that of the human or the AI system working individually, we show that this messaging has a significant impact on the accuracy of human judgements. We study two components of the messaging strategy: (1) whether humans are informed about the prediction of the AI system, and (2) whether they are informed about the decision of the selective prediction system to defer. By manipulating these components, we show that it is possible to significantly boost human performance by informing the human of the decision to defer without revealing the prediction of the AI. We therefore show that it is vital to consider how the decision to defer is communicated to a human when designing selective prediction systems, and that the composite accuracy of a human-AI team must be carefully evaluated using a human-in-the-loop framework.
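To make the notion of composite team accuracy concrete, the following is a minimal sketch rather than the system used in the experiments. It assumes, purely for illustration, that the AI defers whenever a scalar confidence score falls below a threshold; the abstract does not specify the deferral rule, and the names `Example`, `defers`, and `team_accuracy` are hypothetical. The human answers are taken as given, so for the result to be meaningful they would have to be collected under the same messaging condition that the deployed team uses, which is the point the abstract makes.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Example:
    label: int       # ground-truth class
    ai_pred: int     # class predicted by the AI
    ai_conf: float   # AI confidence in [0, 1] (assumed deferral score)
    human_pred: int  # human answer recorded for this example


def defers(example: Example, threshold: float) -> bool:
    """Assumed deferral rule: hand the example to the human when confidence is low."""
    return example.ai_conf < threshold


def team_accuracy(examples: Sequence[Example], threshold: float) -> float:
    """Fraction of examples the human-AI team labels correctly.

    The human's answer is used on deferred examples, the AI's otherwise.
    """
    if not examples:
        raise ValueError("need at least one example")
    correct = 0
    for ex in examples:
        final = ex.human_pred if defers(ex, threshold) else ex.ai_pred
        correct += int(final == ex.label)
    return correct / len(examples)
```

With confidences in [0, 1], a threshold of 0.0 never defers and so recovers the accuracy of the AI alone, while any threshold above 1.0 always defers and recovers the accuracy of the recorded human answers.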