In many practical applications of AI, an AI model is used as a decision aid for human users. The AI provides advice that a human (sometimes) incorporates into their decision-making process. The AI advice is often presented with some measure of "confidence" that the human can use to calibrate how much they depend on or trust the advice. In this paper, we present an initial exploration suggesting that presenting the AI as more confident than it actually is, even when the original AI is well-calibrated, can improve human-AI performance (measured as the accuracy and confidence of the human's final prediction after seeing the AI advice). We first train a model to predict how humans incorporate AI advice, using data from thousands of human-AI interactions. This lets us explicitly estimate how to transform the AI's prediction confidence, making the AI uncalibrated, in order to improve the final human prediction. We empirically validate our results across four different tasks, involving images, text, and tabular data, with hundreds of human participants. We further support our findings with simulation analysis. Our findings suggest the importance of jointly optimizing the human-AI system, as opposed to the standard paradigm of optimizing the AI model alone.
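As a concrete illustration of the kind of confidence transformation the abstract describes, the sketch below inflates a calibrated binary-classification probability by scaling its logit. This is a minimal sketch under assumed simplifications: the paper learns its transformation from human-AI interaction data, whereas here the inflation strength `alpha` and the logit-scaling form are purely illustrative assumptions.

```python
import math

def inflate_confidence(p: float, alpha: float = 2.0) -> float:
    """Push a calibrated probability p away from 0.5 by scaling its logit.

    alpha > 1 makes the reported confidence more extreme (the AI appears
    more confident than it is); alpha = 1 leaves p unchanged.
    NOTE: both alpha and this functional form are illustrative assumptions,
    not the transformation learned in the paper.
    """
    logit = math.log(p / (1.0 - p))
    return 1.0 / (1.0 + math.exp(-alpha * logit))
```

For example, a calibrated confidence of 0.7 maps to roughly 0.84 with `alpha = 2.0`, while 0.5 stays at 0.5, so the transformation only exaggerates confidences the model already leans toward.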