Human-AI collaboration for decision-making strives to achieve team performance that exceeds the performance of humans or AI alone. However, many factors can affect the success of Human-AI teams, including a user's domain expertise, mental models of the AI system, and trust in its recommendations. This work examines users' interactions with three simulated algorithmic models, all with similar accuracy but tuned differently in their true positive and true negative rates. Our study examined user performance in a non-trivial blood vessel labeling task in which participants indicated whether a given blood vessel was flowing or stalled. Our results show that while recommendations from an AI-Assistant can aid user decision-making, factors such as users' baseline performance relative to the AI and complementary tuning of AI error types significantly impact overall team performance. Novice users improved, but not to the accuracy level of the AI. Highly proficient users were generally able to discern when they should follow the AI recommendation and typically maintained or improved their performance. Mid-performers, whose accuracy was similar to the AI's, varied the most in whether the AI recommendations helped or hurt their performance. In addition, we found that users' perception of the AI's performance relative to their own also had a significant impact on whether their accuracy improved when given AI recommendations. This work offers insights into the complexity of factors involved in Human-AI collaboration and provides recommendations on how to develop human-centered AI algorithms that complement users in decision-making tasks.