A growing literature studies how humans incorporate advice from algorithms. This study examines an algorithm with millions of daily users: ChatGPT. We conduct a lab experiment in which 118 student participants answer 2,828 multiple-choice questions across 25 academic subjects. We present participants with answers from a GPT model and allow them to update their initial responses. We find that the advisor's identity ("AI chatbot" versus a human "expert"), the presence of a written justification, and the correctness of the advice do not significantly affect the weight placed on advice. Instead, we show that participants weigh advice more heavily if they (1) are unfamiliar with the topic, (2) have used ChatGPT in the past, or (3) received more accurate advice previously. These three effects (task difficulty, algorithm familiarity, and experience, respectively) appear to be stronger when the advisor is an AI chatbot. Moreover, we find that participants are able to place greater weight on correct advice only when written justifications are provided. In a parallel analysis, we find that the student participants are miscalibrated and significantly underestimate the accuracy of ChatGPT on 10 of 25 topics. Students under-weigh advice by over 50% and would have scored better had they trusted ChatGPT more.