Why do biased predictions arise? What interventions can prevent them? We evaluate 8.2 million algorithmic predictions of math performance from $\approx$400 AI engineers, each of whom developed an algorithm under a randomly assigned experimental condition. Our treatment arms modified programmers' incentives, training data, awareness, and/or technical knowledge of AI ethics. We then assess out-of-sample predictions from their algorithms using randomized audit manipulations of algorithm inputs and ground-truth math performance for 20K subjects. We find that biased predictions are mostly caused by biased training data. However, one-third of the benefit of better training data comes through a novel economic mechanism: engineers exert greater effort and are more responsive to incentives when given better training data. We also assess how performance varies with programmers' demographic characteristics and with their scores on a psychological test of implicit bias (IAT) concerning gender and careers. We find no evidence that female, minority, and low-IAT engineers exhibit lower bias or discrimination in their code. However, we do find that prediction errors are correlated within demographic groups, which creates performance improvements through cross-demographic averaging. Finally, we quantify the benefits and tradeoffs of practical managerial or policy interventions, such as technical advice, simple reminders, and improved incentives, for decreasing algorithmic bias.
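The gain from cross-demographic averaging follows standard ensemble-variance arithmetic; the following is a minimal sketch in our own notation ($\sigma^2$, $\bar{\rho}$, $\rho_w$, $\rho_b$ are illustrative symbols, not the paper's). For an equal-weight ensemble of $n$ predictions with common error variance $\sigma^2$ and average pairwise error correlation $\bar{\rho}$,
$$\mathrm{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} e_i\right) = \frac{\sigma^2}{n}\bigl(1 + (n-1)\bar{\rho}\bigr),$$
so if within-group error correlation $\rho_w$ exceeds cross-group correlation $\rho_b$, an ensemble that mixes demographic groups has a lower $\bar{\rho}$ and therefore a strictly smaller error variance than a same-group ensemble of equal size.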