In AI-assisted decision-making, effective hybrid (human-AI) teamwork does not depend solely on AI performance, but also on the AI's impact on human decision-making. While prior work has studied the effects of model accuracy on humans, we investigate here the complex dynamics of how both a model's predictive performance and its bias may transfer to humans in a recommendation-aided decision task. We consider the domain of ML-assisted hiring, where humans -- operating in a constrained selection setting -- can choose whether to use a trained model's inferences to help select candidates from written biographies. We conduct a large-scale user study leveraging a re-created dataset of real bios from prior work, in which humans predict the ground-truth occupation of given candidates with and without the help of three different NLP classifiers (random, bag-of-words, and deep neural network). Our results demonstrate that while high-performance models significantly improve human performance in a hybrid setting, some models mitigate hybrid bias while others accentuate it. We examine these findings through the lens of decision conformity and observe that model architecture choices affect human-AI conformity and bias, motivating the explicit need to assess these complex dynamics prior to deployment.