With machine learning models being increasingly used to aid decision making even in high-stakes domains, there has been a growing interest in developing interpretable models. Although many supposedly interpretable models have been proposed, there have been relatively few experimental studies investigating whether these models achieve their intended effects, such as making people more closely follow a model's predictions when it is beneficial for them to do so or enabling them to detect when a model has made a mistake. We present a sequence of pre-registered experiments (N = 3,800) in which we showed participants functionally identical models that varied only in two factors commonly thought to make machine learning models more or less interpretable: the number of features and the transparency of the model (i.e., whether the model internals are clear or black box). Predictably, participants who saw a clear model with few features could better simulate the model's predictions. However, we did not find that participants more closely followed its predictions. Furthermore, showing participants a clear model meant that they were less able to detect and correct for the model's sizable mistakes, seemingly due to information overload. These counterintuitive findings emphasize the importance of testing over intuition when developing interpretable models.