The field of explainable artificial intelligence (XAI) has brought about an arsenal of methods to render Machine Learning (ML) predictions more interpretable. But how useful the explanations provided by transparent ML methods are for humans remains difficult to assess. Here we investigate the quality of interpretable computer vision algorithms using techniques from psychophysics. In crowdsourced annotation tasks we study the impact of different interpretability approaches on annotation accuracy and task time. We compare these quality metrics with classical, automated XAI quality metrics. Our results demonstrate that psychophysical experiments allow for robust quality assessment of transparency in machine learning. Interestingly, the quality metrics computed without humans in the loop neither provided a consistent ranking of interpretability methods nor were representative of how useful an explanation was for humans. These findings highlight the potential of methods from classical psychophysics for modern machine learning applications. We hope that our results provide convincing arguments for evaluating interpretability in its natural habitat, human-ML interaction, if the goal is to obtain an authentic assessment of interpretability.