Evaluating methods of explainable artificial intelligence (XAI) is challenging because the fidelity of an explanation to the AI model does not necessarily go hand in hand with its interpretability for humans. For instance, when classifying images with Convolutional Neural Networks (CNNs), XAI algorithms can explain which image areas had an impact on the CNN's decision. However, it is unclear whether the areas that best reflect the CNN's internal data processing will also make the most sense to humans. Thus, the present study investigated whether image classification by humans and a CNN is supported by the same explanations. To assess such differences in interpretability, human participants and a CNN classified image segments that were considered most informative either by other humans (as revealed by eye movements and manual selection) or by two XAI methods (Grad-CAM and XRAI). In three experiments, humans classified and rated these segments, and a CNN classified them as well. The results indicated that the respective interpretability of the two XAI methods strongly depended on image type, both for humans and the CNN. Moreover, human classification performance was highest with human-generated segments, regardless of how they were generated (i.e., from eye movements or manual selection), whereas the type of human segment had a major impact on CNN classification performance. Our results caution against general statements about the interpretability of explanations, as this interpretability varies with the explanation method, the explanations to be interpreted, and the agent who needs to perform the interpretation.
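As a rough illustration of the kind of attribution map referred to above, the following is a minimal Grad-CAM sketch in PyTorch. It is not the study's actual pipeline; the pretrained ResNet-18, the choice of target layer, and the random stand-in input are assumptions for demonstration only.

```python
# Minimal Grad-CAM sketch (illustrative, not the authors' code): compute a
# class-activation heatmap for a pretrained ResNet-18, highlighting image
# regions that most influence the predicted class.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
target_layer = model.layer4[-1]  # last convolutional block (assumed choice)

activations, gradients = {}, {}

def forward_hook(module, inp, out):
    activations["value"] = out.detach()

def backward_hook(module, grad_in, grad_out):
    gradients["value"] = grad_out[0].detach()

target_layer.register_forward_hook(forward_hook)
target_layer.register_full_backward_hook(backward_hook)

# Dummy tensor standing in for a preprocessed image (batch of 1, 3 x 224 x 224).
x = torch.randn(1, 3, 224, 224)
logits = model(x)
class_idx = logits.argmax(dim=1).item()
logits[0, class_idx].backward()

# Grad-CAM: weight each feature map by its spatially averaged gradient,
# sum over channels, apply ReLU, then upsample to the input resolution.
weights = gradients["value"].mean(dim=(2, 3), keepdim=True)               # (1, C, 1, 1)
cam = F.relu((weights * activations["value"]).sum(dim=1, keepdim=True))   # (1, 1, h, w)
cam = F.interpolate(cam, size=x.shape[-2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)                  # normalize to [0, 1]
print(cam.shape)  # heatmap over the input image
```

Thresholding such a heatmap yields image segments of the kind compared in the study, analogous to the regions selected via eye movements or manual annotation.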