A precise understanding of why units in an artificial network respond to certain stimuli would constitute a big step towards explainable artificial intelligence. One widely used approach towards this goal is to visualize unit responses via activation maximization. These synthetic feature visualizations are purported to provide humans with precise information about the image features that cause a unit to be activated, an advantage over alternatives such as strongly activating natural dataset samples. If humans indeed gain causal insight from visualizations, this should enable them to predict the effect of an intervention, such as how occluding a certain patch of the image (say, a dog's head) changes a unit's activation. Here, we test this hypothesis by asking humans to decide which of two square occlusions causes a larger change to a unit's activation. Both a large-scale crowdsourced experiment and measurements with experts show that, on average, the extremely activating feature visualizations by Olah et al. (2017) indeed help humans on this task ($68 \pm 4$% accuracy; baseline performance without any visualizations is $60 \pm 3$%). However, they do not provide any substantial advantage over other visualizations (such as dataset samples), which yield similar performance ($66 \pm 3$% to $67 \pm 3$% accuracy). Taken together, we propose an objective psychophysical task to quantify the benefit of unit-level interpretability methods for humans, and find no evidence that a widely used feature visualization method provides humans with better "causal understanding" of unit activations than simple alternative visualizations.
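The occlusion task described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names (`occlude`, `activation_change`, `larger_effect`), the zero fill value, the use of the absolute activation difference as the "change", and the toy unit standing in for a real network unit are all assumptions made for illustration.

```python
import numpy as np

def occlude(image, top, left, size, fill=0.0):
    """Return a copy of `image` with a square patch replaced by `fill` (assumed fill value)."""
    out = image.copy()
    out[top:top + size, left:left + size] = fill
    return out

def activation_change(unit, image, top, left, size):
    """Absolute change in the unit's activation caused by the occlusion (assumed metric)."""
    return abs(unit(image) - unit(occlude(image, top, left, size)))

def larger_effect(unit, image, patch_a, patch_b, size):
    """Return 'a' or 'b' for the occlusion that changes the activation more."""
    change_a = activation_change(unit, image, *patch_a, size)
    change_b = activation_change(unit, image, *patch_b, size)
    return "a" if change_a >= change_b else "b"

# Hypothetical stand-in for a network unit: mean brightness of the top-left 4x4 region.
def toy_unit(image):
    return float(image[:4, :4].mean())

image = np.ones((8, 8))
# Patch (0, 0) overlaps the region the toy unit reads; patch (4, 4) does not.
answer = larger_effect(toy_unit, image, (0, 0), (4, 4), size=3)
```

In the experiment, human participants make this choice from visualizations alone; a computation along these lines would only supply the ground truth against which their answers are scored.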