Exemplary Natural Images Explain CNN Activations Better than State-of-the-Art Feature Visualization

Feature visualizations such as synthetic maximally activating images are a widely used explanation method to better understand the information processing of convolutional neural networks (CNNs). At the same time, there are concerns that these visualizations might not accurately represent CNNs' inner workings. Here, we measure how much extremely activating images help humans to predict CNN activations. Using a well-controlled psychophysical paradigm, we compare the informativeness of synthetic images by Olah et al. (2017) with a simple baseline visualization, namely exemplary natural images that also strongly activate a specific feature map. Given either synthetic or natural reference images, human participants choose which of two query images leads to strong positive activation. The experiments are designed to maximize participants' performance, and are the first to probe intermediate instead of final layer representations. We find that synthetic images indeed provide helpful information about feature map activations ($82\pm4\%$ accuracy; chance would be $50\%$). However, natural images - originally intended as a baseline - outperform synthetic images by a wide margin ($92\pm2\%$). Additionally, participants are faster and more confident for natural images, whereas subjective impressions about the interpretability of the feature visualizations are mixed. The higher informativeness of natural images holds across most layers, for both expert and lay participants as well as for hand- and randomly-picked feature visualizations. Even if only a single reference image is given, synthetic images provide less information than natural images ($65\pm5\%$ vs. $73\pm4\%$). In summary, synthetic images from a popular feature visualization method are significantly less informative for assessing CNN activations than natural images. We argue that visualization methods should improve over this baseline.
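The forced-choice task described above can be scored in a few lines. The following is a minimal sketch, not the authors' analysis code: the trial format, the function name `score_trials`, and the toy activation values are all hypothetical, and the error term is a plain binomial standard error, which may differ from the uncertainty measure reported in the abstract.

```python
from math import sqrt


def score_trials(trials):
    """Return (accuracy, standard_error) for two-alternative forced-choice trials.

    Each trial is a tuple (activation_a, activation_b, choice), where the two
    activations are the feature map's responses to query images "a" and "b",
    and choice is the participant's answer ("a" or "b").
    This trial format is an assumption made for illustration.
    """
    if not trials:
        raise ValueError("need at least one trial")

    correct = 0
    for activation_a, activation_b, choice in trials:
        # The target is the query image with the stronger activation.
        target = "a" if activation_a > activation_b else "b"
        correct += choice == target

    n = len(trials)
    accuracy = correct / n
    # Binomial standard error of a proportion (an assumed error measure).
    standard_error = sqrt(accuracy * (1 - accuracy) / n)
    return accuracy, standard_error


# Toy data, invented for this example only.
trials = [
    (3.2, -0.5, "a"),  # correct
    (0.1, 2.7, "b"),   # correct
    (1.9, 0.4, "b"),   # incorrect
    (-1.0, 4.1, "b"),  # correct
]
accuracy, standard_error = score_trials(trials)
print(f"accuracy = {accuracy:.2f} +/- {standard_error:.2f} (chance = 0.50)")
```

Because each trial offers exactly two query images, a participant who guesses at random is expected to score 50%, which is the chance level the reported accuracies are compared against.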
