Neural networks have greatly boosted performance in computer vision by learning powerful representations of input data. The drawback of end-to-end training for maximal overall performance is black-box models whose hidden representations lack interpretability: because distributed coding across latent layers is optimal for robustness, attributing meaning to parts of a hidden feature vector or to individual neurons is hindered. We formulate interpretation as a translation of hidden representations onto semantic concepts that are comprehensible to the user. The mapping between the two domains has to be bijective so that semantic modifications in the target domain correctly alter the original representation. The proposed invertible interpretation network can be transparently applied on top of existing architectures without modifying or retraining them. Consequently, we translate an original representation to an equivalent yet interpretable one, and back, without affecting the expressiveness or performance of the original. The invertible interpretation network disentangles the hidden representation into separate, semantically meaningful concepts. Moreover, we present an efficient approach to defining semantic concepts by sketching only two images, as well as an unsupervised strategy. Experimental evaluation demonstrates wide applicability to the interpretation of existing classification and image generation networks, as well as to semantically guided image manipulation.
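The bijectivity requirement above can be illustrated with a minimal sketch of an invertible mapping. The affine coupling transform below is a standard building block for invertible networks and is used here only as a hypothetical stand-in for the paper's interpretation network; the parameter values and dimensionality are illustrative, not the authors' implementation. The key property it demonstrates is exact invertibility: a hidden code can be translated into a second domain, edited there, and mapped back with no reconstruction loss.

```python
import math

# Hypothetical 2-D affine coupling transform: a generic invertible building
# block, sketched here to illustrate the bijective translation between a
# hidden representation (z1, z2) and a second, "interpretable" domain.
# The weights w_s, w_t are arbitrary illustrative constants.

def forward(z1, z2, w_s=0.5, w_t=1.0):
    """Translate (z1, z2) into the target domain; z1 passes through."""
    s = math.tanh(w_s * z1)        # log-scale, bounded for stability
    t = w_t * z1                   # translation term
    return z1, z2 * math.exp(s) + t

def inverse(y1, y2, w_s=0.5, w_t=1.0):
    """Map a (possibly edited) target-domain code back exactly."""
    s = math.tanh(w_s * y1)
    t = w_t * y1
    return y1, (y2 - t) * math.exp(-s)

# Round trip: forward then inverse recovers the original representation.
z1, z2 = 0.3, -1.2
y1, y2 = forward(z1, z2)
r1, r2 = inverse(y1, y2)
assert abs(r1 - z1) < 1e-12 and abs(r2 - z2) < 1e-12
```

Because the transform is bijective by construction, any modification applied to `(y1, y2)` in the target domain corresponds to a unique, well-defined change of the original representation, which is exactly the property the abstract requires for semantically guided manipulation.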