This paper reviews recent studies on understanding neural-network representations and on learning neural networks whose middle-layer representations are interpretable or disentangled. Although deep neural networks have exhibited superior performance in various tasks, interpretability has always been their Achilles' heel: at present, deep neural networks obtain high discrimination power at the cost of low interpretability of their black-box representations. We believe that high model interpretability may help people break several bottlenecks of deep learning, e.g., learning from very few annotations, learning via human-computer communication at the semantic level, and semantically debugging network representations. We focus on convolutional neural networks (CNNs) and revisit the visualization of CNN representations, methods for diagnosing the representations of pre-trained CNNs, approaches for disentangling pre-trained CNN representations, the learning of CNNs with disentangled representations, and middle-to-end learning based on model interpretability. Finally, we discuss prospective trends in explainable artificial intelligence.
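As a concrete illustration of the first topic above, the sketch below shows one common way to visualize CNN representations: a vanilla gradient (saliency) map that highlights which input pixels most influence the predicted class. This is a minimal sketch rather than code from any of the surveyed works; the model choice (torchvision ResNet-18), the preprocessing constants, and the input file name are assumptions made only for this example.

```python
# Minimal gradient-saliency sketch for visualizing what a pre-trained CNN
# responds to. Model, preprocessing values, and "input.jpg" are illustrative
# assumptions, not choices made by the surveyed papers.
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()

preprocess = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# "input.jpg" is a placeholder path for the example.
img = preprocess(Image.open("input.jpg").convert("RGB")).unsqueeze(0)
img.requires_grad_(True)

logits = model(img)
top_class = logits.argmax(dim=1).item()

# Back-propagate the top-class score to the input pixels.
logits[0, top_class].backward()

# Saliency map: maximum absolute gradient across the three color channels.
saliency = img.grad.abs().max(dim=1)[0].squeeze(0)  # shape: (224, 224)
```

The resulting map can be overlaid on the input image; pixels with large gradient magnitude are those whose perturbation most changes the top-class score, which is the intuition behind many of the visualization methods reviewed in this survey.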