Representation learning, i.e., the generation of representations useful for downstream applications, is a task of fundamental importance that underlies much of the success of deep neural networks (DNNs). Recently, robustness to adversarial examples has emerged as a desirable property for DNNs, spurring the development of robust training methods that account for adversarial examples. In this paper, we aim to understand how the properties of representations learned by robust training differ from those obtained from standard, non-robust training. This is critical to diagnosing numerous salient pitfalls in robust networks, such as degradation of performance on benign inputs, poor generalization of robustness, and increased overfitting. We utilize a powerful set of tools known as representation similarity metrics, across three vision datasets, to obtain layer-wise comparisons between robust and non-robust DNNs with different training procedures, architectural parameters, and adversarial constraints. Our experiments highlight hitherto unseen properties of robust representations that we posit underlie the behavioral differences of robust networks. We discover a lack of specialization in robust networks' representations, along with a disappearance of `block structure'. We also find that overfitting during robust training largely impacts the deeper layers. These, along with other findings, suggest ways forward for the design and training of better robust networks.
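The abstract refers to representation similarity metrics without naming a specific one; a commonly used metric of this kind for layer-wise comparisons is linear centered kernel alignment (CKA). The sketch below is a minimal NumPy implementation under the assumption that layer activations have been collected into (examples, features) matrices; the function name and interface are illustrative and not taken from the paper.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two activation matrices (illustrative sketch).

    X: (n, p1) activations of one layer over n examples.
    Y: (n, p2) activations of another layer over the same n examples.
    Returns a scalar in [0, 1]; higher values indicate more similar
    representations.
    """
    # Center each feature (column) so the metric is invariant to mean shifts.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)

    # Linear CKA: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F).
    similarity = np.linalg.norm(Y.T @ X, 'fro') ** 2
    norm_x = np.linalg.norm(X.T @ X, 'fro')
    norm_y = np.linalg.norm(Y.T @ Y, 'fro')
    return similarity / (norm_x * norm_y)
```

Evaluating such a metric for every pair of layers across a robust and a non-robust network yields the kind of layer-wise similarity comparisons described in the abstract.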