Representation learning, i.e., the generation of representations useful for downstream applications, is a task of fundamental importance that underlies much of the success of deep neural networks (DNNs). Recently, robustness to adversarial examples has emerged as a desirable property for DNNs, spurring the development of robust training methods that account for adversarial examples. In this paper, we aim to understand how the properties of representations learned by robust training differ from those obtained from standard, non-robust training. This is critical to diagnosing numerous salient pitfalls in robust networks, such as degradation of performance on benign inputs, poor generalization of robustness, and increased overfitting. We utilize a powerful set of tools known as representation similarity metrics, across three vision datasets, to obtain layer-wise comparisons between robust and non-robust DNNs with different architectures, training procedures, and adversarial constraints. Our experiments highlight hitherto unseen properties of robust representations that we posit underlie the behavioral differences of robust networks. We discover a lack of specialization in robust networks' representations, along with a disappearance of `block structure'. We also find that overfitting during robust training largely impacts the deeper layers. These, along with other findings, suggest ways forward for the design and training of better robust networks.
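As a concrete illustration of the kind of metric referenced above, the sketch below computes linear centered kernel alignment (CKA), a standard representation similarity metric for layer-wise comparisons between networks. This is a minimal sketch under the assumption that linear CKA is used; the function names and example data are illustrative, not the paper's actual code.

```python
import numpy as np

def center_gram(gram):
    """Center a Gram matrix; equivalent to centering the underlying features."""
    means = gram.mean(axis=0)
    means -= means.mean() / 2
    return gram - means[None, :] - means[:, None]

def linear_cka(X, Y):
    """Linear CKA between activation matrices X (n x p1) and Y (n x p2),
    where the n rows are the same examples fed through two layers/networks."""
    gram_x = center_gram(X @ X.T)
    gram_y = center_gram(Y @ Y.T)
    # HSIC-style inner product of centered Gram matrices, normalized to [0, 1].
    hsic = (gram_x * gram_y).sum()
    return hsic / (np.linalg.norm(gram_x) * np.linalg.norm(gram_y))

# Hypothetical usage: compare one layer of a robust net against the same
# layer of a standard net on a shared batch of inputs.
rng = np.random.default_rng(0)
acts_robust = rng.normal(size=(512, 256))    # 512 examples, 256 units
acts_standard = rng.normal(size=(512, 128))  # layer widths may differ
print(linear_cka(acts_robust, acts_standard))
```

Because CKA operates on Gram matrices over examples, it can compare layers of different widths, which is what makes the layer-by-layer robust-vs-non-robust comparisons (and the `block structure' analysis) possible.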