The adversarial vulnerability of neural networks, and subsequent techniques to create robust models, have attracted significant attention; yet we still lack a full understanding of this phenomenon. Here, we study adversarial examples of trained neural networks through analytical tools afforded by recent theoretical advances connecting neural networks and kernel methods, namely the Neural Tangent Kernel (NTK), following a growing body of work that leverages the NTK approximation to successfully analyze important deep learning phenomena and design algorithms for new applications. We show how NTKs allow us to generate adversarial examples in a ``training-free'' fashion, and demonstrate that they transfer to fool their finite-width neural net counterparts in the ``lazy'' regime. We leverage this connection to provide an alternative view on robust and non-robust features, which have been suggested to underlie the adversarial brittleness of neural nets. Specifically, we define and study features induced by the eigendecomposition of the kernel to better understand the role of robust and non-robust features, the reliance on both for standard classification, and the robustness-accuracy trade-off. We find that such features are surprisingly consistent across architectures, and that robust features tend to correspond to the largest eigenvalues of the model, and thus are learned early during training. Our framework allows us to identify and visualize non-robust yet useful features. Finally, we shed light on the robustness mechanism underlying adversarial training of neural nets used in practice: quantifying the evolution of the associated empirical NTK, we demonstrate that its dynamics fall much earlier into the ``lazy'' regime and manifest a much stronger form of the well-known bias to prioritize learning features within the top eigenspaces of the kernel, compared to standard training.