We study an average robustness notion for deep neural networks in (selected) wide and narrow, deep and shallow, as well as lazy and non-lazy training settings. We prove that in the under-parameterized setting, width has a negative effect on robustness, whereas it improves robustness in the over-parameterized setting. The effect of depth depends closely on the initialization and the training mode. In particular, under LeCun initialization, depth helps robustness in the lazy training regime. In contrast, under Neural Tangent Kernel (NTK) and He initialization, depth hurts robustness. Moreover, under the non-lazy training regime, we demonstrate how the width of a two-layer ReLU network benefits robustness. Our theoretical developments improve upon the results of [Huang et al. NeurIPS21; Wu et al. NeurIPS21] and are consistent with [Bubeck and Sellke NeurIPS21; Bubeck et al. COLT21].
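For context, one common way to formalize an average (rather than worst-case) robustness notion is the expected change of the network output under small random input perturbations. The display below is a hedged sketch of this idea and is not necessarily the paper's exact definition; the perturbation radius $\epsilon$, the data distribution $\mathcal{D}$, the uniform perturbation law, and the choice of norm are illustrative assumptions.
\[
\rho_{\mathrm{avg}}(f) \;=\; \mathbb{E}_{x \sim \mathcal{D}} \;\mathbb{E}_{\delta \sim \mathrm{Unif}(\mathbb{B}(0,\epsilon))} \big\| f(x+\delta) - f(x) \big\|_2,
\]
where $f$ denotes the trained network; a smaller $\rho_{\mathrm{avg}}(f)$ corresponds to a model that is more robust on average over the data distribution, in contrast to worst-case (adversarial) robustness, which takes a supremum over $\delta$.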