We study the notion of average robustness of deep neural networks in (selected) wide and narrow, deep and shallow, and lazy and non-lazy training settings. We prove that width hurts robustness in the under-parameterized setting, whereas it improves robustness in the over-parameterized setting. The effect of depth depends closely on the initialization and the training regime. In particular, under LeCun initialization, depth helps robustness in the lazy training regime; in contrast, under Neural Tangent Kernel (NTK) and He initialization, depth hurts robustness. Moreover, in the non-lazy training regime, we demonstrate how the width of a two-layer ReLU network benefits robustness. Our theoretical results improve upon those of Huang et al. [2021] and Wu et al. [2021], and are consistent with Bubeck and Sellke [2021] and Bubeck et al. [2021].
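For concreteness, the sketch below illustrates one way the width comparisons described above could be probed empirically. It assumes PyTorch, a two-layer ReLU network at initialization, and a simple proxy for average robustness, namely the mean first-order estimate |f(x)| / ||∇_x f(x)|| of the distance to the decision boundary; the network, the LeCun/He initializations, and this proxy are illustrative assumptions, not the paper's exact construction or definition.

```python
# A minimal sketch, assuming PyTorch and a first-order proxy for average robustness.
import torch
import torch.nn as nn


def two_layer_relu(d, m, init="lecun"):
    """Two-layer ReLU network f(x) = W2 relu(W1 x) with input dim d and width m."""
    net = nn.Sequential(nn.Linear(d, m, bias=False), nn.ReLU(), nn.Linear(m, 1, bias=False))
    with torch.no_grad():
        if init == "lecun":   # LeCun: Var = 1 / fan_in
            net[0].weight.normal_(0, (1.0 / d) ** 0.5)
            net[2].weight.normal_(0, (1.0 / m) ** 0.5)
        elif init == "he":    # He: Var = 2 / fan_in
            net[0].weight.normal_(0, (2.0 / d) ** 0.5)
            net[2].weight.normal_(0, (2.0 / m) ** 0.5)
    return net


def average_robustness_proxy(net, X):
    """Mean linearized distance |f(x)| / ||grad_x f(x)|| over the inputs X (a proxy only)."""
    X = X.clone().requires_grad_(True)
    out = net(X).squeeze(-1)                       # scalar output per example
    grads = torch.autograd.grad(out.sum(), X)[0]   # per-example input gradients
    dist = out.abs() / grads.norm(dim=1).clamp_min(1e-12)
    return dist.mean().item()


if __name__ == "__main__":
    torch.manual_seed(0)
    d, n = 20, 256
    X = torch.randn(n, d)                          # synthetic inputs for illustration
    for m in (16, 256, 4096):                      # vary the width
        net = two_layer_relu(d, m, init="lecun")
        print(f"width {m:5d}: avg robustness proxy = {average_robustness_proxy(net, X):.4f}")
```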