We argue that many properties of fully-connected feedforward neural networks (FCNNs), also called multi-layer perceptrons (MLPs), can be explained by analyzing a single pair of operations: a random projection into a space of higher dimension than the input, followed by a sparsification operation. For convenience, we call this pair of successive operations expand-and-sparsify, following the terminology of Dasgupta. We show how expand-and-sparsify can explain several phenomena discussed in the literature, such as the so-called Lottery Ticket Hypothesis, the surprisingly good performance of randomly-initialized untrained neural networks, the efficacy of Dropout in training, and, most importantly, the mysterious generalization ability of overparameterized models, first highlighted by Zhang et al. and subsequently identified even in non-neural-network models by Belkin et al.
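To make the operation concrete, the following is a minimal sketch of expand-and-sparsify in NumPy: a random, untrained projection into a higher-dimensional space followed by a top-k winner-take-all sparsification. The specific dimensions, the Gaussian projection, and the top-k rule are illustrative assumptions rather than the paper's exact construction.

```python
import numpy as np

def expand_and_sparsify(x, expansion_matrix, k):
    """Project x into a higher-dimensional space, then keep only the
    k largest activations, zeroing out the rest (assumed sparsification rule)."""
    h = expansion_matrix @ x                      # random expansion: m >> d
    sparse = np.zeros_like(h)
    top_k = np.argpartition(h, -k)[-k:]           # indices of the k largest entries
    sparse[top_k] = h[top_k]                      # winner-take-all sparsification
    return sparse

rng = np.random.default_rng(0)
d, m, k = 50, 2000, 40                            # input dim, expanded dim, active units
W = rng.standard_normal((m, d))                   # random, untrained projection
x = rng.standard_normal(d)
y = expand_and_sparsify(x, W, k)
print(np.count_nonzero(y))                        # 40: only k units remain active
```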