Overparametrized neural networks, in which the number of active parameters exceeds the sample size, prove remarkably effective in modern deep learning practice. From the classical perspective, however, far fewer parameters suffice for optimal estimation and prediction, and overparametrization can be harmful even in the presence of explicit regularization. To reconcile this conflict, we present a generalization theory for overparametrized ReLU networks that incorporates an explicit regularizer based on the scaled variation norm. Interestingly, this regularizer is equivalent to ridge regularization from the perspective of gradient-based optimization, yet behaves like the group lasso in terms of controlling model complexity. By exploiting this ridge-lasso duality, we show that overparametrization is generally harmless for two-layer ReLU networks. In particular, the overparametrized estimators are minimax optimal up to a logarithmic factor. By contrast, we show that overparametrized random feature models suffer from the curse of dimensionality and are therefore suboptimal.
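As a minimal sketch of the ridge-lasso duality alluded to above (assuming the standard positive-homogeneity rescaling argument for two-layer ReLU networks; the paper's scaled variation norm may differ in its exact scaling), consider a network $f(x) = \sum_{j=1}^{m} a_j\,\sigma(w_j^\top x)$ with $\sigma(t) = \max(t,0)$. Since $a_j\,\sigma(w_j^\top x) = (a_j/c_j)\,\sigma(c_j w_j^\top x)$ for any $c_j > 0$, the network is unchanged by per-unit rescaling, and minimizing a ridge-type weight-decay penalty over these rescalings collapses to a group-lasso-type penalty:
\[
\min_{c_j > 0}\;\frac{1}{2}\sum_{j=1}^{m}\Bigl( c_j^2 \|w_j\|_2^2 + \frac{a_j^2}{c_j^2} \Bigr)
\;=\; \sum_{j=1}^{m} |a_j|\,\|w_j\|_2,
\]
with the minimum attained at $c_j^2 = |a_j|/\|w_j\|_2$ by the AM-GM inequality. In this sense an explicit ridge penalty, which gradient-based optimization handles naturally, controls the same sparsity-inducing complexity measure as a group lasso over hidden units.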