Models with nonlinear architectures/parameterizations, such as deep neural networks (DNNs), are well known for their mysteriously good generalization performance at overparameterization. In this work, we tackle this mystery from a novel perspective, focusing on how the target recovery/fitting accuracy transitions as a function of the training data size. We propose a rank stratification for general nonlinear models to uncover a model rank as an "effective size of parameters" for each function in the function space of the corresponding model. Moreover, we establish a linear stability theory proving that a target function almost surely becomes linearly stable when the training data size equals its model rank. Supported by our experiments, we propose a linear stability hypothesis that linearly stable functions are preferred by nonlinear training. By these results, the model rank of a target function predicts the minimal training data size needed for its successful recovery. Specifically, for the matrix factorization model and DNNs with fully-connected or convolutional architectures, our rank stratification shows that the model rank of specific target functions can be much lower than the number of model parameters. This result predicts that these nonlinear models can recover such targets even at heavy overparameterization, as demonstrated quantitatively by our experiments. Overall, our work provides a unified framework with quantitative prediction power for understanding the mysterious target recovery behavior at overparameterization for general nonlinear models.
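As a hedged illustration of the "model rank as effective size of parameters" idea for the matrix factorization model, the following NumPy sketch estimates the rank of the Jacobian (tangent-feature matrix) of the model output with respect to all parameters at a representation of a low-rank target. The function name, the choice of representation, and the use of the Jacobian rank as a proxy are our own illustrative assumptions, not the paper's code or its exact definition of model rank; they merely show how the effective count can fall far below the nominal parameter count.

```python
import numpy as np

def jacobian_rank_estimate(A, B, tol=1e-8):
    """Illustrative proxy for the model rank of f(A, B) = A @ B.T:
    the rank of the Jacobian of the model output w.r.t. all parameters,
    evaluated at the representation (A, B). (Hypothetical helper.)"""
    d, k = A.shape
    cols = []
    # Directional derivative along each entry of A: E_ij @ B.T
    for i in range(d):
        for j in range(k):
            E = np.zeros((d, k)); E[i, j] = 1.0
            cols.append((E @ B.T).ravel())
    # Directional derivative along each entry of B: A @ E_ij.T
    for i in range(d):
        for j in range(k):
            E = np.zeros((d, k)); E[i, j] = 1.0
            cols.append((A @ E.T).ravel())
    J = np.stack(cols, axis=1)      # (d*d) x (2*d*k) Jacobian of the bilinear map
    return np.linalg.matrix_rank(J, tol=tol)

# Example: a rank-1 target in a heavily overparameterized factorization
d, k, r = 10, 10, 1                 # 2*d*k = 200 nominal parameters
A = np.zeros((d, k)); B = np.zeros((d, k))
A[:, :r] = np.random.randn(d, r)    # representation using only r columns
B[:, :r] = np.random.randn(d, r)
print(jacobian_rank_estimate(A, B)) # prints 19 = r*(2*d - r), far below 200
```

The printed value matches the dimension of the manifold of rank-r d-by-d matrices, r(2d - r), which is much smaller than the 2dk parameters of the overparameterized factorization; this gap is the kind of discrepancy the rank stratification is meant to expose.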