We study the implicit regularization phenomenon induced by simple optimization algorithms in over-parameterized nonlinear statistical models. Specifically, we study both vector and matrix single index models where the link function is nonlinear and unknown, the signal parameter is either a sparse vector or a low-rank symmetric matrix, and the response variable can be heavy-tailed. To gain a better understanding of the role played by implicit regularization in the nonlinear models without excess technicality, we assume that the distribution of the covariates is known a priori. For both the vector and matrix settings, we construct an over-parameterized least-squares loss function by employing the score function transform and a robust truncation step designed specifically for heavy-tailed data. We propose to estimate the true parameter by applying regularization-free gradient descent to the loss function. When the initialization is close to the origin and the stepsize is sufficiently small, we prove that the obtained solution achieves minimax optimal statistical rates of convergence in both the vector and matrix cases. In particular, for the vector single index model with Gaussian covariates, our proposed estimator is shown to further enjoy the oracle statistical rate. Our results capture the implicit regularization phenomenon in over-parameterized nonlinear and noisy statistical models with possibly heavy-tailed data.
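The recipe described above (score function transform, truncation of the response, then regularization-free gradient descent from a small initialization) can be illustrated with a minimal sketch for the sparse vector case. Everything here is an assumption for illustration rather than the authors' exact construction: covariates are taken to be standard Gaussian so that the score function is S(x) = x, the loss is taken to be 0.5 * ||w*w - v*v - phi||^2 with phi the empirical average of the truncated response times the score, and the names `fit_sparse_sim`, `truncate`, `simulate`, `alpha`, `step`, `n_iter`, and `tau` are hypothetical. In this sketch the iteration count also acts as an early-stopping knob.

```python
import numpy as np


def truncate(values, tau):
    """Clip values to [-tau, tau]; tau is a hypothetical tuning parameter."""
    return np.clip(values, -tau, tau)


def fit_sparse_sim(X, y, alpha=1e-3, step=0.05, n_iter=150, tau=None):
    """Regularization-free gradient descent on an over-parameterized loss.

    Assumes standard Gaussian covariates, so the score function is S(x) = x.
    Returns an estimate of the signal up to an unknown scale factor.
    """
    n, p = X.shape
    y_used = y if tau is None else truncate(y, tau)
    # Score-transformed moment: by Stein's identity, E[y * x] is
    # proportional to the signal when x ~ N(0, I).
    phi = X.T @ y_used / n

    # Over-parameterize beta = w*w - v*v and start close to the origin.
    w = np.full(p, alpha)
    v = np.full(p, alpha)
    for _ in range(n_iter):
        residual = (w * w - v * v) - phi
        # Gradients of 0.5 * ||w*w - v*v - phi||^2 with respect to w and v.
        grad_w = 2.0 * w * residual
        grad_v = -2.0 * v * residual
        w = w - step * grad_w
        v = v - step * grad_v
    return w * w - v * v


def simulate(n=2000, p=200, seed=0):
    """Toy data: y = f(<x, beta>) + heavy-tailed noise, with f(u) = u + tanh(u)."""
    rng = np.random.default_rng(seed)
    beta = np.zeros(p)
    beta[:3] = np.array([1.0, -1.0, 1.0]) / np.sqrt(3.0)  # 3-sparse, unit norm
    X = rng.standard_normal((n, p))
    noise = 0.5 * rng.standard_t(df=3, size=n)  # Student-t noise, heavy tails
    u = X @ beta
    y = u + np.tanh(u) + noise
    return X, y, beta
```

A possible usage is `X, y, beta = simulate()` followed by `est = fit_sparse_sim(X, y, tau=8.0)`. Because the link function is unknown, the estimate can only be compared with `beta` up to scale, for example through their cosine similarity. Coordinates with a large score-transformed moment grow geometrically fast from the small initialization, while the remaining coordinates stay close to zero over the same number of steps, which is the thresholding-like effect the abstract attributes to implicit regularization.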