We study the implicit regularization of gradient descent towards structured sparsity via a novel neural reparameterization, which we call a diagonally grouped linear neural network. We show the following intriguing property of our reparameterization: gradient descent over the squared regression loss, without any explicit regularization, biases towards solutions with a group sparsity structure. In contrast to many existing works on implicit regularization, we prove that our training trajectory cannot be simulated by mirror descent. We analyze the gradient dynamics of the corresponding regression problem in the general noise setting and obtain minimax-optimal error rates. Compared to existing bounds for implicit sparse regularization using diagonal linear networks, our analysis with the new reparameterization shows improved sample complexity. In the degenerate case of size-one groups, our approach gives rise to a new algorithm for sparse linear regression. Finally, we demonstrate the efficacy of our approach with several numerical experiments.
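To make the setup concrete, below is a minimal NumPy sketch of the kind of experiment the abstract describes: gradient descent on a plain squared regression loss with no explicit penalty, applied through a grouped reparameterization. The specific form used here, w_g = u_g^2 * v_g with one scalar u_g shared across the coordinates of group g, as well as all problem sizes, step sizes, and initialization scales, are illustrative assumptions rather than the paper's exact construction.

```python
# Illustrative sketch (assumed grouped reparameterization, not the paper's exact one):
# gradient descent on the unregularized squared loss, starting from a small
# initialization, tends to concentrate the recovered weights on the active group.
import numpy as np

rng = np.random.default_rng(0)

n, num_groups, group_size = 100, 5, 4
d = num_groups * group_size

# Group-sparse ground truth: only the first group is active.
w_star = np.zeros(d)
w_star[:group_size] = [1.0, 2.0, -1.5, 0.5]

X = rng.normal(size=(n, d))
y = X @ w_star                    # noiseless responses for simplicity

alpha = 0.1                       # small initialization scale
u = alpha * np.ones(num_groups)   # one scalar per group
v = alpha * np.ones(d)            # one weight per coordinate

def assemble(u, v):
    """Map the factors (u, v) to the effective regression vector w."""
    return np.repeat(u ** 2, group_size) * v

lr, T = 0.01, 10_000
for _ in range(T):
    w = assemble(u, v)
    grad_w = X.T @ (X @ w - y) / n                   # gradient of (1/2n)||Xw - y||^2
    grad_v = np.repeat(u ** 2, group_size) * grad_w  # chain rule through v
    grad_u = 2 * u * (grad_w * v).reshape(num_groups, group_size).sum(axis=1)
    u -= lr * grad_u
    v -= lr * grad_v

w_hat = assemble(u, v)
print("per-group norms (recovered):",
      np.round(np.linalg.norm(w_hat.reshape(num_groups, group_size), axis=1), 3))
print("per-group norms (truth):    ",
      np.round(np.linalg.norm(w_star.reshape(num_groups, group_size), axis=1), 3))
```

Under these assumptions, the printed per-group norms of the recovered solution are large only for the truly active group, even though no group-sparsity penalty appears anywhere in the objective; the bias comes entirely from the reparameterization and the small initialization.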