We study the Stochastic Gradient Descent (SGD) algorithm in nonparametric statistics, with a focus on kernel regression. The directional bias property of SGD, which is known in the linear regression setting, is generalized to kernel regression. More specifically, we prove that SGD with a moderate and annealing step-size converges along the direction of the eigenvector corresponding to the largest eigenvalue of the Gram matrix. In addition, Gradient Descent (GD) with a moderate or small step-size converges along the direction of the eigenvector corresponding to the smallest eigenvalue. These facts are referred to as the directional bias properties; they may explain why an SGD-computed estimator has a potentially smaller generalization error than a GD-computed estimator. The application of our theory is demonstrated by simulation studies and a case study based on the FashionMNIST dataset.
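As a rough illustration of the directional bias statement, the following is a minimal NumPy sketch (our own construction, not the paper's simulation code): it sets up a toy one-dimensional kernel regression with an RBF Gram matrix, runs full-batch GD with a small constant step-size and single-sample SGD with a moderate-then-annealing schedule, and measures how strongly the remaining residual aligns with the top and bottom eigenvectors of the Gram matrix. The kernel choice, bandwidth, step-size schedules, and iteration counts are illustrative assumptions, not the exact regime of the theorem.

```python
import numpy as np

# Illustrative sketch (not the paper's exact setup): toy 1-D kernel regression,
# comparing the direction along which GD and SGD approach the interpolating fit.
rng = np.random.default_rng(0)
n = 30
X = np.sort(rng.uniform(-1.0, 1.0, n))
y = np.sin(3.0 * X) + 0.1 * rng.standard_normal(n)

# RBF Gram matrix K_ij = exp(-(x_i - x_j)^2 / (2 * bandwidth^2)); bandwidth is a guess.
bandwidth = 0.3
K = np.exp(-(X[:, None] - X[None, :]) ** 2 / (2.0 * bandwidth ** 2))
eigvals, eigvecs = np.linalg.eigh(K)            # eigenvalues in ascending order
v_min, v_max = eigvecs[:, 0], eigvecs[:, -1]

def alignment(c):
    """|cos| between the residual y - Kc (a proxy for the remaining error
    in function space) and the extreme eigenvectors of the Gram matrix."""
    r = y - K @ c
    r = r / (np.linalg.norm(r) + 1e-12)
    return abs(r @ v_max), abs(r @ v_min)

# Full-batch GD on (1/2n)||y - Kc||^2 with a small constant step-size.
c_gd = np.zeros(n)
eta_gd = 0.5 / eigvals[-1]
for _ in range(20_000):
    c_gd += eta_gd * K @ (y - K @ c_gd) / n

# Single-sample SGD with a moderate step-size followed by annealing
# (an illustrative schedule, chosen here for simplicity).
c_sgd = np.zeros(n)
eta0 = 1.5 / eigvals[-1]
for t in range(20_000):
    i = rng.integers(n)
    eta_t = eta0 if t < 15_000 else eta0 / (t - 15_000 + 1)
    c_sgd += eta_t * (y[i] - K[i] @ c_sgd) * K[i]

# The directional bias result predicts (up to simulation noise) that the GD
# residual concentrates on small-eigenvalue directions while the SGD residual
# concentrates on the top eigenvector direction.
print("GD  |cos| with (v_max, v_min):", alignment(c_gd))
print("SGD |cos| with (v_max, v_min):", alignment(c_sgd))
```

The residual y - Kc is used here instead of the coefficient error to avoid solving with an ill-conditioned Gram matrix; with an RBF kernel many eigenvalues are near zero, so the GD residual may spread over several small-eigenvalue directions rather than align exactly with the single bottom eigenvector.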