We consider stochastic gradient descent and its averaging variant for binary classification problems in a reproducing kernel Hilbert space. In traditional analyses based on a consistency property of loss functions, the expected classification error is known to converge more slowly than the expected risk, even under a low-noise condition on the conditional label probabilities; consequently, the resulting rate is sublinear. It is therefore important to ask whether much faster convergence of the expected classification error can be achieved. Recent work established an exponential convergence rate for stochastic gradient descent under a strong low-noise condition, but the theoretical analysis was limited to the squared loss function, which is somewhat inadequate for binary classification tasks. In this paper, we show exponential convergence of the expected classification error in the final phase of stochastic gradient descent for a wide class of differentiable convex loss functions under similar assumptions. For averaged stochastic gradient descent, we show that the same convergence rate holds from the early phase of training. In experiments, we verify our analyses on $L_2$-regularized logistic regression.
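To make the setting concrete, the following is a minimal sketch, not the paper's implementation, of averaged stochastic gradient descent for $L_2$-regularized logistic regression; the finite-dimensional feature map, step-size schedule, and synthetic data are illustrative assumptions standing in for the RKHS setting and the paper's experimental configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def phi(x):
    # Hypothetical finite-dimensional feature map (identity plus bias);
    # a kernel-induced feature map would play this role in the RKHS setting.
    return np.append(x, 1.0)

def logistic_loss_grad(w, x, y):
    # Gradient of log(1 + exp(-y <w, phi(x)>)) with respect to w, for y in {-1, +1}.
    z = phi(x)
    margin = y * w.dot(z)
    return -y * z / (1.0 + np.exp(margin))

def averaged_sgd(data, eta=0.5, lam=1e-3):
    # Illustrative step size and regularization strength, not the paper's choices.
    d = phi(data[0][0]).shape[0]
    w = np.zeros(d)        # plain SGD iterate
    w_bar = np.zeros(d)    # Polyak-Ruppert average of the iterates
    for t, (x, y) in enumerate(data, start=1):
        # Stochastic gradient of the L2-regularized logistic loss.
        g = logistic_loss_grad(w, x, y) + lam * w
        w -= eta / np.sqrt(t) * g
        # Running average: w_bar_t = (1/t) * sum of the first t iterates.
        w_bar += (w - w_bar) / t
    return w, w_bar

# Synthetic binary classification stream: labels follow a noisy linear rule.
X = rng.normal(size=(2000, 5))
y = np.sign(X[:, 0] + 0.1 * rng.normal(size=2000))
final_iterate, averaged_iterate = averaged_sgd(list(zip(X, y)))
print("final-iterate train error:",
      np.mean(np.sign(X @ final_iterate[:-1] + final_iterate[-1]) != y))
print("averaged-iterate train error:",
      np.mean(np.sign(X @ averaged_iterate[:-1] + averaged_iterate[-1]) != y))
```

Comparing the classification error of the last iterate with that of the averaged iterate mirrors the distinction drawn in the analysis: the final phase of plain SGD versus averaging from the early phase of training.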