We analyze the properties of gradient descent on convex surrogates for the zero-one loss for the agnostic learning of linear halfspaces. If $\mathsf{OPT}$ is the best classification error achieved by a halfspace, by appealing to the notion of soft margins we are able to show that gradient descent finds halfspaces with classification error $\tilde O(\mathsf{OPT}^{1/2}) + \varepsilon$ in $\mathrm{poly}(d,1/\varepsilon)$ time and sample complexity for a broad class of distributions that includes log-concave isotropic distributions as a subclass. Along the way we answer a question recently posed by Ji et al. (2020) on how the tail behavior of a loss function can affect sample complexity and runtime guarantees for gradient descent.
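As a minimal illustrative sketch (the particular surrogate, constraint set, and step size below are assumptions for exposition, not necessarily the exact choices analyzed in the paper), the procedure can be instantiated as projected gradient descent on an empirical convex surrogate of the zero-one loss:
\[
\widehat{L}(w) \;=\; \frac{1}{n}\sum_{i=1}^{n} \ell\bigl(-y_i \langle w, x_i\rangle\bigr),
\qquad
w_{t+1} \;=\; \Pi_{\{\|w\|_2 \le 1\}}\bigl(w_t - \eta\, \nabla \widehat{L}(w_t)\bigr),
\]
where $\ell$ is a convex, non-increasing surrogate (for instance the logistic loss $\ell(z)=\log(1+e^{z})$) and $\Pi$ denotes Euclidean projection onto the unit ball. The tail decay of $\ell$ is the property whose effect on sample complexity and runtime is discussed above.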