We provide a convergence analysis of gradient descent for the problem of agnostically learning a single ReLU function under Gaussian distributions. Unlike prior work, which studies the zero-bias setting, we consider the more challenging scenario in which the bias of the ReLU function is non-zero. Our main result establishes that, starting from random initialization, gradient descent outputs in a polynomial number of iterations, with high probability, a ReLU function whose error is competitive with that of the best ReLU function. We also provide finite-sample guarantees, and our techniques generalize to a broader class of marginal distributions beyond Gaussians.
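To make the setup concrete, here is a minimal sketch of the kind of procedure analyzed: gradient descent on the empirical squared loss of a biased ReLU, relu(w·x + b), over Gaussian inputs, starting from random initialization. The loss, step size `eta`, and iteration count `T` below are illustrative assumptions, not the paper's exact choices.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def fit_relu(X, y, eta=0.01, T=1000, seed=0):
    """Gradient descent on the empirical squared loss of relu(w.x + b).

    Illustrative sketch: step size and iteration count are assumptions,
    not the parameters from the paper's analysis.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = rng.normal(size=d) / np.sqrt(d)  # random initialization
    b = 0.0
    for _ in range(T):
        z = X @ w + b
        pred = relu(z)
        # subgradient of the mean squared loss; relu'(z) = 1{z > 0}
        g = (pred - y) * (z > 0)
        w -= eta * (X.T @ g) / n
        b -= eta * g.mean()
    return w, b

# Usage: Gaussian marginals with noisy labels (in the agnostic model the
# labels may be arbitrary; additive noise is used here only for illustration)
rng = np.random.default_rng(1)
d, n = 10, 5000
w_star, b_star = rng.normal(size=d), 0.5
X = rng.normal(size=(n, d))
y = relu(X @ w_star + b_star) + 0.1 * rng.normal(size=n)
w_hat, b_hat = fit_relu(X, y)
print(np.mean((relu(X @ w_hat + b_hat) - y) ** 2))
```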