We theoretically study the fundamental problem of learning a single neuron with a bias term ($\mathbf{x} \mapsto \sigma(\langle \mathbf{w},\mathbf{x}\rangle + b)$) in the realizable setting with the ReLU activation, using gradient descent. Perhaps surprisingly, we show that this is a significantly different and more challenging problem than the bias-less case (which was the focus of previous works on single neurons), both in terms of the optimization geometry as well as the ability of gradient methods to succeed in some scenarios. We provide a detailed study of this problem, characterizing the critical points of the objective, demonstrating failure cases, and providing positive convergence guarantees under different sets of assumptions. To prove our results, we develop some tools which may be of independent interest, and improve previous results on learning single neurons.
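To make the setting concrete, the following is a minimal sketch (not part of the paper's results) of gradient descent on the empirical squared loss for a single ReLU neuron with a bias term, where labels are realizable, i.e., generated by a ground-truth pair $(\mathbf{w}^*, b^*)$. The dimension, sample size, step size, initialization, and Gaussian input distribution are all illustrative assumptions; as the abstract notes, gradient methods can fail in some scenarios, so convergence here is not guaranteed in general.

```python
import numpy as np

rng = np.random.default_rng(0)

d, n = 5, 2000                        # illustrative input dimension and sample size
w_star = rng.normal(size=d)           # hypothetical ground-truth weights
b_star = -0.5                         # hypothetical ground-truth bias

X = rng.normal(size=(n, d))           # illustrative Gaussian inputs
y = np.maximum(X @ w_star + b_star, 0.0)  # realizable ReLU targets

def relu(z):
    return np.maximum(z, 0.0)

# Objective: L(w, b) = (1/2n) * sum_i (relu(<w, x_i> + b) - y_i)^2
w = 0.1 * rng.normal(size=d)          # illustrative initialization
b = 0.0
lr = 0.1                              # illustrative step size

for step in range(5000):
    z = X @ w + b
    err = relu(z) - y
    active = (z > 0).astype(float)    # ReLU (sub)gradient indicator
    grad_w = (X.T @ (err * active)) / n
    grad_b = np.mean(err * active)
    w -= lr * grad_w
    b -= lr * grad_b

loss = 0.5 * np.mean((relu(X @ w + b) - y) ** 2)
print(f"final loss: {loss:.3e}")
print(f"weight error: {np.linalg.norm(w - w_star):.3e}, bias error: {abs(b - b_star):.3e}")
```

On this benign random instance the loss typically drops to near zero, but the abstract's failure cases indicate that other target/bias configurations can trap gradient descent at suboptimal critical points.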