We study the learning performance of gradient descent when the empirical risk is weakly convex, namely, when the smallest negative eigenvalue of the empirical risk's Hessian is bounded in magnitude. By showing that this eigenvalue can control the stability of gradient descent, we prove generalisation error bounds that hold under a wider range of step sizes than in previous work. Out-of-sample guarantees are then obtained by decomposing the test error into generalisation, optimisation, and approximation errors, each of which can be bounded and traded off with respect to the algorithmic parameters, the sample size, and the magnitude of this eigenvalue. In the case of a two-layer neural network, we demonstrate that the empirical risk can satisfy a notion of local weak convexity; specifically, the Hessian's smallest eigenvalue during training can be controlled by the normalisation of the layers, i.e., the network scaling. This allows test error guarantees to be achieved when the population risk minimiser satisfies a complexity assumption. By trading off the network complexity and scaling, we gain insights into the implicit bias of neural network scaling, which are further supported by experimental findings.
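The two notions at the heart of the abstract can be sketched in standard form. This is an illustrative rendering only: the symbols $\hat{R}$ (empirical risk), $R$ (population risk), $\hat{w}_T$ (gradient descent output after $T$ steps), $w^\star$ (population risk minimiser), and $\epsilon$ (weak-convexity parameter) are assumed notation, not quoted from the paper's statements.

```latex
% Weak convexity: the empirical risk's Hessian has its smallest
% (negative) eigenvalue bounded in magnitude by some \epsilon \ge 0.
\nabla^2 \hat{R}(w) \succeq -\epsilon I \quad \text{for all } w.

% Exact telescoping identity behind the test-error decomposition;
% the three bracketed terms correspond to the generalisation,
% optimisation, and approximation errors discussed in the abstract
% (the term labels are an illustrative convention).
R(\hat{w}_T) - R(w^\star)
  = \underbrace{\bigl(R(\hat{w}_T) - \hat{R}(\hat{w}_T)\bigr)}_{\text{generalisation}}
  + \underbrace{\bigl(\hat{R}(\hat{w}_T) - \hat{R}(w^\star)\bigr)}_{\text{optimisation}}
  + \underbrace{\bigl(\hat{R}(w^\star) - R(w^\star)\bigr)}_{\text{approximation}}
```

Each bracketed term can then be bounded separately, with the weak-convexity parameter $\epsilon$ entering both the stability-based generalisation bound and the optimisation analysis, which is the trade-off the abstract describes.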