Optimization in Deep Learning is dominated by first-order methods built around the central concept of backpropagation. Second-order optimization methods, which take second-order derivatives into account, are far less used despite their superior theoretical properties. This inadequacy of second-order methods stems from their exorbitant computational cost, poor practical performance, and the ineluctable non-convex nature of Deep Learning. Several attempts have been made to resolve the inadequacy of second-order optimization without reaching a cost-effective solution, much less an exact one. In this work, we show that this long-standing problem in Deep Learning can be solved in the stochastic case, given a suitable regularization of the neural network. Interestingly, we provide an expression for the stochastic Hessian and its exact eigenvalues. We derive a closed-form formula for the exact stochastic second-order Newton direction, resolve the non-convexity issue, and adjust our exact solution to favor flat minima through regularization and spectral adjustment. We test our exact stochastic second-order method on popular datasets and demonstrate its adequacy for Deep Learning.
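For background on the terminology above, the following is a minimal sketch of a damped Newton direction with a spectral adjustment of the Hessian, written in generic notation; the symbols $g$, $H$, $\epsilon$, and $\eta$ and the eigenvalue-clipping rule are illustrative assumptions, not the closed-form stochastic expression derived in this work.
% Illustrative sketch only: a generic spectrally adjusted Newton step,
% not the paper's exact stochastic closed-form solution.
% g = \nabla L(\theta) is the gradient, H = \nabla^2 L(\theta) the Hessian,
% with eigendecomposition H = Q \Lambda Q^{\top}, \Lambda = \mathrm{diag}(\lambda_1,\dots,\lambda_n).
\begin{align}
  \tilde{\Lambda} &= \mathrm{diag}\!\big(\max(|\lambda_1|,\epsilon),\,\dots,\,\max(|\lambda_n|,\epsilon)\big)
      && \text{(spectral adjustment: handles negative and near-zero curvature)} \\
  d &= -\,Q\,\tilde{\Lambda}^{-1} Q^{\top} g
      && \text{(adjusted Newton direction)} \\
  \theta_{t+1} &= \theta_t + \eta\, d
      && \text{(parameter update with step size } \eta \text{)}
\end{align}
Replacing the eigenvalues by their clipped absolute values is one standard way to keep the Newton direction a descent direction in non-convex settings; the contribution summarized above is an exact, regularized stochastic counterpart of such a step rather than this generic recipe.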