Though second-order optimization methods are highly effective, popular approaches in machine learning, such as SGD and Adam, use only first-order information due to the difficulty of computing curvature in high dimensions. We present FOSI, a novel meta-algorithm that improves the performance of any first-order optimizer by efficiently incorporating second-order information during the optimization process. In each iteration, FOSI implicitly splits the function into two quadratic functions defined on orthogonal subspaces, then uses a second-order method to minimize the first and the base optimizer to minimize the other. Our analysis of FOSI's preconditioner and effective Hessian proves that FOSI improves the condition number for a large family of optimizers. Our empirical evaluation demonstrates that FOSI improves the convergence rate and optimization time of GD, Heavy-Ball, and Adam when applied to several deep neural network training tasks such as audio classification, transfer learning, and object classification, as well as to convex functions.
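The following is a minimal, illustrative sketch of the splitting idea described above, not the paper's implementation: it assumes the k leading Hessian eigenpairs are obtained from a dense eigendecomposition (a FOSI-style method would use a scalable, Hessian-free estimator), and names such as fosi_step and base_update are hypothetical.

```python
# Illustrative sketch only: Newton-like update on the top-k Hessian eigenvector
# subspace, base-optimizer update on the orthogonal complement. The dense
# Hessian and eigendecomposition below are for small toy problems; they stand
# in for whatever curvature estimator an actual method would use.
import jax
import jax.numpy as jnp

def fosi_step(f, params, base_update, k=2, lr=1.0):
    g = jax.grad(f)(params)
    H = jax.hessian(f)(params)          # dense Hessian (toy setting only)
    eigvals, eigvecs = jnp.linalg.eigh(H)
    V = eigvecs[:, -k:]                 # top-k eigenvectors as columns
    lam = eigvals[-k:]                  # corresponding largest eigenvalues

    coeffs = V.T @ g                    # gradient coordinates in the subspace
    g_rest = g - V @ coeffs             # component in the orthogonal complement

    newton_like = V @ (coeffs / lam)    # scale by inverse curvature (second-order part)
    base_like = base_update(g_rest)     # first-order step on the remaining subspace

    return params - lr * newton_like - base_like

# Usage on a toy ill-conditioned quadratic, with plain gradient descent
# (fixed step size 0.5) as the base optimizer.
A = jnp.diag(jnp.array([100.0, 10.0, 1.0, 0.1]))
f = lambda x: 0.5 * x @ A @ x
x = jnp.ones(4)
for _ in range(20):
    x = fosi_step(f, x, base_update=lambda g: 0.5 * g, k=2)
print(x)  # high-curvature coordinates vanish immediately; the rest decay under the base step
```

In this toy example the two highest-curvature directions are handled by the Newton-like step, so the effective condition number seen by the base optimizer is determined only by the remaining, milder eigenvalues.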