The stochastic gradient descent method and its variants constitute the core optimization algorithms that achieve good convergence rates for solving machine learning problems. These rates are obtained especially when the algorithms are fine-tuned for the application at hand. Although this tuning process can require large computational costs, recent work has shown that these costs can be reduced by line search methods that iteratively adjust the stepsize. We propose an alternative approach to stochastic line search: a new algorithm, SMB, based on forward-step model building. This model building step incorporates second-order information that allows adjusting not only the stepsize but also the search direction. Noting that deep learning model parameters come in groups (layers of tensors), our method builds its model and calculates a new step for each parameter group. This novel diagonalization approach makes the selected step lengths adaptive. We provide a convergence rate analysis, and experimentally show that the proposed algorithm achieves faster convergence and better generalization on well-known test problems. More precisely, SMB requires less tuning and shows performance comparable to other adaptive methods.
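To make the per-parameter-group idea concrete, the sketch below shows one iteration in which each group first takes a forward (trial) SGD step and, when the trial point does not give sufficient decrease, the step is corrected by minimizing a one-dimensional quadratic model built from the two loss values and the directional derivative. This is a simplified stand-in for the model-building step described above (the paper's model also corrects the search direction); the loss, constants, and helper names here are illustrative assumptions, not the paper's exact formulas.

```python
# Minimal NumPy sketch: per-group trial step + quadratic-model correction.
# Hypothetical names and constants; illustrative only.
import numpy as np

def smb_like_step(groups, loss_and_grads, lr=0.5, c=1e-4):
    """One iteration over a list of parameter groups (NumPy arrays)."""
    f0, grads = loss_and_grads(groups)
    new_groups = []
    for w, g in zip(groups, grads):
        s = -lr * g                                   # forward (trial) step for this group
        trial = [p if p is not w else w + s for p in groups]
        f_trial, _ = loss_and_grads(trial)
        gTs = float(np.dot(g.ravel(), s.ravel()))     # directional derivative along s
        if f_trial <= f0 + c * gTs:                   # sufficient decrease: keep trial step
            new_groups.append(w + s)
        else:
            # Quadratic model m(t) with m(0)=f0, m'(0)=gTs, m(1)=f_trial;
            # step to its minimizer t* (fall back to t=1 if the model is not convex).
            denom = 2.0 * (f_trial - f0 - gTs)
            t = -gTs / denom if denom > 0 else 1.0
            new_groups.append(w + t * s)
    return new_groups

# Toy usage: two parameter groups of a separable quadratic loss.
def loss_and_grads(groups):
    w1, w2 = groups
    f = 0.5 * np.sum(w1**2) + 2.0 * np.sum(w2**2)
    return f, [w1, 4.0 * w2]

params = [np.array([1.0, -2.0]), np.array([0.5])]
for _ in range(5):
    params = smb_like_step(params, loss_and_grads)
```

In this toy example the second group's trial step overshoots, and the quadratic correction rescales it so that the group lands on its minimizer in a single corrected step, which is the kind of adaptivity the per-group model building is meant to provide.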