First-order stochastic methods for solving large-scale non-convex optimization problems are widely used in many big-data applications, e.g. training deep neural networks and other complex, potentially non-convex machine learning models. Their inexpensive iterations generally come at the cost of a slow global convergence rate (mostly sublinear), so a very large number of iterations is needed before the iterates reach a neighborhood of a minimizer. In this work, we present a first-order stochastic algorithm based on a combination of homotopy methods and SGD, called Homotopy-Stochastic Gradient Descent (H-SGD), which has interesting connections with several heuristics proposed in the literature, e.g. optimization by Gaussian continuation, training by diffusion, and mollifying networks. Under some mild assumptions on the problem structure, we conduct a theoretical analysis of the proposed algorithm. Our analysis shows that, with a specifically designed scheme for the homotopy parameter, H-SGD enjoys a global linear rate of convergence to a neighborhood of a minimum while maintaining fast and inexpensive iterations. Experimental evaluations confirm the theoretical results and show that H-SGD can outperform standard SGD.
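To make the high-level idea concrete, the following is a minimal sketch of a generic homotopy-SGD scheme: SGD is run on a sequence of progressively less-smoothed surrogate objectives, warm-starting each stage from the previous iterate. The Gaussian-smoothing surrogate, the geometric schedule for the homotopy parameter `gamma`, and all function and parameter names below are illustrative assumptions, not the exact H-SGD specification or homotopy-parameter scheme analyzed in the paper.

```python
import numpy as np

def smoothed_stochastic_grad(grad_fn, x, gamma, rng, n_samples=1):
    """Stochastic gradient of a Gaussian-smoothed surrogate,
    E_u[ grad f(x + gamma * u) ] with u ~ N(0, I), estimated by sampling.
    (Illustrative choice of surrogate, not the paper's definition.)"""
    g = np.zeros_like(x)
    for _ in range(n_samples):
        u = rng.standard_normal(x.shape)
        g += grad_fn(x + gamma * u)
    return g / n_samples

def homotopy_sgd(grad_fn, x0, gamma0=1.0, decay=0.5, n_stages=5,
                 steps_per_stage=200, lr=0.01, seed=0):
    """Outer loop over homotopy levels, inner loop of plain SGD steps."""
    rng = np.random.default_rng(seed)
    x, gamma = np.array(x0, dtype=float), gamma0
    for _ in range(n_stages):
        for _ in range(steps_per_stage):
            x -= lr * smoothed_stochastic_grad(grad_fn, x, gamma, rng)
        gamma *= decay  # shrink smoothing: surrogate approaches the original objective
    return x

# Toy usage on a one-dimensional non-convex objective f(x) = x^2 + sin(5x).
if __name__ == "__main__":
    grad = lambda x: 2 * x + 5 * np.cos(5 * x)  # gradient of x^2 + sin(5x)
    print(homotopy_sgd(grad, x0=np.array([3.0])))
```

The design intuition is that heavily smoothed surrogates suppress spurious local minima, so early stages guide the iterate toward a good basin, while later stages with small `gamma` refine it on (an approximation of) the original objective.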