Training deep neural networks (DNNs) is an important and challenging optimization problem in machine learning due to its non-convexity and non-separable structure. Alternating minimization (AM) approaches split the compositional structure of DNNs and have drawn great interest in the deep learning and optimization communities. In this paper, we propose a unified framework for analyzing the convergence rate of AM-type network training methods. Our analysis is based on $j$-step sufficient decrease conditions and the Kurdyka-Lojasiewicz (KL) property, which relaxes the requirement of designing descent algorithms. We give detailed local convergence rates as the KL exponent $\theta$ varies over $[0,1)$. Moreover, local R-linear convergence is discussed under a stronger $j$-step sufficient decrease condition.
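For reference, a minimal sketch of the KL property in its standard form, together with the convergence rates classically attached to the exponent $\theta$ (the precise $j$-step sufficient decrease conditions of our framework are stated in the body of the paper and may differ in detail): a proper lower semicontinuous function $F$ satisfies the KL property at $\bar{x}$ with exponent $\theta\in[0,1)$ if there exist $c>0$, $\varepsilon>0$ and $\eta>0$ such that
\[
\bigl(F(x)-F(\bar{x})\bigr)^{\theta}\le c\,\operatorname{dist}\bigl(0,\partial F(x)\bigr)
\quad\text{whenever } \|x-\bar{x}\|\le\varepsilon,\ F(\bar{x})<F(x)<F(\bar{x})+\eta.
\]
In the classical descent setting, $\theta=0$ yields termination in finitely many steps, $\theta\in(0,\tfrac12]$ yields local R-linear convergence, and $\theta\in(\tfrac12,1)$ yields a sublinear rate of order $O\bigl(k^{-\frac{1-\theta}{2\theta-1}}\bigr)$.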