Training deep neural networks (DNNs) is an important and challenging optimization problem in machine learning due to its non-convexity and non-separable structure. Alternating minimization (AM) approaches split the compositional structure of DNNs into blocks and have drawn great interest from the deep learning and optimization communities. In this paper, we propose a unified framework for analyzing the convergence rate of AM-type network training methods. Our analysis is based on non-monotone $j$-step sufficient decrease conditions and the Kurdyka-Łojasiewicz (KL) property, which relaxes the requirement of designing monotonically descending algorithms. We establish detailed local convergence rates as the KL exponent $\theta$ varies over $[0,1)$. Moreover, local R-linear convergence is discussed under a stronger $j$-step sufficient decrease condition.
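For reference, and not as the paper's exact assumptions, a standard form of the KL inequality with exponent $\theta$ and a representative non-monotone $j$-step sufficient decrease condition can be sketched as follows, where $f$ denotes the training objective, $x^k$ the iterates, $\bar{x}$ a limit point, and $c,\delta,\eta>0$ are illustrative constants:
\[
  \operatorname{dist}\bigl(0,\partial f(x)\bigr) \;\ge\; c\,\bigl(f(x)-f(\bar{x})\bigr)^{\theta}
  \quad \text{for all } x \text{ near } \bar{x} \text{ with } f(\bar{x}) < f(x) < f(\bar{x}) + \eta,
\]
\[
  f(x^{k+j}) \;\le\; \max_{k \le i < k+j} f(x^{i}) \;-\; \delta\,\bigl\lVert x^{k+j}-x^{k}\bigr\rVert^{2}.
\]
Under conditions of this type, KL-based analyses typically yield finite termination when $\theta = 0$, R-linear convergence when $\theta \in (0,1/2]$, and a sublinear rate of order $k^{-(1-\theta)/(2\theta-1)}$ when $\theta \in (1/2,1)$; the precise statements proved in the paper may differ.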