Alternating Direction Method of Multipliers (ADMM) has been used successfully in many conventional machine learning applications and is considered to be a useful alternative to Stochastic Gradient Descent (SGD) as a deep learning optimizer. However, as an emerging domain, several challenges remain, including 1) the lack of global convergence guarantees, 2) slow convergence towards solutions, and 3) cubic time complexity with respect to feature dimensions. In this paper, we propose a novel optimization framework for deep learning via ADMM (dlADMM) to address these challenges simultaneously. The parameters in each layer are updated backward and then forward so that the parameter information in each layer is exchanged efficiently. The time complexity is reduced from cubic to quadratic in (latent) feature dimensions via a dedicated algorithm design that solves the subproblems with iterative quadratic approximations and backtracking. Finally, we provide the first proof of global convergence for an ADMM-based method (dlADMM) in a deep neural network problem under mild conditions. Experiments on benchmark datasets demonstrate that our proposed dlADMM algorithm outperforms most of the comparison methods.
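The backward-then-forward update order and the quadratic-approximation-with-backtracking subproblem solver mentioned above can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's exact formulation: the function names, the simplified quadratic surrogate, and the per-layer `grad`/`obj` callbacks are all assumptions introduced here for clarity.

```python
import numpy as np

def quadratic_backtracking_step(W, grad, obj, rho=1.0, eta=2.0, max_tries=50):
    # One subproblem solve via an iterative quadratic approximation with
    # backtracking on the curvature parameter rho (illustrative only; the
    # actual dlADMM subproblems and update rules differ in detail).
    g = grad(W)
    f0 = obj(W)
    for _ in range(max_tries):
        W_new = W - g / rho                      # minimizer of the quadratic surrogate
        surrogate = f0 + np.vdot(g, W_new - W) + 0.5 * rho * np.linalg.norm(W_new - W) ** 2
        if obj(W_new) <= surrogate:              # accept once the surrogate upper-bounds the objective
            return W_new
        rho *= eta                               # backtrack: increase the curvature estimate
    return W

def dladmm_sweep(params, grad, obj):
    # One backward-then-forward sweep over the layer parameters, so that
    # information is exchanged in both directions within a single iteration.
    order = list(range(len(params)))
    for l in reversed(order):                    # backward pass: layer L .. 1
        params[l] = quadratic_backtracking_step(params[l],
                                                grad=lambda W, l=l: grad(l, W, params),
                                                obj=lambda W, l=l: obj(l, W, params))
    for l in order:                              # forward pass: layer 1 .. L
        params[l] = quadratic_backtracking_step(params[l],
                                                grad=lambda W, l=l: grad(l, W, params),
                                                obj=lambda W, l=l: obj(l, W, params))
    return params
```

Because each subproblem is handled with a closed-form quadratic surrogate rather than a full matrix inversion, the per-layer cost stays quadratic rather than cubic in the (latent) feature dimension, which is the source of the complexity reduction claimed above.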