As a well-known optimization framework, the Alternating Direction Method of Multipliers (ADMM) has achieved tremendous success in many classification and regression applications. Recently, it has attracted the attention of deep learning researchers and is considered a potential substitute for Gradient Descent (GD). However, as an emerging domain, several challenges remain unsolved, including 1) the lack of global convergence guarantees, 2) slow convergence towards solutions, and 3) cubic time complexity with regard to feature dimensions. In this paper, we propose a novel optimization framework that solves a general neural network training problem via ADMM (dlADMM) to address these challenges simultaneously. Specifically, the parameters in each layer are updated backward and then forward so that parameter information is exchanged efficiently across layers. When dlADMM is applied to specific architectures, the time complexity of the subproblems is reduced from cubic to quadratic via a dedicated algorithm design utilizing quadratic approximations and backtracking techniques. Last but not least, we provide the first proof that an ADMM-type method (dlADMM) converges sublinearly to a critical point under mild conditions. Experiments on seven benchmark datasets demonstrate the convergence, efficiency, and effectiveness of our proposed dlADMM algorithm.
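A minimal sketch, not the authors' dlADMM implementation, of two ideas mentioned above: (i) sweeping layer parameters backward and then forward in each iteration, and (ii) solving each smooth per-layer subproblem with a quadratic (majorization) approximation plus backtracking instead of an exact solve that would require a cubic-cost matrix inversion. The per-layer subproblems are simplified here to least-squares fits between assumed cached layer inputs and targets; names such as `quadratic_backtracking_step` are hypothetical.

```python
import numpy as np

def subproblem_loss(W, A_in, Z_target):
    # Least-squares surrogate: how well W maps the layer input to its target.
    R = A_in @ W - Z_target
    return 0.5 * np.sum(R * R)

def subproblem_grad(W, A_in, Z_target):
    return A_in.T @ (A_in @ W - Z_target)

def quadratic_backtracking_step(W, A_in, Z_target, theta=1.0, eta=2.0):
    # Take a step W_new = W - grad / theta and increase theta until the quadratic
    # upper bound f(W) + <g, d> + (theta/2)||d||^2 majorizes f(W_new).
    f = subproblem_loss(W, A_in, Z_target)
    g = subproblem_grad(W, A_in, Z_target)
    while True:
        W_new = W - g / theta
        d = W_new - W
        bound = f + np.sum(g * d) + 0.5 * theta * np.sum(d * d)
        if subproblem_loss(W_new, A_in, Z_target) <= bound:
            return W_new, theta
        theta *= eta  # backtrack: tighten the quadratic approximation

def backward_forward_sweep(weights, layer_inputs, layer_targets):
    # One outer iteration: update layers L..1 (backward), then 1..L (forward),
    # so information from the output layer reaches early layers and vice versa.
    L = len(weights)
    for l in list(range(L - 1, -1, -1)) + list(range(L)):
        weights[l], _ = quadratic_backtracking_step(weights[l], layer_inputs[l], layer_targets[l])
    return weights

if __name__ == "__main__":
    # Toy usage on random data, purely to show the update ordering.
    rng = np.random.default_rng(0)
    dims = [8, 6, 4]
    weights = [rng.standard_normal((dims[i], dims[i + 1])) for i in range(2)]
    layer_inputs = [rng.standard_normal((32, dims[i])) for i in range(2)]
    layer_targets = [rng.standard_normal((32, dims[i + 1])) for i in range(2)]
    for _ in range(5):
        weights = backward_forward_sweep(weights, layer_inputs, layer_targets)
```

Each backtracking step costs only matrix-vector style products rather than a matrix inversion, which is the intuition behind the cubic-to-quadratic complexity reduction claimed for the subproblems.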