The duality structure gradient descent algorithm: analysis and applications to neural networks

The training of deep neural networks is typically carried out using some form of gradient descent, often with great success. However, existing non-asymptotic analyses of first-order optimization algorithms usually rely on a gradient smoothness assumption that is too strong to apply to deep neural networks. To address this, we propose an algorithm named duality structure gradient descent (DSGD) that is amenable to non-asymptotic performance analysis under mild assumptions on the training set and network architecture. The algorithm can be viewed as a form of layer-wise coordinate descent, where at each iteration the algorithm chooses one layer of the network to update. The layer to update is chosen greedily, based on a rigorous lower bound on the improvement of the objective function for each choice of layer. In the analysis, we bound the time required to reach approximate stationary points, in both the deterministic and stochastic settings. The convergence is measured in terms of a parameter-dependent family of norms that is derived from the network architecture and designed to confirm a smoothness-like property on the gradient of the training loss function. We empirically demonstrate the effectiveness of DSGD in several neural network training scenarios.
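
As a rough illustration of the greedy layer-wise idea described in the abstract, the Python sketch below updates one "layer" per iteration, picking the layer whose guaranteed decrease of the loss is largest. It is not the DSGD algorithm itself. The toy two-layer linear model, all function names, and the per-layer smoothness constants are assumptions made for illustration, and the selection rule is the Gauss-Southwell-Lipschitz rule from the coordinate-descent literature, used here as a stand-in for the paper's lower bound.

```python
# Toy model (assumed for illustration): y_hat = w2 * (w1[0]*x[0] + w1[1]*x[1]).
# "Layer 1" is the vector w1 and "layer 2" is the scalar w2.

def hidden(w1, x):
    """Output of layer 1 for one input x."""
    return w1[0] * x[0] + w1[1] * x[1]

def loss(w1, w2, data):
    """Mean of the halved squared error over the data set."""
    return sum(0.5 * (w2 * hidden(w1, x) - y) ** 2 for x, y in data) / len(data)

def layer_grads(w1, w2, data):
    """Gradient of the loss with respect to each layer."""
    n = len(data)
    g1, g2 = [0.0, 0.0], 0.0
    for x, y in data:
        h = hidden(w1, x)
        r = w2 * h - y              # residual for this sample
        g1[0] += r * w2 * x[0] / n
        g1[1] += r * w2 * x[1] / n
        g2 += r * h / n
    return g1, g2

def layer_smoothness(w1, w2, data):
    """Per-layer curvature bounds; each depends on the *other* layer's weights."""
    n = len(data)
    # Trace of the layer-1 Hessian, an upper bound on its largest eigenvalue.
    L1 = w2 ** 2 * sum(x[0] ** 2 + x[1] ** 2 for x, _ in data) / n
    # Exact second derivative of the loss along w2.
    L2 = sum(hidden(w1, x) ** 2 for x, _ in data) / n
    return L1, L2

def greedy_layerwise_descent(w1, w2, data, steps=200):
    """Update one layer per step, chosen by the largest guaranteed decrease."""
    history = [loss(w1, w2, data)]
    for _ in range(steps):
        g1, g2 = layer_grads(w1, w2, data)
        L1, L2 = layer_smoothness(w1, w2, data)
        # A step of size 1/L on a layer lowers the loss by at least |g|^2 / (2L).
        bound1 = (g1[0] ** 2 + g1[1] ** 2) / (2 * L1) if L1 > 0 else 0.0
        bound2 = g2 ** 2 / (2 * L2) if L2 > 0 else 0.0
        if max(bound1, bound2) == 0.0:
            break                   # no layer offers a guaranteed decrease
        if bound1 >= bound2:
            w1 = [w1[0] - g1[0] / L1, w1[1] - g1[1] / L1]
        else:
            w2 = w2 - g2 / L2
        history.append(loss(w1, w2, data))
    return w1, w2, history

# Hypothetical data generated by y = 2*x[0] - x[1].
data = [((1.0, 0.0), 2.0), ((0.0, 1.0), -1.0), ((1.0, 1.0), 1.0)]
w1, w2, history = greedy_layerwise_descent([0.5, 0.5], 1.0, data)
```

In this sketch the curvature bound for each layer changes with the current weights of the other layer, which loosely mirrors the parameter-dependent norms mentioned in the abstract. The paper's actual norms and lower bound are derived from the network architecture and are not reproduced here.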
