Momentum Stiefel Optimizer, with Applications to Suitably-Orthogonal Attention, and Optimal Transport - Zhuanzhi Papers


Optimizer · Momentum · Orthogonality · Attention · Manifold

March 2, 2023

Momentum Stiefel Optimizer, with Applications to Suitably-Orthogonal Attention, and Optimal Transport


Lingkai Kong, Yuqing Wang, Molei Tao

From arXiv. Code: https://github.com/konglk1203/VariationalStiefelOptimizer

The problem of optimization on the Stiefel manifold, i.e., minimizing functions of (not necessarily square) matrices that satisfy orthogonality constraints, has been extensively studied. Yet, a new approach is proposed based on, for the first time, an interplay between thoughtfully designed continuous and discrete dynamics. It leads to a gradient-based optimizer with intrinsically added momentum. This method exactly preserves the manifold structure but does not require additional operations to keep the momentum in the changing (co)tangent space, and thus has low computational cost and pleasant accuracy. Its generalization to adaptive learning rates is also demonstrated. Notable performance is observed in practical tasks. For instance, we found that placing orthogonality constraints on the attention heads of a trained-from-scratch Vision Transformer [Dosovitskiy et al. 2022] could markedly improve its performance when our optimizer is used, and that it is better for each head to be made orthogonal within itself but not necessarily to other heads. This optimizer also makes the useful notion of Projection Robust Wasserstein Distance [Paty & Cuturi 2019; Lin et al. 2020] for high-dimensional optimal transport even more effective.
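
To make the setting concrete, here is a minimal NumPy sketch of optimization under the Stiefel constraint X^T X = I. It is not the momentum optimizer proposed in the paper: it uses plain Riemannian gradient descent with a QR retraction, a standard baseline, and all function names (`qr_retract`, `tangent_project`, `stiefel_gd`, `random_orthonormal_heads`) are hypothetical, chosen only for illustration. The last helper illustrates the "orthogonal within each head, but not across heads" arrangement mentioned in the abstract, under the assumption that each head is represented by one n x m matrix.

```python
import numpy as np


def qr_retract(Y):
    """Map a full-column-rank n x m matrix onto the Stiefel manifold via thin QR."""
    Q, R = np.linalg.qr(Y)  # reduced QR: Q is n x m with orthonormal columns
    # Flip column signs so that the diagonal of R is non-negative (unique factor).
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    return Q * signs


def tangent_project(X, G):
    """Project a Euclidean gradient G onto the tangent space of the manifold at X."""
    XtG = X.T @ G
    return G - X @ (XtG + XtG.T) / 2.0


def stiefel_gd(A, m, steps=1000, lr=0.05, seed=0):
    """Minimize f(X) = -trace(X^T A X) subject to X^T X = I (illustrative baseline)."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    X = qr_retract(rng.standard_normal((n, m)))
    for _ in range(steps):
        G = -2.0 * A @ X  # Euclidean gradient of f for symmetric A
        # Step along the projected gradient, then retract back onto the manifold.
        X = qr_retract(X - lr * tangent_project(X, G))
    return X


def random_orthonormal_heads(num_heads, n, m, seed=0):
    """Return num_heads matrices of shape (n, m), each with orthonormal columns.

    Each head satisfies X_h^T X_h = I on its own; nothing ties different heads
    to one another, so the concatenation is generally not orthonormal.
    """
    rng = np.random.default_rng(seed)
    return np.stack([qr_retract(rng.standard_normal((n, m))) for _ in range(num_heads)])


if __name__ == "__main__":
    A = np.diag([5.0, 4.0, 3.0, 2.0, 1.0, 0.5])
    X = stiefel_gd(A, m=2)
    print("constraint error:", np.linalg.norm(X.T @ X - np.eye(2)))
    print("objective trace(X^T A X):", np.trace(X.T @ A @ X))
```

For a symmetric matrix A, the minimizer of this toy objective spans the leading m-dimensional eigenspace, which makes the result easy to check. The retraction keeps every iterate exactly on the manifold; the paper's contribution, by contrast, is a momentum scheme that preserves the manifold structure without extra operations to keep the momentum in the (co)tangent space, which this sketch does not attempt to reproduce.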


