Variational inequalities in general and saddle point problems in particular are increasingly relevant in machine learning applications, including adversarial learning, GANs, transport and robust optimization. With the increasing data and problem sizes necessary to train high-performing models across these and other applications, it is necessary to rely on parallel and distributed computing. However, in distributed training, communication among the compute nodes is a key bottleneck, and this problem is exacerbated for high-dimensional and over-parameterized models. Due to these considerations, it is important to equip existing methods with strategies that reduce the volume of transmitted information during training while obtaining a model of comparable quality. In this paper, we present the first theoretically grounded distributed methods for solving variational inequalities and saddle point problems using compressed communication: MASHA1 and MASHA2. Our theory and methods allow for the use of both unbiased (such as Rand$k$; MASHA1) and contractive (such as Top$k$; MASHA2) compressors. We empirically validate our conclusions using two experimental setups: a standard bilinear min-max problem, and large-scale distributed adversarial training of transformers.
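To make the two compressor classes concrete, the following is a minimal NumPy sketch (our own illustration, not the paper's implementation; the names `rand_k` and `top_k` are ours) of the canonical examples mentioned above. Rand$k$ keeps $k$ coordinates chosen uniformly at random and rescales by $d/k$, which makes it unbiased, $\mathbb{E}[C(x)] = x$; Top$k$ keeps the $k$ largest-magnitude coordinates and is biased but contractive, satisfying $\|C(x) - x\|^2 \le (1 - k/d)\|x\|^2$.

```python
import numpy as np

def rand_k(x, k, rng):
    """Rand-k compressor: keep k coordinates uniformly at random,
    rescaled by d/k so that E[C(x)] = x (unbiased)."""
    d = x.size
    out = np.zeros_like(x)
    idx = rng.choice(d, size=k, replace=False)
    out[idx] = x[idx] * (d / k)
    return out

def top_k(x, k):
    """Top-k compressor: keep the k largest-magnitude coordinates.
    Biased, but contractive: ||C(x) - x||^2 <= (1 - k/d) ||x||^2."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out

# Example usage: compress a random vector, transmitting only k of d entries.
rng = np.random.default_rng(0)
x = rng.standard_normal(10)
print(rand_k(x, 3, rng))
print(top_k(x, 3))
```

In a distributed setting, each node would apply such an operator to its locally computed message before sending it over the network, so only $k$ of $d$ entries (plus their indices) are transmitted per round.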