Gradient Balancing (GraB) is a recently proposed technique that finds provably better data permutations when training models with multiple epochs over a finite dataset. It converges at a faster rate than the widely adopted Random Reshuffling by minimizing the discrepancy of the gradients on adjacently selected examples. However, GraB only operates under restrictive assumptions such as small batch sizes and centralized data, leaving open the question of how to order examples at large scale -- i.e., distributed learning with decentralized data. To alleviate this limitation, in this paper we propose D-GraB, which involves two novel designs: (1) $\textsf{PairBalance}$, which eliminates GraB's requirement of a stale gradient mean, a requirement that critically relies on small learning rates; (2) an ordering protocol that runs $\textsf{PairBalance}$ in a distributed environment with negligible overhead, benefiting from both data ordering and parallelism. We prove that D-GraB enjoys a linear speedup at rate $\tilde{O}((mnT)^{-2/3})$ on smooth non-convex objectives and $\tilde{O}((mnT)^{-2})$ under the PL condition, where $n$ denotes the number of parallel workers, $m$ denotes the number of examples per worker, and $T$ denotes the number of epochs. Empirically, we show on various applications including GLUE, CIFAR10 and WikiText-2 that D-GraB outperforms naive parallel GraB and Distributed Random Reshuffling in terms of both training and validation performance.
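To make the pair-balancing idea concrete, here is a minimal sketch of how an ordering could be produced from per-example gradients by balancing the *difference* of each adjacent pair, so that no stale gradient mean is needed. This is an illustrative sketch under assumed conventions (the greedy sign rule and the front/back placement are common choices in herding-style balancing), not the authors' exact $\textsf{PairBalance}$ implementation.

```python
import numpy as np

def pair_balance_order(grads):
    """Illustrative pair-balance ordering (assumed conventions, not the
    paper's exact algorithm). For each adjacent pair of examples, greedily
    pick a sign for the gradient difference that keeps the signed running
    sum small; the sign decides which example of the pair goes to the
    front of the new order and which goes to the back."""
    n = len(grads)
    assert n % 2 == 0, "this sketch assumes an even number of examples"
    run = np.zeros_like(grads[0])
    front, back = [], []
    for k in range(0, n, 2):
        d = grads[k] - grads[k + 1]
        # Greedy herding-style choice: keep the running sum's norm small.
        s = 1 if np.linalg.norm(run + d) <= np.linalg.norm(run - d) else -1
        run += s * d
        first, second = (k, k + 1) if s == 1 else (k + 1, k)
        front.append(first)   # visited early next epoch
        back.append(second)   # visited late next epoch
    return front + back[::-1]

# Tiny usage example with 1-D "gradients".
grads = [np.array([1.0]), np.array([-1.0]),
         np.array([2.0]), np.array([-2.0])]
order = pair_balance_order(grads)  # a permutation of [0, 1, 2, 3]
```

Because the balancing acts on pairwise differences rather than mean-centered gradients, each pair is centered "for free", which is the intuition behind dropping the stale mean.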