Many areas of deep learning benefit from using increasingly larger neural networks trained on public data, as is the case for pre-trained models for NLP and computer vision. Training such models requires a lot of computational resources (e.g., HPC clusters) that are not available to small research groups and independent researchers. One way to address this is for several smaller groups to pool their computational resources together and train a model that benefits all participants. Unfortunately, in this case, any participant can jeopardize the entire training run by sending incorrect updates, whether deliberately or by mistake. Training in the presence of such peers requires specialized distributed training algorithms with Byzantine tolerance. These algorithms often sacrifice efficiency by introducing redundant communication or by passing all updates through a trusted server, making them infeasible for large-scale deep learning, where models can have billions of parameters. In this work, we propose a novel protocol for secure (Byzantine-tolerant) decentralized training that emphasizes communication efficiency.
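To make the threat model concrete, here is a minimal sketch (not the paper's protocol) of why naive gradient averaging breaks under a Byzantine peer, and how a classic robust aggregator such as the coordinate-wise median tolerates it at the cost of collecting every peer's full update. All function names and parameters below are illustrative assumptions, not the authors' API.

```python
# Illustrative sketch only: naive averaging vs. a classic robust
# aggregator (coordinate-wise median). This is NOT the protocol
# proposed in the paper; names here are hypothetical.
import numpy as np

def aggregate_mean(updates):
    """Naive averaging: a single malicious update can shift the
    result arbitrarily far, jeopardizing the whole training run."""
    return np.mean(updates, axis=0)

def aggregate_median(updates):
    """Coordinate-wise median: tolerates a minority of Byzantine
    peers, but requires gathering every peer's full update, an
    overhead that grows with billion-parameter models."""
    return np.median(updates, axis=0)

# Nine honest peers send similar gradients; one peer sends garbage.
rng = np.random.default_rng(0)
honest = rng.normal(loc=1.0, scale=0.1, size=(9, 4))
byzantine = np.full((1, 4), 1e6)  # deliberately incorrect update
updates = np.concatenate([honest, byzantine])

print(aggregate_mean(updates))    # dominated by the malicious update
print(aggregate_median(updates))  # stays close to the honest gradients
```

The gap between these two aggregators illustrates the trade-off the abstract describes: robustness mechanisms like this add communication or trusted-server requirements that do not scale to very large models.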