Modern ML applications increasingly rely on complex deep learning models and large datasets. There has been an exponential growth in the amount of computation needed to train the largest models. Therefore, to scale computation and data, these models are inevitably trained in a distributed manner in clusters of nodes, and their updates are aggregated before being applied to the model. However, a distributed setup is prone to Byzantine failures of individual nodes, components, and software. With data augmentation added to these settings, there is a critical need for robust and efficient aggregation systems. We extend the current state-of-the-art aggregators and propose an optimization-based subspace estimator that models pairwise distances as quadratic functions, building on the recently introduced Flag Median problem. The estimator in our loss function favors pairs that preserve the norm of the difference vector. We show theoretically that our approach enhances the robustness of state-of-the-art Byzantine-resilient aggregators. We also evaluate our method on different tasks in a distributed setup with a parameter server architecture and show its communication efficiency while maintaining comparable accuracy. The code is publicly available at https://github.com/hamidralmasi/FlagAggregator
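To make the subspace-estimation idea concrete, below is a minimal, hypothetical sketch rather than the authors' implementation (the repository linked above contains the real one). It treats each worker gradient as a one-dimensional subspace and computes a flag-median-style robust direction via iteratively reweighted least squares, in the spirit of FlagIRLS for the Flag Median; the helpers `flag_median_direction` and `aggregate`, the specific IRLS weights, and the median-scaled projection at the end are all illustrative assumptions.

```python
import numpy as np

def flag_median_direction(grads, iters=20, eps=1e-8):
    """IRLS sketch of a flag-median-style subspace estimate.

    Treats each worker gradient as a one-dimensional subspace and returns a
    robust "median" direction: the principal eigenvector of a reweighted
    scatter matrix, where the weights shrink the influence of outlying
    (e.g., Byzantine) gradients. This is an illustrative simplification,
    not the Flag Aggregator's actual optimization.
    """
    U = np.stack([g / (np.linalg.norm(g) + eps) for g in grads])  # unit directions
    y = U.mean(axis=0)
    y /= np.linalg.norm(y) + eps
    for _ in range(iters):
        # Sine distance between span(u_i) and span(y); IRLS weight is 1/distance,
        # so gradients far from the current subspace are downweighted.
        cos = U @ y
        d = np.sqrt(np.maximum(1.0 - cos**2, eps))
        W = (U * (1.0 / d)[:, None]).T @ U  # weighted scatter: sum_i w_i u_i u_i^T
        _, vecs = np.linalg.eigh(W)
        y = vecs[:, -1]  # principal eigenvector (eigh sorts ascending)
    return y

def aggregate(grads):
    """Project worker gradients onto the robust direction.

    Uses the median of the projection coefficients as a robust scale, so the
    result is invariant to the eigenvector's sign ambiguity.
    """
    y = flag_median_direction(grads)
    coeffs = np.array([g @ y for g in grads])
    return np.median(coeffs) * y

if __name__ == "__main__":
    # Toy check: 8 honest workers near a shared gradient, 2 Byzantine outliers.
    rng = np.random.default_rng(0)
    true_grad = rng.normal(size=64)
    honest = [true_grad + 0.1 * rng.normal(size=64) for _ in range(8)]
    byzantine = [100.0 * rng.normal(size=64) for _ in range(2)]
    agg = aggregate(honest + byzantine)
    print(np.linalg.norm(agg - true_grad))  # small despite the outliers
```

In this toy setting the IRLS weights suppress the Byzantine vectors because their one-dimensional subspaces sit far from the honest cluster; the paper's estimator additionally works with pairwise distances modeled as quadratics and favors pairs preserving the norm of the difference vector, which this rank-1 sketch does not capture.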