Driven by the tremendous effort in researching novel deep learning (DL) algorithms, the training cost of developing new models has increased staggeringly in recent years. To reduce this training cost and optimize cluster-wide hardware resource usage, we analyze GPU cluster usage statistics from a well-known research institute. Our study reveals that single-accelerator training jobs can dominate cluster-wide resource consumption when launched repetitively (e.g., for hyper-parameter tuning) while severely underutilizing the hardware. This is because DL researchers and practitioners often lack the expertise required to independently optimize their own workloads. Fortunately, we observe that such workloads have the following unique characteristics: (i) the models across jobs often contain operators of the same types and shapes, and (ii) the inter-model horizontal fusion of such operators is mathematically equivalent to other already well-optimized operators. Thus, to help DL researchers and practitioners effectively and easily improve the hardware utilization of their novel DL training workloads, we propose Horizontally Fused Training Array (HFTA). HFTA is a new DL framework extension library that horizontally fuses the models from different repetitive jobs deeply down to the operator level, and then trains those models simultaneously on a shared accelerator. On three emerging DL training workloads and state-of-the-art accelerators (GPUs and TPUs), HFTA demonstrates strong effectiveness in squeezing out hardware utilization and achieves up to $15.1 \times$ higher training throughput than the standard practice of running each job on a separate accelerator.
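To make characteristic (ii) concrete, the following is a minimal sketch (plain PyTorch with hypothetical sizes, not HFTA's actual API) of the idea behind inter-model horizontal fusion: applying K structurally identical Linear operators from K independent jobs is mathematically equivalent to a single batched matrix multiplication (torch.baddbmm), a kernel that is already well optimized on modern accelerators.

```python
import torch

# K independent "jobs" (e.g., hyper-parameter tuning trials) that each train
# a Linear(in_features, out_features) operator with different weights.
K, B, in_features, out_features = 4, 32, 64, 128

weights = [torch.randn(out_features, in_features) for _ in range(K)]
biases = [torch.randn(out_features) for _ in range(K)]
inputs = [torch.randn(B, in_features) for _ in range(K)]

# Standard practice: K separate (often small, underutilizing) matmul kernels.
separate = [x @ w.t() + b for x, w, b in zip(inputs, weights, biases)]

# Horizontally fused: one batched matmul over the stacked weights and inputs.
W = torch.stack(weights)                    # (K, out_features, in_features)
bias = torch.stack(biases).unsqueeze(1)     # (K, 1, out_features)
X = torch.stack(inputs)                     # (K, B, in_features)
fused = torch.baddbmm(bias, X, W.transpose(1, 2))  # (K, B, out_features)

# The fused result matches the per-job results up to floating-point tolerance.
for k in range(K):
    assert torch.allclose(separate[k], fused[k], atol=1e-5)
```

The same reasoning extends to other common operators; for example, several convolutions of identical shape can be expressed as one grouped convolution, so the fused array of models can be trained with far fewer, larger kernel launches on a shared accelerator.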