Deep learning (DL) workflows demand an ever-increasing budget of compute and energy to achieve outsized gains. Neural architecture search, hyperparameter sweeps, and rapid prototyping consume immense resources that can prevent resource-constrained researchers from experimenting with large models and carry considerable environmental impact. It is therefore essential to understand how different deep neural networks (DNNs) and training workflows leverage increasing compute and energy resources -- especially specialized, computationally intensive models across different domains and applications. In this paper, we conduct over 3,400 experiments training an array of deep networks representing various domains/tasks -- natural language processing, computer vision, and chemistry -- on up to 424 graphics processing units (GPUs). During training, our experiments systematically vary compute resource characteristics and energy-saving mechanisms such as power utilization and GPU clock rate limits to capture and illustrate the trade-offs and scaling behaviors each representative model exhibits under various resource- and energy-constrained regimes. We fit power law models that describe how training time scales with available compute resources and energy constraints. We anticipate that these findings will help inform and guide high-performance computing providers in optimizing resource utilization, by selectively reducing energy consumption for different deep learning tasks/workflows with minimal impact on training.
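As a rough illustration of the kind of fit described above, the sketch below fits a power law of the form T(n) = a·n^(-b) + c to training-time measurements as a function of GPU count using SciPy. The functional form, variable names, and data points are assumptions for illustration only, not the paper's actual model or results.

```python
# Minimal sketch (assumed form): fit training time T as a power law of GPU count n,
#   T(n) = a * n**(-b) + c
# The data points below are placeholders, not measurements from the paper.
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b, c):
    # Training time as a function of the number of GPUs.
    return a * np.power(n, -b) + c

# Hypothetical measurements: (GPU count, wall-clock training time in hours).
gpus = np.array([8, 16, 32, 64, 128, 256, 424], dtype=float)
hours = np.array([40.0, 21.5, 11.8, 6.9, 4.2, 2.9, 2.4])

# Fit the power law; the exponent b summarizes how efficiently the workload scales.
(a, b, c), _ = curve_fit(power_law, gpus, hours, p0=(300.0, 1.0, 1.0))
print(f"fitted exponent b = {b:.2f}  (b close to 1 would indicate near-linear scaling)")
```

An analogous fit could be made against an energy-constraint variable (e.g., a power cap) instead of GPU count; only the independent variable in the call would change.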