Training Deep Neural Networks (DNNs) is resource-intensive and time-consuming. While prior research has explored many ways of reducing DNN training time, the impact of the input data pipeline, i.e., fetching raw data items from storage and pre-processing them in memory, has remained relatively unexplored. This paper makes the following contributions: (1) We present the first comprehensive analysis of how the input data pipeline affects the training time of widely used computer vision and audio DNNs, which typically involve complex data preprocessing. We analyze nine different models across three tasks and four datasets while varying factors such as the amount of memory, the number of CPU threads, the storage device, and the GPU generation, on servers that are part of a large production cluster at Microsoft. We find that in many cases, DNN training time is dominated by data stall time: time spent waiting for data to be fetched and preprocessed. (2) We build a tool, DS-Analyzer, to precisely measure data stalls using a differential technique and to perform predictive what-if analysis on data stalls. (3) Finally, based on the insights from our analysis, we design and implement three simple but effective techniques in a data-loading library, CoorDL, to mitigate data stalls. Our experiments on a range of DNN tasks, models, datasets, and hardware configurations show that when PyTorch uses CoorDL instead of the state-of-the-art DALI data-loading library, DNN training time is reduced significantly (by as much as 5x on a single server).
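To make the differential idea concrete, the sketch below times one training epoch with the real input pipeline and then with pre-collated batches replayed from host memory; since the second run bypasses storage fetch and CPU preprocessing entirely, the difference approximates the data stall time. This is a minimal illustration of the differential technique in PyTorch, not the paper's DS-Analyzer implementation; the names `measure_epoch`, `InMemoryLoader`, and the usage variables are hypothetical.

```python
# Illustrative sketch of differential data-stall measurement (not DS-Analyzer itself).
import itertools
import time

import torch
import torch.nn.functional as F


def measure_epoch(model, loader, optimizer, device="cuda"):
    """Train for one epoch over `loader` and return elapsed wall-clock seconds."""
    model.train()
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.time()
    for inputs, targets in loader:
        inputs = inputs.to(device, non_blocking=True)
        targets = targets.to(device, non_blocking=True)
        optimizer.zero_grad()
        loss = F.cross_entropy(model(inputs), targets)
        loss.backward()
        optimizer.step()
    if device == "cuda":
        torch.cuda.synchronize()
    return time.time() - start


class InMemoryLoader:
    """Replays pre-collated batches held in host memory, bypassing both
    storage fetch and CPU preprocessing (the two sources of data stalls)."""

    def __init__(self, batches, epoch_len):
        self.batches = batches
        self.epoch_len = epoch_len  # replay enough batches to mimic a full epoch

    def __iter__(self):
        return iter(itertools.islice(itertools.cycle(self.batches), self.epoch_len))


# Hypothetical usage: cache a handful of real batches once, then compare.
# t_full    = measure_epoch(model, real_loader, optimizer)
# t_nostall = measure_epoch(model, InMemoryLoader(cached_batches, len(real_loader)), optimizer)
# stall_fraction = max(0.0, (t_full - t_nostall) / t_full)
```

A large `stall_fraction` indicates that the GPU is idle waiting on the input pipeline rather than bound by compute, which is the condition the paper's CoorDL techniques are designed to mitigate.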