The training process of deep neural networks (DNNs) is usually pipelined, with stages for data preparation on CPUs followed by gradient computation on accelerators such as GPUs. In an ideal pipeline, the end-to-end training throughput is ultimately limited by the throughput of the accelerator, not by that of data preparation. In the past, the DNN training pipeline achieved near-optimal throughput by utilizing datasets encoded in a lightweight, lossy image format such as JPEG. However, as high-resolution, losslessly encoded datasets become more popular for applications requiring high accuracy, a performance problem arises in the data preparation stage due to low-throughput image decoding on the CPU. Thus, we propose L3, a custom lightweight, lossless image format for high-resolution, high-throughput DNN training. The decoding process of L3 is effectively parallelized on the accelerator, minimizing CPU intervention for data preparation during DNN training. L3 achieves 9.29x higher data preparation throughput than PNG, the most popular lossless image format, for the Cityscapes dataset on an NVIDIA A100 GPU, which leads to 1.71x higher end-to-end training throughput. Compared to JPEG and WebP, two popular lossy image formats, L3 provides up to 1.77x and 2.87x higher end-to-end training throughput for ImageNet, respectively, at equivalent metric performance.
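The bottleneck argument above can be sketched with a minimal steady-state model of a two-stage training pipeline. This is an illustrative sketch, not the paper's method: the function, its parameters, and the example latencies are all hypothetical, chosen only to show how slow CPU-side decoding caps end-to-end throughput even when data preparation overlaps with GPU compute.

```python
def pipeline_throughput(prep_s, grad_s, cpu_workers=1):
    """Steady-state throughput (batches/s) of a two-stage training pipeline.

    Each batch needs `prep_s` seconds of CPU data preparation and `grad_s`
    seconds of accelerator gradient computation. The stages overlap, and
    preparation is spread across `cpu_workers` parallel CPU workers, so the
    slower stage sets the end-to-end rate.
    """
    return 1.0 / max(prep_s / cpu_workers, grad_s)

# Hypothetical per-batch latencies: slow lossless decoding (PNG-like) makes
# the CPU the bottleneck; fast decoding restores the accelerator-bound ideal.
cpu_bound = pipeline_throughput(prep_s=0.40, grad_s=0.05, cpu_workers=4)
gpu_bound = pipeline_throughput(prep_s=0.02, grad_s=0.05, cpu_workers=4)
```

With these numbers the CPU-bound pipeline sustains only 10 batches/s against an accelerator capable of 20 batches/s, which is the gap that moving decoding onto the accelerator is meant to close.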