Training wide and deep neural networks (DNNs) requires large amounts of storage resources such as memory, because the intermediate activation data must be saved in memory during forward propagation and then restored for backward propagation. However, state-of-the-art accelerators such as GPUs are equipped with only very limited memory capacities due to hardware design constraints, which significantly limits the maximum batch size and hence the performance speedup when training large-scale DNNs. Traditional memory-saving techniques either suffer from performance overhead or are constrained by limited interconnect bandwidth or specific interconnect technologies. In this paper, we propose a novel memory-efficient CNN training framework (called COMET) that leverages error-bounded lossy compression to significantly reduce the memory requirement for training, in order to allow training larger models or to accelerate training. Different from state-of-the-art solutions that adopt image-based lossy compressors (such as JPEG) to compress the activation data, our framework purposely adopts error-bounded lossy compression with a strict error-controlling mechanism. Specifically, we perform a theoretical analysis of how the compression error propagates from the altered activation data to the gradients, and empirically investigate the impact of the altered gradients on the training process. Based on these analyses, we optimize the error-bounded lossy compression and propose an adaptive error-bound control scheme for activation data compression. We evaluate our design against state-of-the-art solutions using five widely adopted CNNs and the ImageNet dataset. Experiments demonstrate that our proposed framework can significantly reduce the training memory consumption by up to 13.5X over the baseline training and by up to 1.8X over another state-of-the-art compression-based framework, with little or no accuracy loss.
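To make the core idea concrete, below is a minimal sketch (not the authors' implementation) of the activation-compression pattern the abstract describes: the forward pass keeps only a lossily compressed copy of the activation, and the backward pass decompresses it to compute gradients. It assumes PyTorch; the compress/decompress helpers, the CompressedReLU wrapper, and the fixed error_bound value are illustrative stand-ins, with a simple uniform quantizer standing in for a real error-bounded lossy compressor and for COMET's adaptive error-bound control.

```python
# Sketch only: store a compressed activation for backward instead of the exact tensor.
# A uniform scalar quantizer plays the role of an error-bounded lossy compressor;
# it guarantees |x - decompress(compress(x))| <= error_bound.
import torch


def compress(x: torch.Tensor, error_bound: float) -> torch.Tensor:
    # Quantize to integer bins of width 2 * error_bound (illustrative, not COMET's compressor).
    return torch.round(x / (2.0 * error_bound)).to(torch.int16)


def decompress(q: torch.Tensor, error_bound: float) -> torch.Tensor:
    # Reconstruct an approximation within the requested absolute error bound.
    return q.to(torch.float32) * (2.0 * error_bound)


class CompressedReLU(torch.autograd.Function):
    """ReLU that saves only a compressed activation for the backward pass."""

    @staticmethod
    def forward(ctx, x, error_bound: float = 1e-2):
        ctx.error_bound = error_bound
        y = torch.relu(x)
        # Save the compressed activation; the exact tensor is not kept in memory.
        ctx.save_for_backward(compress(y, error_bound))
        return y

    @staticmethod
    def backward(ctx, grad_out):
        (q,) = ctx.saved_tensors
        y_approx = decompress(q, ctx.error_bound)
        # The gradient mask is computed from the perturbed activation, so the
        # compression error can alter gradients near zero -- the effect the
        # paper's error-propagation analysis and adaptive error bound address.
        grad_in = grad_out * (y_approx > 0).to(grad_out.dtype)
        return grad_in, None


if __name__ == "__main__":
    x = torch.randn(4, 8, requires_grad=True)
    y = CompressedReLU.apply(x, 1e-2)
    y.sum().backward()
    print(x.grad.shape)
```

In a full framework, the quantizer would be replaced by a high-ratio error-bounded lossy compressor and the per-layer error bound would be chosen adaptively from the analysis of how activation error propagates into the gradients.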