Deep learning training is an expensive process that relies heavily on GPUs, but not every training workload saturates a modern, powerful GPU. Multi-Instance GPU (MIG) is a new technology introduced by NVIDIA that partitions a GPU to better fit workloads that do not require all the memory and compute resources of a full GPU. In this paper, we examine the performance of a MIG-enabled A100 GPU under deep learning workloads of three sizes, focusing on image recognition training with ResNet models. We investigate the behavior of these workloads when run in isolation on the variety of MIG instances the GPU allows, and when run in parallel on homogeneous instances co-located on the same GPU. Our results demonstrate that MIG can significantly improve GPU utilization when the workload is too small to utilize the whole GPU in isolation. By training multiple small models in parallel, the GPU performs more work per unit of time despite the increase in time-per-epoch, leading to $\sim$3 times the throughput. In contrast, for medium- and large-sized workloads, which already utilize the whole GPU well on their own, MIG provides only marginal performance improvements. Nevertheless, we observe that training models in parallel on separate MIG partitions does not exhibit interference, underlining the value of having functionality like MIG on modern GPUs.
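The trade-off between a longer time-per-epoch and higher aggregate throughput can be illustrated with a short calculation. The sketch below is not the paper's measurement code: the function name `aggregate_speedup` and the epoch times in the usage example are hypothetical placeholders, chosen only to show how a throughput gain can coexist with slower individual epochs. The only hardware fact assumed is that an A100 can be split into up to seven MIG instances.

```python
def aggregate_speedup(num_instances: int,
                      epoch_time_full_gpu: float,
                      epoch_time_per_instance: float) -> float:
    """Ratio of aggregate throughput (epochs per unit time) when training
    `num_instances` models in parallel, one per MIG instance, versus
    training a single model on the full, unpartitioned GPU.

    Hypothetical helper for illustration only; epoch times may be in any
    unit as long as both use the same one.
    """
    if num_instances < 1:
        raise ValueError("num_instances must be at least 1")
    if epoch_time_full_gpu <= 0 or epoch_time_per_instance <= 0:
        raise ValueError("epoch times must be positive")
    # Full GPU: 1 / epoch_time_full_gpu epochs per unit time.
    # MIG:      num_instances / epoch_time_per_instance epochs per unit time.
    return num_instances * epoch_time_full_gpu / epoch_time_per_instance


# Illustrative (made-up) numbers: seven instances, each taking 140 s per
# epoch, compared with 60 s per epoch for one model on the full GPU.
speedup = aggregate_speedup(7, 60.0, 140.0)
print(f"aggregate throughput gain: {speedup:.2f}x")
```

With these placeholder values each epoch is more than twice as slow on a small instance, yet seven models progress at once, so the GPU as a whole completes more epochs per unit of time.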