There has been significant progress in developing neural network architectures that achieve both high predictive performance and high application-level inference throughput (e.g., frames per second). Another metric of increasing importance is GPU utilization during inference: a measure of how well a deployed neural network uses the computational capabilities of the GPU it runs on. Achieving high GPU utilization is critical to increasing application-level throughput and ensuring a good return on investment for deploying GPUs. This paper analyzes the GPU utilization of convolutional neural network (CNN) inference. We first survey the GPU utilization of CNNs and show that many of them leave room for improvement. We then investigate the GPU utilization of networks within a neural architecture search (NAS) search space, and explore how GPU utilization could be used as a metric to accelerate NAS itself. Our study makes the case that there is room to improve the inference-time GPU utilization of CNNs, and that knowledge of GPU utilization can benefit even applications that do not target utilization itself. We hope that the results of this study will spur future innovation in designing GPU-efficient neural networks.
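One way to make the notion of GPU utilization concrete is as achieved throughput relative to the GPU's peak capability. The sketch below is a toy illustration of that ratio; the function name and all numbers are hypothetical and not taken from the paper's measurements.

```python
# Toy illustration: GPU utilization as achieved vs. peak compute throughput.
# All numbers are hypothetical, for illustration only.

def gpu_utilization(achieved_tflops: float, peak_tflops: float) -> float:
    """Fraction of the GPU's peak compute the workload actually uses."""
    return achieved_tflops / peak_tflops

# Hypothetical CNN inference sustaining 4.1 TFLOP/s on a GPU
# whose peak is 19.5 TFLOP/s.
util = gpu_utilization(4.1, 19.5)
print(f"{util:.0%}")  # prints "21%" -- roughly a fifth of peak compute
```

In practice, utilization during inference is sampled from the running device (e.g., via NVML counters) rather than computed from nominal FLOP counts, but the ratio above captures the intuition behind the metric.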