The increased memory and processing capabilities of today's edge devices create opportunities for greater edge intelligence. In the domain of vision, the ability to adapt a Convolutional Neural Network's (CNN's) structure and parameters to the input data distribution leads to systems with lower memory footprint, latency and power consumption. However, because edge devices have limited compute resources and a tight memory budget, the system must be able to predict the latency and memory footprint of the training process. Only then can it identify favourable training configurations for a given combination of network topology and device, and so adapt the network efficiently. This work proposes perf4sight, an automated methodology for developing accurate models that predict CNN training memory footprint and latency for a given target device and network. This enables rapid identification of network topologies that can be retrained on the edge device with low resource consumption. With PyTorch as the framework and the NVIDIA Jetson TX2 as the target device, the developed models predict training memory footprint and latency with 95% and 91% accuracy, respectively, across a wide range of networks, opening the path towards efficient network adaptation on edge GPUs.