ApproxTrain: Fast Simulation of Approximate Multipliers for DNN Training and Inference

Edge training of Deep Neural Networks (DNNs) is a desirable goal for continuous learning; however, it is hindered by the enormous computational power that training requires. Hardware approximate multipliers have proven effective at improving resource efficiency in DNN inference accelerators; however, training with approximate multipliers is largely unexplored. Building resource-efficient accelerators with approximate multipliers that support DNN training requires a thorough evaluation of training convergence and accuracy across different DNN architectures and different approximate multipliers. This paper presents ApproxTrain, an open-source framework that allows fast evaluation of DNN training and inference using simulated approximate multipliers. ApproxTrain is as user-friendly as TensorFlow (TF) and requires only a high-level description of a DNN architecture along with C/C++ functional models of the approximate multiplier. We speed up simulation at the multiplier level with AMSim, a novel lookup-table (LUT)-based approximate floating-point (FP) multiplier simulator that runs on the GPU. ApproxTrain leverages CUDA and efficiently integrates AMSim into the TensorFlow library to overcome the absence of native hardware approximate multipliers in commercial GPUs. We use ApproxTrain to evaluate the convergence and accuracy of DNN training with approximate multipliers on small and large datasets (including ImageNet) using LeNet and ResNet architectures. The evaluations show similar convergence behavior and negligible change in test accuracy compared to FP32 and bfloat16 multipliers. In training and inference, the GPU-accelerated ApproxTrain is more than 2500x faster than CPU-based approximate multiplier simulation. The original TensorFlow, which relies on highly optimized closed-source cuDNN/cuBLAS libraries and native hardware multipliers, is only 8x faster than ApproxTrain.
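
The general idea behind a LUT-based approximate FP multiplier simulator can be illustrated on the CPU. The following is a minimal sketch, not ApproxTrain's or AMSim's actual code. A C/C++ functional model of the mantissa multiplier is evaluated once for every pair of truncated mantissas, and each simulated FP32 multiply then reduces to a sign XOR, an exponent addition, and one table lookup. The names `mitchell_model`, `build_lut`, and `approx_mul`, the 7-bit mantissa width, the table encoding, and the use of Mitchell's logarithmic multiplier as the example model are all assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical width: number of leading FP32 mantissa bits the simulated
// multiplier consumes (7 matches the bfloat16 mantissa width).
constexpr int kMantBits = 7;
constexpr uint32_t kMantSize = 1u << kMantBits;

// A mantissa model maps two kMantBits-bit fractions (implicit leading 1
// omitted) to an entry encoded as (carry << kMantBits) | fraction, where
// carry = 1 means the product of the significands is >= 2.
using MantissaModel = uint32_t (*)(uint32_t, uint32_t);

// Illustrative functional model (assumption): Mitchell's logarithmic
// approximation, (1 + a)(1 + b) ~= 1 + a + b if a + b < 1, else 2 * (a + b).
uint32_t mitchell_model(uint32_t ma, uint32_t mb) {
    const uint32_t sum = ma + mb;                   // fixed-point a + b
    if (sum < kMantSize) return sum;                // carry = 0, fraction = a + b
    return (1u << kMantBits) | (sum - kMantSize);   // carry = 1, fraction = a + b - 1
}

// Evaluate the model once for every mantissa pair (kMantSize^2 entries), so
// the multiply path only needs a lookup, never a call into the model.
std::vector<uint16_t> build_lut(MantissaModel model = mitchell_model) {
    std::vector<uint16_t> lut(kMantSize * kMantSize);
    for (uint32_t a = 0; a < kMantSize; ++a)
        for (uint32_t b = 0; b < kMantSize; ++b)
            lut[a * kMantSize + b] = static_cast<uint16_t>(model(a, b));
    return lut;
}

static uint32_t to_bits(float f) { uint32_t u; std::memcpy(&u, &f, sizeof u); return u; }
static float from_bits(uint32_t u) { float f; std::memcpy(&f, &u, sizeof f); return f; }

// Simulated approximate FP32 multiply. Simplifications (assumptions):
// mantissas are truncated, not rounded; zeros and subnormals flush to a
// signed zero; Inf/NaN inputs fall back to the exact hardware product.
float approx_mul(float x, float y, const std::vector<uint16_t>& lut) {
    assert(lut.size() == kMantSize * kMantSize);
    const uint32_t bx = to_bits(x), by = to_bits(y);
    const uint32_t sign = (bx ^ by) & 0x80000000u;          // sign by XOR
    const int ex = static_cast<int>((bx >> 23) & 0xFFu);
    const int ey = static_cast<int>((by >> 23) & 0xFFu);
    if (ex == 0xFF || ey == 0xFF) return x * y;              // Inf/NaN: not modeled
    if (ex == 0 || ey == 0) return from_bits(sign);          // zero/subnormal -> +-0

    // Top kMantBits of each mantissa index the table.
    const uint32_t ma = (bx >> (23 - kMantBits)) & (kMantSize - 1);
    const uint32_t mb = (by >> (23 - kMantBits)) & (kMantSize - 1);
    const uint32_t entry = lut[ma * kMantSize + mb];
    const uint32_t carry = entry >> kMantBits;
    const uint32_t frac = entry & (kMantSize - 1);

    const int e = ex + ey - 127 + static_cast<int>(carry);  // add exponents, re-bias
    if (e <= 0) return from_bits(sign);                      // underflow -> +-0
    if (e >= 0xFF) return from_bits(sign | 0x7F800000u);     // overflow -> +-Inf
    return from_bits(sign | (static_cast<uint32_t>(e) << 23) |
                     (frac << (23 - kMantBits)));
}
```

Under these assumptions, `approx_mul(3.0f, 3.0f, build_lut())` returns 8.0f rather than 9.0f, which is Mitchell's characteristic underestimate, while products whose mantissa fractions sum to less than one (for example 2.0f times 4.0f) stay exact. Swapping in a different approximate multiplier only means passing another `MantissaModel` to `build_lut`; the multiply path itself is unchanged.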
