Deep neural networks (DNNs) have shown superior performance in a variety of tasks. As they rapidly evolve, their escalating computation and memory demands make it challenging to deploy them on resource-constrained edge devices. Though extensive efficient accelerator designs, from traditional electronics to emerging photonics, have been successfully demonstrated, they are still bottlenecked by expensive memory accesses due to tremendous gaps between the bandwidth/power/latency of electrical memory and computing cores. Previous solutions fail to fully leverage the ultra-fast computational speed of emerging DNN accelerators to break through the critical memory bound. In this work, we propose a general and unified framework to trade expensive memory transactions for ultra-fast on-chip computations, directly translating to performance improvement. We are the first to jointly explore the intrinsic correlations and bit-level redundancy within DNN kernels and propose a multi-level in situ generation mechanism with mixed-precision bases to achieve on-the-fly recovery of high-resolution parameters with minimal hardware overhead. Extensive experiments demonstrate that our proposed joint method can boost memory efficiency by 10-20x with comparable accuracy over four state-of-the-art designs, when benchmarked on ResNet-18/DenseNet-121/MobileNetV2/V3 on various tasks.
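To make the core idea concrete, below is a minimal sketch (not the paper's implementation) of what "in situ generation from mixed-precision bases" could look like: small low-rank bases are stored at different bit-widths, and the high-resolution kernel is regenerated on the fly by cheap on-chip compute instead of being fetched from memory. All shapes, ranks, and bit-widths here are illustrative assumptions.

```python
# Sketch: regenerate a full-resolution weight kernel from small,
# mixed-precision bases, trading on-chip compute for memory traffic.
import numpy as np

def quantize(x, bits):
    """Uniform symmetric quantization to the given bit-width."""
    scale = np.abs(x).max() / (2 ** (bits - 1) - 1)
    return np.round(x / scale).astype(np.int32), scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
out_ch, in_ch, rank = 64, 64, 8                    # illustrative kernel shape and rank

# Model the intrinsic correlations within a DNN kernel with a low-rank matrix.
W = rng.standard_normal((out_ch, rank)) @ rng.standard_normal((rank, in_ch))
W = W.astype(np.float32)

# Low-rank decomposition: two small bases capture the kernel's structure.
U, S, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :rank] * S[:rank]                          # out_ch x rank basis
B = Vt[:rank, :]                                    # rank x in_ch basis

# Mixed-precision storage: keep one basis at higher precision than the other.
A_q, A_s = quantize(A, bits=8)
B_q, B_s = quantize(B, bits=4)

# "In situ generation": rebuild the high-resolution kernel on-chip
# from the small quantized bases instead of reading W from memory.
W_gen = dequantize(A_q, A_s) @ dequantize(B_q, B_s)

full_bits = W.size * 32
stored_bits = A_q.size * 8 + B_q.size * 4
print(f"memory reduction: {full_bits / stored_bits:.1f}x")
print(f"relative error:   {np.linalg.norm(W - W_gen) / np.linalg.norm(W):.4f}")
```

Under these assumed settings the stored bases are roughly 20x smaller than the dense 32-bit kernel, with only quantization error in the regenerated weights; the actual trade-off in the paper depends on its specific multi-level generation scheme and accelerator.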