The record-breaking performance of deep neural networks (DNNs) comes with heavy parameterization, which requires external dynamic random-access memory (DRAM) for storage. The prohibitive energy cost of DRAM accesses makes it non-trivial to deploy DNNs on resource-constrained devices, calling for minimizing weight and data movement to improve energy efficiency. We present SmartDeal (SD), an algorithm framework that trades higher-cost memory storage/access for lower-cost computation in order to aggressively boost storage and energy efficiency for both inference and training. The core of SD is a novel weight decomposition with structural constraints, carefully crafted to unleash the hardware efficiency potential. Specifically, we decompose each weight tensor as the product of a small basis matrix and a large, structurally sparse coefficient matrix whose non-zeros are quantized to powers of 2. The resulting sparse and quantized DNNs require far less energy for data movement and weight storage, and recovering the original weights incurs minimal overhead thanks to the sparse bit-operations and cost-favorable computations. Beyond inference, we take another leap to embrace energy-efficient training, introducing innovative techniques that address the unique roadblocks arising in training while preserving the SD structures. We also design a dedicated hardware accelerator that fully exploits the SD structure to improve real energy efficiency and latency. We conduct experiments on multiple tasks, models, and datasets in different settings. Results show that: 1) applied to inference, SD achieves up to a 2.44x improvement in energy efficiency, as evaluated via real hardware implementations; 2) applied to training, SD leads to 10.56x and 4.48x reductions in storage and training energy, respectively, with negligible accuracy loss compared to state-of-the-art training baselines. Our source code is available online.
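To make the decomposition concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes row-wise pruning as the structured sparsity pattern and log-domain rounding as the power-of-2 quantizer; the function names, the exponent range, and the toy matrices are all hypothetical choices made for illustration. It shows only how the weights are rebuilt from a small basis matrix and a sparse, power-of-2 coefficient matrix, not how the two factors are fitted.

```python
import math


def quantize_pow2(x, min_exp=-8, max_exp=0):
    """Round x to a signed power of two (nearest in the log domain); 0 stays 0.

    The exponent range [min_exp, max_exp] is an illustrative assumption.
    """
    if x == 0.0:
        return 0.0
    exp = round(math.log2(abs(x)))
    exp = max(min_exp, min(max_exp, exp))
    return math.copysign(2.0 ** exp, x)


def sparsify_rows(coeff, keep_rows):
    """Structured sparsity (assumed row-wise): zero the rows with the smallest L1 norm."""
    norms = [sum(abs(v) for v in row) for row in coeff]
    order = sorted(range(len(coeff)), key=lambda i: norms[i], reverse=True)
    kept = set(order[:keep_rows])
    return [list(row) if i in kept else [0.0] * len(row)
            for i, row in enumerate(coeff)]


def project_coefficients(coeff, keep_rows):
    """Apply both constraints: row sparsity, then power-of-2 quantization."""
    sparse = sparsify_rows(coeff, keep_rows)
    return [[quantize_pow2(v) for v in row] for row in sparse]


def rebuild_weights(coeff, basis):
    """Recover W = coeff @ basis, skipping zero coefficients.

    Because every non-zero coefficient is a power of two, each product
    could be realized as a bit shift in fixed-point hardware.
    """
    n_cols = len(basis[0])
    weights = []
    for row in coeff:
        out = [0.0] * n_cols
        for k, c in enumerate(row):
            if c == 0.0:
                continue  # sparse: nothing to fetch or compute
            for j in range(n_cols):
                out[j] += c * basis[k][j]
        weights.append(out)
    return weights


# Toy example: a 4x2 coefficient matrix and a 2x2 basis matrix.
raw_coeff = [[0.9, -0.26], [0.01, 0.02], [0.5, 0.13], [-0.24, 1.1]]
basis = [[2.0, 4.0], [8.0, -16.0]]

coeff = project_coefficients(raw_coeff, keep_rows=3)
weights = rebuild_weights(coeff, basis)
print(coeff)
print(weights)
```

Only the small basis matrix and the non-zero exponents of the coefficient matrix would need to be stored and moved in such a scheme; the full weight matrix is recomputed on demand, which is the storage-for-computation trade the abstract describes.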