As the machine learning and systems communities strive for higher energy efficiency through custom deep neural network (DNN) accelerators and varied bit-precision or quantization levels, there is a need for design space exploration frameworks that incorporate quantization-aware processing elements (PEs) into the accelerator design space while providing fast and accurate power, performance, and area models. In this work, we present QADAM, a highly parameterized quantization-aware power, performance, and area modeling framework for DNN accelerators. Our framework facilitates future research on design space exploration and Pareto-efficiency of DNN accelerators for various design choices such as bit precision, PE type, PE scratchpad sizes, global buffer size, total number of PEs, and DNN configuration. Our results show that different bit precisions and PE types lead to significant differences in performance per area and energy. Specifically, our framework identifies a wide range of design points where performance per area and energy vary by more than 5x and 35x, respectively. We also show that the proposed lightweight processing elements (LightPEs) consistently achieve Pareto-optimal results in terms of accuracy and hardware efficiency. With the proposed framework, we show that LightPEs achieve on-par accuracy and up to 5.7x improvement in performance per area and energy compared to the best INT16-based design.
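To make the design-space sweep concrete, below is a minimal Python sketch of the kind of exploration the abstract describes: enumerate (bit precision, PE type, PE count) configurations, score each with a power/performance/area model, and keep the Pareto front over performance per area and energy. The axis values, the evaluate() cost model, and all names here are illustrative assumptions, not QADAM's actual API or models.

from itertools import product

# Hypothetical sweep axes; the values are illustrative, not from the paper.
BIT_PRECISIONS = [4, 8, 16]        # quantization levels
PE_TYPES = ["LightPE", "INT16"]    # PE microarchitectures
NUM_PES = [64, 256, 1024]          # total number of PEs

def evaluate(bits, pe_type, n_pes):
    """Stand-in for QADAM's analytical PPA models (assumed toy cost model)."""
    area = n_pes * bits * (0.5 if pe_type == "LightPE" else 1.0)
    perf = n_pes * (16.0 / bits)   # throughput grows with PE count and narrower ops
    energy = 0.8 * area            # toy proxy: energy tracks active area
    return {"perf_per_area": perf / area, "energy": energy}

def dominates(q, p):
    """q Pareto-dominates p: no worse on both axes, strictly better on one."""
    return (q["perf_per_area"] >= p["perf_per_area"] and q["energy"] <= p["energy"]
            and (q["perf_per_area"] > p["perf_per_area"] or q["energy"] < p["energy"]))

points = [{"config": cfg, "metrics": evaluate(*cfg)}
          for cfg in product(BIT_PRECISIONS, PE_TYPES, NUM_PES)]
front = [p for p in points
         if not any(dominates(q["metrics"], p["metrics"]) for q in points)]
for p in front:
    print(p["config"], p["metrics"])

Under these assumptions, low-bit LightPE configurations dominate the front, mirroring the qualitative trend the abstract reports; a real study would replace evaluate() with QADAM's calibrated models.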