As the machine learning and systems community strives to achieve higher energy efficiency through custom DNN accelerators and model compression techniques, there is a need for a design space exploration framework that incorporates quantization-aware processing elements into the accelerator design space while providing accurate and fast power, performance, and area models. In this work, we present QAPPA, a highly parameterized quantization-aware power, performance, and area modeling framework for DNN accelerators. Our framework can facilitate future research on design space exploration of DNN accelerators across various design choices such as bit precision, processing element type, scratchpad sizes of processing elements, global buffer size, device bandwidth, total number of processing elements in the design, and DNN workloads. Our results show that different bit precisions and processing element types lead to significant differences in performance per area and energy. Specifically, our proposed lightweight processing elements achieve up to 4.9x improvement in performance per area and energy compared to an INT16-based implementation.