With the growing popularity of deep neural networks (DNNs), hardware accelerators are in demand for real-time execution. However, a lengthy design process and fast-evolving DNN models make it hard for hardware evaluation to meet time-to-market needs. This paper proposes a pre-RTL DNN hardware evaluator that supports conventional layer-by-layer processing as well as fused-layer processing, which lowers the external bandwidth requirement. The evaluator supports two state-of-the-art accelerator architectures and finds the best hardware and layer fusion group. The experimental results show that the layer fusion scheme can achieve a 55.6% memory bandwidth reduction, a 36.7% latency improvement, and a 49.2% energy reduction compared with layer-by-layer operation.
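To illustrate why fused-layer processing lowers external bandwidth, the following is a minimal sketch of a first-order traffic model. It is not the evaluator proposed in this paper. The `ConvLayer` class, the function names, the 8-bit data width, and the assumptions of stride 1 with "same" padding are all hypothetical choices made for illustration. The model also ignores tile overlap, recomputation, and on-chip buffer limits, which a real evaluator would need to account for.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ConvLayer:
    """Hypothetical convolution layer (assumes stride 1 and 'same' padding)."""
    height: int
    width: int
    in_channels: int
    out_channels: int
    kernel: int = 3
    bytes_per_value: int = 1  # assumption: 8-bit activations and weights

    def input_bytes(self) -> int:
        return self.height * self.width * self.in_channels * self.bytes_per_value

    def output_bytes(self) -> int:
        # Same spatial size as the input because of the stride/padding assumption.
        return self.height * self.width * self.out_channels * self.bytes_per_value

    def weight_bytes(self) -> int:
        return (self.kernel ** 2) * self.in_channels * self.out_channels * self.bytes_per_value


def layer_by_layer_traffic(layers: List[ConvLayer]) -> int:
    """Each layer reads its input and weights from external memory and writes its output back."""
    return sum(l.input_bytes() + l.weight_bytes() + l.output_bytes() for l in layers)


def fused_traffic(layers: List[ConvLayer]) -> int:
    """One fused group: only the first input, all weights, and the last output go off-chip."""
    weights = sum(l.weight_bytes() for l in layers)
    return layers[0].input_bytes() + weights + layers[-1].output_bytes()


def bandwidth_reduction(layers: List[ConvLayer]) -> float:
    """Fraction of external traffic saved by fusing all layers into one group."""
    baseline = layer_by_layer_traffic(layers)
    return 1.0 - fused_traffic(layers) / baseline


if __name__ == "__main__":
    # Illustrative two-layer network; the sizes are made up, not taken from the paper.
    net = [
        ConvLayer(height=32, width=32, in_channels=16, out_channels=32),
        ConvLayer(height=32, width=32, in_channels=32, out_channels=32),
    ]
    print("layer-by-layer bytes:", layer_by_layer_traffic(net))
    print("fused bytes:", fused_traffic(net))
    print("reduction: {:.1%}".format(bandwidth_reduction(net)))
```

In this toy model the saving comes entirely from intermediate feature maps that stay on chip, so the reduction it reports depends on the made-up layer sizes and should not be compared with the measured figures above.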