Convolutional Neural Networks (CNNs) reach high accuracies in various application domains, but require large amounts of computation and incur costly data movement. One method to decrease these costs, at the price of some accuracy, is weight and/or activation word-length reduction. Layer-wise mixed-precision quantization enables more efficient results, but inflates the design space. In this work, we present an in-depth quantitative methodology to efficiently explore this design space under the limited hardware resources of a given FPGA. Our holistic exploration approach vertically traverses the design entry levels from the architectural down to the logic level, and laterally covers optimizations from the processing elements to the dataflow of an efficient mixed-precision CNN accelerator. The resulting hardware accelerators implement truly mixed-precision operations that enable efficient execution of layer-wise and channel-wise quantized CNNs. Mapping feed-forward and identity-shortcut-connection mixed-precision CNNs yields competitive accuracy-throughput trade-offs: 245 frames/s with 87.48% Top-5 accuracy for ResNet-18 and 92.9% Top-5 accuracy at 1.13 TOps/s for ResNet-152, respectively. Thereby, the required memory footprint for parameters is reduced by 4.9x and 9.4x compared to the respective floating-point baselines.