To handle modern convolutional neural networks (CNNs) efficiently, this paper proposes a hardware architecture for a CNN inference accelerator that supports both depthwise convolutions and regular convolutions, which are essential building blocks of embedded computer-vision algorithms. Unlike related works, the proposed architecture flexibly supports filter kernels of different sizes, since it requires no extra cost for intra-kernel parallelism, and it generates convolution results faster than the architectures of the related works. The experimental results show the importance of supporting depthwise convolutions and dilated convolutions in the proposed hardware architecture. In addition to depthwise convolutions with large kernels, a new structure called the DDC layer, which combines depthwise convolutions and dilated convolutions, is also analyzed in this paper. For face detection, applying DDC layers to the network decreases the computational cost by 30% and the model size by 20%. For image classification, accuracy increases by 1% simply by replacing $3 \times 3$ filters with $5 \times 5$ filters in the depthwise convolutions.
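To make the DDC idea concrete, the following is a minimal sketch of a depthwise dilated convolution layer, assuming the DDC layer is realized as a single per-channel (depthwise) convolution with a dilation factor greater than one, optionally followed by a $1 \times 1$ pointwise convolution; the exact layer structure used in the paper may differ. All names (`DepthwiseDilatedConv`, `dilation=2`) are illustrative assumptions, not the authors' implementation.

```python
# Sketch of a depthwise dilated convolution (DDC-style) layer in PyTorch.
# Assumption: depthwise behavior via groups=in_channels, dilation>1 for the
# dilated receptive field; the paper's actual DDC structure may differ.
import torch
import torch.nn as nn


class DepthwiseDilatedConv(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 3, dilation: int = 2):
        super().__init__()
        # "Same" padding for an odd kernel with dilation:
        # effective kernel extent = dilation * (kernel_size - 1) + 1
        padding = dilation * (kernel_size - 1) // 2
        self.depthwise = nn.Conv2d(
            channels, channels, kernel_size,
            padding=padding, dilation=dilation,
            groups=channels,  # one filter per channel -> depthwise convolution
            bias=False,
        )
        # 1x1 pointwise convolution to mix channels, as in depthwise-separable blocks
        self.pointwise = nn.Conv2d(channels, channels, kernel_size=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pointwise(self.depthwise(x))


if __name__ == "__main__":
    x = torch.randn(1, 32, 56, 56)  # N, C, H, W
    ddc = DepthwiseDilatedConv(32, kernel_size=3, dilation=2)
    print(ddc(x).shape)  # spatial size preserved: torch.Size([1, 32, 56, 56])
```

With `kernel_size=3` and `dilation=2`, the effective receptive field per channel matches a $5 \times 5$ filter while keeping the parameter count of a $3 \times 3$ depthwise filter, which is the trade-off the abstract's large-kernel comparison alludes to.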