High-performance deep neural network (DNN)-based systems are in high demand in edge environments. Because of their high computational complexity, DNNs are challenging to deploy on edge devices with strict limits on computational resources. In this paper, we derive a compact yet highly accurate DNN model, termed dsODENet, by combining two recently proposed parameter reduction techniques: Neural ODE (Ordinary Differential Equation) and DSC (Depthwise Separable Convolution). Neural ODE exploits a similarity between ResNet and ODE, and shares most of the weight parameters among multiple layers, which greatly reduces memory consumption. We apply dsODENet to domain adaptation as a practical use case with image classification datasets. We also propose a resource-efficient FPGA-based design for dsODENet, in which all the parameters and feature maps, except those of the pre- and post-processing layers, can be mapped onto on-chip memories. The design is implemented on a Xilinx ZCU104 board and evaluated in terms of domain adaptation accuracy, inference speed, FPGA resource utilization, and speedup rate compared to a software counterpart. The results demonstrate that dsODENet achieves domain adaptation accuracy comparable to or slightly better than that of our baseline Neural ODE implementation, while the total parameter size, excluding the pre- and post-processing layers, is reduced by 54.2% to 79.8%. Our FPGA implementation accelerates inference by 23.8 times.
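The two parameter reduction ideas can be illustrated with a minimal sketch. It is not the dsODENet implementation: the channel count, block count, kernel size, and all function names below are hypothetical, biases and normalization layers are ignored, and the resulting ratios are not the figures reported above. The sketch assumes a forward Euler solver, in which one step `z + h * f(z, t)` has the same form as a residual block, so a single set of weights can be reused across steps.

```python
def conv_params(c_in, c_out, k=3):
    """Weight count of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def dsc_params(c_in, c_out, k=3):
    """Weight count of a depthwise separable convolution:
    one k x k depthwise filter per input channel plus a 1 x 1 pointwise convolution."""
    return k * k * c_in + c_in * c_out


def euler_block(f, z, steps, h):
    """Forward Euler integration: applies z <- z + h * f(z, t) `steps` times.
    The same function f (and hence the same weights) is reused at every step."""
    for i in range(steps):
        z = z + h * f(z, i * h)
    return z


# Hypothetical configuration: 10 residual blocks, each with two 3x3 convolutions.
channels, blocks = 64, 10
resnet = blocks * 2 * conv_params(channels, channels)   # separate weights per block
odenet = 2 * conv_params(channels, channels)            # one block, shared across steps
ds_odenet = 2 * dsc_params(channels, channels)          # shared block built from DSC

print(resnet, odenet, ds_odenet)

# Toy scalar dynamics dz/dt = -z, only to show the solver loop itself.
z_final = euler_block(lambda z, t: -z, 1.0, steps=10, h=0.1)
print(z_final)
```

In this toy setting, sharing one block across the solver steps removes the dependence of the weight count on depth, and replacing each standard convolution with a DSC shrinks the shared block further.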