Recent advances in algorithm-hardware co-design for deep neural networks (DNNs) have demonstrated their potential in automatically designing neural architectures and hardware designs. Nevertheless, it remains a challenging optimization problem due to the expensive training cost and the time-consuming hardware implementation, which make exploration of the vast design space of neural architectures and hardware designs intractable. In this paper, we demonstrate that our proposed approach is capable of locating designs on the Pareto frontier. This capability is enabled by a novel three-phase co-design framework with the following new features: (a) decoupling DNN training from the design space exploration of hardware and neural architectures, (b) providing a hardware-friendly neural architecture search space by considering hardware characteristics when constructing the search cells, and (c) adopting Gaussian processes to predict accuracy, latency, and power consumption, thereby avoiding time-consuming synthesis and place-and-route. Compared with the manually designed ResNet101, InceptionV2, and MobileNetV2, we achieve up to 5% higher accuracy with up to 3x speedup on the ImageNet dataset. Compared with other state-of-the-art co-design frameworks, the network and hardware configuration found by our framework achieve 2% to 6% higher accuracy, 2x to 26x lower latency, and 8.5x higher energy efficiency.
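The Gaussian-process surrogate idea in feature (c) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the design-point encoding, training data, and use of scikit-learn's `GaussianProcessRegressor` are all assumptions made for the sketch. The GP is fit on a small set of measured design points and then predicts a metric (here, latency) with an uncertainty estimate for unseen candidates, replacing expensive synthesis and place-and-route runs during exploration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

# Hypothetical setup: each design point is a 4-dimensional vector of
# normalized hardware/architecture knobs; latency values are synthetic.
rng = np.random.default_rng(0)
X_train = rng.uniform(0.0, 1.0, size=(30, 4))            # 30 measured designs
y_train = 5.0 + 10.0 * X_train[:, 0] + rng.normal(0, 0.1, 30)  # latency (ms)

# Fit a GP regressor as the surrogate model for latency.
gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(),
                              normalize_y=True)
gp.fit(X_train, y_train)

# Predict latency (with uncertainty) for unseen candidate designs,
# instead of running synthesis + place-and-route for each one.
X_cand = rng.uniform(0.0, 1.0, size=(5, 4))
mean, std = gp.predict(X_cand, return_std=True)
```

In practice one surrogate per metric (accuracy, latency, power) would be trained, and the predicted means and uncertainties would drive the search toward the Pareto frontier.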