Neural architectures and hardware accelerators have been two driving forces for the progress in deep learning. Previous works typically attempt to optimize the hardware given a fixed model architecture, or the model architecture given fixed hardware. The dominant hardware platform explored in this prior work is the FPGA. In our work, we target the optimization of hardware and software configurations on an industry-standard edge accelerator. We systematically study the importance and strategies of co-designing neural architectures and hardware accelerators. We make three observations: 1) the software search space has to be customized to fully leverage the targeted hardware architecture, 2) the search over the model architecture and the hardware architecture should be done jointly to achieve the best of both worlds, and 3) different use cases lead to very different search outcomes. Our experiments show that the joint search method consistently outperforms prior platform-aware neural architecture search, manually crafted models, and the state-of-the-art EfficientNet across all latency targets, improving ImageNet top-1 accuracy by around 1%. When co-adapting the model architecture and the hardware accelerator configuration, our method can reduce the energy consumption of an edge accelerator by up to 2x under the same accuracy constraint.
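To make the joint-search idea concrete, the following is a minimal sketch: random search over the Cartesian product of a model search space and an accelerator configuration space, scored with a MnasNet-style soft latency constraint (reward = accuracy x (latency/target)^w). The search spaces, parameter names, and stub estimators below are illustrative assumptions, not the paper's actual spaces or implementation; a real search would use proxy-task training or accuracy predictors and a hardware performance simulator in their place.

```python
import random

# Illustrative joint search over (model config, hardware config) pairs.
# All spaces and estimators here are hypothetical stand-ins.

MODEL_SPACE = {                      # software (model) search space
    "depth_multiplier": [0.5, 0.75, 1.0, 1.25],
    "kernel_size": [3, 5, 7],
    "expansion_ratio": [3, 6],
}
HW_SPACE = {                         # hardware (accelerator) search space
    "pe_array_size": [32, 64, 128],  # number of processing elements
    "sram_kb": [512, 1024, 2048],    # on-chip buffer size
}

def sample(space):
    """Draw one random configuration from a search space."""
    return {k: random.choice(v) for k, v in space.items()}

def estimate_accuracy(model_cfg):
    """Stub: in practice, a proxy-task training run or learned predictor."""
    return (0.70
            + 0.02 * model_cfg["depth_multiplier"]
            + 0.005 * model_cfg["expansion_ratio"])

def estimate_latency_ms(model_cfg, hw_cfg):
    """Stub: in practice, a cycle-accurate simulator or on-device measurement."""
    work = (model_cfg["depth_multiplier"]
            * model_cfg["kernel_size"] ** 2
            * model_cfg["expansion_ratio"])
    return 40.0 * work / hw_cfg["pe_array_size"]

def reward(acc, latency_ms, target_ms, w=-0.07):
    """MnasNet-style soft latency constraint: acc * (lat / target)^w."""
    return acc * (latency_ms / target_ms) ** w

best = None
for _ in range(1000):
    model_cfg, hw_cfg = sample(MODEL_SPACE), sample(HW_SPACE)
    acc = estimate_accuracy(model_cfg)
    lat = estimate_latency_ms(model_cfg, hw_cfg)
    r = reward(acc, lat, target_ms=5.0)
    if best is None or r > best[0]:
        best = (r, model_cfg, hw_cfg, acc, lat)

print(f"reward={best[0]:.4f} acc={best[3]:.4f} lat={best[4]:.2f}ms")
print("model:", best[1], "| hardware:", best[2])
```

Searching the two spaces jointly, as in this sketch, lets the latency term reflect the interaction between model shape and accelerator configuration, which a fixed-hardware or fixed-model search cannot capture.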