Neural architecture search automates neural network design and has achieved state-of-the-art results in many deep learning applications. While recent literature has focused on designing networks to maximize accuracy, little work has been conducted to understand how well architecture design spaces suit different hardware. In this paper, we analyze the neural blocks used to build the Once-for-All (MobileNetV3), ProxylessNAS, and ResNet families in order to understand their predictive power and inference latency on various devices, including the Huawei Kirin 9000 NPU, RTX 2080 Ti, AMD Threadripper 2990WX, and Samsung Note10. We introduce a methodology to quantify the hardware friendliness of neural blocks, and the impact of their placement in a macro network on overall network performance, using only end-to-end measurements. Based on extensive profiling results, we derive design insights and apply them to hardware-specific search space reduction. We show that searching in the reduced search space yields better accuracy-latency Pareto frontiers than searching in the original search spaces, customizing architecture search to the target hardware. Moreover, the insights derived from our measurements lead to notably higher ImageNet top-1 scores on all search spaces investigated.
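The abstract mentions attributing block-level latency from end-to-end measurements alone. The following is a minimal, hypothetical sketch of one way such attribution could work: swap a single candidate block in and out of a fixed position in a toy macro network and attribute the end-to-end latency difference to that block at that placement. The macro network, the candidate blocks, and the run counts are illustrative assumptions, not the paper's actual profiling setup.

```python
# Hypothetical sketch: estimate a block's latency contribution at a given
# placement using only end-to-end latency measurements of the full network.
import time
import torch
import torch.nn as nn

def make_macro_net(stage2_block: nn.Module) -> nn.Sequential:
    """Toy macro network; only the stage-2 slot varies between runs."""
    return nn.Sequential(
        nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        stage2_block,
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 1000),
    )

@torch.no_grad()
def end_to_end_latency_ms(net: nn.Module, runs: int = 50) -> float:
    net.eval()
    x = torch.randn(1, 3, 224, 224)
    for _ in range(10):  # warm-up iterations to stabilize caches/clocks
        net(x)
    start = time.perf_counter()
    for _ in range(runs):
        net(x)
    return (time.perf_counter() - start) / runs * 1e3

# Two candidate fillers for the same slot (illustrative stand-ins):
identity = nn.Identity()
conv_block = nn.Sequential(nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True))

base_ms = end_to_end_latency_ms(make_macro_net(identity))
with_block_ms = end_to_end_latency_ms(make_macro_net(conv_block))
print(f"attributed block latency at stage 2: {with_block_ms - base_ms:.2f} ms")
```

Repeating this measurement for each candidate block and each placement, on each target device, yields the kind of block-level latency profile the abstract describes; CUDA timing would additionally require synchronization before reading the clock.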