One Proxy Device Is Enough for Hardware-Aware Neural Architecture Search

From arXiv. Accepted by ACM SIGMETRICS 2022. Published in the Proceedings of the ACM on Measurement and Analysis of Computing Systems, vol. 5, no. 3, Article 34, December 2021. GitHub: https://github.com/Ren-Research/OneProxy

Convolutional neural networks (CNNs) are used in numerous real-world applications such as vision-based autonomous driving and video content analysis. To run CNN inference on various target devices, hardware-aware neural architecture search (NAS) is crucial. A key requirement of efficient hardware-aware NAS is the fast evaluation of inference latencies in order to rank different architectures. While building a latency predictor for each target device is common practice in state-of-the-art approaches, it is a very time-consuming process that does not scale to extremely diverse devices. In this work, we address the scalability challenge by exploiting latency monotonicity: the architecture latency rankings on different devices are often correlated. When strong latency monotonicity exists, we can re-use architectures searched for one proxy device on new target devices, without losing optimality. In the absence of strong latency monotonicity, we propose an efficient proxy adaptation technique to significantly boost the latency monotonicity. Finally, we validate our approach and conduct experiments with devices of different platforms on multiple mainstream search spaces, including MobileNet-V2, MobileNet-V3, NAS-Bench-201, ProxylessNAS and FBNet. Our results highlight that, by using just one proxy device, we can find almost the same Pareto-optimal architectures as the existing per-device NAS, while avoiding the prohibitive cost of building a latency predictor for each device.
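
To make the notion of latency monotonicity concrete, the sketch below measures how well the latency ranking of a set of architectures on one device agrees with the ranking on another, using Spearman's rank correlation. This is a minimal illustration, not the authors' code: the function names `average_ranks` and `spearman` and the latency values are hypothetical, and Spearman's coefficient is assumed here as one reasonable ranking-agreement metric.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical latencies (ms) of the same five architectures on two devices.
proxy_ms = [12.0, 18.5, 25.1, 31.0, 40.2]
target_ms = [30.1, 41.0, 39.5, 70.3, 95.0]
print(f"rank correlation: {spearman(proxy_ms, target_ms):.2f}")
```

A coefficient close to 1 means the two devices rank the architectures almost identically, which is the situation in which architectures found on the proxy device can be reused on the target. A low coefficient corresponds to the case where the abstract proposes proxy adaptation.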
