Neural Architecture Search (NAS), together with model scaling, has shown remarkable progress in designing high-accuracy and fast convolutional architecture families. However, as neither NAS nor model scaling considers sufficient hardware architecture details, they do not take full advantage of emerging datacenter (DC) accelerators. In this paper, we search for fast and accurate CNN model families for efficient inference on DC accelerators. We first analyze DC accelerators and find that existing CNNs suffer from insufficient operational intensity, parallelism, and execution efficiency. These insights let us create a DC-accelerator-optimized search space, with space-to-depth and space-to-batch operations, hybrid fused convolution structures combining vanilla and depthwise convolutions, and block-wise activation functions. On top of this DC-accelerator-optimized search space, we further propose latency-aware compound scaling (LACS), the first multi-objective compound scaling method optimizing both accuracy and latency. LACS discovers that network depth should grow much faster than image size and network width, which is quite different from previous compound scaling results. With the new search space and LACS, our search and scaling on datacenter accelerators yields a new model series named EfficientNet-X. EfficientNet-X is up to more than 2X faster than EfficientNet (a model series with state-of-the-art trade-offs between FLOPs and accuracy) on TPUv3 and GPUv100, with comparable accuracy. EfficientNet-X is also up to 7X faster than recent RegNet and ResNeSt on TPUv3 and GPUv100.
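For readers unfamiliar with the space-to-depth operation the search space uses to raise operational intensity, the following is a minimal NumPy sketch of the standard rearrangement (an illustration only, not the paper's implementation; the function name and `block` parameter are ours):

```python
import numpy as np

def space_to_depth(x: np.ndarray, block: int) -> np.ndarray:
    """Rearrange spatial blocks into channels:
    (N, H, W, C) -> (N, H/block, W/block, C * block * block)."""
    n, h, w, c = x.shape
    assert h % block == 0 and w % block == 0, "H and W must be divisible by block"
    # Split each spatial dimension into (outer, block) pairs.
    x = x.reshape(n, h // block, block, w // block, block, c)
    # Move the two block axes next to the channel axis.
    x = x.transpose(0, 1, 3, 2, 4, 5)
    # Fold the block axes into channels.
    return x.reshape(n, h // block, w // block, c * block * block)

# Example: a 224x224x3 image with block=2 becomes 112x112x12,
# trading spatial resolution for channel depth (more work per memory access).
y = space_to_depth(np.ones((1, 224, 224, 3)), 2)
assert y.shape == (1, 112, 112, 12)
```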
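The abstract does not spell out the LACS objective. As a hedged sketch, assuming a MnasNet-style multi-objective reward layered on EfficientNet's compound-scaling parameterization (the exact formulation is given in the paper body; the symbols \(T\) and \(\omega\) are our notation), latency-aware compound scaling can be read as searching the compound coefficients under a joint accuracy-latency objective:

\[
\alpha^{*},\,\beta^{*},\,\gamma^{*} \;=\; \arg\max_{\alpha,\beta,\gamma}\;
\mathrm{Accuracy}\bigl(m(\alpha,\beta,\gamma,\phi)\bigr)\cdot
\left[\frac{\mathrm{Latency}\bigl(m(\alpha,\beta,\gamma,\phi)\bigr)}{T}\right]^{\omega},
\]

with depth \(d=\alpha^{\phi}\), width \(w=\beta^{\phi}\), and resolution \(r=\gamma^{\phi}\), where \(T\) is a latency target and \(\omega<0\) penalizes models slower than the target. Under this reading, the abstract's finding that depth should grow much faster than image size and width corresponds to the searched \(\alpha\) being larger, relative to \(\beta\) and \(\gamma\), than in accuracy-only compound scaling.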