For real-time applications utilizing Deep Neural Networks (DNNs), it is critical that the models achieve both high accuracy on the target task and low-latency inference on the target computing platform. While Neural Architecture Search (NAS) has been used effectively to develop low-latency networks for image classification, there has been relatively little effort to use NAS to optimize DNN architectures for other vision tasks. In this work, we present what we believe to be the first proxyless hardware-aware search targeted at dense semantic segmentation. With this approach, we advance the state-of-the-art accuracy for latency-optimized networks on the Cityscapes semantic segmentation dataset. Our latency-optimized small SqueezeNAS network achieves 68.02% validation class mIOU with an inference time of less than 35 ms on the NVIDIA AGX Xavier. Our latency-optimized large SqueezeNAS network achieves 73.62% class mIOU with an inference time of less than 100 ms. We demonstrate that significant performance gains are possible by utilizing NAS to find networks optimized for both the specific task and the inference hardware. We also present a detailed analysis comparing our networks to recent state-of-the-art architectures.
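For readers unfamiliar with the accuracy metric quoted above, the following is a minimal sketch of class mIOU under the standard definition: per-class intersection over union, averaged over the classes that occur. It is an illustration only, not the evaluation code behind the reported numbers. The function name `class_miou`, the flat-list input format, and the `ignore_label=255` default are assumptions made for this example.

```python
def class_miou(pred, target, num_classes, ignore_label=255):
    """Mean intersection-over-union across classes for flat label lists.

    pred, target: equal-length sequences of integer class ids.
    ignore_label: target value to skip (assumed convention, hypothetical default).
    """
    intersection = [0] * num_classes   # pixels where prediction and target agree
    pred_count = [0] * num_classes     # pixels predicted as each class
    target_count = [0] * num_classes   # pixels labeled as each class

    for p, t in zip(pred, target):
        if t == ignore_label:
            continue  # unlabeled pixels do not contribute to the score
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # skip classes absent from both prediction and target
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six pixels, three classes, last pixel unlabeled.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
score = class_miou(pred, target, num_classes=3)
```

For a dataset-level score, the counters would be accumulated over all pixels of all validation images before the per-class division, rather than averaged image by image.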