Vision Transformers have enabled recent attention-based Deep Learning (DL) architectures to achieve remarkable results in Computer Vision (CV) tasks. However, due to the extensive computational resources they require, these architectures are rarely deployed on resource-constrained platforms. Current research investigates hybrid handcrafted models combining convolution-based and attention-based components for CV tasks such as image classification and object detection. In this paper, we propose HyT-NAS, an efficient Hardware-aware Neural Architecture Search (HW-NAS) whose search space includes hybrid architectures targeting vision tasks on tiny devices. HyT-NAS improves on state-of-the-art HW-NAS by enriching the search space and enhancing both the search strategy and the performance predictors. Our experiments show that HyT-NAS achieves a similar hypervolume with roughly 5x fewer training evaluations. Our resulting architecture outperforms MLPerf MobileNetV1 on Visual Wake Words, improving accuracy by 6.3% with 3.5x fewer parameters.