In recent years, deep learning has achieved significant results in many practical problems, such as computer vision, natural language processing, and speech recognition. For many years the main goal of research was to improve model quality, even when the complexity was impractically high. However, for production solutions, which often require real-time operation, model latency plays a very important role. Current state-of-the-art architectures are found with neural architecture search (NAS), taking model complexity into account. However, designing a search space suitable for specific hardware remains a challenging task. To address this problem we propose: a measure of the hardware efficiency of a neural architecture search space, the matrix efficiency measure (MEM); a search space comprising hardware-efficient operations; a latency-aware scaling method; and ISyNet, a set of architectures designed to be both fast on specialized neural processing unit (NPU) hardware and accurate at the same time. We show the advantage of the designed architectures for NPU devices on ImageNet and their generalization ability on downstream classification and detection tasks.