There is a growing need to deploy machine learning for different tasks on a wide array of new hardware platforms. Such deployment scenarios pose multiple challenges, including identifying a model architecture that achieves suitable predictive accuracy (architecture search) and finding an efficient implementation of the model that satisfies hardware-specific systems constraints such as latency (system optimization search). Existing works treat architecture search and system optimization search as separate problems and solve them sequentially. In this paper, we instead propose to solve these problems jointly, and introduce a simple but effective baseline method called SONAR that interleaves the two searches. SONAR aims to efficiently optimize for both predictive accuracy and inference latency by applying early stopping to both search processes. Our experiments on multiple hardware back-ends show that SONAR identifies nearly optimal architectures 30 times faster than a brute-force approach.
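To make the idea of interleaving the two searches concrete, here is a minimal Python sketch. It is not the SONAR algorithm itself, whose details the abstract does not give. It assumes a halving-style early-stopping rule, and every name in it (`Candidate`, `interleaved_search`, `toy_accuracy`, `toy_latency`, the latency budget) is hypothetical and chosen only for illustration. In each round, every surviving architecture receives a little more training and a little more system tuning, after which the weaker half of the pool is dropped.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Candidate:
    arch: int                            # hypothetical architecture id
    epochs: int = 0                      # training budget spent so far
    accuracy: float = 0.0                # accuracy after partial training
    best_latency: float = float("inf")   # best latency found so far
    tried: int = 0                       # number of system configs measured


def interleaved_search(
    archs: Sequence[int],
    configs: Sequence[int],
    train_step: Callable[[int, int], float],  # (arch, epochs) -> accuracy
    measure: Callable[[int, int], float],     # (arch, config) -> latency
    latency_budget: float,
    rounds: int = 3,
    configs_per_round: int = 2,
) -> Optional[Candidate]:
    """Alternate partial training and partial system tuning, pruning each round."""
    pool: List[Candidate] = [Candidate(a) for a in archs]
    for _ in range(rounds):
        for c in pool:
            # Architecture-search step: one more unit of training.
            c.epochs += 1
            c.accuracy = train_step(c.arch, c.epochs)
            # System-optimization step: measure a few more configurations.
            for cfg in configs[c.tried:c.tried + configs_per_round]:
                c.best_latency = min(c.best_latency, measure(c.arch, cfg))
            c.tried = min(len(configs), c.tried + configs_per_round)
        # Early stopping (assumed rule): candidates within the latency budget
        # rank first, then higher accuracy; keep only the top half.
        pool.sort(key=lambda c: (c.best_latency > latency_budget, -c.accuracy))
        pool = pool[: max(1, len(pool) // 2)]
    feasible = [c for c in pool if c.best_latency <= latency_budget]
    return max(feasible, key=lambda c: c.accuracy) if feasible else None


# Toy stand-ins for real training and real latency measurement (assumptions).
def toy_accuracy(arch: int, epochs: int) -> float:
    return 1.0 - 1.0 / (1.0 + arch * epochs)


def toy_latency(arch: int, config: int) -> float:
    return arch * 10.0 / (1.0 + config)


best = interleaved_search(
    archs=[1, 2, 3, 4],
    configs=[0, 1, 2, 3],
    train_step=toy_accuracy,
    measure=toy_latency,
    latency_budget=10.0,
)
```

Because pruning happens before either search has finished, a rule like this can discard an architecture that further system tuning would have brought within the latency budget. That is the trade-off early stopping makes in exchange for evaluating far fewer (architecture, configuration) pairs than a brute-force sweep.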