The looming end of Moore's Law and the ascending use of deep learning drive the design of custom accelerators that are optimized for specific neural architectures. Architecture exploration for such accelerators forms a challenging constrained optimization problem over a complex, high-dimensional, and structured input space with a costly-to-evaluate objective function. Existing approaches for accelerator design are sample-inefficient and do not transfer knowledge between related optimization tasks with different design constraints, such as area and/or latency budgets, or neural architecture configurations. In this work, we propose a transferable architecture exploration framework, dubbed Apollo, that leverages recent advances in black-box function optimization for sample-efficient accelerator design. We use this framework to optimize accelerator configurations of a diverse set of neural architectures under alternative design constraints. We show that our framework finds high-reward design configurations (up to 24.6% speedup) more sample-efficiently than a baseline black-box optimization approach. We further show that, by transferring knowledge between target architectures with different design constraints, Apollo is able to find optimal configurations faster and often with better objective values (up to 25% improvement). This encouraging outcome portrays a promising path forward in facilitating the generation of higher-quality accelerators.