The need for efficient processing of neural networks has given rise to hardware accelerators. The growing adoption of specialized hardware has in turn highlighted the need for more agile design flows for hardware-software co-design and domain-specific optimization. In this paper, we present CFU Playground: a full-stack open-source framework that enables rapid, iterative design and evaluation of machine learning (ML) accelerators for embedded ML systems. Our tool provides a completely open-source, end-to-end flow for hardware-software co-design on FPGAs and for future systems research. This full-stack framework lets users explore experimental and bespoke architectures that are customized and co-optimized for embedded ML. Our rapid deploy-profile-optimize feedback loop lets ML hardware and software developers achieve significant returns from a relatively small investment in customization. Using CFU Playground's design and evaluation loop, we show substantial speedups of between 55$\times$ and 75$\times$. Coupling the soft CPU with the accelerator opens up a new, rich design space between the two components, which we explore in an automated fashion using Vizier, an open-source black-box optimization service.