The Programming of Deep Learning Accelerators as a Constraint Satisfaction Problem

The success of Deep Artificial Neural Networks (DNNs) in many domains has created a rich body of research on hardware accelerators for compute-intensive DNN operators. However, implementing such operators efficiently with complex instructions such as matrix multiply is a task that has not yet been automated gracefully. Solving it often requires complex program and memory layout transformations. First solutions to this problem have been proposed, such as TVM or ISAMIR, which work on a loop-level representation of operators and rewrite the program before an instruction is embedded into the operator. This top-down approach creates a tension between exploration range and search space complexity. In this work, we propose a new approach to this problem: a bottom-up method that allows the direct generation of implementations based on an accelerator's instruction set. Because the embedding is formulated as a constraint satisfaction problem over the scalar dataflow, every possible embedding solution is contained in the search space. By adding further constraints, a solver can produce the subset of preferable solutions. From the information in a computed embedding, an implementation can be generated. A detailed evaluation using the VTA hardware accelerator with the Baidu DeepBench inference benchmark suite shows that our approach can automatically generate code competitive with reference implementations, and furthermore that memory layout flexibility can be beneficial for overall performance. While the reference implementation achieves very low hardware utilization due to its fixed embedding strategy, we achieve a geomean speedup of up to 2.49x, while individual operators can improve by as much as 238x.
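The idea of treating instruction embedding as a constraint satisfaction problem can be illustrated with a small sketch. This is not the authors' implementation: the abstract formulates the problem over the scalar dataflow, whereas the toy below works on whole loop dimensions only and enumerates candidates by brute force instead of calling a solver. The convolution shape, the 1x16x16 GEMM instruction shape, and all names (`OP_DIMS`, `INS_ACCESS`, `embeddings`, `fits_instruction`) are assumptions made for illustration.

```python
from itertools import permutations

# Hypothetical operator: a 2D convolution, described by its loop extents and
# by which loop dimensions index each tensor.
OP_DIMS = {"n": 1, "oc": 16, "ic": 16, "h": 14, "w": 14, "kh": 3, "kw": 3}
OP_ACCESS = {
    "out": {"n", "oc", "h", "w"},
    "inp": {"n", "ic", "h", "w", "kh", "kw"},
    "wgt": {"oc", "ic", "kh", "kw"},
}

# Hypothetical GEMM instruction: out[x, y] += inp[x, z] * wgt[y, z].
INS_DIMS = {"x": 1, "y": 16, "z": 16}
INS_ACCESS = {"out": {"x", "y"}, "inp": {"x", "z"}, "wgt": {"y", "z"}}


def embeddings(extra_constraints=()):
    """Yield every mapping of instruction dims to operator dims that
    preserves the tensor access pattern and meets the extra constraints."""
    ins_names = list(INS_DIMS)
    # Candidate assignments: injective maps from instruction dims to operator dims.
    for image in permutations(OP_DIMS, len(ins_names)):
        mapping = dict(zip(ins_names, image))
        # Dataflow constraint: an instruction dim indexes a tensor exactly
        # when the operator dim it is mapped to indexes the same tensor.
        preserves_access = all(
            (d in INS_ACCESS[t]) == (mapping[d] in OP_ACCESS[t])
            for d in ins_names
            for t in INS_ACCESS
        )
        if preserves_access and all(c(mapping) for c in extra_constraints):
            yield mapping


def fits_instruction(mapping):
    """Extra constraint: each operator extent is a multiple of the instruction extent."""
    return all(OP_DIMS[mapping[d]] % INS_DIMS[d] == 0 for d in INS_DIMS)


all_solutions = list(embeddings())
fitting = list(embeddings([fits_instruction]))

print(len(all_solutions), "embeddings preserve the access pattern")
print(len(fitting), "of them tile the operator without padding")
for m in fitting:
    print(m)
```

The first call returns every embedding that is valid with respect to the access pattern, which mirrors the claim that the full solution space is contained in the search space. Passing `fits_instruction` narrows that set in the way the abstract describes additional constraints selecting preferable solutions. A real system would hand such constraints to a constraint solver instead of enumerating permutations.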

