Specialized image processing accelerators are necessary to deliver the performance and energy efficiency required by important applications in computer vision, computational photography, and augmented reality. But creating, "programming," and integrating this hardware into a hardware/software system is difficult. We address this problem by extending the image processing language Halide so users can specify which portions of their applications should become hardware accelerators, and then we provide a compiler that uses this code to automatically create the accelerator along with the "glue" code needed for the user's application to access this hardware. Starting with Halide not only provides a very high-level functional description of the hardware, but also allows our compiler to generate the complete software program, including the sequential part of the workload, which accesses the hardware for acceleration. Our system also provides high-level semantics to explore different mappings of applications to a heterogeneous system, with the added flexibility of being able to map at various throughput rates. We demonstrate our approach by mapping applications to a Xilinx Zynq system. Using its FPGA with two low-power ARM cores, our design achieves up to 6x higher performance and 8x lower energy compared to the quad-core ARM CPU on an NVIDIA Tegra K1, and 3.5x higher performance with 12x lower energy compared to the K1's 192-core GPU.
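The speedup and energy figures above can be combined into a rough relative-power estimate, since energy is average power multiplied by run time. The sketch below is only illustrative: it assumes each speedup and energy figure describe the same workload, which the abstract does not state (the "up to" values may come from different applications), and the helper name `relative_power` is hypothetical.

```python
def relative_power(speedup: float, energy_reduction: float) -> float:
    """Average power of the accelerated system relative to the baseline.

    Energy = power * time, so
        P_new / P_base = (E_new / E_base) / (T_new / T_base)
                       = (1 / energy_reduction) / (1 / speedup)
                       = speedup / energy_reduction.
    """
    if speedup <= 0 or energy_reduction <= 0:
        raise ValueError("ratios must be positive")
    return speedup / energy_reduction


# Figures quoted in the abstract. Pairing each speedup with the energy
# figure next to it is an assumption, not something the abstract confirms.
vs_cpu = relative_power(speedup=6.0, energy_reduction=8.0)
vs_gpu = relative_power(speedup=3.5, energy_reduction=12.0)

print(f"vs. K1 CPU: {vs_cpu:.2f}x baseline average power")
print(f"vs. K1 GPU: {vs_gpu:.2f}x baseline average power")
```

Under that assumption, the Zynq design would draw about three quarters of the K1 CPU's average power and under a third of the K1 GPU's, so the energy savings would come from both shorter run time and lower power.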