In this study, we introduce a methodology that uses dynamic profiling to automatically transform user applications in the radar and communication domain, written in C/C++, into a parallel representation targeted at a heterogeneous SoC. We present our approach for instrumenting the user application binary during compilation with barrier synchronization primitives that enable the runtime system to schedule and execute independent tasks concurrently over the available compute resources. We demonstrate the capabilities of our integrated compile-time and runtime flow through task-level parallel and functionally correct execution of real-life applications. We validate the integrated system by executing four distinct applications, each exhibiting varying degrees of task-level parallelism, on a Xeon-based homogeneous multi-core processor. We then use the proposed compilation and code transformation methodology to retarget each application for execution on a heterogeneous SoC composed of three ARM cores and one FFT accelerator, emulated on the Xilinx Zynq UltraScale+ platform. We demonstrate our runtime's ability to process the application binary and dispatch independent tasks over the available compute resources of the emulated SoC on the Zynq FPGA using three different scheduling heuristics. Finally, we demonstrate execution of each application individually with task-level parallelism on the Zynq FPGA, as well as execution of workload scenarios composed of multiple instances of the same application and of a mixture of two distinct applications, showing that both application-level and task-level parallel execution can be realized. Our integrated approach offers a path forward for application developers to take full advantage of the target SoC without requiring them to become experts in hardware and parallel programming.