In this paper, we present FLASH 1.0, a C++-based software framework for rapid parallel deployment and enhanced host code portability in heterogeneous computing. FLASH takes a novel approach to describing kernels and dispatching them dynamically in a hardware-agnostic manner. FLASH features truly hardware-agnostic frontend interfaces, which not only unify the compile-time control flow but also enforce a portability-optimized code organization that separates computational (performance-critical) code from functional (non-performance-critical) code, and hardware-specific code from hardware-agnostic code, in the host application. We use static code analysis to measure the hardware independence ratio of popular HPC applications and show that up to 99.72% code portability can be achieved with FLASH. Similarly, we measure the complexity of state-of-the-art portable programming models and show that a code reduction of up to 2.2x can be achieved for two common HPC kernels while maintaining 100% code portability, with a normalized framework overhead of 1% to 13% of the total kernel runtime. The code is available at https://github.com/PSCLab-ASU/FLASH.