On the way to Exascale, programmers face the increasing challenge of having to support multiple hardware architectures from the same code base. At the same time, both code portability and performance portability are increasingly difficult to achieve as hardware architectures become more and more diverse. Today's heterogeneous systems often include two or more completely distinct and incompatible hardware execution models, such as GPGPUs, SIMD vector units, and general-purpose cores, which conventionally have to be programmed using separate tool chains representing non-overlapping programming models. The recent revival of interest in the C++ language, both in industry and in the wider community, has spurred a remarkable number of standardization proposals and technical specifications in the arena of concurrency and parallelism. Recently, this has included a growing amount of discussion around the need for a uniform, higher-level abstraction and programming model for parallelism in the C++ standard that targets heterogeneous and distributed computing. Such an abstraction should blend seamlessly with existing, already standardized language and library features, but should also be generic enough to support future hardware developments. In this paper, we present the results of developing such a higher-level programming abstraction for parallelism in C++, which aims to enable code and performance portability over a wide range of architectures and for various types of parallelism. We present and compare performance data obtained from running the well-known STREAM benchmark ported to our higher-level C++ abstraction with the corresponding results from running it natively. We show that our abstractions deliver performance at least as good as the comparable baseline benchmarks while providing a uniform programming API on all compared target architectures.
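For readers unfamiliar with STREAM, its four kernels (copy, scale, add, triad) are simple element-wise vector operations. The sketch below is not the authors' abstraction, whose API is not shown here. It is a minimal illustration, assuming only the C++ standard library, of how these kernels map onto standardized algorithms. The function names (`stream_copy`, `stream_triad`, `stream_iteration`, and so on) are hypothetical. Since C++17, `std::copy` and `std::transform` also accept an execution policy such as `std::execution::par` as their first argument, which is one existing standardized hook for parallelism. The sketch uses the sequential overloads so that it builds without a parallel backend.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Copy:  c[i] = a[i]
void stream_copy(const Vec& a, Vec& c) {
    assert(a.size() == c.size());
    std::copy(a.begin(), a.end(), c.begin());
}

// Scale: b[i] = q * c[i]
void stream_scale(const Vec& c, Vec& b, double q) {
    assert(c.size() == b.size());
    std::transform(c.begin(), c.end(), b.begin(),
                   [q](double x) { return q * x; });
}

// Add:   c[i] = a[i] + b[i]
void stream_add(const Vec& a, const Vec& b, Vec& c) {
    assert(a.size() == b.size() && a.size() == c.size());
    std::transform(a.begin(), a.end(), b.begin(), c.begin(),
                   [](double x, double y) { return x + y; });
}

// Triad: a[i] = b[i] + q * c[i]
void stream_triad(const Vec& b, const Vec& c, Vec& a, double q) {
    assert(b.size() == c.size() && b.size() == a.size());
    std::transform(b.begin(), b.end(), c.begin(), a.begin(),
                   [q](double x, double y) { return x + q * y; });
}

// Hypothetical driver: runs the four kernels once, in STREAM order.
void stream_iteration(Vec& a, Vec& b, Vec& c, double q) {
    stream_copy(a, c);
    stream_scale(c, b, q);
    stream_add(a, b, c);
    stream_triad(b, c, a, q);
}
```

Starting from a = 1, b = 2, c = 0 and q = 3, one call to `stream_iteration` leaves c = 1, then b = 3, then c = 4, and finally a = 3 + 3 · 4 = 15 in every element. A real benchmark would additionally time each kernel and report bandwidth from the bytes moved per element. That part is omitted here.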