Applications in High-Performance Computing (HPC) environments face challenges due to their increasing complexity. Among these, the growing use of sparse data pushes the limits of data structures and programming models and hampers the efficient use of existing, highly parallel hardware. The GraphBLAS specification tackles these challenges by proposing a set of data containers and primitives, coupled with semantics based on abstract algebraic concepts. This allows many applications on sparse data to be expressed with a small set of primitives and to benefit from the optimizations enabled by an algebraic specification known at compile time. Among HPC applications, the High Performance Conjugate Gradient (HPCG) benchmark is an important representative of a large body of sparse workloads, and its structure poses several programmability and performance challenges. This work tackles them by proposing and evaluating a GraphBLAS implementation of HPCG, highlighting the main changes to its kernels. On shared-memory systems the implementation outperforms the reference, while results on distributed systems highlight fundamental limitations of GraphBLAS-compliant implementations, which suggest several directions for future work.
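To illustrate what "semantics based on abstract algebraic concepts" means in practice, the sketch below shows a sparse matrix-vector product parameterized by a semiring. It is a plain-Python illustration, not the GraphBLAS API. The names `Semiring`, `CSRMatrix`, `mxv`, `PLUS_TIMES` and `MIN_PLUS` are hypothetical, although `mxv` echoes the GraphBLAS operation of the same purpose. The same kernel computes the ordinary SpMV at the heart of conjugate gradient under the (+, ×) semiring and a shortest-path relaxation step under the (min, +) semiring.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Semiring:
    """Hypothetical semiring type: additive monoid (add, zero) plus a multiply.

    `zero` must be the identity of `add` (and, in a proper semiring,
    an annihilator for `mul`).
    """
    add: Callable[[float, float], float]
    mul: Callable[[float, float], float]
    zero: float


# Conventional arithmetic: used by SpMV in conjugate gradient.
PLUS_TIMES = Semiring(add=lambda a, b: a + b, mul=lambda a, b: a * b, zero=0.0)
# Tropical semiring: one step of Bellman-Ford-style distance relaxation.
MIN_PLUS = Semiring(add=min, mul=lambda a, b: a + b, zero=float("inf"))


@dataclass
class CSRMatrix:
    """Hypothetical compressed sparse row storage; only stored entries are kept."""
    row_ptr: List[int]
    col_idx: List[int]
    values: List[float]

    @property
    def nrows(self) -> int:
        return len(self.row_ptr) - 1


def mxv(A: CSRMatrix, x: List[float], sr: Semiring) -> List[float]:
    """Compute y[i] = add over stored A[i, j] of mul(A[i, j], x[j]) under `sr`."""
    y = []
    for i in range(A.nrows):
        acc = sr.zero
        # Visit only the stored (nonzero) entries of row i.
        for k in range(A.row_ptr[i], A.row_ptr[i + 1]):
            acc = sr.add(acc, sr.mul(A.values[k], x[A.col_idx[k]]))
        y.append(acc)
    return y


# 1D Laplacian [[2,-1,0],[-1,2,-1],[0,-1,2]] in CSR form.
laplacian = CSRMatrix(row_ptr=[0, 2, 5, 7],
                      col_idx=[0, 1, 0, 1, 2, 1, 2],
                      values=[2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0])

# Weighted path graph 0 -(4)- 1 -(1)- 2, with zero-weight self-loops so that
# each relaxation step keeps the current distance of every vertex.
weights = CSRMatrix(row_ptr=[0, 2, 5, 7],
                    col_idx=[0, 1, 0, 1, 2, 1, 2],
                    values=[0.0, 4.0, 4.0, 0.0, 1.0, 1.0, 0.0])

if __name__ == "__main__":
    print(mxv(laplacian, [1.0, 1.0, 1.0], PLUS_TIMES))
    inf = float("inf")
    d1 = mxv(weights, [0.0, inf, inf], MIN_PLUS)
    print(d1, mxv(weights, d1, MIN_PLUS))
```

Only the semiring argument changes between the two uses; the traversal of the sparse structure stays the same. This is the kind of reuse a small set of algebraically defined primitives is meant to enable.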