Sparse matrices are an integral part of scientific simulations. As hardware evolves, new sparse matrix storage formats are proposed to exploit optimizations specific to that hardware. In the era of heterogeneous computing, users are often required to use multiple formats for their applications to remain optimal across the available hardware, resulting in longer development times and higher maintenance overhead. A potential solution to this problem is a lightweight auto-tuner driven by Machine Learning (ML) that selects for the user, from a pool of available formats, the format that best matches the characteristics of the sparsity pattern, the target hardware and the operation to execute. In this paper, we introduce Morpheus-Oracle, a library that provides a lightweight ML auto-tuner capable of accurately predicting the optimal format across multiple backends, targeting the major HPC architectures, with the aim of eliminating any format selection input by the end-user. Across more than 2000 real-life matrices, we achieve an average classification accuracy and balanced accuracy of 92.63% and 80.22% respectively across the available systems. Adopting the auto-tuner results in an average speedup of 1.1x on CPUs and 1.5x to 8x on NVIDIA and AMD GPUs, with maximum speedups of up to 7x and 1000x respectively.
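The gap between the two reported metrics is easier to interpret with a small worked example. The following is a minimal sketch, not part of Morpheus-Oracle: the format labels (`CSR`, `DIA`), the class sizes and the predictions are hypothetical and chosen only for illustration. It assumes the standard definitions, where accuracy is the fraction of correct predictions and balanced accuracy is the mean of the per-class recall, so that a format that is rarely optimal counts as much as one that is frequently optimal.

```python
from collections import defaultdict

def accuracy(y_true, y_pred):
    """Fraction of matrices whose predicted format equals the optimal one."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every format is weighted equally."""
    hits = defaultdict(int)    # correct predictions per true class
    totals = defaultdict(int)  # number of samples per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += (t == p)
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)

# Hypothetical data set: CSR is optimal for 8 matrices, DIA for 2.
y_true = ["CSR"] * 8 + ["DIA"] * 2
# Hypothetical tuner output: every CSR case correct, one of two DIA cases missed.
y_pred = ["CSR"] * 8 + ["DIA", "CSR"]

print(accuracy(y_true, y_pred))           # 0.9
print(balanced_accuracy(y_true, y_pred))  # 0.75
```

In this toy case a single misprediction on the minority format lowers balanced accuracy far more than plain accuracy, which is one common reason the two figures diverge on imbalanced data sets.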