Sparse general matrix multiplication (SpGEMM) is a fundamental building block in numerous scientific applications. One critical task of SpGEMM is to compute or predict the structure of the output matrix (i.e., the number of nonzero elements per output row) for efficient memory allocation and load balance, which impact the overall performance of SpGEMM. Existing work either precisely calculates the output structure or adopts upper-bound or sampling-based methods to predict the output structure. However, these methods either take much execution time or are not accurate enough. In this paper, we propose a novel sampling-based method with better accuracy and low costs compared to the existing sampling-based method. The proposed method first predicts the compression ratio of SpGEMM by leveraging the number of intermediate products (denoted as FLOP) and the number of nonzero elements (denoted as NNZ) of the same sampled result matrix. And then, the predicted output structure is obtained by dividing the FLOP per output row by the predicted compression ratio. We also propose a reference design of the existing sampling-based method with optimized computing overheads to demonstrate the better accuracy of the proposed method. We construct 625 test cases with various matrix dimensions and sparse structures to evaluate the prediction accuracy. Experimental results show that the absolute relative errors of the proposed method and the reference design are 1.56\% and 8.12\%, respectively, on average, and 25\% and 156\%, respectively, in the worst case.
翻译:简单一般矩阵乘法(SpGEMM)是许多科学应用中一个根本的构件。 SpGEMM 的关键任务是计算或预测输出矩阵的结构(即每个输出行的非零元素的数量),以便有效地进行内存分配和负载平衡,从而影响SpGEM 的总体性能。现有工作或者精确地计算产出结构,或者采用上下限或抽样方法来预测产出结构。然而,这些方法要么需要大量执行时间,或者不够准确。在本文中,我们提出了一种新的基于抽样的方法,比现有基于取样的方法更准确,成本更低。拟议方法首先预测SpGEMM的压缩比率,方法是利用同一抽样结果矩阵的中间产品数量(称为FLOP)和非零要素(称为NNZ),然后通过将FLOP每产出行的预测值除以预测最差的压缩率比率计算出来。我们还提出了一种基于抽样的现有方法的参考设计,即优化计算间接费用和成本比现有方法更精确。 拟议的方法首先通过利用中间产品(称为FLOP) 25 和精确度分析模型,然后分别分析各种方法。我们提出的方法, 和精确地评估了各种试验模型。