Sparse matrices are key ingredients in several application domains, from scientific computation to machine learning. The primary challenge with sparse matrices has been storing and transferring data efficiently, for which many sparse formats have been proposed to eliminate the storage of zero entries. Such formats, designed primarily to minimize memory footprint, do not necessarily translate into faster processing. In other words, although they enable faster data transfer and improve memory bandwidth utilization -- the classic challenge of sparse problems -- their decompression mechanism can itself become a computation bottleneck. Not only does this challenge remain unresolved, it grows more serious with the advent of domain-specific architectures (DSAs), which aim to improve performance even more aggressively. The performance implications of pairing various formats with DSAs, however, have not been extensively studied by prior work. To fill this gap, we characterize the performance impact of seven frequently used sparse formats on a DSA for sparse matrix-vector multiplication (SpMV), implemented on an FPGA using high-level synthesis (HLS) tools, a growing and popular method for developing DSAs. Seeking a fair comparison, we tailor and optimize the HLS implementation of decompression for each format. We thoroughly explore diverse metrics, including decompression overhead, latency, balance ratio, throughput, memory bandwidth utilization, resource utilization, and power consumption, on a variety of real-world and synthetic sparse workloads.