Sparse matrix representations are ubiquitous in computational science and machine learning, yielding significant reductions in compute time relative to dense representations for problems with local connectivity. Their adoption in leading ML frameworks such as PyTorch remains incomplete, however, with support for both automatic differentiation and GPU acceleration missing. In this work, we present a CSR-based sparse matrix wrapper for PyTorch with CUDA acceleration for basic matrix operations, as well as automatic differentiability. We also present several applications of the resulting sparse kernels to optimization problems, demonstrating ease of implementation and measuring performance against their dense counterparts.
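To make the CSR layout concrete, the sketch below encodes a small matrix as the three flat arrays CSR uses and applies a naive matrix-vector product. This is an illustrative example only, not the wrapper presented in this work; the `csr_matvec` helper is a hypothetical, unaccelerated reference for the kind of kernel being accelerated.

```python
import torch

# Illustrative CSR encoding of the 3x4 matrix
# [[1, 0, 2, 0],
#  [0, 0, 3, 0],
#  [4, 0, 0, 0]]
values       = torch.tensor([1., 2., 3., 4.])  # non-zero entries, row by row
col_indices  = torch.tensor([0, 2, 2, 0])      # column of each non-zero
crow_indices = torch.tensor([0, 2, 3, 4])      # row i spans values[crow[i]:crow[i+1]]

def csr_matvec(crow, col, vals, x):
    """Reference (loop-based, CPU) sparse matrix-vector product y = A @ x."""
    y = torch.zeros(crow.numel() - 1, dtype=vals.dtype)
    for i in range(y.numel()):
        start, end = crow[i], crow[i + 1]
        y[i] = (vals[start:end] * x[col[start:end]]).sum()
    return y

x = torch.tensor([1., 1., 1., 1.])
print(csr_matvec(crow_indices, col_indices, values, x))  # tensor([3., 3., 4.])
```

Because `crow_indices` has one entry per row plus one, CSR gives contiguous, row-wise access to the non-zeros, which is what makes per-row parallelization on a GPU natural.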