Sparse tensors are rapidly becoming critical components of modern deep learning workloads. However, developing high-performance sparse operators can be difficult and tedious, and existing vendor libraries cannot satisfy the escalating demands of new operators. Sparse tensor compilers simplify operator development, but efficient sparse compilation for deep learning remains challenging because a single sparse format cannot maximize hardware efficiency, and single-shot compilers cannot keep up with the latest hardware and system advances. We show that the key to addressing both challenges is two forms of composability. In this paper, we propose SparseTIR, a sparse tensor compilation abstraction that offers composable formats and composable transformations for deep learning workloads. SparseTIR constructs a search space over these composable components for performance tuning. With these improvements, SparseTIR obtains consistent performance speedups versus vendor libraries on GPUs for single operators: 1.1-3.3x for GNN operators, 1.1-3.3x for sparse attention operators, and 0.6-2.2x for sparse convolution operators. SparseTIR also accelerates end-to-end GNNs by 1.1-2.2x for GraphSAGE training and 4.2-16.8x for RGCN inference.
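To make the "composable formats" idea concrete, the following is a minimal, hypothetical NumPy/SciPy sketch (not SparseTIR's actual API): rows of a sparse matrix are grouped into fixed-width ELL buckets by nonzero count, with a CSR residual for long rows, and SpMM is computed format by format and combined. The function names, bucket widths, and layout choices here are illustrative assumptions.

```python
# Illustrative sketch of format composition for SpMM: ELL buckets + CSR residual.
# This is not SparseTIR code; it only mimics the high-level idea in NumPy/SciPy.

import numpy as np
import scipy.sparse as sp


def ell_buckets(A_csr, bucket_widths=(4, 16)):
    """Partition rows of a CSR matrix into ELL buckets of bounded width.

    Rows whose nonzero count exceeds the largest width fall back to a CSR
    residual. Returns (buckets, residual, residual_rows), where each bucket is
    a tuple (row_ids, col_idx, vals) with col_idx/vals zero-padded to the width.
    """
    nnz_per_row = np.diff(A_csr.indptr)
    assigned = np.zeros(A_csr.shape[0], dtype=bool)
    buckets = []
    for width in bucket_widths:
        rows = np.where(~assigned & (nnz_per_row <= width))[0]
        assigned[rows] = True
        col_idx = np.zeros((len(rows), width), dtype=np.int64)
        vals = np.zeros((len(rows), width), dtype=A_csr.dtype)
        for i, r in enumerate(rows):
            lo, hi = A_csr.indptr[r], A_csr.indptr[r + 1]
            col_idx[i, : hi - lo] = A_csr.indices[lo:hi]
            vals[i, : hi - lo] = A_csr.data[lo:hi]
        buckets.append((rows, col_idx, vals))
    residual = A_csr[~assigned] if (~assigned).any() else None
    return buckets, residual, np.where(~assigned)[0]


def composable_spmm(A_csr, B):
    """Compute C = A @ B over the composed formats, one partial result per format."""
    buckets, residual, res_rows = ell_buckets(A_csr)
    C = np.zeros((A_csr.shape[0], B.shape[1]), dtype=B.dtype)
    for rows, col_idx, vals in buckets:
        # Gather the referenced rows of B and reduce along the padded width axis.
        C[rows] = np.einsum("rw,rwf->rf", vals, B[col_idx])
    if residual is not None:
        C[res_rows] = residual @ B  # long rows handled in plain CSR
    return C


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = sp.random(256, 256, density=0.05, format="csr", random_state=rng)
    B = rng.standard_normal((256, 64))
    assert np.allclose(composable_spmm(A, B), A @ B)
```

In an actual compiler setting, the choice of bucket widths and of which rows go to which format would be part of the tuning search space rather than hard-coded as above.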