Sparse tensors are rapidly becoming critical components of modern deep learning workloads. However, developing high-performance sparse operators can be difficult and tedious, and existing vendor libraries cannot satisfy the escalating demands of new operators. Sparse tensor compilers simplify operator development, but efficient sparse compilation for deep learning remains challenging because a single sparse format cannot maximize hardware efficiency, and single-shot compilers cannot keep up with the latest hardware and system advances. We show that the key to addressing both challenges is two forms of composability. In this paper, we propose SparseTIR, a sparse tensor compilation abstraction that offers composable formats and composable transformations for deep learning workloads. SparseTIR constructs a search space over these composable components for performance tuning. With these improvements, SparseTIR obtains consistent performance speedups over vendor libraries on GPUs for single operators: 1.1-3.3x for GNN operators and 1.1-4.4x for sparse transformer operators. SparseTIR also accelerates end-to-end GNNs by 1.1-2.2x for GraphSAGE training and 0.9-26x for RGCN inference.
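To make the idea of composable formats concrete, the sketch below (a hypothetical illustration in NumPy, not SparseTIR's API) decomposes a sparse matrix's rows into padded ELL-style buckets by nonzero count plus a CSR remainder, and performs SpMM piecewise; a composable-format abstraction lets each such piece be lowered to a kernel suited to its regularity.

```python
# Hypothetical illustration (not SparseTIR's API): decompose a sparse matrix's rows
# into ELL-style buckets by nonzero count, keep long rows in CSR, and run SpMM
# piecewise so regular pieces can use dense, vectorizable kernels.
import numpy as np
import scipy.sparse as sp

def bucketed_spmm(A_csr, B, bucket_widths=(4, 16)):
    """SpMM y = A @ B where rows of A are grouped by nnz into padded ELL buckets."""
    n_rows = A_csr.shape[0]
    nnz_per_row = np.diff(A_csr.indptr)
    y = np.zeros((n_rows, B.shape[1]), dtype=B.dtype)
    assigned = np.zeros(n_rows, dtype=bool)
    for width in bucket_widths:
        rows = np.where((~assigned) & (nnz_per_row <= width))[0]
        assigned[rows] = True
        if rows.size == 0:
            continue
        # Build a padded (len(rows), width) ELL block of column indices and values.
        cols = np.zeros((rows.size, width), dtype=np.int64)
        vals = np.zeros((rows.size, width), dtype=A_csr.dtype)
        for i, r in enumerate(rows):
            start, end = A_csr.indptr[r], A_csr.indptr[r + 1]
            cols[i, : end - start] = A_csr.indices[start:end]
            vals[i, : end - start] = A_csr.data[start:end]
        # Regular-shaped gather + reduction; padded entries have value 0.
        y[rows] = np.einsum("rk,rkd->rd", vals, B[cols])
    # Remaining long rows stay in CSR and use a generic row-wise kernel.
    for r in np.where(~assigned)[0]:
        start, end = A_csr.indptr[r], A_csr.indptr[r + 1]
        y[r] = A_csr.data[start:end] @ B[A_csr.indices[start:end]]
    return y

# Usage: check against SciPy's reference SpMM.
A = sp.random(1000, 1000, density=0.01, format="csr", dtype=np.float32)
B = np.random.rand(1000, 64).astype(np.float32)
assert np.allclose(bucketed_spmm(A, B), A @ B, atol=1e-4)
```

In this sketch the bucket widths play the role of tunable decomposition parameters; a compiler with composable formats and transformations can search over such choices (and over per-piece schedules) rather than committing to a single format for the whole matrix.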