The Basic Linear Algebra Subprograms (BLAS) are a core library in scientific computing and machine learning. This paper presents FT-BLAS, a new implementation of BLAS routines that not only tolerates soft errors on the fly but also delivers performance comparable to modern state-of-the-art BLAS libraries on widely used processors such as Intel Skylake and Cascade Lake. Because BLAS contains both memory-bound and compute-bound routines, we propose a hybrid strategy for incorporating fault tolerance into our new BLAS implementation: duplicating computing instructions for the memory-bound Level-1 and Level-2 BLAS routines, and applying an Algorithm-Based Fault Tolerance (ABFT) mechanism to the compute-bound Level-3 BLAS routines. The high performance and low overhead come from careful assembly-level optimization and a kernel-fusion approach applied to the computing kernels. Experimental results demonstrate that FT-BLAS offers both high reliability and high performance: across the benchmarked routines, which span all three BLAS levels, it is faster than Intel MKL, OpenBLAS, and BLIS by up to 3.50%, 22.14%, and 21.70%, respectively, even with hundreds of errors injected per minute.
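The two halves of the hybrid strategy can be illustrated with a minimal Python sketch. It is not the FT-BLAS implementation, which works at the assembly level with fused kernels; the function names `dmr_dot`, `matmul`, and `abft_check_and_correct`, the `fault` hook, and the tolerance `tol` are all hypothetical and chosen only for illustration. The first function duplicates a Level-1 style dot product and recomputes on disagreement. The second uses the standard checksum identities for C = AB (row sums of C equal A(Be), column sums equal (eᵀA)B, where e is the all-ones vector) to locate and repair a single corrupted entry.

```python
def dmr_dot(x, y, fault=None):
    """Dot product computed twice; a mismatch triggers a recomputation.

    `fault` is a hypothetical hook that corrupts the first result,
    used here only to simulate a soft error.
    """
    first = sum(a * b for a, b in zip(x, y))
    if fault is not None:
        first = fault(first)
    second = sum(a * b for a, b in zip(x, y))
    if first != second:
        # The two copies disagree, so at least one is wrong: recompute.
        return sum(a * b for a, b in zip(x, y))
    return first


def matmul(A, B):
    """Plain triple-loop matrix product, standing in for a Level-3 kernel."""
    k = len(B)
    return [[sum(A[i][p] * B[p][j] for p in range(k))
             for j in range(len(B[0]))] for i in range(len(A))]


def abft_check_and_correct(A, B, C, tol=1e-9):
    """Verify C against checksums of A and B; repair one bad entry in place.

    Returns None if C is consistent, or the (row, col) index that was fixed.
    """
    n, k, m = len(A), len(B), len(B[0])
    Be = [sum(row) for row in B]                              # B e
    row_ref = [sum(A[i][p] * Be[p] for p in range(k)) for i in range(n)]
    eA = [sum(A[i][p] for i in range(n)) for p in range(k)]   # e^T A
    col_ref = [sum(eA[p] * B[p][j] for p in range(k)) for j in range(m)]

    bad_rows = [i for i in range(n) if abs(sum(C[i]) - row_ref[i]) > tol]
    bad_cols = [j for j in range(m)
                if abs(sum(C[i][j] for i in range(n)) - col_ref[j]) > tol]

    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        C[i][j] -= sum(C[i]) - row_ref[i]   # subtract the checksum discrepancy
        return (i, j)
    raise RuntimeError("more than one corrupted entry; recompute the product")
```

The split reflects the cost argument in the text: computing the checksums takes O(n²) extra work against an O(n³) matrix product, so the relative cost shrinks as matrices grow, whereas memory-bound routines have spare compute cycles in which duplicated instructions can run.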