Linear algebra expressions, which play a central role in countless scientific computations, are often evaluated via a sequence of calls to existing libraries of building blocks (such as those provided by BLAS and LAPACK). A sequence identifies a computing strategy, i.e., an algorithm, and for one linear algebra expression many alternative algorithms normally exist. Although mathematically equivalent, those algorithms can exhibit significant differences in performance. Several high-level languages and tools for matrix computations, such as Julia, Armadillo, and Linnea, make algorithmic choices by minimizing the number of Floating Point Operations (FLOPs). However, several algorithms may share the same (or nearly identical) number of FLOPs; in many cases, these algorithms exhibit execution times that are statistically equivalent, and one could arbitrarily select any of them as the best algorithm. It is, however, not unlikely to find cases where the execution times differ significantly from one another (despite the FLOP counts being almost the same). It is also possible that the algorithm that minimizes FLOPs is not the one that minimizes execution time. In this work, we develop a methodology to test the reliability of FLOPs as a discriminant for linear algebra algorithms. Given a set of algorithms (for an instance of a linear algebra expression) as input, the methodology ranks them into performance classes; algorithms in the same class are statistically equivalent in performance. To this end, we measure the algorithms iteratively until the changes in the ranks converge to a value close to zero. FLOPs are a valid discriminant for an instance if all the algorithms with minimum FLOPs are assigned the best rank; otherwise, the instance is regarded as an anomaly, which can then be used to investigate the root cause of performance differences.
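To make the final decision rule concrete, the following is a minimal sketch (not the authors' implementation) of the FLOPs-as-discriminant check: given hypothetical per-algorithm FLOP counts and the performance-class ranks produced by the iterative measurement procedure, it decides whether FLOPs are a valid discriminant for the instance. The function name, the input dictionaries, and the tolerance for "nearly identical" FLOP counts are all illustrative assumptions.

```python
def flops_are_valid_discriminant(flop_counts, ranks, rel_tol=1e-2):
    """flop_counts and ranks are dicts keyed by algorithm name.

    rel_tol treats FLOP counts within 1% of the minimum as 'minimum FLOPs'
    (an assumed threshold; the text only speaks of 'nearly identical' counts).
    Rank 1 denotes the best (fastest) performance class.
    """
    min_flops = min(flop_counts.values())
    best_rank = min(ranks.values())
    min_flop_algs = [a for a, f in flop_counts.items()
                     if f <= min_flops * (1 + rel_tol)]
    # FLOPs discriminate correctly only if every minimum-FLOP algorithm
    # lands in the best performance class; otherwise the instance is an anomaly.
    return all(ranks[a] == best_rank for a in min_flop_algs)


if __name__ == "__main__":
    flops = {"alg1": 2.0e9, "alg2": 2.0e9, "alg3": 3.1e9}  # hypothetical FLOP counts
    ranks = {"alg1": 1, "alg2": 2, "alg3": 1}              # hypothetical rank classes
    print(flops_are_valid_discriminant(flops, ranks))      # False -> anomaly
```

In this toy example, alg2 minimizes FLOPs but does not fall in the best performance class, so the instance would be flagged as an anomaly.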