Paper Title

A Test for FLOPs as a Discriminant for Linear Algebra Algorithms

Authors

Aravind Sankaran, Paolo Bientinesi

Abstract

Linear algebra expressions, which play a central role in countless scientific computations, are often computed via a sequence of calls to existing libraries of building blocks (such as those provided by BLAS and LAPACK). A sequence identifies a computing strategy, i.e., an algorithm, and normally for one linear algebra expression many alternative algorithms exist. Although mathematically equivalent, those algorithms might exhibit significant differences in terms of performance. Several high-level languages and tools for matrix computations such as Julia, Armadillo, Linnea, etc., make algorithmic choices by minimizing the number of Floating Point Operations (FLOPs). However, there can be several algorithms that share the same (or have nearly identical) number of FLOPs; in many cases, these algorithms exhibit execution times which are statistically equivalent and one could arbitrarily select one of them as the best algorithm. It is however not unlikely to find cases where the execution times are significantly different from one another (despite the FLOP count being almost the same). It is also possible that the algorithm that minimizes FLOPs is not the one that minimizes execution time. In this work, we develop a methodology to test the reliability of FLOPs as discriminant for linear algebra algorithms. Given a set of algorithms (for an instance of a linear algebra expression) as input, the methodology ranks them into performance classes; i.e., multiple algorithms are allowed to share the same rank. To this end, we measure the algorithms iteratively until the changes in the ranks converge to a value close to zero. FLOPs are a valid discriminant for an instance if all the algorithms with minimum FLOPs are assigned the best rank; otherwise, the instance is regarded as an anomaly, which can then be used in the investigation of the root cause of performance differences.
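To make the final decision rule concrete, below is a minimal sketch (not the authors' implementation) of the check described in the abstract: given FLOP counts and execution-time samples for a set of mathematically equivalent algorithms, group the algorithms into performance classes and verify that every minimum-FLOP algorithm falls in the best class. All function names, tolerances, and measurement data here are illustrative assumptions; the paper's actual ranking procedure is iterative and statistics-based rather than a simple median comparison.

```python
import numpy as np

def rank_by_median(times, tol=0.05):
    """Assign performance ranks: algorithms whose median execution times
    lie within a relative tolerance `tol` share a rank. This is a crude
    stand-in for the iterative, statistical ranking described in the paper."""
    medians = {name: np.median(t) for name, t in times.items()}
    order = sorted(medians, key=medians.get)          # fastest first
    ranks, current_rank = {}, 0
    ref = medians[order[0]]
    for name in order:
        if medians[name] > ref * (1 + tol):           # clearly slower: open a new class
            current_rank += 1
            ref = medians[name]
        ranks[name] = current_rank
    return ranks

def flops_are_valid_discriminant(flops, times, tol=0.05):
    """FLOPs are a valid discriminant for this instance if every algorithm
    with the minimum FLOP count obtains the best (lowest) rank."""
    ranks = rank_by_median(times, tol)
    best_rank = min(ranks.values())
    min_flops = min(flops.values())
    return all(ranks[a] == best_rank for a, f in flops.items() if f == min_flops)

# Hypothetical execution-time samples (seconds) for three equivalent algorithms.
rng = np.random.default_rng(0)
times = {
    "alg_A": rng.normal(1.00, 0.02, 30),   # minimum FLOPs, fast
    "alg_B": rng.normal(1.01, 0.02, 30),   # minimum FLOPs, statistically equivalent to alg_A
    "alg_C": rng.normal(1.30, 0.02, 30),   # more FLOPs, clearly slower
}
flops = {"alg_A": 2.0e9, "alg_B": 2.0e9, "alg_C": 2.6e9}

print(flops_are_valid_discriminant(flops, times))    # True: FLOPs discriminate correctly here
```

In this toy instance the two minimum-FLOP algorithms share the best rank, so FLOPs act as a valid discriminant; if alg_B were measurably slower than alg_A despite the identical FLOP count, the instance would instead be flagged as an anomaly for further investigation.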
