Paper Title

OLLIE: Derivation-based Tensor Program Optimizer

Authors

Liyan Zheng, Haojie Wang, Jidong Zhai, Muyan Hu, Zixuan Ma, Tuowei Wang, Shizhi Tang, Lei Xie, Kezhao Huang, Zhihao Jia

Abstract

Boosting the runtime performance of deep neural networks (DNNs) is critical due to their wide adoption in real-world tasks. Existing approaches to optimizing the tensor algebra expression of a DNN only consider expressions representable by a fixed set of predefined operators, missing possible optimization opportunities between general expressions. We propose OLLIE, the first derivation-based tensor program optimizer. OLLIE optimizes tensor programs by leveraging transformations between general tensor algebra expressions, enabling a significantly larger expression search space that includes those supported by prior work as special cases. OLLIE uses a hybrid derivation-based optimizer that effectively combines explorative and guided derivations to quickly discover highly optimized expressions. Evaluation on seven DNNs shows that OLLIE can outperform existing optimizers by up to 2.73$\times$ (1.46$\times$ on average) on an A100 GPU and up to 2.68$\times$ (1.51$\times$) on a V100 GPU, respectively.
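To make the abstract's core idea concrete, the following is a minimal, hypothetical Python sketch of derivation-based optimization: expressions are represented as small trees, "derivation rules" rewrite them into equivalent forms, and a hybrid search combines a breadth-limited explorative phase with a cost-guided greedy phase. The rules, the cost model, and the `fused_matmul_add` operator are illustrative assumptions for exposition only, not OLLIE's actual rules, cost model, or API.

```python
"""Illustrative sketch of derivation-based expression optimization (not OLLIE's code)."""
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    """A tiny expression tree: op is e.g. 'matmul', 'add', 'transpose', or a tensor name."""
    op: str
    args: tuple = ()

    def __repr__(self):
        return f"{self.op}({', '.join(map(repr, self.args))})" if self.args else self.op


def cost(e: Expr) -> int:
    """Toy cost model weighting each operator kind (an assumption, not a real model)."""
    weights = {"matmul": 10, "add": 1, "transpose": 2, "fused_matmul_add": 8}
    return weights.get(e.op, 1) + sum(cost(a) for a in e.args)


def derivations(e: Expr):
    """Yield expressions derivable from `e` in one rewrite step (illustrative rules only)."""
    # Rule 1: a double transpose cancels.
    if e.op == "transpose" and e.args and e.args[0].op == "transpose":
        yield e.args[0].args[0]
    # Rule 2: fuse matmul followed by add into a single (hypothetical) fused operator.
    if e.op == "add" and e.args and e.args[0].op == "matmul":
        yield Expr("fused_matmul_add", (*e.args[0].args, *e.args[1:]))
    # Recurse: any sub-expression may be rewritten in place.
    for i, a in enumerate(e.args):
        for a2 in derivations(a):
            yield Expr(e.op, e.args[:i] + (a2,) + e.args[i + 1:])


def optimize(e: Expr, explore_steps: int = 3) -> Expr:
    """Hybrid search sketch: breadth-limited exploration, then cost-guided greedy descent."""
    frontier, seen = {e}, {e}
    for _ in range(explore_steps):          # explorative derivation
        frontier = {d for x in frontier for d in derivations(x)} - seen
        seen |= frontier
        if not frontier:
            break
    best = min(seen, key=cost)
    while True:                              # guided derivation: follow the cheapest rewrite
        nxt = min(list(derivations(best)) + [best], key=cost)
        if cost(nxt) >= cost(best):
            return best
        best = nxt


if __name__ == "__main__":
    x = Expr("transpose", (Expr("transpose", (Expr("X"),)),))
    prog = Expr("add", (Expr("matmul", (x, Expr("W"))), Expr("b")))
    print("before:", prog, "cost", cost(prog))
    print("after: ", optimize(prog), "cost", cost(optimize(prog)))
```

On the toy input `add(matmul(transpose(transpose(X)), W), b)`, the explorative phase discovers both the transpose cancellation and the fused form, and the guided phase settles on `fused_matmul_add(X, W, b)`, the cheapest expression under the toy cost model.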
